A Study of Open-Source Data Mining Tools for Forecasting

Nurdatillah Hasim
Universiti Kuala Lumpur
Malaysian Institute of Information Technology
Software Engineering and Information Systems Section
nurdatillah@unikl.edu.my

Norhaidah Abu Haris
Universiti Kuala Lumpur
Malaysian Institute of Information Technology
Software Engineering and Information Systems Section
norhaidah@unikl.edu.my

ABSTRACT
This paper describes five open-source Data Mining (DM) tools: Weka, RapidMiner, KEEL, Orange and Tanagra. Educators and researchers can benefit from the features and functionality of these tools, and the DM algorithms embedded in them can be utilized for forecasting. Weka and RapidMiner have most of the desired characteristics of a fully functional and flexible platform; therefore, their use can be recommended for most DM tasks.

Categories and Subject Descriptors
H.2.8. [Database Applications]: Database Applications – data mining.

General Terms
Algorithms, Measurement, Languages.

Keywords
Data Mining, Tools, Forecasting.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
IMCOM '15, January 08 - 10 2015, BALI, Indonesia
Copyright 2015 ACM 978-1-4503-3377-1/15/01…$15.00
http://dx.doi.org/10.1145/2701126.2701152

1. INTRODUCTION
Data mining (DM) has attracted a great deal of attention in the information industry and in society as a whole in recent years, due to the wide availability of huge amounts of data and the imminent need to turn such data into useful information and knowledge. The information and knowledge gained can be used for applications ranging from market analysis, fraud detection and customer retention to production control and science exploration [11]. Automatically discovering knowledge from databases is an attractive yet challenging task for both industry experts and academicians [5]. DM has been practiced widely in problem solving; however, it requires programming expertise, along with the time and effort to write a computer program that implements a sophisticated algorithm according to the user's needs [3]. Over the past few years, many software tools have been developed to reduce this effort, and several of them are available as open-source tools. Acceptance of the open-source model of sharing information for implementations of machine learning software can be highly beneficial for the whole field [10].

This paper presents some of the tools that researchers can use for forecasting: WEKA, RapidMiner, KEEL, Tanagra and Orange. Their features are presented based on previous studies and sources. The tools have different capabilities and give researchers a platform to support their research activities.

2. TOOL CHARACTERISTICS
The sources of these open-source DM tools are presented in Table 1.

Table 1. Open-Source DM Tool List

| Tool Name | Company Name | Source |
|---|---|---|
| WEKA | University of Waikato | http://www.cs.waikato.ac.nz/ml/weka/ |
| RAPIDMINER | rapid-i.com | http://rapidminer.com |
| KEEL | University of Granada | http://www.keels.es |
| ORANGE | University of Ljubljana | http://orange.biolab.si |
| TANAGRA | University of Lumiere Lyon 2 (France) | http://eric.univ-lyon2.fr/~ricco/tanagra |

The activity of a tool is measured by the frequency of its updates and the time of its latest update. The common open-source licenses are the General Public License (GPL), Mozilla Public License (MPL), Berkeley Software Distribution (BSD), Netscape Public License (NPL) and Lesser General Public License (LGPL) [10]. These characteristics are listed in Table 2. The open-source DM tools offer a Graphical User Interface (GUI), a command-line interface, or both.

Table 3 summarizes the studied characteristics, which were selected by evaluating all the software tools, tutorials and guidelines for the usage of such suites. Each characteristic has four levels of support: none (N), basic support (B), intermediate support (I) and advanced support (A). Where there are no intermediate levels of support, the notation used is Yes (Y) for supported and No (N) for not supported.

Categorizing the characteristics of these DM tools helps users understand them and match their research to the most suitable DM tool. Six categories are involved: Graphical User Interface, Input/Output, Pre-Processing Variety, Learning Variety, Run Type and Advanced Features. The categories are further divided into detailed characteristics, each marked with the level of support that the tools provide.
Table 2. General Characteristics of DM Tools (Weka, RapidMiner, Orange – source [7])

| Name | Weka | RapidMiner | KEEL | Orange | Tanagra |
|---|---|---|---|---|---|
| Developer | Univ. of Waikato, New Zealand | RapidMiner, Germany | Univ. of Granada, Spain | Univ. of Ljubljana, Slovenia | Lumière University Lyon 2, France |
| Programming Language | Java | Java | Java | C++, Python | C++ |
| License | Open-source, GNU GPL 3 | Open-source (v.5 or lower); Closed source (free Starter ed. v.6) | Open-source, GNU GPL 3 | Open-source, GNU GPL 3 | Open-source, GNU GPL 3 |
| Current Version | 3.6.10 | 6 | 2.0 | 2.7 | 1.4.50 |
| GUI/Command line | Both | GUI | Command line | Both | GUI |
| Main purpose | General data mining | General data mining | General data mining | General data mining | General data mining |
Table 3. Detailed Characteristics of DM Tools (Weka, RapidMiner, Orange – source [7])

Within each category, the support levels are listed in the following column order:
Graphical Interface: Graph representation, Data visualization, Data management
Input / Output: ARFF data format, Other data format, Database connection
Pre-processing Variety: Discretization, Feature Selection, Instance Selection, Missing values imputation
Learning Variety: Classification, Regression, Clustering, Association Rules
Run Types: On-line run, Off-line run
Advanced Features: Post-processing, Meta-learning, Statistical test

| Tool Name | Current Version | Graphical Interface | Input / Output | Pre-processing Variety | Learning Variety | Run Types | Advanced Features |
|---|---|---|---|---|---|---|---|
| Weka | 3.6.10 | Y A A | Y Y Y | I A B B | A A A A | Y N | N I N |
| RapidMiner | 6 | N A A | Y Y Y | I A B B | A A A A | Y N | N A B |
| KEEL | 2.0 | Y A A | Y Y Y | A I B B | A A A A | Y Y | Y I A |
| Orange | 2.7 | Y A A | N Y N | A I B B | I N I I | N Y | N N N |
| Tanagra | 1.4.50 | N A A | Y Y N | B A B N | A I A A | Y N | N I A |
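The support-level notation in Table 3 can be compared numerically once the letters are mapped to an ordinal scale. The following is a minimal sketch, not part of the original study: the 0 to 3 scoring and the idea of averaging it are assumptions made for illustration, while the letter strings are the Pre-processing Variety entries copied from Table 3.

```python
# Assumed ordinal scale for the support levels in Table 3
# (none, basic, intermediate, advanced). The numeric values are illustrative.
LEVEL_SCORE = {"N": 0, "B": 1, "I": 2, "A": 3}

# Pre-processing Variety entries per tool, copied from Table 3.
PREPROCESSING = {
    "Weka": "IABB",
    "RapidMiner": "IABB",
    "KEEL": "AIBB",
    "Orange": "AIBB",
    "Tanagra": "BABN",
}


def mean_support(levels):
    """Average the ordinal scores of a string of support-level letters."""
    return sum(LEVEL_SCORE[letter] for letter in levels) / len(levels)


scores = {tool: mean_support(levels) for tool, levels in PREPROCESSING.items()}
for tool, score in scores.items():
    print(f"{tool}: {score:.2f}")
```

Under this assumed scale, four of the tools reach the same average for pre-processing, which suggests that a category-level summary like this is only a rough guide and the individual characteristics still need to be read.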

3. TOOLS FEATURES AND FUNCTIONALITIES
Each of these DM tools has its own features and functionalities to help researchers with their studies. They are described below.

3.1 Weka
Weka offers four optional interfaces: the command-line interface (CLI), Explorer, Experimenter and Knowledge Flow. The Explorer allows the definition of the data source, data preparation, machine learning algorithms and visualization. The Experimenter is used mainly to compare the performance of different algorithms on the same dataset.

Weka has a few features:
a. A learning method can be applied to a dataset and its output analyzed to learn more about the data.
b. Learned models can be used to generate predictions on new instances.
c. Predictions can be chosen by applying several different learners and comparing their performance.

Even though Weka supports many model evaluation procedures and metrics, many data survey and visualization methods are absent [6]. Weka is also more oriented towards classification and regression problems and less towards descriptive statistics and clustering methods. Its support for big data, text mining and semi-supervised learning is also currently limited [7].

3.2 RapidMiner
RapidMiner was developed by the company of the same name, based in Germany. Past versions (v. 5 or lower) were open source. The latest version (v. 6) is proprietary for now, with several license options: Starter, Personal, Professional and Enterprise.

RapidMiner provides a visual and user-friendly GUI environment. It is organized around processes, which may contain sub-processes. Processes contain operators in the form of visual components. An application wizard provides pre-built workflows for a number of common tasks, including direct marketing, predictive maintenance, customer churn and sentiment analysis, and a new statistics view provides many statistical graphs and data summaries [7].

3.3 KEEL
KEEL (Knowledge Extraction based on Evolutionary Learning) is a software tool that facilitates the analysis of the behavior of evolutionary learning under different learning approaches, such as Pittsburgh, Michigan, IRL (iterative rule learning) and GCCL (genetic cooperative-competitive learning), as well as pre-processing tasks, making these techniques easy for the user to manage [1].
KEEL offers three features [7]:
a. A library of evolutionary algorithms based on different paradigms, and the integration of evolutionary learning algorithms with different pre-processing techniques. This reduces the programming work for researchers.
b. Evolutionary learning algorithms can be extended, so researchers with less knowledge can successfully apply these algorithms to their problems.
c. It can be run on any machine with Java.

3.4 Orange
Orange Canvas is the visual programming interface of Orange. It offers a structured view of the supported features, grouped into nine categories: data operations, visualizations, classification, regression, evaluation, unsupervised learning, association, visualization using Qt, and prototype implementations.

Qt is a cross-platform application and UI framework for developers using C++ or QML (Qt Meta Language or Qt Modelling Language), a CSS- and JavaScript-like language.

Orange functionalities are visually represented by different widgets (e.g. read file, discretize, train SVM classifier). Each widget has a short description within the interface. Programming is performed by placing widgets on the canvas and connecting their inputs and outputs.

However, the number of available widgets is limited compared to other tools such as RapidMiner. Furthermore, widgets currently in development can be found in the "Prototype" section [2].

3.5 Tanagra
Tanagra was written by Ricco Rakotomalala as an aid to education and research on data mining [4].

Tanagra's operation is based on the stream diagram paradigm, which was created in the 1990s. Under the stream paradigm, a user builds a graph specifying the data source and the operations on the data. Paths through the graph describe the flow of data through manipulations and analyses. Tanagra simplifies this paradigm by restricting the graph to a tree. This means that each node can have only one parent, and therefore each operation has only one data source.

However, Tanagra is not compatible with other formats, and it cannot directly read or write complex data formats. Tanagra is also unable to import multiple data sources within one project. This can be a problem when data sets from multiple regions must be imported and used together in a data mining analysis.

4. UTILIZING TOOLS FOR FORECASTING
DM for forecasting offers the opportunity to leverage numerous sources of data. These data are important to decision makers when choosing action strategies to gain profitability [9]. Integrating DM with forecasting can improve forecast accuracy. The DM tools discussed in this paper could be used for forecasting, and with them a decision maker can build a better explanatory forecasting model. Most of the common DM algorithms are supported by the five tools, as presented in Table 5. Weka, RapidMiner and KEEL support most of the DM algorithms.

These DM algorithms can produce a useful forecasting model when applied in various application domains such as Marketing, Financial Service, Manufacturing, Health Care and Military, as shown in Table 4. The algorithms built into these tools are there for users to deploy and fit to their own problem domains. Each algorithm works differently to produce its results.

The interactive, visual and understandable nature of these DM tools suits them to working with user data. Significant anomalies that arise while running the tools draw the user's attention. Further investigation then helps users analyze the data by executing a series of actions and returning results that provide insight into their problems.
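Table 4. Open-Source Data Mining Applied in Various Applications, source [8]

| Tool | Marketing | Direct Mail | Financial Service | Manufacturing | Health Care | Military |
|---|---|---|---|---|---|---|
| Weka | Y | Y | Y | Y | Y | Y |
| RapidMiner | Y | Y | Y | Y | Y | Y |
| KEEL | Y | Y | Y | Y | Y | Y |
| Orange | Y | Y | Y | Y | Y | Y |
| Tanagra | N | N | Y | Y | N | N |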
Table 5. List of Algorithms supported by DM tool, source [8]

Column headings as printed: Nearest Neighbour, Association Rules, Linear/Statistical, Neural Network, Random Forest, Support Vector Machine, Rule Induction, Decision Tree, Radial Basis, Evaluation, K Means, Boosting, Factor, Bayes

| Tool | Support per algorithm |
|---|---|
| Weka | Y Y Y Y Y Y Y Y Y Y Y Y Y |
| RapidMiner | Y Y Y Y Y Y Y Y Y Y Y Y Y |
| KEEL | Y Y Y Y Y Y Y Y Y Y Y Y Y |
| Orange | Y N Y Y N Y Y Y Y Y N Y Y |
| Tanagra | Y N N Y N Y Y Y Y Y N Y Y |
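The statement that Weka, RapidMiner and KEEL support most of the DM algorithms can be checked by counting the Y entries in each row of Table 5. A minimal sketch follows. The row strings are copied from the table, and the variable names are illustrative.

```python
# Y/N support entries per tool, copied row by row from Table 5.
TABLE_5 = {
    "Weka": "Y Y Y Y Y Y Y Y Y Y Y Y Y",
    "RapidMiner": "Y Y Y Y Y Y Y Y Y Y Y Y Y",
    "KEEL": "Y Y Y Y Y Y Y Y Y Y Y Y Y",
    "Orange": "Y N Y Y N Y Y Y Y Y N Y Y",
    "Tanagra": "Y N N Y N Y Y Y Y Y N Y Y",
}

# Number of supported algorithm columns per tool.
coverage = {tool: row.split().count("Y") for tool, row in TABLE_5.items()}

# Tools whose row contains no "N" entry at all.
full_coverage = [tool for tool, row in TABLE_5.items() if "N" not in row.split()]

for tool, count in coverage.items():
    print(f"{tool}: {count} of {len(TABLE_5[tool].split())}")
print("Full coverage:", full_coverage)
```

The count does not depend on the order of the algorithm columns, so it stays valid even where the printed header layout is ambiguous.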

5. CONCLUSION AND FUTURE WORK
This paper presented five open-source DM tools that provide numerous learning methods applicable to forecasting, together with their important features and functionalities. Most of the tools offer extensive functionality and are valuable to educators and researchers. However, Weka and RapidMiner have most of the desired characteristics of a fully functional and flexible platform; therefore, their use can be recommended for most DM tasks. As future work, different types of data sets will be applied using these tools in order to check the accuracy and efficiency of the tools for forecasting.

6. REFERENCES
[1] Alcalá-Fdez, J., Sánchez, L., & García, S. (2009). KEEL: a software tool to assess evolutionary algorithms for data mining problems. Soft Computing. Retrieved from http://link.springer.com/article/10.1007/s00500-008-0323-y

[2] Chen, X., Ye, Y., Williams, G., & Xu, X. (2007). A survey of open source data mining systems. Emerging Technologies in Knowledge Discovery and Data Mining, 3–14. Retrieved from http://link.springer.com/chapter/10.1007/978-3-540-77018-3_2

[3] Collier, K., Carey, B., & Marjaniemi, C. (1999). A Methodology for Evaluating and Selecting Data Mining Software, 1–11.

[4] Enright, J. (2004). Tanagra: An Evaluation, 1–8.

[5] Fernández, A., Luengo, J., & Derrac, J. (2009). Implementation and integration of algorithms into the KEEL data-mining software tool. Intelligent Data Engineering and Automated Learning - IDEAL 2009, 562–569. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-04394-9_68

[6] Graczyk, M., Lasota, T., & Trawiński, B. (2009). Comparative analysis of premises valuation models using KEEL, RapidMiner, and WEKA. Computational Collective Intelligence. Semantic Web, Social Networks and Multiagent Systems, 800–812. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-04441-0_70

[7] Jović, A., Brkić, K., & Bogunović, N. (2014). An overview of free software tools for general data mining. Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention, 26–30. Retrieved from http://www.zemris.fer.hr/~ajovic/articles/MIPRO 2014_final.pdf

[8] Madasamy, B., & Tamilselvi, J. (2012). Assessment of Freeware Data Mining Tools over Some Wide-Range Characteristics. Wireless Networks and Computational Intelligence, Communications in Computer and Information Science, 292, 529–535. Retrieved from http://link.springer.com/chapter/10.1007/978-3-642-31686-9_61

[9] Rey, T. D., Wells, C., & Kauhl, J. (2013). Using Data Mining in Forecasting Problems, 1–17.

[10] Sonnenburg, S., Braun, M., & Ong, C. (2007). The need for open source software in machine learning. Journal of Machine Learning Research, 8, 2443–2466. Retrieved from http://researchcommons.waikato.ac.nz/handle/10289/3928

[11] Witten, I. H., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques (2nd ed., p. 525). Morgan Kaufmann.