Introduction To Weka-A Toolkit For Machine Learning
Winter School on "Data Mining Techniques and Tools for Knowledge Discovery in Agricultural Datasets"
1. Introduction
Weka is open-source software released under the GNU General Public License. The system is developed at the University of Waikato in New Zealand; "Weka" stands for the Waikato Environment for Knowledge Analysis. The software is freely available at http://www.cs.waikato.ac.nz/ml/weka and is written in the object-oriented language Java. Weka provides implementations of state-of-the-art data mining and machine learning algorithms, and contains modules for data preprocessing, classification, clustering and association rule extraction. It can be used at several different levels, through four main interfaces:
• Explorer
– preprocessing, attribute selection, learning, visualisation
• Experimenter
– testing and evaluating machine learning algorithms
• Knowledge Flow
– visual design of the KDD process
• Simple Command-line
– a simple interface for typing commands
The Attribute-Relation File Format (ARFF) is the default file type for data analysis in Weka, but data can also be imported from various other formats. The ARFF version of the weather dataset from Weka's sample data is presented below. Attribute types are specified in the header: a nominal attribute lists its distinct values in curly brackets along with the attribute name, and a numeric attribute is specified by the keyword real along with the attribute name.
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
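The header conventions described above can be illustrated with a few lines of Python. This is a minimal sketch of parsing @attribute declarations, not Weka's own ARFF loader, and the helper name is mine:

```python
def parse_arff_header(text):
    """Collect (name, type) pairs from an ARFF header: nominal attributes
    yield their value list, numeric ones yield the keyword (e.g. 'real')."""
    attributes = []
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):  # nominal: distinct values in curly brackets
                attributes.append((name, [v.strip() for v in spec.strip("{}").split(",")]))
            else:                     # numeric: keyword after the attribute name
                attributes.append((name, spec))
        elif line.lower().startswith("@data"):
            break
    return attributes

header = """@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
"""
attrs = parse_arff_header(header)
```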
2. WEKA Explorer
Weka expects the data file to be in ARFF format, because it needs type information about each attribute that cannot be automatically deduced from the attribute values. Before you can apply any algorithm to your data, it must therefore be converted to ARFF form. This can be done very easily. Most spreadsheet and database programs allow you to export your data into a file in comma-separated format, as a list of records where the items are separated by commas. Once this has been done, you need only load the file into a text editor or word processor, add the dataset's name using the @relation tag, the attribute information using @attribute, and a @data line, and save the file as raw text. The following example converts data from a Microsoft Excel spreadsheet: from the spreadsheet, save the data in .CSV format. In Weka, on the Preprocess tab, select Open file..., then select the Dataset.csv file (Fig. 2). Make sure that you have selected files of type CSV, or you won't see the dataset we want to open.
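The manual CSV-to-ARFF conversion described above can also be sketched in Python. This is an illustration only (Weka performs the equivalent conversion itself when a CSV file is opened); the function name is hypothetical, and it assumes the caller supplies the value sets for nominal columns, declaring every other column numeric:

```python
import csv
import io

def csv_to_arff(csv_text, relation, nominal_values):
    """Build ARFF text from CSV text: @relation, one @attribute line per
    column, then @data followed by the records."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for col in header:
        if col in nominal_values:   # nominal: distinct values in curly brackets
            lines.append("@attribute %s {%s}" % (col, ",".join(nominal_values[col])))
        else:                       # numeric: keyword real
            lines.append("@attribute %s real" % col)
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines)
```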
4. Data Preprocessing
Some attributes may not be required in the analysis; such attributes can be removed from the dataset before analysis. For example, the instance number attribute of the iris dataset is not needed. It can be removed by selecting it in the Attributes check box and clicking Remove (Fig. 3). The resulting dataset can then be stored in ARFF format.
If some attributes need to be removed before the data mining step, this can also be done using the attribute filters in Weka. In the Filter panel, click the Choose button. This shows a popup window with a list of available filters. Scroll down the list and select the weka.filters.unsupervised.attribute.Remove filter as shown in Figure 4. Next, click the text box immediately to the right of the Choose button. In the resulting dialog box, enter the index of the attribute to be filtered out (this can be a range or a comma-separated list). In this case we enter 1, which is the index of the "id" attribute (see the left panel). Make sure that the invertSelection option is set to false (otherwise everything except attribute 1 will be filtered) (Fig. 5). Then click OK.
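The effect of the Remove filter can be sketched in plain Python. This is only an illustrative analogue of weka.filters.unsupervised.attribute.Remove with its 1-based index list and invertSelection option, not Weka's implementation:

```python
def remove_attributes(instances, indices, invert_selection=False):
    """Drop the 1-based attribute positions in `indices` from each instance.
    With invert_selection=True, keep only those positions instead
    (mirroring Weka's invertSelection behaviour)."""
    chosen = {i - 1 for i in indices}
    keep = (lambda j: j in chosen) if invert_selection else (lambda j: j not in chosen)
    return [[v for j, v in enumerate(row) if keep(j)] for row in instances]
```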
4.2 Discretization
You can observe that Weka has assigned its own labels to each of the value ranges of the discretized attribute. For example, the lowest range of the "age" attribute is labeled "(-inf-34.333333]" (enclosed in single quotes and escape characters), the middle range is labeled "(34.333333-50.666667]", and so on. These labels now also appear in the data records wherever the original age value fell in the corresponding range.
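The labels above come from equal-width binning. The following sketch shows how such cut points and Weka-style interval labels might be produced; the exact label formatting is an assumption for illustration, not Weka's code:

```python
def equal_width_bins(values, n_bins):
    """Equal-width discretisation: compute n_bins-1 cut points over the
    value range and build interval labels such as '(-inf-34.333333]'."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    cuts = [lo + width * i for i in range(1, n_bins)]
    labels = []
    for i in range(n_bins):
        left = "-inf" if i == 0 else "%f" % cuts[i - 1]
        right = "inf" if i == n_bins - 1 else "%f" % cuts[i]
        labels.append("(%s-%s]" % (left, right))
    return cuts, labels
```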
5. Classification
A decision tree classifies instances by testing attribute values at its internal nodes. Usually the test at a node compares an attribute value with a constant, but some trees compare two attributes with each other, or use some function of one or more attributes. Leaf nodes give a classification that applies to all instances reaching the leaf, or a set of classifications, or a probability distribution over all possible classifications. To classify an unknown instance, it is routed down the tree according to the values of the attributes tested at successive nodes; when a leaf is reached, the instance is classified according to the class assigned to that leaf. ID3 is the basic decision tree classifier. The following is an example of ID3 on the weather data from Weka's sample datasets (Fig. 8).
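ID3 chooses, at each node, the attribute whose test yields the highest information gain. The sketch below computes entropy and information gain for the outlook attribute of the weather data; it is a minimal illustration of the criterion, not Weka's Id3 class:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Entropy of the class minus the weighted entropy after splitting
    on the attribute at attr_index: the quantity ID3 maximises."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder

outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]
gain = information_gain([[v] for v in outlook], 0, play)  # about 0.247 bits
```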
The first two columns are the TP Rate (true positive rate) and the FP Rate (false positive rate). For the first row, where play=yes, the TP Rate is the ratio of play=yes cases predicted correctly to the total number of positive cases (e.g., 8 out of 9 predicted correctly: 8/9 = 0.889).
The FP Rate is the ratio of play=no cases incorrectly predicted as play=yes to the total number of play=no cases. One play=no case was wrongly predicted as play=yes, so the FP Rate is 1/5 = 0.2.
The next two columns are terms from information retrieval theory. When one is searching for relevant documents, it is often not possible to get to them easily or directly. In many cases a search will yield many results, a large part of which are irrelevant, and it is often impractical to examine all results at once rather than a portion at a time. In such cases the terms recall and precision are important to consider.
Recall is the ratio of relevant documents found in the search result to the total number of relevant documents; higher recall values mean that more of the relevant documents have been returned. A recall of 30% at 10% means that 30% of the relevant documents were found with only 10% of the results examined. Precision is the proportion of relevant documents among the results returned: a precision of 0.75 means that 75% of the returned documents were relevant.
In our example, such measures are not very applicable: the recall just corresponds to the TP Rate, as we are always looking at 100% of the test sample, and precision is just the proportion of actual play=yes (or play=no) cases among those predicted as such.
The F-measure is a way of combining recall and precision scores into a single measure of performance. The formula for it is:

$F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$
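Using the counts quoted above (8 of 9 play=yes cases correct, 1 of 5 play=no cases wrongly predicted as yes), the per-class measures can be reproduced with a short calculation; the function name is mine:

```python
def class_metrics(tp, fn, fp, tn):
    """TP rate, FP rate, precision, recall and F-measure for one class,
    in the style of Weka's per-class accuracy output."""
    tp_rate = recall = tp / (tp + fn)   # e.g. 8/9 = 0.889 for play=yes
    fp_rate = fp / (fp + tn)            # e.g. 1/5 = 0.2
    precision = tp / (tp + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    return tp_rate, fp_rate, precision, recall, f_measure
```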
The confusion matrix specifies the classes of the obtained results. For example, class a has the majority of its objects (8) from the yes category, hence a is treated as the class of the "yes" group. Similarly, b has the majority of its objects (4) from the no category, hence b is treated as the class of the "no" group. One object from each class is misclassified, which gives 2 misclassified instances. The user can also view a plot of the tree.
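The misclassified-instance count is simply the off-diagonal sum of the confusion matrix, as a quick check shows (an illustrative helper, not part of Weka):

```python
def misclassified(confusion):
    """Sum of the off-diagonal entries of a confusion matrix, i.e. the
    number of instances assigned to the wrong class."""
    return sum(v for i, row in enumerate(confusion)
                 for j, v in enumerate(row) if i != j)

# Matrix from the text: rows are actual classes (yes, no), columns are a, b.
errors = misclassified([[8, 1], [1, 4]])  # 2 misclassified instances
```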
6. Clustering
K-means is the most widely used clustering algorithm. The user needs to specify the number of clusters (k) in advance. The algorithm randomly selects k objects as initial cluster means (centers) and then works to minimise the squared-error criterion

$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - m_i \rVert^2$,

where $m_i$ is the mean of cluster $C_i$.
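This scheme (random initial means, then alternating assignment and mean-update steps that reduce the criterion) can be sketched in a few lines of Python. This is illustrative only; Weka's SimpleKMeans adds its own seeding and options:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: pick k of the points as initial means, then
    repeatedly assign each point to its nearest mean and recompute means,
    lowering the squared-error criterion at each step."""
    rng = random.Random(seed)
    means = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, m)) for m in means]
            clusters[dists.index(min(dists))].append(p)
        for i, c in enumerate(clusters):
            if c:  # keep the old mean if a cluster becomes empty
                means[i] = [sum(dim) / len(c) for dim in zip(*c)]
    return means, clusters
```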
The following is an example of k-means on the weather data from Weka's sample datasets (Fig. 9).
Figure 10 shows the results of k-means on the weather data. The confusion matrix specifies the classes of the obtained results, since we have selected classes-to-clusters evaluation. For example, cluster0 has 9 objects in total, of which the majority (6) are from the yes category, hence this cluster is treated as the cluster of "yes". Similarly, cluster1 has 5 objects in total, of which 3 are from the no category, hence it is considered the cluster of the no category.
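The classes-to-clusters evaluation described above can be sketched as follows: each cluster is labelled with its majority class, and every other member counts as an error. This is an illustrative reading of Weka's output with a hypothetical helper name:

```python
from collections import Counter

def classes_to_clusters(cluster_labels):
    """Map each cluster to its majority class and count the remaining
    members of each cluster as incorrectly clustered instances."""
    assignment, errors = {}, 0
    for cluster, labels in cluster_labels.items():
        majority, count = Counter(labels).most_common(1)[0]
        assignment[cluster] = majority
        errors += len(labels) - count
    return assignment, errors

# Counts from the text: cluster0 has 6 yes / 3 no, cluster1 has 2 yes / 3 no.
assignment, errors = classes_to_clusters({
    "cluster0": ["yes"] * 6 + ["no"] * 3,
    "cluster1": ["yes"] * 2 + ["no"] * 3,
})
```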