Study program: TIN

Introduction in Data Warehouse – final project

Author: Marius-Claudiu Scobici

Braşov, 2024
In this project I used the Glass dataset that comes with Weka.
The dataset contains the following columns:
- RI: refractive index,
- Na: Sodium (unit of measurement: weight percent in the corresponding oxide; the same unit applies to all element attributes, Na through Fe),
- Mg: Magnesium,
- Al: Aluminum,
- Si: Silicon,
- K: Potassium,
- Ca: Calcium,
- Ba: Barium,
- Fe: Iron,
- Type: type of glass (the class attribute).
Number of instances in the dataset: 214.

I. Weka run: Classification algorithms


Algorithm            Run-time   Accuracy (correctly classified instances)
functions.Logistic   0.2 s      64.486%
trees.J48            0.03 s     66.8224%
What can be observed: Logistic takes longer to build (0.2 s vs 0.03 s) and its accuracy is slightly lower than that of J48 (64.49% vs 66.82%).
Logistic algorithm

Figure 1 Logistic algorithm pseudocode


Figure 2 Weka run: Logistic
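Weka's functions.Logistic builds a multinomial logistic regression model with a ridge estimator. As a minimal sketch of that model type (not of Weka's actual code), the block below fits one weight vector per class, turns the class scores into probabilities with the softmax function and minimises the mean cross-entropy plus a small L2 (ridge) penalty. Plain batch gradient descent is swapped in for Weka's own optimiser; the function names, learning rate, epoch count and the toy data are hypothetical, and the features are assumed to be scaled to roughly [0, 1].

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the largest score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(W, xi):
    """Linear score per class; index 0 of every weight vector is the bias."""
    x = [1.0] + list(xi)
    return [sum(w * v for w, v in zip(wk, x)) for wk in W]


def train_logistic(X, y, n_classes, lr=0.5, epochs=1000, ridge=1e-8):
    """Fit multinomial logistic regression with batch gradient descent (hypothetical helper).

    Minimises the mean cross-entropy plus an L2 penalty on the non-bias weights.
    """
    n_features = len(X[0])
    W = [[0.0] * (n_features + 1) for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * (n_features + 1) for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            x = [1.0] + list(xi)
            probs = softmax(class_scores(W, xi))
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == yi else 0.0)  # d(loss)/d(score_k)
                for d in range(n_features + 1):
                    grad[k][d] += err * x[d]
        for k in range(n_classes):
            for d in range(n_features + 1):
                penalty = ridge * W[k][d] if d > 0 else 0.0  # the bias is not penalised
                W[k][d] -= lr * (grad[k][d] / len(X) + penalty)
    return W


def predict(W, xi):
    """Return the class with the highest score (equivalently, the highest probability)."""
    scores = class_scores(W, xi)
    return max(range(len(scores)), key=lambda k: scores[k])


# Hypothetical toy data: two features scaled to [0, 1], three classes.
X = [(0.0, 0.0), (0.1, 0.1), (0.5, 1.0), (0.6, 0.9), (1.0, 0.0), (0.9, 0.1)]
y = [0, 0, 1, 1, 2, 2]
W = train_logistic(X, y, n_classes=3, lr=1.0, epochs=5000)
print([predict(W, xi) for xi in X])
```

On this toy set the learned model classifies all six training points correctly. With unscaled attributes, such as the raw Glass values, a much smaller learning rate would be needed for gradient descent to stay stable.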
J48 algorithm

Figure 3 Weka run: J48

Figure 4 Weka decision tree: J48


Figure 5 Pseudocode of J48
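J48 is Weka's implementation of the C4.5 decision-tree learner. As a minimal sketch of the split criterion it relies on, the block below computes entropy, information gain and gain ratio (information gain divided by split information) for a binary threshold test on one numeric attribute, and picks the threshold with the highest information gain. It is a simplified illustration: real C4.5/J48 adds further refinements such as pruning and extra rules for numeric thresholds, and the function names and toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split(values, labels, threshold):
    """Partition the labels by the binary test value <= threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    return left, right


def info_gain(values, labels, threshold):
    """Reduction in entropy achieved by splitting at the threshold."""
    left, right = split(values, labels, threshold)
    n = len(labels)
    remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(labels) - remainder


def gain_ratio(values, labels, threshold):
    """Information gain normalised by the split information (C4.5's criterion)."""
    left, right = split(values, labels, threshold)
    if not left or not right:
        return 0.0
    split_info = entropy(["left"] * len(left) + ["right"] * len(right))
    return info_gain(values, labels, threshold) / split_info


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values (needs at least two)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda t: info_gain(values, labels, t))


values = [1, 2, 3, 4]            # hypothetical attribute values
labels = ["a", "a", "b", "b"]    # hypothetical classes
t = best_threshold(values, labels)
print(t, info_gain(values, labels, t), gain_ratio(values, labels, t))
```

For these toy values the best threshold is 2.5, which separates the two classes perfectly, so both the information gain and the gain ratio equal 1.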

II. Weka run: Clustering algorithms


Algorithm               Run-time   Clusters   Instances per cluster
SimpleKMeans            0.02 s     2          I: 88, II: 126
HierarchicalClusterer   0.08 s     2          I: 212, II: 2
What can be observed: SimpleKMeans finished in 0.02 s, compared with 0.08 s for HierarchicalClusterer. Both algorithms produced the same number of clusters (2), but the number of instances per cluster differs considerably: SimpleKMeans splits the data 88/126, while HierarchicalClusterer places 212 instances in one cluster and only 2 in the other.
SimpleKMeans algorithm

Figure 6 Weka run: SimpleKMeans

Figure 7 KMeans algorithm
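SimpleKMeans follows the classic k-means scheme. As a minimal sketch of Lloyd's algorithm, the standard loop that alternates an assignment step and a centroid-update step, assuming numeric points given as tuples and Euclidean distance, the block below stops as soon as no assignment changes. The function name, the seed parameter and the toy points are hypothetical; the sketch does not normalise attributes or copy Weka's initialisation, so it would not reproduce Weka's 88/126 split on the Glass data.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means. Returns (labels, centroids, iterations).

    Initial centroids are k data points drawn at random; the loop stops
    when an assignment step changes nothing (or after max_iter passes).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids, iteration


# Hypothetical toy data: two well-separated groups of points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (20.0, 20.0), (20.0, 21.0), (21.0, 20.0)]
labels, centroids, iterations = kmeans(points, k=2)
print(labels, iterations)
```

Because the result depends on the initial centroids and on preprocessing, two tools can converge to different partitions and need different numbers of iterations. As far as I know, Weka's default Euclidean distance normalises each attribute, whereas this sketch works on raw values, which is one plausible source of differences between runs.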


HierarchicalClusterer algorithm

Figure 8 Weka run: HierarchicalClusterer algorithm

Figure 9 HierarchicalClusterer pseudocode
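HierarchicalClusterer builds clusters bottom-up: it starts with one cluster per instance and repeatedly merges the two closest clusters until the requested number of clusters remains. A minimal sketch of that agglomerative procedure, assuming Euclidean distance and a choice of linkage (the rule for measuring the distance between two clusters), is shown below; the function name, the linkage table and the toy points are hypothetical. This naive version recomputes every pairwise cluster distance at each step, so it is only meant for small inputs.

```python
import math

# How a cluster-to-cluster distance is derived from point-to-point distances.
LINKAGES = {
    "single": min,                            # closest pair of points
    "complete": max,                          # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}


def agglomerative(points, n_clusters, linkage="single"):
    """Bottom-up clustering. Returns a list of clusters, each a list of point indices."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(a, b):
        return combine([math.dist(points[i], points[j]) for i in a for j in b])

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = cluster_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]  # merge cluster y into cluster x (x < y)
        del clusters[y]
    return clusters


# Hypothetical 1-D data: a chain of nearby points plus one isolated point.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (20.0,)]
print(agglomerative(points, 2, linkage="single"))
```

With single linkage the chain of nearby points ends up in one large cluster and the isolated point forms its own cluster, so the example prints `[[0, 1, 2, 3, 4], [5]]`. This chaining effect is one possible explanation for Weka's very unbalanced 212/2 split, assuming HierarchicalClusterer was run with single linkage. scikit-learn's AgglomerativeClustering, by contrast, defaults to Ward linkage, which tends to produce more balanced clusters.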


III. Association algorithms
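Because Weka's association algorithms cannot be run on the Glass dataset (its input attributes are numeric, while these algorithms only work with nominal attributes), I had to change the dataset. From the datasets that ship with Weka I chose "vote".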
Apriori algorithm

Figure 10 Weka run: Apriori algorithm

From the 1st rule: a representative who voted adoption-of-the-budget-resolution=y and physician-fee-freeze=n is a democrat. The confidence is 1, meaning the rule held for 100% of the matching instances, so it is a strong pattern. The lift of 1.63 means that this combination occurs 1.63 times more often than it would if the items were independent.
All other rules can be interpreted in the same way.
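For reference, these are the usual definitions of the two measures for a rule $A \Rightarrow B$ over $N$ instances, where $\mathrm{supp}(X)$ is the number of instances that contain every item of $X$:

$$\mathrm{conf}(A \Rightarrow B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}, \qquad \mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{conf}(A \Rightarrow B)}{\mathrm{supp}(B)/N}$$

As a worked check of the 1st rule, assuming the standard composition of the vote dataset (435 instances, 267 of them democrats), a confidence of 1 gives

$$\mathrm{lift} = \frac{1}{267/435} = \frac{435}{267} \approx 1.63,$$

which matches the lift reported by Weka.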

Figure 11 Apriori algorithm pseudocode
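As a minimal sketch of the level-wise idea behind Apriori (build candidate itemsets of size k from the frequent itemsets of size k-1, prune every candidate that has an infrequent subset, then count support), the block below also computes the confidence and lift used to read Weka's rules. The basket data and function names are hypothetical, and the minimum support is an absolute count; Weka's implementation adds further options, such as a target number of rules.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        previous = list(current)
        candidates = set()
        for i in range(len(previous)):
            for j in range(i + 1, len(previous)):
                union = previous[i] | previous[j]
                # Prune step: every (k-1)-subset must itself be frequent.
                if len(union) == k and all(
                    frozenset(sub) in current for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: one pass over the transactions for this level.
        current = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                current[cand] = support
        frequent.update(current)
        k += 1
    return frequent


def rule_metrics(frequent, n_transactions, antecedent, consequent):
    """Confidence and lift of the rule antecedent => consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    confidence = frequent[a | c] / frequent[a]
    lift = confidence / (frequent[c] / n_transactions)
    return confidence, lift


# Hypothetical toy baskets (not the vote data).
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
frequent = apriori(baskets, min_support=3)
conf, lift = rule_metrics(frequent, len(baskets), {"beer"}, {"diaper"})
print(len(frequent), conf, lift)
```

Here the rule beer => diaper has confidence 1 and lift 1.25: every basket with beer also has diaper, and diaper appears in 4 of the 5 baskets.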


FPGrowth algorithm

Figure 12 Weka run: FPGrowth algorithm

Explanation of the 1st rule: with 99% confidence (a strong pattern), a representative who voted for el-salvador-aid and is a republican also voted for physician-fee-freeze. The lift of 2.44 means that this combination occurs 2.44 times more often than it would if the items were independent.

Figure 13 FPGrowth pseudocode
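FPGrowth finds the same frequent itemsets as Apriori without generating candidates: it compresses the transactions into a prefix tree (the FP-tree) and mines it recursively through conditional pattern bases. A minimal sketch, assuming transactions are given as sets of item names and the minimum support is an absolute count, is shown below; the class and function names and the toy baskets are hypothetical.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_support):
    """Insert (items, count) pairs into an FP-tree.

    Returns a header table (item -> nodes holding that item) and the
    support count of every frequent item.
    """
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {item: s for item, s in support.items() if s >= min_support}
    root = FPNode(None, None)
    header = defaultdict(list)
    for items, count in weighted_transactions:
        # Keep frequent items only, most frequent first, so shared prefixes share a path.
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_support, suffix=()):
    """Yield (itemset, support count) for every frequent itemset."""
    header, frequent = build_fp_tree(weighted_transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = suffix + (item,)
        yield frozenset(itemset), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`,
        # weighted by that node's count.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Mine the conditional FP-tree built from that base.
        yield from fp_growth(base, min_support, itemset)


# Hypothetical toy baskets (not the vote data).
baskets = [
    {"bread", "milk"},
    {"bread", "diaper", "beer", "eggs"},
    {"milk", "diaper", "beer", "cola"},
    {"bread", "milk", "diaper", "beer"},
    {"bread", "milk", "diaper", "cola"},
]
result = dict(fp_growth([(b, 1) for b in baskets], min_support=3))
print(sorted((sorted(s), c) for s, c in result.items()))
```

The tree is scanned instead of the raw transactions, which is why FP-growth usually needs fewer passes over the data than Apriori on the same input.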


IV. Visualizing
a. Refractive index and Barium

- Most glass samples that contain no barium and have a refractive index between 1.512 and 1.520 are not headlamps.
- Headlamp glass, on the other hand, usually contains barium (mostly below about 1.57) and has a refractive index of around 1.515.
b. Refractive index and Potassium (K)

- Most glass types contain only a small amount of potassium, below 1 (weight percent).
- Some headlamp glasses contain potassium, but most of them contain none at all.
- Only one headlamp glass sample stands out, with a potassium content of around 6.25.
V. Python classification and clustering
Classification
1. J48
Figure 14 Python: J48 run
Figure 15 Python: J48 DecisionTree

Comparison between the Weka and Python runs:


- Python achieved a higher accuracy: 74.41% (Python) vs 66.82% (Weka).
- The tree generated in Python is far more complex, with more levels than the one generated by Weka.
- With this algorithm, Python appears to produce the better classification.
2. Logistic

Figure 16 Python: Logistic regression run

Comparison between the Weka and Python runs:


- Weka achieved a higher accuracy: 64.48% (Weka) vs 58.13% (Python).
- With this algorithm, Weka appears to produce the better classification.
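The reported percentages can be converted back into numbers of correctly classified instances, which is a useful sanity check before comparing the two tools. The Weka figures match 143 and 138 out of all 214 instances. The Python figures are consistent with, for example, 32 and 25 correct predictions out of 43 instances, i.e. a hold-out set of about 20% of the data; that test-set size is an assumption inferred from the numbers, not something stated in the runs. If it holds, the Python accuracies were measured on a much smaller test set than the Weka ones, so the two sets of figures are not strictly comparable. The helper name below is hypothetical.

```python
def implied_count(accuracy_percent, total):
    """Whole number of correct predictions closest to a reported accuracy."""
    return round(accuracy_percent / 100 * total)


# Reported accuracies from the runs above, with the evaluation-set sizes.
runs = {
    "J48 (Weka)": (66.8224, 214),
    "Logistic (Weka)": (64.486, 214),
    "J48 (Python)": (74.41, 43),       # 43 = assumed 20% hold-out, not stated in the run
    "Logistic (Python)": (58.13, 43),  # 43 = assumed 20% hold-out, not stated in the run
}

for name, (acc, total) in runs.items():
    correct = implied_count(acc, total)
    print(f"{name}: {correct}/{total} = {100 * correct / total:.4f}%")
```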
Clustering
1. SimpleKMeans

Figure 17 Python: KMeans run

Differences:
- Although both Weka and Python split the data into 2 clusters, the number of instances per cluster differs:
  Cluster   Weka   Python
  1         88     163
  2         126    50
- Weka needed 9 iterations, compared with 6 in Python.
2. HierarchicalClusterer

Figure 18 Python: Hierarchical clustering run

Differences:
- As with SimpleKMeans, both tools split the data into 2 clusters, but the number of instances per cluster differs greatly:
  Cluster   Weka   Python
  1         212    129
  2         2      24
