Introduction in Data Warehouse

The document discusses using various machine learning algorithms like logistic regression, J48, k-means clustering, and hierarchical clustering to analyze a glass dataset from Weka. It compares the performance of these algorithms between Weka and Python. Visualizations of attributes from the dataset are also presented.

Study program: TIN

Introduction in Data Warehouse – final project

Author: Marius-Claudiu Scobici

Braşov, 2024
In this project I used the Glass dataset that ships with Weka.
The dataset contains the following columns:
- RI: refractive index,
- Na: Sodium (unit of measurement: weight percent in the corresponding oxide, as for the other oxide attributes below, Mg to Fe),
- Mg: Magnesium,
- Al: Aluminum,
- Si: Silicon,
- K: Potassium,
- Ca: Calcium,
- Ba: Barium,
- Fe: Iron,
- Type: type of glass.
Number of instances in the dataset: 214.

I. Weka run: Classification algorithms

Algorithm            Run-time   Accuracy
functions.Logistic   0.2 s      64.486%
trees.J48            0.03 s     66.8224%
Observations: Logistic takes longer to run (0.2 s vs. 0.03 s) and has a slightly lower accuracy than J48 (64.49% vs. 66.82%).
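To make the gap between the two classifiers more concrete, the percentages can be converted back into instance counts. The sketch below assumes that both percentages were measured over all 214 instances (for example with cross-validation); the report does not state the evaluation mode, so treat that as an assumption.

```python
# Convert the reported Weka accuracies back into instance counts.
# Assumption: both percentages were measured over all 214 instances.
TOTAL_INSTANCES = 214  # dataset size stated in the report

reported_accuracy = {
    "functions.Logistic": 64.486,
    "trees.J48": 66.8224,
}


def correct_count(accuracy_percent: float, total: int) -> int:
    """Nearest whole number of correctly classified instances."""
    return round(accuracy_percent / 100 * total)


counts = {
    name: correct_count(acc, TOTAL_INSTANCES)
    for name, acc in reported_accuracy.items()
}

for name, n in counts.items():
    print(f"{name}: {n}/{TOTAL_INSTANCES} correct")

extra = counts["trees.J48"] - counts["functions.Logistic"]
print("J48 classifies", extra, "more instances correctly")
```

Under that assumption the difference between the two algorithms amounts to a handful of instances, which is why it is described as slight.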
Logistic algorithm

Figure 1 Logistic algorithm pseudocode

Figure 2 Weka run: Logistic
J48 algorithm

Figure 3 Weka run: J48

Figure 4 Weka decision tree: J48

Figure 5 Pseudocode of J48

II. Weka run: Clustering algorithms

Algorithm               Run-time   Clusters (instances per cluster)
SimpleKMeans            0.02 s     2 (I: 88, II: 126)
HierarchicalClusterer   0.08 s     2 (I: 212, II: 2)
Observations: SimpleKMeans finished in 0.02 s, compared with 0.08 s for HierarchicalClusterer. Both algorithms produced 2 clusters, but the instances are distributed very differently: 88/126 for SimpleKMeans against 212/2 for HierarchicalClusterer.
SimpleKMeans algorithm

Figure 6 Weka run: SimpleKMeans

Figure 7 KMeans algorithm
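As a rough illustration of the k-means idea, here is a minimal pure-Python sketch of Lloyd's algorithm. It is not Weka's SimpleKMeans implementation; the toy points and the starting centroids are made up for the example.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, iterations)."""
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            return centroids, labels, iteration  # converged
        centroids = new_centroids
    return centroids, labels, max_iter


# Hypothetical toy data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
start = [points[0], points[3]]  # hypothetical starting centroids

final_centroids, labels, iterations = kmeans(points, start)
sizes = [labels.count(c) for c in range(len(start))]
print("labels:", labels, "sizes:", sizes, "iterations:", iterations)
```

The outcome depends on the starting centroids and on how the attributes are scaled, so two tools can reach different cluster sizes and iteration counts on the same data.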

HierarchicalClusterer algorithm

Figure 8 Weka run: HierarchicalClusterer algorithm

Figure 9 HierarchicalClusterer pseudocode
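The bottom-up (agglomerative) form of hierarchical clustering can be sketched in a few lines of pure Python. This is an illustrative sketch, not Weka's implementation; the toy points are made up, and the report does not say which linkage criterion was used.

```python
import math


def agglomerative(points, n_clusters, linkage=min):
    """Merge the two closest clusters until n_clusters remain.

    linkage=min gives single link, linkage=max gives complete link.
    """
    clusters = [[i] for i in range(len(points))]  # start: one cluster per point
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Cluster distance = linkage over all cross-cluster point pairs.
                d = linkage(
                    math.dist(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


# Hypothetical toy data: a chain of evenly spaced points plus one distant point.
points = [(float(x), 0.0) for x in range(8)] + [(9.5, 0.0)]

single = agglomerative(points, 2, linkage=min)
complete = agglomerative(points, 2, linkage=max)
print("single link sizes:  ", sorted(len(c) for c in single))
print("complete link sizes:", sorted(len(c) for c in complete))
```

On this toy data single link chains the evenly spaced points into one large cluster and isolates the distant point, while complete link produces a more balanced split. A very uneven result such as 212/2 is the kind of outcome single link tends to give.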

III. Association algorithms
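Weka does not allow any of its association algorithms to run on the Glass dataset (most likely because its attributes are numeric, while these algorithms expect nominal data), so I had to change the dataset.
From the datasets that Weka offers I chose the “vote” dataset.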
Apriori algorithm

Figure 10 Weka run: Apriori algorithm

Rule 1: a subject with adoption-of-the-budget-resolution=y and physician-fee-freeze=n is a democrat. The confidence is 1 (100%), so this is a strong pattern: every record that matches the two conditions is a democrat. The lift of 1.63 means that democrats are 1.63 times more frequent among these records than they would be if the items were independent.
All the other rules can be interpreted in the same manner.
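The confidence and lift of the first rule can be reproduced from raw counts. The counts below are not quoted in this report; they are assumed from the commonly distributed vote.arff file (435 records, 267 of them democrat, 219 matching the rule), so treat them as assumptions.

```python
# Counts assumed from the standard vote.arff file, not quoted in the report.
N_INSTANCES = 435    # assumed total number of records
N_DEMOCRAT = 267     # assumed records with Class=democrat
N_ANTECEDENT = 219   # assumed records matching both antecedent items
N_BOTH = 219         # assumed records matching the antecedent and Class=democrat


def confidence(n_both: int, n_antecedent: int) -> float:
    """Share of antecedent records that also satisfy the consequent."""
    return n_both / n_antecedent


def lift(n_both: int, n_antecedent: int, n_consequent: int, n_total: int) -> float:
    """Confidence divided by the overall frequency of the consequent."""
    return confidence(n_both, n_antecedent) / (n_consequent / n_total)


rule_confidence = confidence(N_BOTH, N_ANTECEDENT)
rule_lift = lift(N_BOTH, N_ANTECEDENT, N_DEMOCRAT, N_INSTANCES)
print(f"confidence = {rule_confidence:.2f}, lift = {rule_lift:.2f}")
```

A lift above 1 means the antecedent makes the consequent more likely than its base rate; a lift of exactly 1 would mean the two are independent.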

Figure 11 Apriori algorithm pseudocode

FPGrowth algorithm

Figure 12 Weka run: FPGrowth algorithm

Rule 1: with 99% confidence (a strong pattern), a subject who supports el-salvador-aid and is a republican also wants to freeze the physician fee. The lift of 2.44 means this outcome is 2.44 times more frequent among these subjects than it would be if the items were independent.

Figure 13 FPGrowth pseudocode

IV. Visualizing
a. Refractive index and Barium

- Most of the glass samples that contain no Barium and have a refractive index between 1.512 and 1.520 are not headlamps.
- Headlamp glass, on the other hand, usually contains Barium (mostly at values below about 1.57) and has a refractive index of around 1.515.
b. Refractive index and Potassium (K)

- Most of the glass types use only a small amount of Potassium in their composition, below a value of 1.
- Some headlamp glasses contain Potassium, but most of them do not use this element at all.
- Only one headlamp sample stands out, with a Potassium value of around 6.25.
V. Python classification and clustering
Classification
1. J48
Figure 14 Python: J48 run
Figure 15 Python: J48 DecisionTree

Comparison between the Weka run and the Python run:

- Python gave a better accuracy: 74.41% (Python) vs. 66.82% (Weka).
- The tree generated in Python is far more complex, with more levels, than the one generated in Weka.
- In this run, Python classified better with this algorithm.
2. Logistic

Figure 16 Python: Logistic regression run

Comparison between the Weka run and the Python run:

- Weka gave a better accuracy: 64.48% (Weka) vs. 58.13% (Python).
- In this run, Weka classified better with this algorithm.
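When comparing accuracies from two tools, it helps to check how many test instances each figure could be based on. The sketch below searches for correct-answer counts that reproduce the reported Python percentages. The 43-instance test set (a 20% hold-out of 214, rounded up) and the truncation to two decimals are assumptions; the report does not state how the Python models were evaluated, and other test-set sizes could fit as well.

```python
def truncated_hundredths(correct: int, total: int) -> int:
    """Accuracy in hundredths of a percent, cut off (not rounded)."""
    return correct * 10000 // total


def matching_counts(reported_hundredths: int, total: int) -> list:
    """Correct-answer counts that reproduce the reported percentage."""
    return [
        k for k in range(total + 1)
        if truncated_hundredths(k, total) == reported_hundredths
    ]


# Reported Python accuracies, in hundredths of a percent.
reported = {"J48": 7441, "Logistic": 5813}

FULL_SET = 214  # dataset size stated in the report
HOLD_OUT = 43   # assumed test-set size (20% of 214, rounded up)

for name, value in reported.items():
    print(name,
          "full set:", matching_counts(value, FULL_SET),
          "hold-out:", matching_counts(value, HOLD_OUT))
```

If the Python figures do come from a small hold-out set, each test instance is worth more than two percentage points, so differences of this size between the two tools should be read with caution.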
Clustering
1. SimpleKMeans

Figure 17 Python: KMeans run

Differences:
- Although both Weka and Python split the data into 2 clusters, the number of instances per cluster is different.
Cluster   Weka   Python
1         88     163
2         126    50
- Weka needed 9 iterations, compared with 6 in Python.
2. HierarchicalClusterer

Figure 18 Python: Hierarchical clustering run

Differences:
- As with SimpleKMeans, both tools split the data into 2 clusters, but with a very different number of instances per cluster.
Cluster   Weka   Python
1         212    129
2         2      24
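A quick consistency check on cluster tables like these is to confirm that the cluster sizes add up to the number of instances. The sketch below uses the figures exactly as reported above; the helper name is hypothetical.

```python
TOTAL_INSTANCES = 214  # dataset size stated in the report

# Cluster sizes as reported in the two comparison tables.
cluster_sizes = {
    ("SimpleKMeans", "Weka"): (88, 126),
    ("SimpleKMeans", "Python"): (163, 50),
    ("HierarchicalClusterer", "Weka"): (212, 2),
    ("HierarchicalClusterer", "Python"): (129, 24),
}


def unaccounted(sizes, total):
    """Instances not covered by the reported cluster sizes."""
    return total - sum(sizes)


gaps = {key: unaccounted(sizes, TOTAL_INSTANCES)
        for key, sizes in cluster_sizes.items()}

for (algorithm, tool), gap in gaps.items():
    print(f"{algorithm} / {tool}: {gap} instance(s) unaccounted for")
```

A non-zero gap suggests that the run used a different number of rows or that a figure was mistyped, so it is worth re-checking before comparing the tools.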
Webography:
https://www.futurelearn.com/info/courses/data-mining-with-weka/0/steps/25374
https://www.springboard.com/blog/data-science/data-mining-python-tutorial/

https://dzone.com/refcardz/data-mining-discovering-and

