Introduction in Data Warehouse
Braşov, 2024
In this project I used the Glass dataset provided with Weka.
Columns in the dataset:
- RI: refractive index,
- Na: Sodium (unit of measurement: weight percent in the corresponding oxide, as are the attributes Mg to Fe),
- Mg: Magnesium,
- Al: Aluminum,
- Si: Silicon,
- K: Potassium,
- Ca: Calcium,
- Ba: Barium,
- Fe: Iron,
- Type: type of glass.
Number of instances in the dataset: 214.
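Weka distributes the Glass data as an ARFF file (usually `glass.arff` in Weka's `data` folder). To load the same instances in Python without third-party libraries, a minimal reader such as the one below could be used (SciPy's `scipy.io.arff.loadarff` is an existing alternative). This is a sketch that assumes a dense ARFF file containing only numeric (`real`) and nominal attributes; sparse rows, dates and escaped quotes are not handled, and `read_arff` is a hypothetical helper, not a library function.

```python
import csv
import re

# "@attribute <name> <type>", where <name> may be wrapped in single quotes.
ATTR_RE = re.compile(r"@attribute\s+(?:'([^']*)'|(\S+))\s+(.+)", re.IGNORECASE)
NUMERIC_TYPES = {"real", "numeric", "integer"}


def read_arff(text):
    """Parse a dense ARFF document into (attribute_names, rows).

    Numeric attributes become floats, nominal values stay strings and
    missing values ("?") become None.
    """
    names, is_numeric, data_lines = [], [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            data_lines.append(line)
        elif lower.startswith("@attribute"):
            match = ATTR_RE.match(line)
            if match is None:
                raise ValueError(f"cannot parse attribute line: {line!r}")
            names.append(match.group(1) if match.group(1) is not None else match.group(2))
            is_numeric.append(match.group(3).strip().lower() in NUMERIC_TYPES)
        elif lower.startswith("@data"):
            in_data = True

    rows = []
    # Nominal values such as 'build wind float' are quoted with single quotes.
    for values in csv.reader(data_lines, quotechar="'", skipinitialspace=True):
        row = {}
        for name, numeric, value in zip(names, is_numeric, values):
            if value == "?":
                row[name] = None
            elif numeric:
                row[name] = float(value)
            else:
                row[name] = value
        rows.append(row)
    return names, rows
```

With the real file, `read_arff(Path("glass.arff").read_text())` (using `pathlib.Path`) should return the 10 attribute names and one dictionary per instance.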
From the 1st rule, it can be observed that a subject with adoption-of-the-budget-resolution=y and
physician-fee-freeze=n is a democrat. The confidence is 1 (100%), which means this is a strong
pattern. With a lift of 1.63, the pattern occurs 1.63 times more often than it would if the items
were independent.
All the other rules can be interpreted in the same manner.
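The two metrics used in these interpretations can be checked by hand. For a rule A ⇒ B, confidence is supp(A ∪ B) / supp(A) and lift is confidence / supp(B). With a confidence of 1, a lift of 1.63 therefore implies that about 1 / 1.63 ≈ 61% of all subjects are democrats. The sketch below computes support, confidence and lift over a list of transactions; `rule_metrics` is a hypothetical helper, and the item strings in the toy data are invented, only mimicking Weka's `attribute=value` notation.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent => consequent.

    transactions: list of sets of items such as "physician-fee-freeze=n".
    """
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent or consequent never occurs")
    support = count_ac / n              # P(A and B)
    confidence = count_ac / count_a     # P(B | A)
    lift = confidence / (count_c / n)   # P(B | A) / P(B)
    return support, confidence, lift


# Hypothetical toy votes, only to illustrate the arithmetic.
votes = [
    {"budget=y", "fee-freeze=n", "class=democrat"},
    {"budget=y", "fee-freeze=n", "class=democrat"},
    {"budget=n", "fee-freeze=y", "class=republican"},
    {"budget=y", "fee-freeze=y", "class=republican"},
]
print(rule_metrics(votes, {"budget=y", "fee-freeze=n"}, {"class=democrat"}))
# prints (0.5, 1.0, 2.0)
```

A lift above 1 means the antecedent makes the consequent more likely than its base rate; a lift of exactly 1 would mean the two sides are independent.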
Explanation for the 1st rule: with 99% confidence (a strong pattern), a subject who has el-salvador-aid
and is a republican also wants to freeze the physician fee. With a lift of 2.44, this pattern is 2.44
times more frequent than it would be if the items were independent.
Notably, most of the glass samples that contain no Barium and have a refractive index between
1.512 and 1.520 are not headlamps.
On the other hand, headlamps usually contain Barium (mostly below about 1.57) and have a
refractive index of around 1.515.
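The Barium observation can be turned into a quick count. The sketch below assumes the rows are dictionaries keyed by the Weka attribute names (`RI`, `Ba`, `Type`) and that the headlamp class is labelled `headlamps`; both `headlamp_count` and `type_profile` are hypothetical helpers, and the default range simply reuses the 1.512 to 1.520 interval from the text.

```python
from statistics import median


def headlamp_count(rows, ri_low=1.512, ri_high=1.520):
    """Among Barium-free samples with ri_low <= RI <= ri_high, return
    (number of samples, how many of them are headlamps)."""
    selected = [r for r in rows if r["Ba"] == 0 and ri_low <= r["RI"] <= ri_high]
    headlamps = sum(1 for r in selected if r["Type"] == "headlamps")
    return len(selected), headlamps


def type_profile(rows, glass_type="headlamps"):
    """Median RI and median Ba of one glass type."""
    subset = [r for r in rows if r["Type"] == glass_type]
    return median(r["RI"] for r in subset), median(r["Ba"] for r in subset)
```

Medians are used rather than means because a few extreme Barium values would otherwise dominate a type with only a handful of samples.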
b. Refractive index and Potassium (K)
Most of the glass types contain only a small amount of Potassium, below 1 (weight
percent).
At the same time, some headlamp glasses contain Potassium, but most of them do not
contain this element at all.
Only one headlamp glass has a notably high Potassium content (around 6.25).
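The Potassium statements (most values below 1, most headlamps with none, one headlamp as an outlier) can be checked per glass type with a small summary. This sketch assumes the rows are dictionaries with `K` and `Type` keys named after the Weka attributes; `potassium_by_type` is a hypothetical helper and the threshold of 1 is taken from the text.

```python
from collections import defaultdict


def potassium_by_type(rows, threshold=1.0):
    """Per glass type, return (samples, samples with K < threshold,
    samples with K == 0, maximum K)."""
    stats = defaultdict(lambda: [0, 0, 0, 0.0])
    for r in rows:
        s = stats[r["Type"]]
        s[0] += 1
        s[1] += r["K"] < threshold   # booleans count as 0 or 1
        s[2] += r["K"] == 0
        s[3] = max(s[3], r["K"])
    return {glass_type: tuple(s) for glass_type, s in stats.items()}
```

Reporting the maximum next to the counts makes a single outlier, such as the headlamp around 6.25, visible without plotting.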
V. Python classification and clustering
Classification
1. J48
Figure 14 Python: J48 run
Figure 15 Python: J48 DecisionTree
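J48 is Weka's implementation of C4.5. When reproducing it in Python, note that scikit-learn's `DecisionTreeClassifier` implements an optimized version of CART, so even with `criterion="entropy"` it is not an exact J48 replica (for example, it uses cost-complexity pruning via `ccp_alpha` instead of C4.5's error-based pruning). To make the C4.5 side concrete, the sketch below computes, for one numeric attribute, the entropy, the information gain of a threshold split and the gain ratio. It is a simplification under stated assumptions: candidate thresholds are midpoints between consecutive distinct values, the threshold is chosen by information gain, and pruning is omitted. All function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_scores(values, labels, threshold):
    """Information gain and gain ratio of the binary split value <= threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    remainder = sum(w * entropy(part) for w, part in zip(weights, (left, right)) if part)
    gain = entropy(labels) - remainder
    split_info = -sum(w * log2(w) for w in weights if w > 0)
    return gain, (gain / split_info if split_info > 0 else 0.0)


def best_threshold(values, labels):
    """Midpoint threshold with the highest information gain, plus its scores.

    Raises ValueError if the attribute has fewer than two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    best = max(candidates, key=lambda t: split_scores(values, labels, t)[0])
    return best, split_scores(values, labels, best)
```

For instance, `best_threshold(rows_RI, rows_Type)` on two parallel lists would give the single best RI split; a full tree repeats this on every attribute and recurses into both halves.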
Clustering
1. SimpleKMeans
Differences:
- Although Weka and Python both split the data into 2 clusters, the number of instances per cluster is different.
Instances per cluster:
Cluster   Weka   Python
1         88     163
2         126    50
- Weka needed 9 iterations, versus 6 in Python.
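Several settings can change both the cluster sizes and the iteration count. Weka's SimpleKMeans uses random initial centroids (seed 10 by default) and its Euclidean distance normalizes numeric attributes by default, whereas scikit-learn's `KMeans`, if that is what was used on the Python side (an assumption, the report does not say), defaults to k-means++ initialization and does not normalize. The sketch below is a plain Lloyd's k-means with an optional min-max normalization step; `kmeans` and `min_max_normalize` are hypothetical helpers, and `seed=10` only echoes Weka's default value, since Python's random generator will still pick different starting points.

```python
import random


def min_max_normalize(points):
    """Rescale every attribute to [0, 1]."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid dividing by 0
    return [[(v - lo) / s for v, lo, s in zip(p, lows, spans)] for p in points]


def kmeans(points, k, seed=10, max_iter=100):
    """Lloyd's k-means with random initial centroids.

    Returns (cluster sizes, number of iterations).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sizes = [assignment.count(j) for j in range(k)]
    return sizes, iteration
```

Running `kmeans(points, 2)` and `kmeans(min_max_normalize(points), 2)` on the same data, or with different seeds, is a quick way to see how much of the Weka/Python gap comes from preprocessing and initialization rather than from the algorithm itself.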
2. HierarchicalClusterer
Differences:
- As with SimpleKMeans, both tools split the data into 2 clusters, but the number of instances
per cluster differs greatly.
Instances per cluster:
Cluster   Weka   Python
1         212    129
2         2      24
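One plausible reason for the 212/2 split in Weka is the linkage rule: Weka's HierarchicalClusterer uses single linkage by default, while scikit-learn's `AgglomerativeClustering` defaults to Ward linkage. Single linkage measures the distance between clusters by their closest pair of points, so it tends to "chain" most instances into one large cluster and leave a few outliers on their own. The naive sketch below (hypothetical `agglomerative` helper, O(n³), for illustration only) supports single and complete linkage so the two rules can be compared on small data.

```python
def agglomerative(points, k, linkage="single"):
    """Bottom-up clustering of points into k clusters.

    linkage="single": cluster distance = closest pair of points.
    linkage="complete": cluster distance = farthest pair of points.
    Returns the clusters as lists of point indices.
    """
    if linkage not in ("single", "complete"):
        raise ValueError("linkage must be 'single' or 'complete'")
    combine = min if linkage == "single" else max

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair of clusters
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = combine(dist(points[a], points[b])
                            for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters
```

On a 1-D toy set such as `[[0], [1], [2], [3], [10], [11], [12], [30]]` with `k=2`, the isolated point 30 ends up alone, which mirrors the "one big cluster plus a few outliers" pattern seen in the Weka output.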