CAPSTONE PROJECT
NOTES - II
Submitted To:
Concerned Faculty
At
Great Learning
The University of Texas at Austin
Submitted By:
Rachit Mittal
PGPDSBA online July E 2020
Laptops
Logistic Regression Training Set
Figs 1–3
Logistic Regression Testing Set
Figs 4–6
Linear Discriminant Analysis Training Set
Figs 7–9
Linear Discriminant Analysis Testing Set
Figs 10–12
K-Nearest Neighbours Training Set
Figs 13–15
K-Nearest Neighbours Testing Set
Figs 16–18
Naive Bayes Training Set
Figs 19–21
Naive Bayes Testing Set
Figs 22–24
Decision Tree Classifier Training Set
Figs 25–27
Decision Tree Classifier Testing Set
Figs 28–30
Random Forest Classifier Training Set
Figs 31–33
Random Forest Classifier Testing Set
Figs 34–36
Model Tuning
Bagging Training Set
Figs 37–39
Bagging Testing Set
Figs 40–42
AdaBoosting Training Set
Figs 43–45
AdaBoosting Testing Set
Figs 46–48
Gradient Boosting Training Set
Figs 49–51
Gradient Boosting Testing Set
Figs 52–54
Model Comparison
Fig 55
➢ Before these models were run, the data was cleaned and unwanted variables were
removed. This was followed by treating the imbalance in the data using SMOTE.
➢ After that, the data was scaled using the standard scaler, as some variables are in the
1000s, others in the 100s, and so on. A train-test split was then performed, dividing
the data in the ratio 70:30, where 70% constitutes the training set; a sketch of this
flow follows.
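A minimal sketch of that preprocessing flow, assuming scikit-learn and imbalanced-learn; the file name and target column here are hypothetical stand-ins, not taken from the project data:

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("customers.csv")            # hypothetical file name
X = df.drop(columns=["Taken_product"])       # hypothetical target column
y = df["Taken_product"]                      # imbalanced binary target

# Treat the class imbalance with SMOTE (synthetic minority oversampling)
X_res, y_res = SMOTE(random_state=1).fit_resample(X, y)

# Scale the predictors, since some variables are in the 1000s and others in the 100s
X_scaled = StandardScaler().fit_transform(X_res)

# 70:30 train-test split, with 70% forming the training set
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_res, test_size=0.30, random_state=1)

The sketch mirrors the order described above (SMOTE before the split); in practice SMOTE is often applied to the training fold only, to avoid leakage into the test set.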
Mobiles
Logistic Regression Training Set
Figs 56–58
Logistic Regression Testing Set
Figs 59–61
Linear Discriminant Analysis Training Set
Figs 62–64
Linear Discriminant Analysis Testing Set
Figs 65–67
K-Nearest Neighbours Training Set
Figs 68–70
K-Nearest Neighbours Testing Set
Figs 71–73
Naive Bayes Training Set
Figs 74–76
Naive Bayes Testing Set
Figs 77–79
Decision Tree Classifier Training Set
Figs 80–82
Decision Tree Classifier Testing Set
Figs 83–85
Random Forest Classifier Training Set
Figs 86–88
Random Forest Classifier Testing Set
Figs 89–91
Model Tuning
Bagging Training Set
Figs 92–94
Bagging Testing Set
Figs 95–97
AdaBoosting Training Set
Figs 98–100
AdaBoosting Testing Set
Figs 101–103
Gradient Boosting Training Set
Figs 104–106
Gradient Boosting Testing Set
Figs 107–109
Model Comparison
Fig 110
Compared with the Laptop users, as the number of rows (data points) increases, the
accuracy of most of the models decreases.
1. The Logistic Regression and Linear Discriminant Analysis models provide very poor
accuracy: 72.3% and 72.2% respectively on the training set, and 72.7% each on the test
set. In both cases the test-set accuracy shows only a very slight improvement.
2. The Decision Tree (CART) and Random Forest models provide excellent accuracy on the
training set, 100% for each; applying the models to the testing set, the accuracy
declines slightly, to 98.1% for the Decision Tree (CART) model and 99.5% for the
Random Forest model.
3. Although both models are very good, the recall of the Random Forest model (99.2%)
is slightly lower than that of the K-Nearest Neighbours model (99.7%).
4. The worst-performing model on all the parameters is the Naïve Bayes model, which
shows an accuracy of just 69.1% on the training set and 68.6% on the test set.
5. The Bagging technique likewise showed very good performance on both the training
and testing sets, with accuracies of 100% and 99.2% respectively.
6. Although the Gradient Boosting model performed better for the Laptop-using customers,
a large decline in accuracy is observed for the Mobile-using customers (a comparison
sketch follows this list).
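A hedged sketch of how such a side-by-side comparison can be produced, reusing the X_train/X_test split from the preprocessing sketch; the hyperparameters are library defaults, not the tuned values behind the reported figures:

from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                              AdaBoostClassifier, GradientBoostingClassifier)

# Defaults only -- the report's tuned models may differ.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),
    "CART": DecisionTreeClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Bagging": BaggingClassifier(random_state=1),
    "AdaBoost": AdaBoostClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}

# Fit each classifier and report train/test accuracy side by side
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: train={model.score(X_train, y_train):.3f}, "
          f"test={model.score(X_test, y_test):.3f}")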
Although the Random Forest model has a test-set accuracy of 99.5%, it produced a larger
number of false positives, 21, compared with the K-Nearest Neighbours model, which has an
accuracy of 98.6% and produced only 9 false-positive cases.
So, in both cases, K-Nearest Neighbours is the best model.
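The false-positive counts and recall quoted here can be read directly off the test-set confusion matrix; a minimal sketch for the KNN model fitted in the loop above, assuming the positive class is labelled 1:

from sklearn.metrics import confusion_matrix, recall_score

pred = models["KNN"].predict(X_test)
# A binary confusion matrix unpacks as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
print(f"false positives: {fp}, recall: {recall_score(y_test, pred):.3f}")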
1. Applying this model, with its high recall, can help the company identify which
customers are likely to purchase the product in the near future. It also enables better
reach, so that the audience can be targeted accordingly.
2. This can also increase the traffic on the company's site, helping to minimize the
cost-per-click expense for the company.
3. As the number of hits on the website increases, the chances of product purchases
also increase, bringing a surge in revenues for the company.