
SOCIAL MEDIA TOURISM

CAPSTONE PROJECT

NOTES - II

Submitted To:
Concerned Faculty
At
Great Learning
The University of Texas at Austin

Submitted By:
Rachit Mittal
PGPDSBA online July E 2020
Laptops
Logistic Regression Training Set

Fig 1

Fig 2

Fig 3

Logistic Regression Testing Set

Fig 4

Fig 5

Fig 6

Linear Discriminant Analysis Training Set

Fig 7

Fig 8

Fig 9

Linear Discriminant Analysis Testing Set

Fig 10

Fig 11

Fig 12

K-Nearest Neighbours Training Set

Fig 13

Fig 14

Fig 15

K-Nearest Neighbours Testing Set

Fig 16

Fig 17

Fig 18

Naive Bayes Training Set

Fig 19

Fig 20

Fig 21

Naive Bayes Testing Set

Fig 22

Fig 23

Fig 24

Decision Tree Classifier Training Set

Fig 25

Fig 26

Fig 27

Decision Tree Classifier Testing Set

Fig 28

Fig 29

Fig 30

Random Forest Classifier Training Set

Fig 31

Fig 32

Fig 33

Random Forest Classifier Testing Set

Fig 34

Fig 35

Fig 36

Model Tuning
Bagging Training Set

Fig 37

Fig 38

Fig 39

Bagging Testing Set

Fig 40

Fig 41

Fig 42

AdaBoosting Training Set

Fig 43

Fig 44

Fig 45

AdaBoosting Testing Set

Fig 46

Fig 47

Fig 48

Gradient Boosting Training Set

Fig 49

Fig 50

Fig 51

Gradient Boosting Testing Set

Fig 52

Fig 53

Fig 54

Model Comparison

Fig 55

➢ Before building these models, the data was cleaned and unwanted variables were
removed. The class imbalance in the data was then treated using SMOTE.
➢ The data was then scaled using a standard scaler, since the variables span very
different ranges (thousands, hundreds, etc.). Finally, a train-test split was performed,
dividing the data in a 70:30 ratio, with 70% constituting the training set; a sketch of
this preprocessing pipeline is shown below.
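
A minimal sketch of this preprocessing pipeline, assuming pandas, scikit-learn and
imbalanced-learn; the file name and the target column name are placeholders, not taken
from the report:

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

df = pd.read_csv("social_media_tourism.csv")  # hypothetical file name
X = df.drop(columns=["Taken_product"])        # features, after cleaning
y = df["Taken_product"]                       # assumed name of the imbalanced target

# Treat the class imbalance with SMOTE (synthetic minority oversampling)
X_res, y_res = SMOTE(random_state=1).fit_resample(X, y)

# Standardize the features, since the variables span very different ranges
X_scaled = StandardScaler().fit_transform(X_res)

# 70:30 train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, y_res, test_size=0.30, random_state=1)

The order above (SMOTE before the split) mirrors the steps described in this report;
in practice SMOTE is often applied only to the training fold to avoid information leakage.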

1. The Logistic Regression and Linear Discriminant Analysis models provide very poor
accuracy: 74.9% and 75.4% on the training set, and 75.2% and 75.4% on the test set,
respectively. For Logistic Regression the test-set accuracy shows a very slight
improvement, whereas for Linear Discriminant Analysis it remains the same.
2. The Decision Tree (CART) and Random Forest models provide excellent accuracy of
100% on the training set; on the test set the accuracy declines slightly, to 95.8% for
the Decision Tree (CART) model and 98.8% for the Random Forest model.
3. The AUC score of the K-Nearest Neighbours and Random Forest models is perfect
(100%) on both the training and test sets. Although both models are very good, the
recall of the Random Forest model (98.3%) is slightly lower than that of the
K-Nearest Neighbours model (100%).
4. The worst-performing model on all parameters is the Naïve Bayes model, with an
accuracy of just 71.8% on the training set and 73.0% on the test set, whereas the
Decision Tree (CART) and Random Forest models both perform equally well, with
better AUC score, recall and accuracy.
5. The K-Nearest Neighbours model achieves an accuracy of 98.2% on the training set
and 98.1% on the testing set, with a recall of 95.0% on the test set.
6. After applying the Bagging technique, the model again shows very good performance
on both the training and testing sets, with accuracies of 100% and 97.0% respectively
(a sketch of the bagging setup follows this list).
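
A minimal sketch of the bagging setup with scikit-learn, reusing the train-test split from
the preprocessing sketch above; the number of estimators is an assumption, and the base
learner defaults to a decision tree (the report does not state these choices):

from sklearn.ensemble import BaggingClassifier

# Bagging: base learners (decision trees by default) are fit on bootstrap
# samples of the training data and their predictions aggregated by vote
bag = BaggingClassifier(n_estimators=100, random_state=1)
bag.fit(X_train, y_train)
print("Train accuracy:", bag.score(X_train, y_train))  # 100% reported above
print("Test accuracy:", bag.score(X_test, y_test))     # 97.0% reported above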
Although the Random Forest model has a test-set accuracy of 98.8%, it produces 4 false
positives, whereas K-Nearest Neighbours, with an accuracy of 97.0%, produces 0 false
positives. A sketch of how these comparison metrics can be computed is given below.
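
A minimal sketch of how these comparison metrics (accuracy, recall, AUC and the
false-positive count) can be computed for one model, here K-Nearest Neighbours, assuming
a binary 0/1 target and the variables from the preprocessing sketch above:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score, confusion_matrix

knn = KNeighborsClassifier().fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Confusion matrix for a binary target unpacks as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Recall:", recall_score(y_test, y_pred))
print("AUC:", roc_auc_score(y_test, knn.predict_proba(X_test)[:, 1]))
print("False positives:", fp)

The same calls can be repeated for each fitted model to build the comparison in Fig 55.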

Mobiles
Logistic Regression Training Set

Fig 56

Fig 57

Fig 58

Logistic Regression Testing Set

Fig 59

Fig 60

Fig 61

Linear Discriminant Analysis Training Set

Fig 62

Fig 63

Fig 64

Linear Discriminant Analysis Testing Set

Fig 65

Fig 66

Fig 67

K-Nearest Neighbours Training Set

Fig 68

Fig 69

Fig 70

K-Nearest Neighbours Testing Set

Fig 71

Fig 72

Fig 73

Naive Bayes Training Set

Fig 74

Fig 75

Fig 76

Naive Bayes Testing Set

Fig 77

Fig 78

Fig 79

Decision Tree Classifier Training Set

Fig 80

Fig 81

Fig 82

Decision Tree Classifier Testing Set

Fig 83

Fig 84

Fig 85

Random Forest Classifier Training Set

Fig 86

Fig 87

Fig 88

Random Forest Classifier Testing Set

Fig 89

Fig 90

Fig 91

Model Tuning
Bagging Training Set

Fig 92

Fig 93

Fig 94

Bagging Testing Set

Fig 95

Fig 96

Fig 97

AdaBoosting Training Set

Fig 98

Fig 99

Fig 100

AdaBoosting Testing Set

Fig 101

Fig 102

Fig 103

Gradient Boosting Training Set

Fig 104

Fig 105

Fig 106

Gradient Boosting Testing Set

Fig 107

Fig 108

Fig 109

Model Comparison

Fig 110

Compared with the laptop users, the number of rows (data points) here is larger, and the
accuracy of most of the models decreases.
1. The Logistic Regression and Linear Discriminant Analysis models provide very poor
accuracy: 72.3% and 72.2% on the training set, and 72.7% for both on the test set. In
both cases the test-set accuracy shows a very slight improvement.
2. The Decision Tree (CART) and Random Forest models provide excellent accuracy of
100% on the training set; on the test set the accuracy declines slightly, to 98.1% for
the Decision Tree (CART) model and 99.5% for the Random Forest model.
3. Although both models are very good, the recall of the Random Forest model (99.2%)
is slightly lower than that of the K-Nearest Neighbours model (99.7%).
4. The worst-performing model on all parameters is the Naïve Bayes model, with an
accuracy of just 69.1% on the training set and 68.6% on the test set.
5. After applying the Bagging technique, the model again shows very good performance
on both the training and testing sets, with accuracies of 100% and 99.2% respectively.
6. While the Gradient Boosting model performed better for laptop-using customers, a
large decline in accuracy is observed for mobile-using individuals.
Although the Random Forest model has a test-set accuracy of 99.5%, it produces 21 false
positives, whereas K-Nearest Neighbours, with an accuracy of 98.6%, produces only 9
false positives.
So, in both cases, K-Nearest Neighbours is the best model.
1. Applying this model, with its high recall, can help the company identify which
customers are likely to purchase the product in the near future, and thus reach and
target the audience more effectively (a sketch of such scoring follows this list).
2. It can also increase traffic on the company's site, helping to minimize the
cost-per-click expense for the company.
3. As the number of hits on the website increases, the chance of a product purchase also
increases, bringing a surge in revenue for the company.
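
A minimal sketch of how the chosen K-Nearest Neighbours model might be used to score
prospective customers, as in point 1 above; the value of k and the new-data matrix X_new
are assumptions, not taken from the report:

from sklearn.neighbors import KNeighborsClassifier

# Fit the chosen model on the training data
knn = KNeighborsClassifier(n_neighbors=5)  # assumed k; the report does not state it
knn.fit(X_train, y_train)

# Score new visitors: estimated probability that each will buy the product
purchase_prob = knn.predict_proba(X_new)[:, 1]  # X_new: hypothetical new data, scaled the same way
likely_buyers = purchase_prob >= 0.5            # flag customers worth targeting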
