EXERCISE 1:
Imported CSV Data
b. Output
2) LMT (Trees)
a. Log
b. Output
3) Logistic (Function)
a. Log
b. Output
4) SGD (Function)
a. Log
b. Output
EXERCISE 2:
At test_size=0.33
########################Logistic Regression##############################
Accuracy: 0.92
F1 score: 0.93
Precision: 0.92
Recall: 0.94
confusion matrix
[[1476  156]
 [ 121 1896]]
Predicted     0     1   All
True
0          1476   156  1632
1           121  1896  2017
All        1597  2052  3649
########################Random Forest##############################
Accuracy: 0.97
F1 score: 0.97
Precision: 0.96
Recall: 0.98
confusion matrix
[[1557   75]
 [  42 1975]]
Predicted     0     1   All
True
0          1557    75  1632
1            42  1975  2017
All        1599  2050  3649
#######################Decision Tree#######################
Accuracy: 0.96
F1 score: 0.96
Precision: 0.96
Recall: 0.96
confusion matrix
[[1549   83]
 [  76 1941]]
Predicted   0.0   1.0   All
True
0          1549    83  1632
1            76  1941  2017
All        1625  2024  3649
#######################Naive Bayes#######################
Accuracy: 0.62
F1 score: 0.48
Precision: 1.00
Recall: 0.31
confusion matrix
[[1629    3]
 [1383  634]]
Predicted     0    1   All
True
0          1629    3  1632
1          1383  634  2017
All        3012  637  3649
At test_size=0.50
At test_size=0.20
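The full Exercise 2 script is not reproduced above, so the following is only a minimal sketch of how these metrics could be produced with scikit-learn for the three splits. The use of 'phishing.csv' with the class label in the last column (as 0/1 values) and the default classifier settings are assumptions for illustration, not the exact code that generated the numbers.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, confusion_matrix)

# Assumed layout: all feature columns first, class label (0/1) in the last column
data = pd.read_csv('phishing.csv')
X, y = data.iloc[:, :-1], data.iloc[:, -1]

for test_size in (0.33, 0.50, 0.20):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=0)
    models = {
        'Logistic Regression': LogisticRegression(max_iter=1000),
        'Random Forest': RandomForestClassifier(),
        'Decision Tree': DecisionTreeClassifier(),
        'Naive Bayes': GaussianNB(),
    }
    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        print('#' * 24 + name + '#' * 24)
        print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))
        print('F1 score: %.2f' % f1_score(y_test, y_pred))
        print('Precision: %.2f' % precision_score(y_test, y_pred))
        print('Recall: %.2f' % recall_score(y_test, y_pred))
        print('confusion matrix')
        print(confusion_matrix(y_test, y_pred))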
EXERCISE 3:
Learning rate: a tuning parameter in an optimization algorithm that determines the step size taken at each iteration while moving toward a minimum of a loss function.
Epoch: one complete pass of the entire training dataset through the learning algorithm.
Batch Size: the number of training examples used in one iteration (one weight update).
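As an illustration only (toy data and a made-up mean-squared-error loss, not the assignment code), the following sketch shows where the learning rate, the number of epochs and the batch size each appear in a mini-batch gradient descent loop:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # 1000 toy training examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5])      # toy regression targets
w = np.zeros(3)                          # model weights to be learned

learning_rate = 0.01                     # step size of each weight update
batch_size = 100                         # examples used per iteration (one update)
n_epochs = 5                             # full passes over the training set

iters_per_epoch = int(np.ceil(len(X) / batch_size))   # here: 10 updates per epoch

for epoch in range(n_epochs):            # one epoch = one pass over all 1000 examples
    for it in range(iters_per_epoch):
        sta, end = it * batch_size, (it + 1) * batch_size
        xb, yb = X[sta:end], y[sta:end]
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)   # gradient of MSE on this batch
        w -= learning_rate * grad                    # step toward the loss minimum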
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
import numpy as np
import pandas as pd
from numpy import genfromtxt
from sklearn import datasets
from sklearn.model_selection import train_test_split
import sklearn
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score, f1_score
import matplotlib.pyplot as plt

###############################################################

# Hyperparameters
learning_rate = 0.01
#n_epochs = 5000
n_epochs = 100
batch_size = 10000

# Convert integer class labels into one-hot vectors
def convertOneHot(data):
    y_onehot = [0]*len(data)
    for i, j in enumerate(data):
        y_onehot[i] = [0]*(data.max()+1)
        y_onehot[i][j] = 1
    return y_onehot

###############################################################
# Load the feature columns and the class label (last column) from the CSV
feature = genfromtxt('phishing.csv', delimiter=',', usecols=(i for i in range(0, 31)), skip_header=1)
target = genfromtxt('phishing.csv', delimiter=',', usecols=(-1), skip_header=1)
sc = StandardScaler()
sc.fit(feature)
target_label = LabelEncoder().fit_transform(target)
target_onehot = convertOneHot(target_label)
feature_std = sc.transform(feature)
x_train, x_test, y_train_onehot, y_test_onehot = train_test_split(feature_std, target_onehot, test_size=0.30, random_state=0)
A = x_train.shape[1]          # number of input features
B = len(y_train_onehot[0])    # number of classes
print(len(y_test_onehot[0]))
print(B)
print("Begin:__________________________________")
#####################################################################

# Plot the accuracy recorded after each epoch
def plot_metric_per_epoch():
    x_epochs = []
    y_epochs = []
    for i, val in enumerate(accuracy_scores_list):
        x_epochs.append(i)
        y_epochs.append(val)

    plt.scatter(x_epochs, y_epochs, s=50, c='lightgreen', marker='s', label='score')
    plt.xlabel('epoch')
    plt.ylabel('score')
    plt.title('Score per epoch')
    plt.legend()
    plt.grid()
    plt.show()

###################################################
## print stats
precision_scores_list = []
accuracy_scores_list = []

def print_stats_metrics(y_test, y_pred):
    print('Accuracy: %.2f' % accuracy_score(y_test, y_pred))
    #Accuracy: 0.84
    accuracy_scores_list.append(accuracy_score(y_test, y_pred))
    confmat = confusion_matrix(y_true=y_test, y_pred=y_pred)

    print('F1 score: %.2f' % f1_score(y_true=y_test, y_pred=y_pred))
    print('Precision: %.2f' % precision_score(y_true=y_test, y_pred=y_pred))
    print('Recall: %.2f' % recall_score(y_true=y_test, y_pred=y_pred))

    print("confusion matrix")
    print(confmat)
    print(pd.crosstab(y_test, y_pred, rownames=['True'], colnames=['Predicted'], margins=True))

###############################################################
# One fully connected layer with ReLU activation
def layer(input, weight_shape, bias_shape):
    weight_stddev = (2.0/weight_shape[0])**0.5
    w_init = tf.random_normal_initializer(stddev=weight_stddev)
    bias_init = tf.constant_initializer(value=0)
    W = tf.get_variable("W", weight_shape, initializer=w_init)
    b = tf.get_variable("b", bias_shape, initializer=bias_init)
    return tf.nn.relu(tf.matmul(input, W) + b)
###############################################################
# Network: A inputs -> 15 hidden units -> 5 hidden units -> B outputs
def inference_deep_layers(x_tf, A, B):
    with tf.variable_scope("hidden_1"):
        hidden_1 = layer(x_tf, [A, 15], [15])
    with tf.variable_scope("hidden_2"):
        hidden_2 = layer(hidden_1, [15, 5], [5])
    with tf.variable_scope("output"):
        output = layer(hidden_2, [5, B], [B])
    return output
###############################################################
def loss_deep(output, y_tf):
    xentropy = tf.nn.softmax_cross_entropy_with_logits(logits=output, labels=y_tf)
    loss = tf.reduce_mean(xentropy)
    return loss
###########################################################

def training(cost):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    train_op = optimizer.minimize(cost)
    return train_op

###########################################################
def evaluate(output, y_tf):
    correct_prediction = tf.equal(tf.argmax(output, 1), tf.argmax(y_tf, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    return accuracy
###############################################################

x_tf = tf.placeholder("float", [None, A])
y_tf = tf.placeholder("float", [None, B])
###############################################################
output = inference_deep_layers(x_tf, A, B)
cost = loss_deep(output, y_tf)
train_op = training(cost)
eval_op = evaluate(output, y_tf)
###############################################################
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
###############################################################
y_p_metrics = tf.argmax(output, 1)
###############################################################
num_samples_train_set = x_train.shape[0]
num_batches = int(num_samples_train_set/batch_size)

###############################################################

for i in range(n_epochs):
    print("epoch %s out of %s" % (i, n_epochs))
    for batch_n in range(num_batches):
        sta = batch_n*batch_size
        end = sta+batch_size
        sess.run(train_op, feed_dict={x_tf: x_train[sta:end], y_tf: y_train_onehot[sta:end]})
    print("-------------------------------------------------------------------------------")
    print("Accuracy score")
    #result = sess.run(eval_op,feed_dict={x_tf:x_test,y_tf:y_test_onehot})
    result, y_result_metrics = sess.run([eval_op, y_p_metrics], feed_dict={x_tf: x_test, y_tf: y_test_onehot})
    print("Run {},{}".format(i, result))
    y_true = np.argmax(y_test_onehot, 1)
    print_stats_metrics(y_true, y_result_metrics)
    if i == n_epochs-1:
        plot_metric_per_epoch()
1) At
learning_rate = 0.01
n_epochs = 100
batch_size = 10000
Output:
Accuracy score
Run 99,0.4727163016796112
Accuracy: 0.47
F1 score: 0.09
Precision: 0.85
Recall: 0.05
confusion matrix
[[1483   15]
 [1734   85]]
Predicted     0    1   All
True
0          1483   15  1498
1          1734   85  1819
All        3217  100  3317
2) At
learning_rate = 0.006
n_epochs = 100
batch_size = 10000
Output:
Accuracy: 0.41
F1 score: 0.20
Precision: 0.39
Recall: 0.13
confusion matrix
[[1127  371]
 [1579  240]]
Predicted     0    1   All
True
0          1127  371  1498
1          1579  240  1819
All        2706  611  3317
3) At
learning_rate = 0.006
n_epochs = 100
batch_size = 1000
Output:
Accuracy: 0.95
F1 score: 0.95
Precision: 0.94
Recall: 0.96
confusion matrix
[[1390  108]
 [  65 1754]]
Predicted     0     1   All
True
0          1390   108  1498
1            65  1754  1819
All        1455  1862  3317
4) At
learning_rate = 0.006
n_epochs = 100
batch_size = 128
Output:
Accuracy: 0.99
F1 score: 0.99
Precision: 0.99
Recall: 1.00
confusion matrix
[[1474   24]
 [   0 1819]]
Predicted     0     1   All
True
0          1474    24  1498
1             0  1819  1819
All        1474  1843  3317
Results:
First we examined what the dataset looks like. We then installed Weka, opened the CSV data set in the Explorer window, and in the Classify tab ran different classifiers: Logistic Regression, Naïve Bayes, Random Forest, etc.
We logged the results of each run; the output contains the time taken by the model, the confusion matrix, and the F1 score, precision, accuracy and recall for the data set over the trained model.
Precision:
Random Forest: 0.970
LMT: 0.967
Logistic: 0.926
SGD: 0.930
Naïve Bayes: 0.909
With these results we can discuss which phishing detector performed best. Comparing the various metrics, Random Forest performed better than the others, with the highest precision and a lower mean absolute error, although it took considerably more time than the rest of the algorithms.
In scikit-learn we implemented classifiers based on different algorithms: Logistic Regression, Decision Tree, Random Forest, and Naïve Bayes.
On the basis of accuracy and the other metrics we can see which model performed better.
Accuracy
Logistic Regression: 0.92
Random Forest: 0.97
Decision Tree: 0.96
Naïve Bayes: 0.62
F1 score
Logistic Regression: 0.93
Random Forest: 0.97
Decision Tree: 0.96
Naïve Bayes: 0.48
Here too, Random Forest worked better than the others.
For the neural network, the highest accuracy was 0.99, obtained with the slower learning rate (0.006) and the smallest batch size (128), which was even better than the earlier attempts. The poor runs with batch_size = 10000 are likely explained by the training loop itself: with roughly 7,700 training samples, int(num_samples_train_set/batch_size) evaluates to 0, so no gradient updates are performed at all.
So we can say that the neural network worked better than the other models.
We can also say that neural networks are able to learn and model non-linear and complex relationships, which is important because in real life many of the relationships between inputs and outputs are non-linear and complex; this helps explain why it performed better.