NF Assignment 4

This notebook applies two anomaly-detection approaches to the Annthyroid (thyroid disease) dataset: an unsupervised PCA-based model that assigns an anomaly score to every sample, and a supervised random forest classifier trained on the labeled data. It reports accuracy, F1, and a per-class classification report, and visualizes the confusion matrix and the feature importances.

Uploaded by Abdul Moaid

Welcome To Colab - Colab https://colab.research.google.com/#scrollTo=7-debQKryIG3&printMod...

```python
import pandas as pd

# Load the dataset
data = pd.read_csv('annthyroid_21feat_normalised.csv')

# Preprocess the data if needed
# (e.g., handle missing values, scale the features)

# Split the data into features and labels
X = data.drop(columns=['class'])  # Features
y = data['class']  # Labels (assuming the 'class' column marks anomalies)

# Instantiate the PCA model.
# PCAModel is a custom class defined in a cell that is not included in this printout.
pca_model = PCAModel()

# Train the model (PCA is unsupervised, so no labels or validation data are needed)
pca_model.train(X, None, num_features=2)

# Compute anomaly scores for the entire dataset
anomaly_scores = pca_model.compute_anomaly_score(X)

# Print or further analyze the anomaly scores
print("Anomaly scores:")
print(anomaly_scores)
```

```
Explained variation per principal component: 0.46743936111388296
Anomaly scores:
0       0.048096
1       0.006211
2       0.002051
3       0.001602
4       0.005633
          ...
7195    0.045009
7196    0.005748
7197    0.045752
7198    0.002172
7199    0.047087
Length: 7200, dtype: float64
```
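The definition of `PCAModel` does not appear in this printout, so its exact scoring rule is unknown. A common PCA-based approach, assumed here, is to project each sample onto the top principal components, reconstruct it, and use the reconstruction error as the anomaly score: samples that the kept components explain poorly score high. The class below is a hypothetical sketch of that idea. The name `ReconstructionPCAScorer` is illustrative; it only mirrors the `train(X, None, num_features=...)` and `compute_anomaly_score(X)` calls above and does not reproduce the notebook's implementation. The demo uses synthetic data, not the thyroid file.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


class ReconstructionPCAScorer:
    """Hypothetical sketch (not the notebook's PCAModel):
    anomaly score = mean squared PCA reconstruction error per sample."""

    def __init__(self):
        self.pca = None

    def train(self, X, X_val=None, num_features=2):
        # X_val is accepted only to mirror the call above; PCA is unsupervised.
        self.pca = PCA(n_components=num_features)
        self.pca.fit(np.asarray(X, dtype=float))
        print("Explained variance ratio (sum over kept components):",
              self.pca.explained_variance_ratio_.sum())

    def compute_anomaly_score(self, X):
        X_arr = np.asarray(X, dtype=float)
        # Project onto the principal subspace, then map back to feature space
        reconstructed = self.pca.inverse_transform(self.pca.transform(X_arr))
        errors = np.mean((X_arr - reconstructed) ** 2, axis=1)
        index = X.index if isinstance(X, pd.DataFrame) else None
        return pd.Series(errors, index=index)


# Synthetic demo: 200 inliers spread over the first two features (tiny noise
# elsewhere) plus 5 outliers shifted away from that plane in the last three.
rng = np.random.default_rng(0)
inliers = np.hstack([rng.normal(scale=3.0, size=(200, 2)),
                     rng.normal(scale=0.05, size=(200, 3))])
outliers = np.hstack([rng.normal(size=(5, 2)),
                      4.0 + rng.normal(size=(5, 3))])
X_demo = pd.DataFrame(np.vstack([inliers, outliers]))

scorer = ReconstructionPCAScorer()
scorer.train(X_demo, None, num_features=2)
scores = scorer.compute_anomaly_score(X_demo)
print(scores.nlargest(5))
```

On this toy data the five shifted points (indices 200 to 204) should receive by far the largest reconstruction errors. On real data, a threshold on the score (for example, a high percentile) would still be needed to turn scores into anomaly labels.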

1 of 5 4/30/2024, 12:05 PM

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, classification_report
from sklearn.ensemble import RandomForestClassifier

# Load the dataset
data = pd.read_csv('annthyroid_21feat_normalised.csv')

# Preprocess the data if needed
# (e.g., handle missing values, scale the features)

# Split the data into features and labels
X = data.drop(columns=['class'])  # Features
y = data['class']  # Labels (assuming the 'class' column marks anomalies)

# Split the data into training (80%) and validation (20%) sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Instantiate and train the model (here, a random forest with 100 trees)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict labels on the validation set
y_pred = model.predict(X_val)

# Calculate performance metrics.
# Note: average='weighted' weights each class by its support, so on this
# imbalanced data the F1 score is dominated by the majority (normal) class.
accuracy = accuracy_score(y_val, y_pred)
f1 = f1_score(y_val, y_pred, average='weighted')
report = classification_report(y_val, y_pred)

# Print the metrics
print("Accuracy:", accuracy)
print("F1 Score:", f1)
print("Classification Report:\n", report)

# Other metrics (precision, recall, confusion matrix, etc.) can be added here.
```

```
Accuracy: 0.9993055555555556
F1 Score: 0.9993036997916849
Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      1352
           1       1.00      0.99      0.99        88

    accuracy                           1.00      1440
   macro avg       1.00      0.99      1.00      1440
weighted avg       1.00      1.00      1.00      1440
```
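The validation set is imbalanced (1352 normal vs. 88 anomalous samples), so accuracy and weighted F1 mostly reflect the normal class. The rounded report is consistent with exactly one misclassification (1439 of 1440 correct), and since anomaly precision is 1.00 while anomaly recall is 0.99, that error appears to be an anomaly predicted as normal. The sketch below recomputes the metrics from these inferred counts (an assumption drawn from the rounded report, not read from the raw predictions) and compares them with a trivial baseline that labels every sample normal.

```python
# Confusion counts inferred from the rounded classification report (an assumption,
# not read from the raw predictions): one anomaly missed, no false alarms.
tn, fp, fn, tp = 1352, 0, 1, 87
total = tn + fp + fn + tp  # validation samples

accuracy = (tp + tn) / total
precision_anomaly = tp / (tp + fp)
recall_anomaly = tp / (tp + fn)
f1_anomaly = 2 * tp / (2 * tp + fp + fn)
f1_normal = 2 * tn / (2 * tn + fn + fp)  # "normal" treated as the positive class

# Support-weighted F1, the quantity f1_score(..., average='weighted') reports
weighted_f1 = ((tn + fp) * f1_normal + (tp + fn) * f1_anomaly) / total

# Trivial baseline: predict "normal" for every sample
baseline_accuracy = (tn + fp) / total

print(f"accuracy                     = {accuracy:.6f}")
print(f"weighted F1                  = {weighted_f1:.6f}")
print(f"anomaly precision            = {precision_anomaly:.4f}")
print(f"anomaly recall               = {recall_anomaly:.4f}")
print(f"anomaly F1                   = {f1_anomaly:.4f}")
print(f"all-normal baseline accuracy = {baseline_accuracy:.4f}")
```

With these counts the recomputed weighted F1 agrees with the reported 0.99930..., which supports the inferred counts. The all-normal baseline already reaches about 0.939 accuracy, so anomaly-class recall (87 of 88) is the more informative figure for this task.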

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Plot the confusion matrix (rows: true labels, columns: predicted labels)
cm = confusion_matrix(y_val, y_pred)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Normal', 'Anomaly'], yticklabels=['Normal', 'Anomaly'])
plt.xlabel('Predicted Label')
plt.ylabel('True Label')
plt.title('Confusion Matrix')
plt.show()

# Plot feature importances (feature_importances_ is specific to tree-based
# models such as RandomForestClassifier)
if isinstance(model, RandomForestClassifier):
    feature_importances = model.feature_importances_
    feature_names = X.columns
    # Ascending order, so the most important feature is drawn at the top of the chart
    sorted_idx = feature_importances.argsort()

    plt.figure(figsize=(10, 8))
    plt.barh(range(len(sorted_idx)), feature_importances[sorted_idx], align='center')
    plt.yticks(range(len(sorted_idx)), [feature_names[i] for i in sorted_idx])
    plt.xlabel('Feature Importance')
    plt.ylabel('Feature')
    plt.title('Feature Importances')
    plt.show()

# Plot the ROC curve (optional).
# This task is binary, so a single curve suffices; for a multi-class task, compute
# one-vs-rest curves and AUCs for each class separately.
# Use predicted probabilities rather than the hard labels from model.predict(),
# so the curve is traced over all decision thresholds.
# Example:
# from sklearn.metrics import roc_curve, auc
# y_score = model.predict_proba(X_val)[:, 1]
# fpr, tpr, thresholds = roc_curve(y_val, y_score)
# roc_auc = auc(fpr, tpr)
# plt.figure()
# plt.plot(fpr, tpr, color='darkorange', lw=2, label='ROC curve (area = %0.2f)' % roc_auc)
# plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
# plt.xlim([0.0, 1.0])
# plt.ylim([0.0, 1.05])
# plt.xlabel('False Positive Rate')
# plt.ylabel('True Positive Rate')
# plt.title('Receiver Operating Characteristic (ROC)')
# plt.legend(loc="lower right")
# plt.show()

# Additional plots or charts as needed
```
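Impurity-based `feature_importances_` are derived from the training data and can overstate features with many unique values. A common complement is permutation importance on held-out data: shuffle one feature at a time and measure how much a score drops. The sketch below uses scikit-learn's `sklearn.inspection.permutation_importance` on synthetic, imbalanced data from `make_classification` (a stand-in for the Annthyroid features, which are not reproduced here); the choices `scoring='f1'` (anomaly-class F1) and `n_repeats=10` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 6% positives, 3 informative features)
X_syn, y_syn = make_classification(n_samples=2000, n_features=8, n_informative=3,
                                   n_redundant=0, weights=[0.94], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2,
                                          stratify=y_syn, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature column of the held-out set n_repeats times and record
# the mean and spread of the drop in positive-class F1
result = permutation_importance(rf, X_te, y_te, scoring='f1',
                                n_repeats=10, random_state=0)

# Report features from most to least important
order = np.argsort(result.importances_mean)[::-1]
for i in order:
    print(f"feature {i}: {result.importances_mean[i]:.4f} "
          f"+/- {result.importances_std[i]:.4f}")
```

Because the score is computed on held-out samples, a feature the forest merely memorized during training gets little or no credit, which makes this ranking a useful cross-check on the bar chart above.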
