27 ShivangiSrivastava ML Lab

Uploaded by

Mukul Mahawar


MACHINE LEARNING LAB

ETCS - 454

Submitted to: Dr Koyel Datta Gupta

Submitted by:
Name: Shivangi Srivastava
Serial No: 27
Enrollment No: 05815002717
Class: CSE - 2

Maharaja Surajmal Institute of Technology (Affiliated to G.G.S.I.P.U.)

June 2021

EXPERIMENT - 1

AIM: Study and implement the Naive Bayes learner using WEKA (Breast cancer data file)

THEORY:
Naive Bayes is a classification technique based on Bayes’ Theorem with an assumption of independence
among predictors. In simple terms, a Naive Bayes classifier assumes that, within a class, the presence
of a particular feature is unrelated to the presence of any other feature.

For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in
diameter. Even if these features depend on each other or on the existence of the other features,
a Naive Bayes classifier treats each of them as contributing independently to the probability that
this fruit is an apple, which is why it is called ‘naive’.

The Naive Bayes model is easy to build and particularly useful for very large data sets. Despite
its simplicity, Naive Bayes often performs competitively with, and on some tasks outperforms, more
sophisticated classification methods.

Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and
P(x|c):

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

where:
● P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
● P(c) is the prior probability of class.
● P(x|c) is the likelihood which is the probability of predictor given class.
● P(x) is the prior probability of predictor.
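Under the naive independence assumption, the likelihood of a feature vector x = (x_1, ..., x_n) factorizes as P(x|c) = P(x_1|c) × P(x_2|c) × ... × P(x_n|c), so the posterior is proportional to P(c) times that product, and P(x) only normalizes the scores. The sketch below is a minimal pure-Python categorical Naive Bayes written for illustration, not WEKA's implementation. It assumes every attribute is categorical, uses add-one (Laplace) smoothing so unseen values do not zero out a class, and works in log space to avoid underflow. The helper names `train_naive_bayes` and `predict_proba` and the fruit records are hypothetical toy values.

```python
from collections import Counter, defaultdict
import math


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model with add-alpha (Laplace) smoothing.

    rows: list of tuples of categorical feature values
    labels: list of class labels, one per row
    """
    class_counts = Counter(labels)
    n_features = len(rows[0])
    # value_counts[c][j][v] = number of class-c rows whose feature j equals v
    value_counts = defaultdict(lambda: defaultdict(Counter))
    # distinct values seen for each feature (needed for the smoothing denominator)
    feature_values = [set() for _ in range(n_features)]
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[c][j][v] += 1
            feature_values[j].add(v)
    return class_counts, value_counts, feature_values, alpha


def predict_proba(model, row):
    """Return the posterior P(c|x) for every class c."""
    class_counts, value_counts, feature_values, alpha = model
    n = sum(class_counts.values())
    log_scores = {}
    for c, n_c in class_counts.items():
        score = math.log(n_c / n)  # log prior P(c)
        for j, v in enumerate(row):
            # smoothed likelihood P(x_j | c); features are treated as independent given c
            num = value_counts[c][j][v] + alpha
            den = n_c + alpha * len(feature_values[j])
            score += math.log(num / den)
        log_scores[c] = score
    # normalizing the scores plays the role of dividing by the evidence P(x)
    top = max(log_scores.values())
    unnormalized = {c: math.exp(s - top) for c, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {c: s / total for c, s in unnormalized.items()}


# Hypothetical toy data: (colour, shape) -> fruit
rows = [("red", "round"), ("red", "round"), ("green", "round"),
        ("yellow", "long"), ("yellow", "long"), ("green", "long")]
labels = ["apple", "apple", "apple", "banana", "banana", "banana"]
model = train_naive_bayes(rows, labels)
post = predict_proba(model, ("red", "round"))
print(post)
```

For a red, round fruit the apple score is 0.5 × 0.5 × 0.8 = 0.2 and the banana score is 0.5 × (1/6) × (1/5) = 1/60, so the posterior for apple is 0.2 / (0.2 + 1/60) = 12/13.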

We first use the data-mining tool WEKA to train the Naive Bayes classifier on the breast cancer
data, and estimate its performance with 10-fold cross-validation on the training data. The results
are as follows:

Relation:     breast
Instances:    683
Attributes:   10
Test mode:    10-fold cross-validation

Time taken to build model: 0.08 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances          659               96.4861 %
Incorrectly Classified Instances         24                3.5139 %
Kappa statistic                           0.9238
K&B Relative Info Score               62650.9331 %
K&B Information Score                   585.4063 bits      0.8571 bits/instance
Class complexity | order 0              637.9242 bits      0.934  bits/instance
Class complexity | scheme              1877.4218 bits      2.7488 bits/instance
Complexity improvement     (Sf)       -1239.4976 bits     -1.8148 bits/instance
Mean absolute error                       0.0362
Root mean squared error                   0.1869
Relative absolute error                   7.950  %
Root relative squared error              39.192  %
Total Number of Instances               683

=== Confusion Matrix ===

   a   b   <-- classified as
 425  19 |   a = 2
   5 234 |   b = 4
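The accuracy and the Kappa statistic in the summary can both be recomputed from the confusion matrix (rows are actual classes, columns are predicted classes; in the Wisconsin breast cancer data, class 2 is benign and class 4 is malignant). Kappa here is Cohen's kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the row and column totals. The helper name `accuracy_and_kappa` is hypothetical; this is a minimal pure-Python sketch.

```python
def accuracy_and_kappa(matrix):
    """Accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] = number of instances of actual class i predicted as class j
    (the same layout as WEKA's confusion matrix).
    """
    total = sum(sum(row) for row in matrix)
    k = len(matrix)
    observed = sum(matrix[i][i] for i in range(k)) / total  # p_o, i.e. the accuracy
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # p_e: agreement expected if predictions were independent of the actual class
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


acc, kappa = accuracy_and_kappa([[425, 19], [5, 234]])
print(round(acc * 100, 4), round(kappa, 4))  # 96.4861 0.9238, matching the WEKA summary
```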
EXPERIMENT - 2

AIM: Estimate the accuracy of the decision tree classifier on the breast cancer dataset using 5-fold
cross-validation. (You need to choose the appropriate options for missing values.)

CODE:
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

# Importing the dataset
dataset = pd.read_csv('breast-cancer-wisconsin-data/data.csv')
Y = dataset.diagnosis
# 'Unnamed: 32' is an empty trailing column, 'id' is an identifier and 'diagnosis' is the label.
# After dropping them, the 30 feature columns have no missing values (check with
# X.isnull().sum()), so no imputation is needed.
drop_cols = ['Unnamed: 32', 'id', 'diagnosis']
X = dataset.drop(drop_cols, axis=1)

# Splitting the dataset into the Training set and Test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# Feature Scaling (not required by decision trees, but it does not change their splits)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Fitting Decision Tree Classification to the Training set
classifier = DecisionTreeClassifier(criterion='entropy', random_state=0)
classifier.fit(X_train, Y_train)

# Predicting the Test set results
Y_pred = classifier.predict(X_test)

# Making the Confusion Matrix
cm = confusion_matrix(Y_test, Y_pred)
print("Confusion matrix: ", cm)

# Applying 5-fold Cross Validation on the training set
# (cv=5 is stated explicitly; cv=None only means 5 folds in scikit-learn >= 0.22)
accuracies = cross_val_score(estimator=classifier, X=X_train, y=Y_train, cv=5)
print("Mean of accuracies: ", accuracies.mean())
print("Standard deviation of accuracies: ", accuracies.std())

OUTPUT:
Confusion matrix: [[86 4]
[ 2 51]]
Mean of accuracies: 0.9200820793433653
Standard deviation of accuracies: 0.03203202972210602
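The 5-fold estimate above comes from `cross_val_score`. For a classifier with an integer `cv`, scikit-learn uses stratified folds (`StratifiedKFold`), so each fold keeps roughly the class proportions of the whole training set; the model is trained on four folds, scored on the fifth, and the five scores are averaged. The pure-Python sketch below shows only the stratified splitting step. `stratified_k_fold` is a hypothetical helper, and unlike `StratifiedKFold` (which does not shuffle by default) it shuffles within each class using a fixed seed.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for stratified k-fold cross-validation."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    # Shuffle inside each class, then deal all indices round-robin into k folds,
    # so every fold receives about the same share of every class.
    order = []
    for indices in by_class.values():
        rng.shuffle(indices)
        order.extend(indices)
    folds = [[] for _ in range(k)]
    for position, i in enumerate(order):
        folds[position % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Toy labels: 10 benign ('B') and 5 malignant ('M') samples
labels = ["B"] * 10 + ["M"] * 5
splits = list(stratified_k_fold(labels, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), [labels[i] for i in test_idx].count("M"))  # 12 1 for every fold
```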
EXPERIMENT - 3

AIM: Estimate the precision, recall, accuracy, and F-measure of the decision tree classifier on
the text classification task for each of the 10 categories using 10-fold cross-validation.

INTRODUCTION:

Text classification is one of the key techniques in text mining to categorize the documents in a
supervised manner. The processing of text classification involves two main problems: the
extraction of feature terms that become effective keywords in the training phase and then the
actual classification of the document using these feature terms in the test phase. This text
classification task has numerous applications such as automated indexing of scientific articles
according to predefined thesauri of technical terms, routing of customer email in a customer
service department, filing patents into patent directories, automated population of hierarchical
catalogues of Web resources, selective dissemination of information to consumers, identification
of document genre, or detection and identification of criminal activities for military, police, or
secret service environments. Text classification can also be used for document filtering
and routing to topic specific processing mechanisms such as information extraction and machine
translation.

TP = true positives: number of examples predicted positive that are actually positive

FP = false positives: number of examples predicted positive that are actually negative

TN = true negatives: number of examples predicted negative that are actually negative

FN = false negatives: number of examples predicted negative that are actually positive

Recall is also referred to as the true positive rate or sensitivity.

The True Positive (TP) rate, or recall, of class x is the proportion of examples classified as class x
among all examples that truly have class x, i.e. how much of the class was captured:
Recall = TP/(TP+FN).

The Precision of class x is the proportion of examples that truly have class x among all those that
were classified as class x: Precision = TP/(TP+FP).

The F-Measure is 2*Precision*Recall/(Precision+Recall), the harmonic mean of precision and recall,
which combines both into a single measure.

Accuracy is the proportion of all examples that were classified correctly:
Accuracy = (TP+TN)/(TP+FP+TN+FN).

These measures are useful for comparing classifiers.
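The per-class formulas above can be computed directly from a confusion matrix by treating each class in turn as the positive class. The minimal pure-Python sketch below (the helper name `per_class_metrics` is hypothetical) applies them to the confusion matrix reported for this experiment, with rows as actual politics/sports and columns as predicted politics/sports. A measure whose denominator is zero is reported as 0.

```python
def per_class_metrics(matrix, class_names):
    """Precision, recall and F-measure for every class of a confusion matrix.

    matrix[i][j] = number of instances of actual class i predicted as class j.
    """
    k = len(matrix)
    results = {}
    for c in range(k):
        tp = matrix[c][c]
        fp = sum(matrix[i][c] for i in range(k)) - tp  # predicted c, actually another class
        fn = sum(matrix[c]) - tp                        # actually c, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[class_names[c]] = (precision, recall, f)
    return results


metrics = per_class_metrics([[4, 3], [6, 0]], ["politics", "sports"])
for name, (p, r, f) in metrics.items():
    print(name, round(p, 4), round(r, 4), round(f, 4))
```

For politics this gives precision 4/10 = 0.4, recall 4/7 ≈ 0.5714 and F-measure 8/17 ≈ 0.4706; no sports instance is classified correctly, so all three measures are 0 for sports.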

DataFile

@relation textclass1

@attribute text1 {ball,goal,medals,party,poll,ministers}
@attribute text2 {wicket,ball,poll,election,performance,party}
@attribute news {politics, sports}

@data
ball,wicket,sports
goal,ball,sports
party,poll,politics
poll,election,politics
ministers,election,politics
medals,performance,sports
ball,party,sports
goal,wicket,sports
ministers,party,politics
party,election,politics
goal,election,politics
poll,performance,politics
ball,performance,sports

=== Run information ===

Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     textclass1
Instances:    13
Attributes:   3
              text1
              text2
              news
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
------------------

text1 = ball: sports (3.0)
text1 = goal: sports (3.0/1.0)
text1 = medals: sports (1.0)
text1 = party: politics (2.0)
text1 = poll: politics (2.0)
text1 = ministers: politics (2.0)

Number of Leaves  :  6

Size of the tree :  7

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances           4               30.7692 %
Incorrectly Classified Instances         9               69.2308 %
Kappa statistic                         -0.4444
Mean absolute error                      0.5192
Root mean squared error                  0.6517
Relative absolute error                100.5319 %
Root relative squared error            125.8013 %
Total Number of Instances               13

=== Confusion Matrix ===

 a b   <-- classified as
 4 3 | a = politics
 6 0 | b = sports
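The pruned J48 tree splits only on text1 and labels each leaf with the majority class of the training instances that reach it. The pure-Python sketch below rebuilds that one-level rule from the 13 instances in the DataFile. It is a OneR-style simplification for illustration, not WEKA's J48 algorithm (which chose the split by information gain ratio and then pruned), and the helper name `one_level_rule` is hypothetical. On the training data the rule makes a single error, the goal/election/politics instance, which is the "1.0" in "sports (3.0/1.0)"; the 30.77% cross-validated accuracy shows how poorly such a rule generalizes from only 13 examples.

```python
from collections import Counter, defaultdict

# The 13 instances of textclass1 as (text1, text2, news) tuples
data = [
    ("ball", "wicket", "sports"), ("goal", "ball", "sports"),
    ("party", "poll", "politics"), ("poll", "election", "politics"),
    ("ministers", "election", "politics"), ("medals", "performance", "sports"),
    ("ball", "party", "sports"), ("goal", "wicket", "sports"),
    ("ministers", "party", "politics"), ("party", "election", "politics"),
    ("goal", "election", "politics"), ("poll", "performance", "politics"),
    ("ball", "performance", "sports"),
]


def one_level_rule(rows, attr_index, class_index=-1):
    """Map each value of one attribute to the majority class among rows with that value."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[attr_index]][row[class_index]] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in groups.items()}


rule = one_level_rule(data, 0)  # split on text1, like the J48 tree
training_errors = sum(rule[row[0]] != row[2] for row in data)
print(rule)
print("training errors:", training_errors)
```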
EXPERIMENT - 4

AIM: Develop a machine learning method to classify your incoming mails.

CODE:
import string

import nltk
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import TweetTokenizer
from sklearn import preprocessing
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

nltk.download('stopwords')

df = pd.read_csv('../input/email-classification-nlp/SMS_train.csv', encoding='unicode_escape')

df = df.drop(['S. No.'], axis=1)  # dropping the serial-number column, which carries no information

label_encoder = preprocessing.LabelEncoder()  # label encoding for the 'Label' column
df['Label'] = label_encoder.fit_transform(df['Label'])
df.isnull().any()  # checking for null values, if any

# Build the NLP helpers once instead of on every call
stemmer = PorterStemmer()
stopwords_english = set(stopwords.words('english'))
tokenizer = TweetTokenizer(preserve_case=False, strip_handles=True, reduce_len=True)


def process_mail(mail):
    """Process mail function.
    Input:
        mail: a string containing message body
    Output:
        mail_clean: a list of words containing the processed body
    """
    mail_tokens = tokenizer.tokenize(mail)  # tokenize (lower-cased, @handles stripped)
    mail_clean = []
    for word in mail_tokens:
        if (word not in stopwords_english and      # remove stopwords
                word not in string.punctuation):   # remove punctuation
            mail_clean.append(stemmer.stem(word))  # stemming word
    return mail_clean


# Using the process_mail function for:
# 1. Tokenization
# 2. Removing stop words and punctuation
# 3. Stemming
df['Message_body'] = df['Message_body'].apply(process_mail)
print("Bag of words: ", df.head())  # processed token lists, before vectorization

# Join the tokens back into one string per message and count the 1500 most frequent terms
df['Message_body'] = df['Message_body'].apply(lambda x: " ".join(x))
cv = CountVectorizer(max_features=1500, analyzer='word', lowercase=False)
X = cv.fit_transform(df['Message_body'])
y = df['Label']  # 1-D label vector, as LogisticRegression expects

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# ROC AUC of the hard 0/1 predictions; classifier.predict_proba(X_test)[:, 1]
# would give a finer-grained AUC
auc_score = roc_auc_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print("Confusion matrix: ", cm)
print("Accuracy is ", (cm[0][0] + cm[1][1]) / cm.sum())

OUTPUT:
Bag of words: Message_body Label
0 [rofl, true, name] 0
1 [guy, act, like, i'd, interest, buy, so... 0
2 [piti, mood, ..., suggest] 0
3 [ü, b, go, esplanad, fr, home] 0
4 [2nd, time, tri, 2, contact, u, u, £, 750, pou... 1
Confusion matrix: [[162 0]
[ 13 17]]
Accuracy is 0.9322916666666666
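`CountVectorizer(max_features=1500)` turns each processed message into a vector of term counts, keeping only the most frequent terms across the whole corpus and ordering the vocabulary alphabetically. The pure-Python sketch below restates that idea as a simplified analogue, not scikit-learn's implementation: it takes already-tokenized documents, and the way it breaks ties between equally frequent terms may differ from scikit-learn's. The helper name `bag_of_words` is hypothetical, and the toy token lists are borrowed from the processed messages printed above.

```python
from collections import Counter


def bag_of_words(documents, max_features=None):
    """Count-vectorize already-tokenized documents.

    Keeps the max_features most frequent terms over the whole corpus (all terms
    when None), orders the vocabulary alphabetically, and returns one count
    vector per document.
    """
    totals = Counter(token for doc in documents for token in doc)
    vocabulary = sorted(term for term, _ in totals.most_common(max_features))
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for token in doc:
            if token in index:  # terms outside the vocabulary are ignored
                row[index[token]] += 1
        vectors.append(row)
    return vocabulary, vectors


# Token lists shaped like the processed messages printed above
docs = [["rofl", "true", "name"], ["2nd", "time", "tri", "2", "contact", "u", "u"]]
vocab, vectors = bag_of_words(docs)
top1, _ = bag_of_words(docs, max_features=1)
print(vocab)
print(vectors)
print(top1)
```

The resulting count vectors are what the logistic regression classifier receives as features.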
EXPERIMENT - 5

AIM: Develop a machine learning method to predict stock prices based on past price variation.

CODE:
import numpy as np
import pandas as pd
import math
import sklearn
import sklearn.preprocessing
import datetime
import os
import matplotlib.pyplot as plt
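import tensorflow as tf  # TensorFlow 1.x graph API (tf.placeholder, tf.contrib, tf.Session); tf.contrib does not exist in TF 2.x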

# split data in 80%/10%/10% train/validation/test sets

valid_set_size_percentage = 10
test_set_size_percentage = 10

#display parent directory and working directory

print(os.path.dirname(os.getcwd())+':', os.listdir(os.path.dirname(os.getcwd())));
print(os.getcwd()+':', os.listdir(os.getcwd()));

# import all stock prices

df = pd.read_csv("../input/prices-split-adjusted.csv", index_col = 0)
df.info()
df.head()

# number of different stocks

print('\n number of different stocks: ', len(list(set(df.symbol))))
print(list(set(df.symbol))[:10])
df.tail()

df.describe()

df.info()

# function for min-max normalization of stock
# (each column gets its own min/max fitted on the whole series, so the test period
# influences the scaling, and normalized close - open need not have the raw sign)
def normalize_data(df):
    min_max_scaler = sklearn.preprocessing.MinMaxScaler()
    df['open'] = min_max_scaler.fit_transform(df.open.values.reshape(-1,1))
    df['high'] = min_max_scaler.fit_transform(df.high.values.reshape(-1,1))
    df['low'] = min_max_scaler.fit_transform(df.low.values.reshape(-1,1))
    df['close'] = min_max_scaler.fit_transform(df['close'].values.reshape(-1,1))
    return df

# function to create train, validation, test data given stock data and sequence length
def load_data(stock, seq_len):
    data_raw = stock.to_numpy()  # convert to numpy array (DataFrame.as_matrix was removed in pandas 1.0)
    data = []

    # create all possible sequences of length seq_len
    for index in range(len(data_raw) - seq_len):
        data.append(data_raw[index: index + seq_len])

    data = np.array(data)
    valid_set_size = int(np.round(valid_set_size_percentage/100*data.shape[0]))
    test_set_size = int(np.round(test_set_size_percentage/100*data.shape[0]))
    train_set_size = data.shape[0] - (valid_set_size + test_set_size)

    # in each sequence the first seq_len-1 rows are the input and the last row is the target;
    # the sets are consecutive blocks in time order
    x_train = data[:train_set_size,:-1,:]
    y_train = data[:train_set_size,-1,:]

    x_valid = data[train_set_size:train_set_size+valid_set_size,:-1,:]
    y_valid = data[train_set_size:train_set_size+valid_set_size,-1,:]

    x_test = data[train_set_size+valid_set_size:,:-1,:]
    y_test = data[train_set_size+valid_set_size:,-1,:]

    return [x_train, y_train, x_valid, y_valid, x_test, y_test]

# choose one stock

df_stock = df[df.symbol == 'EQIX'].copy()
df_stock.drop(['symbol'], axis=1, inplace=True)
df_stock.drop(['volume'], axis=1, inplace=True)

cols = list(df_stock.columns.values)
print('df_stock.columns.values = ', cols)

# normalize stock
df_stock_norm = df_stock.copy()
df_stock_norm = normalize_data(df_stock_norm)

# create train, test data

seq_len = 20 # choose sequence length
x_train, y_train, x_valid, y_valid, x_test, y_test = load_data(df_stock_norm, seq_len)
print('x_train.shape = ',x_train.shape)
print('y_train.shape = ', y_train.shape)
print('x_valid.shape = ',x_valid.shape)
print('y_valid.shape = ', y_valid.shape)
print('x_test.shape = ', x_test.shape)
print('y_test.shape = ',y_test.shape)
## Basic Cell RNN in tensorflow

index_in_epoch = 0;
perm_array = np.arange(x_train.shape[0])
np.random.shuffle(perm_array)

# function to get the next batch
def get_next_batch(batch_size):
    global index_in_epoch, x_train, perm_array
    start = index_in_epoch
    index_in_epoch += batch_size

    if index_in_epoch > x_train.shape[0]:
        np.random.shuffle(perm_array)  # shuffle permutation array
        start = 0  # start next epoch
        index_in_epoch = batch_size

    end = index_in_epoch
    return x_train[perm_array[start:end]], y_train[perm_array[start:end]]

# parameters
n_steps = seq_len-1
n_inputs = 4
n_neurons = 200
n_outputs = 4
n_layers = 2
learning_rate = 0.001
batch_size = 50
n_epochs = 100
train_set_size = x_train.shape[0]
test_set_size = x_test.shape[0]

tf.reset_default_graph()

X = tf.placeholder(tf.float32, [None, n_steps, n_inputs])

y = tf.placeholder(tf.float32, [None, n_outputs])

# use Basic RNN Cell

layers = [tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.elu)
for layer in range(n_layers)]

# use Basic LSTM Cell

#layers = [tf.contrib.rnn.BasicLSTMCell(num_units=n_neurons, activation=tf.nn.elu)
# for layer in range(n_layers)]

# use LSTM Cell with peephole connections

#layers = [tf.contrib.rnn.LSTMCell(num_units=n_neurons,
# activation=tf.nn.leaky_relu, use_peepholes = True)
# for layer in range(n_layers)]

# use GRU cell

#layers = [tf.contrib.rnn.GRUCell(num_units=n_neurons, activation=tf.nn.leaky_relu)
# for layer in range(n_layers)]

multi_layer_cell = tf.contrib.rnn.MultiRNNCell(layers)
rnn_outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32)

stacked_rnn_outputs = tf.reshape(rnn_outputs, [-1, n_neurons])

stacked_outputs = tf.layers.dense(stacked_rnn_outputs, n_outputs)
outputs = tf.reshape(stacked_outputs, [-1, n_steps, n_outputs])
outputs = outputs[:,n_steps-1,:] # keep only last output of sequence
loss = tf.reduce_mean(tf.square(outputs - y)) # loss function = mean squared error
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(loss)

# run graph
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for iteration in range(int(n_epochs*train_set_size/batch_size)):
        x_batch, y_batch = get_next_batch(batch_size)  # fetch the next training batch
        sess.run(training_op, feed_dict={X: x_batch, y: y_batch})
        if iteration % int(5*train_set_size/batch_size) == 0:
            mse_train = loss.eval(feed_dict={X: x_train, y: y_train})
            mse_valid = loss.eval(feed_dict={X: x_valid, y: y_valid})
            print('%.2f epochs: MSE train/valid = %.6f/%.6f'%(
                iteration*batch_size/train_set_size, mse_train, mse_valid))

    y_train_pred = sess.run(outputs, feed_dict={X: x_train})
    y_valid_pred = sess.run(outputs, feed_dict={X: x_valid})
    y_test_pred = sess.run(outputs, feed_dict={X: x_test})

y_train.shape

corr_price_development_train = np.sum(np.equal(np.sign(y_train[:,1]-y_train[:,0]),
np.sign(y_train_pred[:,1]-y_train_pred[:,0])).astype(int)) / y_train.shape[0]
corr_price_development_valid = np.sum(np.equal(np.sign(y_valid[:,1]-y_valid[:,0]),
np.sign(y_valid_pred[:,1]-y_valid_pred[:,0])).astype(int)) / y_valid.shape[0]
corr_price_development_test = np.sum(np.equal(np.sign(y_test[:,1]-y_test[:,0]),
np.sign(y_test_pred[:,1]-y_test_pred[:,0])).astype(int)) / y_test.shape[0]

print('correct sign prediction for close - open price for train/valid/test: %.2f/%.2f/%.2f'%(
corr_price_development_train, corr_price_development_valid, corr_price_development_test))

OUTPUT:
Index: 851264 entries, 2016-01-05 to 2016-12-30
Data columns (total 6 columns):
symbol 851264 non-null object
open 851264 non-null float64
close 851264 non-null float64
low 851264 non-null float64
high 851264 non-null float64
volume 851264 non-null float64
dtypes: float64(5), object(1)
memory usage: 45.5+ MB

number of different stocks: 501

['SLG', 'SNI', 'DLR', 'PG', 'O', 'BLK', 'FCX', 'WLTW', 'SHW', 'UPS']
<class 'pandas.core.frame.DataFrame'>
Index: 851264 entries, 2016-01-05 to 2016-12-30
Data columns (total 6 columns):
symbol 851264 non-null object
open 851264 non-null float64
close 851264 non-null float64
low 851264 non-null float64
high 851264 non-null float64
volume 851264 non-null float64
dtypes: float64(5), object(1)
memory usage: 45.5+ MB
df_stock.columns.values = ['open', 'close', 'low', 'high']
x_train.shape = (1394, 19, 4)
y_train.shape = (1394, 4)
x_valid.shape = (174, 19, 4)
y_valid.shape = (174, 4)
x_test.shape = (174, 19, 4)
y_test.shape = (174, 4)
0.00 epochs: MSE train/valid = 0.060335/0.047891
4.99 epochs: MSE train/valid = 0.000148/0.000607
9.97 epochs: MSE train/valid = 0.000132/0.000617
14.96 epochs: MSE train/valid = 0.000130/0.000761
19.94 epochs: MSE train/valid = 0.000106/0.000351
24.93 epochs: MSE train/valid = 0.000097/0.000467
29.91 epochs: MSE train/valid = 0.000095/0.000437
34.90 epochs: MSE train/valid = 0.000115/0.000442
39.89 epochs: MSE train/valid = 0.000076/0.000353
44.87 epochs: MSE train/valid = 0.000078/0.000339
49.86 epochs: MSE train/valid = 0.000068/0.000222
54.84 epochs: MSE train/valid = 0.000099/0.000282
59.83 epochs: MSE train/valid = 0.000082/0.000232
64.81 epochs: MSE train/valid = 0.000068/0.000311
69.80 epochs: MSE train/valid = 0.000061/0.000202
74.78 epochs: MSE train/valid = 0.000078/0.000312
79.77 epochs: MSE train/valid = 0.000066/0.000246
84.76 epochs: MSE train/valid = 0.000063/0.000202
89.74 epochs: MSE train/valid = 0.000062/0.000251
94.73 epochs: MSE train/valid = 0.000069/0.000271
99.71 epochs: MSE train/valid = 0.000081/0.000203
correct sign prediction for close - open price for train/valid/test: 0.72/0.47/0.41
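`load_data` turns the price table into overlapping windows of `seq_len` rows and splits them into consecutive train/validation/test blocks, and the final metric checks whether the predicted sign of close - open matches the actual one. The pure-Python sketch below restates these three steps on plain lists; the helper names are hypothetical. Like `load_data`, `make_windows` stops one window before the end of the series, and the 1394/174/174 sizes printed above correspond to 1742 windows.

```python
def make_windows(rows, seq_len):
    """Split a time-ordered list of rows into (input, target) windows.

    Each window holds seq_len consecutive rows: the first seq_len - 1 rows are the
    model input and the last row is the target.
    """
    windows = [rows[i:i + seq_len] for i in range(len(rows) - seq_len)]
    inputs = [w[:-1] for w in windows]
    targets = [w[-1] for w in windows]
    return inputs, targets


def chronological_split(n_windows, valid_pct=10, test_pct=10):
    """Train/validation/test sizes; the blocks are taken in time order, never shuffled."""
    valid = round(valid_pct / 100 * n_windows)
    test = round(test_pct / 100 * n_windows)
    return n_windows - valid - test, valid, test


def sign(v):
    return (v > 0) - (v < 0)


def sign_accuracy(actual, predicted):
    """Fraction of rows whose predicted sign of (close - open) matches the actual sign.

    Rows are [open, close, low, high], the column order of df_stock.
    """
    hits = sum(sign(a[1] - a[0]) == sign(p[1] - p[0]) for a, p in zip(actual, predicted))
    return hits / len(actual)


print(chronological_split(1742))  # (1394, 174, 174)
inputs, targets = make_windows(list(range(25)), 20)
print(len(inputs), len(inputs[0]), targets[0])
print(sign_accuracy([[1, 2, 0, 0], [2, 1, 0, 0]], [[1, 3, 0, 0], [1, 2, 0, 0]]))
```

Because the split is chronological, the validation and test windows lie strictly after the training windows, which is why the validation and test sign accuracies can fall well below the training value.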
EXPERIMENT - 6

AIM: Develop a machine learning method to predict how people would rate movies, books, etc.

CODE:
# data analysis and wrangling
import pandas as pd
import numpy as np
import random as rnd

# visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split

from sklearn.tree import DecisionTreeClassifier

#Data acquisition of the movies dataset
# The .dat files have no header row, so header=None keeps the first record;
# encoding='latin-1' because some movie titles are not valid UTF-8
df_movie = pd.read_csv('../input/movies.dat', sep='::', engine='python', header=None,
                       names=['MovieID', 'MovieName', 'Category'], encoding='latin-1')
df_movie.dropna(inplace=True)
df_movie.head()

#Data acquisition of the rating dataset
df_rating = pd.read_csv('../input/ratings.dat', sep='::', engine='python', header=None,
                        names=['UserID', 'MovieID', 'Ratings', 'TimeStamp'])
df_rating.dropna(inplace=True)
df_rating.head()

#Data acquisition of the users dataset
df_user = pd.read_csv('../input/users.dat', sep='::', engine='python', header=None,
                      names=['UserID', 'Gender', 'Age', 'Occupation', 'Zip-code'])
df_user.dropna(inplace=True)
df_user.head()

# Join each rating to its movie and its user by ID; pd.concat(axis=1) would only line
# rows up by position, pairing unrelated movies, ratings and users
df = df_rating.merge(df_movie, on='MovieID').merge(df_user, on='UserID')

df.head()

#Visualize user age distribution

df['Age'].value_counts().plot(kind='barh',alpha=0.7,figsize=(10,10))
plt.show()

df.Age.plot.hist(bins=25)
plt.title("Distribution of users' ages")
plt.ylabel('count of users')
plt.xlabel('Age')

labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70-79']

df['age_group'] = pd.cut(df.Age, range(0, 81, 10), right=False, labels=labels)
df[['Age', 'age_group']].drop_duplicates()[:10]

#Visualize overall rating by users

df['Ratings'].value_counts().plot(kind='bar',alpha=0.7,figsize=(10,10))
plt.show()

groupedby_movieName = df.groupby('MovieName')
groupedby_rating = df.groupby('Ratings')
groupedby_uid = df.groupby('UserID')

movies = df.groupby('MovieName').size().sort_values(ascending=True)[:1000]
ToyStory_data = groupedby_movieName.get_group('Toy Story 2 (1999)')
ToyStory_data.shape

#Find and visualize the user rating of the movie “Toy Story”
plt.figure(figsize=(10,10))
plt.scatter(ToyStory_data['MovieName'],ToyStory_data['Ratings'])
plt.title('Plot showing the user rating of the movie “Toy Story”')
plt.show()

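#Find and visualize the viewership of the movie “Toy Story” by age group
ToyStory_data['age_group'].value_counts().sort_index().plot(kind='bar', alpha=0.7)
plt.title('Viewership of “Toy Story” by age group')
plt.show()

#Find and visualize the top 25 movies by viewership (number of ratings)
top_25 = df.groupby('MovieName').size().sort_values(ascending=False)[:25]
top_25.plot(kind='barh', alpha=0.6, figsize=(7,7))
plt.show()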

#Visualize the rating data by user of user id = 2696

userid_2696 = groupedby_uid.get_group(2696)
userid_2696[['UserID','Ratings']]

#First 500 extracted records
first_500 = df[:500].copy()
first_500.dropna(inplace=True)

#Use the following features: movie id, age, occupation
features = first_500[['MovieID','Age','Occupation']].values

#Use rating as label (a 1-D array, as scikit-learn expects)
labels = first_500['Ratings'].values

#Create train and test data set
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=42)

#Create a histogram for movie
df.MovieID.plot.hist(bins=25)
plt.title("Histogram of MovieID")
plt.xlabel('MovieID')
plt.ylabel('Count')
plt.show()

#Create a histogram for age
df.Age.plot.hist(bins=25)
plt.title("Histogram of Age")
plt.xlabel('Age')
plt.ylabel('Count')
plt.show()

#Create a histogram for occupation
df.Occupation.plot.hist(bins=25)
plt.title("Histogram of Occupation")
plt.xlabel('Occupation')
plt.ylabel('Count')
plt.show()

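# Decision Tree
decision_tree = DecisionTreeClassifier()
decision_tree.fit(train, train_labels)
Y_pred = decision_tree.predict(test)
# Note: this scores the tree on the data it was trained on (training accuracy);
# decision_tree.score(test, test_labels) gives the accuracy on the held-out test set
acc_decision_tree = round(decision_tree.score(train, train_labels) * 100, 2)
print("The accuracy of Decision tree algorithm is ",acc_decision_tree)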

OUTPUT:
The accuracy of Decision tree algorithm is 98.54
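Combining the movies, ratings and users tables only makes sense if each rating is joined to its own movie and user by ID, not by row position. In pandas this is an inner `DataFrame.merge` on the ID columns (`how='inner'` is the default). The pure-Python sketch below shows what such a join does; `join_ratings` is a hypothetical helper and the records are made-up toy values, not taken from the MovieLens files.

```python
def join_ratings(ratings, movies, users):
    """Attach movie and user attributes to each rating by ID (an inner join).

    ratings: list of (user_id, movie_id, rating, timestamp)
    movies:  dict movie_id -> (title, genres)
    users:   dict user_id -> (gender, age, occupation, zip_code)
    """
    joined = []
    for user_id, movie_id, rating, timestamp in ratings:
        if movie_id in movies and user_id in users:  # inner join: drop unmatched IDs
            title, genres = movies[movie_id]
            gender, age, occupation, zip_code = users[user_id]
            joined.append({"UserID": user_id, "MovieID": movie_id, "Ratings": rating,
                           "MovieName": title, "Category": genres, "Gender": gender,
                           "Age": age, "Occupation": occupation})
    return joined


# Hypothetical toy records
movies = {10: ("Movie A (2000)", "Comedy"), 20: ("Movie B (2001)", "Drama")}
users = {1: ("F", 25, 3, "00000"), 2: ("M", 35, 7, "11111")}
ratings = [(2, 10, 4, 0), (1, 20, 5, 0), (1, 99, 3, 0)]  # movie 99 has no entry
rows = join_ratings(ratings, movies, users)
for row in rows:
    print(row)
```

Each joined row now carries a movie's own title and the rating user's own age and occupation, which is what the MovieID/Age/Occupation features are meant to describe.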
EXPERIMENT - 7

AIM: Develop a machine learning method to cluster gene expression data, and explore how existing
methods can be modified to solve the problem better.

CODE:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.neighbors import KNeighborsClassifier

from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
import xgboost as xgb

from sklearn.model_selection import train_test_split

from sklearn.model_selection import GridSearchCV

from sklearn.metrics import (recall_score, precision_score, classification_report,
                             accuracy_score, confusion_matrix, roc_curve, auc)
from sklearn.preprocessing import StandardScaler, normalize
from sklearn.decomposition import PCA
from scipy import ndimage
import seaborn as sns

#Load dataset
Train_Data = pd.read_csv("gene-expression/data_set_ALL_AML_train.csv")
Test_Data = pd.read_csv("gene-expression/data_set_ALL_AML_independent.csv")
labels = pd.read_csv("gene-expression/actual.csv", index_col = 'patient')

Train_Data.head()

#check for nulls

print(Train_Data.isna().sum().max())
print(Test_Data.isna().sum().max())

#drop 'call' columns
cols = [col for col in Test_Data.columns if 'call' in col]
test = Test_Data.drop(columns=cols)
cols = [col for col in Train_Data.columns if 'call' in col]
train = Train_Data.drop(columns=cols)

#Join all the data

patients = [str(i) for i in range(1, 73, 1)]
df_all = pd.concat([train, test], axis = 1)[patients]

#transpose rows and columns

df_all = df_all.T

df_all["patient"] = pd.to_numeric(patients)
# encode the label as 1 for AML and 0 for ALL (the same result as get_dummies(..., drop_first=True))
labels["cancer"] = (labels["cancer"] == "AML").astype(int)

# add the cancer column to train data

Data = pd.merge(df_all, labels, on="patient")

Data.head()

Data['cancer'].value_counts()
plt.figure(figsize=(4,8))
sns.countplot(x='cancer', data=Data, palette="Set1")
plt.title('Class Distributions \n (0: ALL || 1: AML)', fontsize=14)

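#X -> matrix of independent variables (the 7129 gene expression levels;
#     the patient number is an identifier, not a feature)

#y -> vector of dependent variable
X, y = Data.drop(columns=["cancer", "patient"]), Data["cancer"]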

print(X)
print(y)

#split the dataset

X_train, X_test, y_train, y_test = train_test_split(X,y,test_size = 0.25, random_state= 0)

#feature scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)

X_train.shape

# Fit PCA on the training data and find how many components explain 90% of the variance
pca = PCA()
pca.fit(X_train)

total = sum(pca.explained_variance_)
k = 0
current_variance = 0
while current_variance/total < 0.90:
    current_variance += pca.explained_variance_[k]
    k = k + 1
print(k, " features explain around 90% of the variance. From 7129 features to ", k,
      ", not too bad.", sep='')

pca = PCA(n_components=k)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)

var_exp = pca.explained_variance_ratio_.cumsum()
var_exp = var_exp*100
plt.bar(range(k), var_exp,color = 'r')

pca.n_components_

from mpl_toolkits.mplot3d import Axes3D  # registers the '3d' projection on older Matplotlib versions

pca3 = PCA(n_components=3).fit(X_train)
X_train_reduced = pca3.transform(X_train)

plt.clf()
fig = plt.figure(1, figsize=(10,6))
ax = fig.add_subplot(111, projection='3d')
ax.view_init(elev=-150, azim=110)
ax.scatter(X_train_reduced[:, 0], X_train_reduced[:, 1], X_train_reduced[:, 2], c=y_train,
           cmap='coolwarm', linewidths=10)
ax.set_title("First three PCA directions")
ax.set_xlabel("1st eigenvector")
ax.xaxis.set_ticklabels([])
ax.set_ylabel("2nd eigenvector")
ax.yaxis.set_ticklabels([])
ax.set_zlabel("3rd eigenvector")
ax.zaxis.set_ticklabels([])

from sklearn.utils import resample

from collections import Counter

print("Before Upsampling:-")
print(Counter(y_train))

from imblearn.over_sampling import SMOTE

oversample = SMOTE()
X_train_ov, y_train_ov = oversample.fit_resample(X_train_pca,y_train)

print("After Upsampling:-")
print(Counter(y_train_ov))

# do a grid search
svc_params = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']},
{'C': [1, 10, 100, 1000], 'kernel': ['rbf'], 'gamma': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8,
0.9]}]

search = GridSearchCV(SVC(), svc_params, n_jobs=-1, verbose=1)

search.fit(X_train_ov, y_train_ov)

best_accuracy = search.best_score_ #to get best score

best_parameters = search.best_params_ #to get best parameters
# select best svc
best_svc = search.best_estimator_
best_svc

#build SVM model with best parameters

svc_model = SVC(C=1, kernel='linear',probability=True)
svc_model.fit(X_train_ov, y_train_ov)

prediction=svc_model.predict(X_test_pca)

acc_svc = accuracy_score(prediction,y_test)
print('The accuracy of SVM is', acc_svc)
print ("\nClassification report :\n",(classification_report(y_test,prediction)))

#Confusion matrix
plt.figure(figsize=(13,10))
plt.subplot(221)
sns.heatmap(confusion_matrix(y_test,prediction),annot=True, cmap='Greens', fmt =
"d",linecolor="k",linewidths=3)
plt.title("CONFUSION MATRIX",fontsize=20)

#ROC curve and Area under the curve plotting

predicting_probabilites = svc_model.predict_proba(X_test_pca)[:,1]
fpr,tpr,thresholds = roc_curve(y_test,predicting_probabilites)
plt.subplot(222)
plt.plot(fpr, tpr, label="Area under the curve: %.3f" % auc(fpr, tpr), color="r")
plt.plot([1,0],[1,0],linestyle = "dashed",color ="k")
plt.legend(loc = "best")
plt.title("ROC - CURVE & AREA UNDER CURVE",fontsize=20)

knn_param = {
"n_neighbors": [i for i in range(1,30,5)],
"weights": ["uniform", "distance"],
"algorithm": ["ball_tree", "kd_tree", "brute"],
"leaf_size": [1, 10, 30],
"p": [1,2]
}
search = GridSearchCV(KNeighborsClassifier(), knn_param, n_jobs=-1, verbose=1)
search.fit(X_train_ov, y_train_ov)

best_accuracy = search.best_score_ #to get best score

best_parameters = search.best_params_ #to get best parameters
# select best k-NN model
best_knn = search.best_estimator_
best_knn

knn_model = KNeighborsClassifier(algorithm='ball_tree', leaf_size=1, n_neighbors=6,

weights='distance')

knn_model.fit(X_train_ov,y_train_ov)
prediction=knn_model.predict(X_test_pca)

acc_knn = accuracy_score(prediction,y_test)
print('The accuracy of K-NN is', acc_knn)
print ("\nClassification report :\n",(classification_report(y_test,prediction)))

#ROC curve and Area under the curve plotting

predicting_probabilites = knn_model.predict_proba(X_test_pca)[:,1]
fpr,tpr,thresholds = roc_curve(y_test,predicting_probabilites)
plt.subplot(222)
plt.plot(fpr, tpr, label="Area under the curve: %.3f" % auc(fpr, tpr), color="r")
plt.plot([1,0],[1,0],linestyle = "dashed",color ="k")
plt.legend(loc = "best")
plt.title("ROC - CURVE & AREA UNDER CURVE",fontsize=20)

log_grid = {'C': [1e-03, 1e-2, 1e-1, 1, 10],

'penalty': ['l1', 'l2']}

log_model = GridSearchCV(estimator=LogisticRegression(solver='liblinear'),
param_grid=log_grid,
cv=3,
scoring='accuracy')
log_model.fit(X_train_ov, y_train_ov)

best_accuracy = log_model.best_score_ #to get best score

best_parameters = log_model.best_params_ #to get best parameters
# select best logistic regression model
best_lr = log_model.best_estimator_
best_lr

#Logistic Regression
lr_model = LogisticRegression(C=0.001, solver='liblinear')

lr_model.fit(X_train_ov,y_train_ov)

prediction=lr_model.predict(X_test_pca)

acc_log = accuracy_score(prediction,y_test)
print('Validation accuracy of Logistic Regression is', acc_log)
print ("\nClassification report :\n",(classification_report(y_test,prediction)))
#Confusion matrix
plt.figure(figsize=(13,10))
plt.subplot(221)
sns.heatmap(confusion_matrix(y_test,prediction),annot=True,cmap="Greens",fmt =
"d",linecolor="k",linewidths=3)
plt.title("CONFUSION MATRIX",fontsize=20)

#ROC curve and Area under the curve plotting

predicting_probabilites = lr_model.predict_proba(X_test_pca)[:,1]
fpr,tpr,thresholds = roc_curve(y_test,predicting_probabilites)
plt.subplot(222)
plt.plot(fpr, tpr, label="Area under the curve: %.3f" % auc(fpr, tpr), color="r")
plt.plot([1,0],[1,0],linestyle = "dashed",color ="k")
plt.legend(loc = "best")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title("ROC - CURVE & AREA UNDER CURVE",fontsize=20)

params = {'max_leaf_nodes': list(range(2, 100)), 'min_samples_split': [2, 3, 4, 5, 6],

'max_depth':[3,4,5,6,7,8]}
decision_search = GridSearchCV(DecisionTreeClassifier(random_state=42), params, verbose=1,
cv=3)

decision_search.fit(X_train_ov, y_train_ov)

best_accuracy = decision_search.best_score_ #to get best score

best_parameters = decision_search.best_params_ #to get best parameters
# select best decision tree
best_ds = decision_search.best_estimator_
best_ds
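The decision-tree grid has 98 × 5 × 6 = 2,940 combinations, i.e. 8,820 fits with `cv=3`. When that is too slow, `RandomizedSearchCV` evaluates a fixed number of combinations sampled from the same grid. This is a swapped-in alternative, not what the lab ran; a minimal sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the lab's training data
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

params = {'max_leaf_nodes': list(range(2, 100)),
          'min_samples_split': [2, 3, 4, 5, 6],
          'max_depth': [3, 4, 5, 6, 7, 8]}

# Evaluate 50 randomly sampled combinations instead of all 2,940
random_search = RandomizedSearchCV(DecisionTreeClassifier(random_state=42), params,
                                   n_iter=50, cv=3, random_state=42)
random_search.fit(X, y)
print(random_search.best_params_, random_search.best_score_)
```

The trade-off is coverage for speed: a random sample may miss the exact best cell of the grid, but it usually lands close when only a few parameters matter.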
#Decision Tree
ds_model = DecisionTreeClassifier(max_depth=3, max_leaf_nodes=3, random_state=42)

ds_model.fit(X_train_ov,y_train_ov)

prediction=ds_model.predict(X_test_pca)

acc_decision_tree = accuracy_score(prediction,y_test)
print('Validation accuracy of Decision Tree is', acc_decision_tree)
print ("\nClassification report :\n",(classification_report(y_test,prediction)))

#Confusion matrix
plt.figure(figsize=(13,10))
plt.subplot(221)
sns.heatmap(confusion_matrix(y_test,prediction),annot=True,cmap="Greens",fmt =
"d",linecolor="k",linewidths=3)
plt.title("CONFUSION MATRIX",fontsize=20)

#ROC curve and Area under the curve plotting

predicting_probabilites = ds_model.predict_proba(X_test_pca)[:,1]
fpr,tpr,thresholds = roc_curve(y_test,predicting_probabilites)
plt.subplot(222)
plt.plot(fpr,tpr,label = f"Area under the curve: {auc(fpr,tpr):.3f}",color = "r")
plt.plot([1,0],[1,0],linestyle = "dashed",color ="k")
plt.legend(loc = "best")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title("ROC - CURVE & AREA UNDER CURVE",fontsize=20)

# Hyperparameters search grid

rf_param_grid = {'bootstrap': [False, True],
'n_estimators': [60, 70, 80, 90, 100],
'max_features': [0.6, 0.65, 0.7, 0.75, 0.8],
'min_samples_leaf': [8, 10, 12, 14],
'min_samples_split': [3, 5, 7]
}

# Create the GridSearchCV object

rf_search = GridSearchCV(estimator=RandomForestClassifier(), param_grid=rf_param_grid,
cv=3, scoring='accuracy')
rf_search.fit(X_train_ov, y_train_ov)

best_accuracy = rf_search.best_score_ #to get best score

best_parameters = rf_search.best_params_ #to get best parameters
# select the best random forest
best_rf = rf_search.best_estimator_
best_rf

#Random forest
rf_model = RandomForestClassifier(bootstrap=False, max_features=0.6, min_samples_leaf=8,
min_samples_split=3, n_estimators=70)

rf_model.fit(X_train_ov,y_train_ov)

prediction=rf_model.predict(X_test_pca)

acc_random_forest = accuracy_score(prediction,y_test)
print('Validation accuracy of RandomForest Classifier is', acc_random_forest)
print ("\nClassification report :\n",(classification_report(y_test,prediction)))

#ROC curve and Area under the curve plotting

predicting_probabilites = rf_model.predict_proba(X_test_pca)[:,1]
fpr,tpr,thresholds = roc_curve(y_test,predicting_probabilites)
plt.subplot(222)
plt.plot(fpr,tpr,label = f"Area under the curve: {auc(fpr,tpr):.3f}",color = "r")
plt.plot([1,0],[1,0],linestyle = "dashed",color ="k")
plt.legend(loc = "best")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title("ROC - CURVE & AREA UNDER CURVE",fontsize=20)

xgb_grid_params = {'max_depth': [3, 4, 5, 6, 7, 8, 10, 12],

'min_child_weight': [1, 2, 4, 6, 8, 10, 12, 15],
'n_estimators': [40, 50, 60, 70, 80, 90, 100, 110, 120, 130],
'learning_rate': [0.001, 0.01, 0.05, 0.1, 0.2, 0.3]}
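Exhaustive grids grow multiplicatively, so it is worth counting candidates before launching a search. `ParameterGrid`, the helper that `GridSearchCV` iterates over, can do this; the sketch below copies the random-forest and XGBoost grids from this experiment:

```python
from sklearn.model_selection import ParameterGrid

rf_param_grid = {'bootstrap': [False, True],
                 'n_estimators': [60, 70, 80, 90, 100],
                 'max_features': [0.6, 0.65, 0.7, 0.75, 0.8],
                 'min_samples_leaf': [8, 10, 12, 14],
                 'min_samples_split': [3, 5, 7]}

xgb_grid_params = {'max_depth': [3, 4, 5, 6, 7, 8, 10, 12],
                   'min_child_weight': [1, 2, 4, 6, 8, 10, 12, 15],
                   'n_estimators': [40, 50, 60, 70, 80, 90, 100, 110, 120, 130],
                   'learning_rate': [0.001, 0.01, 0.05, 0.1, 0.2, 0.3]}

cv = 3
for name, grid in [('Random forest', rf_param_grid), ('XGBoost', xgb_grid_params)]:
    n_candidates = len(ParameterGrid(grid))  # product of the list lengths
    print(f"{name}: {n_candidates} candidates, {n_candidates * cv} fits with cv={cv}")
```

For these grids that is 600 candidates (1,800 fits) for the random forest and 3,840 candidates (11,520 fits) for XGBoost.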

# Create the GridSearchCV object

xgb_search = GridSearchCV(estimator=xgb.XGBClassifier(), param_grid=xgb_grid_params,
cv=3, scoring='accuracy')
xgb_search.fit(X_train_ov, y_train_ov)

best_accuracy = xgb_search.best_score_ #to get best score

best_parameters = xgb_search.best_params_ #to get best parameters
# select the best XGBoost model
best_xgb = xgb_search.best_estimator_
best_xgb

#XGBoost
xgb_model = xgb.XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1,
importance_type='gain', interaction_constraints='',
learning_rate=0.001, max_delta_step=0, max_depth=3,
min_child_weight=1, monotone_constraints='()',
n_estimators=40, n_jobs=0, num_parallel_tree=1, random_state=0,
reg_alpha=0, reg_lambda=1, scale_pos_weight=1, subsample=1,
tree_method='exact', validate_parameters=1, verbosity=None)

xgb_model.fit(X_train_ov,y_train_ov)

prediction=xgb_model.predict(X_test_pca)

acc_xgb = accuracy_score(prediction,y_test)
print('Validation accuracy of XG Boost is', acc_xgb)
print ("\nClassification report :\n",(classification_report(y_test,prediction)))

#ROC curve and Area under the curve plotting

predicting_probabilites = xgb_model.predict_proba(X_test_pca)[:,1]
fpr,tpr,thresholds = roc_curve(y_test,predicting_probabilites)
plt.subplot(222)
plt.plot(fpr,tpr,label = f"Area under the curve: {auc(fpr,tpr):.3f}",color = "r")
plt.plot([1,0],[1,0],linestyle = "dashed",color ="k")
plt.legend(loc = "best")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title("ROC - CURVE & AREA UNDER CURVE",fontsize=20)

from sklearn.naive_bayes import GaussianNB

# GaussianNB's only hyperparameter, var_smoothing, is left at its default, so no grid search is run here.
nb_model = GaussianNB()

nb_model.fit(X_train_ov,y_train_ov)

prediction=nb_model.predict(X_test_pca)

acc_nb = accuracy_score(prediction,y_test)
print('Validation accuracy of Naive Bayes is', acc_nb)
print ("\nClassification report :\n",(classification_report(y_test,prediction)))

#ROC curve and Area under the curve plotting

predicting_probabilites = nb_model.predict_proba(X_test_pca)[:,1]
fpr,tpr,thresholds = roc_curve(y_test,predicting_probabilites)
plt.subplot(222)
plt.plot(fpr,tpr,label = f"Area under the curve: {auc(fpr,tpr):.3f}",color = "r")
plt.plot([1,0],[1,0],linestyle = "dashed",color ="k")
plt.legend(loc = "best")
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title("ROC - CURVE & AREA UNDER CURVE",fontsize=20)
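For completeness, `GaussianNB` does expose one hyperparameter, `var_smoothing` (a fraction of the largest feature variance added to all variances for numerical stability; default 1e-9), so it can be grid-searched like the other models. A minimal sketch on synthetic data; the log-spaced range is an assumption, not a value from the lab:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the lab's training data
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Ten values from 1e-9 (the default) up to 1
nb_grid = {'var_smoothing': np.logspace(-9, 0, 10)}
nb_search = GridSearchCV(GaussianNB(), nb_grid, cv=3, scoring='accuracy')
nb_search.fit(X, y)
print(nb_search.best_params_, nb_search.best_score_)
```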

models = pd.DataFrame({
'Model': ['Support Vector Machines', 'KNN', 'Logistic Regression', 'Decision Tree',
'Random Forest', 'XG Boost', 'Naive Bayes'],

'Score': [acc_svc, acc_knn, acc_log, acc_decision_tree,

acc_random_forest, acc_xgb, acc_nb]})
models.sort_values(by='Score', ascending=False)

OUTPUT:
   Model                     Score
6  Naive Bayes               0.944444
2  Logistic Regression       0.833333
1  KNN                       0.722222
3  Decision Tree             0.722222
4  Random Forest             0.722222
5  XG Boost                  0.722222
0  Support Vector Machines   0.666667
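Every score in this table is a multiple of 1/18 (for example 17/18 ≈ 0.944 and 13/18 ≈ 0.722), which is consistent with a very small test set, so one extra correct prediction can reorder the ranking. Stratified k-fold cross-validation reports a mean and spread per model instead. A minimal sketch on synthetic data (the lab's features are not reproduced here); if oversampling is used, it would have to be applied inside each training fold rather than before splitting:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the lab's dataset
X, y = make_classification(n_samples=120, n_features=8, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
models = {
    'Logistic Regression': LogisticRegression(solver='liblinear'),
    'Decision Tree': DecisionTreeClassifier(max_depth=3, random_state=42),
    'Naive Bayes': GaussianNB(),
}

cv_scores = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')
    cv_scores[name] = scores
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```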
EXPERIMENT - 8

AIM: Select 2 datasets. Each dataset should contain examples from multiple classes. For training
purposes, assume that the class label of each example is unknown (if it is known, ignore it).
Implement the K-means algorithm and apply it to the selected data. Evaluate performance by
measuring the sum of the Euclidean distances of each example from its cluster center. Test the
performance of the algorithm as a function of the parameter k.
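The code for this experiment relies on scikit-learn's `KMeans`. For comparison, a minimal from-scratch sketch of Lloyd's algorithm in NumPy that also returns the metric named in the AIM, the sum of Euclidean distances from each example to its assigned center. The function name `kmeans_lloyd` is hypothetical, and it initialises with k random data points (an assumption; scikit-learn defaults to k-means++):

```python
import numpy as np

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Hypothetical helper: plain Lloyd's K-means.

    Returns (labels, centers, sum of Euclidean distances to assigned centers).
    """
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct data points chosen at random
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: distance from every point to every center, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its points (keep it if empty)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    total_distance = dists[np.arange(len(X)), labels].sum()
    return labels, centers, total_distance

# Usage on two small, well-separated groups of points
pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers, total = kmeans_lloyd(pts, k=2)
print(labels, centers, total)
```

Calling it for k = 1, 2, ..., 10 and plotting `total_distance` against k gives the performance-versus-k curve the AIM asks for.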

CODE:
Dataset used: Mall_Customers.csv

# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Mall_Customers.csv')
X = dataset.iloc[:, [3, 4]].values

# Using the elbow method to find the optimal number of clusters

from sklearn.cluster import KMeans
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)
plt.plot(range(1, 11), wcss)
plt.title('The Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.show()
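The elbow plot uses `inertia_`, the within-cluster sum of *squared* distances. The AIM's metric, the sum of plain Euclidean distances from each example to its cluster center, can be read off `transform`, which returns each point's distance to every center. A sketch using `make_blobs` as an assumed stand-in for the two Mall_Customers columns:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 2-D data standing in for (Annual Income, Spending Score)
X, _ = make_blobs(n_samples=200, centers=5, n_features=2, random_state=42)

ks = range(1, 11)
sum_dist, wcss = [], []
for k in ks:
    km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=42).fit(X)
    # transform() gives distances to all centers; keep the one for each point's own cluster
    d = km.transform(X)[np.arange(len(X)), km.labels_]
    sum_dist.append(d.sum())  # sum of Euclidean distances (the AIM's metric)
    wcss.append(km.inertia_)  # sum of squared distances (the elbow plot's metric)

for k, s, w in zip(ks, sum_dist, wcss):
    print(f"k={k}: sum of distances={s:.1f}, WCSS={w:.1f}")
```

Both curves usually flatten around the same k, but the unsquared sum penalises a few far-away points less heavily than WCSS does.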

# Fitting K-Means to the dataset

kmeans = KMeans(n_clusters = 5, init = 'k-means++', random_state = 42)
y_kmeans = kmeans.fit_predict(X)

# Visualising the clusters

plt.scatter(X[y_kmeans == 0, 0], X[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Cluster 1')
plt.scatter(X[y_kmeans == 1, 0], X[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Cluster 2')
plt.scatter(X[y_kmeans == 2, 0], X[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Cluster 3')
plt.scatter(X[y_kmeans == 3, 0], X[y_kmeans == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
plt.scatter(X[y_kmeans == 4, 0], X[y_kmeans == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300, c = 'yellow',
label = 'Centroids')
plt.title('Clusters of customers')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.show()

OUTPUT:
Dataset used: Iris.csv

#Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

#Importing the Iris dataset with pandas

dataset = pd.read_csv('../input/Iris.csv')
x = dataset.iloc[:, [1, 2, 3, 4]].values

#Finding the optimum number of clusters for k-means classification

from sklearn.cluster import KMeans
wcss = []

for i in range(1, 11):
    kmeans = KMeans(n_clusters = i, init = 'k-means++', max_iter = 300, n_init = 10,
                    random_state = 0)
    kmeans.fit(x)
    wcss.append(kmeans.inertia_)

#Plotting the results onto a line graph, allowing us to observe 'The elbow'
plt.plot(range(1, 11), wcss)
plt.title('The elbow method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS') #within cluster sum of squares
plt.show()

#Applying kmeans to the dataset / Creating the kmeans classifier

kmeans = KMeans(n_clusters = 3, init = 'k-means++', max_iter = 300, n_init = 10, random_state
= 0)
y_kmeans = kmeans.fit_predict(x)

#Visualising the clusters

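# K-means numbers clusters arbitrarily: these species names assume clusters 0, 1, 2 correspond to
# setosa, versicolour and virginica, which should be checked against the true labels
plt.scatter(x[y_kmeans == 0, 0], x[y_kmeans == 0, 1], s = 100, c = 'red', label = 'Iris-setosa')
plt.scatter(x[y_kmeans == 1, 0], x[y_kmeans == 1, 1], s = 100, c = 'blue', label = 'Iris-versicolour')
plt.scatter(x[y_kmeans == 2, 0], x[y_kmeans == 2, 1], s = 100, c = 'green', label = 'Iris-virginica')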

#Plotting the centroids of the clusters

plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:,1], s = 100, c = 'yellow', label
= 'Centroids')
plt.legend()
plt.show()
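K-means numbers its clusters arbitrarily, so cluster 0 is not guaranteed to be Iris-setosa as the legend names suggest. Because the Iris labels are known, they can be used after training (never during it) to check the mapping with a majority vote per cluster. A sketch assuming scikit-learn's bundled copy of Iris (`load_iris`) matches the four feature columns of `Iris.csv`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

iris = load_iris()
x, species = iris.data, iris.target  # species is used only after clustering

kmeans = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10, random_state=0)
y_kmeans = kmeans.fit_predict(x)

# For each cluster, the most common true species among its members
mapping = {c: np.bincount(species[y_kmeans == c]).argmax() for c in range(3)}
for c, s in mapping.items():
    print(f"Cluster {c} -> {iris.target_names[s]}")

# Fraction of flowers whose cluster's majority species equals their own species
agreement = np.mean([mapping[c] == s for c, s in zip(y_kmeans, species)])
print(f"Agreement with true species: {agreement:.3f}")
```

The resulting `mapping` can then be used to pick the correct legend label for each cluster instead of assuming the order.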
OUTPUT:
