
Data Warehousing and Data Mining Lab

Experiment 1- Matrix Operations

a) Create multi-dimensional arrays and find its shape and dimension


b) Create a matrix full of zeros and ones
c) Reshape and flatten data in the array
d) Append data vertically and horizontally
e) Apply indexing and slicing on array
f) Use statistical functions on array - Min, Max, Mean, Median and Standard Deviation

PROGRAMS:

a) Create multi-dimensional arrays and find its shape and dimension

import numpy as np

#creation of multi-dimensional array


a=np.array([[1,2,3],[2,3,4],[3,4,5]])

#shape
b=a.shape
print("shape:",a.shape)

#dimension
c=a.ndim
print("dimensions:",a.ndim)

b) Create a matrix full of zeros and ones

#matrix full of zeros
z=np.zeros((2,2))


print("zeros:",z)

#matrix full of ones


o=np.ones((2,2))
print("ones:",o)

c) Reshape and flatten data in the array

#matrix reshape
a=np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]])
b=a.reshape(4,2,2)
print("reshape:",b)

#matrix flatten
c=a.flatten()
print("flatten:",c)

d) Append data vertically and horizontally

#Appending data vertically


x=np.array([[10,20],[80,90]])


y=np.array([[30,40],[60,70]])
v=np.vstack((x,y))
print("vertically:",v)

#Appending data horizontally


h=np.hstack((x,y))
print("horizontally:",h)

e) Apply indexing and slicing on array

#indexing

a=np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]])
temp = a[[0, 1, 2, 3], [1, 1, 1, 1]]


print("indexing",temp)

#slicing
i=a[:4,::2]
print("slicing",i)

f) Use statistical functions on array - Min, Max, Mean, Median and Standard Deviation

#min for finding minimum of an array

a=np.array([[1,3,-1,4],[3,-2,1,4]])
b=a.min()
print("minimum:",b)

#max for finding maximum of an array


c=a.max()
print("maximum:",c)

#mean
a=np.array([1,2,3,4,5])
d=a.mean()
print("mean:",d)

#median
e=np.median(a)
print("median:",e)

#standard deviation
f=a.std()
print("standard deviation:",f)

OUTPUT:
a) shape: (3, 3) dimensions: 2

b)
zeros:
[[0. 0.]
[0. 0.]]

ones:
[[1. 1.]


[1. 1.]]

c) reshape: [[[1 2] [3 4]] [[2 3] [4 5]] [[3 4] [5 6]] [[4 5] [6 7]]]


flatten: [1 2 3 4 2 3 4 5 3 4 5 6 4 5 6 7]

d) vertically: [[10 20] [80 90]


[30 40]
[60 70]]
horizontally: [[10 20 30 40]
[80 90 60 70]]

e) indexing [2 3 4 5] slicing [[1 3]


[2 4]
[3 5]
[4 6]]

f) minimum: -2 maximum: 4

mean: 3 median: 3
standard deviation: 1.4142135623730951


Experiment 2: Understanding Data

Write a Python program to do the following operations:

Dataset: brain_size.csv
Library: Pandas, matplotlib
a) Loading data from CSV file
b) Compute the basic statistics of given data - shape, no. of columns, mean
c) Splitting a data frame on values of categorical variables
d) Visualize data using Scatter plot

Program:

a) Loading data from CSV file

#loading file csv

import pandas as pd
pd.read_csv("P:/python/newfile.csv")

b) Compute the basic statistics of given data - shape, no. of columns, mean

#shape
print('shape:',a.shape)

#no of columns
cols=len(a.axes[1])
print('no of columns:',cols)

#mean of data
m=a["marks"].mean()
print('mean:',m)

c) Splitting a data frame on values of categorical variables

#adding data

a['address']=["hyderabad,ts","Warangal,ts","Adilabad,ts","medak,ts"]

#splitting dataframe
a_split=a['address'].str.split(',',n=1)
a['district']=a_split.str.get(0)
a['state']=a_split.str.get(1)
del(a['address'])
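The same split can also be written in one step with pandas' expand option; a small equivalent sketch (run on the same dataframe, before the del(a['address']) statement above):

a[['district','state']] = a['address'].str.split(',', n=1, expand=True)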

d) Visualize data using Scatter plot

#visualize data using scatter plot


import matplotlib.pyplot as plt


a.plot.scatter(x='marks',y='rollno',c='Blue')

Output:
a)
  student  rollno  marks
0      a1     121     98
1      a2     122     82
2      a3     123     92
3      a4     124     78

b)
shape: (4, 3)
no of columns: 3
mean: 87.5

c)
before:
  student  rollno  marks       address
0      a1     121     98  hyderabad,ts
1      a2     122     82   Warangal,ts
2      a3     123     92   Adilabad,ts
3      a4     124     78      medak,ts

After:
  student  rollno  marks   district state
0      a1     121     98  hyderabad    ts
1      a2     122     82   Warangal    ts
2      a3     123     92   Adilabad    ts
3      a4     124     78      medak    ts

d)


Experiment 3: Correlation Matrix

Write a python program to load the dataset and understand the input data

Dataset: Pima Indians Diabetes Dataset
https://www.kaggle.com/uciml/pima-indians-diabetes-database#diabetes.csv

Library: Scipy

a) Load data, describe the given data and identify missing, outlier data items

b) Find correlation among all attributes

c) Visualize correlation matrix

Program:

a)Load data

import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

%matplotlib inline

#Reading the dataset in a dataframe using Pandas

df = pd.read_csv("C:/Users/admin/Documents/diabetes.csv")

#describe the given data

print(df.describe())

#Display first 10 rows of data

print(df.head(10))

#outlier data items

import numpy as np

def outliers_z_score(ys):

threshold = 3

mean_y = np.mean(ys)

stdev_y = np.std(ys)


z_scores = [(y - mean_y) / stdev_y for y in ys]

return np.where(np.abs(z_scores) > threshold)
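The function above only returns index positions and is never called in the listing; a minimal usage sketch for task (a), where the column name 'Insulin' is just an assumed example:

# missing values per column
print(df.isnull().sum())

# index positions of z-score outliers in one example column (assumed column: 'Insulin')
print(outliers_z_score(df['Insulin']))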

b) Find correlation among all attributes

# importing pandas as pd

import pandas as pd

# Making data frame from the csv file

df = pd.read_csv("nba.csv")

# Printing the first 10 rows of the data frame for visualization

df[:10]

# To find the correlation among columns
# using pearson method

df.corr(method ='pearson')

# using 'kendall' method

df.corr(method ='kendall')

c) Visualize correlation matrix

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("C:/Users/admin/Documents/diabetes.csv")
corr = df.corr()

# visualize the correlation matrix as a heatmap
plt.matshow(corr)
plt.colorbar()
plt.show()

Output:


Experiment 4 - Data Preprocessing – Handling Missing Values

Write a python program to impute missing values with various techniques on given
dataset.

a) Remove rows/ attributes


b) Replace with mean or mode
c) Write a python program to perform transformation of data using Discretization (Binning)
on given dataset.

https://www.kaggle.com/uciml/pima-indians-diabetes-database#diabetes.csv

Library: Scipy

Program:
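The snippets below assume a DataFrame named df (and, for the Gender example, data) that already contains NaN values. A minimal setup sketch under that assumption, reusing the diabetes.csv path from Experiment 3; treating zeros in a few clinical columns as missing is a common convention for this dataset, and the column list is an assumption:

import numpy as np
import pandas as pd

# load the dataset (path reused from Experiment 3)
df = pd.read_csv("C:/Users/admin/Documents/diabetes.csv")

# assumption: zeros in these columns stand for missing measurements, so mark them as NaN
cols = ['Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI']
df[cols] = df[cols].replace(0, np.nan)
print(df.isnull().sum())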

# filling missing value using fillna()

df.fillna(0)

# filling a missing value with previous value

df.fillna(method ='pad')

#Filling null value with the next ones

df.fillna(method ='bfill')

# filling a null values using fillna()

data["Gender"].fillna("No Gender", inplace = True)

# will replace Nan value in dataframe with value -99

data.replace(to_replace = np.nan, value = -99)

# Remove rows/ attributes

# using dropna() function to remove rows having one Nan

df.dropna()

# using dropna() function to remove rows with all Nan

df.dropna(how = 'all')

# using dropna() function to remove column having one Nan


df.dropna(axis = 1)

# Replace with mean or mode
# numeric columns: replace NaN with the column mean
df.fillna(df.mean(numeric_only=True))

# categorical column: replace NaN with the column mode
data["Gender"].fillna(data["Gender"].mode()[0], inplace = True)

# Perform transformation of data using Discretization (Binning)

import numpy as np

import math

from sklearn.datasets import load_iris

from sklearn import datasets, linear_model, metrics

# load iris data set
dataset = load_iris()
a = dataset.data
b = np.zeros(150)

# take the second column (index 1) of the data set
for i in range(150):
    b[i] = a[i, 1]

b = np.sort(b)  # sort the array

# create bins
bin1 = np.zeros((30, 5))
bin2 = np.zeros((30, 5))
bin3 = np.zeros((30, 5))

# Bin mean
for i in range(0, 150, 5):
    k = int(i / 5)
    mean = (b[i] + b[i+1] + b[i+2] + b[i+3] + b[i+4]) / 5
    for j in range(5):
        bin1[k, j] = mean
print("Bin Mean: \n", bin1)

# Bin boundaries
for i in range(0, 150, 5):
    k = int(i / 5)
    for j in range(5):
        if (b[i+j] - b[i]) < (b[i+4] - b[i+j]):
            bin2[k, j] = b[i]
        else:
            bin2[k, j] = b[i+4]
print("Bin Boundaries: \n", bin2)

# Bin median
for i in range(0, 150, 5):
    k = int(i / 5)
    for j in range(5):
        bin3[k, j] = b[i+2]
print("Bin Median: \n", bin3)

Output:
Bin Mean:
[[2.18 2.18 2.18 2.18 2.18]
 [2.34 2.34 2.34 2.34 2.34]
 [2.48 2.48 2.48 2.48 2.48]
 [2.52 2.52 2.52 2.52 2.52]
 [2.62 2.62 2.62 2.62 2.62]
 ...

Bin Boundaries:
[[2.  2.3 2.3 2.3 2.3]
 [2.3 2.3 2.3 2.4 2.4]
 [2.4 2.5 2.5 2.5 2.5]
 [2.5 2.5 2.5 2.5 2.6]
 [2.6 2.6 2.6 2.6 2.7]
 ...

Bin Median:
[[2.2 2.2 2.2 2.2 2.2]
 [2.3 2.3 2.3 2.3 2.3]
 [2.5 2.5 2.5 2.5 2.5]
 [2.5 2.5 2.5 2.5 2.5]
 [2.6 2.6 2.6 2.6 2.6]
 ...
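For comparison, a short pandas sketch of the same idea (equal-frequency bins followed by smoothing with the bin mean). pd.qcut is used here as an assumed alternative to the manual 5-values-per-bin loop; because sepal-width values repeat, duplicates='drop' may merge some bins, so the result is close to, but not identical to, the output above.

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# sorted second column of the iris data, as in the program above
b = pd.Series(np.sort(load_iris().data[:, 1]))

# roughly 30 equal-frequency bins; ties may reduce the actual bin count
bins = pd.qcut(b, q=30, labels=False, duplicates='drop')

# smoothing by bin means
smoothed = b.groupby(bins).transform('mean')
print(smoothed.head(10))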


Experiment 5 - Association Rule Mining – Apriori

Write a python program to generate frequent itemsets using Apriori Algorithm and also
generate association rules for any market basket data.

Program:

pip install mlxtend


import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

data = [['Bread', 'Milk', 'Eggs'],
        ['Bread', 'Diapers', 'Beer', 'Eggs'],
        ['Milk', 'Diapers', 'Beer', 'Cola'],
        ['Bread', 'Milk', 'Diapers', 'Beer', 'Cola', 'Eggs']]

te = TransactionEncoder()
te_ary = te.fit(data).transform(data)
df = pd.DataFrame(te_ary, columns=te.columns_)
df

frequent_itemsets = apriori(df, min_support=0.75, use_colnames=True)
print(frequent_itemsets)

rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.7)
print(rules)

selected_columns = ['antecedents', 'consequents', 'antecedent support', 'consequent support',
                    'support', 'confidence']
print(rules[selected_columns])
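To see where the numbers in the output come from: with four transactions and min_support=0.75, an itemset must occur in at least 3 of the 4 baskets. {Beer, Diapers} occurs in transactions 2, 3 and 4, so support = 3/4 = 0.75, and confidence(Beer -> Diapers) = support(Beer, Diapers) / support(Beer) = 0.75 / 0.75 = 1.0, which is exactly what the rules table reports.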
Apriori:
pip install apyori

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from apyori import apriori

store_data = pd.read_csv("D:/DM/store_data.csv", header=None)
display(store_data.head())
store_data.shape

records = []
for i in range(1, 7501):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])
print(type(records))

association_rules = apriori(records, min_support=0.0045,
                            min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)
print("There are {} Relations derived.".format(len(association_results)))

for i in range(0, len(association_results)):
    print(association_results[i][0])

for item in association_results:
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: " + str(item[1]))
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("==========================================")

Output:

   support         itemsets
0     0.75           (Beer)
1     0.75        (Diapers)
2     0.75           (Eggs)
3     0.75  (Beer, Diapers)

  antecedents consequents  antecedent support  consequent support  support  confidence
0      (Beer)   (Diapers)                0.75                0.75     0.75         1.0
1   (Diapers)      (Beer)                0.75                0.75     0.75         1.0

(7501, 20)
<class 'list'>
There are 48 Relations derived.
frozenset({'chicken', 'light cream'})
frozenset({'escalope', 'mushroom cream sauce'})
frozenset({'escalope', 'pasta'})
frozenset({'ground beef', 'herb & pepper'})
frozenset({'tomato sauce', 'ground beef'})
frozenset({'whole wheat pasta', 'olive oil'})
frozenset({'shrimp', 'pasta'})
frozenset({'nan', 'chicken', 'light cream'})
frozenset({'shrimp', 'frozen vegetables', 'chocolate'})
frozenset({'cooking oil', 'ground beef', 'spaghetti'})
Rule: chicken -> light cream
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
==========================================
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845


Experiment 6 – Logistic Regression

Write a python program using Logistic Regression on any dataset.

Program:

import numpy as np

import pandas as pd

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LogisticRegression

from sklearn.metrics import confusion_matrix, accuracy_score

# Sample dataset

# We'll create a dataset about students: Hours studied vs Passed exam (Yes=1, No=0)

# Features (Hours Studied)

X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])

# Labels (0 = Fail, 1 = Pass)

y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

# Split dataset into Train and Test

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Create Logistic Regression model

model = LogisticRegression()

# Train the model

model.fit(X_train, y_train)

# Predict on test data

y_pred = model.predict(X_test)

# Evaluation

print("Confusion Matrix:")

print(confusion_matrix(y_test, y_pred))


print("\nAccuracy Score:", accuracy_score(y_test, y_pred))

# Predicting a custom value (example: will a student studying 5 hours pass?)

custom_prediction = model.predict([[5]])

print("\nPrediction for student studying 5 hours:", "Pass" if custom_prediction[0] == 1 else

"Fail")

Output:

Confusion Matrix:

[[1 0]

[0 2]]

Accuracy Score: 1.0

Prediction for student studying 5 hours: Pass
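Reading the confusion matrix: rows are actual classes and columns are predicted classes, so with test_size=0.3 the 3 test students consist of 1 fail predicted as fail and 2 passes predicted as pass, giving accuracy = (1 + 2) / 3 = 1.0.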


Experiment 7: Classification – KNN

Write a python program using K-Nearest Neighbors (KNN) algorithm on any dataset.

Program:

# Import libraries

import numpy as np

import pandas as pd

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import confusion_matrix, accuracy_score

# Load the Iris dataset

iris = load_iris()

# Features and target

X = pd.DataFrame(iris.data, columns=iris.feature_names)

y = iris.target

# Split into training and testing sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create KNN model

knn = KNeighborsClassifier(n_neighbors=3) # Using k=3 neighbors

# Train the model

knn.fit(X_train, y_train)

# Predict on test set

y_pred = knn.predict(X_test)

# Evaluate

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))


print("\nAccuracy Score:", accuracy_score(y_test, y_pred))

Output:

Confusion Matrix:

[[16 0 0]

[ 0 12 2]

[ 0 0 15]]

Accuracy Score: 0.9555555555555556
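Here the single off-diagonal entry (2) marks the only misclassified test samples, so accuracy = (16 + 12 + 15) / 45 ≈ 0.956. As a small extension (not part of the recorded output), a hedged sketch that classifies one new flower with the fitted model; the measurement values are illustrative only:

# hedged example: predict the species of a single new sample (illustrative values)
sample = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]], columns=iris.feature_names)
print("Predicted species:", iris.target_names[knn.predict(sample)[0]])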


Experiment 8 - Classification - Decision Trees

Write a python program using Decision Tree algorithm on any dataset.

Program:

#import libraries
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

#Load iris dataset
iris = load_iris()
X = iris.data
y = iris.target

#Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=None)

#Create an instance of the DecisionTreeClassifier class
tree_clf = DecisionTreeClassifier(max_depth=3)

#Fit the model on the training data
tree_clf.fit(X_train, y_train)

#Predict on the testing data
y_pred = tree_clf.predict(X_test)

#Calculate accuracy of the model
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy : ', accuracy)

#Visualize the decision tree using the plot_tree function
plt.figure(figsize=(15, 10))
plot_tree(tree_clf, filled=True, feature_names=iris.feature_names, class_names=iris.target_names)
plt.show()

Output:
Accuracy: 0.9555555555555556


Experiment 9 - Classification – Bayesian Network

Write a python program using Naïve Bayes Classification algorithm on any dataset.

Program:

from sklearn import metrics
from sklearn.naive_bayes import GaussianNB
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

dataset = pd.read_csv('D:/DM/iris.csv')
print(dataset)

X = dataset.iloc[:, :4].values
Y = dataset['Species'].values
print(Y)
print(X)

#split the dataset into training and test datasets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)

#create an object for the GaussianNB Bayes classifier
classifier = GaussianNB()
classifier.fit(X_train, Y_train)

#predict the values
print(X_test[0])
y_pred = classifier.predict(X_test)
print(y_pred)

accuracy = accuracy_score(Y_test, y_pred)
print("Accuracy: ", accuracy)

#build confusion matrix
cm = confusion_matrix(Y_test, y_pred)
print("Accuracy: ", accuracy_score(Y_test, y_pred))
cm

df = pd.DataFrame({'RealValues': Y_test, 'PredictedValues': y_pred})
print(df)


Naive Bayes:

#predict the class label for a new observation
from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import GaussianNB

from sklearn.metrics import accuracy_score

from sklearn.preprocessing import LabelEncoder

import numpy as np

#import iris dataset

iris=load_iris()

X=iris.data

Y=iris.target

le=LabelEncoder()

Y=le.fit_transform(Y)

#Split the dataset into training and testing sets

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.3,random_state=42)

#Train a Naive bayes model on the training data

nb_model = GaussianNB()
nb_model.fit(X_train, Y_train)

#Make predictions on the test data
y_pred = nb_model.predict(X_test)

y_pred = le.inverse_transform(y_pred)

accuracy=accuracy_score(Y_test,y_pred)

print("Accuracy: ",accuracy)

new_observation = np.array([[5.8, 3.0, 4.5, 1.5]])

predicted_class = nb_model.predict(new_observation)

predicted_class=le.inverse_transform(predicted_class)


print("Predicted class: ", predicted_class)

Output:

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  \
0      1            5.1           3.5            1.4           0.2
1      2            4.9           3.0            1.4           0.2
2      3            4.7           3.2            1.3           0.2
3      4            4.6           3.1            1.5           0.2
4      5            5.0           3.6            1.4           0.2
..   ...            ...           ...            ...           ...
145  146            6.7           3.0            5.2           2.3
146  147            6.3           2.5            5.0           1.9
147  148            6.5           3.0            5.2           2.0
148  149            6.2           3.4            5.4           2.3
149  150            5.9           3.0            5.1           1.8

Species

0 Iris-setosa

1 Iris-setosa

2 Iris-setosa

3 Iris-setosa

4 Iris-setosa

.. ...


145 Iris-virginica

146 Iris-virginica

147 Iris-virginica

148 Iris-virginica

149 Iris-virginica

[150 rows x 6 columns]

['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'

[[ 1. 5.1 3.5 1.4]

[ 2. 4.9 3. 1.4]

[ 3. 4.7 3.2 1.3]

[ 4. 4.6 3.1 1.5]

[ 5. 5. 3.6 1.4]

[ 6. 5.4 3.9 1.7]

[ 7. 4.6 3.4 1.4]

[ 8. 5. 3.4 1.5]

[ 9. 4.4 2.9 1.4]

[ 10. 4.9 3.1 1.5]

[11. 5.4 3.7 1.5]

['Iris-setosa' 'Iris-virginica' 'Iris-versicolor' 'Iris-setosa'
 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica' 'Iris-setosa'
 'Iris-virginica' 'Iris-setosa' 'Iris-virginica' 'Iris-virginica'
 'Iris-setosa' 'Iris-versicolor' 'Iris-virginica' 'Iris-virginica'


Experiment 10: Classification – Support Vector Machines (SVM)

Write a python program using Support Vector Machines (SVM) on any dataset.

Program:

import numpy as np

import pandas as pd

from sklearn import datasets

from sklearn.model_selection import train_test_split

from sklearn.svm import SVC # SVC = Support Vector Classifier

from sklearn.metrics import confusion_matrix, accuracy_score

# Load the Iris dataset

iris = datasets.load_iris()

# Features and target

X = iris.data

y = iris.target

# Split into training and testing data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create SVM model

svm_model = SVC(kernel='linear') # 'linear' kernel

# Train the model

svm_model.fit(X_train, y_train)

# Predict on test set

y_pred = svm_model.predict(X_test)

# Evaluation

print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))

print("\nAccuracy Score:", accuracy_score(y_test, y_pred))


# Predicting a custom input (optional)

sample = [[5.1, 3.5, 1.4, 0.2]] # Example input

predicted_class = svm_model.predict(sample)

print("\nPrediction for sample input:", iris.target_names[predicted_class[0]])

Output:

Confusion Matrix:

[[16 0 0]

[ 0 14 1]

[ 0 0 14]]

Accuracy Score: 0.9777777777777777

Prediction for sample input: setosa
