PP DWDM
PROGRAMS:
import numpy as np
# sample 3x3 array (definition missing in the original; shape matches the output below)
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# shape
b = a.shape
print("shape:", a.shape)
# dimension
c = a.ndim
print("dimensions:", a.ndim)
#matrix reshape
a=np.array([[1,2,3,4],[2,3,4,5],[3,4,5,6],[4,5,6,7]])
b = a.reshape(4, 2, 2)
print("reshape:", b)
#matrix flatten
c = a.flatten()
print("flatten:", c)
# vertical stacking
x = np.array([[10, 20], [40, 50]])  # first array (definition missing in the original; values assumed)
y = np.array([[30, 40], [60, 70]])
v = np.vstack((x, y))
print("vertically:",v)
#slicing
i=a[:4,::2]
print("slicing:", i)
f) Use statistical functions on array - Min, Max, Mean, Median and Standard Deviation
# min for finding minimum of an array
a=np.array([[1,3,-1,4],[3,-2,1,4]])
b = a.min()
print("minimum:", b)
# max for finding maximum of an array (listing missing above; the output shows it)
c = a.max()
print("maximum:", c)
# mean
a = np.array([1, 2, 3, 4, 5])
d=a.mean()
print("mean:", d)
# median (listing missing above; the output shows it)
e = np.median(a)
print("median:", e)
#standard deviation
f=a.std()
print("standard deviation:", f)
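The zeros/ones results in the output below belong to a part of this program whose listing is missing from this copy; a minimal sketch of the likely code, with the (2, 2) shape inferred from the output:
# zeros and ones (listing missing above; shape (2, 2) inferred from the output)
z = np.zeros((2, 2))
print("zeros:", z)
o = np.ones((2, 2))
print("ones:", o)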
OUTPUT:
a)
shape: (3, 3)
dimensions: 2
zeros:
[[0. 0.]
 [0. 0.]]
ones:
[[1. 1.]
 [1. 1.]]
e)
minimum: -2
maximum: 4
f)
mean: 3
median: 3
standard deviation: 1.4142135623730951
Dataset: brain_size.csv
Library: Pandas, matplotlib
a) Loading data from CSV file
b) Compute the basic statistics of given data - shape, no. of columns, mean
c) Splitting a data frame on values of categorical variables
d) Visualize data using Scatter plot
Program:
a) Loading data from CSV file
import pandas as pd
a = pd.read_csv("P:/python/newfile.csv")
print(a)
b) Compute the basic statistics of given data - shape, no. of columns, mean
# shape
print('shape:', a.shape)
# no of columns
cols = len(a.axes[1])
print('no of columns:', cols)
# mean of data (the marks column; the output shows its mean, 87.5)
m = a["marks"].mean()
print('mean:', m)
c) Splitting a data frame on values of categorical variables
a['address'] = ["hyderabad,ts", "Warangal,ts", "Adilabad,ts", "medak,ts"]
print('before:')
print(a)
# splitting dataframe
a_split = a['address'].str.split(',', n=1)
a['district'] = a_split.str.get(0)
a['state'] = a_split.str.get(1)
del a['address']
print('After:')
print(a)
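An equivalent way to split the address column, not in the original listing, is to let split expand into columns directly; a small sketch, to be run before the del above:
# alternative: expand=True returns both parts as a DataFrame in one step
parts = a['address'].str.split(',', n=1, expand=True)
a[['district', 'state']] = parts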
d) Visualize data using Scatter plot
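The code for part d) is missing in this copy; a minimal matplotlib sketch, assuming the rollno and marks columns shown in the output:
# visualize data using scatter plot
import matplotlib.pyplot as plt
plt.scatter(a['rollno'], a['marks'])
plt.xlabel('rollno')
plt.ylabel('marks')
plt.show()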
Output:
a)
  student  rollno  marks
0      a1     121     98
1      a2     122     82
2      a3     123     92
3      a4     124     78
b)
shape: (4, 3)
no of columns: 3
mean: 87.5
c)
before:
  student  rollno  marks       address
0      a1     121     98  hyderabad,ts
1      a2     122     82   Warangal,ts
2      a3     123     92   Adilabad,ts
3      a4     124     78      medak,ts
After:
  student  rollno  marks   district state
0      a1     121     98  hyderabad    ts
1      a2     122     82   Warangal    ts
2      a3     123     92   Adilabad    ts
3      a4     124     78      medak    ts
d)
[scatter plot of marks against rollno]
Write a python program to load the dataset and understand the input data
Library: Scipy
a) Load data, describe the given data and identify missing, outlier data items
Program:
a) Load data
import pandas as pd
import numpy as np
%matplotlib inline
df = pd.read_csv("C:/Users/admin/Documents/diabetes.csv")
print(df.describe())
print(df.head(10))
# identify outliers using the z-score method
def outliers_z_score(ys):
    threshold = 3
    mean_y = np.mean(ys)
    stdev_y = np.std(ys)
    z_scores = [(y - mean_y) / stdev_y for y in ys]
    return np.where(np.abs(z_scores) > threshold)
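A usage sketch for the function above; the Insulin column is an assumption, any numeric column of the diabetes data works the same way:
# indices of rows whose z-score exceeds the threshold
print("outliers:", outliers_z_score(df['Insulin']))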
import pandas as pd
df = pd.read_csv("nba.csv")
df[:10]
# pairwise correlation of the numeric columns
# (numeric_only avoids errors on newer pandas, since nba.csv has string columns)
df.corr(method='pearson', numeric_only=True)
df.corr(method='kendall', numeric_only=True)
Output:
Write a python program to impute missing values with various techniques on given
dataset.
https://www.kaggle.com/uciml/pima-indians-diabetes-database#diabetes.csv
Library: Scipy
Program:
import pandas as pd
df = pd.read_csv("C:/Users/admin/Documents/diabetes.csv")
# replace missing values with a constant
df.fillna(0)
# forward fill: propagate the last valid value
df.fillna(method='pad')
# backward fill: use the next valid value
df.fillna(method='bfill')
# drop rows containing any missing value
df.dropna()
# drop rows where all values are missing
df.dropna(how='all')
# drop columns containing missing values
df.dropna(axis=1)
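Two more common techniques not in the listing above: the modern ffill/bfill spellings (fillna(method=...) is deprecated in recent pandas) and mean imputation; a small sketch:
# modern forward/backward fill
df.ffill()
df.bfill()
# mean imputation: fill each numeric column with its own mean
df.fillna(df.mean(numeric_only=True))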
import numpy as np
from sklearn.datasets import load_iris  # data source assumed; the original load is missing
a = load_iris().data
b = np.zeros(150)
# take the column at index 1 of the 4-column data set
for i in range(150):
    b[i] = a[i, 1]
b = np.sort(b)  # equi-depth binning needs sorted values
bin_mean = np.zeros((30, 5))
bin_bound = np.zeros((30, 5))
bin_med = np.zeros((30, 5))
# partition into 30 equi-depth bins of 5 values each
for i in range(0, 150, 5):
    k = i // 5
    # Bin mean: replace every value in the bin by the bin average
    bin_mean[k] = b[i:i + 5].mean()
    # Bin boundaries: replace each value by the nearer of the bin's min/max
    for j in range(5):
        if b[i + j] - b[i] < b[i + 4] - b[i + j]:
            bin_bound[k, j] = b[i]
        else:
            bin_bound[k, j] = b[i + 4]
    # Bin median: middle of the five sorted values
    bin_med[k] = b[i + 2]
print("Bin Mean:\n", bin_mean)
print("Bin Boundaries:\n", bin_bound)
print("Bin Median:\n", bin_med)
Output:
Bin Mean:
[[2.18 2.18 2.18 2.18 2.18]
 [2.34 2.34 2.34 2.34 2.34]
 [2.48 2.48 2.48 2.48 2.48]
 [2.52 2.52 2.52 2.52 2.52]
 [2.62 2.62 2.62 2.62 2.62]
 ...]
Bin Boundaries:
[[2.  2.3 2.3 2.3 2.3]
 [2.3 2.3 2.3 2.4 2.4]
 [2.4 2.5 2.5 2.5 2.5]
 [2.5 2.5 2.5 2.5 2.6]
 [2.6 2.6 2.6 2.6 2.7]
 ...]
Bin Median:
[[2.2 2.2 2.2 2.2 2.2]
 [2.3 2.3 2.3 2.3 2.3]
 [2.5 2.5 2.5 2.5 2.5]
 [2.5 2.5 2.5 2.5 2.5]
 [2.6 2.6 2.6 2.6 2.6]
 ...]
Write a python program to generate frequent itemsets using Apriori Algorithm and also
generate association rules for any market basket data.
Program:
import pandas as pd
from apyori import apriori
# load the market basket data (file name assumed; the original load is missing)
store_data = pd.read_csv("store_data.csv", header=None)
print(store_data.shape)
records = []
for i in range(0, 7501):
    records.append([str(store_data.values[i, j]) for j in range(0, 20)])
print(type(records))
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3, min_length=2)
association_results = list(association_rules)
print("There are {} relations derived.".format(len(association_results)))
for i in range(0, len(association_results)):
    print(association_results[i][0])
for item in association_results:
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: " + str(item[1]))
    print("Confidence: " + str(item[2][0][2]))
    print("Lift: " + str(item[2][0][3]))
    print("==========================================")
Output:
   support         itemsets
0     0.75           (Beer)
1     0.75        (Diapers)
2     0.75           (Eggs)
3     0.75  (Beer, Diapers)

  antecedents consequents  antecedent support  consequent support  support  \
0      (Beer)   (Diapers)                0.75                0.75     0.75
1    (Diapers)     (Beer)                0.75                0.75     0.75

   confidence  lift  leverage  conviction  zhangs_metric
0         1.0   ...
1         1.0   ...
(7501, 20)
<class 'list'>
There are 48 relations derived.
frozenset({'chicken', 'light cream'})
frozenset({'escalope', 'mushroom cream sauce'})
frozenset({'escalope', 'pasta'})
frozenset({'ground beef', 'herb & pepper'})
frozenset({'tomato sauce', 'ground beef'})
frozenset({'whole wheat pasta', 'olive oil'})
frozenset({'shrimp', 'pasta'})
frozenset({'nan', 'chicken', 'light cream'})
frozenset({'shrimp', 'frozen vegetables', 'chocolate'})
frozenset({'cooking oil', 'ground beef', 'spaghetti'})
Rule: chicken -> light cream
Support: 0.004533333333333334
Confidence: 0.2905982905982906
Lift: 4.843304843304844
==========================================
Rule: escalope -> mushroom cream sauce
Support: 0.005733333333333333
Confidence: 0.30069930069930073
Lift: 3.7903273197390845
Write a python program using Logistic Regression on any dataset.
Program:
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
# Sample dataset
# We'll create a dataset about students: Hours studied vs Passed exam (Yes=1, No=0)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
# train/test split (parameters assumed; the original line is missing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
# Evaluation
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
# predict for a custom input of 5 study hours
custom_prediction = model.predict([[5]])
print("Pass" if custom_prediction[0] == 1 else "Fail")
Output:
Confusion Matrix:
[[1 0]
[0 2]]
Write a python program using K-Nearest Neighbors (KNN) algorithm on any dataset.
Program:
# Import libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target
# split and fit (test_size and n_neighbors assumed; the original lines are missing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
# Evaluate
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
Output:
Confusion Matrix:
[[16 0 0]
[ 0 12 2]
[ 0 0 15]]
Write a python program using Decision Tree algorithm on any dataset.
Program:
# import libraries
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=None)
# Create and fit the decision tree
tree_clf = DecisionTreeClassifier(max_depth=3)
tree_clf.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, tree_clf.predict(X_test)))
plot_tree(tree_clf, filled=True, feature_names=iris.feature_names, class_names=iris.target_names)
plt.show()
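The fitted tree can also be printed as plain-text rules (not in the original listing); a small sketch using sklearn's export_text:
from sklearn.tree import export_text
# text form of the same decision rules plotted above
print(export_text(tree_clf, feature_names=iris.feature_names))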
Output:
Accuracy: 0.9555555555555556
Write a python program using Naïve Bayes Classification algorithm on any dataset.
Program:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix
# load the Iris CSV (path assumed; the columns match the printed output below)
dataset = pd.read_csv("Iris.csv")
print(dataset)
# first four columns (Id included, as the printed X shows) as features
X = dataset.iloc[:, :4].values
Y = dataset['Species'].values
print(X)
# split parameters assumed; the original line is missing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
classifier = GaussianNB()
classifier.fit(X_train, Y_train)
y_pred = classifier.predict(X_test)
print(y_pred)
print("Accuracy: ", accuracy_score(Y_test, y_pred))
cm = confusion_matrix(Y_test, y_pred)
print(cm)
df = pd.DataFrame({'RealValues': Y_test, 'PredictedValues': y_pred})
print(df)
NaiveBayes:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score
iris = load_iris()
X = iris.data
Y = iris.target
le = LabelEncoder()
Y = le.fit_transform(Y)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
nb_model = GaussianNB()
nb_model.fit(X_train, Y_train)
y_pred = nb_model.predict(X_test)
y_pred = le.inverse_transform(y_pred)
accuracy = accuracy_score(Y_test, y_pred)
print("Accuracy: ", accuracy)
# classify a new observation (sample values assumed; the original line is missing)
new_observation = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = nb_model.predict(new_observation)
predicted_class = le.inverse_transform(predicted_class)
print("Predicted class:", predicted_class)
Output:
      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  \
..   ...            ...           ...            ...           ...

            Species
0       Iris-setosa
1       Iris-setosa
2       Iris-setosa
3       Iris-setosa
4       Iris-setosa
..              ...
145  Iris-virginica
146  Iris-virginica
147  Iris-virginica
148  Iris-virginica
149  Iris-virginica
[150 rows x 6 columns]
['Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' 'Iris-setosa'
 'Iris-setosa' 'Iris-setosa' 'Iris-setosa' ...]
[[ 2.   4.9  3.   1.4]
 [ 5.   5.   3.6  1.4]
 [ 8.   5.   3.4  1.5]
 ...]
['Iris-setosa' 'Iris-virginica' 'Iris-versicolor' 'Iris-setosa' 'Iris-versicolor'
 'Iris-virginica' 'Iris-virginica' 'Iris-setosa' 'Iris-virginica' 'Iris-setosa'
 'Iris-virginica' 'Iris-virginica' 'Iris-setosa' 'Iris-versicolor' 'Iris-virginica'
 'Iris-virginica']
Write a python program using Support Vector Machines (SVM) on any dataset.
Program:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix
iris = datasets.load_iris()
X = iris.data
y = iris.target
# split and kernel assumed; the original lines are missing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
svm_model = SVC(kernel='linear')
svm_model.fit(X_train, y_train)
y_pred = svm_model.predict(X_test)
# Evaluation
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))
# classify a new sample (values assumed; the original line is missing)
sample = [[5.1, 3.5, 1.4, 0.2]]
predicted_class = svm_model.predict(sample)
print("Predicted class:", iris.target_names[predicted_class][0])
Output:
Confusion Matrix:
[[16 0 0]
[ 0 14 1]
[ 0 0 14]]