

K.V.G COLLEGE OF ENGINEERING, SULLIA, D.K. – 574 327


(AFFILIATED TO VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING (AI&ML)

PRACTICAL COMPONENT OF

MACHINE LEARNING
COURSE CODE: BCSL606

VI SEMESTER

Course Material Prepared by:
Course Material Approved by:

LIST OF EXPERIMENTS
Sl. No. | Experiment | Page No. | Marks Awarded | Staff Signature

1. Develop a program to create histograms for all numerical features and analyze the distribution of each feature. Generate box plots for all numerical features and identify any outliers. Use the California Housing dataset.

2. Develop a program to compute the correlation matrix to understand the relationships between pairs of features. Visualize the correlation matrix using a heatmap to know which variables have strong positive/negative correlations. Create a pair plot to visualize pairwise relationships between features. Use the California Housing dataset.

3. Develop a program to implement Principal Component Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4 features to 2.

4. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Find-S algorithm to output a description of the set of all hypotheses consistent with the training examples.

5. Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following on the generated dataset:
   1. Label the first 50 points {x1, ..., x50} as follows: if (xi ≤ 0.5) then xi ∊ Class1, else xi ∊ Class2.
   2. Classify the remaining points, x51, ..., x100, using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30.

6. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

7. Develop a program to demonstrate the working of Linear Regression and Polynomial Regression. Use the Boston Housing Dataset for Linear Regression and the Auto MPG Dataset (for vehicle fuel efficiency prediction) for Polynomial Regression.

8. Develop a program to demonstrate the working of the decision tree algorithm. Use the Breast Cancer Data set for building the decision tree and apply this knowledge to classify a new sample.

9. Develop a program to implement the Naive Bayesian classifier considering the Olivetti Face Data set for training. Compute the accuracy of the classifier, considering a few test data sets.

10. Develop a program to implement k-means clustering using the Wisconsin Breast Cancer data set and visualize the clustering result.

Marks Distribution | Max. Marks | Marks Awarded

Average Marks (Out of: )
Average Marks Scaled Up
Lab Test Marks
Total Marks in the Practical Component of the Course

Signature of the Staff with date

Experiment 1: Develop a program to create histograms for all numerical
features and analyze the distribution of each feature. Generate box plots for
all numerical features and identify any outliers. Use California Housing
dataset.
Code Implementation:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
# Load dataset
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Plot histograms
df.hist(figsize=(10, 8), bins=20)
plt.show()
# Plot box plots
df.boxplot(figsize=(10, 6), rot=45)
plt.show()
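The box plots above flag outliers visually; to also report them numerically, a short optional sketch (an addition, not part of the prescribed code) can count the values falling outside the standard 1.5×IQR whiskers for each feature:

# Optional sketch: count outliers per feature using the 1.5*IQR rule
Q1 = df.quantile(0.25)
Q3 = df.quantile(0.75)
IQR = Q3 - Q1  # Interquartile range per feature
outliers = ((df < (Q1 - 1.5 * IQR)) | (df > (Q3 + 1.5 * IQR))).sum()
print("Outliers per feature:\n", outliers)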
Required Libraries & Installation:
pip install scikit-learn pandas numpy matplotlib seaborn
Output:

Experiment 2: Develop a program to compute the correlation matrix to
understand the relationships between pairs of features. Visualize the
correlation matrix using a heatmap to know which variables have strong
positive/negative correlations. Create a pair plot to visualize pairwise
relationships between features. Use California Housing dataset.
Code Implementation:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
# Load dataset
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
# Compute and display correlation matrix
corr_matrix = df.corr()
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Matrix Heatmap")
plt.show()
# Create a pair plot
sns.pairplot(df)
plt.show()
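To read the strongest relationships off the heatmap programmatically, an optional sketch (an addition, using the corr_matrix computed above) can rank feature pairs by absolute correlation:

# Optional sketch: rank feature pairs by absolute correlation
pairs = corr_matrix.abs().unstack().sort_values(ascending=False)
pairs = pairs[pairs < 1.0]  # Drop the self-correlations on the diagonal
print(pairs.head(6))  # Each pair appears twice (A-B and B-A)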
Output:

Experiment 3: Develop a program to implement Principal Component
Analysis (PCA) for reducing the dimensionality of the Iris dataset from 4
features to 2.
Code Implementation:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
# Load dataset
data = load_iris()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['target'] = data.target # Adding target labels
# Apply PCA to reduce dimensions from 4 to 2
pca = PCA(n_components=2)
pca_result = pca.fit_transform(df.iloc[:, :-1])
# Convert PCA output to a DataFrame
pca_df = pd.DataFrame(pca_result, columns=['PC1', 'PC2'])
pca_df['target'] = df['target']
# Plot PCA results
sns.scatterplot(x='PC1', y='PC2', hue=pca_df['target'], palette='viridis', data=pca_df)
plt.title("PCA on Iris Dataset")
plt.show()
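How much of the original variance the two components retain can be checked directly on the fitted PCA object; a brief optional addition:

# Optional: variance captured by each principal component
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Total variance retained:", pca.explained_variance_ratio_.sum())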
Output:

Experiment 4: For a given set of training data examples stored in a .CSV file,
implement and demonstrate the Find-S algorithm to output a description of
the set of all hypotheses consistent with the training examples.
Code Implementation:
import pandas as pd
# Load dataset from CSV file
df = pd.read_csv("training_data.csv") # Ensure the CSV file is in the same directory
# Extract features and target
X = df.iloc[:, :-1].values # All columns except the last (features)
y = df.iloc[:, -1].values # Last column (target)
# Find-S Algorithm
def find_s(X, y):
    hypothesis = None
    for i in range(len(y)):
        if y[i] == "Yes":  # Consider only positive examples
            if hypothesis is None:
                hypothesis = X[i].copy()  # Initialize with the first positive example
            else:
                for j in range(len(hypothesis)):
                    if X[i][j] != hypothesis[j]:
                        hypothesis[j] = "?"  # Generalize mismatched attributes
    return hypothesis

# Run the Find-S algorithm
final_hypothesis = find_s(X, y)

# Print the final hypothesis
print("Final Hypothesis:", final_hypothesis)
Output:
Final Hypothesis: ['Sunny' 'Hot' 'High' '?' '?' '?']
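The output above implies a weather-style dataset whose positive examples agree on the first three attributes. A training_data.csv of the following shape reproduces that hypothesis (illustrative only; the attribute names are assumptions, and the file used in the lab may differ):

Outlook,Temperature,Humidity,Wind,Water,Forecast,EnjoySport
Sunny,Hot,High,Weak,Warm,Same,Yes
Sunny,Hot,High,Strong,Cool,Change,Yes
Rainy,Cold,High,Strong,Warm,Change,No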

Experiment 5: Develop a program to implement the k-Nearest Neighbour algorithm to classify 100 randomly generated values of x in the range [0,1]. Perform the following on the generated dataset:
1. Label the first 50 points {x1, ..., x50} as follows: if (xi ≤ 0.5) then xi ∊ Class1, else xi ∊ Class2.
2. Classify the remaining points, x51, ..., x100, using KNN. Perform this for k = 1, 2, 3, 4, 5, 20, 30.
Code Implementation:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Generate 100 random values in the range [0, 1]
np.random.seed(0)  # For reproducibility
X = np.random.rand(100, 1)  # 100 values as a column vector

# Step 1: label the first 50 points: Class1 if x <= 0.5, else Class2
X_train, X_test = X[:50], X[50:]
y_train = np.where(X_train[:, 0] <= 0.5, "Class1", "Class2")
y_true = np.where(X_test[:, 0] <= 0.5, "Class1", "Class2")  # Held-out truth, used only to score

# Step 2: classify the remaining points x51..x100 for each value of k
k_values = [1, 2, 3, 4, 5, 20, 30]
plt.figure(figsize=(10, 6))
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)     # Train on the 50 labelled points
    y_pred = knn.predict(X_test)  # Classify x51..x100
    print(f"k={k}: accuracy = {accuracy_score(y_true, y_pred):.2f}")
    # Plot predicted class (0 = Class1, 1 = Class2) against x
    plt.scatter(X_test, (y_pred == "Class2").astype(int), label=f'k={k}', alpha=0.6)
plt.xlabel("X values")
plt.ylabel("Predicted Class (0=Class1, 1=Class2)")
plt.title("k-NN Classification for Different k Values")
plt.legend()
plt.show()
Output:

Experiment 6: Implement the non-parametric Locally Weighted Regression
algorithm in order to fit data points. Select appropriate data set for your
experiment and draw graphs.
Code Implementation:
import numpy as np
import matplotlib.pyplot as plt
# Generate dataset
np.random.seed(0) # For reproducibility
X = np.linspace(0, 10, 100) # 100 points from 0 to 10
y = np.sin(X) + np.random.normal(scale=0.2, size=X.shape) # Sinusoidal data with noise
# Kernel function: Gaussian weighting based on distance to the query point
def kernel(x, x_i, tau):
    return np.exp(-((x - x_i) ** 2) / (2 * tau ** 2))

# Locally Weighted Regression: solve a weighted least-squares fit at every query point
def locally_weighted_regression(X, y, tau):
    y_pred = np.zeros_like(X)
    X_aug = np.c_[np.ones_like(X), X]  # Bias column so each local fit has an intercept
    for i, x_i in enumerate(X):
        weights = kernel(X, x_i, tau)  # Weights for this query point
        W = np.diag(weights)           # Diagonal weight matrix
        # Weighted normal equations: theta = (X^T W X)^-1 X^T W y
        theta = np.linalg.pinv(X_aug.T @ W @ X_aug) @ (X_aug.T @ W @ y)
        y_pred[i] = theta[0] + theta[1] * x_i
    return y_pred

# Perform regression with different values of tau (bandwidth)
tau_values = [0.1, 0.5, 1, 5]
plt.figure(figsize=(10, 6))
for tau in tau_values:
    y_pred = locally_weighted_regression(X, y, tau)
    plt.plot(X, y_pred, label=f'tau={tau}', linewidth=2)

plt.scatter(X, y, color='black', label='Data')
plt.xlabel("X")
plt.ylabel("y")
plt.title("Locally Weighted Regression for Different Bandwidths")
plt.legend()
plt.show()
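For reference, the quantities computed inside locally_weighted_regression are the standard locally weighted least-squares solution: for a query point x, each training point x_i receives the weight

w_i(x) = exp( −(x − x_i)² / (2τ²) )

and the local parameters solve the weighted normal equations

θ(x) = (Xᵀ W X)⁻¹ Xᵀ W y,   where W = diag(w_1, ..., w_n)

A small τ lets the fit follow local wiggles in the data, while a large τ approaches an ordinary global straight-line fit.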
Output:

Experiment 7: Develop a program to demonstrate the working of Linear
Regression and Polynomial Regression. Use Boston Housing Dataset for
Linear Regression and Auto MPG Dataset (for vehicle fuel efficiency
prediction) for Polynomial Regression.
Code Implementation:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Load Boston Housing dataset manually (it was removed from scikit-learn)
boston_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(boston_url, sep=r"\s+", skiprows=22, header=None)
boston_data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
boston_target = raw_df.values[1::2, 2]

# Load Auto MPG dataset
mpg_url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/mpg.csv"
mpg_df = pd.read_csv(mpg_url).dropna()  # Remove missing values

# --------- LINEAR REGRESSION (Boston Housing Dataset) --------- #
def linear_regression():
    X = boston_data[:, 5].reshape(-1, 1)  # Feature: RM (average number of rooms)
    y = boston_target                     # Target: house price

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train Linear Regression model
    model = LinearRegression()
    model.fit(X_train, y_train)

    # Predict, report error, and plot
    y_pred = model.predict(X_test)
    print(f"Linear Regression MSE: {mean_squared_error(y_test, y_pred):.2f}")
    plt.scatter(X_test, y_test, color='blue', label="Actual Prices")
    plt.plot(X_test, y_pred, color='red', linewidth=2, label="Regression Line")
    plt.xlabel("Average Number of Rooms (RM)")
    plt.ylabel("House Price")
    plt.title("Linear Regression - Boston Housing")
    plt.legend()
    plt.show()

# --------- POLYNOMIAL REGRESSION (Auto MPG Dataset) --------- #
def polynomial_regression():
    X = mpg_df["horsepower"].values.reshape(-1, 1)  # Feature: horsepower
    y = mpg_df["mpg"].values                        # Target: miles per gallon

    # Train-test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Polynomial transformation (degree = 2)
    poly = PolynomialFeatures(degree=2)
    X_poly_train = poly.fit_transform(X_train)
    X_poly_test = poly.transform(X_test)

    # Train Polynomial Regression model
    model = LinearRegression()
    model.fit(X_poly_train, y_train)

    # Predict, report error, and plot
    y_pred = model.predict(X_poly_test)
    print(f"Polynomial Regression MSE: {mean_squared_error(y_test, y_pred):.2f}")
    plt.scatter(X_test, y_test, color='blue', label="Actual MPG")
    plt.scatter(X_test, y_pred, color='red', label="Predicted MPG")
    plt.xlabel("Horsepower")
    plt.ylabel("Miles Per Gallon (MPG)")
    plt.title("Polynomial Regression - Auto MPG")
    plt.legend()
    plt.show()

# Run both functions
linear_regression()
polynomial_regression()
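Note that PolynomialFeatures(degree=2) expands each horsepower value x into the feature vector (1, x, x²), so the polynomial model is still fitted as a linear model in those expanded features:

ŷ = θ₀ + θ₁·x + θ₂·x²

This is why LinearRegression can be reused unchanged after the transformation: only the inputs change, not the estimator.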
Output:

Experiment 8: Develop a program to demonstrate the working of the decision
tree algorithm. Use Breast Cancer Data set for building the decision tree and
apply this knowledge to classify a new sample.
Code Implementation:
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Breast Cancer dataset
data = load_breast_cancer()
X, y = data.data, data.target  # Features and target labels

# Train-test split (80-20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train Decision Tree Classifier
model = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Predictions
y_pred = model.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Decision Tree Classifier Accuracy: {accuracy:.2f}")

# Classify a new sample (here, the first test example stands in for a new sample)
new_sample = X_test[0].reshape(1, -1)
prediction = model.predict(new_sample)
print("New sample classified as:", data.target_names[prediction][0])

# Plot Decision Tree
plt.figure(figsize=(12, 6))
plot_tree(model, feature_names=data.feature_names, class_names=data.target_names, filled=True)
plt.title("Decision Tree for Breast Cancer Dataset")
plt.show()
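The criterion="gini" setting selects splits by Gini impurity. For a node with class proportions p_k,

G = 1 − Σ p_k²

which is 0 for a pure node and largest when the classes are evenly mixed; each split is chosen to reduce this impurity the most.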
Output:

Decision Tree Classifier Accuracy: ≈0.95 (the exact printed value depends on the split)
New sample classified as: benign or malignant, as printed for the chosen test example

Experiment 9: Develop a program to implement the Naive Bayesian classifier
considering Olivetti Face Data set for training. Compute the accuracy of the
classifier, considering a few test data sets.
Code Implementation:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Load Olivetti Faces dataset
faces = fetch_olivetti_faces(shuffle=True, random_state=42)
X, y = faces.data, faces.target # Features and labels
# Splitting into training and testing data (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train Naive Bayes classifier
model = GaussianNB()
model.fit(X_train, y_train)
# Predictions
y_pred = model.predict(X_test)
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Naive Bayes Classifier Accuracy: {accuracy:.2f}")
# Display a few test images with their predicted labels
fig, axes = plt.subplots(1, 5, figsize=(10, 5))
for i, ax in enumerate(axes):
    ax.imshow(X_test[i].reshape(64, 64), cmap="gray")
    ax.set_title(f"Pred: {y_pred[i]}")
    ax.axis("off")

plt.show()
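Beyond the overall accuracy, per-class behaviour can be inspected with scikit-learn's standard report; a small optional addition (the Olivetti set has 40 subjects, so the output is long):

# Optional: per-class precision/recall summary
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred, zero_division=0))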
Output:

Naive Bayes Classifier Accuracy: 0.78

Experiment 10: Develop a program to implement k-means clustering using
Wisconsin Breast Cancer data set and visualize the clustering result.
Code Implementation:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
# Load the Wisconsin Breast Cancer dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
# Standardizing the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Applying k-Means clustering with 2 clusters (Benign & Malignant)
kmeans = KMeans(n_clusters=2, random_state=42, n_init=10)
kmeans.fit(X_scaled)
# Adding cluster labels to the dataset
X['Cluster'] = kmeans.labels_
# Visualize the clustering result using a scatter plot
plt.figure(figsize=(8, 6))
sns.scatterplot(x=X['mean radius'], y=X['mean texture'], hue=X['Cluster'], palette="coolwarm")
plt.title("k-Means Clustering on Wisconsin Breast Cancer Dataset")
plt.xlabel("Mean Radius")
plt.ylabel("Mean Texture")
plt.show()

# Display cluster counts
unique, counts = np.unique(kmeans.labels_, return_counts=True)
print("Cluster Counts:", dict(zip(unique, counts)))
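Since the dataset also ships ground-truth labels, the clusters can be compared against them. A short optional check (cluster numbering is arbitrary, so a symmetric index is used rather than raw accuracy):

# Optional: agreement between the two clusters and the true benign/malignant labels
from sklearn.metrics import adjusted_rand_score
print("Adjusted Rand Index:", adjusted_rand_score(data.target, kmeans.labels_))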
Output:

Cluster Counts: {0: 375, 1: 194}
