
Submitted by

Aman Agarwal

Reg No: 201800119

Sec. C

Intelligent Systems Lab Assignment – 8

Question: Write your own GMM implementation, using the EM algorithm for
parameter learning. Learn a GMM with 10 components on your data in PCA space.

SOURCE CODE:
import numpy as np
from scipy.stats import norm
np.random.seed(0)
X = np.linspace(-5,5,num=20)
X0 = X*np.random.rand(len(X))+10 # Create data cluster 1
X1 = X*np.random.rand(len(X))-10 # Create data cluster 2
X2 = X*np.random.rand(len(X)) # Create data cluster 3
X_tot = np.stack((X0,X1,X2)).flatten() # Combine the clusters to get the random datapoints from above

#Create the array r with dimensionality nxK

r = np.zeros((len(X_tot),3))
print('Dimensionality','=',np.shape(r))

#Instantiate the random gaussians


gauss_1 = norm(loc=-5,scale=5)
gauss_2 = norm(loc=8,scale=3)
gauss_3 = norm(loc=1.5,scale=1)

#Probability for each datapoint x_i to belong to gaussian g


for c,g in zip(range(3),[gauss_1,gauss_2,gauss_3]):
    r[:,c] = g.pdf(X_tot)
    # Write the probability that x belongs to gaussian c in column c.
    # This gives a 60x3 array filled with the probability that each x_i belongs to one of the gaussians

#Normalize the probabilities such that each row of r sums to 1


for i in range(len(r)):
    r[i] = r[i]/np.sum(r,axis=1)[i]

print(r)
print(np.sum(r,axis=1))

OUTPUT:
Dimensionality = (60, 3)
[[2.97644006e-02 9.70235407e-01 1.91912550e-07]
[3.85713024e-02 9.61426220e-01 2.47747304e-06]
[2.44002651e-02 9.75599713e-01 2.16252823e-08]
[1.86909096e-02 9.81309090e-01 8.07574590e-10]
[1.37640773e-02 9.86235923e-01 9.93606589e-12]
[1.58674083e-02 9.84132592e-01 8.42447356e-11]
[1.14191259e-02 9.88580874e-01 4.48947365e-13]
[1.34349421e-02 9.86565058e-01 6.78305927e-12]
[1.11995848e-02 9.88800415e-01 3.18533028e-13]
[8.57645259e-03 9.91423547e-01 1.74498648e-15]
[7.64696969e-03 9.92353030e-01 1.33051021e-16]
[7.10275112e-03 9.92897249e-01 2.22285146e-17]
[6.36154765e-03 9.93638452e-01 1.22221112e-18]
[4.82376290e-03 9.95176237e-01 1.55549544e-22]
[7.75866904e-03 9.92241331e-01 1.86665135e-16]
[7.52759691e-03 9.92472403e-01 9.17205413e-17]
[8.04550643e-03 9.91954494e-01 4.28205323e-16]
[3.51864573e-03 9.96481354e-01 9.60903037e-30]
[3.42631418e-03 9.96573686e-01 1.06921949e-30]
[3.14390460e-03 9.96856095e-01 3.91217273e-35]
[1.00000000e+00 2.67245688e-12 1.56443629e-57]
[1.00000000e+00 4.26082753e-11 9.73970426e-49]
[9.99999999e-01 1.40098281e-09 3.68939866e-38]
[1.00000000e+00 2.65579518e-10 4.05324196e-43]
[9.99999977e-01 2.25030673e-08 3.11711096e-30]
[9.99999997e-01 2.52018974e-09 1.91287930e-36]
[9.99999974e-01 2.59528826e-08 7.72534540e-30]
[9.99999996e-01 4.22823192e-09 5.97494463e-35]
[9.99999980e-01 1.98158593e-08 1.38414545e-30]
[9.99999966e-01 3.43722391e-08 4.57504394e-29]
[9.99999953e-01 4.74290492e-08 3.45975850e-28]
[9.99999876e-01 1.24093364e-07 1.31878573e-25]
[9.99999878e-01 1.21709730e-07 1.17161878e-25]
[9.99999735e-01 2.65048706e-07 1.28402556e-23]
[9.99999955e-01 4.53370639e-08 2.60841891e-28]
[9.99999067e-01 9.33220139e-07 2.02379180e-20]
[9.99998448e-01 1.55216175e-06 3.63693167e-19]
[9.99997285e-01 2.71542629e-06 8.18923788e-18]
[9.99955648e-01 4.43516655e-05 1.59283752e-11]
[9.99987200e-01 1.28004505e-05 3.20565446e-14]
[9.64689131e-01 9.53405294e-03 2.57768163e-02]
[9.77001731e-01 7.96383733e-03 1.50344317e-02]
[9.96373670e-01 2.97775078e-03 6.48579562e-04]
[3.43634425e-01 2.15201653e-02 6.34845409e-01]
[9.75390877e-01 8.19866977e-03 1.64104537e-02]
[9.37822997e-01 1.19363656e-02 5.02406373e-02]
[4.27396946e-01 2.18816340e-02 5.50721420e-01]
[3.28570544e-01 2.14190231e-02 6.50010433e-01]
[3.62198108e-01 2.16303800e-02 6.16171512e-01]
[2.99837196e-01 2.11991858e-02 6.78963618e-01]
[2.21768797e-01 2.04809383e-02 7.57750265e-01]
[1.76497129e-01 2.01127714e-02 8.03390100e-01]
[8.23252013e-02 2.50758227e-02 8.92598976e-01]
[2.11943183e-01 2.03894641e-02 7.67667353e-01]
[1.50351209e-01 2.00499057e-02 8.29598885e-01]
[1.54779991e-01 2.00449518e-02 8.25175057e-01]
[7.92109803e-02 5.93118654e-02 8.61477154e-01]
[9.71905134e-02 2.18698473e-02 8.80939639e-01]
[7.60625670e-02 4.95831879e-02 8.74354245e-01]
[8.53513721e-02 2.40396004e-02 8.90609028e-01]]
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
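
Note: the script above performs only a single E-step with hand-picked Gaussians and implicitly equal weights; the question asks for full EM parameter learning. A minimal sketch of the complete loop, reusing X_tot from above and alternating E- and M-steps, could look like the following (the initial values of mu, sigma and pi are illustrative assumptions, not taken from the script):

import numpy as np
from scipy.stats import norm

K = 3                                # number of mixture components
n = len(X_tot)

# Illustrative initial guesses (assumed, not taken from the original script)
mu = np.random.choice(X_tot, K, replace=False)   # component means
sigma = np.ones(K)                               # component standard deviations
pi = np.ones(K) / K                              # mixing weights

for step in range(50):
    # E-step: responsibility of component c for point x_i,
    # proportional to pi_c * N(x_i | mu_c, sigma_c), normalized per row
    r = np.zeros((n, K))
    for c in range(K):
        r[:, c] = pi[c] * norm(loc=mu[c], scale=sigma[c]).pdf(X_tot)
    r = r / r.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means and standard deviations
    Nk = r.sum(axis=0)
    pi = Nk / n
    mu = (r * X_tot[:, None]).sum(axis=0) / Nk
    sigma = np.sqrt((r * (X_tot[:, None] - mu) ** 2).sum(axis=0) / Nk)

print('weights:', pi)
print('means:', mu)
print('std devs:', sigma)

Each iteration recomputes the responsibilities r from the current parameters (E-step) and then re-estimates the mixing weights, means and standard deviations from those responsibilities (M-step); for the PCA data below, K would be set to 10 as the question requires.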

PCA:
SOURCE CODE:
# Principal Component Analysis (PCA)

# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset

dataset = pd.read_csv('Wine.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

# Feature Scaling

from sklearn.preprocessing import StandardScaler


sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
# Applying PCA

from sklearn.decomposition import PCA


pca = PCA(n_components = 2)
X_train = pca.fit_transform(X_train)
X_test = pca.transform(X_test)

# Training the Logistic Regression model on the Training set

from sklearn.linear_model import LogisticRegression


classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)

# Making the Confusion Matrix

from sklearn.metrics import confusion_matrix, accuracy_score


y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(accuracy_score(y_test, y_pred))

# Visualising the Training set results

from matplotlib.colors import ListedColormap


X_set, y_set = X_train, y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('PCA (Training set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()

# Visualising the Test set results

from matplotlib.colors import ListedColormap


X_set, y_set = X_test, y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
plt.contourf(X1, X2, classifier.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green', 'blue')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                c = ListedColormap(('red', 'green', 'blue'))(i), label = j)
plt.title('PCA (Test set)')
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend()
plt.show()
Expected Output

Accuracy Score: (confusion matrix and accuracy printed by the code above)
Graph: ('PCA (Training set)' and 'PCA (Test set)' decision-region plots)
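
The pipeline above reduces the wine data to two principal components and trains a logistic-regression classifier on them. The question additionally asks for a GMM with 10 components learned in this PCA space; a minimal sketch, assuming the 2-column X_train produced by pca.fit_transform above and generalizing the earlier EM updates to multivariate Gaussians (the initialization below is an illustrative assumption, not part of the original script), could look like this:

import numpy as np
from scipy.stats import multivariate_normal

K = 10                               # components requested by the question
n, d = X_train.shape                 # X_train: 2-D PCA features from above
rng = np.random.default_rng(0)

# Illustrative initialization (assumed, not taken from the original script)
mu = X_train[rng.choice(n, K, replace=False)]    # means at random training points
cov = np.array([np.eye(d) for _ in range(K)])    # identity covariance matrices
pi = np.ones(K) / K                              # equal mixing weights

for step in range(100):
    # E-step: per-point responsibilities of the K components
    r = np.zeros((n, K))
    for c in range(K):
        r[:, c] = pi[c] * multivariate_normal(mean=mu[c], cov=cov[c]).pdf(X_train)
    r = r / r.sum(axis=1, keepdims=True)

    # M-step: update mixing weights, means and covariances
    Nk = r.sum(axis=0)
    pi = Nk / n
    mu = (r.T @ X_train) / Nk[:, None]
    for c in range(K):
        diff = X_train - mu[c]
        cov[c] = (r[:, c, None] * diff).T @ diff / Nk[c] + 1e-6 * np.eye(d)  # small ridge keeps covariances non-singular

print('mixing weights:', np.round(pi, 3))
print('component means:')
print(np.round(mu, 3))

As a cross-check, sklearn.mixture.GaussianMixture(n_components=10).fit(X_train) fits the same kind of model with EM internally and can be used to sanity-check the hand-written loop.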
