Pattern File
ECECE-26
Theory
The Gaussian distribution is characterized by two parameters: the mean (μ) and the standard
deviation (σ). The mean is the central tendency of the distribution, while the standard
deviation measures the spread of the distribution. The probability density function (PDF) of
the Gaussian distribution is given by:

f(x) = (1 / (σ√(2π))) e^(−(x − μ)² / (2σ²))

where x is a random variable, μ is the mean, σ is the standard deviation, and e is the
mathematical constant (approximately equal to 2.71828).
The Gaussian distribution has several important properties. It is symmetric about the mean,
meaning that the probability of a value being above the mean is the same as the probability
of it being below the mean. The total area under the curve of the Gaussian distribution is
equal to 1, which means that the sum of the probabilities of all possible values of x is equal
to 1. The Gaussian distribution is also asymptotic, meaning that the tails of the curve
approach but never touch the x-axis.
The Gaussian distribution is widely used in statistics and probability theory because of its
simplicity and versatility. It is used to model a wide variety of natural phenomena, including
the heights of individuals in a population, the errors in scientific measurements, and the
noise in electronic signals. The central limit theorem states that the sum of a large number of
independent random variables, regardless of their original distribution, will tend to be
normally distributed, which further highlights the importance of the Gaussian distribution in
probability theory.
Python Code
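A minimal sketch that evaluates and plots the Gaussian PDF defined above is given below; the mean, standard deviation and plotting range are illustrative assumptions, not values taken from the experiment.

import numpy as np
import matplotlib.pyplot as plt

# Illustrative parameters (assumed values)
mu, sigma = 0.0, 1.0

# Evaluate f(x) = (1 / (sigma * sqrt(2*pi))) * exp(-(x - mu)^2 / (2*sigma^2))
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 500)
pdf = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

plt.plot(x, pdf)
plt.xlabel('x')
plt.ylabel('f(x)')
plt.title('Gaussian PDF (mu = 0, sigma = 1)')
plt.show()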
Plots
Experiment - 2
Aim : Write a MATLAB/Python function that will take as inputs: (a) the mean vectors, (b) the
covariance matrices of the class distributions of a c-class problem, (c) the a priori probabilities of
the c classes, and (d) a matrix X containing column vectors that stem from the above classes. It
will give as output an N-dimensional vector whose ith component contains the class where the
corresponding vector is assigned, according to the Bayesian classification rule.
Theory
The Bayes' theorem states that the probability of a hypothesis (H) given some observed evidence
(E) is proportional to the probability of the evidence given the hypothesis, multiplied by the prior
probability of the hypothesis. In the context of classification, the hypothesis corresponds to the
category or class to which an object or event belongs, and the evidence corresponds to the
observed features or attributes.
The Bayesian classification rule works by first calculating the prior probability of each class,
based on the frequency or proportion of objects or events belonging to each class in the training
data. Then, for each observed feature or attribute, the conditional probability of that feature given
each class is calculated using the training data. Finally, the posterior probability of each class
given the observed features is calculated using Bayes' theorem, and the object or event is
classified as belonging to the class with the highest posterior probability.
P(C|X) = P(X|C) P(C) / P(X)

where P(C|X) is the posterior probability of class C given the observed features X, P(X|C) is the likelihood of the observed features X given class C, P(C) is the prior probability of class C, and P(X) is the probability of the observed features X.
import numpy as np

# Example input
np.random.seed(1)
mean_vectors = [np.array([0, 0]), np.array([5, 5])]
covariance_matrices = [np.array([[1, 0], [0, 1]]), np.array([[2, 0], [0, 2]])]
a_priori_probs = [0.5, 0.5]
# 100 samples drawn from each class distribution (one sample per row of X)
X = np.concatenate((np.random.multivariate_normal(mean_vectors[0], covariance_matrices[0], 100),
                    np.random.multivariate_normal(mean_vectors[1], covariance_matrices[1], 100)))
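A minimal sketch of the classification routine consistent with the aim is given below; the function name bayes_classify is an assumption, and it follows the example input above, where each row of X is one sample.

def bayes_classify(mean_vectors, covariance_matrices, a_priori_probs, X):
    # one score per sample and per class: log likelihood + log prior
    n_classes = len(mean_vectors)
    scores = np.zeros((X.shape[0], n_classes))
    for j in range(n_classes):
        diff = X - mean_vectors[j]
        inv_cov = np.linalg.inv(covariance_matrices[j])
        # log of the Gaussian density, dropping constants common to all classes
        log_likelihood = (-0.5 * np.sum(diff @ inv_cov * diff, axis=1)
                          - 0.5 * np.log(np.linalg.det(covariance_matrices[j])))
        scores[:, j] = log_likelihood + np.log(a_priori_probs[j])
    # Bayesian classification rule: pick the class with the highest posterior score
    return np.argmax(scores, axis=1)

predicted_classes = bayes_classify(mean_vectors, covariance_matrices, a_priori_probs, X)
print(predicted_classes)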
Conclusion
The N-dimensional vector whose ith component contains the class where the corresponding vector
is assigned, according to the Bayesian classification rule, has been identified.
Experiment - 4
Aim : Write a MATLAB/Python function that will take as inputs: (a) the mean
vectors, (b) the covariance matrix of the class distributions of a c-class problem, and (c) a
matrix X containing column vectors that stem from the above classes. It will give as output
an N-dimensional vector whose ith component contains the class where the corresponding
vector is assigned according to the minimum Mahalanobis distance classifier.
Theory
The Mahalanobis distance classifier is a classification algorithm that measures the distance
between a test point and the mean of the training data in a way that takes into account the
correlation structure of the input variables. It is based on the Mahalanobis distance, which is a
measure of the distance between two points in a multivariate space.
The Mahalanobis distance between a point x and the mean μ of a set of points with covariance
matrix S is defined as D(x) = √((x − μ)ᵀ S⁻¹ (x − μ)). This distance metric takes into account
the correlations between the input variables, which can improve classification accuracy in
situations where the variables are correlated.
To use the Mahalanobis distance as a classifier, we first calculate the Mahalanobis distance
between the test point and the mean of each class in the training data. The test point is then
assigned to the class with the smallest Mahalanobis distance.
The Mahalanobis distance classifier has several advantages over other classification algorithms,
such as the k-nearest neighbors classifier. One advantage is that it can handle correlated input
variables, which can improve classification accuracy. Another advantage is that it can handle
missing data, as long as the covariance matrix is estimated correctly.
However, the Mahalanobis distance classifier also has some limitations. One limitation is that it
can be sensitive to outliers in the training data, as these can have a large impact on the covariance
matrix. Another limitation is that it assumes that the data are normally distributed, which may not
be the case in some situations.
Overall, the Mahalanobis distance classifier is a useful tool for classification in situations where
the input variables are correlated, and where the assumption of normality holds.
Code
import numpy as np
import matplotlib.pyplot as plt
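A minimal sketch of the minimum Mahalanobis distance classifier, building on the imports above, is given below; the function name mahalanobis_classify and the example values are assumptions.

def mahalanobis_classify(mean_vectors, covariance_matrix, X):
    # X holds one sample per column, as stated in the aim
    inv_cov = np.linalg.inv(covariance_matrix)
    distances = np.zeros((len(mean_vectors), X.shape[1]))
    for j, mu in enumerate(mean_vectors):
        diff = X - mu.reshape(-1, 1)
        # squared Mahalanobis distance (x - mu)^T S^-1 (x - mu) for every column
        distances[j] = np.sum(diff * (inv_cov @ diff), axis=0)
    # assign each column vector to the class with the smallest distance
    return np.argmin(distances, axis=0)

# Illustrative example (assumed values)
np.random.seed(0)
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
X = np.hstack((np.random.multivariate_normal(means[0], cov, 50).T,
               np.random.multivariate_normal(means[1], cov, 50).T))
print(mahalanobis_classify(means, cov, X))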
Result
Experiment - 5
Aim : Write a MATLAB/Python function that takes as inputs: (a) a set of N1 vectors packed as
columns of a matrix Z, (b) an N1-dimensional vector containing the classes where each vector in Z
belongs, (c) the value for the parameter k of the classifier, (d) a set of N vectors packed as columns
in the matrix X. It returns an N-dimensional vector whose ith component contains the class where
the corresponding vector of X is assigned, according to the k-nearest neighbour classifier.
Theory
The k-nearest neighbor (k-NN) classifier is a classification algorithm that works by finding
the k nearest training data points to a given test data point and classifying the test point
based on the most common class among its k-nearest neighbors. The choice of k is an
important parameter in the algorithm, and it can be chosen through cross-validation or other
methods.
The k-NN classifier is a non-parametric algorithm, meaning it does not make any
assumptions about the distribution of the data. It can handle both continuous and
categorical data, and can be used for both binary and multi-class classification problems.
One advantage of the k-NN classifier is its simplicity and ease of implementation. It also
has the potential to be very accurate, particularly if the training data set is large and the
value of k is chosen carefully. However, the algorithm can be computationally intensive,
especially for large data sets.
One limitation of the k-NN classifier is that it can be sensitive to noisy data and outliers. It
also requires the choice of an appropriate distance metric, which can be challenging in
some situations.
Overall, the k-NN classifier is a useful algorithm for classification problems where the data
is not easily modeled by a parametric distribution and the training data set is sufficiently
large.
Code
import numpy as np
import matplotlib.pyplot as plt
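A minimal sketch of the k-nearest neighbour classifier, using the imports above, is given below; the function name knn_classify, the Euclidean distance metric and the example values are assumptions.

def knn_classify(Z, labels, k, X):
    # Z: training vectors as columns, labels: class of each column of Z,
    # X: test vectors as columns; returns the class assigned to each column of X
    predictions = np.zeros(X.shape[1], dtype=int)
    for i in range(X.shape[1]):
        # Euclidean distance from the ith test vector to every training vector
        distances = np.linalg.norm(Z - X[:, [i]], axis=0)
        nearest = np.argsort(distances)[:k]
        # majority vote among the k nearest neighbours
        values, counts = np.unique(labels[nearest], return_counts=True)
        predictions[i] = values[np.argmax(counts)]
    return predictions

# Illustrative example (assumed values)
Z = np.array([[0.0, 0.2, 3.0, 3.1],
              [0.0, 0.1, 3.0, 2.9]])
labels = np.array([0, 0, 1, 1])
X_test = np.array([[0.1, 2.8],
                   [0.0, 3.2]])
print(knn_classify(Z, labels, 3, X_test))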
Experiment - 6
Aim : Write a MATLAB/Python function that will take as inputs: (a) an N-dimensional vector,
each component of which contains the class where the corresponding data vector belongs, and (b)
a similar N-dimensional vector, each component of which contains the class where the
corresponding data vector is assigned by a certain classifier. Its output will be the percentage of
the places where the two vectors differ (i.e., the classification error of the classifier).
Theory
Error analysis is the process of isolating, observing and diagnosing erroneous ML predictions,
thereby helping to understand pockets of high and low performance of the model. When it is said
that "the model accuracy is 90%", this might not be uniform across subgroups of the data, and
there might be some input conditions on which the model fails more often. Error analysis is
therefore the next step from aggregate metrics to a more in-depth review of model errors for
improvement. The classification error of a supervised learning classifier is given by the number
of erroneous predictions divided by the total number of samples predicted by the classifier. Since
we are calculating this for a supervised model, we already have the actual classes of the inputs.
The classifier used in this case is the k-nearest neighbour classifier. The k-nearest neighbours
algorithm, also known as KNN or k-NN, is a non-parametric, supervised learning classifier which
uses proximity to make classifications or predictions about the grouping of an individual data point.
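Code
The error computation itself takes only a few lines; a minimal sketch is given below (the function name classification_error matches the example usage that follows):

import numpy as np

def classification_error(true_classes, predicted_classes):
    # fraction of positions where the two label vectors disagree
    return np.mean(np.asarray(true_classes) != np.asarray(predicted_classes))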
# Example usage
true_classes = np.array([0, 1, 0, 1, 0])
predicted_classes = np.array([0, 1, 1, 1, 0])
error = classification_error(true_classes, predicted_classes)
print(f"Classification error: {error:.2%}")
Result
Conclusion
The classification of the data points using the KNN classifier has been carried out, and the
classification error has been calculated as shown above.
Experiment - 7
Aim : Write a MATLAB/Python function for the perceptron algorithm. This will take as inputs:
(a) a matrix X containing N l-dimensional column vectors, (b) an N-dimensional row vector y,
whose ith component contains the class (-1 or +1) where the corresponding vector belongs, and
(c) an initial value vector wini for the parameter vector. It returns the estimated parameter vector.
Theory
The perceptron algorithm is a supervised learning algorithm used for binary classification tasks. It
is based on the concept of a simple artificial neuron called a perceptron, which takes in a set of
input features and produces a binary output, either 0 or 1.
The algorithm starts by initializing the weights for each input feature to a random value. It then
takes in a set of training data, where each data point consists of a set of input features and a binary
output. The algorithm computes the weighted sum of the input features, adds a bias term, and
applies an activation function to the result to obtain the predicted output.
The activation function used in the perceptron algorithm is usually the step function, which
outputs 1 if the weighted sum is greater than or equal to a certain threshold, and 0 otherwise. The
threshold value is a hyperparameter that can be set by the user.
During training, the algorithm compares the predicted output with the actual output and adjusts
the weights based on the error between them. The weights are updated according to the following
formula:

w_new = w_old + learning_rate × (actual_output − predicted_output) × input_feature

where learning_rate is a hyperparameter that controls the rate at which the algorithm learns, and
input_feature is the value of the input feature for the current data point.
The algorithm continues to update the weights for each data point in the training set until it
converges to a set of weights that can accurately classify the training data. Once the algorithm has
converged, it can be used to make predictions on new data points by computing the weighted sum
of the input features, adding the bias term, and applying the activation function.
Code
import numpy as np
class Perceptron:
    def __init__(self, learning_rate=0.1, epochs=100):
        self.learning_rate = learning_rate
        self.epochs = epochs

    def predict(self, X):
        # X holds the samples as columns; self.w is assumed to have been
        # estimated already (see the training sketch below)
        y_pred = np.sign(self.w @ X)
        return y_pred
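The training logic is not reproduced in the listing above. A standalone sketch of the perceptron update rule described in the theory, following the input/output convention of the aim, is given below; the function name perceptron_train and the fixed iteration limit are assumptions.

def perceptron_train(X, y, w_ini, learning_rate=0.1, max_epochs=100):
    # X: l x N matrix of column vectors, y: row vector of labels in {-1, +1},
    # w_ini: initial parameter vector; returns the estimated parameter vector
    w = np.array(w_ini, dtype=float)
    for _ in range(max_epochs):
        misclassified = 0
        for i in range(X.shape[1]):
            if y[i] * np.dot(w, X[:, i]) <= 0:       # sample on the wrong side
                w += learning_rate * y[i] * X[:, i]  # perceptron update rule
                misclassified += 1
        if misclassified == 0:                       # all samples classified correctly
            break
    return w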
Result
Experiment - 8
Aim : Design a three-layer FFN, using gradient descent to perform the {x1, x2} → {y1, y2}
mapping. The activation function for all the nodes is the hyperbolic tangent one. For training, one
may select one of the following algorithms: (a) the standard gradient descent backpropagation
algorithm, (b) the backpropagation algorithm with momentum, and (c) the backpropagation
algorithm with adaptive learning rate.
Theory
A three-layer feedforward neural network (FFN) is a type of neural network architecture that
consists of three layers of nodes: an input layer, a hidden layer, and an output layer. The input
layer contains nodes that receive input data, the hidden layer processes the input data through a set
of nonlinear transformations, and the output layer produces a set of predictions or outputs.
Input layer: The input layer consists of nodes that receive input data, which can be a vector of
features or raw input data such as images or audio signals. Each node in the input layer
corresponds to a single feature or input dimension.
Hidden layer: The hidden layer consists of nodes that process the input data through a set of
nonlinear transformations. Each node in the hidden layer receives input from all nodes in the
previous layer and produces an output that is fed forward to the next layer. The activations of the
hidden layer nodes are computed using a nonlinear activation function, such as the sigmoid
function, ReLU function, or hyperbolic tangent function.
Output layer: The output layer consists of nodes that produce the final set of predictions or
outputs. The number of nodes in the output layer depends on the task at hand, such as regression
or classification. For example, in a binary classification task, there would be a single output node
that produces a value between 0 and 1, which can be interpreted as the probability of belonging to
the positive class.
During training, the weights and biases of the network are learned through backpropagation,
which is a gradient-based optimization algorithm. Backpropagation computes the gradients of the
loss function with respect to the weights and biases of the network, and updates them using a
learning rate and a momentum parameter. The process of computing the gradients and updating
the weights and biases is repeated iteratively until the network converges to a set of weights and
biases that minimize the loss function on the training data.
Code
import numpy as np
from sklearn.datasets import make_regression
# The lines below reproduce only the excerpts from inside the training loop;
# the data generation, weight initialisation and forward pass are not shown here
# (a complete sketch of variant (a) follows this listing).

# Backward pass
error = y_pred - y
d_y_pred = error * tanh_derivative(y_pred)
error_hidden = np.dot(d_y_pred, W2.T)
d_hidden = error_hidden * tanh_derivative(hidden_layer_output)

# (a) Standard gradient descent: update the weights directly from the gradients
W2 -= learning_rate * np.dot(hidden_layer_output.T, d_y_pred)
W1 -= learning_rate * np.dot(X.T, d_hidden)

# (b) Backpropagation with momentum: update the velocity terms, then the weights
v2 = momentum * v2 - learning_rate * np.dot(hidden_layer_output.T, d_y_pred)
v1 = momentum * v1 - learning_rate * np.dot(X.T, d_hidden)
W2 += v2
W1 += v1
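For reference, a self-contained sketch of the standard gradient descent variant (a), using the same variable names as the excerpts above, is given below. The toy data, network size, learning rate and number of epochs are assumptions, and bias terms are omitted for brevity.

import numpy as np

def tanh_derivative(a):
    # derivative of tanh expressed in terms of its output a = tanh(z)
    return 1.0 - a ** 2

# Toy {x1, x2} -> {y1, y2} mapping (assumed data, for illustration only)
np.random.seed(0)
X = np.random.uniform(-1, 1, size=(200, 2))
y = np.column_stack((np.tanh(X[:, 0] + X[:, 1]), np.tanh(X[:, 0] - X[:, 1])))

# Weight initialisation: 2 inputs -> 8 hidden nodes -> 2 outputs
learning_rate = 0.001
W1 = 0.1 * np.random.randn(2, 8)
W2 = 0.1 * np.random.randn(8, 2)

for epoch in range(5000):
    # Forward pass (hyperbolic tangent activation at both layers)
    hidden_layer_output = np.tanh(np.dot(X, W1))
    y_pred = np.tanh(np.dot(hidden_layer_output, W2))

    # Backward pass
    error = y_pred - y
    d_y_pred = error * tanh_derivative(y_pred)
    error_hidden = np.dot(d_y_pred, W2.T)
    d_hidden = error_hidden * tanh_derivative(hidden_layer_output)

    # Update weights (standard gradient descent)
    W2 -= learning_rate * np.dot(hidden_layer_output.T, d_y_pred)
    W1 -= learning_rate * np.dot(X.T, d_hidden)

print('final mean squared error:', np.mean(error ** 2))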
Result
The three-layer feed-forward network trained using gradient descent has been created, and the
output for a given set of inputs is as shown.
Experiment - 9
Aim :
Write MATLAB/Python function to compute the principal components of the covariance matrix
of an l × N dimensional data matrix X as well as the corresponding variances. Hence, write a
MATLAB/Python function that evaluates the performance of the PCA method when applied to a
data matrix X.
Theory
Principal Component Analysis (PCA) is a technique used for dimensionality reduction and data
compression. It works by identifying the directions, or principal components, in which the data
varies the most, and then projecting the data onto these directions.
Here are the steps involved in PCA:
Standardize the data: Before applying PCA, it is important to standardize the data by subtracting
the mean and dividing by the standard deviation. This ensures that all features are on the same
scale and have equal importance in the analysis.
Compute the covariance matrix: The next step is to compute the covariance matrix, which
measures the linear relationship between the features. The covariance matrix is a square matrix
where the (i,j)th element is the covariance between the i-th and j-th features.
Compute the eigenvectors and eigenvalues: The eigenvectors and eigenvalues of the covariance
matrix represent the directions and magnitudes of the principal components, respectively. The
eigenvectors are the directions in which the data varies the most, and the eigenvalues represent the
amount of variance explained by each principal component.
Choose the number of principal components: The next step is to choose the number of principal
components to retain. This can be done by examining the eigenvalues and selecting the top k
components that explain the most variance in the data. Typically, a scree plot is used to visualize
the eigenvalues and identify the "elbow point" where the curve levels off.
Project the data onto the principal components: The final step is to project the data onto the
selected principal components. This can be done by multiplying the standardized data matrix by
the matrix of eigenvectors corresponding to the selected principal components.
PCA is widely used in a variety of fields, including image processing, signal processing, and
finance. It can be used for tasks such as dimensionality reduction, feature extraction, and data
compression. PCA can also be extended to nonlinear and kernelized versions, such as Kernel
PCA, which can handle nonlinear and non-Gaussian data distributions.
Code
import numpy as np
import matplotlib.pyplot as plt

def pca(X):
    # centre the data (subtract the column-wise mean)
    Xc = X - X.mean(axis=0)
    # covariance matrix of the features
    C = np.cov(Xc, rowvar=False)
    # eigen-decomposition: the columns of V are the principal directions
    eigenvalues, V = np.linalg.eigh(C)
    # sort the directions by decreasing eigenvalue
    order = np.argsort(eigenvalues)[::-1]
    V = V[:, order]
    # project the data onto the principal components
    pcs = Xc @ V
    # compute variances along each principal component
    variances = np.var(pcs, axis=0)
    return V, pcs, variances

# Generate random data matrix
X = np.random.normal(size=(100, 5))
V, principal_components, variances = pca(X)
plt.scatter(principal_components[:, 0], principal_components[:, 1])
plt.xlabel('PC1 (Variance = {:.2f})'.format(variances[0]))
plt.ylabel('PC2 (Variance = {:.2f})'.format(variances[1]))
plt.show()
Result