LP - III Lab Manual
LAB MANUAL
Companion Courses:
Sr. No.  Assignment Title
1.  Write non-recursive and recursive programs to calculate Fibonacci numbers and analyze their time and space complexity.
4.  Write a program to solve a 0-1 Knapsack problem using dynamic programming or branch and bound strategy.
5.  Design an n-Queens matrix having the first Queen placed. Use backtracking to place the remaining Queens to generate the final n-Queens matrix.
7.  Classify the email using the binary classification method. Email Spam detection has two states: a) Normal State – Not Spam, b) Abnormal State – Spam. Use K-Nearest Neighbors and Support Vector Machine for classification. Analyze their performance.
    Dataset link: the emails.csv dataset on Kaggle
    https://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv
13. Write a smart contract on a test network, for a Bank account of a customer, with the following operations:
    ● Deposit money
    ● Withdraw money
    ● Show balance
14. Write a program in Solidity to create Student data. Use the following constructs:
    ● Structures
    ● Arrays
    ● Fallback
    Deploy this as a smart contract on Ethereum and observe the transaction fee and Gas values.
15. Write a survey report on types of Blockchains and their real-time use cases.
Recursive Algorithm:
Algorithm Fibonacci(n)
{
    if (n <= 1)
        return n;
    else
        return Fibonacci(n - 1) + Fibonacci(n - 2);
}
Time complexity:
T(n) = T(n-1) + T(n-2) + c
     = 2T(n-1) + c            (using the approximation T(n-1) ≈ T(n-2))
     = 2(2T(n-2) + c) + c
     = 4T(n-2) + 3c
     = 8T(n-3) + 7c
     ...
     = 2^k * T(n-k) + (2^k - 1)c
Let's find the value of k for which n - k = 0, i.e., k = n:
T(n) = 2^n * T(0) + (2^n - 1)c
     = 2^n * (1 + c) - c
so T(n) = O(2^n).
Space complexity: the recursion reaches a depth of at most n, so the recursive version uses O(n) stack space.
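For comparison, a minimal non-recursive (iterative) sketch in Python; it runs in O(n) time and O(1) extra space:

# Iterative Fibonacci: O(n) time, O(1) extra space.
def fibonacci_iterative(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

# Example: the first 10 Fibonacci numbers
print([fibonacci_iterative(i) for i in range(10)])  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]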
Conclusion: We have studied Recursive and Non-Recursive way to Calculate Fibonacci Numbers.
Illustration of step 2
Now the min heap contains 5 nodes, where 4 nodes are roots of trees with a single element each, and one heap node is the root of a tree with 3 elements.
Character        Frequency
c                12
d                13
Internal Node    14
e                16
f                45
Step 3: Extract two minimum frequency nodes from heap. Add a new internal node with frequency 12 + 13 = 25
Illustration of step 4
Now the min heap contains 3 nodes.

Character        Frequency
Internal Node    25
Internal Node    30
f                45
Step 5: Extract two minimum frequency nodes. Add a new internal node with frequency 25 + 30 = 55
Illustration of step 6
Now the min heap contains only one node.

Character        Frequency
Internal Node    100
Since the heap contains only one node, the algorithm stops here.
Steps to print codes from the Huffman Tree: traverse the tree starting from the root, maintaining an auxiliary code string. While moving to the left child, append 0 to the code; while moving to the right child, append 1. Print the code when a leaf node is encountered.
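A compact Python sketch of the whole construction using heapq; the frequencies follow the illustrations above, and the values a = 5 and b = 9 are assumed, since they produce the Internal Node of frequency 14 shown in step 2:

import heapq

# Frequencies from the illustration; a = 5 and b = 9 are assumed.
freq = {'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}

# Heap entries: (frequency, tie_breaker, symbol_or_None, left, right)
heap = [(f, i, ch, None, None) for i, (ch, f) in enumerate(freq.items())]
heapq.heapify(heap)
counter = len(heap)

while len(heap) > 1:
    lo = heapq.heappop(heap)   # minimum frequency node
    hi = heapq.heappop(heap)   # second minimum frequency node
    # New internal node whose frequency is the sum of the two
    heapq.heappush(heap, (lo[0] + hi[0], counter, None, lo, hi))
    counter += 1

def print_codes(node, code=""):
    f, _, ch, left, right = node
    if ch is not None:               # leaf: print its code
        print(ch, code)
        return
    print_codes(left, code + "0")    # append 0 while moving left
    print_codes(right, code + "1")   # append 1 while moving right

print_codes(heap[0])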
We can express this fact in the following formula: define c[i, w] to be the solution for items 1, 2, …, i and maximum weight w. Then

c[i, w] = 0                                          if i = 0 or w = 0
c[i, w] = c[i-1, w]                                  if wi > w
c[i, w] = max(c[i-1, w], vi + c[i-1, w - wi])        otherwise

where wi and vi are the weight and value of item i.
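A direct Python sketch of this recurrence; the item values, weights, and capacity below are illustrative, not from the manual:

def knapsack_01(values, weights, W):
    n = len(values)
    # c[i][w] = best value using items 1..i with capacity w
    c = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(W + 1):
            if weights[i - 1] > w:   # item i does not fit
                c[i][w] = c[i - 1][w]
            else:                    # best of skipping or taking item i
                c[i][w] = max(c[i - 1][w],
                              values[i - 1] + c[i - 1][w - weights[i - 1]])
    return c[n][W]

# Illustrative data:
print(knapsack_01(values=[60, 100, 120], weights=[10, 20, 30], W=50))  # 220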
Note here that node 3 and node 5 have been killed after updating U at node 7. Also, node 6 is not explored
further, since adding any more weight exceeds the threshold. At the end, only nodes 6 and 8 remain. Since the
value of U is less for node 8, we select this node. Hence the solution is {1, 1, 0, 1}, and we can see here that the
total weight is exactly equal to the threshold value in this case.
Conclusion: We can solve the 0/1 knapsack problem using the Dynamic Programming and Branch and Bound approaches.
The expected output is a binary matrix that has 1s for the blocks where queens are placed. For example, the
following is the output matrix for the above 4 queen solution.
{ 0, 1, 0, 0}
{ 0, 0, 0, 1}
{ 1, 0, 0, 0}
{ 0, 0, 1, 0}
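A minimal Python backtracking sketch that produces such a matrix, with the first Queen pre-placed in row 0 (the column is a parameter; passing 1 reproduces the 4-Queens solution above):

def solve_n_queens(n, first_col):
    board = [[0] * n for _ in range(n)]
    board[0][first_col] = 1  # the first Queen is pre-placed

    def safe(row, col):
        # Check all already-placed Queens for column/diagonal conflicts
        for r in range(row):
            for c in range(n):
                if board[r][c] == 1 and (c == col or abs(row - r) == abs(col - c)):
                    return False
        return True

    def place(row):
        if row == n:
            return True
        for col in range(n):
            if safe(row, col):
                board[row][col] = 1
                if place(row + 1):
                    return True
                board[row][col] = 0  # backtrack
        return False

    return board if place(1) else None  # fill rows below the pre-placed Queen

for row in solve_n_queens(4, 1):
    print(row)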
Objective:
Students will learn:
1] The basic concept and implementation logic of the linear regression and random forest regression models.
2] Different evaluation metrics used for regression models, like R2, RMSE, etc.
Theory:
Linear Regression:
Linear regression is one of the easiest and most popular Machine Learning algorithms. It is a statistical method
that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables
such as sales, salary, age, product price, etc.
The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more independent (x) variables, hence the name linear regression. Since linear regression shows a linear relationship, it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship between the variables.
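A minimal scikit-learn sketch of fitting such a line; the experience/salary numbers are illustrative only:

import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative data: x = years of experience, y = salary
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([30000, 35000, 41000, 44000, 50000])

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)   # slope and intercept of the fitted line
print(model.predict([[6]]))            # prediction for an unseen x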
Random Forest:
Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can
be used for both Classification and Regression problems in ML. It is based on the concept of ensemble
learning, which is a process of combining multiple classifiers to solve a complex problem and to improve the
performance of the model.
As the name suggests, "Random Forest is a classifier that contains a number of decision trees on various subsets of the given dataset and takes the average to improve the predictive accuracy of that dataset." Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of predictions, predicts the final output. A greater number of trees in the forest leads to higher accuracy and helps prevent the problem of overfitting.
1] RMSE: Root Mean Squared Error is the square root of Mean Squared error. It measures the standard
deviation of residuals.
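A sketch of how both regression models can be fitted and compared with R2 and RMSE in scikit-learn; the data below is synthetic, standing in for the assignment dataset:

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data; in the assignment this comes from the dataset.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.1, 200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, pred))  # RMSE = sqrt(MSE)
    print(type(model).__name__, "R2:", r2_score(y_test, pred), "RMSE:", rmse)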
Conclusion: We have studied the Linear Regression and Random forest algorithm. Also implemented and
evaluated the models using R2 and RMSE scores.
Objective:
Students will learn:
1] The basic concept and implementation logic of K-Nearest Neighbors algorithm.
2] The basic concept and implementation logic of Support Vector Machine algorithm.
3] Different evaluation metrics used for classification models like accuracy, precision, recall, F-score, etc.
Theory:
K-Nearest Neighbor (KNN) Algorithm:
K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique. The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories. K-NN stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using K-NN. K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at the time of classification. At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category that is most similar to the new data.
Example: Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie? To solve this type of problem, we need a K-NN algorithm. With the help of K-NN, we can easily identify the category or class of a particular data point.
SVM algorithm can be used for Face detection, image classification, text categorization, etc.
To analyze the performance of KNN and SVM, use different evaluation metrics like accuracy, precision, recall, F-score, etc.
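A sketch of this comparison in Python with scikit-learn, assuming the emails.csv layout of the linked Kaggle dataset (an email-id first column, word-count feature columns, and a final Prediction label column with 1 = spam):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

# Assumed layout: first column is an email id, last column is the label.
df = pd.read_csv("emails.csv")
X = df.iloc[:, 1:-1]
y = df.iloc[:, -1]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for model in (KNeighborsClassifier(n_neighbors=5), SVC(kernel="linear")):
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(type(model).__name__, "accuracy:", accuracy_score(y_test, pred))
    print(classification_report(y_test, pred))  # precision, recall, F-score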
Conclusion: We have studied the KNN and SVM algorithms, and implemented and evaluated the models using accuracy, precision, recall, and F-score.
Objective:
Students will learn:
1] The basic concept and implementation logic of normalization of data.
2] The basic concept and implementation logic of accuracy score and confusion matrix.
Theory:
Normalization in Machine Learning:
Normalization is one of the most frequently used data preparation techniques in Machine Learning: a scaling technique applied during data preparation to change the values of numeric columns in the dataset to a common scale. It is not necessary for all datasets; it is required only when the features of a machine learning model have different ranges.
Mathematically, we can calculate normalization with the below formula:
Xn = (X - Xminimum) / (Xmaximum - Xminimum)
o Xn = value after normalization
o X = original value of the feature
o Xmaximum = maximum value of the feature
o Xminimum = minimum value of the feature
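A direct Python illustration of this formula on made-up feature values (scikit-learn's MinMaxScaler implements the same computation):

import numpy as np

# Made-up feature values
X = np.array([20.0, 30.0, 40.0, 50.0, 60.0])

# Min-max normalization: Xn = (X - Xmin) / (Xmax - Xmin)
Xn = (X - X.min()) / (X.max() - X.min())
print(Xn)  # [0.   0.25 0.5  0.75 1.  ]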
Accuracy Score:
Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally, accuracy has the following definition:

Accuracy = Number of correct predictions / Total number of predictions

For binary classification, accuracy can also be calculated in terms of positives and negatives as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
The confusion matrix is a matrix used to determine the performance of classification models for a given set of test data. It can be determined only if the true values of the test data are known. The matrix itself is easy to understand, but the related terminology may be confusing. Since it shows the errors in the model's performance in the form of a matrix, it is also known as an error matrix. Some features of the confusion matrix are given below:
o For 2 prediction classes, the matrix is a 2×2 table; for 3 classes, it is a 3×3 table, and so on.
o The matrix is divided into two dimensions: predicted values and actual values, along with the total number of predictions.
o Predicted values are the values predicted by the model, and actual values are the true values for the given observations.
o It looks like the below table:

                      Actual: Positive       Actual: Negative
Predicted: Positive   True Positive (TP)     False Positive (FP)
Predicted: Negative   False Negative (FN)    True Negative (TN)
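A small Python sketch of both computations with scikit-learn, using made-up labels:

from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up ground truth and predictions for a binary classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(accuracy_score(y_true, y_pred))   # fraction of correct predictions
# In scikit-learn, rows are actual classes and columns are predicted classes
print(confusion_matrix(y_true, y_pred))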
Conclusion: We have studied the concept of normalization of data, accuracy score and confusion matrix.
Also implemented and calculated the accuracy score.
Objective:
Students will learn:
1] The basic concept and implementation logic of K-Nearest Neighbors.
2] The basic concept and implementation logic of accuracy, error rate, precision and recall.
Theory:
K-Nearest Neighbor (KNN) Algorithm:
K-Nearest Neighbour is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique. The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories. K-NN stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using K-NN. K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data. It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs an action on it at the time of classification. At the training phase, the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category that is most similar to the new data.
How does K-NN work?
The K-NN working can be explained on the basis of the below algorithm:
• Step-1: Select the number K of the neighbors
• Step-2: Calculate the Euclidean distance of K number of neighbors
• Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
• Step-4: Among these k neighbors, count the number of data points in each category.
• Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
• Step-6: Our model is ready.
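A from-scratch Python sketch of these six steps, using Euclidean distance and a majority vote over illustrative 2-D points:

from collections import Counter
import math

def knn_predict(train_points, train_labels, query, k=3):
    # Steps 2-3: compute Euclidean distances and keep the K nearest
    nearest = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )[:k]
    # Steps 4-5: count labels among the neighbors, pick the majority category
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Illustrative 2-D points from two categories, A and B
points = [(1, 1), (2, 1), (1, 2), (6, 6), (7, 6), (6, 7)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, query=(2, 2), k=3))  # 'A'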
The confusion matrix is a matrix used to determine the performance of classification models for a given set of test data. It can be determined only if the true values of the test data are known. The matrix itself is easy to understand, but the related terminology may be confusing. Since it shows the errors in the model's performance in the form of a matrix, it is also known as an error matrix. Some features of the confusion matrix are given below:
o For 2 prediction classes, the matrix is a 2×2 table; for 3 classes, it is a 3×3 table, and so on.
o The matrix is divided into two dimensions: predicted values and actual values, along with the total number of predictions.
o Predicted values are the values predicted by the model, and actual values are the true values for the given observations.
Accuracy - Accuracy is the most intuitive performance measure: it is simply the ratio of correctly predicted observations to the total observations. One may think that if we have high accuracy, our model is best. Accuracy is a great measure, but only when you have symmetric datasets where the counts of false positives and false negatives are almost the same. Therefore, you have to look at other parameters to evaluate the performance of your model. For our model, we have got 0.803, which means our model is approximately 80% accurate.
Accuracy = (TP + TN) / (TP + FP + FN + TN)
Precision - Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. The question this metric answers is: of all passengers labeled as survived, how many actually survived? High precision relates to a low false positive rate. We have got 0.788 precision, which is pretty good.
Precision = TP / (TP + FP)
Recall (Sensitivity) - Recall is the ratio of correctly predicted positive observations to all observations in the actual class (yes). The question recall answers is: of all the passengers that truly survived, how many did we label? We have got a recall of 0.631, which is good for this model as it's above 0.5.
Recall = TP / (TP + FN)
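These formulas can be computed directly from the four confusion-matrix counts; the counts below are illustrative:

# Illustrative confusion-matrix counts
TP, TN, FP, FN = 40, 45, 10, 5

accuracy   = (TP + TN) / (TP + FP + FN + TN)   # 0.85
error_rate = 1 - accuracy                      # 0.15
precision  = TP / (TP + FP)                    # 0.8
recall     = TP / (TP + FN)                    # ~0.889
print(accuracy, error_rate, precision, recall)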
Conclusion: We have studied the K-Nearest Neighbors algorithm. Also implemented and evaluated the
models using accuracy, error rate, precision, and recall.
Objective:
Students will learn:
1] The basic concept and implementation logic of K-Means clustering/hierarchical clustering.
2] The basic concept of elbow method used to determine the number of clusters.
Theory:
K-Means Clustering Algorithm:
K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into different
clusters. Here K defines the number of pre-defined clusters that need to be created in the process, as if K=2,
there will be two clusters, and for K=3, there will be three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each data point belongs to only one group with similar properties.
It allows us to cluster the data into different groups and provides a convenient way to discover the categories of groups in an unlabeled dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this algorithm is to minimize the sum of distances between each data point and its corresponding cluster centroid.
The algorithm takes the unlabeled dataset as input, divides the dataset into k clusters, and repeats the process until it finds the best clusters. The value of k should be predetermined in this algorithm.
The k-means clustering algorithm mainly performs two tasks:
o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. The data points that are nearest to a particular k-center form a cluster.
Hence each cluster contains data points with some commonalities and is far away from the other clusters.
Elbow Method:
The Elbow method is one of the most popular ways to find the optimal number of clusters. This method uses
the concept of WCSS value. WCSS stands for Within Cluster Sum of Squares, which defines the total
variations within a cluster. The formula to calculate the value of WCSS (for 3 clusters) is given below:
WCSS = Σ(Pi in Cluster1) distance(Pi, C1)^2 + Σ(Pi in Cluster2) distance(Pi, C2)^2 + Σ(Pi in Cluster3) distance(Pi, C3)^2
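In scikit-learn, the WCSS value of a fitted model is exposed as inertia_, so an elbow curve can be sketched as follows (the data here is synthetic):

import numpy as np
from sklearn.cluster import KMeans

# Synthetic 2-D data with three natural groups
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(loc, 0.5, size=(50, 2)) for loc in (0, 5, 10)])

# WCSS (inertia_) for k = 1..10; the "elbow" of the curve suggests the optimal k
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)
print(wcss)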
Conclusion: We have studied the K-Means clustering algorithm and the elbow method to find the optimal number of clusters, and implemented the K-Means clustering algorithm in Python.
Create Your Own Wallet Using MetaMask for Crypto Transactions
Once you've completed the above steps, you'll be able to access your new MetaMask wallet. There are two main components you'll need to familiarize yourself with so that you can begin using the software:
Identifying your public address: This is the address you can freely share with people or platforms like exchanges in order to receive cryptocurrency into your wallet. Think of it as the home address that you share with people to receive inbound mail. It's always advisable, however, to check that any inbound tokens are compatible with MetaMask before receiving them; otherwise, they might be lost forever.
How to fund/buy and send: These are the core functions of MetaMask.
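For assignment 13, a minimal sketch of such a Bank contract, written for an assumed Solidity ^0.8 compiler on a test network (the function names and the per-caller balance mapping are illustrative choices, not prescribed by the manual):

// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Minimal sketch of a customer bank account (assignment 13).
contract Bank {
    mapping(address => uint256) private balances;

    // Deposit money: send ether along with the call
    function deposit() public payable {
        balances[msg.sender] += msg.value;
    }

    // Withdraw money from the caller's account
    function withdraw(uint256 amount) public {
        require(balances[msg.sender] >= amount, "Insufficient balance");
        balances[msg.sender] -= amount;
        payable(msg.sender).transfer(amount);
    }

    // Show the balance of the caller's account
    function showBalance() public view returns (uint256) {
        return balances[msg.sender];
    }
}

Deploying this through a tool such as Remix on a test network lets you exercise deposits, withdrawals, and balance queries and observe the resulting transaction fees.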
Therefore, in a nutshell, all the Blockchain types have their own benefits and advantages, and as a whole the public and private types are considered the major ones in terms of operations. According to experts, security, scalability, and transparency are the main strengths of public and private Blockchains. It is important to note that private blockchains depend on trust in a central operator, while public networks derive their trust from proof-of-work based consensus.
Assignment No.17
Aim: Mini-project on Machine Learning.
Assignment No.18
Aim: Mini-project on Blockchain Technology.