
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY KAKINADA

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


III Year – II Semester

Subject: MACHINE LEARNING

Unit I:
1. Introduction- Artificial Intelligence, Machine Learning, Deep learning, Types of Machine Learning
Systems, Main Challenges of Machine Learning.
2. Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss,
Tradeoffs in Statistical Learning, Estimating Risk Statistics, Sampling distribution of an estimator,
Empirical Risk Minimization.

Unit II:
Supervised Learning (Regression/Classification): Basic Methods: Distance-based Methods, Nearest Neighbours, Decision Trees, Naive Bayes.
Linear Models: Linear Regression, Logistic Regression, Generalized Linear Models, Support Vector Machines.
Binary Classification: Multiclass/Structured Outputs, MNIST, Ranking.

Unit III:
Ensemble Learning and Random Forests: Introduction, Voting Classifiers, Bagging and Pasting,
Random Forests, Boosting, Stacking.
Support Vector Machine: Linear SVM Classification, Nonlinear SVM Classification, SVM Regression, Naïve Bayes Classifiers.

Unit IV:
Unsupervised Learning Techniques: Clustering, K-Means, Limits of K-Means, Using Clustering for
Image Segmentation, Using Clustering for Preprocessing, Using Clustering for Semi-Supervised
Learning, DBSCAN, Gaussian Mixtures.
Dimensionality Reduction: The Curse of Dimensionality, Main Approaches for Dimensionality
Reduction, PCA, Using Scikit-Learn, Randomized PCA, Kernel PCA.

Unit V:
Neural Networks and Deep Learning: Introduction to Artificial Neural Networks with Keras,
Implementing MLPs with Keras, Installing TensorFlow 2, Loading and Preprocessing Data with
TensorFlow.

Text Books:
1. Aurélien Géron, Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition, O’Reilly Publications, 2019.
2. Dirk P. Kroese, Zdravko I. Botev, Thomas Taimre, Radislav Vaisman, Data Science and Machine Learning: Mathematical and Statistical Methods, 25th November 2020.
Reference Books:
1. Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
Unit I
1. Introduction- Artificial Intelligence, Machine Learning, Deep learning,

Types of Machine Learning Systems,

Main Challenges of Machine Learning.

Ans:

Machine learning:

“Machine Learning is broadly defined as the capability of a machine to imitate intelligent human behavior.”

Machine Learning Examples:

1. Healthcare and medical diagnosis

2. Face detection in images

3. Commute predictions

4. Public safety

5. Agriculture

6. Smart assistants

7. Government industry and policymaking

8. Workplace safety

9. Safeguarding the environment

10. Cyber security

Artificial Intelligence, Machine Learning, Deep learning:

Artificial Intelligence: A technique that enables machines to mimic human behaviour.

Machine Learning: A subset of AI that uses statistical methods to enable machines to improve with data and previous experience.

ML ⊆ AI

Deep Learning: A subset of ML that makes the computation of multi-layer neural networks feasible.

DL ⊆ ML

Fig : AI vs ML vs DL.

Types of Machine Learning Systems:

Machine learning is a subset of AI which enables the machine to automatically learn from data, improve performance from past experiences, and make predictions.

1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
The following diagram shows the classification of machine learning.

Supervised Learning:

Supervised Learning is task-driven: the machine learns from labelled examples to perform a specific task.

Example:
• Classification
• Regression
Unsupervised Learning:

Unsupervised Learning is data-driven: the machine finds patterns in unlabelled data.

Example:

• Clustering

Reinforcement Learning:

Reinforcement Learning is learning from interaction: the machine learns from rewards and mistakes made while acting in an environment.

Example:

• Playing Games.

Fig: Types of Machine Learning .

Main Challenges of Machine Learning:

The following are the main challenges of machine learning. They are

• Lack of training data


• Poor quality of data
• Data overfitting
• Data underfitting
• Irrelevant features
• Data security
• Accessibility
• Deployment
• Video training data
• Object detection

2. Statistical Learning: Introduction, Supervised and Unsupervised Learning,


Training and Test Loss, Tradeoffs in Statistical Learning, Estimating Risk
Statistics, Sampling distribution of an estimator, Empirical Risk Minimization.

Ans:

The following are the Statistical learning sub concepts. They are

• Introduction,
• Supervised and Unsupervised Learning,
• Training and Test Loss,
• Tradeoffs in Statistical Learning,
• Estimating Risk,
• Sampling distribution of an estimator,
• Empirical Risk Minimization.

Introduction:

Statistical learning is the application of statistical methods within machine learning.

A machine works with:

Data ----> processed by a collection of algorithms

Previous experience ----> tasks already performed by the machine

Prediction ----> the future task of the machine

Statistical learning models the functional relationship

Y = f(X) + ε

Here, Y is the estimated output,

X is the input given to the system as X1, X2, X3, ..., Xn,

f(X) is the function learned by statistical learning, and

ε is the error term.

Supervised Learning:

“In statistical learning, when a number of inputs X (X1, X2, X3, ..., Xn) are given to the system and the corresponding response Y is observed, the setting is called Supervised Learning.”

Y = f(X) + ε

Fig: Inputs X1, X2, ..., Xn mapped to the estimated output Y with error ε.

Unsupervised Learning:

“In statistical learning, when a number of inputs X (X1, X2, X3, ..., Xn) are given to the system without labelled responses, and the structure discovered is stored in the system, the setting is called Unsupervised Learning.”

Y = f(X) + ε

Fig: Inputs X1, X2, ..., Xn with error ε and no labelled response.
Training and Test Loss:

Training a machine learning (ML) model is the process in which a machine learning algorithm is fed training data from which it can learn.

In simple terms, the Loss function is a method of evaluating how well your
algorithm is modeling your dataset. It is a mathematical function of the
parameters of the machine learning algorithm. In simple linear regression,
predictions are calculated using slope (m) and intercept (c).

Y = mX+c

Here, Y is estimated output

X is number of inputs

m is slope

c is intercept.
Loss Function:

L(f(X), Y), for example the squared-error loss L(f(X), Y) = (Y − f(X))².

The training loss is the average loss over the training data, while the test loss is the average loss over unseen test data; a large gap between the two indicates overfitting.
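A minimal sketch of computing training and test loss for the simple linear model Y = mX + c with the squared-error loss; the data values and the train/test split below are made up for illustration.

Python code (illustrative sketch):

import numpy as np

# Hypothetical data: values and split are made up for illustration.
x_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_train = np.array([2.1, 4.1, 5.9, 8.2, 9.9])
x_test = np.array([6.0, 7.0])
y_test = np.array([12.3, 13.8])

# Fit slope m and intercept c by least squares (degree-1 polynomial).
m, c = np.polyfit(x_train, y_train, deg=1)

def squared_error_loss(x, y):
    """Average squared-error loss L(f(X), Y) = (Y - f(X))^2."""
    y_pred = m * x + c
    return np.mean((y - y_pred) ** 2)

print("Training loss:", squared_error_loss(x_train, y_train))
print("Test loss:", squared_error_loss(x_test, y_test))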
Tradeoffs in Statistical Learning:
It is important to understand prediction errors (bias and variance) when it comes to accuracy in any machine learning algorithm.

Y = f(X) + ε

The prediction error has two components: bias and variance.

Bias:
Bias is the difference between the values predicted by the ML model and the correct values.

High bias gives a large error on both training and test data.

Low bias gives a low error on both training and test data.

With high bias, the predictions follow a straight-line pattern and do not fit the data set accurately. Such fitting is known as Underfitting of Data. It happens when the hypothesis is too simple or linear in nature, for example h(x) = θ0 + θ1x.
Variance:
The variability of the model prediction for a given data point, which tells us the spread of the predictions, is called the variance of the model.

When a model has high variance, it is said to be Overfitting the Data. Overfitting fits the training set very accurately via a complex curve and a high-order hypothesis, but it is not a good solution because the error on unseen data is high. While training a model, the variance should be kept low. In such a problem, the hypothesis is a high-degree polynomial, for example h(x) = θ0 + θ1x + θ2x² + ... + θnxⁿ.

The following diagram shows the relationship between bias and variance:

Fig :Tradeoffs in Statistical Learning
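A small sketch contrasting an underfitting (high-bias) and an overfitting (high-variance) polynomial fit by comparing their training and test errors; the synthetic noisy-sine data generated below is purely an illustrative assumption.

Python code (illustrative sketch):

import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a noisy sine curve (illustrative assumption).
x = rng.uniform(0, 3, 30)
y = np.sin(2 * x) + rng.normal(0, 0.2, 30)
x_train, y_train = x[:20], y[:20]
x_test, y_test = x[20:], y[20:]

def train_test_mse(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    poly = np.poly1d(np.polyfit(x_train, y_train, degree))
    return (np.mean((y_train - poly(x_train)) ** 2),
            np.mean((y_test - poly(x_test)) ** 2))

print("degree 1 (high bias, underfits):", train_test_mse(1))
print("degree 12 (high variance, overfits):", train_test_mse(12))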

Estimating Risk Statistics:

Estimating risk is when you determine the probability of occurrence of harm and the severity of that harm. The risk should be recorded in your hazard traceability matrix or risk analysis. You do this both before and after risk control measures have been implemented.

Risk = Likelihood × Impact

The true (statistical) risk of a prediction function f is its expected loss under the data-generating distribution p(x, y):

R_true(f) = E_p[L(f(X), Y)] = ∫∫ p(x, y) L(f(x), y) dx dy


Sampling distribution of an estimator:

The “sampling distribution” of a statistic (estimator) is a probability distribution that describes the probabilities with which the possible values of that statistic (estimator) occur.

Sampling distribution of the sample:

Mean

Variance

Proportion

Estimators are of two types. They are

1. Biased estimators
• Median
• Mode
• Standard deviation
2. UnBiased estimators
• Mean
• Variance
• Proportion
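A brief sketch that approximates the sampling distribution of the sample mean by bootstrap resampling; the observed sample values below are made up for illustration.

Python code (illustrative sketch):

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical observed sample (illustrative values).
sample = np.array([4.2, 5.1, 6.3, 5.8, 4.9, 5.5, 6.1, 4.7, 5.2, 5.9])

# Draw many bootstrap resamples and record the mean of each one.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(5000)
])

# The spread of boot_means approximates the sampling distribution of the mean.
print("Sample mean:", sample.mean())
print("Estimated standard error of the mean:", boot_means.std(ddof=1))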

Empirical Risk Minimization:

Empirical risk minimization is a principle in statistical learning theory which defines a family of learning algorithms based on evaluating performance over a known and fixed dataset.
Empirical risk (the average loss over the n training examples):

R̂(f) = Ê[L(f(X), Y)] = (1/n) Σᵢ L(f(xᵢ), yᵢ)

Empirical risk minimization (choose the function in the class F with the smallest empirical risk):

f* = arg min_{f ∈ F} R̂(f)
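A minimal sketch of empirical risk minimization with the squared-error loss over a small, finite hypothesis class of linear functions; the data and the candidate set are assumed for illustration.

Python code (illustrative sketch):

import numpy as np

# Hypothetical training data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# A small hypothesis class F: candidate linear functions f(x) = m*x + c.
candidates = [(m, c) for m in (1.0, 1.5, 2.0, 2.5) for c in (0.0, 0.5, 1.0, 1.5)]

def empirical_risk(m, c):
    """Average squared-error loss of f(x) = m*x + c over the training set."""
    return np.mean((y - (m * x + c)) ** 2)

# Empirical risk minimization: pick the candidate with the lowest empirical risk.
best = min(candidates, key=lambda mc: empirical_risk(*mc))
print("f*(x) = %.1f*x + %.1f, empirical risk = %.4f"
      % (best[0], best[1], empirical_risk(*best)))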
UNIT-2

1. Explain the Basic Methods in Supervised Learning.


Ans:
Supervised Learning: Basic Methods:
A. Distance based Methods
B. Nearest Neighbors,
C. Decision Trees
D. Naïve Bayes

A.Distance Based Methods:


Distance methods are used in supervised and unsupervised learning to
calculate similarity in data points.

The four types of distance metrics are

1. Euclidean Distance
2. Manhattan Distance
3. Minkowski Distance
4. Hamming Distance.

1. Euclidean Distance:

Euclidean distance is the straight-line distance between two points P(x1, y1) and Q(x2, y2), where D is the distance and (x1, y1) and (x2, y2) are the Cartesian coordinates of the two points.

Formula and Graph:

D = √((x2 − x1)² + (y2 − y1)²)

Example:
Find the distance between points P(3, 2) and Q(4, 1).
Solution:
PQ = √((4 − 3)² + (1 − 2)²) = √(1 + 1) = √2 units.
2. Manhattan Distance:

It is the sum of the absolute differences between points across all the
dimensions.
• It calculates the distance between real vectors.
• It is also called Taxicab distance or City Block Distance.
Formula (for two real vectors x = (x1, ..., xn) and y = (y1, ..., yn)):

d(x, y) = |x1 − y1| + |x2 − y2| + ... + |xn − yn|

3. Minkowski Distance:
It is the generalization of the Euclidean and Manhattan distances.
Formula (for order p):

D(x, y) = ( Σᵢ |xᵢ − yᵢ|^p )^(1/p)

Setting p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.

Graphs of distances:

Fig: Graphs of Different Distances.


4. Hamming Distance:

The Hamming distance between two strings or vectors of equal length is the number of positions at which the corresponding symbols differ.
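A short sketch computing the four distance metrics with scipy.spatial.distance; the points and vectors below are illustrative values only.

Python code (illustrative sketch):

from scipy.spatial.distance import euclidean, cityblock, minkowski, hamming

# Illustrative points/vectors (made-up values).
u = [3, 2]
v = [4, 1]

print("Euclidean :", euclidean(u, v))        # sqrt((4-3)^2 + (1-2)^2) ≈ 1.414
print("Manhattan :", cityblock(u, v))        # |3-4| + |2-1| = 2
print("Minkowski :", minkowski(u, v, p=3))   # order-3 Minkowski distance

# Hamming: scipy returns the *fraction* of differing positions,
# so multiply by the length to get the count of mismatches.
a = [1, 0, 1, 1, 0]
b = [1, 1, 1, 0, 0]
print("Hamming   :", hamming(a, b) * len(a)) # 2 positions differ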

B.Nearest Neighbors:

• K-Nearest Neighbour (K-NN) is one of the simplest machine learning algorithms, based on the supervised learning technique.
• The K-NN algorithm can be used for regression as well as classification, but it is mostly used for classification problems.
• K-NN is a non-parametric algorithm.
• It is also called a lazy learner algorithm. A minimal usage sketch follows.
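A minimal K-NN classification sketch using scikit-learn; the toy training data here is made up for illustration.

Python code (illustrative sketch):

from sklearn.neighbors import KNeighborsClassifier

# Toy training data: two features per sample, two classes (illustrative values).
X_train = [[1, 2], [2, 3], [3, 3], [6, 7], [7, 8], [8, 8]]
y_train = [0, 0, 0, 1, 1, 1]

# k = 3 nearest neighbours, Euclidean distance by default.
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict the class of a new point by majority vote of its 3 nearest neighbours.
print(knn.predict([[2, 2], [7, 7]]))  # expected: [0 1]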
C.Decision Trees:

Decision Tree Terminologies


• Root Node: Root node is from where the decision tree starts. It represents the entire dataset, which
further gets divided into two or more homogeneous sets.

• Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after
getting a leaf node.

• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to
the given conditions.

• Branch/Sub Tree: A tree formed by splitting the tree.

• Pruning: Pruning is the process of removing the unwanted branches from the tree.

• Parent/Child node: The root node of the tree is called the parent node, and other nodes are called
the child nodes.

Example:
Attribute Selection Measures:
While implementing a decision tree, the main issue is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure, or ASM.

o Information Gain
o Gini Index

Formula:

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]

Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in data. Entropy can be calculated as:

Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no)

Where,

o S = the total set of samples

o P(yes), P(no) = probability of "yes" and "no" samples in S

Gini Index:
Gini Index = 1 − Σⱼ Pⱼ²
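A small sketch computing entropy and the Gini index for a node; the class distribution (9 "yes" and 5 "no" samples) is an assumed example.

Python code (illustrative sketch):

import numpy as np

def entropy(class_counts):
    """Entropy(S) = -sum(p_i * log2(p_i)) over the classes present."""
    p = np.array(class_counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                       # ignore empty classes (0*log0 = 0)
    return -np.sum(p * np.log2(p))

def gini(class_counts):
    """Gini index = 1 - sum(p_j^2)."""
    p = np.array(class_counts, dtype=float)
    p = p / p.sum()
    return 1.0 - np.sum(p ** 2)

# Assumed node with 9 "yes" and 5 "no" samples.
print("Entropy:", entropy([9, 5]))   # ≈ 0.940
print("Gini   :", gini([9, 5]))      # ≈ 0.459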

Pruning: Getting an Optimal Decision tree:


Pruning is a process of deleting the unnecessary nodes from a tree in order to get
the optimal decision tree.

o Cost Complexity Pruning


o Reduced Error Pruning.
Advantages of the Decision Tree
o It is simple to understand, as it follows the same process which a human follows while making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps to think about all the possible outcomes for a problem.
o There is less requirement of data cleaning compared to other algorithms.

Disadvantages of the Decision Tree


o The decision tree contains lots of layers, which makes it complex.
o It may have an overfitting issue, which can be resolved using the Random
Forest algorithm.
o For more class labels, the computational complexity of the decision tree
may increase.

D.Naïve Bayes:

o Naïve Bayes algorithm is a supervised learning algorithm, which is based on Bayes' theorem and used for solving classification problems.
o It is mainly used in text classification that includes a high-dimensional training
dataset.
o Naïve Bayes Classifier is one of the simple and most effective Classification
algorithms which helps in building the fast machine learning models that can
make quick predictions.
o It is a probabilistic classifier, which means it predicts on the basis of the
probability of an object.
o Some popular examples of Naïve Bayes Algorithm are spam filtration,
Sentimental analysis, and classifying articles.

Bayes' Theorem:

P(A|B) = P(B|A) P(A) / P(B)

Where,

P(A|B) is Posterior probability: Probability of hypothesis A on the observed event B.

P(B|A) is Likelihood probability: Probability of the evidence given that the probability
of a hypothesis is true.

P(A) is Prior Probability: Probability of hypothesis before observing the evidence.

P(B) is Marginal Probability: Probability of Evidence.
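A minimal Gaussian Naïve Bayes sketch with scikit-learn; the toy numeric data below is assumed purely for illustration (text classification would typically use MultinomialNB instead).

Python code (illustrative sketch):

from sklearn.naive_bayes import GaussianNB

# Toy numeric features and binary labels (illustrative values only).
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [4.9, 6.1], [5.2, 5.8], [5.0, 6.0]]
y = [0, 0, 0, 1, 1, 1]

# GaussianNB applies Bayes' theorem assuming features are independent
# and normally distributed within each class.
model = GaussianNB()
model.fit(X, y)

print(model.predict([[1.1, 2.0], [5.1, 6.0]]))   # expected: [0 1]
print(model.predict_proba([[1.1, 2.0]]))         # posterior P(class | features)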

2. Explain the concept of Linear Models in machine learning.

Ans:

The following models are used as linear models in machine learning. They are

A. Linear Regression
B. Logistic Regression
C. Generalized Linear Models
D. Support Vector Machines

Regression:
“The term regression is to find the relationship between two or more variables.”

A.Linear Regression
Linear regression uses the relationship between the data points to draw a straight line through them.

The linear regression equation is:

Y = mX + c (slope m, intercept c)

Python Code:
import matplotlib.pyplot as plt
from scipy import stats

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11, 12, 9, 6]
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Fit a straight line y = slope*x + intercept by least squares.
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

# Predicted y value for every observed x.
mymodel = list(map(myfunc, x))

plt.scatter(x, y)        # the data points
plt.plot(x, mymodel)     # the fitted regression line
plt.show()

Output: a scatter plot of the data points with the fitted regression line drawn through them.
B.Logistic Regression
Logistic regression aims to solve classification problems. It does this by
predicting categorical outcomes, unlike linear regression that predicts a
continuous outcome.

Fig: Illustration of Logistic Regression.

Python code:

import numpy
from sklearn import linear_model

# One feature column (reshaped to 2D) and binary class labels (0/1).
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69, 5.88]).reshape(-1, 1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# The coefficient is the change in log-odds per unit of X;
# exponentiating it gives the odds ratio.
log_odds = logr.coef_
odds = numpy.exp(log_odds)
print(odds)

output:

[[4.03541657]]

C.Generalized Linear Models:


Generalized Linear Models (GLMs) are a class of regression models that can
be used to model a wide range of relationships between a response variable
and one or more predictor variables.

Some of the features of GLMs include:

• Flexibility
• Model interpretability
• Robustness
• Scalability
• Ease of use
• Hypothesis testing
• Regularization
• Model comparison

Some of the disadvantages of GLMs include:

• Assumptions
• Model specification
• Overfitting
• Limited flexibility
• Data requirements
• Model assumptions
Poisson regression is an example of generalized linear models (GLM). There are
three components in generalized linear models.

• Linear predictor
• Link function
• Probability distribution

Poisson Distribution Formula:

P(X = k) = (λᵏ e^(−λ)) / k!,  k = 0, 1, 2, ...

GLMs can be used to construct the models for regression and classification
problems by using the type of distribution which best describes the data or
labels given for training the model.

1. Binary classification data – Bernoulli distribution


2. Real valued data – Gaussian distribution
3. Count-data – Poisson distribution
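A brief Poisson-regression sketch (a GLM for count data) using scikit-learn's PoissonRegressor; the feature values and counts below are made up for illustration.

Python code (illustrative sketch):

import numpy as np
from sklearn.linear_model import PoissonRegressor

# Hypothetical count data: one feature (e.g. an exposure measure) and observed counts.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 1, 1, 3, 4, 7])

# PoissonRegressor is a GLM with a Poisson distribution and a log link:
# log(E[y | X]) = linear predictor.
glm = PoissonRegressor(alpha=0.0)
glm.fit(X, y)

print("Coefficient:", glm.coef_, "Intercept:", glm.intercept_)
print("Predicted mean counts:", glm.predict([[2.5], [7.0]]))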
D.Support Vector Machines:

Support Vector Machine:


Support Vector Machine (SVM) is a supervised machine learning algorithm
used for both classification and regression.
The main objective of the SVM algorithm is to find the
optimal hyperplane in an N-dimensional space that can separate the data points
in different classes in the feature space.
Support Vector Machine Terminology:

• Hyperplane
• Support Vectors
• Margin
• Kernel
• Hard Margin
• Soft Margin
• C
• Hinge Loss
• Dual Problem

Fig: Support Vector Machine .


3.Explain the concepts of Binary Classification:

Ans:

The concept of binary classification contains the following sub-topics. They are

• Multiclass/Structured outputs
• MNIST,
• Ranking.

Structured outputs: in multiclass classification the model assigns each input to one of more than two classes, while in structured-output prediction the target is a structured object (for example a sequence, a tree, or a ranking) rather than a single label.
MNIST:

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits that is commonly used for training various image processing systems.

Fig: MNIST Data set.
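A short sketch of loading MNIST and training a binary classifier (digit '5' versus not-'5') with scikit-learn; fetch_openml downloads the dataset, so internet access is assumed.

Python code (illustrative sketch):

from sklearn.datasets import fetch_openml
from sklearn.linear_model import SGDClassifier

# Download the 70,000 MNIST images (28x28 pixels flattened to 784 features).
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
X, y = mnist.data, mnist.target

X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

# Binary classification: is the digit a '5' or not?
y_train_5 = (y_train == '5')
y_test_5 = (y_test == '5')

clf = SGDClassifier(random_state=42)
clf.fit(X_train, y_train_5)
print("Test accuracy:", clf.score(X_test, y_test_5))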

Ranking:
Ranking is a type of supervised machine learning (ML) that uses labeled datasets to train models that order (rank) future data and predict outcomes.

Ranking was first largely deployed within search engines. People search for a topic, the ranking algorithm reorders the search results (for example, based on PageRank), and the search engine displays the most relevant results to its users.
Ranking models are made up of 2 main factors:

1. queries and
2. documents.

Fig: Ranking Model by Searching.


Unit -3

1. Explain the concept of Ensemble Learning and Random Forests.

Ans:

Introduction :

Ensemble Learning:

A group of predictors is called an Ensemble; thus this technique is called Ensemble Learning, and an Ensemble Learning algorithm is called an Ensemble method.

Fig: Ensemble model diagram.

Ensemble Methods :

The following are the ensemble methods in Machine Learning. They are

• Voting Classifiers
• Bagging and Pasting
  o Bagging (Bootstrap Aggregating)
  o Pasting
  o Out-of-Bag Evaluation
  o Random Patches and Random Subspaces
• Random Forests
  o Extra-Trees
  o Feature Importance
• Boosting
  o AdaBoost
  o Gradient Boosting
• Stacking

Voting Classifier:

A Voting Classifier is a machine learning model that trains an ensemble of numerous models and predicts an output (class) based on the class with the highest probability of being chosen as the output.

Fig: Voting Classifier diagram.

A Voting Classifier supports two types of voting. They are

• Hard Voting: In hard voting, the predicted output class is a class with the
highest majority of votes.
• Soft Voting: In soft voting, the output class is the prediction based on the
average of probability given to that class.

For Example:

Identify the Soft and Hard Voting of Classifiers from the given data set:
Classifier     | P(A) | P(B)
Classifier 1   | 0.7  | 0.3
Classifier 2   | 0.1  | 0.9
Classifier 3   | 0.6  | 0.4

Solution:

Hard voting: each classifier votes for its most probable class: Classifier 1 → A, Classifier 2 → B, Classifier 3 → A. The majority vote is A.

Soft voting: average the class probabilities: P(A) = (0.7 + 0.1 + 0.6)/3 ≈ 0.47 and P(B) = (0.3 + 0.9 + 0.4)/3 ≈ 0.53, so the prediction is B.

Answer: hard voting predicts class A; soft voting predicts class B.
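A short scikit-learn sketch of hard and soft voting; the Iris dataset and the three base estimators are assumed purely for illustration.

Python code (illustrative sketch):

from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

estimators = [('lr', LogisticRegression(max_iter=1000)),
              ('dt', DecisionTreeClassifier()),
              ('nb', GaussianNB())]

# voting='hard' takes the majority class; voting='soft' averages predicted probabilities.
for voting in ('hard', 'soft'):
    clf = VotingClassifier(estimators=estimators, voting=voting)
    clf.fit(X_train, y_train)
    print(voting, "voting accuracy:", clf.score(X_test, y_test))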
Bagging and Pasting:

Bagging means bootstrap + aggregating. It is an ensemble method in which we first bootstrap (resample with replacement) the data and, for each bootstrap sample, train one model. After that, we aggregate the models with equal weights.

When the sampling is performed without replacement, the method is called pasting.

Bagging = Bootstrap sampling (with replacement) + Aggregation

Pasting:

Pasting = Sampling without replacement + Aggregation

Out-of-Bag Scoring

If we are using bagging, there is a chance that a given sample is never selected, while others may be selected multiple times.

In a single draw, the probability of not selecting a specific sample is (1 − 1/n), where n is the number of samples.

Therefore, the probability that a specific sample is never picked in n draws is (1 − 1/n)^n.

When n is large, this probability approaches 1/e, which is approximately 0.368.

This means that when the dataset is big enough, about 37% of the samples are never selected for a given predictor, and these can be used to test that model.

This is called Out-of-Bag scoring, or OOB Scoring.
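A scikit-learn sketch of bagging with out-of-bag evaluation; the Iris dataset is assumed as convenient sample data, and setting bootstrap=False would give pasting instead. (On scikit-learn versions before 1.2 the first keyword is base_estimator rather than estimator.)

Python code (illustrative sketch):

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 100 decision trees, each trained on a bootstrap sample (with replacement).
# oob_score=True evaluates each tree on the ~37% of samples it never saw.
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=100,
                        bootstrap=True,       # bootstrap=False -> pasting
                        oob_score=True,
                        random_state=42)
bag.fit(X, y)

print("Out-of-bag accuracy:", bag.oob_score_)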

Random Forests
As the name suggests, a random forest is an ensemble of decision trees that can be used for classification or regression. In most cases it uses bagging: each tree in the forest outputs a prediction, and the most voted prediction becomes the output of the model. This makes the model more accurate and stable and helps prevent overfitting.
Another very useful property of random forests is the ability to measure the relative importance of each feature by calculating how much each one reduces the impurity of the model. This is called feature importance.

Fig: Random Forest Diagram
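A brief random-forest sketch with feature importances, again assuming the Iris dataset purely for illustration.

Python code (illustrative sketch):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, y = iris.data, iris.target

# An ensemble of 200 randomized decision trees trained on bootstrap samples.
forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X, y)

# Feature importance: how much each feature reduces impurity, averaged over all trees.
for name, importance in zip(iris.feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")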

Boosting and stacking:


Bagging, boosting and stacking are the three most popular ensemble learning
techniques. Each of these techniques offers a unique approach to improving
predictive accuracy.

Boosting trains predictors sequentially, each new predictor correcting the errors of its predecessors; it is mainly used to reduce the bias of weak learners.


Fig : Boosting in Ensemble Learning.
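An AdaBoost sketch in scikit-learn, with the Iris dataset assumed as sample data; each new stump focuses on the samples the previous ones got wrong. (On scikit-learn versions before 1.2 the first keyword is base_estimator rather than estimator.)

Python code (illustrative sketch):

from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Weak learners: decision stumps (depth-1 trees) combined sequentially.
ada = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                         n_estimators=100,
                         learning_rate=0.5,
                         random_state=42)
ada.fit(X, y)
print("Training accuracy:", ada.score(X, y))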

Stacking:
Stacking is used to improve the overall accuracy of strong learners: a final meta-learner (blender) is trained to combine the base learners' predictions.

Fig: Stacking in Ensemble Learning.

2. Explain the concept of Support Vector Machine(SVM).

Ans:

The following are the concepts of SVM. They are


• Linear SVM Classification
• Non-Linear SVM Classification
• SVM Regression
• Naïve Bayes Classification

Support vector machines are broadly classified into two types:

1. Simple or linear SVM and


2. Kernel or non-linear SVM.

Mathematical intuition of Support Vector Machine:

The equation of the linear hyperplane can be written as:

wᵀx + b = 0

The distance between a data point xᵢ and the decision boundary can be calculated as:

dᵢ = (wᵀxᵢ + b) / ||w||

where ||w|| represents the Euclidean norm of the weight (normal) vector w.

For a linear SVM classifier, the prediction is:

ŷ = 1 if wᵀx + b ≥ 0, otherwise ŷ = 0

Fig: Linear SVM diagram.

Kernel or non-linear SVM:

The main difference is that in the case of linear classification, data is separated using a hyperplane, whereas in the non-linear case kernels are used to transform the data so that it becomes separable. A minimal sketch follows.
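A minimal comparison of a linear and an RBF-kernel SVM on data that is not linearly separable; the moons dataset and the parameter values are illustrative assumptions.

Python code (illustrative sketch):

from sklearn.datasets import make_moons
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Two interleaving half-circles: not separable by a straight line.
X, y = make_moons(n_samples=300, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

linear_svm = SVC(kernel='linear', C=1.0).fit(X_train, y_train)
kernel_svm = SVC(kernel='rbf', gamma=2.0, C=1.0).fit(X_train, y_train)

print("Linear SVM accuracy:", linear_svm.score(X_test, y_test))
print("RBF-kernel SVM accuracy:", kernel_svm.score(X_test, y_test))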
SVM regression:
SVM regression or Support Vector Regression (SVR) is a machine
learning algorithm used for regression analysis.

Naïve Bayes Classification:


The Naïve Bayes classifier is a supervised machine learning
algorithm, which is used for classification tasks, like text classification.

Fig: Naïve Bayes Classifier in ML.


3.Explain the differences between linear and nonlinear classifier.

Ans:

S.No | Linear Classification | Non-Linear Classification
1 | Linear classification refers to categorizing a set of data points into a discrete class based on a linear combination of its explanatory variables. | Non-linear classification refers to categorizing those instances that are not linearly separable.
2 | It is possible to classify the data with a straight line. | It is not easy to classify the data with a straight line.
3 | Data is classified with the help of a hyperplane. | Kernels are used to transform non-separable data into separable data.
4 | Popular linear classifiers: (i) Naive Bayes, (ii) Logistic Regression, (iii) Support Vector Machine (linear kernel). | Popular non-linear classifiers: (i) Multi-Layer Perceptron (MLP), (ii) Decision Tree, (iii) Random Forests, (iv) K-Nearest Neighbors.
