ML Feb Unit 1 Notes-Merged
Unit I:
1. Introduction- Artificial Intelligence, Machine Learning, Deep learning, Types of Machine Learning
Systems, Main Challenges of Machine Learning.
2. Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss,
Tradeoffs in Statistical Learning, Estimating Risk Statistics, Sampling distribution of an estimator,
Empirical Risk Minimization.
Unit II:
Supervised Learning (Regression/Classification): Basic Methods: Distance-based Methods, Nearest
Neighbours, Decision Trees, Naive Bayes,
Linear Models: Linear Regression, Logistic Regression, Generalized Linear Models, Support Vector
Machines,
Binary Classification: Multiclass/Structured outputs, MNIST, Ranking.
Unit III:
Ensemble Learning and Random Forests: Introduction, Voting Classifiers, Bagging and Pasting,
Random Forests, Boosting, Stacking.
Support Vector Machine: Linear SVM Classification, Nonlinear SVM Classification SVM Regression,
Naïve Bayes Classifiers.
Unit IV:
Unsupervised Learning Techniques: Clustering, K-Means, Limits of K-Means, Using Clustering for
Image Segmentation, Using Clustering for Preprocessing, Using Clustering for Semi-Supervised
Learning, DBSCAN, Gaussian Mixtures.
Dimensionality Reduction: The Curse of Dimensionality, Main Approaches for Dimensionality
Reduction, PCA, Using Scikit-Learn, Randomized PCA, Kernel PCA.
Unit V:
Neural Networks and Deep Learning: Introduction to Artificial Neural Networks with Keras,
Implementing MLPs with Keras, Installing TensorFlow 2, Loading and Preprocessing Data with
TensorFlow.
Text Books:
1. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, Aurélien Géron, 2nd Edition,
O'Reilly Publications, 2019
2. Data Science and Machine Learning: Mathematical and Statistical Methods, Dirk P. Kroese,
Zdravko I. Botev, Thomas Taimre, Radislav Vaisman, 25th November 2020
Reference Books:
1. Machine Learning: A Probabilistic Perspective, Kevin P. Murphy, MIT Press, 2012.
Unit I
1. Introduction- Artificial Intelligence, Machine Learning, Deep learning,
Ans:
Machine learning:
Machine learning is the field of study that gives computers the ability to learn from data without being
explicitly programmed. Examples of its applications include:
• Commute predictions
• Public safety
• Agriculture
• Smart assistants
• Workplace safety
ML ⊆ AI (machine learning is a subset of artificial intelligence).
Deep Learning: a subset of ML that makes the computation of multi-layer
neural networks feasible.
DL ⊆ ML
Fig : AI vs ML vs DL.
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
The following diagram shows the classification of machine learning.
Supervised Learning: the model is trained on data that includes the desired outputs (labels).
Example:
• Classification
• Regression
Unsupervised Learning: the model is trained on unlabeled data and must find structure on its own.
Example:
• Clustering
Reinforcement Learning: an agent learns by interacting with an environment and receiving rewards or penalties.
Example:
• Playing Games.
The following are the main challenges of machine learning. They are:
• Insufficient quantity of training data
• Non-representative training data
• Poor-quality data
• Irrelevant features
• Overfitting the training data
• Underfitting the training data
2. Statistical Learning: Introduction, Supervised and Unsupervised Learning, Training and Test Loss,
Tradeoffs in Statistical Learning, Estimating Risk, Sampling distribution of an estimator, Empirical Risk
Minimization.
Ans:
The following are the Statistical learning sub concepts. They are
• Introduction,
• Supervised and Unsupervised Learning,
• Training and Test Loss,
• Tradeoffs in Statistical Learning,
• Estimating Risk,
• Sampling distribution of an estimator,
• Empirical Risk Minimization.
Introduction:
Y = f(X) + ε
Here, Y is the estimated output (response), f is the unknown function relating the inputs X to Y,
and ε is the random error term.
Supervised Learning:
Y = f(X) + ε
Fig: In supervised learning, the inputs X1, X2, ..., Xn are used to produce an estimated output Y, with error ε.
Unsupervised Learning:
Fig: In unsupervised learning, only the inputs X1, X2, ..., Xn are available; there is no labelled output to predict.
Training and Test Loss:
In simple terms, the Loss function is a method of evaluating how well your
algorithm is modeling your dataset. It is a mathematical function of the
parameters of the machine learning algorithm. In simple linear regression,
predictions are calculated using slope (m) and intercept (c).
Y = mX + c
where X is the input, m is the slope, and c is the intercept.
Loss Function:
L(f(x), y) measures the penalty for predicting f(x) when the true value is y. For regression, a common
choice is the squared-error loss, L(f(x), y) = (y − f(x))². The training loss is the average loss over the
training data, and the test loss is the average loss over unseen test data.
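As a concrete illustration, the following minimal sketch computes the average squared-error loss on a training set and on a separate test set. The data values and the line Y = 2X + 1 are hypothetical assumptions, not taken from the notes.
import numpy as np

# Hypothetical training and test data (assumption for illustration)
x_train = np.array([1.0, 2.0, 3.0, 4.0])
y_train = np.array([3.1, 4.9, 7.2, 9.1])
x_test = np.array([5.0, 6.0])
y_test = np.array([11.2, 12.8])

m, c = 2.0, 1.0               # assumed slope and intercept of the fitted line

def predict(x):
    return m * x + c          # Y = mX + c

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)   # average squared-error loss

print("Training loss:", mse(y_train, predict(x_train)))
print("Test loss:", mse(y_test, predict(x_test)))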
Tradeoffs in Statistical Learning:
It is important to understand prediction errors (bias and variance) when it
comes to accuracy in any machine learning algorithm.
Y = f(X) + ε
The prediction error has two main components: bias and variance.
Bias:
Bias is the difference between the values predicted by the ML model and the correct values.
With high bias, the model predicts the data in an overly simple (for example, straight-line) form and
therefore does not fit the data set accurately. Such fitting is known as Underfitting of data. This happens
when the hypothesis is too simple or linear in nature. Refer to the graph given below for an example of
such a situation.
Variance:
Variance is the variability of the model's prediction for a given data point; it tells us how spread out the
predictions are. A model with high variance fits the training data very closely but fails to generalize to
new data, which is known as Overfitting.
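A small sketch of this tradeoff on a hypothetical noisy dataset (all values assumed for illustration): a degree-1 polynomial has high bias and underfits, while a degree-10 polynomial has high variance, so its training error is low but its test error is much higher.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 30)
y = np.sin(2 * x) + rng.normal(scale=0.2, size=30)   # hypothetical noisy data
x_tr, y_tr = x[:20], y[:20]                          # training split
x_te, y_te = x[20:], y[20:]                          # test split

for degree in (1, 10):
    coeffs = np.polyfit(x_tr, y_tr, degree)          # fit a polynomial of this degree
    train_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")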
Sampling distribution of an estimator:
The common point estimators are the mean, variance, and proportion. Estimators can be classified as:
1. Biased estimators
• Median
• Mode
• Standard deviation
2. Unbiased estimators
• Mean
• Variance
• Proportion
Estimating Risk:
The risk of a prediction function f is the expected loss, R(f) = E[L(f(X), Y)]. Because the true data
distribution is unknown, the risk is estimated by the empirical (training) risk:
R̂(f) = (1/n) Σ L(f(xᵢ), yᵢ), where the sum runs over the n training examples.
Empirical Risk Minimization:
Empirical risk minimization selects the function in the class F with the smallest empirical risk:
f* = argmin over f ∈ F of R̂(f)
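A minimal sketch of empirical risk minimization under squared-error loss, assuming a small hypothetical candidate class F of three linear functions and a hypothetical training sample: the function with the smallest empirical risk is selected.
import numpy as np

# Hypothetical training sample (assumption for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.9, 5.1, 7.2, 8.8, 11.1])

# A small candidate class F of linear functions (assumed for illustration)
F = {
    "f1: y = x":      lambda v: v,
    "f2: y = 2x + 1": lambda v: 2 * v + 1,
    "f3: y = 3x - 1": lambda v: 3 * v - 1,
}

def empirical_risk(f):
    # R_hat(f) = (1/n) * sum of squared-error losses over the training set
    return np.mean((y - f(x)) ** 2)

f_star = min(F, key=lambda name: empirical_risk(F[name]))
print("Function selected by ERM:", f_star)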
UNIT-2
A. Distance-based Methods:
The following distance measures are commonly used:
1. Euclidean Distance
2. Manhattan Distance
3. Minkowski Distance
4. Hamming Distance.
1. Euclidean Distance:
It is the straight-line distance between two points P(x1, y1) and Q(x2, y2), where D is the distance and
(x1, y1) and (x2, y2) are the Cartesian coordinates of the two points.
Formula: D = √((x2 − x1)² + (y2 − y1)²)
2. Manhattan Distance:
It is the sum of the absolute differences between points across all the dimensions.
• It calculates the distance between real vectors.
• It is also called Taxicab distance or City Block Distance.
Formula: D = |x2 − x1| + |y2 − y1|
3. Minkowski Distance:
It is the generalization of Euclidean and Manhattan Distance.
Formula: D = (Σ |xᵢ − yᵢ|^p)^(1/p); p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.
4. Hamming Distance:
It is the number of positions at which two vectors (or strings) of equal length differ; it is commonly used
for categorical or binary data.
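The distance measures above can be computed directly. A minimal sketch using two hypothetical points and scipy.spatial.distance (the Minkowski order p = 3 is an arbitrary assumed choice):
from scipy.spatial import distance

p_point = [1, 2]   # P(x1, y1) -- hypothetical point
q_point = [4, 6]   # Q(x2, y2) -- hypothetical point

print("Euclidean:", distance.euclidean(p_point, q_point))   # sqrt((4-1)^2 + (6-2)^2) = 5.0
print("Manhattan:", distance.cityblock(p_point, q_point))   # |4-1| + |6-2| = 7
print("Minkowski (p=3):", distance.minkowski(p_point, q_point, p=3))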
Graphs of distances:
B. Nearest Neighbours:
K-Nearest Neighbours (KNN) predicts the output for a new data point from the outputs of its k closest
training points, using one of the distance measures above; a minimal sketch follows.
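A minimal k-nearest-neighbours sketch with scikit-learn on a tiny hypothetical dataset (k = 3 is an assumed choice):
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D training points and class labels (assumption for illustration)
X_train = [[1, 1], [1, 2], [2, 2], [6, 6], [7, 7], [8, 6]]
y_train = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3)   # vote among the 3 closest training points
knn.fit(X_train, y_train)
print(knn.predict([[2, 1], [7, 6]]))        # expected output: [0 1]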
C. Decision Trees:
The following are important terms in a decision tree:
• Leaf Node: Leaf nodes are the final output nodes, and the tree cannot be segregated further after
getting a leaf node.
• Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to
the given conditions.
• Pruning: Pruning is the process of removing unwanted branches from the tree.
• Parent/Child node: A node that is divided into sub-nodes is called the parent node, and the sub-nodes
are called the child nodes.
Example:
Attribute Selection Measures:
While implementing a decision tree, the main issue is how to select the best attribute for the root node
and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection
Measure (ASM). The two popular ASM techniques are:
o Information Gain
o Gini Index
Formula:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Where,
Entropy(S) = −P(yes) log₂ P(yes) − P(no) log₂ P(no), S is the total number of samples, and P(yes) and
P(no) are the probabilities of the two classes.
Gini Index:
Gini Index = 1 − Σⱼ Pⱼ²
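A small sketch computing entropy and the Gini index from the formulas above, for a hypothetical node containing 9 "yes" and 5 "no" samples (the counts are assumptions for illustration):
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

# Hypothetical node with 9 positive and 5 negative samples (assumption)
print("Entropy:", round(entropy([9, 5]), 4))     # about 0.9403
print("Gini index:", round(gini([9, 5]), 4))     # about 0.4592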
D. Naïve Bayes:
Bayes' Theorem:
P(A|B) = P(B|A) P(A) / P(B)
Where,
P(A|B) is Posterior probability: Probability of hypothesis A given the observed evidence B.
P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.
P(A) is Prior probability: Probability of the hypothesis before observing the evidence.
P(B) is Marginal probability: Probability of the evidence.
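A minimal Gaussian Naive Bayes sketch with scikit-learn on a hypothetical toy dataset (all values are assumptions for illustration):
from sklearn.naive_bayes import GaussianNB

# Hypothetical feature vectors and class labels (assumption)
X = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [4.0, 5.1], [4.2, 4.8], [3.9, 5.0]]
y = [0, 0, 0, 1, 1, 1]

nb = GaussianNB()
nb.fit(X, y)                        # estimates per-class feature means and variances
print(nb.predict([[1.1, 2.0]]))     # expected output: [0]
print(nb.predict_proba([[1.1, 2.0]]))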
Ans:
The following are the linear models used in machine learning. They are
A. Linear Regression
B. Logistic Regression
C. Generalized Linear Models
D. Support Vector Machines
Regression:
“The term regression refers to finding the relationship between two or more variables.”
A. Linear Regression
Linear regression uses the relationship between the data points to draw a
straight line through all of them.
import matplotlib.pyplot as plt
from scipy import stats

x = [5,7,8,7,2,17,2,9,4,11,12,9,6]
y = [99,86,87,88,111,86,103,87,94,78,77,85,86]

# Fit a straight line y = slope*x + intercept to the data points
slope, intercept, r, p, std_err = stats.linregress(x, y)

def myfunc(x):
    return slope * x + intercept

mymodel = list(map(myfunc, x))   # predicted y-value for every x

plt.scatter(x, y)
plt.plot(x, mymodel)
plt.show()
output:
B.Logistic Regression
Logistic regression aims to solve classification problems. It does this by
predicting categorical outcomes, unlike linear regression that predicts a
continuous outcome.
Python code:
import numpy
from sklearn import linear_model

# X must be a 2-D array for scikit-learn, hence reshape(-1, 1)
X = numpy.array([3.78, 2.44, 2.09, 0.14, 1.72, 1.65, 4.92, 4.37, 4.96, 4.52, 3.69,
5.88]).reshape(-1,1)
y = numpy.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

logr = linear_model.LogisticRegression()
logr.fit(X, y)

# The coefficient is the log-odds; exponentiating gives the odds ratio
log_odds = logr.coef_
odds = numpy.exp(log_odds)
print(odds)
output:
[[4.03541657]]
C. Generalized Linear Models
Advantages of GLMs:
• Flexibility
• Model interpretability
• Robustness
• Scalability
• Ease of use
• Hypothesis testing
• Regularization
• Model comparison
Limitations of GLMs:
• Assumptions
• Model specification
• Overfitting
• Limited flexibility
• Data requirements
• Model assumptions
Poisson regression is an example of generalized linear models (GLM). There are
three components in generalized linear models.
• Linear predictor
• Link function
• Probability distribution
GLMs can be used to construct the models for regression and classification
problems by using the type of distribution which best describes the data or
labels given for training the model.
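A minimal GLM sketch, assuming scikit-learn's PoissonRegressor (linear predictor, log link, Poisson distribution) and hypothetical count data; the values are assumptions for illustration.
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Hypothetical data: a single feature and non-negative count targets (assumption)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1, 1, 2, 4, 6, 10])

glm = PoissonRegressor()    # linear predictor + log link + Poisson distribution
glm.fit(X, y)
print(glm.predict([[7]]))   # predicted count for a new observation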
D. Support Vector Machines
The following are the key terms used in SVMs (a minimal usage sketch follows this list):
• Hyperplane
• Support Vectors
• Margin
• Kernel
• Hard Margin
• Soft Margin
• C
• Hinge Loss
• Dual Problem
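A minimal soft-margin SVM sketch with scikit-learn, using an RBF kernel and the regularization parameter C on a hypothetical toy dataset (all values are assumptions for illustration):
from sklearn.svm import SVC

# Hypothetical 2-D points and class labels (assumption)
X = [[0, 0], [1, 1], [1, 0], [4, 4], [5, 5], [4, 5]]
y = [0, 0, 0, 1, 1, 1]

svm = SVC(kernel="rbf", C=1.0)    # soft margin controlled by C; RBF kernel
svm.fit(X, y)
print("Support vectors:", svm.support_vectors_)
print(svm.predict([[0.5, 0.5], [4.5, 4.5]]))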
Ans:
• Multiclass/Structured outputs
• MNIST,
• Ranking.
Structured outputs:
In structured-output prediction, the model predicts a structured object such as a sequence, a tree, or a
set of labels rather than a single class label.
MNIST:
MNIST is a dataset of 70,000 small (28 × 28 pixel) images of handwritten digits, widely used as a
benchmark for classification.
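A minimal sketch of binary classification on MNIST, assuming the dataset is fetched from OpenML and an SGD classifier is trained as a "5 vs not-5" detector:
from sklearn.datasets import fetch_openml
from sklearn.linear_model import SGDClassifier

# Download MNIST from OpenML: 70,000 images of 28x28 pixels, flattened to 784 features
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_train, y_train = X[:60000], y[:60000]

y_train_5 = (y_train == "5")       # binary target: "is this digit a 5?"
sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)
print(sgd_clf.predict([X[0]]))     # the first image is a 5 in the usual ordering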
Ranking:
Ranking is a type of supervised machine learning that uses labeled datasets to train a model that
orders (ranks) new items by predicted relevance.
A ranking dataset typically consists of:
1. queries and
2. documents.
Ans:
Introduction:
Ensemble Learning:
A group of predictors is called an ensemble. Ensemble learning combines the predictions of several
models, which often gives better results than the best individual model.
Ensemble Methods:
The following are the ensemble methods in Machine Learning. They are
• Voting Classifiers
• Bagging and Pasting
   o Bagging (Bootstrap Aggregating)
   o Pasting
   o Out-of-Bag Evaluation
   o Random Patches and Random Subspaces
• Random Forests
   o Extra-Trees
   o Feature Importance
• Boosting
   o AdaBoost
   o Gradient Boosting
• Stacking
Voting Classifier:
• Hard Voting: In hard voting, the predicted output class is a class with the
highest majority of votes.
• Soft Voting: In soft voting, the output class is the prediction based on the
average of probability given to that class.
For Example:
Identify the soft and hard voting results of the classifiers from the given data set:

Classifiers     Class A     Class B
Classifier 1    0.7         0.3

Solution:
Hard voting: each classifier votes for the class with the higher probability, and the class with the most
votes wins.
Soft voting: the class probabilities are averaged across the classifiers, and the class with the highest
average probability wins.
Answer: A
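A minimal voting-classifier sketch with scikit-learn, combining three assumed base models on a synthetic toy dataset; setting voting="hard" or voting="soft" selects the scheme described above.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, random_state=42)   # toy data (assumption)

voting_clf = VotingClassifier(
    estimators=[("lr", LogisticRegression()),
                ("dt", DecisionTreeClassifier()),
                ("nb", GaussianNB())],
    voting="soft")              # use "hard" for majority voting on predicted classes
voting_clf.fit(X, y)
print(voting_clf.predict(X[:5]), y[:5])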
Bagging and Pasting:
Both bagging and pasting train the same algorithm on different random subsets of the training set.
Bagging (bootstrap aggregating) samples the training set with replacement.
Pasting:
Pasting samples the training set without replacement.
Out-of-Bag Scoring
If we are using bagging, there is a chance that a given sample is never selected, while others may be
selected multiple times.
The probability that a specific sample is not selected in a single draw is (1 − 1/n), where n is the number
of samples; after n draws, the probability that it is never selected is (1 − 1/n)^n.
When n is large, this probability approaches 1/e, which is approximately 0.368.
This means that when the dataset is big enough, about 37% of its samples are never selected (the
out-of-bag samples), and we can use them to evaluate the model.
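A minimal bagging sketch with scikit-learn that uses the out-of-bag samples for evaluation (oob_score=True); the synthetic dataset and settings are assumptions for illustration.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=42)   # toy data (assumption)

bag = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    bootstrap=True,      # sampling with replacement = bagging (False would be pasting)
    oob_score=True,      # evaluate each tree on the ~37% of samples it never saw
    random_state=42)
bag.fit(X, y)
print("Out-of-bag score:", bag.oob_score_)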
Random Forests
As the name suggests, a random forest is an ensemble of decision trees that can be used for
classification or regression. In most cases it is trained using bagging. Each tree in the forest outputs a
prediction, and the most voted prediction becomes the output of the model. This helps make the model
more accurate and stable, preventing overfitting.
Another very useful property of random forests is the ability to measure the relative importance of each
feature by calculating how much each one reduces the impurity of the model. This is called feature
importance.
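A minimal random-forest sketch with scikit-learn showing the feature_importances_ attribute; the built-in iris dataset is used purely for illustration.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(iris.data, iris.target)   # an ensemble of decision trees trained with bagging

# Relative importance of each feature (how much it reduces impurity on average)
for name, score in zip(iris.feature_names, rf.feature_importances_):
    print(f"{name}: {score:.3f}")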
Stacking:
Stacking trains a meta-learner (blender) to combine the predictions of several base models, and is used
to improve the overall accuracy of strong learners.
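A minimal stacking sketch, assuming scikit-learn's StackingClassifier with two base learners and a logistic-regression blender on a synthetic toy dataset (assumptions for illustration):
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, random_state=42)   # toy data (assumption)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=42)),
                ("svc", SVC(probability=True, random_state=42))],
    final_estimator=LogisticRegression())   # the blender / meta-learner
stack.fit(X, y)
print(stack.score(X, y))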
Ans:
The distance between a data point x_i and the decision boundary can be calculated as:
distance = (w · x_i + b) / ||w||
where ||w|| represents the Euclidean norm of the weight vector w, i.e. the norm of the normal vector to
the hyperplane.
Ans:
Some of the popular linear classifiers are:
i) Naive Bayes
ii) Logistic Regression
iii) Support Vector Machine (linear kernel)

Some of the popular non-linear classifiers are:
i) Multi-Layer Perceptron (MLP)
ii) Decision Tree
iii) Random Forests
iv) K-Nearest Neighbors