Greater Noida
Unit: 4
MACHINE LEARNING (ACSML0601)
Ms. Barkha Bhardwaj
Assistant Professor
Department of Computer Science and Engineering (Artificial Intelligence)
B Tech 6th Sem CSE(AI)
Program Outcomes (POs):
1. Engineering knowledge:
2. Problem analysis:
3. Design/development of solutions:
4. Conduct investigations of complex problems:
5. Modern tool usage:
6. The engineer and society:
7. Environment and sustainability:
8. Ethics:
9. Individual and team work:
10. Communication:
11. Project management and finance:
12. Life-long learning
CO-PO and PSO Mapping
CO \ PO       PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10 PO11 PO12
ACSML0601.1    3    2    2    1    2    2    -    -    -    1    -    -
ACSML0601.2    3    2    2    3    2    2    1    -    2    1    1    2
ACSML0601.3    2    2    2    2    2    2    2    1    1    -    1    3
ACSML0601.4    3    3    1    3    1    1    2    -    2    1    1    2
ACSML0601.5    3    2    1    2    1    2    1    1    2    1    1    1
AVG           2.8  2.2  1.6  2.2  1.6  1.8  1.2  0.4  1.4  0.8  0.8  1.6
Matrix of CO/PSO:
PSO1 PSO2 PSO3
ACSML0601.1 3 2 3
ACSML0601.2 3 2 2
ACSML0601.3 3 2 3
ACSML0601.4 2 1 1
ACSML0601.5 2 2 1
Prerequisites:
• Statistics.
• Linear Algebra.
• Calculus.
• Probability.
• Programming Languages.
https://www.youtube.com/watch?v=PPLop4L2eGk&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN
Random Forest
Gradient Boosting Machines
XGBoost.
3. Random forest
4. Gradient boosting
The objective of this topic is to enable the student to understand:
• Bayesian learning
Recap
Students learnt about unsupervised learning algorithms.
• Suppose that you are allowed to flip the coin 10 times in order
to determine the fairness of the coin.
• Observations from the experiment will fall under one of the
following cases:
• Case 1: observing 5 heads and 5 tails.
• Case 2: observing h heads and 10-h tails.
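A minimal sketch (an illustration assuming SciPy is available, not part of the original slides) of how a Bayesian learner would update its belief about the coin's fairness after observing h heads out of 10 flips, using a Beta prior over the heads probability:

```python
from scipy.stats import beta

# Observations: h heads and (10 - h) tails out of 10 flips (Case 2 above).
h, n = 7, 10

# Start from a uniform Beta(1, 1) prior over the unknown heads probability p.
# After the data, the posterior is Beta(1 + heads, 1 + tails).
posterior = beta(1 + h, 1 + (n - h))

print("Posterior mean of P(heads):", posterior.mean())      # ~0.667 for h = 7
print("95% credible interval:", posterior.interval(0.95))
```

Unlike the frequentist estimate h/10, the posterior keeps a full distribution over the coin's bias, which narrows as more flips are observed.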
The objective of this topic is to enable the student to understand:
• Bayes optimal classifier
Recap
Students learnt the Bayesian learning algorithms.
• The Bayes optimal classifier is a probabilistic model that makes the most
probable prediction for a new example, given the training dataset.
• This model is also referred to as the Bayes optimal learner, the Bayes
classifier, Bayes optimal decision boundary, or the Bayes optimal
discriminant function.
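A minimal sketch (with made-up hypotheses and posteriors, purely illustrative) of this idea: each hypothesis votes for a class weighted by its posterior probability given the training data, and the class with the largest total weight is returned.

```python
# Hypothetical posteriors P(h | D) for three hypotheses and their predictions
# for a new example x. These numbers are illustrative, not from the slides.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "+", "h2": "-", "h3": "-"}

# Bayes optimal prediction: argmax over classes of sum_h P(class | h) * P(h | D).
votes = {}
for h, p_h in posteriors.items():
    votes[predictions[h]] = votes.get(predictions[h], 0.0) + p_h

print(max(votes, key=votes.get))  # "-" (0.6 vs 0.4), even though h1 alone is most probable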
The objective of this topic is to enable the student to understand:
• Naive Bayes
Recap
Students learnt the Bayes optimal classifier.
• Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Naïve Bayes
• The Naïve Bayes algorithm is comprised of two words, Naïve and Bayes, which can be described as:
• Naïve: it assumes that the occurrence of a certain feature is independent of the occurrence of other features.
• Bayes: it is based on Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
Where,
• P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
• P(B|A) is the likelihood, P(A) is the prior probability, and P(B) is the marginal probability of the evidence.
• Problem: if the weather is sunny, should the player play or not?
• (From a separate worked example on car theft:) since 0.144 > 0.048, given the features Red, SUV, and Domestic, the Naïve Bayes classifier outputs 'NO', i.e., the car is not stolen.
P(sunny|yes) = 2/9
P(Temperature=66|yes) = 0.034
P(Humidity=90|yes) = 0.0221
P(True|yes) = 3/9

P(sunny|no) = 3/5
P(Temperature=66|no) = 0.0279
P(Humidity=90|no) = 0.0381
P(True|no) = 3/5

P(x|yes) * P(yes) = (2/9) * 0.034 * 0.0221 * (3/9) * (9/14) = 0.000036
P(x|no) * P(no) = (3/5) * 0.0279 * 0.0381 * (3/5) * (5/14) = 0.000137

Since 0.000137 > 0.000036, the classification is NO (the player will not play).
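A minimal sketch in Python (my reconstruction, assuming the conditional estimates listed above) that reproduces the two class scores and picks the larger one:

```python
# Class scores from the Naive Bayes example above: product of the class
# prior and the per-feature conditional estimates (Gaussian densities are
# used for the numeric Temperature and Humidity features).
score_yes = (2/9) * 0.034 * 0.0221 * (3/9) * (9/14)   # P(x|yes) * P(yes)
score_no  = (3/5) * 0.0279 * 0.0381 * (3/5) * (5/14)  # P(x|no)  * P(no)

print(f"yes score: {score_yes:.6f}")   # ~0.000036
print(f"no  score: {score_no:.6f}")    # ~0.000137
print("Prediction:", "yes" if score_yes > score_no else "no")   # no
```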
Topic Objective
The objective of this topic is to enable the student to understand:
• Naive Bayes pros and cons
Recap
Students learnt the Naïve Bayes algorithm.
Pros:
• It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
Cons:
• If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as "Zero Frequency".
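The usual remedy for zero frequency is Laplace (add-one) smoothing. A minimal sketch with made-up category names (the function and data below are illustrative, not from the slides), showing how a pseudo-count keeps an unseen category from receiving zero probability:

```python
from collections import Counter

def smoothed_prob(value, observations, all_values, alpha=1.0):
    """P(value | class) with add-alpha (Laplace) smoothing so that unseen
    categories get a small non-zero probability instead of zero."""
    counts = Counter(observations)
    return (counts[value] + alpha) / (len(observations) + alpha * len(all_values))

# Outlook values observed for class "yes" in a hypothetical training set.
train_outlook_given_yes = ["sunny", "overcast", "rain", "overcast", "rain"]
all_outlooks = ["sunny", "overcast", "rain", "foggy"]   # "foggy" never seen for "yes"

print(smoothed_prob("foggy", train_outlook_given_yes, all_outlooks))  # > 0, not zero
```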
The objective of this topic is to enable the student to understand:
• Bayesian Belief Network
Recap
Students learnt the pros and cons of the Naïve Bayes algorithm.
• A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:
Ø Directed Acyclic Graph
Ø Table of conditional probabilities.
• The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on the alarm probability.
• The network represents the assumption that David and Sophia do not directly perceive the burglary, do not notice a minor earthquake, and do not confer with each other before calling.
• The conditional distribution for each node is given as a conditional probability table, or CPT.
• Each row in the CPT must sum to 1 because all the entries in the table represent an exhaustive set of cases for the variable.
• In a CPT, a Boolean variable with k Boolean parents contains 2^k probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.
• Burglary (B)
• Earthquake(E)
• Alarm(A)
• David Calls(D)
• Sophia calls(S)
• Let's take the observed probabilities for the Burglary and Earthquake components:
• P(B=True) = 0.002, which is the probability of a burglary.
• P(B=False) = 0.998, which is the probability of no burglary.
• P(E=True) = 0.001, which is the probability of a minor earthquake.
• P(E=False) = 0.999, which is the probability that an earthquake did not occur.
• From the formula of joint distribution, we can write the problem statement
in the form of probability distribution:
• P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
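A minimal sketch that evaluates this product, assuming the CPT values quoted above together with P(S|A) = 0.75 and P(D|A) = 0.91 as used in the calculation:

```python
# CPT entries for the Burglary / Earthquake / Alarm network quoted above.
P_not_B = 0.998              # P(not Burglary)
P_not_E = 0.999              # P(not Earthquake)
P_A_given_notB_notE = 0.001  # P(Alarm | not Burglary, not Earthquake)
P_D_given_A = 0.91           # P(David calls | Alarm)
P_S_given_A = 0.75           # P(Sophia calls | Alarm)

# P(S, D, A, not B, not E) factorizes along the network structure.
joint = P_S_given_A * P_D_given_A * P_A_given_notB_notE * P_not_B * P_not_E
print(joint)   # ~0.00068045
```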
• Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
The objective of this topic is to enable the student to understand:
• Ensemble methods
Recap
Students learnt the Bayesian Belief Network.
The objective of this topic is to enable the student to understand:
• Ensemble methods (Bagging)
Recap
Students learnt the ensemble methods.
• Step 1: Multiple subsets are created from the original data set with equal
tuples, selecting observations with replacement.
• Step 2: A base model is created on each of these subsets.
• Step 3: Each model is learned in parallel with each training set and
independent of each other.
• Step 4: The final predictions are determined by combining the
predictions from all the models.
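A minimal sketch of these four steps, assuming scikit-learn is available; the dataset is a synthetic placeholder and BaggingClassifier's default base estimator (a decision tree) is used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for "the original data set" in Step 1.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-3: bootstrap subsets (sampling with replacement), one base model
# per subset, each trained independently of the others.
bagger = BaggingClassifier(n_estimators=25, bootstrap=True, random_state=0)
bagger.fit(X_train, y_train)

# Step 4: the predictions of all base models are combined (majority vote).
print("Test accuracy:", bagger.score(X_test, y_test))
```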
Example of Bagging
• The Random Forest model uses bagging, where decision tree models with higher variance are present. It makes random feature selections to grow the trees. Several random trees make up a Random Forest.
The objective of this topic is to enable the student to understand:
• Ensemble methods (Boosting)
Recap
Students learnt the bagging method.
Bagging and boosting, both commonly used methods, share the fundamental similarity of being classified as ensemble methods. Here we will explain the similarities between them.
The objective of this topic is to enable the student to understand:
• The impact of bagging and boosting
Recap
Students learnt the boosting method.
• Bagging is meant to reduce the variance without increasing the bias. This technique is especially effective where minute changes in a learner's training set lead to huge changes in the predicted output. Bagging reduces the variance by aggregating individual models. These models have dissimilar statistical properties, such as their means and standard deviations.
The objective of this topic is to enable the student to understand:
• C5.0 Boosting
Recap
Students learnt about bagging and boosting and their impact on bias and variance.
C5.0 uses adaptive boosting, based on the work of Rob Schapire and Yoav Freund. The idea is to generate several classifiers (either decision trees or rulesets) rather than just one. When a new case is to be classified, each classifier votes for its predicted class and the votes are counted to determine the final class.
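C5.0 itself is not a Python library, so as an illustration of the same idea (several classifiers, each voting for its predicted class) here is a minimal sketch using scikit-learn's AdaBoostClassifier on a synthetic placeholder dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Placeholder data, not from the slides.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Adaptive boosting: each successive tree focuses on the cases the previous
# trees misclassified; the final class is a weighted vote of all trees.
booster = AdaBoostClassifier(n_estimators=30, random_state=1)
booster.fit(X_train, y_train)

print("Test accuracy:", booster.score(X_test, y_test))
```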
The objective of this topic is to enable the student to understand:
• Random Forest
Recap
Students learnt C5.0 boosting.
• Since the random forest combines multiple trees to predict the class of the dataset, it is possible that some decision trees predict the correct output while others do not. But together, all the trees predict the correct output. Therefore, below are two assumptions for a better Random Forest classifier:
• There should be some actual values in the feature variables of the dataset so that the classifier can predict accurate results rather than guessed results.
• The predictions from each tree must have very low correlations.
• Below are some points that explain why we should use the Random Forest algorithm:
• It takes less training time as compared to other algorithms.
• It predicts output with high accuracy; even for a large dataset it runs efficiently.
• It can also maintain accuracy when a large proportion of the data is missing.
The Working process can be explained in the below steps and diagram:
• Step-1: Select random K data points from the training set.
• Step-2: Build the decision trees associated with the selected data points
(Subsets).
• Step-3: Choose the number N for decision trees that you want to build.
• Step-4: Repeat Step 1 & 2.
• Step-5: For new data points, find the predictions of each decision tree, and
assign the new data points to the category that wins the majority votes.
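A minimal sketch of these steps with scikit-learn's RandomForestClassifier; the feature matrix and labels below are synthetic placeholders, not from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder training data standing in for "the training set" in Step 1.
X, y = make_classification(n_samples=600, n_features=12, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# n_estimators is the number N of trees (Step 3); each tree is grown on a
# bootstrap sample, with a random subset of features considered at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=2)
forest.fit(X_train, y_train)

# Step 5: each tree votes and the majority class is returned for new points.
print("Test accuracy:", forest.score(X_test, y_test))
```

The per-split random feature selection (controlled by max_features, which defaults to roughly the square root of the feature count for classification) is what keeps the individual trees decorrelated.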
• Suppose there is a dataset that contains multiple fruit images. So, this
dataset is given to the Random forest classifier. The dataset is divided into
subsets and given to each decision tree. During the training phase, each
decision tree produces a prediction result, and when a new data point
occurs, then based on the majority of results, the Random Forest classifier
predicts the final decision. Consider the below image:
The objective of this topic is to enable the student to understand:
• Gradient Boosting
Recap
Students learnt the Random Forest algorithm.
But here one question may arise: if we are applying the same algorithm, how can multiple decision trees give better predictions than a single decision tree? Moreover, how does each decision tree capture different information from the same data?
The objective of this topic is to enable the student to understand:
• XGBoost
Recap
Students learnt about Gradient Boosting.
• A loss function is to be optimized; the goal is to drive it as low as possible.
• Weak learners are used in the model to make predictions.
• Decision trees are used as the weak learners; they are grown in a greedy way, choosing the best split points based on a purity score such as Gini impurity, or so as to minimize the loss function.
• An additive model is used to combine all the weak models while minimizing the loss function.
• Trees are added one at a time, and the existing trees in the model are not changed. A gradient descent procedure is typically used to minimize the loss when adding trees, after which the model is updated further.
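A minimal sketch of XGBoost on a synthetic placeholder dataset, assuming the xgboost Python package is installed; the hyperparameter values shown are illustrative choices, not prescribed by the slides:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier   # assumes the xgboost package is installed

# Placeholder data, not from the slides.
X, y = make_classification(n_samples=600, n_features=12, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# Trees are added one at a time; each new tree is fit to the gradient of the
# loss, and learning_rate scales each tree's contribution to the additive model.
model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, preds))
```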
8. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
10. Define
(i) Prior Probability
(ii) Conditional Probability
(iii) XG Boost
3) If you remove any one of the red points from the data, will the decision boundary change?
A) Yes
B) No
4) If you remove the non-red circled points from the data, will the decision boundary change?
A) True
B) False
5. What is the relationship between a node and its predecessors while creating a Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both conditionally dependent & dependent
• Naïve Bayes
• Ensemble methods
• Bagging
• Boosting
• Pros and cons of Bagging and Boosting
• Impact of bagging and boosting on variance
• Ensemble methods
• Random Forest
• Gradient boosting
• XG boost
Reference Books:
Ø Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.
Ø Richard Duda, Peter Hart, David Stork, Pattern Classification, 2nd Ed., John Wiley & Sons, 2001.