Greater Noida
Unit: 4
MACHINE LEARNING (ACSML0601)
Ms. Barkha Bhardwaj
Assistant Professor
Department of Computer Science and Engineering (Artificial Intelligence)
B Tech 6th Sem CSE(AI)
Program Outcomes (POs):
1. Engineering knowledge:
2. Problem analysis:
3. Design/development of solutions:
4. Conduct investigations of complex problems:
5. Modern tool usage:
6. The engineer and society:
7. Environment and sustainability:
8. Ethics:
9. Individual and team work:
10. Communication:
11. Project management and finance:
12. Life-long learning
CO-PO and PSO Mapping
CO \ PO       PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10 PO11 PO12
ACSML0601.1    3    2    2    1    2    2    -    -    -    1    -    -
ACSML0601.2    3    2    2    3    2    2    1    -    2    1    1    2
ACSML0601.3    2    2    2    2    2    2    2    1    1    -    1    3
ACSML0601.4    3    3    1    3    1    1    2    -    2    1    1    2
ACSML0601.5    3    2    1    2    1    2    1    1    2    1    1    1
AVG           2.8  2.2  1.6  2.2  1.6  1.8  1.2  0.4  1.4  0.8  0.8  1.6
Matrix of CO/PSO:
PSO1 PSO2 PSO3
ACSML0601.1 3 2 3
ACSML0601.2 3 2 2
ACSML0601.3 3 2 3
ACSML0601.4 2 1 1
ACSML0601.5 2 2 1
Prerequisites:
• Statistics.
• Linear Algebra.
• Calculus.
• Probability.
• Programming Languages.
https://www.youtube.com/watch?v=PPLop4L2eGk&list=PLLssT5z_DsK-h9vYZkQkYNWcItqhlRJLN
Random Forest
Gradient Boosting Machines
XGBoost.
3. Random forest
4. Gradient boosting
The objective of this topic is to enable the student to understand:
• Bayesian learning
Recap
Students learnt about unsupervised learning algorithms.
• Suppose that you are allowed to flip the coin 10 times in order
to determine the fairness of the coin.
• Observations from the experiment will fall under one of the
following cases:
• Case 1: observing 5 heads and 5 tails.
• Case 2: observing h heads and 10-h tails.
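A minimal sketch (an illustration assuming SciPy is available, not part of the original slides) of how a Bayesian learner would update its belief about the coin's fairness after observing h heads out of 10 flips, using a Beta prior over the heads probability:

```python
from scipy.stats import beta

# Observations: h heads and (10 - h) tails out of 10 flips (Case 2 above).
h, n = 7, 10

# Start from a uniform Beta(1, 1) prior over the unknown heads probability p.
# After the data, the posterior is Beta(1 + heads, 1 + tails).
posterior = beta(1 + h, 1 + (n - h))

print("Posterior mean of P(heads):", posterior.mean())      # ~0.667 for h = 7
print("95% credible interval:", posterior.interval(0.95))
```

Unlike the frequentist estimate h/10, the posterior keeps a full distribution over the coin's bias, which narrows as more flips are observed.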
The objective of this topic is to enable the student to understand:
• Bayes optimal classifier
Recap
Students learnt the Bayesian learning algorithms.
• The Bayes optimal classifier is a probabilistic model that makes the most
probable prediction for a new example, given the training dataset.
• This model is also referred to as the Bayes optimal learner, the Bayes
classifier, Bayes optimal decision boundary, or the Bayes optimal
discriminant function.
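A minimal sketch (with made-up hypotheses and posteriors, purely illustrative) of this idea: each hypothesis votes for a class weighted by its posterior probability given the training data, and the class with the largest total weight is returned.

```python
# Hypothetical posteriors P(h | D) for three hypotheses and their predictions
# for a new example x. These numbers are illustrative, not from the slides.
posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}
predictions = {"h1": "+", "h2": "-", "h3": "-"}

# Bayes optimal prediction: argmax over classes of sum_h P(class | h) * P(h | D).
votes = {}
for h, p_h in posteriors.items():
    votes[predictions[h]] = votes.get(predictions[h], 0.0) + p_h

print(max(votes, key=votes.get))  # "-" (0.6 vs 0.4), even though h1 alone is most probable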
The objective of this topic is to enable the student to understand:
• Naive Bayes
Recap
Students learnt the Bayes optimal classifier.
• Naïve Bayes Classifier is one of the simplest and most effective classification algorithms; it helps in building fast machine learning models that can make quick predictions.
• Some popular applications of the Naïve Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.
Naïve Bayes
• The Naïve Bayes algorithm is comprised of two words, Naïve and Bayes, which can be described as:
• Naïve: it assumes that the occurrence of a certain feature is independent of the occurrence of other features.
• Bayes: it is based on Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
Where,
• P(A|B) is the posterior probability: the probability of hypothesis A given the observed event B.
• P(B|A) is the likelihood, P(A) is the prior probability, and P(B) is the marginal probability of the evidence.
• Problem: if the weather is sunny, should the player play or not?
• (From a separate worked example on car theft:) since 0.144 > 0.048, given the features Red, SUV, and Domestic, the Naïve Bayes classifier outputs 'NO', i.e., the car is not stolen.
P(sunny|yes) = 2/9
P(Temperature=66|yes) = 0.034
P(Humidity=90|yes) = 0.0221
P(True|yes) = 3/9

P(sunny|no) = 3/5
P(Temperature=66|no) = 0.0279
P(Humidity=90|no) = 0.0381
P(True|no) = 3/5

P(x|yes) * P(yes) = (2/9) * 0.034 * 0.0221 * (3/9) * (9/14) = 0.000036
P(x|no) * P(no) = (3/5) * 0.0279 * 0.0381 * (3/5) * (5/14) = 0.000137

Since 0.000137 > 0.000036, the classification is NO (the player will not play).
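A minimal sketch in Python (my reconstruction, assuming the conditional estimates listed above) that reproduces the two class scores and picks the larger one:

```python
# Class scores from the Naive Bayes example above: product of the class
# prior and the per-feature conditional estimates (Gaussian densities are
# used for the numeric Temperature and Humidity features).
score_yes = (2/9) * 0.034 * 0.0221 * (3/9) * (9/14)   # P(x|yes) * P(yes)
score_no  = (3/5) * 0.0279 * 0.0381 * (3/5) * (5/14)  # P(x|no)  * P(no)

print(f"yes score: {score_yes:.6f}")   # ~0.000036
print(f"no  score: {score_no:.6f}")    # ~0.000137
print("Prediction:", "yes" if score_yes > score_no else "no")   # no
```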
Topic Objective
The objective of this topic is to enable the student to understand:
• Naive Bayes pros and cons
Recap
Students learnt the Naïve Bayes algorithm.
Pros:
• It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
Cons:
• If a categorical variable has a category in the test data set that was not observed in the training data set, the model will assign it a zero probability and will be unable to make a prediction. This is often known as "Zero Frequency".
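The usual remedy for zero frequency is Laplace (add-one) smoothing. A minimal sketch with made-up category names (the function and data below are illustrative, not from the slides), showing how a pseudo-count keeps an unseen category from receiving zero probability:

```python
from collections import Counter

def smoothed_prob(value, observations, all_values, alpha=1.0):
    """P(value | class) with add-alpha (Laplace) smoothing so that unseen
    categories get a small non-zero probability instead of zero."""
    counts = Counter(observations)
    return (counts[value] + alpha) / (len(observations) + alpha * len(all_values))

# Outlook values observed for class "yes" in a hypothetical training set.
train_outlook_given_yes = ["sunny", "overcast", "rain", "overcast", "rain"]
all_outlooks = ["sunny", "overcast", "rain", "foggy"]   # "foggy" never seen for "yes"

print(smoothed_prob("foggy", train_outlook_given_yes, all_outlooks))  # > 0, not zero
```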
The objective of this topic is to enable the student to understand:
• Bayesian Belief Network
Recap
Students learnt the pros and cons of the Naïve Bayes algorithm.
• A Bayesian network can be used for building models from data and experts' opinions, and it consists of two parts:
Ø Directed Acyclic Graph
Ø Table of conditional probabilities.
• The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off, whereas David's and Sophia's calls depend only on the alarm probability.
• The network represents the assumption that David and Sophia do not directly perceive the burglary, do not notice a minor earthquake, and do not confer with each other before calling.
• The conditional distribution for each node is given as a conditional probability table, or CPT.
• Each row in the CPT must sum to 1 because all the entries in the table represent an exhaustive set of cases for the variable.
• In a CPT, a Boolean variable with k Boolean parents contains 2^k probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.
• Burglary (B)
• Earthquake(E)
• Alarm(A)
• David Calls(D)
• Sophia calls(S)
• Let's take the observed probabilities for the Burglary and Earthquake components:
• P(B=True) = 0.002, which is the probability of a burglary.
• P(B=False) = 0.998, which is the probability of no burglary.
• P(E=True) = 0.001, which is the probability of a minor earthquake.
• P(E=False) = 0.999, which is the probability that an earthquake did not occur.
• From the formula of joint distribution, we can write the problem statement
in the form of probability distribution:
• P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
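A minimal sketch that evaluates this product, assuming the CPT values quoted above together with P(S|A) = 0.75 and P(D|A) = 0.91 as used in the calculation:

```python
# CPT entries for the Burglary / Earthquake / Alarm network quoted above.
P_not_B = 0.998              # P(not Burglary)
P_not_E = 0.999              # P(not Earthquake)
P_A_given_notB_notE = 0.001  # P(Alarm | not Burglary, not Earthquake)
P_D_given_A = 0.91           # P(David calls | Alarm)
P_S_given_A = 0.75           # P(Sophia calls | Alarm)

# P(S, D, A, not B, not E) factorizes along the network structure.
joint = P_S_given_A * P_D_given_A * P_A_given_notB_notE * P_not_B * P_not_E
print(joint)   # ~0.00068045
```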
• Hence, a Bayesian network can answer any query about the domain by using the joint distribution.
The objective of this topic is to enable the student to understand:
• Ensemble methods
Recap
Students learnt the Bayesian Belief Network.
The objective of this topic is to enable the student to understand:
• Ensemble methods (Bagging)
Recap
Students learnt the ensemble methods.
• Step 1: Multiple subsets are created from the original data set with equal
tuples, selecting observations with replacement.
• Step 2: A base model is created on each of these subsets.
• Step 3: Each model is learned in parallel with each training set and
independent of each other.
• Step 4: The final predictions are determined by combining the
predictions from all the models.
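A minimal sketch of these four steps, assuming scikit-learn is available; the dataset is a synthetic placeholder and BaggingClassifier's default base estimator (a decision tree) is used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# Placeholder data standing in for "the original data set" in Step 1.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Steps 1-3: bootstrap subsets (sampling with replacement), one base model
# per subset, each trained independently of the others.
bagger = BaggingClassifier(n_estimators=25, bootstrap=True, random_state=0)
bagger.fit(X_train, y_train)

# Step 4: the predictions of all base models are combined (majority vote).
print("Test accuracy:", bagger.score(X_test, y_test))
```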
Example of Bagging
• The Random Forest model uses bagging, where decision tree models with higher variance are present. It makes random feature selections to grow the trees. Several random trees make up a Random Forest.
The objective of this topic is to enable the student to understand:
• Ensemble methods (Boosting)
Recap
Students learnt the bagging method.
Bagging and boosting, both commonly used methods, share the fundamental similarity of being classified as ensemble methods. Here we will explain the similarities between them.
The objective of this topic is to enable the student to understand:
• The impact of bagging and boosting
Recap
Students learnt the boosting method.
• Bagging is meant to reduce the variance without increasing the bias. This technique is especially effective where minute changes in a learner's training set lead to huge changes in the predicted output. Bagging reduces the variance by aggregating individual models. These models have dissimilar statistical properties, such as their means and standard deviations.
The objective of this topic is to enable the student to understand:
• C5.0 Boosting
Recap
Students learnt about bagging and boosting and their impact on bias and variance.
C5.0 uses adaptive boosting, based on the work of Rob Schapire and Yoav Freund. The idea is to generate several classifiers (either decision trees or rulesets) rather than just one. When a new case is to be classified, each classifier votes for its predicted class and the votes are counted to determine the final class.
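C5.0 itself is not a Python library, so as an illustration of the same idea (several classifiers, each voting for its predicted class) here is a minimal sketch using scikit-learn's AdaBoostClassifier on a synthetic placeholder dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Placeholder data, not from the slides.
X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Adaptive boosting: each successive tree focuses on the cases the previous
# trees misclassified; the final class is a weighted vote of all trees.
booster = AdaBoostClassifier(n_estimators=30, random_state=1)
booster.fit(X_train, y_train)

print("Test accuracy:", booster.score(X_test, y_test))
```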
The objective of this topic is to enable the student to understand:
• Random Forest
Recap
Students learnt C5.0 boosting.
• Since the random forest combines multiple trees to predict the class of the dataset, it is possible that some decision trees predict the correct output while others do not. But together, all the trees predict the correct output. Therefore, below are two assumptions for a better Random Forest classifier:
• There should be some actual values in the feature variables of the dataset so that the classifier can predict accurate results rather than guessed results.
• The predictions from each tree must have very low correlations.
• Below are some points that explain why we should use the Random Forest algorithm:
• It takes less training time as compared to other algorithms.
• It predicts output with high accuracy; even for a large dataset it runs efficiently.
• It can also maintain accuracy when a large proportion of the data is missing.
The Working process can be explained in the below steps and diagram:
• Step-1: Select random K data points from the training set.
• Step-2: Build the decision trees associated with the selected data points
(Subsets).
• Step-3: Choose the number N for decision trees that you want to build.
• Step-4: Repeat Step 1 & 2.
• Step-5: For new data points, find the predictions of each decision tree, and
assign the new data points to the category that wins the majority votes.
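A minimal sketch of these steps with scikit-learn's RandomForestClassifier; the feature matrix and labels below are synthetic placeholders, not from the slides:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Placeholder training data standing in for "the training set" in Step 1.
X, y = make_classification(n_samples=600, n_features=12, random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# n_estimators is the number N of trees (Step 3); each tree is grown on a
# bootstrap sample, with a random subset of features considered at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=2)
forest.fit(X_train, y_train)

# Step 5: each tree votes and the majority class is returned for new points.
print("Test accuracy:", forest.score(X_test, y_test))
```

The per-split random feature selection (controlled by max_features, which defaults to roughly the square root of the feature count for classification) is what keeps the individual trees decorrelated.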
• Suppose there is a dataset that contains multiple fruit images. So, this
dataset is given to the Random forest classifier. The dataset is divided into
subsets and given to each decision tree. During the training phase, each
decision tree produces a prediction result, and when a new data point
occurs, then based on the majority of results, the Random Forest classifier
predicts the final decision. Consider the below image:
The objective of this topic is to enable the student to understand:
• Gradient Boosting
Recap
Students learnt the Random Forest algorithm.
But here one question may arise: if we are applying the same algorithm, how can multiple decision trees give better predictions than a single decision tree? Moreover, how does each decision tree capture different information from the same data?
The objective of this topic is to enable the student to understand:
• XGBoost
Recap
Students learnt about Gradient Boosting.
• A loss function is to be optimized; the goal is to drive it as low as possible.
• Weak learners are used in the model to make predictions.
• Decision trees are used as the weak learners; they are grown in a greedy way, choosing the best split points based on a purity score such as Gini impurity, or so as to minimize the loss function.
• An additive model is used to combine all the weak models while minimizing the loss function.
• Trees are added one at a time, and the existing trees in the model are not changed. A gradient descent procedure is typically used to minimize the loss when adding trees, after which the model is updated further.
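A minimal sketch of XGBoost on a synthetic placeholder dataset, assuming the xgboost Python package is installed; the hyperparameter values shown are illustrative choices, not prescribed by the slides:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier   # assumes the xgboost package is installed

# Placeholder data, not from the slides.
X, y = make_classification(n_samples=600, n_features=12, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

# Trees are added one at a time; each new tree is fit to the gradient of the
# loss, and learning_rate scales each tree's contribution to the additive model.
model = XGBClassifier(n_estimators=200, max_depth=3, learning_rate=0.1)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, preds))
```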
8. How can the entries in the full joint probability distribution be calculated?
a) Using variables
b) Using information
c) Both Using variables & information
d) None of the mentioned
10. Define
(i) Prior Probability
(ii) Conditional Probability
(iii) XG Boost
3) If you remove any one of the red points from the data, will the decision boundary change?
A) Yes
B) No
4) If you remove the non-red circled points from the data, will the decision boundary change?
A) True
B) False
5. What is the relationship between a node and its predecessors while creating a Bayesian network?
a) Functionally dependent
b) Dependent
c) Conditionally independent
d) Both conditionally dependent & dependent
• Naïve Bayes
• Ensemble methods
• Bagging
• Boosting
• Pros and cons of Bagging and Boosting
• Impact of bagging and boosting on variance
• Ensemble methods
• Random Forest
• Gradient boosting
• XG boost
Reference Books:
Ø Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical Learning, Springer, 2013.
Ø Richard Duda, Peter Hart, David Stork, Pattern Classification, 2nd Ed., John Wiley & Sons, 2001.