Unit-5 ML
Reinforcement Learning:
Reinforcement learning (RL) is an approach in which an agent learns how to act by interacting
with an environment and receiving rewards as feedback. Its key concepts are:
1. Agent:
o The learner or decision-maker that interacts with the environment.
2. Environment:
o The external system the agent interacts with. It provides feedback based on the
agent's actions.
3. State:
o A representation of the current situation of the environment. The
agent perceives the environment through states.
4. Action:
o A move the agent can make; the set of all possible actions available in the
environment is called the action space.
5. Reward:
o Feedback from the environment based on the agent's actions. Positive
rewards incentivize desirable actions, while negative rewards (or penalties)
discourage undesirable actions.
6. Policy:
o A strategy used by the agent to determine the next action based on the current
state. It can be deterministic or stochastic.
7. Value Function:
o A function that estimates the expected cumulative reward of states or state-
action pairs, helping the agent to make decisions that maximize long-term
rewards.
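In its standard form, the state-value function of a policy π can be written as
V^π(s) = E[ r_(t+1) + γ·r_(t+2) + γ²·r_(t+3) + … ∣ s_t = s ],
where γ (with 0 ≤ γ ≤ 1) is a discount factor that weights immediate rewards more heavily than
distant ones. The action-value function Q^π(s, a) is defined in the same way, but conditions on
both the current state and the action taken.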
The Learning Process
1. Exploration:
o The agent tries out different actions to discover their effects and
gather information about the environment.
2. Exploitation:
o The agent uses its knowledge to choose actions that it believes will maximize
the reward.
3. Balance:
o Effective RL requires balancing exploration and exploitation to ensure the
agent learns the optimal policy.
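One common way to balance the two is an ε-greedy rule: with probability ε the agent explores a
random action, and otherwise it exploits the best-known action. The sketch below is a minimal
illustration; the Q-table, state names, and ε value are invented for the example and are not part
of the notes above.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Pick an action: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(actions)                       # exploration: random action
    # exploitation: action with the highest estimated value in this state
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

# Example usage with a toy Q-table
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.5}
print(epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.1))
```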
Common Reinforcement Learning Algorithms:
1. Q-Learning:
o A model-free algorithm where the agent learns a value function Q(s,a),
which represents the expected utility of taking action a in state s and
following the optimal policy thereafter.
2. SARSA (State-Action-Reward-State-Action):
o Similar to Q-Learning, but updates the Q-value based on the action actually
taken, considering the policy followed by the agent.
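The standard one-step update rules make the difference concrete (here α is the learning rate, γ
the discount factor, and r the reward received after taking action a in state s and arriving in
state s′):
Q-Learning: Q(s, a) ← Q(s, a) + α · [ r + γ · max_a′ Q(s′, a′) − Q(s, a) ]
SARSA: Q(s, a) ← Q(s, a) + α · [ r + γ · Q(s′, a′) − Q(s, a) ], where a′ is the action actually
chosen in s′ by the current policy.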
Example: A Robot Navigating a Maze
1. Environment:
o The maze consists of a grid with walls, open spaces, and an exit.
o The robot starts at a random position and must find the exit.
2. State:
o The current position of the robot in the maze, represented by coordinates (x,
y).
3. Actions:
o The robot can move up, down, left, or right.
4. Rewards:
o Positive reward for reaching the exit.
o Negative reward for hitting a wall.
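Putting the pieces together, a minimal tabular Q-learning sketch for a maze of this kind might
look as follows. The grid layout, reward values, and hyperparameters are illustrative
assumptions; the notes above do not fix them.

```python
import random

# Hypothetical 4x4 maze: 0 = open space, 1 = wall; the exit is at (3, 3).
GRID = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0]]
EXIT = (3, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < 4 and 0 <= ny < 4) or GRID[nx][ny] == 1:
        return state, -1.0, False      # hitting a wall: negative reward, stay in place
    if (nx, ny) == EXIT:
        return (nx, ny), 10.0, True    # reaching the exit: positive reward
    return (nx, ny), -0.1, False       # small step cost encourages short paths

alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount factor, exploration rate
Q = {}                                 # Q[(state, action)] -> estimated value

for episode in range(500):
    state = (0, 0)                     # fixed start for simplicity (the notes use a random start)
    for _ in range(200):               # cap episode length
        if random.random() < epsilon:  # explore
            action = random.choice(list(ACTIONS))
        else:                          # exploit the best-known action
            action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
        next_state, reward, done = step(state, action)
        old = Q.get((state, action), 0.0)
        best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        # Q-learning update: move the estimate toward reward + discounted best future value
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
        if done:
            break
```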
Applications of Reinforcement Learning:
Reinforcement learning is a powerful approach to building intelligent systems that can adapt
and improve through experience, opening up possibilities across a wide range of applications.
Markov Chain Monte Carlo (MCMC):
MCMC methods draw samples from a target probability distribution by combining two ideas:
1. Markov Chain:
A sequence of random variables where the next state depends only on the current
state (the Markov property).
The chain has a stationary distribution that it converges to over time.
2. Monte Carlo:
Estimation by repeated random sampling: quantities of interest (such as expectations)
are approximated by averaging over many random samples rather than computed
analytically.
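A small simulation illustrates both ideas: a two-state Markov chain is run for many steps, and
the Monte Carlo estimate of how often each state is visited approaches the chain's stationary
distribution. The transition matrix below is an arbitrary example.

```python
import random

# Transition matrix for a 2-state chain: P[i][j] = probability of moving from state i to state j
P = [[0.9, 0.1],
     [0.5, 0.5]]

state, counts = 0, [0, 0]
for _ in range(100_000):
    counts[state] += 1
    # Move to the next state according to the current row of P
    state = 0 if random.random() < P[state][0] else 1

total = sum(counts)
print([c / total for c in counts])   # approaches the stationary distribution (~[0.833, 0.167])
```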
How MCMC Works:
1. Initialization:
o Start with an initial state (or set of states); the starting point may be arbitrary and
need not be drawn from the target distribution.
2. Iteration:
o Propose a new state based on a proposal distribution.
o Accept or reject the new state based on a criterion (e.g., the Metropolis-Hastings
acceptance rule).
3. Convergence:
o After many iterations, the distribution of the states will approximate the
target distribution.
Common Algorithms
1. Metropolis-Hastings Algorithm:
o Proposes new states and accepts or rejects them based on the acceptance ratio.
o Widely used for its simplicity and flexibility.
2. Gibbs Sampling:
o Samples each variable in turn, conditional on the current values of the other
variables.
o Useful when the conditional distributions are easier to sample from.
Applications of MCMC
1. Bayesian Inference:
o Estimating posterior distributions of parameters when the likelihood and
prior are known.
o Useful for hierarchical models and complex data structures.
2. Statistical Physics:
o Simulating systems of many interacting components (e.g., spin models) by sampling
from their equilibrium distributions.
3. Machine Learning:
o Sampling from posterior or latent-variable distributions in probabilistic models when
exact inference is intractable.
Sampling:
Sampling is a technique used to select a subset of data from a larger population, allowing for
the analysis and inference of population characteristics without examining the entire dataset.
Types of Sampling
1. Probability Sampling:
o Description: Every member of the population has a known, non-zero chance
of being selected.
o Examples:
Simple Random Sampling: Every member of the population has
an equal chance of being selected.
Systematic Sampling: Selects every k-th member from a list after a
random start.
Stratified Sampling: Divides the population into strata (groups) and
samples from each stratum.
Cluster Sampling: Divides the population into clusters and randomly
selects entire clusters.
2. Non-Probability Sampling:
o Description: Not every member of the population has a known or
equal chance of being selected.
o Examples:
Convenience Sampling: Samples are selected based on their
availability or ease of access.
Judgmental (Purposive) Sampling: Samples are selected based on
the researcher’s judgment.
Quota Sampling: Ensures representation by selecting samples to meet
certain quotas.
Snowball Sampling: Current subjects recruit future subjects from their
acquaintances.
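As a small illustration of two of the probability-sampling schemes above, the sketch below draws
a simple random sample and a proportional stratified sample from a toy population; the data and
strata are made up for the example.

```python
import random

# Toy population: (value, stratum) pairs; stratum "A" is four times larger than "B"
population = [(i, "A" if i < 80 else "B") for i in range(100)]

# Simple random sampling: every member has an equal chance of selection
simple_sample = random.sample(population, 10)

# Stratified sampling: sample from each stratum in proportion to its size
strata = {"A": [p for p in population if p[1] == "A"],
          "B": [p for p in population if p[1] == "B"]}
stratified_sample = random.sample(strata["A"], 8) + random.sample(strata["B"], 2)

print(len(simple_sample), len(stratified_sample))
```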
Proposal Distribution:
A proposal distribution is a fundamental component in Markov Chain Monte Carlo
(MCMC) methods.
It is used to generate new candidate samples when direct sampling from the target
probability distribution is not feasible.
A proposal distribution, denoted as q(x′∣x), is a probability distribution used to
propose new candidate states x' given the current state x.
The new candidate state is then accepted or rejected based on a criterion designed to
ensure that the sequence of samples converges to the target distribution π(x).
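For example, a common choice is a symmetric random-walk proposal, q(x′∣x) = Normal(x′; x, σ²);
because then q(x′∣x) = q(x∣x′), the Metropolis-Hastings acceptance ratio (next section) simplifies
to π(x′)/π(x).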
Markov Chain Monte Carlo Algorithms:
Metropolis-Hastings Algorithm:
Description: Proposes a candidate state from a proposal distribution and accepts or rejects it
according to an acceptance probability computed from the target and proposal densities.
Process:
1. Start from an initial state x.
2. Propose a new state x′ from the proposal distribution q(x′∣x).
3. Compute the acceptance probability α = min(1, [π(x′) · q(x∣x′)] / [π(x) · q(x′∣x)]).
4. Accept x′ with probability α; otherwise, stay at x.
Use Case: Widely applicable and flexible for various target distributions.
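A minimal sketch of random-walk Metropolis-Hastings is shown below; it assumes, for
illustration only, a standard-normal target density and a Gaussian random-walk proposal, so the
symmetric proposal makes the acceptance ratio reduce to π(x′)/π(x).

```python
import math
import random

def target_density(x):
    """Unnormalized target pi(x): a standard normal, used here only as an example."""
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_samples=10_000, step_size=1.0, x0=0.0):
    samples, x = [], x0
    for _ in range(n_samples):
        x_new = x + random.gauss(0.0, step_size)        # propose from q(x'|x) = N(x, step^2)
        alpha = min(1.0, target_density(x_new) / target_density(x))  # acceptance probability
        if random.random() < alpha:
            x = x_new                                    # accept the proposed state
        samples.append(x)                                # a rejected proposal repeats the old state
    return samples

samples = metropolis_hastings()
print(sum(samples) / len(samples))   # should be close to 0, the mean of the target
```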
Gibbs Sampling:
Description: Samples each variable in turn from its conditional distribution given
the current values of the other variables.
Process:
1. Initialize all variables.
2. Sample each variable xi from p(xi∣other variables).
3. Repeat until convergence.
Use Case: Effective when conditional distributions are easier to sample from.
Example: Ideal for Bayesian networks and hierarchical models.
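As a minimal illustration, the sketch below applies Gibbs sampling to a bivariate normal target
with correlation ρ, where each full conditional is itself a one-dimensional normal and is
therefore easy to sample from; the parameters are chosen only for the example.

```python
import random

rho = 0.8                      # correlation of the example bivariate normal target
x, y = 0.0, 0.0                # arbitrary initialization
samples = []

for _ in range(10_000):
    # Sample each variable from its conditional given the current value of the other:
    # x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2)
    x = random.gauss(rho * y, (1 - rho ** 2) ** 0.5)
    y = random.gauss(rho * x, (1 - rho ** 2) ** 0.5)
    samples.append((x, y))

# The empirical correlation of the samples should approach rho
n = len(samples)
mx = sum(s[0] for s in samples) / n
my = sum(s[1] for s in samples) / n
cov = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
vx = sum((s[0] - mx) ** 2 for s in samples) / n
vy = sum((s[1] - my) ** 2 for s in samples) / n
print(cov / (vx * vy) ** 0.5)
```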
Graphical Models:
Graphical models are a powerful framework for representing complex dependencies
among variables in a visual and mathematical way.
Bayesian Networks:
Bayesian Networks (BNs) are a type of probabilistic graphical model that uses
directed acyclic graphs (DAGs) to represent a set of variables and their conditional
dependencies.
They are particularly powerful for modeling complex systems where understanding
the relationships between variables is crucial.
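The key property of a Bayesian network is that the joint distribution over all variables
factorizes according to the graph:
P(X1, X2, …, Xn) = Π_i P(Xi ∣ Parents(Xi)),
i.e., each variable depends directly only on its parents in the DAG. The alarm example below
uses this factorization.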
Joint Probability:
Joint probability is a probability of two or more events happening together. For
example, the joint probability of two events A and B is the probability that both
events occur, P(A∩B).
For independent events: P(A ∩ B) = P(A) · P(B)
In general: P(A ∩ B) = P(A | B) · P(B)
Conditional Probability:
Conditional probability defines the probability that event B will occur, given that
event A has already occurred.
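Formally, for P(A) > 0:
P(B | A) = P(A ∩ B) / P(A)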
Example: An alarm (A) can be triggered by a burglary (B) or a fire (F); two persons, P1 and
P2, may respond when the alarm sounds.
Burglary ‘B’ –
Fire ‘F’ –
Alarm ‘A’ –
B F P (A=T) P (A=F)
T T 0.95 0.05
T F 0.94 0.06
F T 0.29 0.71
F F 0.001 0.999
Person ‘P1’ –
A P (P1=T) P (P1=F)
T 0.95 0.05
F 0.05 0.95
Person ‘P2’ –
A P (P2=T) P (P2=F)
T 0.80 0.20
F 0.01 0.99
P(P1=T, P2=T, A=T, B=F, F=F)
= P(P1=T ∣ A=T) · P(P2=T ∣ A=T) · P(A=T ∣ B=F, F=F) · P(B=F) · P(F=F)
= 0.00075
Applications: Bayesian networks are widely used for medical diagnosis, fault diagnosis, spam
filtering, risk assessment, and other tasks that require reasoning under uncertainty.
Markov Random Fields:
A Markov Random Field (MRF), or Markov Network, is a class of graphical models with an
undirected graph between random variables.
The structure of this graph decides the dependence or independence between the
random variables.
1. Nodes (Vertices):
o Each node represents a random variable.
o Nodes can represent observed data, hidden variables, or any entities in
the model.
2. Edges (Links):
o Undirected edges between nodes indicate direct dependencies.
o Unlike Bayesian Networks, MRFs use undirected edges to capture
the symmetrical nature of relationships.
3. Clique Potentials (Factors):
o Potential functions are associated with cliques (fully connected subgraphs) of
the graph.
o They represent the local dependencies among the variables in a clique.
o These potential functions are often denoted as ψ_C(x_C), where C is a clique of the
graph.
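These clique potentials define the joint distribution of the MRF:
P(x) = (1 / Z) · Π_C ψ_C(x_C),
where the product runs over the cliques C of the graph and Z is a normalizing constant (the
partition function).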
Applications: Markov Random Fields are commonly used in image denoising and segmentation,
other computer-vision tasks, and spatial statistics, where relationships between neighbouring
variables are naturally symmetric.
Hidden Markov Models:
The hidden Markov Model (HMM) is a statistical model that is used to describe the
probabilistic relationship between a sequence of observations and a sequence of
hidden states.
It is often used in situations where the underlying system or process that generates the
observations is unknown or hidden, hence it has the name “Hidden Markov Model.”
It is used to predict future observations or classify sequences, based on the underlying
hidden process that generates the data.
The hidden states are the underlying variables that generate the observed data, but
they are not directly observable.
The observations are the variables that are measured and observed.
An HMM describes the relationship between the hidden states and the observations using two
sets of probabilities: the transition probabilities and the emission probabilities.
The state space is the set of all possible hidden states, and the observation space is the set
of all possible observations.
Transition probabilities: the probabilities of transitioning from one hidden state to another.
Together they form the transition matrix, which describes the probability of moving from one
state to another.
Emission probabilities: the probabilities of generating each observation from each hidden state.
Together they form the emission matrix.
Training the model:
The state transition probabilities and the observation likelihoods are estimated using the
Baum-Welch algorithm, which applies the forward-backward procedure inside an
expectation-maximization loop, iteratively updating the parameters until convergence.
Decoding:
Given the observed data, the Viterbi algorithm is used to compute the most likely sequence
of hidden states. This can be used to predict future observations, classify sequences, or detect
patterns in sequential data.
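A compact sketch of Viterbi decoding is shown below. The two hidden states, the observation
alphabet, and all probabilities are invented for the example and are not taken from the notes
above.

```python
import numpy as np

states = ["Rainy", "Sunny"]                 # hidden states (example values)
obs_symbols = ["walk", "shop", "clean"]     # observation alphabet (example values)

pi = np.array([0.6, 0.4])                   # initial state distribution
A = np.array([[0.7, 0.3],                   # transition matrix: A[i, j] = P(next state j | state i)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],              # emission matrix: B[i, k] = P(observation k | state i)
              [0.6, 0.3, 0.1]])

observations = [0, 1, 2]                    # indices of "walk", "shop", "clean"

def viterbi(observations, pi, A, B):
    """Return the most likely hidden-state sequence for the observations."""
    n_states, T = A.shape[0], len(observations)
    delta = np.zeros((T, n_states))           # best path probability ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers for reconstructing the path
    delta[0] = pi * B[:, observations[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, observations[t]]
    # Backtrack from the most probable final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi(observations, pi, A, B))
```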
Evaluation:
The performance of the HMM can be evaluated using various metrics, such as accuracy,
precision, recall, or F1 score.
Tracking Methods:
Tracking methods in machine learning, often referred to as object tracking, involve
techniques used to locate and follow an object's position over time in a sequence of
frames or images.
These methods have applications in various fields, including computer vision,
robotics, surveillance, and augmented reality.
Kalman Filter:
The Kalman filter is an optimal estimator for linear systems with Gaussian noise.
It provides a recursive solution to the linear quadratic estimation problem, efficiently
processing noisy measurements to produce an estimate of the system's state.
Components:
o State estimate x̂ and its error covariance P.
o State transition model F (optionally with a control input) and process noise covariance Q.
o Measurement model H and measurement noise covariance R.
Algorithm:
1. Prediction:
o Predict the next state
o Predict the error covariance
2. Update:
o Compute the Kalman gain
o Update the state estimate
o Update the error covariance
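A minimal one-dimensional example (tracking a roughly constant position from noisy
measurements) illustrates the predict/update cycle; the noise levels and measurements are
assumptions made for the sketch.

```python
# Simple 1-D model: the true position is (approximately) constant, so F = H = 1.
F, H = 1.0, 1.0
Q, R = 1e-4, 0.25        # process and measurement noise variances (assumed values)

x_hat, P = 0.0, 1.0      # initial state estimate and error covariance
measurements = [0.9, 1.1, 1.0, 0.95, 1.05]   # noisy observations of a position near 1.0

for z in measurements:
    # Prediction step
    x_pred = F * x_hat                 # predict the next state
    P_pred = F * P * F + Q             # predict the error covariance

    # Update step
    K = P_pred * H / (H * P_pred * H + R)   # Kalman gain
    x_hat = x_pred + K * (z - H * x_pred)   # update the state estimate with the innovation
    P = (1 - K * H) * P_pred                # update the error covariance

print(x_hat)   # the estimate converges toward the true position (~1.0)
```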
Applications: object tracking, navigation and GPS/inertial sensor fusion, robotics state
estimation, and other sensor-fusion problems.
Particle Filter:
The particle filter, or Sequential Monte Carlo (SMC) method, is used for non-linear,
non-Gaussian systems.
It represents the posterior distribution of the state using a set of random samples
(particles) and weights.
Components
1. Particles:
o A set of samples representing possible states.
2. Weights:
o Importance weights for each particle, representing the likelihood given
the observations.
Algorithm:
1. Initialization:
o Generate an initial set of particles from the prior distribution.
o Initialize weights
2. Prediction:
o Propagate particles according to the state transition model
3. Update:
o Update weights based on the measurement likelihood
o Normalize weights
4. Resampling:
o Resample particles based on their weights to avoid degeneracy.
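A compact bootstrap particle filter for a one-dimensional random-walk state with noisy
observations is sketched below; the state and measurement models and the noise levels are
assumptions made for the example.

```python
import math
import random

N = 1000                                   # number of particles
process_std, meas_std = 0.5, 1.0           # assumed process and measurement noise levels

# 1. Initialization: particles drawn from the prior, uniform weights
particles = [random.gauss(0.0, 2.0) for _ in range(N)]
weights = [1.0 / N] * N

observations = [0.2, 0.5, 1.1, 1.4, 2.0]   # made-up measurement sequence

for z in observations:
    # 2. Prediction: propagate particles through the (random-walk) state transition model
    particles = [p + random.gauss(0.0, process_std) for p in particles]

    # 3. Update: weight each particle by the measurement likelihood, then normalize
    weights = [math.exp(-0.5 * ((z - p) / meas_std) ** 2) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]

    # 4. Resampling: draw particles in proportion to their weights to avoid degeneracy
    particles = random.choices(particles, weights=weights, k=N)
    weights = [1.0 / N] * N

# State estimate: mean of the (resampled) particles
print(sum(particles) / N)
```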
Applications: visual object tracking, robot localization (Monte Carlo localization), and other
non-linear, non-Gaussian state estimation problems.
Comparison:
Kalman Filter:
o Assumes linear dynamics and Gaussian noise.
o Computationally efficient.
o Optimal for linear systems.
Particle Filter:
o Handles non-linear and non-Gaussian systems.
o More computationally intensive.
o Provides a flexible framework for complex systems.
*****