Unit-5 ML
Reinforcement Learning:
Reinforcement learning (RL) is an approach in which an agent learns how to act by interacting
with an environment and receiving rewards as feedback. Its key concepts are:
1. Agent:
o The learner or decision-maker that interacts with the environment.
2. Environment:
o The external system the agent interacts with. It provides feedback based on the
agent's actions.
3. State:
o A representation of the current situation of the environment. The
agent perceives the environment through states.
4. Action:
o A move the agent can make; the set of all possible actions available in the
environment is called the action space.
5. Reward:
o Feedback from the environment based on the agent's actions. Positive
rewards incentivize desirable actions, while negative rewards (or penalties)
discourage undesirable actions.
6. Policy:
o A strategy used by the agent to determine the next action based on the current
state. It can be deterministic or stochastic.
7. Value Function:
o A function that estimates the expected cumulative reward of states or state-
action pairs, helping the agent to make decisions that maximize long-term
rewards.
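In its standard form, the state-value function of a policy π can be written as
V^π(s) = E[ r_(t+1) + γ·r_(t+2) + γ²·r_(t+3) + … ∣ s_t = s ],
where γ (with 0 ≤ γ ≤ 1) is a discount factor that weights immediate rewards more heavily than
distant ones. The action-value function Q^π(s, a) is defined in the same way, but conditions on
both the current state and the action taken.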
The Learning Process
1. Exploration:
o The agent tries out different actions to discover their effects and
gather information about the environment.
2. Exploitation:
o The agent uses its knowledge to choose actions that it believes will maximize
the reward.
3. Balance:
o Effective RL requires balancing exploration and exploitation to ensure the
agent learns the optimal policy.
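One common way to balance the two is an ε-greedy rule: with probability ε the agent explores a
random action, and otherwise it exploits the best-known action. The sketch below is a minimal
illustration; the Q-table, state names, and ε value are invented for the example and are not part
of the notes above.

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Pick an action: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.choice(actions)                       # exploration: random action
    # exploitation: action with the highest estimated value in this state
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

# Example usage with a toy Q-table
Q = {("s0", "left"): 0.2, ("s0", "right"): 0.5}
print(epsilon_greedy(Q, "s0", ["left", "right"], epsilon=0.1))
```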
Common Reinforcement Learning Algorithms:
1. Q-Learning:
o A model-free algorithm where the agent learns a value function Q(s,a),
which represents the expected utility of taking action a in state s and
following the optimal policy thereafter.
2. SARSA (State-Action-Reward-State-Action):
o Similar to Q-Learning, but updates the Q-value based on the action actually
taken, considering the policy followed by the agent.
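The standard one-step update rules make the difference concrete (here α is the learning rate, γ
the discount factor, and r the reward received after taking action a in state s and arriving in
state s′):
Q-Learning: Q(s, a) ← Q(s, a) + α · [ r + γ · max_a′ Q(s′, a′) − Q(s, a) ]
SARSA: Q(s, a) ← Q(s, a) + α · [ r + γ · Q(s′, a′) − Q(s, a) ], where a′ is the action actually
chosen in s′ by the current policy.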
Example: A Robot Navigating a Maze
1. Environment:
o The maze consists of a grid with walls, open spaces, and an exit.
o The robot starts at a random position and must find the exit.
2. State:
o The current position of the robot in the maze, represented by coordinates (x,
y).
3. Actions:
o The robot can move up, down, left, or right.
4. Rewards:
o Positive reward for reaching the exit.
o Negative reward for hitting a wall.
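Putting the pieces together, a minimal tabular Q-learning sketch for a maze of this kind might
look as follows. The grid layout, reward values, and hyperparameters are illustrative
assumptions; the notes above do not fix them.

```python
import random

# Hypothetical 4x4 maze: 0 = open space, 1 = wall; the exit is at (3, 3).
GRID = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [1, 0, 0, 0]]
EXIT = (3, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    x, y = state
    dx, dy = ACTIONS[action]
    nx, ny = x + dx, y + dy
    if not (0 <= nx < 4 and 0 <= ny < 4) or GRID[nx][ny] == 1:
        return state, -1.0, False      # hitting a wall: negative reward, stay in place
    if (nx, ny) == EXIT:
        return (nx, ny), 10.0, True    # reaching the exit: positive reward
    return (nx, ny), -0.1, False       # small step cost encourages short paths

alpha, gamma, epsilon = 0.1, 0.9, 0.2  # learning rate, discount factor, exploration rate
Q = {}                                 # Q[(state, action)] -> estimated value

for episode in range(500):
    state = (0, 0)                     # fixed start for simplicity (the notes use a random start)
    for _ in range(200):               # cap episode length
        if random.random() < epsilon:  # explore
            action = random.choice(list(ACTIONS))
        else:                          # exploit the best-known action
            action = max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))
        next_state, reward, done = step(state, action)
        old = Q.get((state, action), 0.0)
        best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        # Q-learning update: move the estimate toward reward + discounted best future value
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
        state = next_state
        if done:
            break
```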
Applications of Reinforcement Learning:
Reinforcement learning is a powerful approach to building intelligent systems that can adapt
and improve through experience, opening up possibilities across a wide range of applications.
Markov Chain Monte Carlo (MCMC):
MCMC methods draw samples from a target probability distribution by combining two ideas:
1. Markov Chain:
A sequence of random variables where the next state depends only on the current
state (the Markov property).
The chain has a stationary distribution that it converges to over time.
2. Monte Carlo:
Estimation by repeated random sampling: quantities of interest (such as expectations)
are approximated by averaging over many random samples rather than computed
analytically.
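A small simulation illustrates both ideas: a two-state Markov chain is run for many steps, and
the Monte Carlo estimate of how often each state is visited approaches the chain's stationary
distribution. The transition matrix below is an arbitrary example.

```python
import random

# Transition matrix for a 2-state chain: P[i][j] = probability of moving from state i to state j
P = [[0.9, 0.1],
     [0.5, 0.5]]

state, counts = 0, [0, 0]
for _ in range(100_000):
    counts[state] += 1
    # Move to the next state according to the current row of P
    state = 0 if random.random() < P[state][0] else 1

total = sum(counts)
print([c / total for c in counts])   # approaches the stationary distribution (~[0.833, 0.167])
```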
How MCMC Works:
1. Initialization:
o Start with an initial state (or set of states); the starting point may be arbitrary and
need not be drawn from the target distribution.
2. Iteration:
o Propose a new state based on a proposal distribution.
o Accept or reject the new state based on a criterion (e.g., the Metropolis-Hastings
acceptance rule).
3. Convergence:
o After many iterations, the distribution of the states will approximate the
target distribution.
Common Algorithms
1. Metropolis-Hastings Algorithm:
o Proposes new states and accepts or rejects them based on the acceptance ratio.
o Widely used for its simplicity and flexibility.
2. Gibbs Sampling:
o Samples each variable in turn, conditional on the current values of the other
variables.
o Useful when the conditional distributions are easier to sample from.
Applications of MCMC
1. Bayesian Inference:
o Estimating posterior distributions of parameters when the likelihood and
prior are known.
o Useful for hierarchical models and complex data structures.
2. Statistical Physics:
o Simulating systems of many interacting components (e.g., spin models) by sampling
from their equilibrium distributions.
3. Machine Learning:
o Sampling from posterior or latent-variable distributions in probabilistic models when
exact inference is intractable.
Sampling:
Sampling is a technique used to select a subset of data from a larger population, allowing for
the analysis and inference of population characteristics without examining the entire dataset.
Types of Sampling
1. Probability Sampling:
o Description: Every member of the population has a known, non-zero chance
of being selected.
o Examples:
Simple Random Sampling: Every member of the population has
an equal chance of being selected.
Systematic Sampling: Selects every k-th member from a list after a
random start.
Stratified Sampling: Divides the population into strata (groups) and
samples from each stratum.
Cluster Sampling: Divides the population into clusters and randomly
selects entire clusters.
2. Non-Probability Sampling:
o Description: Not every member of the population has a known or
equal chance of being selected.
o Examples:
Convenience Sampling: Samples are selected based on their
availability or ease of access.
Judgmental (Purposive) Sampling: Samples are selected based on
the researcher’s judgment.
Quota Sampling: Ensures representation by selecting samples to meet
certain quotas.
Snowball Sampling: Current subjects recruit future subjects from their
acquaintances.
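As a small illustration of two of the probability-sampling schemes above, the sketch below draws
a simple random sample and a proportional stratified sample from a toy population; the data and
strata are made up for the example.

```python
import random

# Toy population: (value, stratum) pairs; stratum "A" is four times larger than "B"
population = [(i, "A" if i < 80 else "B") for i in range(100)]

# Simple random sampling: every member has an equal chance of selection
simple_sample = random.sample(population, 10)

# Stratified sampling: sample from each stratum in proportion to its size
strata = {"A": [p for p in population if p[1] == "A"],
          "B": [p for p in population if p[1] == "B"]}
stratified_sample = random.sample(strata["A"], 8) + random.sample(strata["B"], 2)

print(len(simple_sample), len(stratified_sample))
```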
Proposal Distribution:
A proposal distribution is a fundamental component in Markov Chain Monte Carlo
(MCMC) methods.
It is used to generate new candidate samples when direct sampling from the target
probability distribution is not feasible.
A proposal distribution, denoted as q(x′∣x), is a probability distribution used to
propose new candidate states x' given the current state x.
The new candidate state is then accepted or rejected based on a criterion designed to
ensure that the sequence of samples converges to the target distribution π(x).
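For example, a common choice is a symmetric random-walk proposal, q(x′∣x) = Normal(x′; x, σ²);
because then q(x′∣x) = q(x∣x′), the Metropolis-Hastings acceptance ratio (next section) simplifies
to π(x′)/π(x).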
Markov Chain Monte Carlo Algorithms:
Metropolis-Hastings Algorithm:
Description: Proposes a candidate state from a proposal distribution and accepts or rejects it
according to an acceptance probability computed from the target and proposal densities.
Process:
1. Start from an initial state x.
2. Propose a new state x′ from the proposal distribution q(x′∣x).
3. Compute the acceptance probability α = min(1, [π(x′) · q(x∣x′)] / [π(x) · q(x′∣x)]).
4. Accept x′ with probability α; otherwise, stay at x.
Use Case: Widely applicable and flexible for various target distributions.
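A minimal sketch of random-walk Metropolis-Hastings is shown below; it assumes, for
illustration only, a standard-normal target density and a Gaussian random-walk proposal, so the
symmetric proposal makes the acceptance ratio reduce to π(x′)/π(x).

```python
import math
import random

def target_density(x):
    """Unnormalized target pi(x): a standard normal, used here only as an example."""
    return math.exp(-0.5 * x * x)

def metropolis_hastings(n_samples=10_000, step_size=1.0, x0=0.0):
    samples, x = [], x0
    for _ in range(n_samples):
        x_new = x + random.gauss(0.0, step_size)        # propose from q(x'|x) = N(x, step^2)
        alpha = min(1.0, target_density(x_new) / target_density(x))  # acceptance probability
        if random.random() < alpha:
            x = x_new                                    # accept the proposed state
        samples.append(x)                                # a rejected proposal repeats the old state
    return samples

samples = metropolis_hastings()
print(sum(samples) / len(samples))   # should be close to 0, the mean of the target
```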
Gibbs Sampling:
Description: Samples each variable in turn from its conditional distribution given
the current values of the other variables.
Process:
1. Initialize all variables.
2. Sample each variable xi from p(xi∣other variables).
3. Repeat until convergence.
Use Case: Effective when conditional distributions are easier to sample from.
Example: Ideal for Bayesian networks and hierarchical models.
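As a minimal illustration, the sketch below applies Gibbs sampling to a bivariate normal target
with correlation ρ, where each full conditional is itself a one-dimensional normal and is
therefore easy to sample from; the parameters are chosen only for the example.

```python
import random

rho = 0.8                      # correlation of the example bivariate normal target
x, y = 0.0, 0.0                # arbitrary initialization
samples = []

for _ in range(10_000):
    # Sample each variable from its conditional given the current value of the other:
    # x | y ~ N(rho * y, 1 - rho^2),   y | x ~ N(rho * x, 1 - rho^2)
    x = random.gauss(rho * y, (1 - rho ** 2) ** 0.5)
    y = random.gauss(rho * x, (1 - rho ** 2) ** 0.5)
    samples.append((x, y))

# The empirical correlation of the samples should approach rho
n = len(samples)
mx = sum(s[0] for s in samples) / n
my = sum(s[1] for s in samples) / n
cov = sum((s[0] - mx) * (s[1] - my) for s in samples) / n
vx = sum((s[0] - mx) ** 2 for s in samples) / n
vy = sum((s[1] - my) ** 2 for s in samples) / n
print(cov / (vx * vy) ** 0.5)
```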
Graphical Models:
Graphical models are a powerful framework for representing complex dependencies
among variables in a visual and mathematical way.
Bayesian Networks:
Bayesian Networks (BNs) are a type of probabilistic graphical model that uses
directed acyclic graphs (DAGs) to represent a set of variables and their conditional
dependencies.
They are particularly powerful for modeling complex systems where understanding
the relationships between variables is crucial.
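The key property of a Bayesian network is that the joint distribution over all variables
factorizes according to the graph:
P(X1, X2, …, Xn) = Π_i P(Xi ∣ Parents(Xi)),
i.e., each variable depends directly only on its parents in the DAG. The alarm example below
uses this factorization.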
Joint Probability:
Joint probability is a probability of two or more events happening together. For
example, the joint probability of two events A and B is the probability that both
events occur, P(A∩B).
For independent events: P(A ∩ B) = P(A) · P(B)
In general: P(A ∩ B) = P(A | B) · P(B)
Conditional Probability:
Conditional probability defines the probability that event B will occur, given that
event A has already occurred.
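Formally, for P(A) > 0:
P(B | A) = P(A ∩ B) / P(A)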
Example: An alarm (A) can be triggered by a burglary (B) or a fire (F); two persons, P1 and
P2, may respond when the alarm sounds.
Burglary ‘B’ –
Fire ‘F’ –
Alarm ‘A’ –
B F P (A=T) P (A=F)
T T 0.95 0.05
T F 0.94 0.06
F T 0.29 0.71
F F 0.001 0.999
Person ‘P1’ –
A P (P1=T) P (P1=F)
T 0.95 0.05
F 0.05 0.95
Person ‘P2’ –
A P (P2=T) P (P2=F)
T 0.80 0.20
F 0.01 0.99
P(P1=T, P2=T, A=T, B=F, F=F)
= P(P1=T ∣ A=T) · P(P2=T ∣ A=T) · P(A=T ∣ B=F, F=F) · P(B=F) · P(F=F)
= 0.00075
Applications: Bayesian networks are widely used for medical diagnosis, fault diagnosis, spam
filtering, risk assessment, and other tasks that require reasoning under uncertainty.
Markov Random Fields:
A Markov Random Field (MRF), or Markov Network, is a class of graphical models with an
undirected graph between random variables.
The structure of this graph decides the dependence or independence between the
random variables.
1. Nodes (Vertices):
o Each node represents a random variable.
o Nodes can represent observed data, hidden variables, or any entities in
the model.
2. Edges (Links):
o Undirected edges between nodes indicate direct dependencies.
o Unlike Bayesian Networks, MRFs use undirected edges to capture
the symmetrical nature of relationships.
3. Clique Potentials (Factors):
o Potential functions are associated with cliques (fully connected subgraphs) of
the graph.
o They represent the local dependencies among the variables in a clique.
o These potential functions are often denoted as ψ_C(x_C), where C is a clique of the
graph.
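These clique potentials define the joint distribution of the MRF:
P(x) = (1 / Z) · Π_C ψ_C(x_C),
where the product runs over the cliques C of the graph and Z is a normalizing constant (the
partition function).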
Applications: Markov Random Fields are commonly used in image denoising and segmentation,
other computer-vision tasks, and spatial statistics, where relationships between neighbouring
variables are naturally symmetric.
Hidden Markov Models:
The hidden Markov Model (HMM) is a statistical model that is used to describe the
probabilistic relationship between a sequence of observations and a sequence of
hidden states.
It is often used in situations where the underlying system or process that generates the
observations is unknown or hidden, hence it has the name “Hidden Markov Model.”
It is used to predict future observations or classify sequences, based on the underlying
hidden process that generates the data.
The hidden states are the underlying variables that generate the observed data, but
they are not directly observable.
The observations are the variables that are measured and observed.
An HMM describes the relationship between the hidden states and the observations using two
sets of probabilities: the transition probabilities and the emission probabilities.
The state space is the set of all possible hidden states, and the observation space is the set
of all possible observations.
Transition probabilities: the probabilities of transitioning from one hidden state to another.
Together they form the transition matrix, which describes the probability of moving from one
state to another.
Emission probabilities: the probabilities of generating each observation from each hidden state.
Together they form the emission matrix.
Training the model:
The state transition probabilities and the observation likelihoods are estimated using the
Baum-Welch algorithm, which applies the forward-backward procedure inside an
expectation-maximization loop, iteratively updating the parameters until convergence.
Decoding:
Given the observed data, the Viterbi algorithm is used to compute the most likely sequence
of hidden states. This can be used to predict future observations, classify sequences, or detect
patterns in sequential data.
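A compact sketch of Viterbi decoding is shown below. The two hidden states, the observation
alphabet, and all probabilities are invented for the example and are not taken from the notes
above.

```python
import numpy as np

states = ["Rainy", "Sunny"]                 # hidden states (example values)
obs_symbols = ["walk", "shop", "clean"]     # observation alphabet (example values)

pi = np.array([0.6, 0.4])                   # initial state distribution
A = np.array([[0.7, 0.3],                   # transition matrix: A[i, j] = P(next state j | state i)
              [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5],              # emission matrix: B[i, k] = P(observation k | state i)
              [0.6, 0.3, 0.1]])

observations = [0, 1, 2]                    # indices of "walk", "shop", "clean"

def viterbi(observations, pi, A, B):
    """Return the most likely hidden-state sequence for the observations."""
    n_states, T = A.shape[0], len(observations)
    delta = np.zeros((T, n_states))           # best path probability ending in each state
    psi = np.zeros((T, n_states), dtype=int)  # back-pointers for reconstructing the path
    delta[0] = pi * B[:, observations[0]]
    for t in range(1, T):
        for j in range(n_states):
            scores = delta[t - 1] * A[:, j]
            psi[t, j] = np.argmax(scores)
            delta[t, j] = scores[psi[t, j]] * B[j, observations[t]]
    # Backtrack from the most probable final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return [states[i] for i in reversed(path)]

print(viterbi(observations, pi, A, B))
```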
Evaluation:
The performance of the HMM can be evaluated using various metrics, such as accuracy,
precision, recall, or F1 score.
Tracking Methods:
Tracking methods in machine learning, often referred to as object tracking, involve
techniques used to locate and follow an object's position over time in a sequence of
frames or images.
These methods have applications in various fields, including computer vision,
robotics, surveillance, and augmented reality.
Kalman Filter:
The Kalman filter is an optimal estimator for linear systems with Gaussian noise.
It provides a recursive solution to the linear quadratic estimation problem, efficiently
processing noisy measurements to produce an estimate of the system's state.
Components:
o State estimate x̂ and its error covariance P.
o State transition model F (optionally with a control input) and process noise covariance Q.
o Measurement model H and measurement noise covariance R.
Algorithm:
1. Prediction:
o Predict the next state
o Predict the error covariance
2. Update:
o Compute the Kalman gain
o Update the state estimate
o Update the error covariance
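A minimal one-dimensional example (tracking a roughly constant position from noisy
measurements) illustrates the predict/update cycle; the noise levels and measurements are
assumptions made for the sketch.

```python
# Simple 1-D model: the true position is (approximately) constant, so F = H = 1.
F, H = 1.0, 1.0
Q, R = 1e-4, 0.25        # process and measurement noise variances (assumed values)

x_hat, P = 0.0, 1.0      # initial state estimate and error covariance
measurements = [0.9, 1.1, 1.0, 0.95, 1.05]   # noisy observations of a position near 1.0

for z in measurements:
    # Prediction step
    x_pred = F * x_hat                 # predict the next state
    P_pred = F * P * F + Q             # predict the error covariance

    # Update step
    K = P_pred * H / (H * P_pred * H + R)   # Kalman gain
    x_hat = x_pred + K * (z - H * x_pred)   # update the state estimate with the innovation
    P = (1 - K * H) * P_pred                # update the error covariance

print(x_hat)   # the estimate converges toward the true position (~1.0)
```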
Applications: object tracking, navigation and GPS/inertial sensor fusion, robotics state
estimation, and other sensor-fusion problems.
Particle Filter:
The particle filter, or Sequential Monte Carlo (SMC) method, is used for non-linear,
non-Gaussian systems.
It represents the posterior distribution of the state using a set of random samples
(particles) and weights.
Components
1. Particles:
o A set of samples representing possible states.
2. Weights:
o Importance weights for each particle, representing the likelihood given
the observations.
Algorithm:
1. Initialization:
o Generate an initial set of particles from the prior distribution.
o Initialize weights
2. Prediction:
o Propagate particles according to the state transition model
3. Update:
o Update weights based on the measurement likelihood
o Normalize weights
4. Resampling:
o Resample particles based on their weights to avoid degeneracy.
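A compact bootstrap particle filter for a one-dimensional random-walk state with noisy
observations is sketched below; the state and measurement models and the noise levels are
assumptions made for the example.

```python
import math
import random

N = 1000                                   # number of particles
process_std, meas_std = 0.5, 1.0           # assumed process and measurement noise levels

# 1. Initialization: particles drawn from the prior, uniform weights
particles = [random.gauss(0.0, 2.0) for _ in range(N)]
weights = [1.0 / N] * N

observations = [0.2, 0.5, 1.1, 1.4, 2.0]   # made-up measurement sequence

for z in observations:
    # 2. Prediction: propagate particles through the (random-walk) state transition model
    particles = [p + random.gauss(0.0, process_std) for p in particles]

    # 3. Update: weight each particle by the measurement likelihood, then normalize
    weights = [math.exp(-0.5 * ((z - p) / meas_std) ** 2) for p in particles]
    total = sum(weights)
    weights = [w / total for w in weights]

    # 4. Resampling: draw particles in proportion to their weights to avoid degeneracy
    particles = random.choices(particles, weights=weights, k=N)
    weights = [1.0 / N] * N

# State estimate: mean of the (resampled) particles
print(sum(particles) / N)
```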
Applications: visual object tracking, robot localization (Monte Carlo localization), and other
non-linear, non-Gaussian state estimation problems.
Comparison:
Kalman Filter:
o Assumes linear dynamics and Gaussian noise.
o Computationally efficient.
o Optimal for linear systems.
Particle Filter:
o Handles non-linear and non-Gaussian systems.
o More computationally intensive.
o Provides a flexible framework for complex systems.
*****