
BCS602 | MACHINE LEARNING| SEARCH CREATORS.

Module-5

Chapter – 01 - Clustering Algorithms

Introduction to Clustering Approaches

 Cluster analysis is the fundamental task of unsupervised learning. Unsupervised learning involves exploring the given dataset.

 Cluster analysis is a technique of partitioning a collection of unlabelled objects, each with many attributes, into meaningful disjoint groups or clusters.

 This is done using a trial-and-error approach, as there are no supervisors available as in classification.

 The characteristic of clustering is that the objects within a cluster are similar to each other, while differing significantly from the objects in other clusters.

 The input for cluster analysis is examples or samples, also known as objects, data points or data instances. All these terms are used interchangeably in this chapter. Samples or objects with no labels associated with them are called unlabelled.

 The output is the set of clusters (or groups) of similar data, if such structure exists in the input.

 For example, Figure 13.1(a) shows data points or samples with two features, drawn as differently shaded samples, and Figure 13.1(b) shows a manually drawn ellipse indicating the clusters formed.


Visual identification of clusters in this case is easy as the examples have only two
features.

But, when examples have more features, say 100, then clustering cannot be done
manually and automatic clustering algorithms are required.

Also, automating the clustering process is desirable, as clustering such data manually is difficult for humans and often impossible. All clusters are represented by centroids.

Example: If the input samples are (3, 3), (2, 6) and (7, 9), then the centroid is the component-wise mean: ((3 + 2 + 7)/3, (3 + 6 + 9)/3) = (4, 6).

The clusters should not overlap and every cluster should represent only one class. Therefore, clustering algorithms use a trial-and-error method to form clusters that can be converted to labels.

Difference between Clustering & Classification


Applications of Clustering

Challenges of Clustering Algorithms

High-Dimensional Data

o As the number of features increases, clustering becomes difficult.

Scalability Issue

o Some algorithms perform well for small datasets but fail for large-scale data.

Unit Inconsistency

o Different measurement units (e.g., kg vs. pounds) can create problems.

Proximity Measure Design

o Choosing an appropriate distance metric is crucial for accurate clustering.


Advantages and Disadvantages of Clustering Algorithms

Proximity Measures

Proximity measures determine similarity or dissimilarity among objects.

Distance measures (dissimilarity) indicate how different objects are.

Similarity measures indicate how alike objects are.

Inverse relationship: more distance → less similarity, and vice versa.

Properties of Distance Measures (Metric Conditions)

A distance measure d is a metric if it satisfies:

 Non-negativity: d(x, y) ≥ 0
 Identity: d(x, y) = 0 if and only if x = y
 Symmetry: d(x, y) = d(y, x)
 Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z)


Types of Distance Measures Based on Data Types

Quantitative Variables

For quantitative (numeric) attributes, commonly used distance measures include Euclidean distance, Manhattan (city-block) distance, and the more general Minkowski distance.


Binary Attributes

Categorical Variables

Distance is 1 if different, 0 if same.

Example: Gender (Male, Female) → Distance = 1


Ordinal Variables

Vector-Based Distance Measures (For Text & Documents)

Cosine Similarity

o Measures the angle between two vectors; vectors pointing in similar directions are treated as similar.

o Formula: cos θ = (x · y) / (||x|| · ||y||), where x · y is the dot product and ||x|| is the vector norm.
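As a quick illustration, the cosine similarity formula can be computed with NumPy; the vectors x and y below are made-up examples, not data from the text:

import numpy as np

def cosine_similarity(x, y):
    # cos(theta) = (x . y) / (||x|| * ||y||)
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Illustrative term-count vectors for two short documents
x = np.array([1.0, 2.0, 0.0, 3.0])
y = np.array([2.0, 1.0, 0.0, 3.0])

print(cosine_similarity(x, y))        # similarity close to 1 means very alike
print(1.0 - cosine_similarity(x, y))  # cosine distance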


Distance Measures

Hierarchical Clustering Algorithms

Overview

 Produces a nested partition of objects with hierarchical relationships.


 Represented using a dendrogram.
 Two main categories: Agglomerative and Divisive methods.

Types of Hierarchical Clustering

1. Agglomerative Methods (Bottom-Up)


o Each sample starts as an individual cluster.
o Clusters are merged iteratively until one cluster remains.
o Once a cluster is formed, it cannot be undone (irreversible).
2. Divisive Methods (Top-Down)
o Starts with a single cluster containing all data points.
o Splits iteratively into smaller clusters.
o Continues until each sample becomes its own cluster.


Agglomerative Clustering Techniques

Single Linkage (MIN Algorithm)

o Merges clusters based on the smallest distance between two points from different
clusters.
o Related to the Minimum Spanning Tree (MST).
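A minimal sketch of single-linkage (MIN) clustering using SciPy's hierarchical clustering routines; the small 2-D array X is an illustrative assumption:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative 2-D samples forming two loose groups
X = np.array([[1.0, 1.0], [1.5, 1.0], [1.0, 1.5],
              [8.0, 8.0], [8.5, 8.0], [8.0, 8.5]])

# method='single' merges the pair of clusters with the smallest
# point-to-point (MIN) distance at every step
Z = linkage(X, method='single')

# Cut the resulting dendrogram into 2 flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)   # e.g., [1 1 1 2 2 2]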

Complete Linkage (MAX or Clique Algorithm)


Average Linkage Algorithm

Mean-Shift Clustering Algorithm

 Non-parametric and hierarchical clustering technique.


 Also known as mode-seeking or sliding window algorithm.
 No prior knowledge of cluster count or shape required.
 Moves towards high-density regions in data using a kernel function (e.g., Gaussian
window).
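A short scikit-learn sketch of mean-shift clustering; the generated data and the bandwidth estimation settings are illustrative assumptions:

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

rng = np.random.default_rng(0)
# Illustrative 2-D data with two dense regions
X = np.vstack([rng.normal(0, 1, size=(50, 2)),
               rng.normal(6, 1, size=(50, 2))])

# Bandwidth is the only parameter; here it is estimated from the data
bw = estimate_bandwidth(X, quantile=0.2)
ms = MeanShift(bandwidth=bw).fit(X)

print(ms.cluster_centers_)     # modes found by the sliding windows
print(np.unique(ms.labels_))   # number of clusters is discovered, not specified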


Advantages of Mean-Shift Clustering

 No model assumptions

 Suitable for all non-convex cluster shapes

 Only one parameter, the window bandwidth, is required

 Robust to noise

 No issues of local minima or premature termination

Disadvantages of Mean-Shift Clustering

 Selecting the bandwidth is challenging: if it is too large, many clusters are missed; if it is too small, many points are missed and convergence becomes a problem.

 The number of clusters cannot be specified; the user has no control over this parameter.

Partitional Clustering Algorithm

 k-means is a widely used partitional clustering algorithm.


 The user specifies k, the number of clusters.
 Assumes non-overlapping clusters.
 Works well for circular or spherical clusters.

Process of k-means Algorithm

1. Initialization
o Select k initial cluster centers (randomly or using prior knowledge).
o Normalize data for better performance.
2. Assignment of Data Points
o Assign each data point to the nearest centroid based on Euclidean distance.
3. Update Centroids


o Compute the mean vector of assigned points to update cluster centroids.


o Repeat this process until no changes occur in cluster assignments.
4. Termination
o The process stops when cluster assignments remain unchanged.
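The four steps above can be sketched in plain NumPy as follows; the toy data, k = 2, and the random initialization scheme are assumptions for illustration:

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialization: pick k distinct samples as the starting centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # 2. Assignment: each point goes to the nearest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 3. Update: new centroid = mean vector of the points assigned to it
        new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                  else centroids[j] for j in range(k)])
        # 4. Termination: stop when the centroids (assignments) no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(30, 2)), rng.normal(5, 1, size=(30, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)   # roughly (0, 0) and (5, 5)

In practice, library implementations such as scikit-learn's KMeans add refinements on top of these steps, for example multiple random restarts (n_init) and smarter k-means++ initialization.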

Mathematical Optimization

k-means minimizes the within-cluster sum of squared errors (SSE), also called the WCSS:

J = Σ_i Σ_{x ∈ Ci} ||x − μi||², where μi is the centroid (mean vector) of cluster Ci.

Advantages

1. Simple and easy to implement.


2. Efficient for small to medium datasets.


Disadvantages

1. Sensitive to initialization – different initial points may lead to different results.


2. Time-consuming for large datasets – requires multiple iterations.

Choosing the Value of k

 No fixed rule for selecting k.


 Use Elbow Method:
o Run k-means with different values of k.
o Plot Within Cluster Sum of Squares (WCSS) vs. k.
o The optimal k is at the "elbow" where the curve flattens.
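A brief sketch of the elbow method using scikit-learn's KMeans, whose inertia_ attribute is the WCSS; the data and the range of k values are illustrative:

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.8, size=(40, 2)) for c in (0.0, 4.0, 8.0)])

wcss = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)          # within-cluster sum of squares (WCSS)

for k, w in zip(range(1, 9), wcss):
    print(k, round(w, 1))             # look for the "elbow" where the drop flattens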

Computational Complexity

O(nkId), where:

o n = number of samples
o k = number of clusters
o I = number of iterations
o d = number of attributes

Density-based Methods

 DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm.
 Clusters are dense regions of data points separated by areas of low density (noise).
 Works well for arbitrary-shaped clusters and datasets with noise.


Uses two parameters:

1. ε (epsilon) – Neighborhood radius.


2. m (minPts) – Minimum number of points within ε to form a cluster.

Types of Points in DBSCAN

1. Core Point
o A point with at least m points in its ε-neighborhood.
2. Border Point
o Has fewer than m points in its ε-neighborhood but is adjacent to a core point.
3. Noise Point (Outlier)
o Neither a core point nor a border point.
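A minimal scikit-learn DBSCAN sketch; eps and min_samples correspond to ε and m above, and the generated data is an illustrative assumption:

import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Two dense blobs plus a few scattered noise points
X = np.vstack([rng.normal(0, 0.3, size=(50, 2)),
               rng.normal(5, 0.3, size=(50, 2)),
               rng.uniform(-3, 8, size=(5, 2))])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)

print(np.unique(db.labels_))           # cluster ids; -1 marks noise (outlier) points
print(len(db.core_sample_indices_))    # how many core points were found

Points labelled -1 are the noise points, indices in core_sample_indices_ are the core points, and the remaining clustered points are border points.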

Density Connectivity Measures

1. Direct Density Reachability


o Point X is directly reachable from Y if:
 X is in the ε-neighborhood of Y.
 Y is a core point.
2. Densely Reachable


o X is densely reachable from Y if there exists a chain of core points linking them.
3. Density Connected
o X and Y are density connected if they are both densely reachable from a common core point Z.

Advantages of DBSCAN

1. Can detect arbitrary-shaped clusters.


2. Robust to noise and outliers.
3. Does not require specifying the number of clusters in advance (unlike k-means).

Disadvantages of DBSCAN

1. Sensitive to ε and m parameters – Poor parameter choice can affect results.


2. Fails in datasets with varying density – A single ε may not work for all clusters.
3. Computationally expensive for high-dimensional data.

Grid-based Approach

 Grid-based clustering partitions space into a grid structure and fits data into cells for
clustering.
 Suitable for high-dimensional data.
 Uses subspace clustering, dense cells, and monotonicity property.

Concepts

Subspace Clustering

o Clusters are formed using a subset of features (dimensions) rather than all
attributes.
o Useful for high-dimensional data like gene expression analysis.


o CLIQUE (Clustering in Quest) is a widely used grid-based subspace clustering algorithm.

Concept of Dense Cells

o CLIQUE partitions dimensions into intervals (cells).


o A cell is dense if its data point density exceeds a threshold.
o Dense cells are merged to form clusters.

Monotonicity Property


o Uses anti-monotonicity (Apriori property):


 If a k-dimensional cell is dense, then all (k-1) dimensional projections
must also be dense.
 If a lower-dimensional cell is not dense, then higher-dimensional cells
containing it are also not dense.
o Similar to association rule mining in frequent pattern mining.

Advantages of CLIQUE

1. Insensitive to input order of objects.


2. No assumptions about data distribution.
3. Finds high-density clusters in subspaces of high-dimensional data.

Disadvantages of CLIQUE

 Tuning grid parameters (grid size, density threshold) is difficult.


 Finding the optimal threshold to classify a cell as dense is challenging.


Chapter – 02 - Reinforcement Learning

Overview of Reinforcement Learning

What is Reinforcement Learning?

 Reinforcement Learning (RL) is a machine learning paradigm that mimics how humans and animals learn through experience.
 Humans interact with the environment, receive feedback (rewards or penalties),
and adjust their behavior accordingly.
 Example: A child touching fire learns to avoid it after experiencing pain (negative
reinforcement).

How RL Works in Machines

 RL simulates real-world scenarios for a computer program (agent) to learn by trial and error.

 The agent executes actions, receives positive or negative rewards, and optimizes its future actions based on these experiences.

Types of Reinforcement Learning

1. Positive Reinforcement Learning


o Rewards encourage good behavior (reinforce correct actions).
o Example: A robot gets +10 points for reaching a goal successfully.
o Effect: Increases the likelihood of repeating the rewarded action.
2. Negative Reinforcement Learning
o Negative rewards discourage unwanted actions.
o Example: A game agent loses -10 points for stepping into a danger zone.
o Effect: Helps the agent learn to avoid negative outcomes.


Characteristics of RL

 Sequential Decision-Making: The agent makes a series of decisions to maximize total rewards.

 Trial and Error Learning: The agent learns by exploring different actions and their consequences.

 No Supervised Labels: Unlike supervised learning, RL does not require labeled data; it learns from experience.

Applications of Reinforcement Learning

 Robotics: Teaching robots to walk, grasp objects, or perform complex tasks.


 Gaming: AI agents in chess, Go, and video games (e.g., AlphaGo, OpenAI Five).
 Autonomous Vehicles: Self-driving cars learn optimal driving strategies.
 Finance: AI-based trading strategies for stock markets.
 Healthcare: Personalized treatment plans based on patient responses.

Scope of Reinforcement Learning

Reinforcement Learning (RL) is well-suited for decision-making problems in dynamic and uncertain environments. It excels in cases where an agent must learn through trial and error and optimize its actions based on delayed rewards.

Situations Where RL Can Be Used

Pathfinding and Navigation

o Consider a grid-based game where a robot must navigate from a starting node (E) to
a goal node (G) by choosing the optimal path.
o RL can learn the best route by exploring different paths and receiving feedback on
their efficiency.


o In obstacle-based games, RL can identify safe paths while avoiding dangerous zones.

Dynamic Decision-Making with Uncertainty

o RL is useful in environments where not all information is known upfront.


o It is not suitable for tasks like object detection, where a classifier with complete
labeled data performs better.

Characteristics of Reinforcement Learning

1. Sequential Decision-Making
o In RL, decisions are made in steps, and each step influences future choices.
o Example: In a maze game, a wrong turn can lead to failure.
2. Delayed Feedback
o Rewards are not always immediate; the agent may need to take multiple steps
before receiving feedback.
3. Interdependence of Actions
o Each action affects the next set of choices, meaning an incorrect move can
have long-term consequences.
4. Time-Related Decisions
o Actions are taken in a specific sequence over time, affecting the final
outcome.

Challenges in Reinforcement Learning

Reward Design

o Setting the right reward values is crucial. Incorrectly designed rewards may lead the
agent to learn undesired behavior.

Absence of a Fixed Model


o Some environments, like chess, have fixed rules, but many real-world problems lack
predefined models.
o Example: Training a self-driving car requires simulations to generate experiences.

Partial Observability

o Some environments, like weather prediction, involve uncertainty because complete state information is unavailable.

High Computational Complexity

o Games like Go involve a huge state space, making RL training time-consuming.


o More possible actions → More training time needed.

Applications of Reinforcement Learning

1. Industrial Automation
o Optimizing robot movements for efficiency.
2. Resource Management
o Allocating resources in data centers and cloud computing.
3. Traffic Light Control
o Reducing congestion by adjusting signal timings dynamically.
4. Personalized Recommendation Systems
o Used in news feeds, e-commerce, and streaming services (e.g., Netflix
recommendations).
5. Online Advertisement Bidding
o Optimizing ad placements for maximum engagement.
6. Autonomous Vehicles
o RL helps in training self-driving cars to navigate safely.
7. Game AI (Chess, Go, Dota 2, etc.)
o AI models like AlphaGo use RL to master complex games.
8. DeepMind Applications


o AI systems that generate programs and images, and optimize machine learning models.

Reinforcement Learning as Machine Learning

Reinforcement Learning (RL) is a distinct branch of machine learning that differs significantly from supervised learning.

While supervised learning depends on labeled data, reinforcement learning learns through interaction with the environment, making decisions based on trial and error.

Why RL Is Necessary?

Some tasks cannot be solved using supervised learning due to the absence of a labeled
training dataset. For example:

 Chess & Go: There is no dataset with all possible game moves and their
outcomes. RL allows the agent to explore and improve over time.
 Autonomous Driving: The car must learn from real-world experiences rather than
relying on a fixed dataset.

Challenges in Reinforcement Learning Compared to Supervised Learning

 More complex decision-making since every action affects future outcomes.


 Longer training times due to trial-and-error learning.
 Delayed rewards, making it difficult to attribute success or failure to a specific
action.


Differences between Supervised Learning and Reinforcement Learning

Components of Reinforcement Learning

Reinforcement Learning (RL) is based on an agent interacting with an environment to learn an optimal strategy through trial and error.

Basic Components of RL


1. Agent – The decision-maker (e.g., a robot, self-driving car, AI player in a game).


2. Environment – The external world where the agent interacts (e.g., a game board,
real-world traffic).
3. State (S) – A representation of the environment at a specific time.
4. Actions (A) – The possible choices available to the agent.
5. Rewards (R) – The feedback signal received by the agent for taking an action.
6. Policy (π) – The agent’s strategy for selecting actions based on states.
7. Episodes – The sequence of states, actions, and rewards from the start state to
the goal state.

Types of RL Problems
Learning Problems

 Unknown environment – The agent learns by trial and error.


 Goal – Improve the policy through interaction.
 Example – A robot navigating through an unknown maze.

Planning Problems

 Known environment – The agent can compute and improve the policy using a
model.
 Example – Chess AI that plans its moves based on game rules.

Environment and Agent

 The environment contains all elements the agent interacts with, including
obstacles, rewards, and state transitions.
 The agent makes decisions and performs actions to maximize rewards.

Example

In self-driving cars,


 The environment includes roads, traffic, and signals.


 The agent is the AI system making driving decisions.

States and Actions

 State (S) – Represents the current situation.


 Action (A) – Causes a transition from one state to another.

Example (Navigation)

In a grid-based game, states represent positions (A, B, C, etc.), and actions are
movements (UP, DOWN, LEFT, RIGHT).

Types of States

1. Start State – Where the agent begins.


2. Goal State – The target state with the highest reward.
3. Non-terminal States – Intermediate steps between start and goal.


Types of Episodes

 Episodic – Has a definite start and goal state (e.g., solving a maze).
 Continuous – No fixed goal state; the task continues indefinitely (e.g., stock
trading).

Policies in RL

A policy (π) is the strategy used by the agent to choose actions.

Types of Policies

 Deterministic policy – maps each state to a single action.
 Stochastic policy – assigns a probability distribution over the possible actions in each state.

Choosing the Best Policy

 The optimal policy is the one that maximizes cumulative expected rewards.

Rewards in RL

 Immediate Reward (r) – The instant feedback for an action.


 Total Reward (G) – The sum of all rewards collected during an episode.
 Long-term Reward – The cumulative future reward.


Discount Factor (γ)

The discount factor γ (0 ≤ γ ≤ 1) determines how much future rewards count towards the return, G = r1 + γ r2 + γ² r3 + … A value of γ near 0 makes the agent short-sighted (immediate rewards dominate), while a value near 1 makes it value long-term rewards.

RL Algorithm Categories

 Model-Based RL – Uses a predefined model (e.g., Chess AI).


 Model-Free RL – Learns by trial and error (e.g., a robot navigating an unknown
environment).

Markov Decision Process

A Markov Chain is a stochastic process that satisfies the Markov property.

It consists of a sequence of random variables where the probability of transitioning to the next state depends only on the current state and not on the past states.


Example: University Transition

Consider two universities:

 80% of students from University A move to University B for a master's degree, while 20% remain in University A.

 60% of students from University B move to University A, while 40% remain in University B.

This can be represented as a Markov Chain, where:

 States represent the universities.


 Edges denote the probability of transitioning between states.

A transition matrix T (rows: current university, columns: next university, in the order A, B) is defined as:

T = | 0.2 0.8 |
    | 0.6 0.4 |

Each row represents a probability distribution, meaning the sum of the elements in each row equals 1.

Probability Prediction

Let the initial distribution over the two universities be a row vector x0.

To find the state distribution after one time step, multiply the initial distribution by the transition matrix: x1 = x0 T.

After two time steps: x2 = x1 T = x0 T².


The system stabilizes over time, reflecting the equilibrium distribution.
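A small NumPy sketch of this calculation; the starting distribution x (everyone in University A) is an illustrative assumption, since the original example's initial vector is not shown:

import numpy as np

# Transition matrix: rows = current university (A, B), columns = next (A, B)
T = np.array([[0.2, 0.8],
              [0.6, 0.4]])

x = np.array([1.0, 0.0])   # illustrative start: all students in University A

for step in range(1, 11):
    x = x @ T              # distribution after one more time step
    print(step, x.round(4))

# The printed vectors converge to the equilibrium (stationary) distribution,
# i.e., the vector pi that satisfies pi = pi @ T (about [0.4286, 0.5714] here).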

Markov Decision Process (MDP)

An MDP extends a Markov Chain by incorporating rewards. It consists of:

1. Set of states
2. Set of actions
3. Transition probability function
4. Reward function
5. Policy
6. Value function

Markov Assumption

The Markov property states that the probability of reaching state s_t+1 and receiving reward r_t+1 depends only on the previous state and action:

P(s_t+1, r_t+1 | s_t, a_t)

MDP Process

1. Observe the current state s.
2. Choose an action a.
3. Receive a reward r.
4. Move to the next state s'.
5. Repeat to maximize cumulative rewards.

State Transition Probability


The probability of moving from state s to state s' after taking action a is given by:

P(s' | s, a) = Pr(S_t+1 = s' | S_t = s, A_t = a)

This forms a state transition matrix, where each row represents transition probabilities
from one state to another.

Expected Reward

The expected reward for taking action a in state s is given by:

r(s, a) = E[ R_t+1 | S_t = s, A_t = a ]

Training and Testing of RL Systems

Once an MDP is modeled, the system undergoes:

1. Training: The agent repeatedly interacts with the environment, adjusting parameters based on rewards.
2. Inference: A trained model is deployed to make decisions in real-time.
3. Retraining: When the environment changes, the model is retrained to adapt and improve performance.

Goal of MDP

The agent's objective is to maximize total accumulated rewards over time by following
an optimal policy.


Multi-Arm Bandit Problem and Reinforcement Problem Types

Reinforcement Learning Overview

Reinforcement learning (RL) uses trial and error to learn a series of actions that
maximize the total reward. RL consists of two fundamental sub-problems:

Prediction (Value Estimation):

o The goal is to predict the total reward (return), also known as policy evaluation or
value estimation.
o This requires the formulation of a function called the state-value function.
o The estimation of the state-value function can be performed using Temporal
Difference (TD) Learning.

Policy Improvement:

o The objective is to determine actions that maximize returns.


o This process is known as policy improvement.
o Both prediction and policy improvement can be combined into policy iteration, where
these steps are used alternately to find an optimal policy.

Multi-Arm Bandit Problem

A commonly encountered problem in reinforcement learning is the multi-arm bandit problem (or N-arm bandit problem).

Consider a hypothetical casino with a robotic arm that activates a 5-armed slot machine.
When a lever is pulled, the machine returns a reward within a specified range (e.g., $1 to
$10).

The challenge is that each arm provides rewards randomly within this range.


Objective:

Given a limited number of attempts, the goal is to maximize the total reward by
selecting the best lever.

A logical approach is to determine which lever has the highest average reward and use it
repeatedly.

Formalization:

Given k attempts on an N-arm slot machine, with rewards r1, r2, ..., rk, the expected reward (action-value function) is:

Q(a) = (r1 + r2 + ... + rk) / k

The best action is defined as:

a* = argmax_a Q(a)

This is the action with the highest average reward, and Q(a) serves as an indicator of action quality.

Example:

If a slot machine arm is chosen five times and returns rewards r1, ..., r5, the quality of this action is the average of those five rewards: Q(a) = (r1 + r2 + r3 + r4 + r5) / 5.


Exploration vs Exploitation and Selection Policies

In reinforcement learning, an agent must decide how to select actions:

Exploration:

o Tries all actions, even if they lead to sub-optimal decisions.


o Useful in games where exploring different actions provides better long-term
rewards.
o Risky but informative.

Exploitation:

o Uses the current best-known action repeatedly.


o Focuses on short-term gains.
o Simple but often sub-optimal.

A balance between exploration and exploitation is crucial for optimal decision-making.

Selection Policies

Greedy Method

 Picks the best-known action at any given time.


 Based solely on exploitation.
 Risk: It may miss out on exploring better options.

ε-Greedy Method

 Balances exploration and exploitation.


 With probability ε, the agent explores a random action.
 With probability 1 - ε, it selects the best-known action.
 ε ranges from 0 to 1 (e.g., ε = 0.1 means a 10% chance of exploration).
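A short NumPy simulation of the ε-greedy method on a hypothetical 5-armed bandit; the reward distributions, ε = 0.1, and the number of pulls are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([2.0, 5.0, 4.0, 7.0, 3.0])   # unknown to the agent
k_arms = len(true_means)

Q = np.zeros(k_arms)      # action-value estimates (average reward per arm)
N = np.zeros(k_arms)      # number of times each arm was pulled
epsilon = 0.1

for t in range(2000):
    if rng.random() < epsilon:
        a = int(rng.integers(k_arms))    # explore: try a random arm
    else:
        a = int(np.argmax(Q))            # exploit: best-known arm so far
    r = rng.normal(true_means[a], 1.0)   # noisy reward from the chosen arm
    N[a] += 1
    Q[a] += (r - Q[a]) / N[a]            # incremental sample-average update

print(Q.round(2))          # estimates approach the true means
print(int(np.argmax(Q)))   # best action a* = argmax Q(a); arm 3 here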


Reinforcement Learning Agent Types

An RL agent can be classified into different approaches based on how it learns:

1. Value-Based Approaches
o Optimize the value function v(s), which represents the maximum expected future reward from a given state.
o Uses discount factors to prioritize future rewards.
2. Policy-Based Approaches
o Focus on finding the optimal policy π, a function that maps states to actions.
o Rather than estimating values, it directly learns which action to take.
3. Actor-Critic Methods (Hybrid Approaches)
o Combine value-based and policy-based methods.
o The actor updates the policy, while the critic evaluates it.
4. Model-Based Approaches
o Create a model of the environment (e.g., using Markov Decision Processes
(MDPs)).
o Use simulations to plan the best actions.
5. Model-Free Approaches


o No predefined model of the environment.


o Use methods like Temporal Differencing (TD) Learning and Monte Carlo
methods to estimate values from experience.

Reinforcement Algorithm Selection

The choice of a reinforcement learning algorithm depends on factors such as:

 Availability of models
 Nature of updates (incremental vs. batch learning)
 Exploration vs. exploitation trade-offs
 Computational efficiency

Model-based Learning

Passive Learning refers to a model-based environment, where the environment is known. This means that for any given state, the next state and action probability distribution are known.

Markov Decision Process (MDP) and Dynamic Programming are powerful tools for
solving reinforcement learning problems in this context.


The mathematical foundation for passive learning is provided by MDP. These model-
based reinforcement learning problems can be solved using dynamic programming after
constructing the model with MDP.

The primary objective in reinforcement learning is to take an action a that transitions the
system from the current state to the end state while maximizing rewards. These
rewards can be positive or negative.

The goal is to maximize expected rewards by choosing the optimal policy:

π* = argmax_π E[ Gt | St = s ], for all possible values of s at time t.

Policy and Value Functions

An agent in reinforcement learning has multiple courses of action for a given state. The
way the agent behaves is determined by its policy.

A policy is a distribution over all possible actions with probabilities assigned to each
action.

Different actions yield different rewards. To quantify and compare these rewards, we
use value functions.

Value Function Notation

A value function summarizes possible future scenarios by averaging the expected returns under a given policy π.

It is a prediction of future rewards and computes the expected sum of future rewards for a given state s under policy π:

vπ(s) = Eπ[ R_t+1 + γ R_t+2 + γ² R_t+3 + … | S_t = s ]


where v(s) represents the quality of the state based on a long-term strategy.

Example

If we have two states with values 0.2 and 0.9, the state with 0.9 is a better state to be in.

Value functions can be of two types:

 State-Value Function (for a state)


 State-Action Function (for a state-action pair)

State-Value Function

Denoted as v(s), the state-value function of an MDP is the expected return from state s under a policy π:

vπ(s) = Eπ[ Gt | St = s ], where Gt = R_t+1 + γ R_t+2 + γ² R_t+3 + …

This function accumulates all expected rewards, potentially discounted over time, and helps determine the goodness of a state.

The optimal state-value function is given by:

v*(s) = max_π vπ(s)

Action-Value Function (Q-Function)

Apart from v(s), another function called the Q-function is used. This function returns a
real value indicating the total expected reward when an agent:

1. Starts in state s
2. Takes action a
3. Follows a policy π afterward


Bellman Equation

Dynamic programming methods require a recursive formulation of the problem. The recursive formulation of the state-value function is given by the Bellman equation:

vπ(s) = Σ_a π(a|s) Σ_s' P(s'|s, a) [ r(s, a, s') + γ vπ(s') ]

Solving Reinforcement Problems

There are two main algorithms for solving reinforcement learning problems using
conventional methods:

1. Value Iteration
2. Policy Iteration

Value Iteration

Value iteration estimates v(s) iteratively using the Bellman optimality update:

v(s) ← max_a Σ_s' P(s'|s, a) [ r(s, a, s') + γ v(s') ]


Algorithm

1. Initialize v(s) arbitrarily (e.g., all zeros).


2. Iterate until convergence:
o For each state s, update v(s) using the Bellman equation.
o Repeat until changes are negligible.
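A compact sketch of value iteration on a hypothetical 4-state chain world (move left or right, small step penalty, bonus for reaching the goal); the environment, rewards, and γ are assumptions:

# Value iteration on a hypothetical 4-state chain: 0 -> 1 -> 2 -> 3 (goal).
# Actions: 0 = left, 1 = right; each move costs -1, reaching state 3 gives +10.
gamma = 0.9
states = [0, 1, 2, 3]
actions = [0, 1]

def step(s, a):
    """Deterministic illustrative environment: return (next_state, reward)."""
    if s == 3:
        return s, 0.0                    # terminal state
    s_next = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s_next, (10.0 if s_next == 3 else -1.0)

V = {s: 0.0 for s in states}             # 1. initialize v(s) arbitrarily (zeros)
while True:                              # 2. iterate until convergence
    delta = 0.0
    for s in states:
        if s == 3:
            continue
        # Bellman optimality update: v(s) = max_a [ r + gamma * v(s') ]
        best = max(r + gamma * V[s2] for s2, r in (step(s, a) for a in actions))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-6:                     # stop when changes are negligible
        break

print({s: round(v, 2) for s, v in V.items()})   # {0: 6.2, 1: 8.0, 2: 10.0, 3: 0.0}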

Policy Iteration

Policy iteration consists of two main steps:

1. Policy Evaluation
2. Policy Improvement

Policy Evaluation

Initially, for a given policy π, the algorithm starts with v(s) = 0 (no reward). The Bellman
equation is used to obtain v(s), and the process continues iteratively until the optimal
v(s) is found.

Policy Improvement

The policy improvement process is performed as follows:

1. Evaluate the current policy using policy evaluation.


2. Solve the Bellman equation for the current policy to obtain v(s).
3. Improve the policy by applying the greedy approach to maximize expected
rewards.
4. Repeat the process until the policy converges to the optimal policy.

Algorithm

1. Start with an arbitrary policy π.


2. Perform policy evaluation using Bellman’s equation.

3. Improve the policy greedily.


4. Repeat until convergence.
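A matching sketch of policy iteration on the same kind of hypothetical chain world, alternating policy evaluation and greedy policy improvement until the policy stops changing:

# Policy iteration on a hypothetical chain world: states 0..3, state 3 is the goal.
gamma = 0.9
states = [0, 1, 2, 3]
actions = [0, 1]                         # 0 = left, 1 = right

def step(s, a):
    if s == 3:
        return s, 0.0
    s_next = max(0, s - 1) if a == 0 else min(3, s + 1)
    return s_next, (10.0 if s_next == 3 else -1.0)

policy = {s: 0 for s in states}          # 1. start with an arbitrary policy (always left)
V = {s: 0.0 for s in states}

while True:
    # 2. Policy evaluation: iterate the Bellman equation for the current policy
    while True:
        delta = 0.0
        for s in states:
            if s == 3:
                continue
            s2, r = step(s, policy[s])
            v_new = r + gamma * V[s2]
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < 1e-6:
            break
    # 3. Policy improvement: act greedily with respect to the evaluated v(s)
    stable = True
    for s in states:
        if s == 3:
            continue
        best_a = max(actions, key=lambda a: step(s, a)[1] + gamma * V[step(s, a)[0]])
        if best_a != policy[s]:
            policy[s] = best_a
            stable = False
    if stable:                           # 4. repeat until the policy converges
        break

print(policy)                            # optimal policy: move right in states 0-2
print({s: round(v, 2) for s, v in V.items()})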

Model Free Methods

Model-free methods do not require complete knowledge of the environment. Instead, they learn through experience and interaction with the environment.

The reward determination in model-free methods can be categorized into three formulations:

1. Episodic Formulation: Rewards are assigned based on the outcome of an entire episode. For example, if a game is won, all actions in the episode receive a positive reward (+1). If lost, all actions receive a negative reward (-1). However, this approach may unfairly penalize or reward intermediate actions.
2. Continuous Formulation: Rewards are determined immediately after an action. An example is the multi-armed bandit problem, where an immediate reward between $1 - $10 can be given after each action.
3. Discounted Returns: Long-term rewards are considered using a discount factor. This method is often used in reinforcement learning algorithms.

Model-free methods primarily utilize the following techniques:

 Monte Carlo (MC) Methods


 Temporal Difference (TD) Learning

Monte-Carlo Methods

Monte Carlo (MC) methods do not assume any predefined model, making them purely
experience-driven. This approach is analogous to how humans and animals learn from
interactions with their environment.


Characteristics of Monte Carlo Methods:

 Experience is divided into episodes, where each episode is a sequence of states from a starting state to a goal state.
 Episodes must terminate; regardless of the starting point, an episode must reach
an endpoint.
 Value-action functions are computed only after the completion of an episode,
making MC an incremental method.
 MC methods compute rewards at the end of an episode to estimate maximum
expected future rewards.
 Empirical mean is used instead of expected return; the total return over multiple
episodes is averaged.
 Due to the non-stationary nature of environments, value functions are computed
for a fixed policy and revised using dynamic programming.

Monte Carlo Mean Value Computation:

The mean value of a state is calculated as the average of the returns observed from that state over N episodes:

v(s) = (G1 + G2 + … + GN) / N

Incremental Monte Carlo Update:

The value function is updated incrementally after each new return G using:

v(s) ← v(s) + α (G − v(s)), where α is a step-size (learning-rate) parameter.


Temporal Difference (TD) Learning

Temporal Difference (TD) Learning is an alternative to Monte Carlo methods. It is also a model-free technique that learns from experience and interaction with the environment.

Characteristics of TD Learning:

 Bootstrapping Method: Updates are based on the current estimate and future
reward.
 Incremental Updates: Unlike MC, which waits until the end of an episode, TD
updates values at each step.
 More Efficient: TD can learn before an episode ends, making it more sample-
efficient than MC methods.
 Used for Non-Stationary Problems: Suitable for environments where conditions
change over time.
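A minimal sketch of tabular TD(0) value estimation on a hypothetical 5-state random walk under a fixed random policy; α, γ, and the environment are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
n_states = 5                 # states 0..4; states 0 and 4 are terminal
V = np.zeros(n_states)       # value estimates for a fixed random policy
alpha, gamma = 0.1, 1.0

for episode in range(2000):
    s = 2                    # every episode starts in the middle of the walk
    while s not in (0, 4):
        s_next = s + int(rng.choice([-1, 1]))   # random policy: step left or right
        r = 1.0 if s_next == 4 else 0.0         # reward only at the right terminal
        # TD(0) bootstrapping update: uses the current estimate of the next state,
        # so learning happens at every step instead of at the end of the episode
        V[s] += alpha * (r + gamma * V[s_next] - V[s])
        s = s_next

print(V.round(2))            # interior values approach about 0.25, 0.50, 0.75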

Differences between Monte Carlo and TD Learning


Eligibility Traces and TD(λ)

TD Learning can be accelerated using eligibility traces, which allow updates to be spread over multiple states. This leads to a family of algorithms called TD(λ), where λ is the decay parameter (0 ≤ λ ≤ 1):

 λ = 0: Only the previous prediction is updated.


 λ = 1: All previous predictions are updated.

By incorporating eligibility traces, TD(λ) provides an alternative short-term memory mechanism to enhance learning efficiency.


Q-Learning

Q-Learning Algorithm

1. Initialize Q-table:
o Create a table Q(s,a) with states s and actions a.
o Initialize Q-values with random or zero values.
2. Set parameters:
o Learning rate α (typically between 0 and 1).
o Discount factor γ (typically close to 1).
o Exploration-exploitation trade off strategy (e.g., ε-greedy policy).
3. Repeat for each episode:
o Start from an initial state s.
o Repeat until reaching a terminal state:
 Choose an action a for state s using the exploration strategy (e.g., ε-greedy).
 Take action a, observe the reward r and the next state s'.
 Update: Q(s, a) ← Q(s, a) + α [ r + γ max_a' Q(s', a') − Q(s, a) ].
 Set s ← s'.

4. End the training once convergence is reached (Q-values become stable).

This iterative process helps the agent learn optimal Q-values, which guide it to take
actions that maximize rewards.
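A compact sketch of tabular Q-learning on a hypothetical 1-D grid world with the goal in the rightmost cell; the environment, α, γ, and ε are assumptions for illustration:

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2                   # actions: 0 = left, 1 = right; state 4 = goal
Q = np.zeros((n_states, n_actions))          # 1. initialize the Q-table with zeros
alpha, gamma, epsilon = 0.5, 0.9, 0.1        # 2. learning parameters

def env_step(s, a):
    """Hypothetical environment: -1 per move, +10 on reaching the goal cell."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 10.0 if s_next == n_states - 1 else -1.0
    return s_next, reward, s_next == n_states - 1

for episode in range(500):                   # 3. repeat for each episode
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r, done = env_step(s, a)
        # Q-learning (off-policy) update: bootstrap from the GREEDY next-state value
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.round(2))                            # 4. training ends once Q-values are stable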


SARSA Learning
SARSA Algorithm (State-Action-Reward-State-Action)

Initialize Q-table:

o Create a table Q(s,a) for all state-action pairs.


o Initialize Q-values with random or zero values.

Set parameters:

o Learning rate α (typically between 0 and 1).


o Discount factor γ (typically close to 1).
o Exploration-exploitation strategy (e.g., ε-greedy policy).

Repeat for each episode:

o Start from an initial state s.


o Choose an action a using the ε-greedy policy.

Repeat until the terminal state is reached:

o Take action a, observe the reward r and the next state s'.
o Choose the next action a' from s' using the ε-greedy policy.
o Update: Q(s, a) ← Q(s, a) + α [ r + γ Q(s', a') − Q(s, a) ].
o Set s ← s' and a ← a'.

End the training when Q-values converge.
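A matching SARSA sketch on the same kind of hypothetical grid world; the key difference from the Q-learning sketch is that the update uses Q(s', a') for the action a' actually selected by the ε-greedy policy:

import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 2                   # 0 = left, 1 = right; state 4 = goal
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def env_step(s, a):
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s_next, (10.0 if s_next == n_states - 1 else -1.0), s_next == n_states - 1

def eps_greedy(s):
    return int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())

for episode in range(500):
    s = 0
    a = eps_greedy(s)                        # choose the first action with the ε-greedy policy
    done = False
    while not done:
        s_next, r, done = env_step(s, a)
        a_next = eps_greedy(s_next)          # choose the NEXT action before updating
        # SARSA (on-policy) update: bootstrap from Q(s', a') of the action actually taken
        Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])
        s, a = s_next, a_next

print(Q.round(2))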

Differences between SARSA and Q-Learning

 SARSA is on-policy: its update uses Q(s', a') for the action a' actually selected by the current (ε-greedy) policy.
 Q-Learning is off-policy: its update uses the greedy value max_a' Q(s', a'), regardless of the action actually taken next.
 As a result, SARSA tends to learn more conservative behaviour while exploration is active, whereas Q-Learning learns the value of the greedy policy directly.
