Unit 5 ML-2-70

The document discusses two primary machine learning techniques: Instance-Based Learning and Reinforcement Learning. Instance-Based Learning focuses on algorithms like k-Nearest Neighbor (KNN) that classify new instances based on similarity to training examples, while Reinforcement Learning involves agents taking actions in an environment to maximize cumulative rewards through methods like Q-learning. Key concepts such as policies, states, and rewards are explored, along with the challenges and applications of both learning paradigms.


Contents

• Instance-Based Learning: k-Nearest Neighbor learning, Locally weighted regression, Radial basis functions, Case-based reasoning.
• Reinforcement Learning: The learning task, Q-learning, Nondeterministic rewards and actions, Temporal difference learning, Generalizing from examples, Relationship to dynamic programming.
Instance-based learning

• Machine Learning systems categorized as instance-based learning are systems that learn the training examples by heart and then generalize to new instances based on some similarity measure.
• It is called instance-based because it builds its hypotheses directly from the training instances.
• It is also known as memory-based learning or lazy learning.
• The time complexity of this approach depends on the size of the training data.
• The worst-case time complexity of classifying a single query is O(n), where n is the number of training instances.
• For example, if we were to create a spam filter with an instance-based learning algorithm, instead of only flagging emails that are already marked as spam, our spam filter would also flag emails that are very similar to them.
• This requires a measure of resemblance between two emails.
• A similarity measure between two emails could be having the same sender, repetitive use of the same keywords, or something else.
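As a minimal sketch of this idea, the snippet below compares two emails by the overlap (Jaccard similarity) of their word sets; the 0.5 threshold and the helper names are illustrative assumptions, not part of the original text.

```python
def jaccard_similarity(email_a: str, email_b: str) -> float:
    """Similarity of two emails measured as the overlap of their word sets (0.0 to 1.0)."""
    words_a, words_b = set(email_a.lower().split()), set(email_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def is_similar_to_spam(new_email: str, known_spam: list[str], threshold: float = 0.5) -> bool:
    """Flag the new email if it closely resembles any stored spam example."""
    return any(jaccard_similarity(new_email, spam) >= threshold for spam in known_spam)

known_spam = ["win a free prize now", "claim your free prize today"]
print(is_similar_to_spam("claim your free prize now", known_spam))  # True: close to the second example
```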
Advantages:
• Instead of estimating the target function once for the entire instance space, local approximations can be made to the target function for each query.
• The algorithm can adapt easily to new data that is collected as we go.
Disadvantages:
• Classification costs are high.
• A large amount of memory is required to store the data, and each query involves building a local model from scratch.
Some of the instance-based learning algorithms are:
• K Nearest Neighbor (KNN)
• Self-Organizing Map (SOM)
• Learning Vector Quantization (LVQ)
• Locally Weighted Learning (LWL)
K Nearest Neighbor (KNN)
• It is one of the simplest Machine Learning algorithms, based on the Supervised Learning technique.
• It assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
• It stores all the available data and classifies a new data point based on its similarity to the stored data.
• This means that when new data appears, it can easily be classified into a suitable category using the K-NN algorithm.
• K-NN can be used for Regression as well as for Classification, but it is mostly used for Classification problems.
• K-NN is a non-parametric algorithm, which means it does not make any assumptions about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs the computation at classification time.
• At the training phase, the KNN algorithm just stores the dataset; when it gets new data, it classifies that data into the category most similar to the new data.
• Example: Suppose we have an image of a creature that looks similar to both a cat and a dog, and we want to know whether it is a cat or a dog.
• For this identification, we can use the KNN algorithm, as it works on a similarity measure.
• Our KNN model will find the features of the new image that are similar to the cat and dog images, and based on the most similar features it will put the image in either the cat or the dog category.
Why do we need a K-NN Algorithm?
• Suppose there are two categories, Category A and Category B, and we have a new data point x1. In which of these categories will this data point lie?
• To solve this type of problem, we need a K-NN algorithm.
• With the help of K-NN, we can easily identify the category or class of a particular data point. Consider the diagram below.
How does K-NN work?
The working of K-NN can be explained on the basis of the following algorithm:
Step-1: Select the number K of neighbors.
Step-2: Calculate the Euclidean distance from the new data point to the training points.
Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
Step-4: Among these K neighbors, count the number of data points in each category.
Step-5: Assign the new data point to the category for which the number of neighbors is maximum.
Step-6: Our model is ready.
Suppose we have a new data point and we need to put it in the required category. Consider the image below:

• Firstly, we will choose the number of neighbors; here we choose k = 5.
• Next, we will calculate the Euclidean distance between the new point and the existing data points.
• The Euclidean distance is the distance between two points, which we have already studied in geometry. For two points (x1, y1) and (x2, y2) it is calculated as:
  d = √((x2 − x1)² + (y2 − y1)²)
As we can see, 3 of the 5 nearest neighbors are from Category A and 2 are from Category B; hence this new data point must belong to Category A.
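A minimal sketch of this procedure in plain Python is shown below; the toy coordinates and the choice k = 3 are illustrative assumptions, not the data from the example above.

```python
import math
from collections import Counter

def knn_classify(query, training_data, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Steps 2-3: compute Euclidean distances and keep the k closest points
    distances = [(math.dist(query, point), label) for point, label in training_data]
    nearest = sorted(distances)[:k]
    # Steps 4-5: count labels among the neighbours and pick the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data set: (x, y) coordinates with a category label
training_data = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 7.0), "B"),
]
print(knn_classify((1.8, 1.5), training_data, k=3))  # -> "A"
```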
KNN Algorithm
Locally Weighted Learning
For an introduction to locally weighted learning, visit this link:
https://medium.com/@pramodmehta_76622/locally-weighted-learning-502426036d45#:~:text=Locally%20weighted%20Learning%20is%20a%20Machine%20Learning%20algorithm,problem%20statement.%20The%20basic%20problem%20statement%20is%20Regression.
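Since the slide only links out, a minimal sketch of locally weighted (kernel-weighted) linear regression is given below; the Gaussian kernel bandwidth tau and the synthetic data are illustrative assumptions.

```python
import numpy as np

def locally_weighted_regression(x_query, X, y, tau=0.5):
    """Predict y at x_query by fitting a weighted linear model around it.

    Points close to x_query get large weights (Gaussian kernel), distant
    points get weights near zero, so the fit is local to the query.
    """
    X_aug = np.column_stack([np.ones_like(X), X])          # add a bias column
    weights = np.exp(-((X - x_query) ** 2) / (2 * tau ** 2))
    W = np.diag(weights)
    # Solve the weighted least-squares normal equations: (X'WX) theta = X'Wy
    theta = np.linalg.solve(X_aug.T @ W @ X_aug, X_aug.T @ W @ y)
    return np.array([1.0, x_query]) @ theta

# Noisy sine curve as a toy data set
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 60)
y = np.sin(X) + 0.1 * rng.standard_normal(60)
print(round(locally_weighted_regression(1.5, X, y), 2))    # close to sin(1.5) ≈ 1.0
```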
Case-Based Reasoning (CBR)
As we know, Nearest Neighbour classifiers store training tuples as points in Euclidean space. Case-Based Reasoning (CBR) classifiers, in contrast, use a database of problem solutions to solve new problems. They store the tuples or cases used for problem solving as complex symbolic descriptions.
How does CBR work?
• When a new case arises that needs to be classified, a Case-Based Reasoner will first check whether an identical training case exists. If one is found, the accompanying solution to that case is returned.
• If no identical case is found, the CBR will search for training cases having components that are similar to those of the new case.
• Conceptually, these training cases may be considered neighbours of the new case. If cases are represented as graphs, this involves searching for subgraphs that are similar to subgraphs within the new case.
• The CBR tries to combine the solutions of the neighbouring training cases to propose a solution for the new case. If incompatibilities arise with the individual solutions, then backtracking to search for other solutions may be necessary.
• The CBR may employ background knowledge and problem-solving strategies to propose a feasible solution.
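As a minimal sketch of the retrieval step described above (not the full reuse/revise cycle), the snippet below stores cases as attribute dictionaries and returns the solution of the most similar stored case; the attribute-overlap similarity and the help-desk attributes are illustrative assumptions.

```python
def case_similarity(case_a: dict, case_b: dict) -> float:
    """Fraction of attributes on which two cases agree."""
    keys = set(case_a) | set(case_b)
    return sum(case_a.get(k) == case_b.get(k) for k in keys) / len(keys)

def propose_solution(new_case: dict, case_base: list) -> str:
    """Return the stored solution of the case most similar to the new case."""
    # Check for an identical case first; otherwise fall back to the nearest case
    for case, solution in case_base:
        if case == new_case:
            return solution
    _, best_solution = max(case_base, key=lambda cs: case_similarity(new_case, cs[0]))
    return best_solution

case_base = [
    ({"product": "printer", "symptom": "paper jam"}, "Open tray B and clear the feed rollers."),
    ({"product": "printer", "symptom": "no power"}, "Check the power cable and fuse."),
]
print(propose_solution({"product": "printer", "symptom": "paper jam", "model": "X200"}, case_base))
```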
• Applications of CBR include:
• Problem resolution for customer-service help desks, where cases describe product-related diagnostic problems.
• Engineering and law, where cases are either technical designs or legal rulings, respectively.
• Medical education, where patient case histories and treatments are used to help diagnose and treat new patients.
Challenges with CBR
• Finding a good similarity metric (e.g., for matching subgraphs) and suitable methods for combining solutions.
• Selecting salient features for indexing training cases and developing efficient indexing techniques.
• A trade-off between accuracy and efficiency evolves as the number of stored cases becomes very large: as this number increases, the case-based reasoner becomes more intelligent, but after a certain point the system's efficiency suffers because the time required to search for and process relevant cases increases.
What is Reinforcement Learning?
• Reinforcement Learning is defined as a Machine Learning method that is concerned with how software agents should take actions in an environment.
• The agent's objective is to maximize some portion of the cumulative reward it receives.
• Over many steps, this learning method helps an agent attain a complex objective or maximize a specific dimension of reward.
Agent: An assumed entity which performs actions in an environment to gain some reward.
Environment (e): The scenario that an agent has to face.
Reward (R): An immediate return given to an agent when it performs a specific action or task.
State (s): The current situation returned by the environment.
Policy (π): The strategy applied by the agent to decide the next action based on the current state.
Value (V): The expected long-term return with discount, as opposed to the short-term reward.
Value Function: Specifies the value of a state, i.e., the total amount of reward an agent can expect to accumulate starting from that state.
Model of the environment: Mimics the behavior of the environment. It allows inferences to be made about how the environment will behave.
Model-based methods: Methods for solving reinforcement learning problems that use a model of the environment.
Q value or action value (Q): Quite similar to Value; the only difference is that it takes an additional parameter, the current action.
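To make these terms concrete, here is a minimal sketch of the agent-environment loop; the toy GridEnvironment, its states, and the random policy are illustrative assumptions rather than part of the original text.

```python
import random

class GridEnvironment:
    """Toy environment: states 0..4 in a row; reaching state 4 gives reward +100."""
    def __init__(self):
        self.state = 0                              # State (s): current situation

    def step(self, action: int):
        """Apply an action (-1 = left, +1 = right) and return (next_state, reward, done)."""
        self.state = max(0, min(4, self.state + action))
        reward = 100 if self.state == 4 else 0      # Reward (R): immediate return
        return self.state, reward, self.state == 4

def random_policy(state: int) -> int:
    """Policy (π): maps a state to an action; here it simply picks at random."""
    return random.choice([-1, +1])

env = GridEnvironment()                             # Environment (e)
total_reward, done = 0, False
while not done:                                     # the Agent acts, observes, collects reward
    action = random_policy(env.state)
    _, reward, done = env.step(action)
    total_reward += reward
print("cumulative reward:", total_reward)
```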
How does Reinforcement Learning work?
Let us look at a simple example that illustrates the reinforcement learning mechanism.

In this case,
• Your cat is an agent that is exposed to the environment; in this case, the environment is your house. An example of a state could be your cat sitting while you use a specific word to tell the cat to walk.
• Our agent reacts by performing an action, transitioning from one "state" to another "state."
• For example, your cat goes from sitting to walking.
• The reaction of the agent is an action, and the policy is a method of selecting an action given a state, in expectation of better outcomes.
• After the transition, the cat may get a reward or a penalty in return.
Reinforcement Learning Algorithms
1. Value-Based:
• In a value-based Reinforcement Learning method, you try to maximize a value function V(s). In this method, the agent expects a long-term return of the current states under policy π.
2. Policy-Based:
• In a policy-based RL method, you try to come up with a policy such that the action performed in every state helps you gain maximum reward in the future.
• Two types of policy-based methods are (a small sketch follows the Model-Based item below):
Deterministic: For any state, the same action is produced by the policy π.
Stochastic: Every action has a certain probability, given by the stochastic policy π(a|s) = P[A_t = a | S_t = s].
3.Model-Based:
• In this Reinforcement Learning method, you need to create a virtual
model for each environment. The agent learns to perform in that
specific environment.
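A minimal sketch of the deterministic/stochastic distinction mentioned above; the example states and action probabilities are illustrative assumptions.

```python
import random

# Deterministic policy: each state maps to exactly one action
deterministic_policy = {"sitting": "say_walk", "walking": "give_treat"}

# Stochastic policy: each state maps to a probability distribution over actions,
# i.e. pi(a|s) = P[A_t = a | S_t = s]
stochastic_policy = {
    "sitting": {"say_walk": 0.8, "wait": 0.2},
    "walking": {"give_treat": 0.6, "wait": 0.4},
}

def sample_action(state: str) -> str:
    """Draw an action from the stochastic policy for the given state."""
    actions, probs = zip(*stochastic_policy[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(deterministic_policy["sitting"])   # always 'say_walk'
print(sample_action("sitting"))          # 'say_walk' with probability 0.8
```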
Characteristics of Reinforcement Learning
• There is no supervisor, only a real number or reward signal
• Sequential decision making
• Time plays a crucial role in Reinforcement problems
• Feedback is always delayed, not instantaneous
• Agent’s actions determine the subsequent data it receives
Types of Reinforcement Learning
Two types of reinforcement learning methods are:
Positive:
• Positive reinforcement is defined as an event that occurs because of specific behavior. It increases the strength and frequency of the behavior and positively impacts the action taken by the agent.
• It maximizes performance and sustains change for a more extended period. However, too much reinforcement may lead to over-optimization of a state, which can affect the results.
Negative:
• Negative reinforcement is defined as the strengthening of behavior that occurs because a negative condition is stopped or avoided.
• It helps you to define the minimum standard of performance.
• However, the drawback of this method is that it provides only enough to meet the minimum behavior.
Learning Models of Reinforcement
There are two important learning models in reinforcement learning:
1. Markov Decision Process
2. Q learning

Markov Decision Process

The following parameters are used to get a solution:
Set of actions - A
Set of states - S
Reward - R
Policy - π
Value - V
• The mathematical framework for mapping a solution in Reinforcement Learning is known as a Markov Decision Process (MDP).
Q Learning
• Q-learning is a value-based, model-free approach that tells the agent which action it should perform.
• It revolves around the notion of updating Q values, where Q(S, A) denotes the value of doing action A in state S.
• The value-update rule is the main aspect of the Q-learning algorithm.
Q-learning Definition
• Q*(s, a) is the expected value (cumulative discounted reward) of doing action a in state s and then following the optimal policy.
• Q-learning uses Temporal Differences (TD) to estimate the value of Q*(s, a).
• Temporal-difference learning means the agent learns from the environment through episodes, with no prior knowledge of the environment.
• The agent maintains a table Q[S, A], where S is the set of states and A is the set of actions.
• Q[s, a] represents the agent's current estimate of Q*(s, a).
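A minimal sketch of maintaining such a table and applying the TD update is shown below; the tiny corridor environment, the learning rate 0.5, and the discount 0.9 are illustrative assumptions.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9            # learning rate and discount factor (assumed values)
ACTIONS = [-1, +1]                 # move left / move right in a 5-state corridor
Q = defaultdict(float)             # Q[s, a]: current estimate of Q*(s, a)

def step(state, action):
    """Toy environment: reaching state 4 yields reward 100 and ends the episode."""
    next_state = max(0, min(4, state + action))
    reward = 100 if next_state == 4 else 0
    return next_state, reward, next_state == 4

for episode in range(200):
    state, done = 0, False
    while not done:
        action = random.choice(ACTIONS)             # explore with a random behaviour policy
        next_state, reward, done = step(state, action)
        # Temporal-difference update toward r + gamma * max_a' Q[s', a']
        best_next = max(Q[next_state, a] for a in ACTIONS)
        Q[state, action] += ALPHA * (reward + GAMMA * best_next - Q[state, action])
        state = next_state

print(round(Q[3, +1], 1))   # estimate of Q*(3, right); approaches 100
```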
What Is The Bellman Equation?
• The Bellman Equation is used to determine the value of a particular state and to deduce how good it is to be in that state.
• The optimal state gives us the highest optimal value.
• The equation is given below. It uses the current state, the reward associated with that state, the maximum expected future reward and a discount rate (which determines how important future rewards are relative to the current state) to compute the updated value for our agent:
  New Q(s, a) = Q(s, a) + α [ R(s, a) + γ · max over a' of Q(s', a') − Q(s, a) ]
• The learning rate α determines how fast or slowly the model learns.
How to Make a Q-Table?
While running our algorithm, we will come across various
solutions and the agent will take multiple paths. How do we
find out the best among them? This is done by tabulating
our findings in a table called a Q-Table.
A Q-Table helps us to find the best action for each state in
the environment. We use the Bellman Equation at each
state to get the expected future state and reward and save it
in a table to compare with other states.
Let us create a Q-Table for an agent that has to learn to run, fetch and sit on command. The steps taken to construct a Q-Table are:
Step 1: Create an initial Q-Table with all values initialized to 0
When we initially start, the values of all states and rewards will be 0. Consider a Q-Table for a dog simulator learning to perform the actions sit, run and fetch, with every entry set to 0.
Step 2: Choose an action and perform it, then update the values in the table
This is the starting point; we have performed no other action yet. Suppose we want the agent to sit initially, which it does. The table entry for that state-action pair then changes.
Step 3: Get the value of the reward and calculate the Q-value using the Bellman Equation
For the action performed, we need to calculate the value of the actual reward and the Q(S, A) value.
Step 4: Continue until the table is filled or an episode ends
The agent continues taking actions; for each action, the reward and Q-value are calculated and the table is updated.
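A minimal numeric sketch of these steps, assuming three states (start, sitting, running), three actions (sit, run, fetch), a reward of +1 for obeying the "sit" command, a learning rate of 0.1, and a discount of 0.9 (all illustrative values):

```python
import numpy as np

states = ["start", "sitting", "running"]
actions = ["sit", "run", "fetch"]
alpha, gamma = 0.1, 0.9                     # assumed learning rate and discount factor

# Step 1: initial Q-Table with all values set to 0
q_table = np.zeros((len(states), len(actions)))

# Step 2: choose an action ('sit' from 'start') and perform it
s, a, s_next = states.index("start"), actions.index("sit"), states.index("sitting")
reward = 1.0                                # assumed reward for obeying the command

# Step 3: update Q(S, A) with the Bellman/Q-learning update
q_table[s, a] += alpha * (reward + gamma * q_table[s_next].max() - q_table[s, a])

# Step 4: repeat for further actions until the episode ends (omitted here)
print(q_table)        # only the (start, sit) entry is now non-zero (0.1)
```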
Example:
• Consider building a learning robot.
• The robot, or agent, has a set of sensors to observe the state of its environment,
and a set of actions it can perform to alter this state.
• Its task is to learn a control strategy, or policy, for choosing actions that achieve its
goals.
• The goals of the agent can be defined by a reward function that assigns a
numerical value.
• This reward function may be built into the robot, or known only to an external
teacher who provides the reward value for each action performed by the robot.
• The task of the robot is to perform sequences of actions, observe their
consequences, and learn a control policy.
• The control policy is one that, from any initial state, chooses actions that maximize
the reward accumulated over time by the agent.
• A mobile robot may have sensors such as a camera and sonars, and actions such as "move forward" and "turn."
• The robot may have a goal of docking onto its battery charger whenever its
battery level is low.
• The goal of docking to the battery charger can be captured by assigning a positive reward (e.g., +100) to state-action transitions that immediately result in a connection to the charger, and a reward of zero to every other state-action transition.
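A minimal sketch of such a reward function; the state and action names are hypothetical, chosen only to illustrate the +100/0 assignment described above.

```python
def docking_reward(state: str, action: str, next_state: str) -> float:
    """Reward function for the docking task: +100 for transitions that
    immediately connect to the charger, 0 for every other transition."""
    return 100.0 if next_state == "connected_to_charger" else 0.0

print(docking_reward("near_charger", "move_forward", "connected_to_charger"))  # 100.0
print(docking_reward("corridor", "turn", "near_charger"))                      # 0.0
```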
THE LEARNING TASK

• Consider a Markov decision process (MDP) in which the agent can perceive a set S of distinct states of its environment and has a set A of actions that it can perform.
• At each discrete time step t, the agent senses the current state st, chooses a current action at, and performs it.
• The environment responds by giving the agent a reward rt = r(st, at) and by producing the succeeding state st+1 = δ(st, at).
• Here the functions δ(st, at) and r(st, at) depend only on the current state and action, and not on earlier states or actions.
How shall we specify precisely which policy π we would like the agent to learn?

1. One approach is to require the policy that produces the greatest possible cumulative reward for the robot over time.
• To state this requirement more precisely, define the cumulative value Vπ(st) achieved by following an arbitrary policy π from an arbitrary initial state st as:
  Vπ(st) ≡ rt + γ rt+1 + γ² rt+2 + ... ≡ Σ (i = 0 to ∞) γ^i rt+i, where 0 ≤ γ < 1 is the discount factor.
• The quantity Vπ(st) is called the discounted cumulative reward achieved by policy π from initial state st.
2. Another definition of total reward is the finite-horizon reward, which considers the undiscounted sum of rewards over a finite number h of steps:
  Σ (i = 0 to h) rt+i

3. A further approach is the average reward, which considers the average reward per time step over the entire lifetime of the agent:
  lim (h → ∞) (1/h) Σ (i = 0 to h) rt+i

We require that the agent learn a policy π that maximizes Vπ(s) for all states s. Such a policy is called an optimal policy and is denoted π*:
  π* ≡ argmax over π of Vπ(s), for all s

We refer to the value function Vπ*(s) of an optimal policy as V*(s).

• V*(s) gives the maximum discounted cumulative reward that the agent can obtain starting from state s.
Example: A simple grid-world environment is depicted in the diagram

• The 6 grid squares in this diagram represent 6 possible states, or locations, for the agent.
• Each arrow in the diagram represents a possible action the agent can take to move from one state to another.
• The number associated with each arrow represents the immediate reward r(s, a) the agent receives if it executes the corresponding state-action transition.
The immediate reward in this environment is defined to be zero for all state-action transitions except those leading into the state labelled G.
G is the goal state, and the agent can receive reward only by entering this state.
Once the states, actions, and immediate rewards are defined, we choose a value for the discount factor γ and determine the optimal policy π* and its value function V*(s).
With γ = 0.9, the discounted future reward from the bottom-centre state is
0 + γ·100 + γ²·0 + γ³·0 + ... = 90
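A quick check of this arithmetic, assuming γ = 0.9 as above:

```python
gamma = 0.9
rewards = [0, 100, 0, 0]                                   # immediate rewards along the path to G
discounted = sum(gamma**i * r for i, r in enumerate(rewards))
print(discounted)                                          # 90.0
```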
Q LEARNING

How can an agent learn an optimal policy π * for an arbitrary


environment?
• The training information available to the learner is the sequence of immediate rewards r(si, ai) for i = 0, 1, 2, ...
• Given this kind of training information, it is easier to learn a numerical evaluation function defined over states and actions, and then implement the optimal policy in terms of this evaluation function.
What evaluation function should the agent attempt to learn?
• One obvious choice is V*. The agent should prefer state s1 over state s2 whenever V*(s1) > V*(s2), because the cumulative future reward will be greater from s1.
• The optimal action in state s is the action a that maximizes the sum of the immediate reward r(s, a) plus the value V* of the immediate successor state, discounted by γ:
  π*(s) = argmax over a of [ r(s, a) + γ V*(δ(s, a)) ]
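A minimal sketch of this action-selection rule; the toy immediate rewards r(s, a), transition function δ(s, a), and optimal state values V*(s) are all assumed for illustration.

```python
GAMMA = 0.9

# Hypothetical toy problem: immediate rewards r(s, a), transitions delta(s, a),
# and optimal state values V*(s).
r = {("s1", "right"): 0, ("s1", "up"): 0}
delta = {("s1", "right"): "s2", ("s1", "up"): "s3"}
v_star = {"s2": 100, "s3": 81}

def optimal_action(state: str) -> str:
    """pi*(s) = argmax_a [ r(s, a) + gamma * V*(delta(s, a)) ]"""
    candidates = [a for (s, a) in r if s == state]
    return max(candidates, key=lambda a: r[(state, a)] + GAMMA * v_star[delta[(state, a)]])

print(optimal_action("s1"))   # 'right', since 0 + 0.9*100 > 0 + 0.9*81
```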
Difference between Deterministic and Non-deterministic
Algorithms
