In this case,
• Your cat is an agent that is exposed to the environment; in this case, the environment is your house. An example of a state could be your cat sitting, and you use a specific word to make the cat walk.
• Our agent reacts by performing an action, a transition from one “state” to another “state.”
• For example, your cat goes from sitting to walking.
• The reaction of an agent is an action, and the policy is a method of selecting an action given a state, in expectation of better outcomes.
• After the transition, the agent may get a reward or penalty in return; this loop is sketched below.
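This agent-environment loop can be sketched in a few lines of Python. The state names, actions, and reward values below are illustrative assumptions, not part of any standard library:

import random

actions = ["stay", "walk"]            # actions available to the agent

def policy(state):
    # A trivial stochastic policy: choose an action at random.
    return random.choice(actions)

def step(state, action):
    # Hypothetical environment dynamics: returns (next_state, reward).
    if action == "walk":
        return "walking", 1           # reward: the cat walked on cue
    return "sitting", -1              # penalty: the cat stayed put

state = "sitting"                     # initial state
for t in range(5):
    action = policy(state)                # agent selects an action
    state, reward = step(state, action)   # transition and feedback
    print(t, action, state, reward)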
Reinforcement Learning Algorithms
1. Value-Based:
• In a value-based reinforcement learning method, you try to maximize a value function V(s). In this method, the agent estimates the expected long-term return of the current state under policy π (see the sketch after this list).
2. Policy-based:
• In a policy-based RL method, you try to devise a policy such that the action performed in each state helps you gain the maximum reward in the future.
Two types of policy-based methods are:
Deterministic: For any state, the same action is produced by the policy π.
Stochastic: Every action has a certain probability, which is determined by the following equation: π(a | s) = P[At = a | St = s]
3.Model-Based:
• In this Reinforcement Learning method, you need to create a virtual
model for each environment. The agent learns to perform in that
specific environment.
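The three families can be contrasted in a short, purely illustrative Python sketch; every state, action, probability, and reward below is an assumption made for demonstration:

import random

# Value-based: a table estimating V(s), the long-term return of each state.
V = {"s0": 0.0, "s1": 0.0}

# Policy-based, deterministic: the same action is produced for a given state.
def deterministic_policy(state):
    return {"s0": "a1", "s1": "a0"}[state]

# Policy-based, stochastic: an action is drawn with probability pi(a|s).
def stochastic_policy(state):
    probs = {"a0": 0.3, "a1": 0.7}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Model-based: an explicit model of the environment mapping
# (state, action) to (next_state, reward), usable for planning.
model = {("s0", "a1"): ("s1", 1.0), ("s1", "a0"): ("s0", 0.0)}

print(deterministic_policy("s0"), stochastic_policy("s0"))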
Characteristics of Reinforcement Learning
• There is no supervisor, only a reward signal (a real number)
• Sequential decision making
• Time plays a crucial role in reinforcement learning problems
• Feedback is always delayed, not instantaneous
• Agent’s actions determine the subsequent data it receives
Types of Reinforcement Learning
Two types of reinforcement learning methods are:
Positive:
• Positive reinforcement is defined as an event that occurs because of specific behavior. It increases the strength and frequency of the behavior and positively impacts the actions taken by the agent.
• It maximizes performance and sustains change for a more extended period. However, too much reinforcement may lead to an over-optimized state, which can affect the results.
Negative:
• Negative reinforcement is defined as the strengthening of a behavior that occurs because a negative condition is stopped or avoided.
• It helps you to define the minimum standard of performance.
• However, the drawback of this method is that it provides only enough to meet the minimum behavior.
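The behavioral distinction can be illustrated with a toy Python sketch; the behavior name and numeric values are assumptions chosen only for demonstration:

strength = {"press_lever": 0.0}       # tendency to repeat a behavior

def positive_reinforcement(behavior, reward=1.0):
    # A pleasant stimulus is added after the behavior, strengthening it.
    strength[behavior] += reward

def negative_reinforcement(behavior, relief=1.0):
    # An unpleasant condition is stopped or avoided, which also
    # strengthens the behavior that removed it.
    strength[behavior] += relief

positive_reinforcement("press_lever")     # e.g. food is given
negative_reinforcement("press_lever")     # e.g. a loud noise stops
print(strength)                           # both increase the tendency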
Learning Models of Reinforcement
There are two important learning models in reinforcement learning:
1. Markov Decision Process
2. Q learning
• Consider a Markov decision process (MDP) where the agent can perceive a set S of distinct states of its environment and has a set A of actions that it can perform.
• At each discrete time step t, the agent senses the current state st, chooses
a current action at, and performs it.
• The environment responds by giving the agent a reward rt = r(st, at) and by producing the succeeding state st+1 = δ(st, at).
• Here the functions δ(st, at) and r(st, at) depend only on the current state and
action, and not on earlier states or actions.
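A minimal Python sketch of this MDP interface, assuming a made-up two-state environment (the states, actions, and reward values are illustrative, not from the text):

S = ["s0", "s1"]             # set S of distinct states
A = ["left", "right"]        # set A of actions

def delta(s, a):
    # Transition function: the succeeding state depends only on (s, a).
    return "s1" if a == "right" else "s0"

def r(s, a):
    # Reward function: also depends only on the current state and action.
    return 1.0 if (s, a) == ("s0", "right") else 0.0

# One time step: sense st, choose at, receive rt and st+1.
st, at = "s0", "right"
rt, st1 = r(st, at), delta(st, at)
print(rt, st1)               # 1.0 s1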
How shall we specify precisely which policy π we would like the agent to learn?
1. One approach is to require a policy that produces the greatest possible cumulative reward for the agent over time.
• To state this requirement more precisely, define the cumulative value Vπ(st) achieved by following an arbitrary policy π from an arbitrary initial state st as follows:
Vπ(st) = rt + γ rt+1 + γ^2 rt+2 + ... = Σ (i = 0 to ∞) γ^i rt+i
where 0 ≤ γ < 1 is a constant that determines the relative value of delayed versus immediate rewards.
• The quantity Vπ(st) is called the discounted cumulative reward achieved by policy π from initial state st (this and the alternative definitions below are sketched in code after the list).
2. Other definitions of total reward are the finite horizon reward, which sums the rewards over the next h steps without discounting, and the average reward, which considers the average reward per time step over the entire lifetime of the agent.
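These reward definitions can be made concrete with a short Python sketch; the reward sequence, the discount factor γ = 0.9, and the horizon h = 2 are illustrative assumptions:

def discounted_return(rewards, gamma=0.9):
    # V = rt + gamma*rt+1 + gamma^2*rt+2 + ...   (0 <= gamma < 1)
    return sum(gamma**i * ri for i, ri in enumerate(rewards))

def finite_horizon_return(rewards, h):
    # Undiscounted sum of rewards over the next h steps.
    return sum(rewards[:h])

def average_reward(rewards):
    # Average reward per time step over the agent's lifetime.
    return sum(rewards) / len(rewards)

rewards = [1.0, 0.0, 2.0, 1.0]
print(discounted_return(rewards))            # 1.0 + 0.0 + 1.62 + 0.729 = 3.349
print(finite_horizon_return(rewards, h=2))   # 1.0
print(average_reward(rewards))               # 1.0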