
MACHINE LEARNING

NOTES MATERIAL
UNIT 3

For
B. TECH (CSE)
3rd YEAR – 2nd SEM (R18)

RAVIKRISHNA B

DEPARTMENT OF CSE
VIGNAN INSTITUTE OF TECHNOLOGY & SCIENCE
DESHMUKHI

Bayesian Learning:
 Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems.
 Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with observed data.
 Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities.
 Standard: even when Bayesian methods are computationally intractable, they can provide a standard of optimal decision making against which other methods can be measured.

Why Bayesian Classification?


 Provides practical learning algorithms
 Naïve Bayes learning
 Bayesian belief network learning
 Combine prior knowledge (prior probabilities)
 Provides foundations for machine learning
 Evaluating learning algorithms
 Guiding the design of new algorithms
 Learning from models: meta-learning

Basic Formulas for Probabilities:


Product Rule: probability P(A ∧ B) of a conjunction of two events A and B:
P(A ∧ B) = P(A | B) P(B) = P(B | A) P(A)
Sum Rule: probability of a disjunction of two events A and B:
P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
Theorem of Total Probability: if events A1, …, An are mutually exclusive with Σ_{i=1}^{n} P(Ai) = 1, then
P(B) = Σ_{i=1}^{n} P(B | Ai) P(Ai)


Naïve Bayesian Classification:

 Naïve assumption: attribute independence


 P(x1,…,xk|C) = P(x1|C)·…·P(xk|C)
 If i-th attribute is categorical:
P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C


 If i-th attribute is continuous:


P(xi|C) is estimated through a Gaussian density function
 Computationally easy in both cases
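As a concrete illustration of the continuous case, the sketch below (the attribute values are illustrative, not from the notes) fits a Gaussian to the i-th attribute values observed for a class C and uses its density as the estimate of P(xi|C):

import math

# Illustrative values of the i-th attribute among training samples of class C.
values_in_class = [5.1, 4.9, 5.4, 5.0, 4.8]

mu = sum(values_in_class) / len(values_in_class)
var = sum((v - mu) ** 2 for v in values_in_class) / (len(values_in_class) - 1)

def gaussian_likelihood(x, mu, var):
    """Gaussian density used as the estimate of P(xi | C)."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(gaussian_likelihood(5.2, mu, var))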

What is the Naive Bayes algorithm?


It is a classification technique based on Bayes’ Theorem with an assumption of independence among predictors. In
simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the
presence of any other feature.
For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches in diameter. Even if these
features depend on each other or upon the existence of the other features, all of these properties independently
contribute to the probability that this fruit is an apple and that is why it is known as ‘Naive’.
The Naive Bayes model is easy to build and particularly useful for very large data sets. Along with its simplicity, Naive Bayes can outperform even highly sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c). Look at the equation below:

P(c | x) = P(x | c) · P(c) / P(x)

Above,
 P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
 P(c) is the prior probability of class.
 P(x|c) is the likelihood which is the probability of predictor given class.
 P(x) is the prior probability of predictor.

How does the Naive Bayes algorithm work?

Let's understand it using an example. Below is a training data set of weather and the corresponding target variable
'Play' (suggesting the possibility of playing). Now, we need to classify whether players will play or not based on the
weather conditions.
Problem: Players will play if the weather is sunny. Is this statement correct?
We can solve it using the method of posterior probability discussed above.


P(Yes | Sunny) = P(Sunny | Yes) × P(Yes) / P(Sunny)

P(Sunny | Yes) = 3/9 = 0.33
P(Yes) = 9/14 = 0.64
P(Sunny) = 5/14 = 0.36

P(Yes | Sunny) = (0.33 × 0.64) / 0.36 = 0.60
Here P(Yes | Sunny) = 0.60 is the higher probability, so the prediction is that players will play when the weather is sunny.
Naive Bayes uses a similar method to predict the probability of different classes based on various attributes. This
algorithm is mostly used in text classification and in problems having multiple classes.

Let’s follow the below steps to perform it.


Step 1: Convert the data set into a frequency table.
Step 2: Create a likelihood table by finding the probabilities, e.g. P(Overcast) = 4/14 ≈ 0.29 and P(Yes) = 9/14 ≈ 0.64.

Step 3: Now, use the Naive Bayesian equation to calculate the posterior probability for each class. The class with the
highest posterior probability is the outcome of the prediction.
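The following Python sketch works through these three steps on the weather/'Play' counts used in the worked example above (14 days, 9 'Yes'); the data list is reconstructed to match those counts and is otherwise illustrative:

from collections import Counter

# Outlook/'Play' pairs reconstructed from the counts in the worked example:
# Sunny: 3 Yes / 2 No, Overcast: 4 Yes, Rainy: 2 Yes / 3 No  (14 days, 9 Yes).
data = ([("Sunny", "Yes")] * 3 + [("Sunny", "No")] * 2
        + [("Overcast", "Yes")] * 4
        + [("Rainy", "Yes")] * 2 + [("Rainy", "No")] * 3)

# Step 1: frequency tables
class_counts = Counter(label for _, label in data)      # {'Yes': 9, 'No': 5}
joint_counts = Counter(data)                            # (weather, label) counts
weather_counts = Counter(w for w, _ in data)            # {'Sunny': 5, ...}

# Step 2: likelihoods, e.g. P(Sunny | Yes) = 3/9, and prior P(Yes) = 9/14
def likelihood(weather, label):
    return joint_counts[(weather, label)] / class_counts[label]

# Step 3: posterior for each class; the largest one is the prediction
def posterior(weather, label):
    prior = class_counts[label] / len(data)
    evidence = weather_counts[weather] / len(data)
    return likelihood(weather, label) * prior / evidence

print(round(posterior("Sunny", "Yes"), 2))   # 0.6, as in the worked example
print(round(posterior("Sunny", "No"), 2))    # 0.4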

What are the Pros and Cons of Naive Bayes?


Pros:
 It is easy and fast to predict the class of a test data set. It also performs well in multi-class prediction.
 When the assumption of independence holds, a Naive Bayes classifier performs better compared to other models
such as logistic regression, and it needs less training data.
 It performs well with categorical input variables compared to numerical variables. For numerical variables, a
normal distribution is assumed (bell curve, which is a strong assumption).
Cons:
 If a categorical variable has a category in the test data set that was not observed in the training data set, the model
will assign it zero probability and will be unable to make a prediction. This is often known as the "Zero
Frequency" problem. To solve it, we can use a smoothing technique; one of the simplest smoothing techniques is
called Laplace estimation.
 On the other side, Naive Bayes is also known to be a poor estimator, so the probability outputs from predict_proba
are not to be taken too seriously.
 Another limitation of Naive Bayes is the assumption of independent predictors. In real life, it is almost
impossible to get a set of predictors that are completely independent.
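As a small illustration of the Laplace estimation mentioned above, the sketch below (function name and counts are illustrative) shows how add-one smoothing keeps unseen categories from receiving zero probability:

# count_xi_c: occurrences of value xi with class c; count_c: size of class c;
# n_values: number of distinct values the attribute can take.
def smoothed_likelihood(count_xi_c, count_c, n_values, alpha=1.0):
    """P(xi | c) with add-alpha (Laplace) smoothing; never exactly zero."""
    return (count_xi_c + alpha) / (count_c + alpha * n_values)

# A value never seen with class c still gets a small, nonzero probability:
print(smoothed_likelihood(0, 9, 3))   # (0 + 1) / (9 + 3) ≈ 0.083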

Applications of Naive Bayes Algorithms


 Real-time Prediction: Naive Bayes is an eager learning classifier and it is certainly fast. Thus, it can be used for
making predictions in real time.
 Multi-class Prediction: This algorithm is also well known for its multi-class prediction capability. Here we can predict
the probability of multiple classes of the target variable.
 Text classification / Spam Filtering / Sentiment Analysis: Naive Bayes classifiers are mostly used in text classification
(due to better results in multi-class problems and the independence assumption) and have a higher success rate compared to
other algorithms. As a result, Naive Bayes is widely used in spam filtering (identifying spam e-mail) and sentiment analysis
(in social media analysis, to identify positive and negative customer sentiments).
 Recommendation System: A Naive Bayes classifier together with collaborative filtering builds a recommendation
system that uses machine learning and data mining techniques to filter unseen information and predict whether
a user would like a given resource or not.

BAYESIAN BELIEF NETWORKS


 A Bayesian belief network describes the probability distribution governing a set of variables by specifying a set
of conditional independence assumptions along with a set of conditional probabilities.
 In contrast to the naive Bayes classifier, which assumes that all the variables are conditionally independent
given the value of the target variable, Bayesian belief networks allow stating conditional independence
assumptions that apply to subsets of the variables.
 Thus, Bayesian belief networks provide an intermediate approach that is less constraining than the global
assumption of conditional independence made by the naive Bayes classifier, but more tractable than avoiding
conditional independence assumptions altogether.
 Bayesian belief networks are an active focus of current research, and a variety of algorithms have been
proposed for learning them and for using them for inference.

True Error of a Hypothesis:

Definition: The true error (denoted errorD(h)) of hypothesis h with respect to target concept c and distribution D is the
probability that h will misclassify an instance drawn at random according to D:

errorD(h) ≡ Pr_{x∈D} [ c(x) ≠ h(x) ]

Here the notation Pr_{x∈D} indicates that the probability is taken over the instance distribution D.

Sample complexity for finite hypothesis space:


PAC-learnability is largely determined by the number of training examples required by the learner.
The growth in the number of required training examples with problem size, called the sample complexity of the learning
problem, is the characteristic that is usually of greatest interest.
The reason is that in most practical settings the factor that most limits success of the learner is the limited availability
of training data.
Let us now present a general bound on the sample complexity for a very broad class of learners, called consistent
learners. A learner is consistent if it outputs hypotheses that perfectly fit the training data, whenever possible.
It is quite reasonable to ask that a learning algorithm be consistent, given that we typically prefer a hypothesis that
fits the training data over one that does not.
Can we derive a bound on the number of training examples required by any consistent learner, independent of the
specific algorithm it uses to derive a consistent hypothesis? The answer is yes.
To accomplish this, it is useful to recall the definition of version space
There we defined the version space VS_{H,D} to be the set of all hypotheses h ∈ H that correctly classify the training
examples D.
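For reference, the standard result for consistent learners with a finite hypothesis space (not reproduced in this excerpt) is that m ≥ (1/ε)(ln |H| + ln(1/δ)) training examples suffice for every consistent hypothesis to have true error below ε with probability at least 1 − δ. A minimal sketch simply evaluating this bound for illustrative values:

import math

# m >= (1/epsilon) * (ln|H| + ln(1/delta)) examples suffice so that, with
# probability at least 1 - delta, every hypothesis consistent with them has
# true error below epsilon.
def sample_complexity(h_size, epsilon, delta):
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# Illustrative values: |H| = 973 (a small conjunctive hypothesis space),
# epsilon = 0.1, delta = 0.05
print(sample_complexity(973, 0.1, 0.05))   # 99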


What is a PAC Model? Explain in Detail.


Answer:
probably approximately correct (PAC):
We focus here on the problem of inductively learning an unknown target function, given only training examples of this
target function and a space of candidate hypotheses. Within this setting, we will be chiefly
concerned with questions such as how many training examples are sufficient to successfully learn the target function,
and how many mistakes will the learner make before succeeding. As we shall see, it is possible to set quantitative
bounds on these measures, depending on attributes of the learning problem such as:
 The size or complexity of the hypothesis space considered by the learner
 The accuracy to which the target concept must be approximated
 The probability that the learner will output a successful hypothesis
 The manner in which training examples are presented to the learner
For the most part, we will focus not on individual learning algorithms, but rather on broad classes of learning
algorithms characterized by the hypothesis spaces they consider, the presentation of training examples, etc.

Our goal is to answer questions such as:


Sample complexity. How many training examples are needed for a learner to converge (with high probability) to a
successful hypothesis?
Computational complexity. How much computational effort is needed for a learner to converge (with high probability)
to a successful hypothesis?
Mistake bound. How many training examples will the learner misclassify before converging to a successful hypothesis?

Note there are many specific settings in which we could pursue such questions. For example, there are various ways
to specify what it means for the learner to be "successful." We might specify that to succeed, the learner must output
a hypothesis identical to the target concept. Alternatively, we might simply require
that it output a hypothesis that agrees with the target concept most of the time, or that it usually output such a
hypothesis. Similarly, we must specify how training examples are to be obtained by the learner. We might specify that
training examples are presented by a helpful teacher, or obtained by the learner performing experiments, or simply
generated at random according to some process outside the learner's control. As we might expect, the answers to the
above questions depend on the particular setting, or learning model, we have in mind.

The Problem Setting


As in earlier chapters, let X refer to the set of all possible instances over which target functions may be defined. For
example, X might represent the set of all people, each described by the attributes age (e.g., young or old) and height
(short or tall).
Let C refer to some set of target concepts that our learner might be called upon to learn. Each target concept
c in C corresponds to some subset of X, or equivalently to some boolean-valued function c : X → {0, 1}. For example,
one target concept c in C might be the concept "people who are skiers." If x is a positive example of c, then we will
write c(x) = 1; if x is a negative example, c(x) = 0.

We assume instances are generated at random from X according to some probability distribution D. For example, D
might be the distribution of instances generated by observing people who walk out of the largest sports store in
Switzerland.

In general, D may be any distribution, and it will not generally be known to the learner. All that we require of D is that
it be stationary; that is, that the distribution not change over time. Training examples are generated by drawing an
instance x at random according to D, then presenting x along with its target value, c(x), to the learner.
The learner L considers some set H of possible hypotheses when attempting to learn the target concept. For example,
H might be the set of all hypotheses describable by conjunctions of the attributes age and height. After observing a
sequence of training examples of the target concept c, L must output some hypothesis
h from H, which is its estimate of c. To be fair, we evaluate the success of L by the performance of h over new instances
drawn randomly from X according to D, the same probability distribution used to generate the training data.


Within this setting, we are interested in characterizing the performance of various learners L using various hypothesis
spaces H, when learning individual target concepts drawn from various classes C. Because we demand that L be general
enough to learn any target concept from C regardless of the distribution of training examples, we will often be interested
in worst-case analyses over all possible target concepts from C and all possible instance distributions D.

The true error (denoted errorD(h)) of hypothesis h with respect to target concept c and distribution D is the probability
that h will misclassify an instance drawn at random according to D:

errorD(h) ≡ Pr_{x∈D} [ c(x) ≠ h(x) ]

Here the notation Pr_{x∈D} indicates that the probability is taken over the instance distribution D.

Figure shows this definition of error in graphical form. The concepts c and h are depicted by the sets of instances within
X that they label as positive. The error of h with respect to c is the probability that a randomly drawn instance will fall
into the region where h and c disagree (i.e., their set difference). Note we have chosen to define error over the entire
distribution of instances-not simply over the training examples-because this is the true error we expect to encounter
when actually using the learned hypothesis h on subsequent instances drawn from D. Note also that the error depends
strongly on the unknown probability distribution D.


INSTANCE BASED LEARNING:

Eager Learning:

 In artificial intelligence, eager learning is a learning method in which the system tries to construct a general,
explicit, input-independent description of the target function during training, as opposed to lazy learning, where
generalization beyond the training data is delayed until a query is made to the system.

 The main advantage gained in employing an eager learning method, such as an artificial neural network, is
that the target function will be approximated globally during training, thus requiring much less space than
using a lazy learning system.

 Eager learning systems also deal much better with noise in the training data.

 Eager learning is an example of offline learning, in which post-training queries to the system have no effect on
the system itself, and thus the same query to the system will always produce the same result.

Lazy Learning: (Instance based Learning)

 Lazy learning is a learning method in which generalization of the training data is delayed until a query is made
to the system, as opposed to eager learning, where the system tries to generalize the training data before
receiving queries.

 Lazy learning methods simply store the data and generalizing


beyond these data is postponed until an explicit request is made.

 Instance-based learning methods such as nearest neighbor and


locally weighted regression are conceptually straightforward approaches to approximating real-valued or
discrete-valued target functions.

 Instance based learning includes nearest Neighbour and locally weighted regression methods that assume
instances can be represented as points in a Euclidean space. It also includes case-based reasoning methods
that use more complex, symbolic representations for instances.

 Instance-based methods are also referred to as "lazy" learning methods because they delay processing
until a new instance must be classified. The lazy learner can create many local approximations.

 A key advantage of this kind of delayed, or lazy, learning is that instead of estimating the target function once
for the entire instance space, these methods can estimate it locally and differently for each new instance to be
classified.

 One disadvantage of instance-based approaches is that the cost of


classifying new instances can be high. This is due to the fact that
nearly all computation takes place at classification.

 A second disadvantage of many instance-based approaches, especially nearest-neighbor approaches, is that they
typically consider all attributes of the instances when attempting to retrieve similar training examples from
memory. If the target concept depends on only a few of the many available attributes, then the instances that
are truly most "similar" may well be a large distance apart.


Eager vs. Lazy Learning:


Eager Learning:
 Eager learning methods construct a general, explicit description of the target function based on the provided
training examples.
 Eager learning methods use the same approximation to the target function for every query; it must be learned
from the training examples before input queries are observed.

Lazy Learning:
 Lazy learning methods simply store the data; generalizing beyond these data is postponed until an explicit
request is made.
 Lazy learning methods can construct a different approximation to the target function for each encountered
query instance.
 Lazy learning is very suitable for complex and incomplete problem domains, where a complex target function
can be represented by a collection of less complex local approximations.

K - Nearest Neighbour Learning:


When do we use KNN algorithm?
KNN can be used for both classification and regression predictive problems. However, it is more widely used in
classification problems in the industry. To evaluate any technique we generally look at 3 important aspects:
1. Ease to interpret output
2. Calculation time
3. Predictive Power

The KNN algorithm fares well across all these parameters of consideration. It is commonly used for its ease of
interpretation and low calculation time.

How does the KNN algorithm work?

Let’s take a simple case to understand this algorithm. Following is a spread


of red circles (RC) and green squares (GS):

You intend to find out the class of the blue star (BS). BS can be either RC
or GS and nothing else. The "K" in the KNN algorithm is the number of nearest
neighbors we wish to take a vote from. Let's say K = 3. Hence, we will now
draw a circle with BS as its centre, just big enough to enclose only three
data points on the plane. Refer to the following diagram for more details:

How do we choose the factor K?


First let us try to understand what exactly K influences in the
algorithm. If we look at the last example, given that all 6 training observations remain constant, with a given K value
we can draw the boundary of each class. These boundaries will segregate RC from GS. In the same way, let's try to see
the effect of the value of "K" on the class boundaries. Following are the different boundaries separating the two classes
for different values of K.
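One common way to pick K in practice is simple hold-out validation: try several values and keep the one with the best validation accuracy. Below is a minimal sketch using scikit-learn on illustrative random data (not the RC/GS example above):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative two-class data (random, not the RC/GS points from the figure).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Try a few K values and report validation accuracy; keep the best one.
for k in (1, 3, 5, 7, 9):
    acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_val, y_val)
    print(k, round(acc, 3))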



Pseudo Code of KNN


We can implement a KNN model by following the below steps:

1. Load the data.


2. Initialize the value of k.
3. For getting the predicted class, iterate from 1 to total number of training data points
1. Calculate the distance between test data and each row of training data. Here we will use Euclidean
distance as our distance metric since it’s the most popular method. The other metrics that can be used
are Chebyshev, cosine, etc.
2. Sort the calculated distances in ascending order based on distance values.
3. Get top k rows from the sorted array.
4. Get the most frequent class of these rows.
5. Return the predicted class.
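A from-scratch Python sketch of this pseudo code (variable names and the tiny RC/GS-style data set are illustrative):

import math
from collections import Counter

def euclidean(a, b):
    """Step 3.1: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Steps 3.1-3.2: compute the distance to every training row, sort ascending
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Steps 3.3-3.4: take the top k rows and find their most frequent class
    top_k_labels = [label for _, label in distances[:k]]
    # Step 3.5: return the predicted class
    return Counter(top_k_labels).most_common(1)[0][0]

# Usage on a tiny red-circle (RC) vs green-square (GS) style data set:
train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_y = ["RC", "RC", "RC", "GS", "GS", "GS"]
print(knn_predict(train_X, train_y, query=(2, 2), k=3))   # RC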

Description of the KNN:

The most basic instance-based method is the k-NEAREST NEIGHBOR algorithm. The main idea behind k-NN
learning is so-called majority voting. This algorithm assumes all instances correspond to points in the n-dimensional
space ℝⁿ. The nearest neighbors of an instance are defined in terms of the standard Euclidean distance. More precisely,
let an arbitrary instance x be described by the feature vector
⟨ a1(x), a2(x), a3(x), …, an(x) ⟩
where ar(x) denotes the value of the r-th attribute of instance x. Then the distance between two instances xi and xj is
defined to be d(xi, xj), where

d(xi, xj) = √( Σ_{r=1}^{n} ( ar(xi) − ar(xj) )² )

In nearest-neighbour learning the target function may be either discrete-valued or real-valued.
Let us first consider learning discrete-valued target functions of the form f : ℝⁿ → V, where V is the finite set
V = {v1, v2, …, vm}.
The k-NEAREST NEIGHBOUR algorithm for approximating a discrete-valued target function is given below.
As shown there, the value f̂(xq) returned by this algorithm as its estimate of f(xq) is just the most common value
of f among the k training examples nearest to xq. If we choose k = 1, then the 1-NEAREST NEIGHBOUR algorithm
assigns to f̂(xq) the value f(xi), where xi is the training instance nearest to xq. For larger values of k, the
algorithm assigns the most common value among the k nearest training examples.


Basic KNN algorithm:

Training algorithm:
For each training example ⟨x, f(x)⟩, add the example to the list training_examples.
Classification algorithm:
Given a query instance xq to be classified,
 Let x1, x2, x3, …, xk denote the k instances from training_examples that are nearest to xq
 Return
  f̂(xq) ← argmax_{v ∈ V} Σ_{i=1}^{k} δ(v, f(xi))

where δ(a, b) = 1 if a = b and δ(a, b) = 0 otherwise.

The k-NEAREST NEIGHBOR algorithm is easily adapted to approximating continuous-valued target functions. To
accomplish this, we have the algorithm calculate the mean value of the k nearest training examples rather than their
most common value. More precisely, to approximate a real-valued target function f : ℝⁿ → ℝ, we replace the final line
of the above algorithm by

 f̂(xq) ← ( Σ_{i=1}^{k} f(xi) ) / k
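A short sketch of this continuous-valued variant, reusing the euclidean helper from the earlier KNN sketch: the prediction is simply the mean of the k nearest targets (names are illustrative).

def knn_regress(train_X, train_y, query, k=3):
    """k-NN for a real-valued target: mean of the k nearest targets."""
    distances = sorted(
        (euclidean(x, query), target) for x, target in zip(train_X, train_y)
    )
    nearest_targets = [target for _, target in distances[:k]]
    return sum(nearest_targets) / k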
VORONOI DIAGRAM:

 The diagram on the right side shows the shape of this decision surface
induced by 1-NEAREST NEIGHBOR over the entire instance space.
The decision surface is a combination of convex polyhedra
surrounding each of the training examples.

 For every training example, the polyhedron indicates the set of query
points whose classification will be completely determined by that
training example. Query points outside the polyhedron are closer to
some other training example. This kind of diagram is often called the
Voronoi diagram of the set of training examples.

Distance-Weighted NEAREST NEIGHBOUR Algorithm:


One obvious refinement to the k-NEAREST NEIGHBOR Algorithm is to weight the contribution of each of the k neighbors
according to their distance to the query point xq , giving greater weight to closer neighbors. For example, in the above
algorithm, which approximates discrete-valued target functions, we might weight the vote of each neighbor according
to the inverse square of its distance from xq .
This can be accomplished by replacing the final line of the algorithm by
 f̂(xq) ← argmax_{v ∈ V} Σ_{i=1}^{k} wi δ(v, f(xi))

where wi = 1 / d(xq, xi)²
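The same idea in code: a sketch of the distance-weighted vote, again reusing the euclidean helper from the KNN sketch (the exact-match guard is an implementation detail, since the weight is undefined when the distance is zero):

from collections import defaultdict

def weighted_knn_predict(train_X, train_y, query, k=5):
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = defaultdict(float)
    for dist, label in distances[:k]:
        if dist == 0.0:                  # query coincides with a training point
            return label
        votes[label] += 1.0 / dist ** 2  # w_i = 1 / d(x_q, x_i)^2
    return max(votes, key=votes.get)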


Remarks on k-NEAREST NEIGHBOR Algorithm:


 The distance-weighted k-NEAREST NEIGHBOR Algorithm is a highly effective inductive inference method for
many practical problems.
 It is robust to noisy training data and quite effective when it is provided a sufficiently large set of training data.
 Note that by taking the weighted average of the k neighbors nearest to the query point, it can smooth out the
impact of isolated noisy training examples.

What is the inductive bias of k-NEAREST NEIGHBOR Algorithm?

 The basis for classifying new query points is easily understood based on the algorithm specified above.
 The inductive bias corresponds to an assumption that the classification of an instance xq will be most similar
to the classification of other instances that are nearby in Euclidean distance.
 One practical issue in applying k-NEAREST NEIGHBOR Algorithm is that the distance between instances is
calculated based on all attributes of the instance (i.e., on all axes in the Euclidean space containing the
instances).


 This lies in contrast to methods such as rule and decision tree learning systems that select only a subset of the
instance attributes when forming the hypothesis.

Case-Based reasoning (CBR):


Case-based reasoning (CBR) is a problem solving paradigm that is different from other major A.I. approaches. A CBR
system can be used in risk monitoring, financial markets, defense, and marketing just to name a few. CBR learns from
past experiences to solve new problems. Rather than relying on a domain expert to write the rules or make associations
along generalized relationships between problem descriptors and conclusions, a CBR system learns from previous
experience in the same way a physician learns from his patients.

 Case-Based reasoning (CBR), broadly construed, is the process of solving new problems based on the solutions
of similar past problems.
 It is an approach to model the way humans think to build intelligent systems.
 Case-based reasoning is a prominent kind of analogy making.
 CBR: Uses a database of problem solutions to solve new problems.
 Store symbolic description (tuples or cases)—not points in a Euclidean space
 Applications: Customer-service (product-related diagnosis), legal ruling.
 Case-Based Reasoning is a well-established research field that involves the investigation of theoretical
foundations, system development, and the building of practical applications for experience-based problem solving,
with the baseline idea of remembering past experience.
 It can be classified as a sub-discipline of Artificial Intelligence
 The learning process is based on analogy, not on deduction or induction.
 It is best classified as supervised learning (recall the distinction between supervised, unsupervised and
reinforcement learning methods typically made in Machine Learning).
 Learning happens in a memory-based manner.
 Case – a previously made and stored experience item.
 Case base – the core of every case-based problem solver – a collection of cases.
Everyday examples of CBR:
 An auto mechanic who fixes an engine by recalling another car that exhibited similar symptoms
 A lawyer who advocates a particular outcome in a trial based on legal precedents or a judge who creates case
law.
 An engineer copying working elements of nature (practicing biomimicry), is treating nature as a database of
solutions to problems.
CBR is one of the few commercially/industrially really successful AI methods; typical applications include:
 Customer support, help-desk systems: diagnosis and therapy of customer‘s problems, medical diagnosis
 Product recommendation and configuration: e-commerce
 Textual CBR: text classification, judicial applications (in particular in countries where common law, not civil
law, is applied, such as the USA, UK, India, Australia, and many others).
 Applicability also in ill-structured and badly understood application domains.

There are three main types of CBR that differ significantly from one another concerning case representation and
reasoning:

1. Structural (a common structured vocabulary, i.e. an ontology)

2. Textual (cases are represented as free text, i.e. strings)

3. Conversational

 A case is represented through a list of questions that varies from one case to another; knowledge is
contained in customer/agent conversations.

Architecture of CBR/ CBR Cycle:


CBR Cycle:
Despite the many different appearances of CBR systems, the essentials of CBR are captured in a surprisingly simple
and uniform process model.
• The CBR cycle was proposed by Aamodt and Plaza.
• The CBR cycle consists of 4 sequential steps built around the knowledge of the CBR system:
• RETRIEVE
• REUSE
• REVISE
• RETAIN
RETRIEVE:
• One or several cases from the case base are selected, based on the modeled similarity.
• The retrieval task is defined as finding a small number of cases from the case-base with the highest similarity
to the query.
• This is a k-nearest-neighbor retrieval task considering a specific similarity function.
• When the case base grows, the efficiency of retrieval decreases; this motivates methods that improve retrieval
efficiency, e.g. specific index structures such as kd-trees, case-retrieval nets, or discrimination networks.
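A minimal sketch of such a retrieval step (the case structure and the attribute-overlap similarity function are illustrative assumptions, not from the notes):

def similarity(problem_a, problem_b):
    """Fraction of shared attribute values between two problem descriptions."""
    keys = set(problem_a) & set(problem_b)
    return sum(problem_a[k] == problem_b[k] for k in keys) / max(len(keys), 1)

def retrieve(case_base, query_problem, k=1):
    """RETRIEVE: return the k stored cases most similar to the query."""
    ranked = sorted(case_base,
                    key=lambda case: similarity(case["problem"], query_problem),
                    reverse=True)
    return ranked[:k]

case_base = [
    {"problem": {"symptom": "no_start", "battery": "dead"}, "solution": "replace battery"},
    {"problem": {"symptom": "no_start", "battery": "ok"},   "solution": "check starter"},
]
print(retrieve(case_base, {"symptom": "no_start", "battery": "ok"})[0]["solution"])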

REUSE:
• Reusing a retrieved solution can be quite simple if the solution is returned unchanged as the proposed
solution for the new problem.
• Adaptation (if required, e.g. for synthetic tasks).
• Several techniques for adaptation in CBR
- Transformational adaptation
- Generative adaptation
• Most practical CBR applications today try to avoid extensive adaptation for pragmatic reasons.

REVISE:
• In this phase, feedback related to the solution constructed so far is obtained.
• This feedback can be given in the form of a correctness rating of the result or in the form of a manually
corrected revised case.
• The revised case or any other form of feedback enters the CBR system for its use in the subsequent retain
phase.

RETAIN:
• The retain phase is the learning phase of a CBR system (adding a revised case to the case base).
• Explicit competence models have been developed that enable the selective retention of cases (because of the
continuous increase of the case-base).
