Prediction Errors Tech Report
Christopher Meek
Microsoft Research
One Microsoft Way
Redmond, WA 98052
Introduction
Prediction errors arise in interactive machine learning systems (e.g., Fails and Olsen 2003), machine
teaching (e.g., Simard et al 2014), and when statisticians, scientists, and engineers build predictive
systems. Our goal in this paper is to provide an exhaustive categorization of the types of prediction
errors and to provide guidance on actions one can take to remedy prediction errors. We suspect that
this will be helpful to both expert and non-expert users trying to leverage machine learning and
statistical models in building predictive systems.
Our characterization of prediction errors has four top-level categories: mislabeling, representation, learner, and boundary errors. Each of these error types is associated with specific deficiencies that,
when identified, are potentially remedied. Furthermore, we prove that the categorization into these
error types is sufficient to characterize all prediction errors.
We also suggest actions that can be taken to detect and remove prediction errors. With the aim of re-
moving an entire type of prediction error from consideration we introduce the concept of a consistent
learning algorithm. We demonstrate that there are consistent learning algorithms and describe how,
when consistent learning algorithms are used, none of the prediction errors are learner errors. We
also describe how a teacher might benefit from the identification of an invalidation set: a minimal set of labeled examples that contains one or more prediction errors. Finally, we consider the implications
of these results for developing teaching protocols that help the teacher to take appropriate actions to
remedy prediction errors.
Related Work
The problem of debugging statistical models has been studied in a number of contexts. An excellent example of this work is that of Amershi et al (2015), who also provide references to other
related work. Our categorization of prediction errors extends the informal categorization provided
by Amershi et al (2015). In that work, the authors describe potential sources of prediction errors
in developing tools for identifying and exploring prediction errors. Specifically, they consider three sources of errors: insufficient data, feature deficiencies, and mislabeled data. In our categorization, errors of insufficient data are a specific type of learner error that we call an objective error (they do not consider other types of learner errors), feature deficiencies are a specific type of representation error that we call feature blindness, and mislabeled data corresponds to what we call mislabeling errors. Amershi
et al (2015) do not consider boundary errors.
The concept of an invalidation set is related to a number of existing concepts in the theory of machine learning, including the exclusion dimension (Angluin 2004), the unique specification dimension (Hegedűs 1995), and the certificate size (Hellerstein et al 1996). Our focus, however, is on teaching
with both labels and features whereas previous work considers only teaching with labels.
Prediction Errors
In this section, we define the set of prediction errors that can arise when a teacher teaches a machine
to classify objects by providing labeled examples and features. In addition, we provide essential
definitions for the remainder of the paper.
We are interested in building a classifier of objects. We use x and xi to denote particular objects and
X to denote the set of objects of interest. We use y and yi for particular labels and Y to denote the
space of possible labels. For binary classification Y = {0, 1}. A classification function is a function
from X to Y .1 The set of classification functions is denoted by C = X → Y . We use c∗ to denote
the target classification function that the teacher wants to teach the machine to implement.
One essential ingredient that a teacher provides is features: functions that map objects to scalar values. A feature fi (or gi) is a function from objects to real numbers (i.e., fi ∈ X → R). We denote
the set of teachable feature functions by R = {f1, f2, . . .} and call a finite subset of R a feature set (i.e., F ∈ 2R). Clearly not all feature functions are directly teachable — if the target classification
function were teachable then we would not need to provide labeled examples. The feature set Fi =
{fi,1 , . . . , fi,p } is p-dimensional. We use a p-dimensional feature set to map an object to a point
in Rp. We denote the mapped object xk using feature set Fi by Fi(xk) = (fi,1(xk), . . . , fi,p(xk)), a vector of length p whose jth entry is the result of applying the jth feature function in Fi to the object.
Another essential ingredient that a teacher provides is a training set T ⊂ X × Y, a set of labeled examples. We say that the training set T has n examples if |T| = n and denote the set of training examples as {(x1, y1), . . . , (xn, yn)}. A training set is unfeaturized. We use feature sets to create featurized training sets. For a p-dimensional feature set Fi and an n-example training set T we denote the featurized training set Fi(T) = {(Fi(x1), y1), . . . , (Fi(xn), yn)} ∈ (Rp × Y)n. We call the resulting training set an Fi-featurized training set or the Fi featurization of training set T.
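To make the featurization definitions concrete, the following minimal Python sketch computes an Fi-featurized training set; the objects, feature functions, and labels are illustrative assumptions, not part of the formal development:

def featurize(feature_set, training_set):
    # Map each labeled object (x, y) to the featurized pair (F_i(x), y).
    return [(tuple(f(x) for f in feature_set), y) for (x, y) in training_set]

f1 = len                         # a feature: the length of a string object
f2 = lambda s: s.count("a")      # a feature: the number of 'a' characters
T = [("banana", 1), ("kiwi", 0)]
print(featurize([f1, f2], T))    # [((6, 3), 1), ((4, 0), 0)]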
The method by which the machine learns a classification function is called a learning algorithm. A
learning algorithm is, in fact, a set of learning algorithms, as we now describe. First, a d-dimensional learning algorithm ℓd is a function that takes a d-dimensional feature set F and a training set T and outputs a function hd ∈ Rd → Y. Thus, the output hd of a learning algorithm using Fi and training set T can be composed with the functions in the feature set to yield a classification function of objects (i.e., hd ◦ Fi ∈ C). The hypothesis space of a d-dimensional learning algorithm ℓd is the image of the function ℓd and is denoted by Hℓd (or Hd if there is no risk of confusion). A classification function
c ∈ C is consistent with a training set T if ∀(x, y) ∈ T it is the case that c(x) = y. A d-dimensional
learning algorithm ℓd is consistent if it outputs a hypothesis consistent with the training set whenever there is a hypothesis in Hd that is consistent with the training set. A vector learning algorithm ℓ = {ℓ0, ℓ1, . . .} is a set of d-dimensional learning algorithms, one for each dimensionality. A consistent vector learning algorithm is one in which each of the d-dimensional learning algorithms is consistent. Finally, a (feature-vector) learning algorithm L takes a feature set F, a training set T, and a vector learning algorithm ℓ and returns a classification function in C. In particular, Lℓ(F, T) = ℓ|F|(F, T) ◦ F ∈ C. We say that a classification function c is F-L-learnable if there exists a training set T such that L(F, T) = c. We denote the set of F-L-learnable functions by C(F, L). When the vector learning algorithm is clear from context or we are discussing a generic vector learning algorithm, we drop the ℓ and write L(F, T). One important property of a feature set is whether it is sufficient to teach the target classification function c∗. A feature set F is sufficient for learner L and target classification function c∗ if c∗ is F-L-learnable (i.e., c∗ ∈ C(F, L)).
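As a hedged illustration of the composition Lℓ(F, T) = ℓ|F|(F, T) ◦ F, the sketch below uses scikit-learn's logistic regression as the |F|-dimensional vector learner; the choice of learner and the helper name learn are assumptions for illustration, not part of the formal development:

from sklearn.linear_model import LogisticRegression

def learn(feature_set, training_set):
    # The |F|-dimensional vector learner consumes the featurized training set.
    X = [[f(x) for f in feature_set] for (x, _) in training_set]
    y = [label for (_, label) in training_set]
    h = LogisticRegression().fit(X, y)    # h is a function R^|F| -> Y
    # Compose h with the featurization to get a classification function in C.
    return lambda obj: h.predict([[f(obj) for f in feature_set]])[0]

c = learn([len, lambda s: s.count("a")], [("banana", 1), ("kiwi", 0)])
print(c("papaya"))   # classifies a previously unseen object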
1 Note that, while we call this mapping a classification function, the definition encompasses a broad class of prediction problems including structured prediction, entity extraction, and regression.
The central component of an interactive machine learning system for teaching a classification func-
tion is a teaching protocol. A teaching protocol is the method by which a teacher teaches a machine
learning algorithm. While not our primary focus in this paper, our interest in teaching protocols is
that they (1) provide a means of illustrating the potential value of the results we provide and (2)
provide a valuable avenue for future exploration as alternative teaching protocols provide different
types of support for teachers in their efforts to build a classifier.
Finally, we define a prediction error. An object x ∈ X is a prediction error for
training set T , feature set F , and learning algorithm L if the trained classifier L(F, T ) = c does
not agree with the target classification function on the object (i.e., c(x) ≠ c∗(x)). We distinguish two types of prediction errors: a training set prediction error, in which the prediction error is on an object in the training set (x ∈ TX = {x | (x, y) ∈ T}), and a generalization error, in which the object is not in the training set (i.e., x ∈ X \ TX).
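When the teacher can evaluate the target function on a given object, this distinction can be checked mechanically. A minimal sketch, where c_star stands in for c∗ (in practice only the teacher can evaluate it):

def error_kind(x, c, c_star, training_set):
    # c is the trained classifier L(F, T); c_star stands in for the target c*.
    if c(x) == c_star(x):
        return None                              # not a prediction error
    if any(x == xi for (xi, _) in training_set):
        return "training set prediction error"   # x is in T_X
    return "generalization error"                # x is in X \ T_X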
Mislabeling Errors
A mislabeling error is a labeled object such that the label does not agree with the target classification
function (i.e., a labeled example (x, y) such that y ≠ c∗(x)). At first glance it is not clear that mislabeling errors have anything to do with prediction errors; however, mislabeling errors can give rise to prediction errors. In particular, if the learned classifier matches the label of a mislabeled object there will be a prediction error. For instance, if we have only one labeled object (x, 1) in a training set and it is mislabeled, then any consistent classifier will result in a prediction error. This type of prediction error arises due to an error by the teacher (a.k.a. labeler) who provided an incorrectly labeled example. We assume that a teacher, when confronted with a mislabeling error, can correct the label to match the target classification function. In practice this may not be the case due to a number of factors, including lack of clarity about the target classification function c∗ and teacher error (see,
e.g., Kulesza et al 2014).
Learner Errors
A learner error is a prediction error that arises because the learner does not find a classification function that correctly predicts the training set when such a learnable classifier exists (i.e., ∃c ∈ C(F, L) ∀(x, y) ∈ T c(x) = c∗(x) and ∀c ∈ C(F, L) if (∀(x, y) ∈ T c(x) = c∗(x)) then L(F, T) ≠ c). Note that when considering a generalization error we use the augmented training set T′ = T ∪ {(x, c∗(x))}. Typical learning algorithms select a function from the set of learnable classification functions
C(F, L) using a fitness function or loss function. In this case, it is natural to consider two types of
learner errors; optimization errors and objective errors. In an optimization error, there is a learnable
classification function that correctly classifies the training set and the consistent classification func-
tion has a lower loss than L(F, T ). In an objective error, all learnable classification functions that
correctly classify the training set have higher loss than L(F, T ).
Representation Errors
A representation error is a prediction error that arises because there is no learnable classification function that correctly predicts the training set (i.e., ∀c ∈ C(F, L) ∃(x, y) ∈ T s.t. c(x) ≠ c∗(x)). Again, for a generalization error we use the augmented training set T′. Representation errors
arise due to a limitation of the feature set, a limitation of the learning algorithm or both. More specif-
ically, an error can arise due to the feature-blindness of the learning algorithm — it does not have access to features that distinguish objects — or because the hypothesis class of the learning algorithm is impoverished (e.g., trying to learn the XOR function with a linear classifier).
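The XOR example can be reproduced directly. In the hedged sketch below (using scikit-learn purely for illustration), a linear classifier cannot fit the XOR training set, while adding the product feature x1·x2 removes the representation error:

from sklearn.linear_model import LogisticRegression

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]                                # the XOR labels
lin = LogisticRegression(max_iter=1000).fit(X, y)
print(lin.score(X, y))                          # < 1.0: a representation error

X_aug = [row + [row[0] * row[1]] for row in X]  # add the feature x1*x2
aug = LogisticRegression(C=1e6, max_iter=1000).fit(X_aug, y)
print(aug.score(X_aug, y))                      # 1.0: the training set now fits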
Boundary Errors
Our final type of prediction error is a type of generalization error. A boundary error is a prediction
error for an object x if adding (x, c∗(x)) to the training set yields a classification function c′ that correctly predicts the augmented training set (i.e., c = L(F, T), c′ = L(F, T′), and c(x) ≠ c′(x) = c∗(x)).
We conclude this section by providing characterizations of training set prediction errors and of pre-
diction errors. Our first proposition demonstrates that there are three types of training set prediction
errors.
Proposition 1 If there is a training set prediction error given a training set, feature set, and learning
algorithm then there is either a mislabeling, representation, or learner error.
Proof Let x be a training set prediction error for training set T , feature set F and learning algorithm
L. That means that there exists (x, y) ∈ T such that c = L(F, T) and c(x) ≠ c∗(x). If there are
mislabeled examples in T we are done. If there are no mislabeled examples then it must be the case
that either there is or is not a classification function in C(F, L) that correctly classifies T . If there is
such a classification function then we have a learner error and if not we have a representation error.
The following proposition demonstrates that the only additional error type required to capture all prediction errors is the boundary error.
Proposition 2 If there is a prediction error given a training set, feature set, and learning algorithm
then there is either a mislabeling, representation, learner, or boundary error.
Proof Let x be a prediction error for training set T , feature set F and learning algorithm L. We
consider two cases:
Case 1: x is a training set prediction error. This case is handled in Proposition 1.
Case 2: x is a generalization error and there is no training set prediction error for F , T , and L. In this
case, we consider the augmented training set T′ = T ∪ {(x, c∗(x))} to identify the type of prediction error for x. If L(F, T′) is consistent with the training set T′ we have a boundary error. Otherwise, as described in Case 1, there must be either a learner error or a representation error. Note that while it might be the case that there are mislabeled objects not included in the training set, such mislabeling errors are not generalization errors and are not relevant, as a mislabeling cannot be the source of a prediction error if it is not included in the training set. Thus, every generalization error can be associated with one of the three prediction error types.
A boundary error is a generalization error: a failure of the currently trained classification function to correctly classify an unseen object. A boundary error can only arise in a teaching protocol in which there are labeled examples that are not included in a training set. The most common scenario where
this happens is when there is a test set that is used to obtain an estimate of the prediction perfor-
mance of the learned classification function. A teaching protocol can automatically detect whether a prediction error is a boundary error by including the example in the training set and determining if
the resulting classification function correctly predicts the error. A teaching protocol can potentially
leverage such a test to choose when to sample examples, for instance, sampling more examples in a
region with demonstrable ignorance about the boundary. This is related to the motivation for using
uncertainty sampling as an active learning strategy (Settles 2012).
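The boundary-error test just described is easy to mechanize. A minimal sketch, assuming a learn(F, T) function that returns a classifier of raw objects (as in the earlier sketch):

def is_boundary_error(x, y_true, F, T, learn):
    c = learn(F, T)
    if c(x) == y_true:
        return False                      # x is not a prediction error at all
    c_prime = learn(F, T + [(x, y_true)]) # retrain on the augmented set T'
    return c_prime(x) == y_true           # fixed by retraining: boundary error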
It is possible to completely eliminate learner errors by choosing to use a consistent learning algorithm, as the following proposition demonstrates.
Proposition 3 If there is a training set prediction error for feature set F and consistent learning
algorithm L then the error must be a mislabeling or representation error.
Proof Recall the definition of a consistent learning algorithm: a consistent learning algorithm returns
a classification function that correctly predicts the training set if there is a learnable classification
function that does so. To prove the proposition we assume that there is a prediction error that is not a
mislabeling or a representation error. From Proposition 1 we know that there must be a learner error.
In this case, there is a learnable classification function that correctly classifies the training set. From
the consistency of L and the lack of representation or mislabeling errors, we know that L(F, T )
correctly classifies T, which implies there is no training set prediction error, a contradiction.
We have demonstrated that consistent learning algorithms can be used to eliminate learner errors.
Next we demonstrate that consistent learning algorithms exist.
We have moved the proofs of several propositions to the end of the paper to improve readability.
Proposition 4 Maximum-likelihood logistic regression is a consistent learner.
The fact that maximum-likelihood logistic regression is a consistent learner is due in part to the fact that the optimization problem is convex. It is also due to the fact that we have restricted the functional form of the classification function to a generalized linear form, limiting the capacity of the learning algorithm. The following proposition demonstrates that we need not limit the capacity of the learning algorithm to have a consistent learning algorithm.
Proposition 5 The one-nearest-neighbor (1NN) learning algorithm is a consistent learner.
Recall that there are two types of learner errors: optimization and objective errors. Next we illustrate
how objective errors can arise when applying learning algorithms to prediction problems. The most
common way in which objective errors arise is when one adds regularization to reduce generalization
error. For logistic regression, this adds a term that penalizes the length of the weight vector (i.e., −λ||w|| added to the log-likelihood). Figure 1 illustrates the 0.5 decision boundaries for different choices of the regularization parameter λ. With λ = 0 we obtain a consistent learning algorithm, but with λ = 0.5
and λ = 1.0 we see examples of objective errors. Similarly, if we consider k-nearest-neighbor algorithms for k > 1, there are training sets that the learned classifier fails to correctly classify, due to the fact that the prediction for an object x in the training set is also a function of the k − 1 other points nearest to x, which might disagree on the prediction at x.
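A hedged sketch of the k-nearest-neighbor case, with scikit-learn and an invented one-dimensional training set, shows a 3NN learner misclassifying one of its own training points:

from sklearn.neighbors import KNeighborsClassifier

X = [[0.0], [1.0], [2.0], [10.0]]
y = [0, 0, 1, 1]
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# The three nearest neighbors of 2.0 are 2.0, 1.0, and 0.0, whose labels
# vote 0; the training label at 2.0 is 1, so this is a training set error.
print(knn.predict([[2.0]]))   # [0]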
As argued above, we can remove learner errors from consideration by using a consistent learner.
It is not clear whether this is the best approach in all teaching scenarios. It might be the case that
it is beneficial to the teaching process to use an inconsistent learner to, for instance, improve gen-
eralization performance. In such circumstances, one might be able to leverage a family of learning
algorithms in which there is a regularization parameter that can control the potential for objective
errors. An example of such a family is the family of λ-regularized logistic regression learners. When using such a family, one can detect a learner error by training with different settings of the regularization parameter.
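A minimal sketch of this detection strategy, assuming scikit-learn's LogisticRegression as the regularized family (its parameter C playing the role of 1/λ is an assumption for illustration):

from sklearn.linear_model import LogisticRegression

def suspect_objective_error(X, y, Cs=(1e6, 1.0, 0.01)):
    # Train across the regularization family and record training accuracy.
    scores = {C: LogisticRegression(C=C, max_iter=1000).fit(X, y).score(X, y)
              for C in Cs}
    # If a weakly regularized member fits the training set but the strongly
    # regularized member does not, the failure is an objective error.
    return scores[max(Cs)] == 1.0 and scores[min(Cs)] < 1.0, scores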
Figure 1: A two-dimensional example demonstrating that regularized logistic regression can be in-
consistent.
Next we consider the problem of detecting representation and mislabeling errors assuming that we
have no learner errors. It follows, for instance, from Proposition 3 that this is the situation when
using a consistent learning algorithm.
In general, we cannot distinguish between mislabeling errors and representation errors. To see this,
consider a binary classification training set with two objects {(x1, 1), (x2, 0)}. In this situation, it is possible that the target classification function is the constant function c∗(x) = 1 and the label for x2 is a mislabeling error, or that the labels are correct but the current feature set fails to distinguish the two objects, in which case there is a representation error (one remedied by teaching a feature function f1 that distinguishes the two objects, e.g., f1(x1) = 5 and f1(x2) = 7).
While one cannot hope to automatically distinguish mislabeling and representation errors, one can
hope that the teacher can detect and distinguish such errors when they are presented to the teacher.
One way in which a teaching protocol might help the teacher to detect and diagnose representation
and mislabeling errors is by identifying a small set of labeled examples to inspect. We propose
the use of an invalidation set for this purpose. An invalidation set is a training set of minimal size
containing a prediction error. By identifying a minimal training set with a prediction error we aim
to reduce the effort required by the teacher to determine whether prediction errors are mislabeling
errors or representation errors.
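A brute-force sketch of finding an invalidation set follows; it is exponential in |T| and intended only to make the definition concrete (learn is the hypothetical feature-vector learner from the earlier sketches):

from itertools import combinations

def invalidation_set(T, F, learn):
    # Search subsets of T in increasing size for one whose trained classifier
    # still makes a training set prediction error on that subset.
    for size in range(1, len(T) + 1):
        for subset in combinations(T, size):
            c = learn(F, list(subset))
            if any(c(x) != y for (x, y) in subset):
                return list(subset)   # a minimal-size invalidating subset
    return None                       # T itself is correctly classified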
The next results bound the size of an invalidation set for any consistent linear learner (including maximum-likelihood logistic regression) and for the one-nearest-neighbor classifier.
Proposition 6 If T has a prediction error for target concept c∗ using feature set F and L where L
is a consistent linear learner then an invalidation set has at most |F | + 2 examples.
Proposition 7 If T has a prediction error for target concept c∗ using feature set F and L where L
is a one-nearest-neighbor learner then an invalidation set has at most 2 examples.
Discussion
We begin our discussion by presenting a teaching protocol through which a teacher might teach a
machine a classification function. This provides a means to both summarize our results and highlight
open issues.
Algorithm 1 Error-Driven-Teaching-Protocol
Input: consistent learning algorithm L, set of objects X.
T = {} // training set T ⊂ X × Y
F = {} // feature set F ⊆ R
c = L(F, T);
while !Terminate() do
  (x, y) = Add-labeled-example(X, F, T, L);
  T = T ∪ {(x, y)};
  c = L(F, T); // remove boundary errors by retraining
  while (∃(x, y) ∈ T such that c(x) ≠ y) do
    Identify invalidation set T′ ⊂ T;
    found-mislabeled-example = Check-labels(T′);
    if (found-mislabeled-example) then
      Correct-Labels(T′); // fix mislabeling errors
    else
      Add-feature(); // fix representation errors
    end if
    c = L(F, T); // retrain after each repair
  end while
end while
return c;
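A Python sketch of Algorithm 1 follows. The teacher-facing callbacks (terminate, add_labeled_example, check_labels, correct_labels, add_feature) and the teacher object are hypothetical names, and learn and invalidation_set are the sketches given earlier:

def error_driven_teaching(learn, teacher):
    T, F = [], []                    # training set and feature set
    c = learn(F, T)                  # assumes the learner accepts empty input
    while not teacher.terminate():
        T.append(teacher.add_labeled_example(F, T))
        c = learn(F, T)              # retraining removes boundary errors
        while any(c(x) != y for (x, y) in T):
            T_inv = invalidation_set(T, F, learn)
            if teacher.check_labels(T_inv):
                teacher.correct_labels(T_inv, T)   # fix mislabeling errors
            else:
                F.append(teacher.add_feature())    # fix representation error
            c = learn(F, T)          # retrain after each repair
    return c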
Algorithm 1 describes a teaching protocol that illustrates one potential use of our categorization of prediction errors. The teaching protocol uses the teacher to address particular sub-problems via the teacher-performed function calls. In particular, the teacher is required to determine whether to terminate the teaching session, to choose a new example to label for the training set, to check and correct labels, and to add features.
The teaching protocol in Algorithm 1 assumes the use of a consistent learning algorithm, which removes the need to consider learner errors. After adding a new labeled example, the classifier is
immediately retrained to remove any potential boundary errors. Finally, we use the concept of an
invalidation set to reduce the effort required to identify and correct mislabeling and representation
errors.
This is an idealized teaching protocol but, as such, it points to important research directions for providing support for teachers. These directions include support for choosing which item to select and label, choosing which feature to add, choosing when to terminate the teaching effort, and supporting exploration of the space of objects and the evolution of the target classification function.
Proofs
In this section we provide proofs for several propositions. Some of the proofs rely on convex geometry and linear algebra. We assume that the reader is familiar with basic concepts and elementary results from these areas. We denote the convex hull of a set of points S by conv(S).
Proposition 4 Maximum-likelihood logistic regression is a consistent learner.
Proof We consider binary classification Y = {0, 1} using a d-dimensional feature set F. We use w ∈ Rd, b ∈ R to parameterize our logistic regression. The likelihood function for logistic regression is Pr(Y = y|X = x, F, w, b) = exp((w · F(x) + b)y)/(1 + exp(w · F(x) + b)). The maximum-likelihood estimator is ArgMaxw,b ∏(xi,yi)∈T Pr(yi|F(xi), w, b). The logarithm of this function is concave (equivalently, the negative log-likelihood is convex) and, as such, we can guarantee that we do not have optimization errors. The likelihood maps featurized objects to real numbers and is thus not a binary classification function. We can map a likelihood function into a classification function via a threshold. We will use a threshold of 0.5, and thus c(x) = 1 if Pr(Y = 1|X = x, F, w, b) > 0.5 and c(x) = 0 otherwise.
We reparameterize the likelihood function using the following definitions: w′ = w/||w||, b′ = −b/||w||, β = ||w||, and d(x, w′, b′, F) = w′ · F(x) − b′, where || · || denotes the Euclidean length of the vector w. The likelihood function is then Pr(Y = y|X = x, F, w′, b′, β) = exp(βd(x, w′, b′, F)y)/(1 + exp(βd(x, w′, b′, F))). This parameterization has a natural interpretation. The decision boundary (probability 0.5) for logistic regression is Hypw,b,F = {x | w · F(x) + b = 0}, and the function d(x, w′, b′, F) is the signed distance of a point to the decision boundary. The parameter β controls the steepness of the logistic function (e.g., the slope of the likelihood at a point on the decision boundary in the direction normal to the decision boundary). It follows that if a set of points is linearly separable then the limiting likelihood is 1. In particular, using any separating hyperplane to define w′ and b′ and increasing the slope parameter β will increase the likelihood, with the likelihood approaching 1.
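As an illustrative calculation (not part of the original argument): for a correctly classified point at signed distance d = 0.1 from a separating hyperplane, β = 100 gives a likelihood of exp(10)/(1 + exp(10)) ≈ 0.99995, which continues to approach 1 as β grows, whereas a point exactly on the boundary (d = 0) always has likelihood 1/2.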
To prove the claim we assume that maximum-likelihood logistic regression is not consistent. In this case, there exists a feature set F and a training set T such that there is a learnable classification function c using F and maximum-likelihood logistic regression such that c correctly classifies T but the classification function c′ = L(F, T) does not correctly classify T. Note that due to the convexity of the optimization problem we do not have an optimization error, which implies that the prediction error is an objective error. To prove the claim we need to demonstrate that this cannot be the case. We know that there must be a labeled example (x, y) ∈ T such that c′(x) ≠ y. In this case, the point F(x) is on the incorrect side of the decision boundary and thus the likelihood for that point is at most 1/2. The likelihood on the other points is at most 1. Thus the maximum likelihood obtainable on a training set with at least one error is at most 1/2. We argued above, however, that the likelihood on a separable problem approaches 1, so we have a contradiction.
Proposition 5 The one-nearest-neighbor (1NN) learning algorithm is a consistent learner.
Proof Nearest-neighbor algorithms are memorization learning algorithms. As defined above, a training set can only have one label per object (i.e., T ⊂ X × Y). It is straightforward to relax this assumption but we choose not to do so here. A given d-dimensional feature set F might map multiple training set objects to the same point in Rd, in which case there is not a unique nearest neighbor. In this case, we assume that the 1NN algorithm chooses a canonical object from the set of zero-distance neighbors (e.g., according to some ordering over the objects). If all of the objects in each of these zero-distance neighbor sets (subsets of the training set) have the same target label then the resulting classifier is consistent. If, however, there is a set of zero-distance neighbors that contains objects with different target labels then the resulting classifier is not consistent but, in this case, no consistent 1NN classification function using F is possible.
Lemma 1 [Kirchberger 1903; Shimrat 1955] Two finite sets S, T ⊂ Rd are strictly separable by
some hyperplane if and only if for every set U consisting of at most d + 2 points from S ∪ T the sets
U ∩ S and U ∩ T can be strictly separated.
Proposition 6 If T has a prediction error for target concept c∗ using feature set F and L where L
is a consistent linear learner then an invalidation set has at most |F | + 2 examples.
Proof Let X be our set of objects. Define S = {F(x) ∈ Rd | x ∈ X and c∗(x) = 1} and T = {F(x) ∈ Rd | x ∈ X and c∗(x) = 0}. Because F is not linearly sufficient there is no separating hyperplane for S and T. From Lemma 1 and the fact that F is d-dimensional, we know that there must be a subset U ⊂ {F(x) | x ∈ X} where |U| ≤ d + 2 and such that U ∩ S and U ∩ T are not separated by any hyperplane. A set of labeled examples realizing the points in U cannot be correctly classified by any linear classifier, so it constitutes an invalidation set of at most |F| + 2 = d + 2 examples.
Proposition 7 If T has a prediction error for target concept c∗ using feature set F and L where L
is a one-nearest-neighbor learner then an invalidation set has at most 2 examples.
Proof A 1NN classifier can have an invalidation set for a p-dimensional feature set only if the feature
set maps two objects with different class labels to the same point in Rp . A set with one such object
from each class is an invalidation set.
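The two-example invalidation set of Proposition 7 can be exhibited directly. A hedged sketch using scikit-learn's 1NN implementation on an invented featurization that maps two differently labeled objects to the same point:

from sklearn.neighbors import KNeighborsClassifier

X = [[3.0, 1.0], [3.0, 1.0]]   # F maps both objects to the same point
y = [0, 1]                     # ... but their target labels differ
nn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
# Whichever canonical zero-distance neighbor is chosen, one of the two
# training examples is misclassified, so this pair is an invalidation set.
print(nn.predict([[3.0, 1.0]]))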
Acknowledgments
Thanks to Patrice Simard, Max Chickering, Jina Suh, Carlos Garcia Jurado Suarez, and Xanda
Schofield for helpful discussions about prediction errors.
References
Amershi, S.; Chickering, M.; Drucker, S. M.; Lee, B.; Simard, P.; and Suh, J. 2015. Modeltracker:
Redesigning performance analysis tools for machine learning. In Proceedings of the 33rd Annual
ACM Conference on Human Factors in Computing Systems, CHI ’15, 337–346. New York, NY,
USA: ACM.
Angluin, D. 2004. Queries revisited. Theor. Comput. Sci. 313(2):175–194.
Fails, J. A., and Olsen, Jr., D. R. 2003. Interactive machine learning. In Proceedings of the 8th
International Conference on Intelligent User Interfaces, IUI ’03, 39–45. New York, NY, USA:
ACM.
Hegedűs, T. 1995. Generalized teaching dimensions and the query complexity of learning. In
Proceedings of the Eighth Annual Conference on Computational Learning Theory, COLT ’95,
108–117. New York, NY, USA: ACM.
Hellerstein, L.; Pillaipakkamnatt, K.; Raghavan, V.; and Wilkins, D. 1996. How many queries are
needed to learn? J. ACM 43(5):840–862.
Kirchberger, P. 1903. Über Tschebyscheffsche Annäherungsmethoden. Mathematische Annalen 57:509–540.
Kulesza, T.; Amershi, S.; Caruana, R.; Fisher, D.; and Charles, D. 2014. Structured labeling for
facilitating concept evolution in machine learning. In Proceedings of the SIGCHI Conference on
Human Factors in Computing Systems, CHI ’14, 3075–3084. New York, NY, USA: ACM.
Settles, B. 2012. Active Learning. Synthesis Lectures on Artificial Intelligence and Machine
Learning. Morgan & Claypool.
Shimrat, M. 1955. Simple proof of a theorem of P. Kirchberger. Pacific J. Math. 5(3):361–362.
Simard, P.; Chickering, D.; Lakshmiratan, A.; Charles, D.; Bottou, L.; Suarez, C.; Grangier, D.;
Amershi, S.; Verwey, J.; and Suh, J. 2014. ICE: Enabling Non-Experts to Build Models Interac-
tively for Large-Scale Lopsided Problems. ArXiv e-prints.