
DISCRIMINANT ANALYSIS

BIBLIOGRAPHY (I)

CHATFIELD, C. and COLLINS, A. J. (1980): Introduction to Multivariate Analysis. Chapman and Hall, London.
CUADRAS, C. M. (2014): Nuevos Métodos de Análisis Multivariante. CMC Editions, Barcelona.
KRZANOWSKI, W. J. (1988): Principles of Multivariate Analysis. Oxford Science Publications.
MARDIA, K. V., KENT, J. T. and BIBBY, J. M. (1979): Multivariate Analysis. Academic Press, London.
MORRISON, D. F. (1978): Multivariate Statistical Methods. McGraw-Hill, London.
RENCHER, A. (2002): Methods of Multivariate Analysis. Wiley.
SEBER, G. A. F. (1984): Multivariate Observations. Wiley, New York.

BIBLIOGRAPHY (II)

EVERITT, B. (2005): An R and S-Plus Companion to Multivariate Analysis. Springer-Verlag.
EVERITT, B. and HOTHORN, T. (2011): An Introduction to Applied Multivariate Analysis with R. Springer-Verlag.
HAND, D., MANNILA, H. and SMYTH, P. (2001): Principles of Data Mining. Cambridge.
HASTIE, T., TIBSHIRANI, R. and FRIEDMAN, J. (2009): The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition. Springer-Verlag.
ZELTERMAN, D. (2015): Applied Multivariate Statistics with R. Springer.

CLASSIFICATION: DISCRIMINATION AND CLUSTERING


Classification is an important component of virtually all scientific research. Statistical techniques concerned with classification are essentially of two types. The first, CLUSTER ANALYSIS (unsupervised learning), aims to uncover groups of observations from initially unclassified data; there are many such techniques. The second, DISCRIMINANT FUNCTION ANALYSIS, works with data that are already classified into groups in order to derive rules for classifying new (and as yet unclassified) individuals on the basis of their observed variable values. The best-known technique here is Fisher's linear discriminant function analysis.
This is the statistical classification problem, and it relies on partitioning the sample space so as to be able to discriminate between the different groups. Data that have already been classified into groups are often known as a training set, and methods which make use of such data are often known as supervised learning algorithms.
INTRODUCTION
Discriminant analysis is useful for: (1) discriminating, on the basis of observed variables, between groups defined a priori, and (2) classifying new cases into groups established a priori by means of a classification rule based on those variables. It was proposed by Sir Ronald Fisher in the 1930s ('The use of multiple measurements in taxonomic problems', Annals of Eugenics, 1936).

Some applications: voting intention, archaeological classification, level of profitability of companies, medicine, psychology, biology, etc.

In many scientific disciplines classification plays a crucial role. Examples: (1) classification of the chemical elements in the periodic table, (2) taxonomies of animal or plant species.

VOTING INTENTION

In an interesting study on voting intention carried out in the Galician community by Varela (1998), the ability of Discriminant Analysis to predict the classification of subjects into predefined groups was examined. Specifically, the study analyzed the answers given by 1829 people over 18 years of age to a questionnaire of 25 questions on political issues, economic aspects, evaluation of political leaders, etc. Among the questions was one asking how they would vote if elections were held the next day.

Using Discriminant Analysis, a total of 10 variables (answers to as many questions in the questionnaire) were identified that contributed to the construction of a discriminant function, from which 80.19% of individuals were correctly classified into the different voting options.

Consequently, it was proposed to use the variables involved in the discriminant function to estimate the likely vote of those subjects who had preferred not to state a position on this issue (those included in the 'do not know / do not answer' option).
Discriminant Analysis can be used descriptively (descriptive or exploratory analysis of data) or inferentially. The application of Discriminant Analysis rests on the following assumptions:
1. Multivariate normality.
2. Homogeneity of the variance-covariance matrices.

The analysis starts from a data matrix Y of n individuals on which p variables have been measured, and it is assumed that the individuals are divided into groups determined a priori. From this, a discriminant mathematical model is obtained against which the profile of a new individual whose group is unknown can be compared, in order to assign the individual to one of the groups.

THE DISCRIMINANT FUNCTION FOR TWO GROUPS


The two populations to be compared have the same covariance matrix $\boldsymbol{\Sigma}$ but distinct mean vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$. We work with samples $\mathbf{y}_{11}, \mathbf{y}_{12}, \ldots, \mathbf{y}_{1n_1}$ and $\mathbf{y}_{21}, \mathbf{y}_{22}, \ldots, \mathbf{y}_{2n_2}$ from the two populations. As usual, each vector $\mathbf{y}_{ij}$ consists of measurements on p variables. The discriminant function is the linear combination of these p variables that maximizes the distance between the two (transformed) group mean vectors. A linear combination $z = \mathbf{a}'\mathbf{y}$ transforms each observation vector to a scalar:

$z_{1i} = \mathbf{a}'\mathbf{y}_{1i} = a_1 y_{1i1} + a_2 y_{1i2} + \cdots + a_p y_{1ip}, \quad i = 1, 2, \ldots, n_1$
$z_{2i} = \mathbf{a}'\mathbf{y}_{2i} = a_1 y_{2i1} + a_2 y_{2i2} + \cdots + a_p y_{2ip}, \quad i = 1, 2, \ldots, n_2$

Hence the $n_1 + n_2$ observation vectors in the two samples, $\mathbf{y}_{11}, \ldots, \mathbf{y}_{1n_1}$ and $\mathbf{y}_{21}, \ldots, \mathbf{y}_{2n_2}$, are transformed to the scalars $z_{11}, \ldots, z_{1n_1}$ and $z_{21}, \ldots, z_{2n_2}$.

We find the means:

$\bar z_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} z_{1i} = \mathbf{a}'\bar{\mathbf{y}}_1 \quad \text{and} \quad \bar z_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} z_{2i} = \mathbf{a}'\bar{\mathbf{y}}_2$

where $\bar{\mathbf{y}}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1}\mathbf{y}_{1i}$ and $\bar{\mathbf{y}}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2}\mathbf{y}_{2i}$.

We then look for a new variable, a linear combination of the observed variables, $z = \mathbf{a}'\mathbf{y}$, which shows the greatest possible difference between the means of the two groups, so that it lets us separate the groups with the maximum possible resolution.

The means of the values of the new variable for each group are:

$\bar z_1 = \mathbf{a}'\bar{\mathbf{y}}_1 \qquad \bar z_2 = \mathbf{a}'\bar{\mathbf{y}}_2$

The difference between the means is then:

$\bar z_1 - \bar z_2 = \mathbf{a}'\bar{\mathbf{y}}_1 - \mathbf{a}'\bar{\mathbf{y}}_2 = \mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$

It is therefore a question of maximizing the expression

$|\mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)|$

or, equivalently, of maximizing the standardized distance

$\frac{(\bar z_1 - \bar z_2)^2}{s_z^2} = \frac{(\mathbf{a}'\bar{\mathbf{y}}_1 - \mathbf{a}'\bar{\mathbf{y}}_2)^2}{\mathbf{a}'\mathbf{S}_{pl}\mathbf{a}} = \frac{[\mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)]^2}{\mathbf{a}'\mathbf{S}_{pl}\mathbf{a}}$

subject to the restriction $\mathbf{a}'\mathbf{S}_{pl}\mathbf{a} = 1$, because we want the within-group variability of the new variable to be one. This maximization problem is solved using the method of Lagrange multipliers, and the solution is given by:

$\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$
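As a short aside (not part of the original notes), the Lagrange-multiplier step can be sketched as follows, writing $\mathbf{d} = \bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2$ for the mean difference:

$\max_{\mathbf{a}} \ (\mathbf{a}'\mathbf{d})^2 \quad \text{subject to} \quad \mathbf{a}'\mathbf{S}_{pl}\mathbf{a} = 1$

$\mathcal{L}(\mathbf{a}, \lambda) = (\mathbf{a}'\mathbf{d})^2 - \lambda\,(\mathbf{a}'\mathbf{S}_{pl}\mathbf{a} - 1)$

$\frac{\partial \mathcal{L}}{\partial \mathbf{a}} = 2(\mathbf{a}'\mathbf{d})\,\mathbf{d} - 2\lambda\,\mathbf{S}_{pl}\mathbf{a} = \mathbf{0} \ \Longrightarrow\ \mathbf{S}_{pl}\mathbf{a} \propto \mathbf{d} \ \Longrightarrow\ \mathbf{a} \propto \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$

Any rescaling of $\mathbf{a}$ leaves the resulting classification rule unchanged, so the proportionality constant is conventionally dropped.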

Then the discriminant function that we are looking for is:

$z = \mathbf{a}'\mathbf{y} = \left(\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right)'\mathbf{y} = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y}$


Note: If

$\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$

then:

$\frac{(\bar z_1 - \bar z_2)^2}{s_z^2} = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$

Proof:

$\frac{(\bar z_1 - \bar z_2)^2}{s_z^2} = \frac{[\mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)]^2}{\mathbf{a}'\mathbf{S}_{pl}\mathbf{a}}$

But $\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$, so:

$\frac{(\bar z_1 - \bar z_2)^2}{s_z^2} = \frac{\left[\left(\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right)'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right]^2}{\left(\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right)'\mathbf{S}_{pl}\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)}$

$= \frac{\left[(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right]^2}{(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{S}_{pl}\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)}$

$= \frac{\left[(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right]^2}{(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)}$

$= (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$

The quantity on the left is the standardized distance between $\bar z_1$ and $\bar z_2$; the quantity on the right is the corresponding standardized (Mahalanobis-type) distance between $\bar{\mathbf{y}}_1$ and $\bar{\mathbf{y}}_2$.

In summary, $\mathbf{a}$ is obtained as $\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$, and the discriminant function is $z = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y}$.

EXAMPLE
Let us look at the following example, taken from Cuadras (2014), p. 216: copepods.

Mytilicola intestinalis is a copepod parasite of the mussel which, in its larval state, passes through several growth stages. The first stage (Nauplius) and the second stage (Metanauplius) are difficult to distinguish. On samples of n1 = 76 and n2 = 91 copepods that could be identified under the microscope as belonging to the first and to the second stage, respectively, the variables
l = length, a = width
were measured.
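Since the raw copepod measurements are not reproduced in these notes, the following Python sketch simulates two groups of the sizes described above simply to show how $\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$ and the scores $z = \mathbf{a}'\mathbf{y}$ would be computed; all numerical values (means, covariances) are assumptions, not data from Cuadras (2014).

import numpy as np

# Minimal sketch of Fisher's two-group discriminant function with simulated
# data mimicking the structure above: p = 2 variables (length, width),
# n1 = 76 observations in stage 1 and n2 = 91 in stage 2.
rng = np.random.default_rng(0)
y1 = rng.multivariate_normal([280.0, 125.0], [[400, 150], [150, 120]], size=76)  # stage 1 (assumed values)
y2 = rng.multivariate_normal([310.0, 140.0], [[400, 150], [150, 120]], size=91)  # stage 2 (assumed values)

ybar1, ybar2 = y1.mean(axis=0), y2.mean(axis=0)
S1 = np.cov(y1, rowvar=False)          # sample covariance of group 1
S2 = np.cov(y2, rowvar=False)          # sample covariance of group 2
n1, n2 = len(y1), len(y2)
Spl = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)   # pooled covariance matrix

a = np.linalg.solve(Spl, ybar1 - ybar2)   # a = Spl^{-1} (ybar1 - ybar2)
print("a =", a)
print("z(ybar1) =", a @ ybar1, "  z(ybar2) =", a @ ybar2)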
CLASSIFICATION ANALYSIS: ALLOCATION OF OBSERVATIONS TO GROUPS (Rencher, Ch. 9)

The descriptive aspect of discriminant analysis, in which group separations are characterized by means of discriminant functions, was covered in the last class. We turn now to the allocation of observations to groups, which is the predictive aspect of discriminant analysis. Classification is often referred to simply as discriminant analysis. In engineering and computer science, classification is usually called pattern recognition. Some writers use the term classification analysis to describe cluster analysis, in which the observations are clustered according to variable values rather than assigned to predefined groups.
INTUITIVE APPROACH

Let us start by assuming that our p-dimensional observations belong to one of k groups, and that there is a mean vector associated with each group, so that the group means are $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_k$. Now suppose that we are given an observation $\mathbf{y} \in \mathbb{R}^p$. Which group should we allocate the observation to? One obvious method would be to allocate the observation to the group whose mean is closest to $\mathbf{y}$. That is, we could allocate $\mathbf{y}$ to group i if:

$\|\mathbf{y} - \boldsymbol{\mu}_i\| < \|\mathbf{y} - \boldsymbol{\mu}_j\| \quad \forall\, j \neq i$

This is an example of a classification rule. Note that it has the effect of partitioning $\mathbb{R}^p$ into k regions $R_1, R_2, \ldots, R_k$, with $R_i \subseteq \mathbb{R}^p$ for $i = 1, 2, \ldots, k$, $R_i \cap R_j = \emptyset$ for all $i \neq j$ and $\bigcup_{i=1}^{k} R_i = \mathbb{R}^p$, in such a way that $\mathbf{y}$ is assigned to group i if $\mathbf{y} \in R_i$. Different classification rules lead to different partitions, and clearly some methods of choosing the partition will be more effective than others at ensuring that most observations are assigned to the correct group. In this case, the partition is known as the Voronoi tessellation of $\mathbb{R}^p$ generated by the 'seeds' $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_k$. The boundaries between the classes are piecewise linear; to see why, we begin by considering the case k = 2. In this case one is typically interested in deciding whether a particular binary variable of interest is "true" or "false". Examples include deciding whether or not a patient has a particular disease, or deciding whether a manufactured item should be flagged as potentially faulty.

Case k = 2:

$\|\mathbf{y} - \boldsymbol{\mu}_1\| < \|\mathbf{y} - \boldsymbol{\mu}_2\|$
$\iff \|\mathbf{y} - \boldsymbol{\mu}_1\|^2 < \|\mathbf{y} - \boldsymbol{\mu}_2\|^2$
$\iff (\mathbf{y} - \boldsymbol{\mu}_1)'(\mathbf{y} - \boldsymbol{\mu}_1) < (\mathbf{y} - \boldsymbol{\mu}_2)'(\mathbf{y} - \boldsymbol{\mu}_2)$
$\iff (\mathbf{y}' - \boldsymbol{\mu}_1')(\mathbf{y} - \boldsymbol{\mu}_1) < (\mathbf{y}' - \boldsymbol{\mu}_2')(\mathbf{y} - \boldsymbol{\mu}_2)$
$\iff \mathbf{y}'\mathbf{y} - 2\boldsymbol{\mu}_1'\mathbf{y} + \boldsymbol{\mu}_1'\boldsymbol{\mu}_1 < \mathbf{y}'\mathbf{y} - 2\boldsymbol{\mu}_2'\mathbf{y} + \boldsymbol{\mu}_2'\boldsymbol{\mu}_2 \quad (*)$
$\iff 2(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)'\mathbf{y} < \boldsymbol{\mu}_2'\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1'\boldsymbol{\mu}_1$
$\iff (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)'\mathbf{y} < \tfrac{1}{2}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)'(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1)$
$\iff (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)'\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right] < 0$
$\iff (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right] > 0$

Note 1: the quadratic term $\mathbf{y}'\mathbf{y}$ in (*) cancels out, leaving a discrimination rule that is linear in $\mathbf{y}$.

If we think about the boundary between the two classes, this is clearly given by the solutions of the equation:

$(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right] = 0$

Note 2: This boundary passes through the midpoint of the group means, $\tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)$.

Note 3: It is a hyperplane orthogonal to the vector $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$. That is, the boundary is the separating hyperplane which perpendicularly bisects the segment joining $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ at its midpoint.

For k > 2, it is clear that $\mathbf{y}$ will be allocated to group i if:

$(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)'\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j)\right] > 0, \quad \forall\, j \neq i$

The region $R_i$ is then an intersection of half-spaces, and hence a convex polytope.

Example: We will illustrate the basic idea using an example with k = 2 and p = 2. Suppose that the group means are $\boldsymbol{\mu}_1 = (2, 1)'$ and $\boldsymbol{\mu}_2 = (-1, 2)'$. What is the classification rule?

We first compute the midpoint,

$\tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) = (0.5,\ 1.5)'$,

and the difference between the means,

$\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = (3,\ -1)'$.

Then we allocate to group 1 if:

$(3,\ -1)\left[\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} - \begin{pmatrix} 0.5 \\ 1.5 \end{pmatrix}\right] > 0 \;\Longrightarrow\; 3y_1 > y_2$

Therefore the boundary is given by the line $y_2 = 3y_1$. Observations lying above the line will be allocated to group 2, and those falling below will be allocated to group 1.

Exercise: classify the observations y = (5, 2)' and y = (-1, 1)'.
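The nearest-group-mean rule above is easy to code. Here is a small Python sketch (ours, not part of the original notes) applied to the example means and the two exercise points; the function name is our own.

import numpy as np

# Nearest-group-mean classification for the example above:
# mu1 = (2, 1)', mu2 = (-1, 2)'; allocate y to the group whose mean is closest.
mus = {1: np.array([2.0, 1.0]), 2: np.array([-1.0, 2.0])}

def closest_mean_group(y, mus):
    """Return the label of the group whose mean is closest to y (Euclidean distance)."""
    y = np.asarray(y, dtype=float)
    return min(mus, key=lambda g: np.linalg.norm(y - mus[g]))

# (5, 2) lies below the boundary y2 = 3*y1, so it goes to group 1;
# (-1, 1) lies above it, so it goes to group 2.
for y in [(5, 2), (-1, 1)]:
    print(y, "->", closest_mean_group(y, mus))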


LINEAR DISCRIMINANT ANALYSIS

The closest-group-mean classifier is a simple and natural way to discriminate between groups. However, it ignores the covariance structure in the data, including the fact that some variables are more variable than others. The more variable (and more highly correlated) variables are likely to dominate the Euclidean distance, and hence will have a disproportionate effect on the classification rule. We could correct for this by first applying a standardization transformation to the data and the group means and then carrying out closest-group-mean classification, or we can directly adapt our rule to take the covariance structure into account.

In the case of two populations, we have a sampling unit to be classified into one of the two populations. The information we have available consists of the p variables in the observation vector y measured on the sampling unit. For example, suppose we have a university applicant with high school grades and various test scores recorded in y. We do not know whether the applicant will succeed or fail at the university, but we have data on previous students at the university for whom it is now known whether they succeeded or failed. By comparing y with $\bar{\mathbf{y}}_1$ for those who succeeded and $\bar{\mathbf{y}}_2$ for those who failed, we attempt to predict the group to which the applicant will eventually belong.

When there are two populations, we can use a classification procedure due to Fisher (1936). The principal assumption for Fisher's procedure is that the two populations have the same covariance matrix ($\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2$). Normality is not required. We obtain a sample from each of the two populations and compute $\bar{\mathbf{y}}_1$, $\bar{\mathbf{y}}_2$ and $\mathbf{S}_{pl}$. A procedure for classification can be based on the discriminant function

$z = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y}$   (*)

where y is the vector of measurements on a new sampling unit that we wish to classify into one of the two groups (populations).

To determine whether y is closer to $\bar{\mathbf{y}}_1$ or to $\bar{\mathbf{y}}_2$, we check whether z in (*) is closer to the transformed mean $\bar z_1$ or to $\bar z_2$. Evaluating (*) for each observation $\mathbf{y}_{1i}$ from the first sample and averaging gives $\bar z_1 = \mathbf{a}'\bar{\mathbf{y}}_1$, and similarly $\bar z_2 = \mathbf{a}'\bar{\mathbf{y}}_2$. Denote the two groups by G1 and G2. Fisher's (1936) linear classification procedure assigns y to G1 if $z = \mathbf{a}'\mathbf{y}$ is closer to $\bar z_1$ than to $\bar z_2$, and assigns y to G2 if z is closer to $\bar z_2$.

The midpoint between $\bar z_1$ and $\bar z_2$ is $\tfrac{1}{2}(\bar z_1 + \bar z_2)$, so $z > \tfrac{1}{2}(\bar z_1 + \bar z_2)$ implies that z is closer to $\bar z_1$.

But $\bar z_1 = \mathbf{a}'\bar{\mathbf{y}}_1$ and $\bar z_2 = \mathbf{a}'\bar{\mathbf{y}}_2$, with $\mathbf{a}' = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}$, so

$\bar z_1 + \bar z_2 = \mathbf{a}'\bar{\mathbf{y}}_1 + \mathbf{a}'\bar{\mathbf{y}}_2 = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\bar{\mathbf{y}}_1 + (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\bar{\mathbf{y}}_2 = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)$

$\Longrightarrow \tfrac{1}{2}(\bar z_1 + \bar z_2) = \tfrac{1}{2}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)$

But $z = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y}$, so $z > \tfrac{1}{2}(\bar z_1 + \bar z_2)$ becomes

$(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y} > \tfrac{1}{2}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)$

Therefore the classification rule in terms of y is:

Assign y to G1 if

$z = \mathbf{a}'\mathbf{y} = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y} > \tfrac{1}{2}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)$   (**)

and assign y to G2 if

$z = \mathbf{a}'\mathbf{y} = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y} < \tfrac{1}{2}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)$   (***)

Note: Fisher's (1936) approach using (**) and (***) is essentially nonparametric because no distributional assumptions were made. However, if the two populations are normal with equal covariance matrices, then this method is (asymptotically) optimal; that is, the probability of misclassification is minimized.

Note: $\bar z_1 > \bar z_2$.

Proof: We know that

$\bar z_1 = \mathbf{a}'\bar{\mathbf{y}}_1$ and $\bar z_2 = \mathbf{a}'\bar{\mathbf{y}}_2$, with $\mathbf{a}' = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}$

$\Longrightarrow \bar z_1 - \bar z_2 = \mathbf{a}'\bar{\mathbf{y}}_1 - \mathbf{a}'\bar{\mathbf{y}}_2 = \mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2) = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$

But $(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2) > 0$ because $\mathbf{S}_{pl}^{-1}$ is positive definite, so $\bar z_1 - \bar z_2 > 0$, i.e. $\bar z_1 > \bar z_2$.

Example: For the psychological data (four variables measured on a male group and a female group), we have

$\mathbf{a}' = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1} = (0.5104,\ -0.2032,\ 0.4660,\ -0.3097)$

For G1, the male group:

$\bar z_1 = \mathbf{a}'\bar{\mathbf{y}}_1 = 10.5427$

Similarly, for G2, the female group:

$\bar z_2 = \mathbf{a}'\bar{\mathbf{y}}_2 = 4.4426$

Thus, we assign an observation vector y to G1 if

$z = \mathbf{a}'\mathbf{y} > \tfrac{1}{2}(\bar z_1 + \bar z_2) = 7.4927$

and assign y to G2 if z < 7.4927.

So, for y = (15, 17, 24, 14)', we have

z = 0.5104(15) − 0.2032(17) + 0.4660(24) − 0.3097(14) = 11.0498,

which is greater than 7.4927, so y is assigned to G1.
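A short Python sketch (ours, not from the notes) that encodes this rule using the coefficient vector and cutoff quoted above; the helper name `classify` is our own.

import numpy as np

# Fisher's two-group classification rule with the coefficients quoted above:
# assign y to G1 (male) if z = a'y > cutoff, otherwise to G2 (female).
a = np.array([0.5104, -0.2032, 0.4660, -0.3097])   # a' = (ybar1 - ybar2)' Spl^{-1}
zbar1, zbar2 = 10.5427, 4.4426
cutoff = 0.5 * (zbar1 + zbar2)                      # approximately 7.4927

def classify(y):
    z = a @ np.asarray(y, dtype=float)
    return ("G1 (male)" if z > cutoff else "G2 (female)"), z

label, z = classify([15, 17, 24, 14])
print(label, round(z, 4))   # z = 11.0498 > 7.4927, so G1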

DISCRIMINATION FUNCTIONS
There are many different approaches that can be taken to classification, each leading to a different rule. Some of the rules can be described within a common framework by introducing the concept of a discriminant function. For each class i = 1, 2, ..., k, we define a corresponding function

$Q_i(\cdot): \mathbb{R}^p \longrightarrow \mathbb{R}$,

known as a discriminant function, which determines a partition of $\mathbb{R}^p$ into $R_1, R_2, \ldots, R_k$ by assigning an observation y to $R_i$ if

$Q_i(\mathbf{y}) > Q_j(\mathbf{y}) \quad \forall\, j \neq i$

Note the use of a > inequality rather than a < inequality: instead of a measure of distance or dissimilarity, the discriminant functions represent the likelihood or propensity of an observation to belong to a particular group. This turns out to be more natural and convenient, especially in the context of model-based methods for classification.
Given a particular set of discriminant functions, we can study the properties of the resulting classifier, such as its misclassification rates, either empirically or theoretically, to try to establish whether the method is good or bad.

MAXIMUM LIKELIHOOD DISCRIMINATION

Model-based approaches to classification assume a probability model for each group, of the form

$f_i(\cdot): \mathbb{R}^p \longrightarrow \mathbb{R}, \quad i = 1, 2, \ldots, k$

so that for observations $\mathbf{y} \in \mathbb{R}^p$, each model $f_i(\mathbf{y})$ represents a probability density function (or likelihood) for the random variable Y from group i. The maximum likelihood discriminant rule is to classify observations by assigning them to the group with the maximum likelihood. In other words, it corresponds to using the discriminant functions

$Q_i(\mathbf{y}) = f_i(\mathbf{y}), \quad i = 1, 2, \ldots, k$

The simplest case of maximum likelihood discrimination arises when it is assumed that observations from all groups are multivariate normal and all share a common variance matrix $\boldsymbol{\Sigma}$; that is, observations from group i are assumed to be iid $N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma})$ random variables. In other words,

$Q_i(\mathbf{y}) = f_i(\mathbf{y}) = (2\pi)^{-p/2}|\boldsymbol{\Sigma}|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)\right\}$

After simplifying, we can see that, in the case of equal variance matrices, the maximum likelihood discriminant rule corresponds exactly to Fisher's linear discriminant.
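A brief worked step (not in the original slides) showing that simplification: the factor $(2\pi)^{-p/2}|\boldsymbol{\Sigma}|^{-1/2}$ is the same for every group, so comparing $Q_i$ with $Q_j$ reduces to comparing the exponents:

$Q_i(\mathbf{y}) > Q_j(\mathbf{y}) \iff (\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}_i) < (\mathbf{y} - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}_j)$

$\iff -2\boldsymbol{\mu}_i'\boldsymbol{\Sigma}^{-1}\mathbf{y} + \boldsymbol{\mu}_i'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i < -2\boldsymbol{\mu}_j'\boldsymbol{\Sigma}^{-1}\mathbf{y} + \boldsymbol{\mu}_j'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_j$ (the term $\mathbf{y}'\boldsymbol{\Sigma}^{-1}\mathbf{y}$ cancels)

$\iff (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}^{-1}\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j)\right] > 0$

which, with $\boldsymbol{\Sigma}$ estimated by $\mathbf{S}_{pl}$ and $\boldsymbol{\mu}_i$ by $\bar{\mathbf{y}}_i$, is exactly Fisher's linear rule (**) in the two-group case.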
QUADRATIC DISCRIMINANT ANALYSIS
It is natural to consider next how the maximum likelihood discriminant rule changes when we allow the variance matrices associated with each group to be unequal. That is, we assume that observations from group i are iid $N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$. In this case we have:

$Q_i(\mathbf{y}) = f_i(\mathbf{y}) = (2\pi)^{-p/2}|\boldsymbol{\Sigma}_i|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)\right\}$

We can simplify this a little by noting that:

$Q_i(\mathbf{y}) > Q_j(\mathbf{y})$
$\iff (2\pi)^{-p/2}|\boldsymbol{\Sigma}_i|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)\right\} > (2\pi)^{-p/2}|\boldsymbol{\Sigma}_j|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}_j^{-1}(\mathbf{y} - \boldsymbol{\mu}_j)\right\}$
$\iff |\boldsymbol{\Sigma}_i|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)\right\} > |\boldsymbol{\Sigma}_j|^{-1/2}\exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}_j^{-1}(\mathbf{y} - \boldsymbol{\mu}_j)\right\}$
$\iff -\log|\boldsymbol{\Sigma}_i| - (\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i) > -\log|\boldsymbol{\Sigma}_j| - (\mathbf{y} - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}_j^{-1}(\mathbf{y} - \boldsymbol{\mu}_j)$

Then, the discriminant function is:

$Q_i(\mathbf{y}) = -\log|\boldsymbol{\Sigma}_i| - (\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)$

Note that this is a quadratic form in y.

Example: In the k = 2 case, we assign to group 1 if $Q_1(\mathbf{y}) > Q_2(\mathbf{y})$, that is,

$-\log|\boldsymbol{\Sigma}_1| - (\mathbf{y} - \boldsymbol{\mu}_1)'\boldsymbol{\Sigma}_1^{-1}(\mathbf{y} - \boldsymbol{\mu}_1) > -\log|\boldsymbol{\Sigma}_2| - (\mathbf{y} - \boldsymbol{\mu}_2)'\boldsymbol{\Sigma}_2^{-1}(\mathbf{y} - \boldsymbol{\mu}_2)$

$\iff \log|\boldsymbol{\Sigma}_1| + (\mathbf{y} - \boldsymbol{\mu}_1)'\boldsymbol{\Sigma}_1^{-1}(\mathbf{y} - \boldsymbol{\mu}_1) < \log|\boldsymbol{\Sigma}_2| + (\mathbf{y} - \boldsymbol{\mu}_2)'\boldsymbol{\Sigma}_2^{-1}(\mathbf{y} - \boldsymbol{\mu}_2)$

Expanding the quadratic forms, we assign to group 1 if:

$\mathbf{y}'(\boldsymbol{\Sigma}_1^{-1} - \boldsymbol{\Sigma}_2^{-1})\mathbf{y} + 2(\boldsymbol{\mu}_2'\boldsymbol{\Sigma}_2^{-1} - \boldsymbol{\mu}_1'\boldsymbol{\Sigma}_1^{-1})\mathbf{y} + \boldsymbol{\mu}_1'\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2'\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2 + \log\frac{|\boldsymbol{\Sigma}_1|}{|\boldsymbol{\Sigma}_2|} < 0$

Here we can see explicitly that the quadratic term does not cancel out, and that the boundary between the two classes corresponds to the contour of a quadratic form.
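A minimal Python sketch (ours) of this quadratic discriminant function; the two sets of parameters used below are purely illustrative assumptions.

import numpy as np

def Q(y, mu, Sigma):
    """Quadratic discriminant score: -log|Sigma| - (y - mu)' Sigma^{-1} (y - mu)."""
    d = np.asarray(y, dtype=float) - np.asarray(mu, dtype=float)
    sign, logdet = np.linalg.slogdet(Sigma)
    return -logdet - d @ np.linalg.solve(Sigma, d)

# Illustrative (assumed) parameters for two groups with unequal covariance matrices.
mu1, Sigma1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
mu2, Sigma2 = np.array([2.0, 1.0]), np.array([[2.0, 0.5], [0.5, 1.5]])

y = np.array([1.0, 0.5])
group = 1 if Q(y, mu1, Sigma1) > Q(y, mu2, Sigma2) else 2
print("assign y to group", group)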

MISCLASSIFICATION
Obviously, whatever discriminant functions we use, we will not characterize the groups of interest perfectly, and so some future observations will be classified incorrectly. An obvious way to characterize a classification scheme is by some measure of the degree of misclassification associated with the scheme.

There are several ways to measure the degree of misclassification. One of them is the percentage of misclassified individuals (the apparent error rate). Researching other measures is a good subject for a final project.
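For instance, a tiny Python sketch (ours) of this simplest measure, given vectors of true and predicted group labels:

import numpy as np

def apparent_error_rate(true_labels, predicted_labels):
    """Proportion of individuals whose predicted group differs from their true group."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    return np.mean(true_labels != predicted_labels)

# Example: 3 of 150 observations misclassified -> 2% apparent error rate.
true = np.repeat([1, 2, 3], 50)
pred = true.copy()
pred[[60, 61, 140]] = [3, 3, 2]          # three illustrative misclassifications
print(apparent_error_rate(true, pred))   # 0.02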
CLASSIFICATION INTO SEVERAL GROUPS

[Figure (from Hastie, Tibshirani and Friedman, The Elements of Statistical Learning): the left plot shows data from three classes with linear decision boundaries found by linear discriminant analysis; the right plot shows quadratic decision boundaries.]

When we have several groups, we will have several possible pairwise classification rules:

$W_{ij} = (\bar{\mathbf{y}}_i - \bar{\mathbf{y}}_j)'\mathbf{S}_{pl}^{-1}\mathbf{y} - \tfrac{1}{2}(\bar{\mathbf{y}}_i - \bar{\mathbf{y}}_j)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_i + \bar{\mathbf{y}}_j)$

For example, with three groups, we have three pairwise functions and classify y as follows (a Python sketch of this rule appears after the list):

Population 1 if W12 > 0 and W13 > 0
Population 2 if W12 < 0 and W23 > 0
Population 3 if W13 < 0 and W23 < 0
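Here is a compact Python sketch (ours) of the pairwise rule for three groups; the group means and pooled covariance used below are illustrative assumptions, not data from the notes.

import numpy as np

def W(i, j, y, ybars, Spl_inv):
    """Pairwise linear classification function W_ij evaluated at y."""
    d = ybars[i] - ybars[j]
    return d @ Spl_inv @ y - 0.5 * d @ Spl_inv @ (ybars[i] + ybars[j])

def classify_three_groups(y, ybars, Spl):
    Spl_inv = np.linalg.inv(Spl)
    w12 = W(1, 2, y, ybars, Spl_inv)
    w13 = W(1, 3, y, ybars, Spl_inv)
    w23 = W(2, 3, y, ybars, Spl_inv)
    if w12 > 0 and w13 > 0:
        return 1
    if w12 < 0 and w23 > 0:
        return 2
    return 3                      # w13 < 0 and w23 < 0

# Illustrative (assumed) means and pooled covariance for p = 2 variables.
ybars = {1: np.array([0.0, 0.0]), 2: np.array([3.0, 1.0]), 3: np.array([1.0, 4.0])}
Spl = np.array([[1.0, 0.2], [0.2, 0.8]])
print(classify_three_groups(np.array([2.5, 1.0]), ybars, Spl))   # expected: group 2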

Example: The objects in the data matrix are 150 irises, 50 from each of the species Iris setosa, Iris versicolor and Iris virginica. The variables are:
Y1 = sepal length; Y2 = sepal width
Y3 = petal length; Y4 = petal width.

InfoStat output:
Discriminant functions - data standardized with the common (pooled) variances

             1      2
SepalLen  -0.43   0.01
SepalWid  -0.52   0.74
PetalLen   0.95  -0.40
PetalWid   0.58   0.58

Z1 = -0.43(SepalLen) - 0.52(SepalWid) + 0.95(PetalLen) + 0.58(PetalWid)

Z2 = 0.01(SepalLen) + 0.74(SepalWid) - 0.40(PetalLen) + 0.58(PetalWid)

Centroids in the discriminant space

Group        Axis 1   Axis 2
Setosa       -7.61     0.22
Versicolor    1.83    -0.73
Virginica     5.78     0.51

Cross-classification table (apparent error rate)
Group       Setosa  Versicolor  Virginica  Total  Error (%)
Setosa        50        0           0        50     0.00
Versicolor     0       48           2        50     4.00
Virginica      0        1          49        50     2.00
Total         50       49          51       150     2.00

The rows give the group to which each observation belongs, and the columns give the group to which it is assigned by the discriminant function. Thus, the 50 plants of the Setosa group were all correctly classified: the classification error rate in this group is 0%. Of the 50 individuals of the Versicolor group, 48 were correctly assigned and two were misclassified into the Virginica group, giving an error rate of 4%. Analogously, in the Virginica group one plant was misclassified into Versicolor, so its error rate is 2%. The overall apparent error rate is 3/150 = 2%.
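For readers who wish to reproduce a similar analysis, here is a short Python sketch (ours) using scikit-learn instead of InfoStat; sign conventions and scalings of the canonical axes may differ between programs, but the cross-classification counts should be essentially the same.

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix

# Linear discriminant analysis of the iris data (150 flowers, 4 variables, 3 species).
X, y = load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis(n_components=2).fit(X, y)

Z = lda.transform(X)                 # scores on the two canonical (discriminant) axes
pred = lda.predict(X)                # resubstitution classification
print(confusion_matrix(y, pred))     # cross-classification table
print("apparent error rate:", (y != pred).mean())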
[Figure: InfoStat scatter plot of the observations in the plane of the two canonical axes (Canonical Axis 1 vs. Canonical Axis 2), with the group labels Setosa, Versicolor and Virginica marked at their centroids; the plot was produced with the student version of InfoStat.]
PROBLEMS
1. Rencher: 8.8; 8.9; 8.10; 9.6(a, b); 9.7(a, b).
2. (a) Let M1 and M2 be two populations, $N_p(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ and $N_p(\boldsymbol{\mu}_2, \boldsymbol{\Sigma})$ respectively. Fisher's linear discriminator is defined as

$L(\mathbf{y}) = \left(\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right)'\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$

Express $L(\mathbf{y})$ as the difference between the squares of the Mahalanobis distances from y to $\boldsymbol{\mu}_1$ and from y to $\boldsymbol{\mu}_2$.

(b) The maximum likelihood discriminant function is defined as

$V(\mathbf{y}) = \ln f_1(\mathbf{y}) - \ln f_2(\mathbf{y})$

where $f_i(\mathbf{y})$, i = 1, 2, is the density function. Prove that the maximum likelihood discriminant function is the same as Fisher's linear discriminator.

3. A company conducts research into the possibility of a competitor's customers changing supplier. A survey is carried out on 15 customers of the supplier, recording the answers to the variables X1 = competitiveness in price and X2 = level of service. The evaluations are made on a 10-point scale (1 = very low, to 10 = excellent). Group 1 contains the people who will change, group 2 the undecided, and group 3 those who will not change.

Group X1 X2
1 2 2
1 1 2
1 3 2
1 2 1
1 2 3
2 4 2
2 4 3
2 5 1
2 5 2
2 5 3
3 2 6
3 3 6
3 4 6
3 5 6
3 5 7

Do a Discriminant Analysis and interpret the results.
