MATERIAL 5-Discriminant PDF

This document provides an overview of discriminant analysis for classification. It discusses two main types of classification - cluster analysis which aims to uncover groups from unclassified data, and discriminant function analysis which derives rules for classifying individuals into predefined groups based on their variable values. It provides examples of applications in different fields and discusses the assumptions and process of discriminant analysis, including deriving the discriminant function to maximize differences between group means.

Uploaded by

franciscomurillo7

DISCRIMINANT ANALYSIS

BIBLIOGRAPHY (I)

CHATFIELD, C. and COLLINS, A.J. (1980): Introduction to Multivariate Analysis. Chapman and Hall, London.
CUADRAS, C.M. (2014): Nuevos Métodos de Análisis Multivariante. CMC Editions, Barcelona.
KRZANOWSKI, W.J. (1988): Principles of Multivariate Analysis. Oxford Science Publications.
MARDIA, K.V., KENT, J.T. and BIBBY, J.M. (1979): Multivariate Analysis. Academic Press, London.
MORRISON, D.F. (1978): Multivariate Statistical Methods. McGraw-Hill, London.
RENCHER, A. (2002): Methods of Multivariate Analysis. Wiley.
SEBER, G.A.F. (1984): Multivariate Observations. Wiley, New York.

BIBLIOGRAPHY (II)

EVERITT, B. (2005): An R and S-Plus Companion to Multivariate Analysis. Springer-Verlag.
EVERITT, B. and HOTHORN, T. (2011): An Introduction to Applied Multivariate Analysis with R. Springer-Verlag.
HAND, D., MANNILA, H. and SMYTH, P. (2001): Principles of Data Mining. Cambridge.
HASTIE, T., TIBSHIRANI, R. and FRIEDMAN, J. (2009): The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition. Springer-Verlag.
ZELTERMAN, D. (2015): Applied Multivariate Statistics with R. Springer.

CLASSIFICATION: DISCRIMINATION AND CLUSTERING

Classification is an important component of virtually all scientific research. Statistical techniques concerned with classification are essentially of two types. The first, CLUSTER ANALYSIS (unsupervised learning), aims to uncover groups of observations from initially unclassified data. There are many such techniques.
The second, DISCRIMINANT FUNCTION ANALYSIS, works with data that are already classified into groups to derive rules for classifying new (and as yet unclassified) individuals on the basis of their observed variable values. The best-known technique here is Fisher's linear discriminant function analysis.
This is the statistical classification problem, and it relies on partitioning the sample space so as to be able to discriminate between the different groups. Data that have already been classified into groups are often known as a training set, and methods that make use of such data are often known as supervised learning algorithms.
INTRODUCTION
Discriminant analysis is useful for: (1) discriminating, on the basis of variables, between groups defined a priori, and (2) classifying new cases into groups established a priori, using a classification rule based on those variables. It was proposed by Sir Ronald Fisher in the 1930s: 'The use of multiple measurements in taxonomic problems', Annals of Eugenics, 1936.

Some applications: voting intention, archaeological classification, level of profitability of companies, medicine, psychology, biology, etc.
Classification plays a crucial role in many scientific disciplines. Examples: 1) classification of chemical elements in the periodic table, 2) taxonomies of animal or plant species.

VOTING INTENTION

In an interesting study on voting intention carried out in the Galician community by Varela (1998), the ability of Discriminant Analysis to predict the classification of subjects into predefined groups was studied. Specifically, the study analyzed the answers given by 1829 people over 18 years of age to a questionnaire of 25 questions on political issues, economic aspects, evaluation of political leaders, etc. One of the questions asked which way they would vote if elections were held the next day.

Using Discriminant Analysis, a total of 10 variables (answers to as many questions in the questionnaire) were identified that contributed to the construction of a discriminant function, with which 80.19% of individuals were correctly classified into the different voting options.

Consequently, it was proposed to use the variables involved in the discriminant function to estimate the voting direction of those subjects who had preferred not to state it (those included in the 'do not know/do not answer' option).
Discriminant Analysis can be used descriptively (descriptive or exploratory data analysis) or inferentially. Its application rests on the following assumptions:
1.-Multivariate Normality
2.-Homogeneity of Variance-Covariance matrices.

The analysis starts from a data matrix Y of n individuals on which p variables have been measured, and it is assumed that the individuals are divided into groups determined a priori.
From this, a mathematical discriminant model is obtained, against which the profile of a new individual whose group is unknown is compared in order to assign that individual to one of the groups.

THE DISCRIMINANT FUNCTION FOR TWO GROUPS

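The two populations to be compared have the same covariance matrix $\boldsymbol{\Sigma}$ but distinct mean vectors $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$. We work with samples $\mathbf{y}_{11}, \mathbf{y}_{12}, \ldots, \mathbf{y}_{1n_1}$ and $\mathbf{y}_{21}, \mathbf{y}_{22}, \ldots, \mathbf{y}_{2n_2}$ from the two populations. As usual, each vector $\mathbf{y}_{ij}$ consists of measurements on p variables. The discriminant function is the linear combination of these p variables that maximizes the distance between the two (transformed) group mean vectors. A linear combination $z = \mathbf{a}'\mathbf{y}$ transforms each observation vector to a scalar:

$$z_{1i} = \mathbf{a}'\mathbf{y}_{1i} = a_1 y_{1i1} + a_2 y_{1i2} + \cdots + a_p y_{1ip}, \quad i = 1, 2, \ldots, n_1$$
$$z_{2i} = \mathbf{a}'\mathbf{y}_{2i} = a_1 y_{2i1} + a_2 y_{2i2} + \cdots + a_p y_{2ip}, \quad i = 1, 2, \ldots, n_2$$

Hence the $n_1 + n_2$ observation vectors in the two samples,

$$\begin{matrix} \mathbf{y}_{11} & \mathbf{y}_{21} \\ \mathbf{y}_{12} & \mathbf{y}_{22} \\ \vdots & \vdots \\ \mathbf{y}_{1n_1} & \mathbf{y}_{2n_2} \end{matrix}$$

are transformed to scalars,

$$\begin{matrix} z_{11} & z_{21} \\ z_{12} & z_{22} \\ \vdots & \vdots \\ z_{1n_1} & z_{2n_2} \end{matrix}$$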
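We find the means:

$$\bar{z}_1 = \sum_{i=1}^{n_1} \frac{z_{1i}}{n_1} = \mathbf{a}'\bar{\mathbf{y}}_1 \quad \text{and} \quad \bar{z}_2 = \mathbf{a}'\bar{\mathbf{y}}_2$$

where $\bar{\mathbf{y}}_1 = \sum_{i=1}^{n_1} \mathbf{y}_{1i}/n_1$ and $\bar{\mathbf{y}}_2 = \sum_{i=1}^{n_2} \mathbf{y}_{2i}/n_2$.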

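We then look for a new variable, a linear combination of the observed variables, $z = \mathbf{a}'\mathbf{y}$, that shows the greatest difference between the means of the two groups, so that the groups can be separated with the maximum possible resolution.

The means of the values of the new variable for each group are:

$$\bar{z}_1 = \mathbf{a}'\bar{\mathbf{y}}_1, \qquad \bar{z}_2 = \mathbf{a}'\bar{\mathbf{y}}_2$$

The difference between the means is, then:

$$\bar{z}_1 - \bar{z}_2 = \mathbf{a}'\bar{\mathbf{y}}_1 - \mathbf{a}'\bar{\mathbf{y}}_2 = \mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$$

It is therefore a question of maximizing the expression:

$$|\mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)|$$

or, equivalently, of maximizing the standardized distance:

$$\frac{(\bar{z}_1 - \bar{z}_2)^2}{s_z^2} = \frac{(\mathbf{a}'\bar{\mathbf{y}}_1 - \mathbf{a}'\bar{\mathbf{y}}_2)^2}{\mathbf{a}'\mathbf{S}_{pl}\mathbf{a}} = \frac{[\mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)]^2}{\mathbf{a}'\mathbf{S}_{pl}\mathbf{a}}$$

where $\mathbf{S}_{pl}$ is the pooled sample covariance matrix and $s_z^2 = \mathbf{a}'\mathbf{S}_{pl}\mathbf{a}$. The maximization is subject to the restriction $\mathbf{a}'\mathbf{S}_{pl}\mathbf{a} = 1$, because we want the within-group variability of the new variable to be one. This maximization problem is solved using the method of Lagrange multipliers, and the solution is given, up to a scalar multiple, by:

$$\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$$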

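Then, the discriminant function that we are looking for is:

$$z = \mathbf{a}'\mathbf{y} = \left(\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right)'\mathbf{y} = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y}$$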
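The computation of the discriminant direction can be sketched numerically. The following is a minimal illustration, not the author's code: the toy data are hypothetical, and it assumes the usual definition of the pooled covariance matrix, S_pl = [(n1 − 1)S1 + (n2 − 1)S2] / (n1 + n2 − 2), which the text uses without spelling out.

```python
import numpy as np

def fisher_direction(Y1, Y2):
    """Return a = S_pl^{-1} (ybar1 - ybar2); rows of Y1, Y2 are observations."""
    n1, n2 = len(Y1), len(Y2)
    ybar1, ybar2 = Y1.mean(axis=0), Y2.mean(axis=0)
    S1 = np.cov(Y1, rowvar=False)  # sample covariance of group 1
    S2 = np.cov(Y2, rowvar=False)  # sample covariance of group 2
    # Assumed (standard) pooled covariance estimate
    S_pl = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    # Solve S_pl a = ybar1 - ybar2 instead of forming the inverse explicitly
    return np.linalg.solve(S_pl, ybar1 - ybar2)

# Hypothetical toy samples with p = 2 variables
Y1 = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.5], [5.5, 3.5]])
Y2 = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 1.5], [2.5, 2.0]])

a = fisher_direction(Y1, Y2)
z1 = Y1 @ a  # discriminant scores z_1i = a' y_1i
z2 = Y2 @ a  # discriminant scores z_2i = a' y_2i
print("a =", a)
print("mean scores:", z1.mean(), z2.mean())
```

The mean of the scores in each group equals a′ȳ for that group, which is the property used in the derivation above.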

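Note: If $\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$, then:

$$\frac{(\bar{z}_1 - \bar{z}_2)^2}{s_z^2} = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$$

Proof:

$$\frac{(\bar{z}_1 - \bar{z}_2)^2}{s_z^2} = \frac{[\mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)]^2}{\mathbf{a}'\mathbf{S}_{pl}\mathbf{a}}$$

But $\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$, so:

$$\begin{aligned}
\frac{(\bar{z}_1 - \bar{z}_2)^2}{s_z^2}
&= \frac{\left[\left(\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right)'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right]^2}{\left(\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right)'\mathbf{S}_{pl}\,\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)} \\
&= \frac{\left[(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right]^2}{(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{S}_{pl}\,\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)} \\
&= \frac{\left[(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)\right]^2}{(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)} \\
&= (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)
\end{aligned}$$

That is, the standardized distance between $\bar{z}_1$ and $\bar{z}_2$ (left-hand side) equals the standardized distance between $\bar{\mathbf{y}}_1$ and $\bar{\mathbf{y}}_2$ (right-hand side).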
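The identity in this note can be checked numerically. The sketch below uses hypothetical summary statistics (the mean vectors and pooled covariance matrix are made up for illustration) and also confirms that rescaling a leaves the standardized distance unchanged.

```python
import numpy as np

# Hypothetical summary statistics, not taken from the source
ybar1 = np.array([5.0, 3.0])
ybar2 = np.array([2.0, 2.0])
S_pl = np.array([[1.0, 0.5],
                 [0.5, 2.0]])

d = ybar1 - ybar2
a = np.linalg.solve(S_pl, d)  # a = S_pl^{-1} (ybar1 - ybar2)

def standardized_distance(a_vec):
    """(zbar1 - zbar2)^2 / s_z^2 for the direction a_vec."""
    return (a_vec @ d) ** 2 / (a_vec @ S_pl @ a_vec)

lhs = standardized_distance(a)
rhs = d @ np.linalg.solve(S_pl, d)  # (ybar1 - ybar2)' S_pl^{-1} (ybar1 - ybar2)
lhs_scaled = standardized_distance(3.0 * a)  # any multiple of a gives the same ratio

print(lhs, rhs, lhs_scaled)
```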
Then, a is obtained as:

And the discriminant function is:

EXAMPLE
Let us look at the following example, taken from Cuadras (2014), p. 216.
Copepods.

Mytilicola intestinalis is a parasitic copepod of the mussel which, in its larval state, passes through different growth stages. The first stage (Nauplius) and the second stage (Metanauplius) are difficult to distinguish. On a sample of n1 = 76 and n2 = 91 copepods that could be identified under the microscope as belonging to the first and second stage respectively, the following variables were measured:
l = length, a = width.
CLASSIFICATION ANALYSIS: ALLOCATION OF
OBSERVATIONS TO GROUPS (C 9)

The descriptive aspect of discriminant analysis, in which group separations are characterized by means of discriminant functions, was covered in the last class. We turn now to the allocation of observations to groups, which is the predictive aspect of discriminant analysis. Classification is often referred to simply as discriminant analysis. In engineering and computer science, classification is usually called pattern recognition. Some writers use the term classification analysis to describe cluster analysis, in which the observations are clustered according to variable values rather than into predefined groups.
INTUITIVE APPROACH

Let us start by assuming that our p-dimensional observations belong to one of k groups, and that there is a mean vector associated with each group, so the group means are $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_k$. Now suppose that we are given an observation $\mathbf{y} \in \mathbb{R}^p$. Which group should we allocate the observation to? One obvious method would be to allocate the observation to the group whose mean is closest to y. That is, we could allocate y to group i if:

$$\|\mathbf{y} - \boldsymbol{\mu}_i\| < \|\mathbf{y} - \boldsymbol{\mu}_j\| \quad \forall\, j \neq i$$

This is an example of a classification rule. Note that it has the effect of partitioning $\mathbb{R}^p$ into k regions, $R_1, R_2, \ldots, R_k$, where $R_i \subseteq \mathbb{R}^p$, $i = 1, 2, \ldots, k$, $R_i \cap R_j = \emptyset$ $\forall\, i \neq j$ and $\bigcup_{i=1}^{k} R_i = \mathbb{R}^p$, in such a way that y is assigned to group i if $\mathbf{y} \in R_i$. Different classification rules lead to different partitions, and clearly some methods of choosing the partition will be more effective than others for ensuring that most observations are assigned to the correct group. In this case, the partition is known as the Voronoi tessellation of $\mathbb{R}^p$ generated by the 'seeds' $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_k$. The boundaries between the classes are piecewise linear, and to see why, we will begin by considering the case k = 2. In this case one is typically interested in deciding whether a particular binary variable of interest is "true" or "false". Examples include deciding whether or not a patient has a particular disease, or deciding whether a manufactured item should be flagged as potentially faulty.

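Case k = 2:

$$\begin{aligned}
& \|\mathbf{y} - \boldsymbol{\mu}_1\| < \|\mathbf{y} - \boldsymbol{\mu}_2\| \\
\iff\ & \|\mathbf{y} - \boldsymbol{\mu}_1\|^2 < \|\mathbf{y} - \boldsymbol{\mu}_2\|^2 \\
\iff\ & (\mathbf{y} - \boldsymbol{\mu}_1)'(\mathbf{y} - \boldsymbol{\mu}_1) < (\mathbf{y} - \boldsymbol{\mu}_2)'(\mathbf{y} - \boldsymbol{\mu}_2) \\
\iff\ & (\mathbf{y}' - \boldsymbol{\mu}_1')(\mathbf{y} - \boldsymbol{\mu}_1) < (\mathbf{y}' - \boldsymbol{\mu}_2')(\mathbf{y} - \boldsymbol{\mu}_2) \\
\iff\ & \mathbf{y}'\mathbf{y} - 2\boldsymbol{\mu}_1'\mathbf{y} + \boldsymbol{\mu}_1'\boldsymbol{\mu}_1 < \mathbf{y}'\mathbf{y} - 2\boldsymbol{\mu}_2'\mathbf{y} + \boldsymbol{\mu}_2'\boldsymbol{\mu}_2 \qquad (*) \\
\iff\ & 2(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)'\mathbf{y} < \boldsymbol{\mu}_2'\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1'\boldsymbol{\mu}_1 \\
\iff\ & (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)'\mathbf{y} < \tfrac{1}{2}(\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)'(\boldsymbol{\mu}_2 + \boldsymbol{\mu}_1) \\
\iff\ & (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)'\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right] < 0 \\
\iff\ & (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right] > 0
\end{aligned}$$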

Note 1: The quadratic term $\mathbf{y}'\mathbf{y}$ in (∗) cancels out to leave a discrimination rule that is linear in y.
The boundary between the two classes is given by the solutions to the equation:

$$(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)'\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right] = 0$$

Note 2: This boundary passes through the midpoint of the group means, $\tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)$.

Note 3: It is a hyperplane orthogonal to the vector $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$. That is, the boundary is the separating hyperplane that perpendicularly bisects the segment joining $\boldsymbol{\mu}_1$ and $\boldsymbol{\mu}_2$ at its midpoint.
For k > 2, y will be allocated to group i if:

$$(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)'\left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j)\right] > 0, \quad \forall\, j \neq i$$

Then the region $R_i$ is an intersection of half-spaces, and hence a convex polytope.

Example: We will illustrate the basic idea using an example with k = 2 and p = 2. Suppose that the group means are $\boldsymbol{\mu}_1 = (2, 1)'$ and $\boldsymbol{\mu}_2 = (-1, 2)'$.

What is the classification rule? We first compute the midpoint:

$$\tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) = \begin{pmatrix} 0.5 \\ 1.5 \end{pmatrix},$$

and the difference between the means:

$$\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2 = \begin{pmatrix} 3 \\ -1 \end{pmatrix}$$

Then we allocate to group 1 if:

$$(3, -1)\left[\begin{pmatrix} y_1 \\ y_2 \end{pmatrix} - \begin{pmatrix} 0.5 \\ 1.5 \end{pmatrix}\right] > 0 \implies 3y_1 > y_2$$

Therefore the boundary is given by the line $y_2 = 3y_1$. Observations lying above the line will be allocated to group 2, and those falling below will be allocated to group 1.
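As a quick check of this rule, the sketch below implements both the linear form and the direct closest-mean comparison for the two means of the example. The two test points are hypothetical (chosen for illustration, one on each side of the line y2 = 3y1), and points exactly on the boundary are sent to group 2 by convention here.

```python
import numpy as np

mu1 = np.array([2.0, 1.0])
mu2 = np.array([-1.0, 2.0])

def allocate_linear(y):
    """Group 1 if (mu1 - mu2)'[y - (mu1 + mu2)/2] > 0, otherwise group 2."""
    score = (mu1 - mu2) @ (y - 0.5 * (mu1 + mu2))
    return 1 if score > 0 else 2

def allocate_nearest(y):
    """Group 1 if y is strictly closer (Euclidean) to mu1 than to mu2."""
    return 1 if np.linalg.norm(y - mu1) < np.linalg.norm(y - mu2) else 2

# Hypothetical test points, one on each side of the boundary
points = [np.array([1.0, 0.0]), np.array([0.0, 2.0])]
labels_linear = [allocate_linear(y) for y in points]
labels_nearest = [allocate_nearest(y) for y in points]
print(labels_linear, labels_nearest)
```

Both functions agree, as the derivation for k = 2 predicts.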

Exercise: Classify the observations y = (5, 2)′ and y = (−1, 1)′ using this rule.

LINEAR DISCRIMINANT ANALYSIS

The closest group mean classifier is a simple and natural way to discriminate between groups. However, it ignores the covariance structure in the data, including the fact that some variables are more variable than others. The variables with larger variance (and those that are highly correlated) are likely to dominate the Euclidean distance, and hence will have a disproportionate effect on the classification rule. We could correct for this by first applying a standardization transformation to the data and the group means and then carrying out closest group mean classification, or we can directly adapt our rule to take the covariance structure into account.

In the case of two populations, we have a sampling unit to be classified into one of the two populations. The information available consists of the p variables in the observation vector y measured on the sampling unit. For example, consider a university applicant with high school grades and various test scores recorded in y. We do not know whether the applicant will succeed or fail at the university, but we have data on previous students at the university for whom it is now known whether they succeeded or failed. By comparing y with $\bar{\mathbf{y}}_1$ for those who succeeded and $\bar{\mathbf{y}}_2$ for those who failed, we attempt to predict the group to which the applicant will eventually belong.

When there are two populations, we can use a classification procedure due to Fisher (1936). The principal assumption for Fisher's procedure is that the two populations have the same covariance matrix ($\boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2$). Normality is not required. We obtain a sample from each of the two populations and compute $\bar{\mathbf{y}}_1$, $\bar{\mathbf{y}}_2$ and $\mathbf{S}_{pl}$. A procedure for classification can be based on the discriminant function,

$$z = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y} \qquad (*)$$

where y is the vector of measurements on a new sampling unit that we wish to classify into one of the two groups (populations).
To determine whether y is closer to $\bar{\mathbf{y}}_1$ or to $\bar{\mathbf{y}}_2$, we check whether z in (*) is closer to the transformed mean $\bar{z}_1$ or to $\bar{z}_2$. We evaluate (*) for each observation $\mathbf{y}_{1i}$ from the first sample and then obtain the mean $\bar{z}_1 = \mathbf{a}'\bar{\mathbf{y}}_1$, and similarly $\bar{z}_2 = \mathbf{a}'\bar{\mathbf{y}}_2$. Denote the two groups by G1 and G2.
Fisher's (1936) linear classification procedure assigns y to G1 if $z = \mathbf{a}'\mathbf{y}$ is closer to $\bar{z}_1$ than to $\bar{z}_2$, and assigns y to G2 if z is closer to $\bar{z}_2$.

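The midpoint between $\bar{z}_1$ and $\bar{z}_2$ is $\tfrac{1}{2}(\bar{z}_1 + \bar{z}_2)$. Since $\bar{z}_1 > \bar{z}_2$ (see the note below), $z > \tfrac{1}{2}(\bar{z}_1 + \bar{z}_2)$ implies that z is closer to $\bar{z}_1$.
But $\bar{z}_1 = \mathbf{a}'\bar{\mathbf{y}}_1$, $\bar{z}_2 = \mathbf{a}'\bar{\mathbf{y}}_2$ and $\mathbf{a}' = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}$, so:

$$\begin{aligned}
\bar{z}_1 + \bar{z}_2 &= \mathbf{a}'\bar{\mathbf{y}}_1 + \mathbf{a}'\bar{\mathbf{y}}_2 \\
&= (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\bar{\mathbf{y}}_1 + (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\bar{\mathbf{y}}_2 \\
&= (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)
\end{aligned}$$

$$\implies \tfrac{1}{2}(\bar{z}_1 + \bar{z}_2) = \tfrac{1}{2}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)$$

But $z = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y}$, so $z > \tfrac{1}{2}(\bar{z}_1 + \bar{z}_2)$ is equivalent to:

$$(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y} > \tfrac{1}{2}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)$$

Therefore the classification rule in terms of y is:

Assign y to G1 if

$$z = \mathbf{a}'\mathbf{y} = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y} > \tfrac{1}{2}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2) \qquad (**)$$

and assign y to G2 if

$$z = \mathbf{a}'\mathbf{y} = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}\mathbf{y} < \tfrac{1}{2}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2) \qquad (***)$$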
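The rule can be written as a small function. This is a minimal sketch under stated assumptions: the summary statistics below are hypothetical, the function name `fisher_classify` is introduced only for illustration, and an observation falling exactly on the cutoff is sent to G2 by convention.

```python
import numpy as np

def fisher_classify(y, ybar1, ybar2, S_pl):
    """Return 1 if a'y exceeds the midpoint cutoff, otherwise 2."""
    a = np.linalg.solve(S_pl, ybar1 - ybar2)  # a = S_pl^{-1} (ybar1 - ybar2)
    z = a @ y                                 # discriminant score of y
    cutoff = 0.5 * (a @ (ybar1 + ybar2))      # (zbar1 + zbar2) / 2
    return 1 if z > cutoff else 2

# Hypothetical summary statistics, not taken from the source
ybar1 = np.array([5.0, 3.0])
ybar2 = np.array([2.0, 2.0])
S_pl = np.array([[1.0, 0.5],
                 [0.5, 2.0]])

print(fisher_classify(np.array([4.5, 2.0]), ybar1, ybar2, S_pl))
print(fisher_classify(np.array([2.5, 3.0]), ybar1, ybar2, S_pl))
```

Each group mean is always assigned to its own group, because the transformed mean of G1 lies above the cutoff and that of G2 lies below it.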

Note: Fisher's (1936) approach using (**) and (***) is essentially nonparametric because no distributional assumptions were made. However, if the two populations are normal with equal covariance matrices, then this method is (asymptotically) optimal; that is, the probability of misclassification is minimized.

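Note: $\bar{z}_1 > \bar{z}_2$

Proof: We know that $\bar{z}_1 = \mathbf{a}'\bar{\mathbf{y}}_1$, $\bar{z}_2 = \mathbf{a}'\bar{\mathbf{y}}_2$ and $\mathbf{a}' = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}$. Then:

$$\begin{aligned}
\bar{z}_1 - \bar{z}_2 &= \mathbf{a}'\bar{\mathbf{y}}_1 - \mathbf{a}'\bar{\mathbf{y}}_2 \\
&= \mathbf{a}'(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2) \\
&= (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)
\end{aligned}$$

But $(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2) > 0$, because $\mathbf{S}_{pl}^{-1}$ is positive definite and $\bar{\mathbf{y}}_1 \neq \bar{\mathbf{y}}_2$. Hence:

$$\bar{z}_1 - \bar{z}_2 > 0 \implies \bar{z}_1 > \bar{z}_2$$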

Example: For the psychological data

Then: 𝐚′ = (𝐲̅1 − 𝐲̅2 )′ 𝐒𝐩𝐥 −𝟏 = (0.5104, −0.2032, 0.4660, −0.3097)

For G1, the male group:
𝑧̅1 = 𝐚′ 𝐲̅1 = 10.5427
Similarly, For G2, the female group:
𝑧̅2 = 𝐚′ 𝐲̅2 = 4.4426

Thus, we assign an observation vector y to G1 if:

$$z = \mathbf{a}'\mathbf{y} > \tfrac{1}{2}(\bar{z}_1 + \bar{z}_2) = 7.4927$$

and assign y to G2 if z < 7.4927.

So, for y = (15, 17, 24, 14)′, we have:

z = 0.5104(15) − 0.2032(17) + 0.4660(24) − 0.3097(14) = 11.0498

which is greater than 7.4927, so y is assigned to G1.
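The arithmetic of this example can be reproduced in a few lines. The coefficient vector, the two transformed means and the observation are the values quoted in the example; the variable names are chosen here for illustration.

```python
import numpy as np

# Values quoted in the example above
a = np.array([0.5104, -0.2032, 0.4660, -0.3097])  # a' = (ybar1 - ybar2)' S_pl^{-1}
z1_bar = 10.5427  # transformed mean of G1
z2_bar = 4.4426   # transformed mean of G2

cutoff = 0.5 * (z1_bar + z2_bar)  # midpoint between the transformed means

y = np.array([15.0, 17.0, 24.0, 14.0])
z = a @ y                          # discriminant score of the new observation
group = 1 if z > cutoff else 2

print(round(z, 4), round(cutoff, 4), group)
```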

DISCRIMINATION FUNCTIONS
There are many different approaches that can be taken to classification, each leading to a different rule. Some of the rules can be described within a common framework by introducing the concept of a discriminant function. For each class i = 1, 2, . . . , k, we define a corresponding function:

$$Q_i(\cdot): \mathbb{R}^p \longrightarrow \mathbb{R}$$

known as a discriminant function. Together these functions determine a partition of $\mathbb{R}^p$, $R_1, R_2, \ldots, R_k$, by assigning an observation y to $R_i$ if:

$$Q_i(\mathbf{y}) > Q_j(\mathbf{y}) \quad \forall\, j \neq i$$

Note the use of a > inequality rather than a < inequality: instead of a measure of distance or dissimilarity, the discriminant functions represent the likelihood or propensity of an observation to belong to a particular group. This turns out to be more natural and convenient, especially in the context of model-based methods for classification.
Given a particular set of discriminant functions, we can study the properties of the resulting classifier, such as misclassification rates, either empirically or theoretically, to try to establish whether the method is good or bad.

MAXIMUM LIKELIHOOD DISCRIMINATION

Model-based approaches to classification assume a probability model for each group, of the form:

$$f_i(\cdot): \mathbb{R}^p \longrightarrow \mathbb{R}, \quad i = 1, 2, \ldots, k$$

So for observations $\mathbf{y} \in \mathbb{R}^p$, each model $f_i(\mathbf{y})$ represents a probability density function (or likelihood) for the random variable Y from group i. The maximum likelihood discriminant rule is to classify observations by assigning them to the group with the maximum likelihood. In other words, it corresponds to using the discriminant functions

$$Q_i(\mathbf{y}) = f_i(\mathbf{y}), \quad i = 1, 2, \ldots, k$$

The simplest case of maximum likelihood discrimination arises when it is assumed that observations from all groups are multivariate normal and all share a common variance matrix, $\boldsymbol{\Sigma}$. That is, observations from group i are assumed to be iid $N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma})$ random variables. In other words,

$$Q_i(\mathbf{y}) = f_i(\mathbf{y}) = (2\pi)^{-p/2}\,|\boldsymbol{\Sigma}|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)\right\}$$

After simplifying we can see that, in the case of equal variance matrices, the maximum likelihood discriminant rule corresponds exactly to Fisher's linear discriminant.
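The simplification can be sketched as follows, as a worked outline rather than a full proof. The factor $(2\pi)^{-p/2}|\boldsymbol{\Sigma}|^{-1/2}$ is common to all groups, so only the exponent matters; the last line has the same form as the rule obtained for Fisher's procedure, with the population parameters in place of the sample estimates $\bar{\mathbf{y}}_i$ and $\mathbf{S}_{pl}$.

```latex
\begin{aligned}
Q_i(\mathbf{y}) > Q_j(\mathbf{y})
&\iff (\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)
   < (\mathbf{y} - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}_j) \\
&\iff -2\boldsymbol{\mu}_i'\boldsymbol{\Sigma}^{-1}\mathbf{y} + \boldsymbol{\mu}_i'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_i
   < -2\boldsymbol{\mu}_j'\boldsymbol{\Sigma}^{-1}\mathbf{y} + \boldsymbol{\mu}_j'\boldsymbol{\Sigma}^{-1}\boldsymbol{\mu}_j \\
&\iff 2(\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}^{-1}\mathbf{y}
   > (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j) \\
&\iff (\boldsymbol{\mu}_i - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}^{-1}
   \left[\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_i + \boldsymbol{\mu}_j)\right] > 0
\end{aligned}
```

The quadratic term $\mathbf{y}'\boldsymbol{\Sigma}^{-1}\mathbf{y}$ cancels in the second step because $\boldsymbol{\Sigma}$ is shared by both groups, which is why the resulting rule is linear in y.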
QUADRATIC DISCRIMINANT ANALYSIS
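It is natural to consider next how the maximum likelihood discriminant rule changes when we allow the variance matrices associated with each group to be unequal. That is, we assume that observations from group i are iid $N(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$. In this case we have:

$$Q_i(\mathbf{y}) = f_i(\mathbf{y}) = (2\pi)^{-p/2}\,|\boldsymbol{\Sigma}_i|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)\right\}$$

We can simplify this a little by noting that:

$$\begin{aligned}
& Q_i(\mathbf{y}) > Q_j(\mathbf{y}) \\
\iff\ & (2\pi)^{-p/2}|\boldsymbol{\Sigma}_i|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)\right\} > (2\pi)^{-p/2}|\boldsymbol{\Sigma}_j|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}_j^{-1}(\mathbf{y} - \boldsymbol{\mu}_j)\right\} \\
\iff\ & |\boldsymbol{\Sigma}_i|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)\right\} > |\boldsymbol{\Sigma}_j|^{-1/2} \exp\left\{-\tfrac{1}{2}(\mathbf{y} - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}_j^{-1}(\mathbf{y} - \boldsymbol{\mu}_j)\right\} \\
\iff\ & -\log|\boldsymbol{\Sigma}_i| - (\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i) > -\log|\boldsymbol{\Sigma}_j| - (\mathbf{y} - \boldsymbol{\mu}_j)'\boldsymbol{\Sigma}_j^{-1}(\mathbf{y} - \boldsymbol{\mu}_j)
\end{aligned}$$

Then, the discriminant function is:

$$Q_i(\mathbf{y}) = -\log|\boldsymbol{\Sigma}_i| - (\mathbf{y} - \boldsymbol{\mu}_i)'\boldsymbol{\Sigma}_i^{-1}(\mathbf{y} - \boldsymbol{\mu}_i)$$

Note that this is a quadratic form in y.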

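Example: In the k = 2 case, we assign to group 1 if $Q_1(\mathbf{y}) > Q_2(\mathbf{y})$, that is:

$$-\log|\boldsymbol{\Sigma}_1| - (\mathbf{y} - \boldsymbol{\mu}_1)'\boldsymbol{\Sigma}_1^{-1}(\mathbf{y} - \boldsymbol{\mu}_1) > -\log|\boldsymbol{\Sigma}_2| - (\mathbf{y} - \boldsymbol{\mu}_2)'\boldsymbol{\Sigma}_2^{-1}(\mathbf{y} - \boldsymbol{\mu}_2)$$

$$\iff \log|\boldsymbol{\Sigma}_1| + (\mathbf{y} - \boldsymbol{\mu}_1)'\boldsymbol{\Sigma}_1^{-1}(\mathbf{y} - \boldsymbol{\mu}_1) < \log|\boldsymbol{\Sigma}_2| + (\mathbf{y} - \boldsymbol{\mu}_2)'\boldsymbol{\Sigma}_2^{-1}(\mathbf{y} - \boldsymbol{\mu}_2)$$

Then, expanding the quadratic forms:

$$\mathbf{y}'(\boldsymbol{\Sigma}_1^{-1} - \boldsymbol{\Sigma}_2^{-1})\mathbf{y} + 2(\boldsymbol{\mu}_2'\boldsymbol{\Sigma}_2^{-1} - \boldsymbol{\mu}_1'\boldsymbol{\Sigma}_1^{-1})\mathbf{y} + \boldsymbol{\mu}_1'\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2'\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2 + \log\frac{|\boldsymbol{\Sigma}_1|}{|\boldsymbol{\Sigma}_2|} < 0$$

Here we can see explicitly that the quadratic term does not cancel out, and that the boundary between the two classes corresponds to the contour of a quadratic form.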
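A minimal sketch of the quadratic discriminant function follows. The means and covariance matrices are hypothetical, chosen so that the second group is much more spread out than the first; the function names are introduced only for illustration.

```python
import numpy as np

def qda_score(y, mu, Sigma):
    """Q_i(y) = -log|Sigma_i| - (y - mu_i)' Sigma_i^{-1} (y - mu_i)."""
    diff = y - mu
    _, logdet = np.linalg.slogdet(Sigma)  # log of the determinant, computed stably
    return -logdet - diff @ np.linalg.solve(Sigma, diff)

# Hypothetical parameters, not taken from the source
mu1, Sigma1 = np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]])
mu2, Sigma2 = np.array([3.0, 0.0]), np.array([[4.0, 0.0], [0.0, 4.0]])

def qda_classify(y):
    """Assign y to the group with the larger discriminant score."""
    return 1 if qda_score(y, mu1, Sigma1) > qda_score(y, mu2, Sigma2) else 2

for y in (np.array([0.0, 0.0]), np.array([3.0, 0.0]), np.array([-5.0, 0.0])):
    print(y, qda_classify(y))
```

In this sketch the point (−5, 0)′ is closer in Euclidean distance to the first mean, yet it is assigned to group 2 because the second group has the larger variance. This is the kind of behaviour a linear boundary cannot produce.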

MISCLASSIFICATION
Whatever discriminant functions we use, we will not characterize the groups of interest perfectly, and so some future observations will be classified incorrectly. An obvious way to characterize a classification scheme is by some measure of the degree of misclassification associated with the scheme.

There are several ways to measure the degree of misclassification. One of them is the percentage of misclassified individuals. Researching other ways is a good subject for a final project.
CLASSIFICATION INTO SEVERAL GROUPS

[Figure: The left plot shows some data from three classes, with linear decision boundaries found by linear discriminant analysis. The right plot shows quadratic decision boundaries. Source: Hastie, Tibshirani and Friedman.]

When we have several groups, we will have several possible pairwise classification rules:

$$W_{ij} = (\bar{\mathbf{y}}_i - \bar{\mathbf{y}}_j)'\mathbf{S}_{pl}^{-1}\mathbf{y} - \tfrac{1}{2}(\bar{\mathbf{y}}_i - \bar{\mathbf{y}}_j)'\mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_i + \bar{\mathbf{y}}_j)$$

For example, with three groups, we have three possible rules:

Classify y as:
Population 1 if W12 > 0 and W13 > 0
Population 2 if W12 < 0 and W23 > 0
Population 3 if W13 < 0 and W23 < 0
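The three-group rule can be sketched as follows. The group means and pooled covariance matrix are hypothetical, and the function names `W` and `classify3` are introduced only for illustration. An observation lying exactly on a boundary satisfies none of the strict inequalities, so the sketch returns None in that case.

```python
import numpy as np

def W(y, ybar_i, ybar_j, S_pl):
    """Pairwise function W_ij = (ybar_i - ybar_j)' S_pl^{-1} [y - (ybar_i + ybar_j)/2]."""
    d = ybar_i - ybar_j
    return d @ np.linalg.solve(S_pl, y - 0.5 * (ybar_i + ybar_j))

def classify3(y, means, S_pl):
    """Apply the three pairwise rules; return 1, 2, 3, or None on a boundary."""
    w12 = W(y, means[0], means[1], S_pl)
    w13 = W(y, means[0], means[2], S_pl)
    w23 = W(y, means[1], means[2], S_pl)
    if w12 > 0 and w13 > 0:
        return 1
    if w12 < 0 and w23 > 0:
        return 2
    if w13 < 0 and w23 < 0:
        return 3
    return None

# Hypothetical group means and pooled covariance, not taken from the source
means = [np.array([0.0, 0.0]), np.array([4.0, 0.0]), np.array([0.0, 4.0])]
S_pl = np.eye(2)

results = [classify3(np.array(p), means, S_pl)
           for p in ([0.5, 0.5], [3.0, 0.5], [0.5, 3.0])]
print(results)
```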

Example: The objects in the data matrix are 50 irises of each of the species Iris setosa, Iris versicolor and Iris virginica. The variables are:
Y1 = sepal length; Y2 = sepal width
Y3 = petal length; Y4 = petal width.

InfoStat output:
Discriminant functions - data standardized with the common variances

              1       2
SepalLen  -0.43    0.01
SepalWid  -0.52    0.74
PetalLen   0.95   -0.40
PetalWid   0.58    0.58

Z1 = -0.43(SepalLen) - 0.52(SepalWid) + 0.95(PetalLen) + 0.58(PetalWid)

Z2 = 0.01(SepalLen) + 0.74(SepalWid) - 0.40(PetalLen) + 0.58(PetalWid)

Centroids in the discriminant space

Group        Axis 1   Axis 2
Setosa        -7.61     0.22
Versicolor     1.83    -0.73
Virginica      5.78     0.51

Cross-classification table (apparent error rate)

Group        Setosa   Versicolor   Virginica   Total   Error(%)
Setosa           50            0           0      50       0.00
Versicolor        0           48           2      50       4.00
Virginica         0            1          49      50       2.00
Total            50           49          51     150       2.00

Rows represent the group to which the observation belongs, and columns the group to which it is assigned by the discriminant function. Thus, the 50 plants in the Setosa group were all correctly classified, and the classification error rate in this group is 0%. Of the 50 individuals in the Versicolor group, 48 were correctly assigned and two were misclassified into the Virginica group, so the error rate is 4%. The Virginica group is interpreted analogously; its error rate is 2%.
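The apparent error rates in the cross-classification table can be recomputed directly from the counts. The sketch below uses the counts from the InfoStat table; the variable names are chosen here for illustration.

```python
import numpy as np

# Rows: true group; columns: assigned group (Setosa, Versicolor, Virginica)
confusion = np.array([[50,  0,  0],
                      [ 0, 48,  2],
                      [ 0,  1, 49]])

row_totals = confusion.sum(axis=1)  # observations per true group
correct = np.diag(confusion)        # correctly classified per group

# Percentage of misclassified observations within each group
error_pct = 100 * (row_totals - correct) / row_totals

# Overall apparent error rate
overall_pct = 100 * (confusion.sum() - correct.sum()) / confusion.sum()

print(error_pct, overall_pct)
```

This is the percentage of misclassified individuals, computed on the same data used to build the discriminant functions, which is why it is called the apparent error rate.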
[Figure: plot of the Setosa, Versicolor and Virginica groups in the discriminant space. Horizontal axis: Canonical Axis 1; vertical axis: Canonical Axis 2.]
PROBLEMS
1.-Rencher: 8.8; 8.9; 8.10; 9.6(a,b); 9.7(a,b).
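2.- a) Let M1 and M2 be two populations with distributions $N_p(\boldsymbol{\mu}_1, \boldsymbol{\Sigma})$ and $N_p(\boldsymbol{\mu}_2, \boldsymbol{\Sigma})$ respectively. Fisher's linear discriminant is defined as:

$$L(\mathbf{y}) = \left(\mathbf{y} - \tfrac{1}{2}(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)\right)'\boldsymbol{\Sigma}^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$$

Express $L(\mathbf{y})$ in terms of the difference between the squared Mahalanobis distances from y to $\boldsymbol{\mu}_1$ and from y to $\boldsymbol{\mu}_2$.
b) The maximum likelihood discriminant function is defined as:

$$V(\mathbf{y}) = \ln f_1(\mathbf{y}) - \ln f_2(\mathbf{y})$$

where $f_i(\mathbf{y})$, i = 1, 2, is the density function of population $M_i$.
Prove that the maximum likelihood discriminant function is the same as Fisher's linear discriminant.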

3.- A company is investigating the possibility that a competitor's customers will change supplier. A survey is carried out on 15 customers of the supplier, recording their answers on the variables X1: competitiveness in price and X2: level of service. The evaluations are made on a 10-point scale (1 = very low, to 10 = excellent).
Group 1 contains the customers who will change, group 2 the undecided, and group 3 those who will not change.

Group X1 X2
1 2 2
1 1 2
1 3 2
1 2 1
1 2 3
2 4 2
2 4 3
2 5 1
2 5 2
2 5 3
3 2 6
3 3 6
3 4 6
3 5 6
3 5 7

Do a Discriminant Analysis and interpret the results.

Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
From Everand
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
Jeffrey M. Wooldridge
No ratings yet
Algebra & Trigonometry II Essentials
From Everand
Algebra & Trigonometry II Essentials
Editors of REA
4/5 (4)
Otto Cycle - Wikipedia
No ratings yet
Otto Cycle - Wikipedia
13 pages
CSP2101 Scripting Languages Assignment 3 - Software Based Solution
No ratings yet
CSP2101 Scripting Languages Assignment 3 - Software Based Solution
8 pages
KRNT fx175qtv Data Cheet PDF
No ratings yet
KRNT fx175qtv Data Cheet PDF
2 pages
Grammar Jeopardy: Modal Auxiliaries, Relative Adverbs, & Relative Pronouns
No ratings yet
Grammar Jeopardy: Modal Auxiliaries, Relative Adverbs, & Relative Pronouns
18 pages
Activity Fluid Machinery
No ratings yet
Activity Fluid Machinery
1 page
Grade 2 Tos Sum1
No ratings yet
Grade 2 Tos Sum1
5 pages
CH11-Digital Logic
No ratings yet
CH11-Digital Logic
6 pages
TT Plus Catalogue RCF - ENG
No ratings yet
TT Plus Catalogue RCF - ENG
52 pages
Design For Test Scan Test
100% (1)
Design For Test Scan Test
31 pages
Kobelco 6E - Hyd Motors PDF
100% (1)
Kobelco 6E - Hyd Motors PDF
26 pages
AQA GCSE Chem C2 Summary Question Answers
No ratings yet
AQA GCSE Chem C2 Summary Question Answers
4 pages
Heat Treatment of Steel: Assessment Performance Criteria
No ratings yet
Heat Treatment of Steel: Assessment Performance Criteria
6 pages
Trojan Port List
No ratings yet
Trojan Port List
13 pages
MacOS Monograph
No ratings yet
MacOS Monograph
58 pages
MFJ 249 Manual
No ratings yet
MFJ 249 Manual
18 pages
Thesis Topics On Image Processing
100% (3)
Thesis Topics On Image Processing
6 pages
Tables and Formulas Used in Dry-Run
No ratings yet
Tables and Formulas Used in Dry-Run
3 pages
Determine and Describe The Intersection of Sets Using Various Representations and B
No ratings yet
Determine and Describe The Intersection of Sets Using Various Representations and B
18 pages
Amrut Brochure
100% (1)
Amrut Brochure
19 pages
Chapter 24 Spectroscopic Methods
No ratings yet
Chapter 24 Spectroscopic Methods
44 pages
SAFECode Dev Practices0211
No ratings yet
SAFECode Dev Practices0211
56 pages
Ergonomically Designed Turmeric - FINALE
No ratings yet
Ergonomically Designed Turmeric - FINALE
24 pages
Chromatopac C-R3A. Instruction Manual (расп.)
100% (1)
Chromatopac C-R3A. Instruction Manual (расп.)
210 pages
CH 19 Cardiovascular System
No ratings yet
CH 19 Cardiovascular System
25 pages
Introduction To Language and Communication-Week11
No ratings yet
Introduction To Language and Communication-Week11
33 pages
PS1 Solutions PDF
100% (1)
PS1 Solutions PDF
3 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

MATERIAL 5-Discriminant PDF

Uploaded by

MATERIAL 5-Discriminant PDF

Uploaded by

CLASSIFICATION: DISCRIMINATION AND CLUSTERING

Some applications: Voting intention, Archaeological Classification, Level of

In an interesting study on voting intention, developed in the Galician

Using the Discriminant Analysis technique, a total of 10 variables were

Consequently, it was proposed to use the variables involved in the

The analysis starts from a matrix of data Y of n individuals in which p

THE DISCRIMINANT FUNCTION FOR TWO GROUPS

And transformed to scalars,

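$\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$

Then, the discriminant function that we are looking for is:

$z = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)' \mathbf{S}_{pl}^{-1} \mathbf{y}$

But $\mathbf{a} = \mathbf{S}_{pl}^{-1}(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)$:

$[(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)' \mathbf{S}_{pl}^{-1} (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)]^2$

$[(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)' \mathbf{S}_{pl}^{-1} (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)]^2$

Standardized distance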
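
As a rough illustration of the two-group discriminant direction (the pooled covariance matrix inverted and applied to the difference of the group means), the sketch below works through a small two-variable case in plain Python. The data values and the helper names (`mean_vector`, `scatter_matrix`, `fisher_direction_2d`) are made up for illustration and do not come from the original material; the pooled covariance is assumed to be the sum of the within-group scatter matrices divided by n1 + n2 - 2.

```python
def mean_vector(rows):
    """Column means of a list of observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def scatter_matrix(rows, mean):
    """Sum of outer products of deviations from the group mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mean[j] for j in range(p)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction_2d(g1, g2):
    """Return (a, mean1, mean2) for two groups measured on two variables."""
    m1, m2 = mean_vector(g1), mean_vector(g2)
    w1, w2 = scatter_matrix(g1, m1), scatter_matrix(g2, m2)
    dof = len(g1) + len(g2) - 2
    # Pooled covariance matrix (assumed definition, see lead-in).
    s = [[(w1[i][j] + w2[i][j]) / dof for j in range(2)] for i in range(2)]
    # Closed-form inverse of a 2x2 matrix.
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    d = [m1[0] - m2[0], m1[1] - m2[1]]
    a = [inv[0][0] * d[0] + inv[0][1] * d[1],
         inv[1][0] * d[0] + inv[1][1] * d[1]]
    return a, m1, m2


# Hypothetical toy data, two variables per observation.
group1 = [(4.0, 1.0), (6.0, 2.0), (5.0, 3.0)]
group2 = [(1.0, 3.0), (3.0, 4.0), (2.0, 5.0)]

a, ybar1, ybar2 = fisher_direction_2d(group1, group2)
zbar1 = a[0] * ybar1[0] + a[1] * ybar1[1]  # projected mean of group 1
zbar2 = a[0] * ybar2[0] + a[1] * ybar2[1]  # projected mean of group 2
print(a, zbar1, zbar2)
```

With this choice of a, the difference between the projected means equals the quadratic form in the mean difference, which is positive whenever the pooled covariance matrix is positive definite, so the projected mean of group 1 always exceeds that of group 2.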

And the discriminant function is:

Mytilicola intestinalis is a parasitic copepod of the mussel, which in its

The descriptive aspect of discriminant analysis, in which group separations

Let us start by assuming that our p-dimensional observations belong to one

This is an example of a classification rule. Note that it has the effect of

Note 1: the quadratic term $\mathbf{y}'\mathbf{y}$ (*) cancels out to leave a discrimination

Note 3: It represents a hyperplane orthogonal to the vector $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2$. That

What is the classification rule? We first compute the mid-point:

Therefore the boundary is given by the line 𝑦2 = 3𝑦1 . Observations lying

Exercise: y=(5,2); y=(-1, 1)

In the case of two populations, we have a sampling unit to be classified into

Where y is the vector of measurement on a new sampling unit that we wish

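$\Rightarrow \bar{z}_1 + \bar{z}_2 = \mathbf{a}'\bar{\mathbf{y}}_1 + \mathbf{a}'\bar{\mathbf{y}}_2$, but $\mathbf{a}' = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)' \mathbf{S}_{pl}^{-1}$

$\Rightarrow \bar{z}_1 + \bar{z}_2 = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)' \mathbf{S}_{pl}^{-1} (\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)$

But $z = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)' \mathbf{S}_{pl}^{-1} \mathbf{y}$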

Therefore the classification rule in terms of y is:

Note: Fisher’s (1936) approach using (**) and (***) is essentially

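Note: $\bar{z}_1 > \bar{z}_2$

$\bar{z}_1 = \mathbf{a}'\bar{\mathbf{y}}_1$ and $\bar{z}_2 = \mathbf{a}'\bar{\mathbf{y}}_2$, with $\mathbf{a}' = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)' \mathbf{S}_{pl}^{-1}$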

Example: For the psychological data

Then: $\mathbf{a}' = (\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)' \mathbf{S}_{pl}^{-1} = (0.5104, -0.2032, 0.4660, -0.3097)$

Thus, we assign an observation vector y to G1 if:

And assign y to G2 if z < 7.4927.

So, for y = (15, 17, 24, 14), we have:
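
The computation for this observation can be sketched in a few lines of Python, using only the coefficient vector and the cutoff 7.4927 quoted above. The function names are illustrative, and the rule "assign to G1 when z exceeds the cutoff" is assumed here as the complement of the stated G2 rule.

```python
# Discriminant coefficients a' and cutoff as quoted for the psychological data.
A = (0.5104, -0.2032, 0.4660, -0.3097)
CUTOFF = 7.4927


def discriminant_score(y, coeffs=A):
    """Linear discriminant score z = a'y."""
    return sum(c * v for c, v in zip(coeffs, y))


def classify(y):
    """Assign to G1 if z > cutoff (assumed complement of the G2 rule), else G2."""
    return "G1" if discriminant_score(y) > CUTOFF else "G2"


y_new = (15, 17, 24, 14)
z_new = discriminant_score(y_new)
print(round(z_new, 4), classify(y_new))
```

The score for this observation comes out at about 11.05, which lies above the cutoff, so under the assumed rule the observation is assigned to G1.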

known as a discriminant function which determines a partition of $\mathbb{R}^p$,

$Q_i(\mathbf{y}) > Q_j(\mathbf{y}) \quad \forall\, j \neq i$

MAXIMUM LIKELIHOOD DISCRIMINATION

So for observations $\mathbf{y} \in \mathbb{R}^p$, each model $f_i(\mathbf{y})$ represents a probability

The simplest case of maximum likelihood discrimination arises in the case

Then, the discriminant function is:

Note that this is a quadratic form in y.

There are several ways to find a degree of misclassification. One of them is

When we have several groups, we will have several possible rules of

Population 2 if $W_{12} < 0$ and $W_{23} > 0$

Population 3 if $W_{13} < 0$ and $W_{23} < 0$
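
The pairwise rules can be sketched as a small Python function. This is only an illustration under two assumptions that are not stated in the lines above: that each W_ij is positive when population i is favoured over population j, and that the rule for Population 1 is the symmetric one (W12 > 0 and W13 > 0). The function name and the example scores are hypothetical.

```python
def classify_three(w12, w13, w23):
    """Return 1, 2 or 3 from the pairwise discriminant values W12, W13, W23.

    Returns None when the observation falls exactly on a boundary
    or when the three pairwise values are mutually inconsistent.
    """
    if w12 > 0 and w13 > 0:      # assumed rule for Population 1
        return 1
    if w12 < 0 and w23 > 0:      # rule for Population 2
        return 2
    if w13 < 0 and w23 < 0:      # rule for Population 3
        return 3
    return None


# Hypothetical per-population scores; here W_ij is taken as score_i - score_j.
scores = {1: 2.0, 2: 3.5, 3: 1.0}
w12 = scores[1] - scores[2]
w13 = scores[1] - scores[3]
w23 = scores[2] - scores[3]
print(classify_three(w12, w13, w23))
```

When the W_ij are differences of per-population scores, as in the example, the three rules amount to choosing the population with the largest score.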

Z1 = -0.43 (SepalLen) - 0.52 (SepalWid) + 0.95 (PetalLen) + 0.58 (PetalWid)

Z2 = 0.01 (SepalLen) + 0.74 (SepalWid) - 0.40 (PetalLen) + 0.58 (PetalWid)

Centroids in the discriminant space

The rows show the group to which the observation belongs, and in

3.- A company conducts research into the possibility of a competitor's

Do a Discriminant Analysis and interpret the results.
