Part 3: Advanced Feature Selection Techniques
An overview
Thanks to Qiang Yang
Modified by Charles Ling
Feature selection:
Problem of selecting some subset of a
learning algorithm’s input variables upon
which it should focus attention, while
ignoring the rest
(DIMENSIONALITY REDUCTION)
Motivational example from Biology
Monkeys performing classification task
Diagnostic features:
- Eye separation
- Eye height
Non-Diagnostic features:
- Mouth height
- Nose length
Motivational example from Biology
Monkeys performing classification task
Results: the activity of a population of 150 neurons was selective for the diagnostic features
Feature extraction creates new features
F = {f_1, ..., f_i, ..., f_n}
  → (extraction) →
F' = {g_1(f_1, ..., f_n), ..., g_j(f_1, ..., f_n), ..., g_m(f_1, ..., f_n)}
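As a minimal sketch in Python, each new feature g_j is simply a function of the original feature vector; the particular choices of g_j below (sum, product, square) are illustrative assumptions, not taken from the slides:

def extract_features(f):
    """Map an original feature vector F = (f_1, ..., f_n) to F' = (g_1, ..., g_m)."""
    g1 = sum(f)            # g_1: sum of all original features
    g2 = f[0] * f[-1]      # g_2: product of the first and last feature
    g3 = f[0] ** 2         # g_3: square of the first feature
    return [g1, g2, g3]

print(extract_features([1.0, 2.0, 3.0]))  # [6.0, 3.0, 1.0]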
Feature reduction in task 1
Task 1: we only care about prediction accuracy. A model built on fewer, more relevant features often achieves better generalization.
Feature reduction in task 2
Task 2: we are interested in the features themselves; we want to know which are relevant. If we fit a model, it should be interpretable.
What causes lung cancer?
Features are aspects of a patient's medical history.
Binary response variable: did the patient develop lung cancer?
Chi-Squared Test
Kohavi-John, 1997
Question: are attributes A1 and A2 independent?
If they are highly dependent, we can remove either A1 or A2.
If A1 is independent of the class attribute A2, we can remove A1 from our training data.

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N
Chi-Squared Test (cont.)
Question: are attributes A1 and A2 independent?

Example records (Outlook, Temperature): (Sunny, High), (Cloudy, Low), (Sunny, High)

Observed counts:
           High  Low  Subtotal
Sunny       2     0      2
Cloudy      0     1      1
Subtotal    2     1      3    (total count in table = 3)

A larger illustration of observed vs. expected counts:

Observed numbers (O)   80    40   120
Expected numbers (E)   60    60   120
O - E                  20   -20     0

The test statistic is χ² = Σ (O − E)²/E = 20²/60 + 20²/60 ≈ 13.3; a large value indicates that A1 and A2 are dependent.
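As a sketch, the same kind of test can be run with scipy (assuming it is installed); the contingency table here cross-tabulates Outlook against Class for the 14 weather records that follow:

from scipy.stats import chi2_contingency

# Outlook vs. Class contingency table from the 14 weather records below:
# rows = sunny, overcast, rainy; columns = Class P (play), Class N (don't).
observed = [
    [2, 3],  # sunny:    2 P, 3 N
    [4, 0],  # overcast: 4 P, 0 N
    [3, 2],  # rainy:    3 P, 2 N
]

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
# A small p-value suggests Outlook and Class are dependent,
# i.e. Outlook is worth keeping as a feature.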
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
[Scatter plot omitted: data points in the (X1, X2) plane with principal directions Y1 and Y2 overlaid.]
Y1 is the first eigenvector; Y2 is the second.
Key observation: the variance along Y1 is the largest, so Y2 is largely ignorable.
cov(X, Y) = Σ_{i=1}^{n} (X_i − X̄)(Y_i − Ȳ) / (n − 1)

[Worked example omitted: a small table of (X, Y) pairs, including (30, 70), (40, 70), and (30, 90).]
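A small numeric sketch of this formula (assuming numpy); the (X, Y) pairs are illustrative stand-ins, since the slide's full worked table is not recoverable:

import numpy as np

# Illustrative (X, Y) pairs.
X = np.array([30.0, 40.0, 30.0, 20.0])
Y = np.array([70.0, 70.0, 90.0, 50.0])

# Sample covariance straight from the formula above:
n = len(X)
cov_manual = ((X - X.mean()) * (Y - Y.mean())).sum() / (n - 1)

# np.cov returns the full 2x2 covariance matrix; entry [0, 1] is cov(X, Y).
cov_numpy = np.cov(X, Y)[0, 1]

print(cov_manual, cov_numpy)  # both print the same value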
More than two attributes: covariance matrix
Contains the covariance values between all possible pairs of dimensions (= attributes):

C = (c_ij), an n×n matrix where c_ij = cov(Dim_i, Dim_j)

Example for three attributes (x, y, z):

    | cov(x,x)  cov(x,y)  cov(x,z) |
C = | cov(y,x)  cov(y,y)  cov(y,z) |
    | cov(z,x)  cov(z,y)  cov(z,z) |
Eigenvectors e: C·e = λ·e
How to calculate e and λ:
Compute det(C − λI); this yields a polynomial of degree n.
The roots of det(C − λI) = 0 are the eigenvalues λ.
For details, check any linear algebra text, such as Elementary Linear Algebra by Howard Anton.
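In practice, e and λ are computed numerically rather than by solving the characteristic polynomial by hand. A minimal sketch with numpy, using the 2×2 covariance matrix from the worked example that follows:

import numpy as np

# Covariance matrix from the worked example on a later slide.
C = np.array([[75.0, 106.0],
              [106.0, 482.0]])

# eigh handles symmetric matrices (covariance matrices always are);
# eigenvalues come back in ascending order, eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(C)
print(eigenvalues)          # λ1 ≤ λ2
print(eigenvectors[:, -1])  # unit eigenvector for the largest eigenvalue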
Original and mean-adjusted data (X̄ = 24.1, Ȳ = 53.75); partial table:

X    Y    X − X̄    Y − Ȳ
39   74    14.9    20.25
30   87     5.9    33.25
30   23     5.9   -30.75
15   35    -9.1   -18.75
15   32    -9.1   -21.75
30   73     5.9    19.25

[Scatter plots of the original and mean-adjusted data omitted.]
Covariance Matrix

C = | 75   106 |
    | 106  482 |

e1 = (-0.98, -0.21), λ1 = 51.8
e2 = (0.21, -0.98), λ2 = 560.2

Since λ2 is the largest eigenvalue, we project onto the dimension of e2 = (0.21, -0.98). We can obtain the final data as

y_i = (x_i1, x_i2) · (0.21, -0.98)ᵀ = 0.21 · x_i1 - 0.98 · x_i2

[1-D plot of the projected values omitted.]
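As a quick numeric sketch of this projection (assuming numpy), we can apply the slide's e2 to the six mean-adjusted rows recovered earlier; since that table is partial, the outputs are illustrative rather than the slide's exact figures:

import numpy as np

# The slide's top eigenvector and the six recoverable mean-adjusted rows.
e2 = np.array([0.21, -0.98])
Xc = np.array([[14.9, 20.25], [5.9, 33.25], [5.9, -30.75],
               [-9.1, -18.75], [-9.1, -21.75], [5.9, 19.25]])

# y_i = 0.21 * x_i1 - 0.98 * x_i2 for every row at once.
y = Xc @ e2
print(y)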
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
Transform data
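As a sketch of this step (assuming numpy), project the two numeric attributes (temperature, humidity) of the 14 records above onto their principal axis, reducing two attributes to one:

import numpy as np

# Temperature and humidity columns of the 14 numeric weather records above.
X = np.array([[85, 85], [80, 90], [83, 86], [70, 96], [68, 80],
              [65, 70], [64, 65], [72, 95], [69, 70], [75, 80],
              [75, 70], [72, 90], [81, 75], [71, 91]], dtype=float)

Xc = X - X.mean(axis=0)                        # mean-center each attribute
C = np.cov(Xc, rowvar=False)                   # 2x2 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(C)  # ascending eigenvalues

# Keep only the direction of largest variance: 2 attributes -> 1 feature.
transformed = Xc @ eigenvectors[:, -1]
print(transformed)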