Slides 0411
Donghui Yan
Outline
• Introduction
• Formalism and terminology
• Evaluation methodology
• AlphaGo
I Beat Ke Jie (then ranked #1 in the world) 3:0 in 2017
I Major milestone in AI research
• Self-driving
• Conversational robots.
• Classification
I $Y \in C = \{c_1, c_2, \ldots, c_k\}$, called labels
• Clustering
I Y not given (often called unsupervised learning)
• Regression
I $Y \in \mathbb{R}$, called the response
• Ranking
• And a lot more new topics emerging in recent years
I Topic models (e.g., what is the topic of a blog article)
I Manifold (topological) learning
I Salient sentence extraction
I Graph learning etc.
Loss function
The choice depends on the application. Typical loss functions:
• The 0-1 loss
\[
\text{cost} =
\begin{cases}
1, & \text{if } f(X) \neq Y, \\
0, & \text{otherwise.}
\end{cases}
\]
I A loss function of special interest and most commonly used
• Cost-sensitive loss functions, i.e., a cost matrix, for $a \neq b$,
\[
\begin{pmatrix}
0 & b \\
a & 0
\end{pmatrix}
\]
I Suitable when errors in different classes have different consequences
– e.g., fraud detection: cost a when mistaking fraud for normal and cost b
when mistaking normal for fraud.
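The two losses above can be sketched in a few lines of Python. The function names, the label strings "fraud" and "normal", and the numeric costs a = 10 and b = 1 are illustrative assumptions, not values from the slides.

```python
def zero_one_loss(y_true, y_pred):
    """Average 0-1 loss over a sample, i.e. the error rate."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(1 for t, p in zip(y_true, y_pred) if t != p) / len(y_true)


def cost_sensitive_loss(y_true, y_pred, cost):
    """Average cost, where cost[(true, predicted)] is the cost of that outcome."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(cost[(t, p)] for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical costs: a missed fraud (a) is assumed 10x worse than a false alarm (b).
a, b = 10.0, 1.0
cost = {
    ("fraud", "fraud"): 0.0,
    ("fraud", "normal"): a,   # fraud mistaken for normal
    ("normal", "fraud"): b,   # normal mistaken for fraud
    ("normal", "normal"): 0.0,
}

y_true = ["fraud", "normal", "normal", "fraud"]
y_pred = ["normal", "normal", "fraud", "fraud"]

error_rate = zero_one_loss(y_true, y_pred)
avg_cost = cost_sensitive_loss(y_true, y_pred, cost)
print(error_rate, avg_cost)
```

Both errors count equally under the 0-1 loss, while the cost-sensitive average is dominated by the single missed fraud.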
Function class
• Linear classifiers
I Logistic regression: $\operatorname{logit}(P(Y \mid X)) = X\beta$
I SVM: $f(X) = \sum_{i=1}^{n} w_i K(X_i, X) + w_0$
• Boosting
I $f(X) = \sum_{i=1}^{T(n)} a_i \, h(X_1, \ldots, X_n, X)$,
with h from some data-dependent basis library
• Tree-based classifiers
I C4.5, CART
I Random Forests and their variants.
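As a rough illustration of the SVM form listed above, f(X) = Σ w_i K(X_i, X) + w_0, the sketch below evaluates such a kernel expansion at a new point. The RBF kernel, the value of gamma, and the hand-picked points and weights are assumptions made for the example; nothing here is trained, and the slides do not specify the kernel K.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel; one common choice of K, assumed here."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def decision_function(x, points, weights, w0, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i w_i * K(X_i, x) + w0."""
    return sum(w * kernel(xi, x) for w, xi in zip(weights, points)) + w0


# Hypothetical training points and weights (not fitted to any data).
points = [(0.0, 0.0), (1.0, 1.0)]
weights = [1.0, -1.0]
w0 = 0.0

score = decision_function((0.0, 0.0), points, weights, w0)
label = 1 if score >= 0 else -1  # classify by the sign of f(x)
print(score, label)
```

The function is linear in the weights w_i even though, through K, it can be nonlinear in X.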
Evaluation methodology
Evaluation illustrated
Performance metrics
• Error rate
– Most commonly used in statistics and machine learning
• Kappa statistic
– Commonly used in remote sensing, medical assessment
• Area under curve (AUC)
– When both the detection rate and the false alarm rate matter, e.g.,
biomarker discovery, anomaly/fraud detection.
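Assuming AUC here means the area under the ROC curve, it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that pairwise definition; the scores and labels are made-up illustration data.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 for positive (e.g. fraud) and 0 for negative.
    Ties between a positive and a negative score count as 1/2.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores and true labels.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
value = auc(scores, labels)
print(value)
```

The double loop is quadratic in the sample size; it is meant to make the definition visible, not to be efficient.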
Kappa statistic
I C = # classes
I $n_{ij}$ = # points from class i but classified as j
I n = size of the sample.
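Calculation of κ
\[
\kappa = \frac{P_{\text{observed}} - P_{\text{expected}}}{1 - P_{\text{expected}}},
\]
where
\[
P_{\text{expected}} = \sum_{i=1}^{C} \frac{n_{i\cdot}}{n} \cdot \frac{n_{\cdot i}}{n},
\qquad
P_{\text{observed}} = \frac{1}{n} \sum_{i=1}^{C} n_{ii},
\]
with $n_{i\cdot} = \sum_{j} n_{ij}$ and $n_{\cdot i} = \sum_{j} n_{ji}$ the row and column totals.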
Example: κ statistic
Confusion matrix of Logit on the South Africa Heart data
True \ Predicted      1      2   Total
1                   130     41     171
2                    22     38      60
Total               152     79     231
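To make the example concrete, the sketch below computes κ for this confusion matrix using the standard definition κ = (P_observed − P_expected) / (1 − P_expected), with P_observed the fraction of points on the diagonal and P_expected the sum over classes of (row total / n) × (column total / n). The function name `kappa` and the list-of-rows layout are choices made for this illustration.

```python
def kappa(confusion):
    """Kappa statistic from a square confusion matrix.

    confusion[i][j] = number of points from class i classified as j.
    """
    c = len(confusion)
    n = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(c)) for j in range(c)]
    p_observed = sum(confusion[i][i] for i in range(c)) / n
    p_expected = sum(row_totals[i] * col_totals[i] for i in range(c)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


# Rows are true classes 1 and 2, columns are predicted classes 1 and 2.
confusion = [
    [130, 41],
    [22, 38],
]
k = kappa(confusion)
print(round(k, 3))
```

For this table, P_observed = 168/231 ≈ 0.727 and P_expected = 30732/53361 ≈ 0.576, giving κ ≈ 0.357.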