ML Introduction
Dariush Hosseini
dariush.hosseini@ucl.ac.uk
Department of Computer Science
University College London
Lecture Overview
1 Lecture Overview
2 Machine Learning: Context & Aims
3 Machine Learning: Paradigms
4 Attributes of Machine Learning Systems
5 Summary
Machine Learning: Context & Aims
Academic Context
[Figure: machine learning in its academic context, spanning engineering and statistics]
Redmon et al., 'You Only Look Once: Unified, Real-Time Object Detection' [2015]
Johnson et al., 'DenseCap: Fully Convolutional Localization Networks for Dense Captioning' [2015]
Example: Recommendations
Gatys et al., 'A Neural Algorithm of Artistic Style' [2015]
Tasks
Supervised Learning
Training data is labelled and we seek to predict outputs given inputs
Classification - where the outputs are discrete
Regression - where the outputs are real-valued
Unsupervised Learning
Training data is unlabelled and we seek structure
Dimensionality Reduction
Clustering
Reinforcement Learning
An agent seeks to learn the optimal actions to take based on the outcomes of past actions (the exploration / exploitation trade-off)
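As a minimal illustration of the supervised setting, a one-nearest-neighbour rule predicts a discrete label for a new input from labelled training data (the data and labels below are invented for the example):

```python
# 1-nearest-neighbour classification: predict the label of the closest
# labelled training point - a toy illustration of supervised learning.
def nearest_neighbour_predict(train_x, train_y, x):
    distances = [abs(xi - x) for xi in train_x]
    return train_y[distances.index(min(distances))]

train_x = [1.0, 2.0, 8.0, 9.0]   # inputs
train_y = ["a", "a", "b", "b"]   # discrete labels
print(nearest_neighbour_predict(train_x, train_y, 1.5))  # -> "a"
```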
Machine Learning: Paradigms
A Learning Framework
[Figure: a map of learning paradigms - non-probabilistic, probabilistic (frequentist), probabilistic (Bayesian), and distribution-free - with example algorithms: Perceptron, Logistic Regression, Naïve Bayes, Gaussian Processes, Bayesian Ridge Regression, Boosting, SVM, Online Learning, COLT]
N.B. This is illustrative - the same algorithm can often be generated by separate approaches.
...And Another
[Figure: cartoon from https://xkcd.com/]
Attributes of Machine Learning Systems
Evaluation
We provide the computer with an objective (or cost or loss) function with which to distinguish good hypotheses from bad ones
Optimisation
We need a procedure for sorting through our hypothesis class, and for selecting the one which produces the optimal results upon evaluation
Attributes of Machine Learning Systems / Motivating Example
A Motivating Example
Data
For each species, we have the measurements of 21 fleas, meaning that we have 63 data points in total. We say that the species of the flea is our output or target variable.
[Figure: scatter plot of the data for the three species - concinna, heptapotamica, heikertingeri - with x2, Aedeagus Front Angle (×7.5°), on the vertical axis]
Training Data
We apply our learning algorithm to a portion of this data (in this case all of it) in order to train or learn a function (or model) which we can use to classify fleas

S = {(x^(i), y^(i))}_{i=1}^{n}, where x^(i) = [x_1^(i), x_2^(i)]^T ∈ R^2, y^(i) ∈ {1, 2, 3}, and n = 63

Model
f : R^2 → {1, 2, 3}
y ≈ f(x)
A Simple Model
if ((x1 > 128 and x1 < 146) and (x2 > 7.5 and x2 < 12.5))
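The rectangle rule can be sketched as a function; note that the slide does not state which species is predicted inside or outside the rectangle, so the labels here are placeholder assumptions:

```python
# A sketch of the rectangular decision rule; the labels returned inside
# and outside the rectangle are placeholder assumptions.
def simple_model(x1, x2, inside_label=1, outside_label=0):
    if 128 < x1 < 146 and 7.5 < x2 < 12.5:
        return inside_label
    return outside_label

print(simple_model(130, 10))  # -> 1 (inside the rectangle)
print(simple_model(160, 16))  # -> 0 (outside)
```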
[Figure: the scatter plot with the simple model's rectangular decision region overlaid; x2, Aedeagus Front Angle (×7.5°), on the vertical axis]
Training Phase
[Figure: pipeline diagram - Labelled Data feeds the Training Phase (optional Data Transformation, then a Learner comprising Representation, Evaluation, and Optimisation), which produces a Model used in the Testing / Prediction Phase]
Attributes of Machine Learning Systems / Representation
Representation
Now let's modify the flea classification setting so that we seek to classify a flea as concinna or not
[Figure: the scatter plot with a candidate rectangle with opposite corners (a, b) and (a + w, b + h); x2, Aedeagus Front Angle (×7.5°), on the vertical axis]
This is our hypothesis space and is the space of all such possible rectangles
Attributes of Machine Learning Systems / Evaluation
Error Evaluation
[Figure: the scatter plot with a candidate rectangle and the classification errors it makes; x2, Aedeagus Front Angle (×7.5°), on the vertical axis]
Clearly this error should take into account the training data and the function which we are evaluating

E : R × R → R
The choice of loss measure will depend on the task and on the representation being used.
The loss function is a mapping from the loss measure and some aspect of the data (either in or out of sample), which aggregates the loss measure evaluated at the data points as an expectation.
Evaluated in sample, for a realised dataset, S, we define the empirical loss, L : (E, S, f_θ) ↦ L(E, S, f_θ) ∈ R, as follows:

L(E, S, f_θ) = E_S[E(f_θ(X), Y)] = (1/n) Σ_{i=1}^{n} E(f_θ(x^(i)), y^(i))
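The empirical loss transcribes directly into code as an average over the dataset (the 0-1 loss measure and the constant model below are illustrative assumptions):

```python
# Empirical loss: the average of a pointwise loss measure E over the
# realised dataset S, for a fixed model f.
def empirical_loss(E, S, f):
    return sum(E(f(x), y) for x, y in S) / len(S)

zero_one = lambda pred, target: 0.0 if pred == target else 1.0
f = lambda x: 1                        # a model that always predicts class 1
S = [(0.5, 1), (1.5, 2), (2.5, 1)]     # (x, y) pairs
print(empirical_loss(zero_one, S, f))  # -> 0.333... (one error in three)
```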
For example, the expected squared loss, evaluated over the data distribution, D:

L(E, D, f_θ) = E_D[(1/2)(f_θ(X) − Y)^2]
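The expectation over D can be approximated by averaging over samples drawn from it; everything in this sketch (the linear model, the noise level) is an illustrative assumption:

```python
import random

# Monte Carlo estimate of the expected (1/2) squared loss under a toy
# data distribution: y = 2x + Gaussian noise, with the model f(x) = 2x.
def half_squared_loss(pred, target):
    return 0.5 * (pred - target) ** 2

random.seed(0)
f = lambda x: 2.0 * x
samples = []
for _ in range(10_000):
    x = random.uniform(0.0, 1.0)
    samples.append((x, 2.0 * x + random.gauss(0.0, 0.1)))

estimate = sum(half_squared_loss(f(x), y) for x, y in samples) / len(samples)
print(estimate)  # close to 0.5 * 0.1**2 = 0.005, half the noise variance
```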
Attributes of Machine Learning Systems / Optimisation
Optimisation
Now that we have a way of measuring the loss of a particular model we can proceed to select the optimal one, characterised by θ*, which exhibits minimal loss.

N.B. Here we use the empirical loss for illustration. We shall look more carefully at the generalisation loss and its approximations in subsequent lectures.
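With rectangles as the hypothesis class, one crude optimisation procedure is exhaustive search for θ* = (a, b, w, h) minimising the empirical loss; the toy data and the coarse search grid below are illustrative assumptions:

```python
from itertools import product

# Toy labelled data: (x, y) with x in R^2 and binary labels.
data = [((1.0, 1.0), 1), ((2.0, 1.5), 1), ((5.0, 5.0), 0), ((6.0, 4.0), 0)]

def predict(theta, x):
    a, b, w, h = theta
    return 1 if (a <= x[0] <= a + w) and (b <= x[1] <= b + h) else 0

def empirical_loss(theta):
    return sum(predict(theta, x) != y for x, y in data) / len(data)

# Exhaustive search over a coarse grid of rectangle parameters.
grid = product(range(7), range(7), range(1, 7), range(1, 7))
theta_star = min(grid, key=empirical_loss)
print(theta_star, empirical_loss(theta_star))  # a zero-loss rectangle exists here
```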
Summary
Lecture Summary
In the next lecture we will begin to discuss the mathematical tools that will be useful in understanding the origin of many of the algorithms which we will encounter over this term.