1 ML Introduction

This document provides an introduction and overview of machine learning. It discusses the context and aims of machine learning, including how it allows computers to find patterns in data without being explicitly programmed. It also covers machine learning paradigms and approaches to learning, as well as the key components of a machine learning system, using a flea classification example to illustrate these concepts. The document is intended to help readers understand the basic concepts and goals of the machine learning field.


Machine Learning

Introduction

Dariush Hosseini

dariush.hosseini@ucl.ac.uk
Department of Computer Science
University College London

Lecture Overview

1 Lecture Overview

2 Machine Learning: Context & Aims

3 Machine Learning: Paradigms

4 Attributes of Machine Learning Systems

5 Summary


Learning Outcomes for Today’s Lecture

By the end of this lecture you should:

1 Know the context and aims of machine learning

2 Understand that machine learning algorithms have paradigmatic motivations

3 Know the attributes of a machine learning system
Machine Learning: Context & Aims

So what is Machine Learning?

Allows computers to find hidden patterns...
...without being explicitly programmed to do so.

Method of data analysis that automates (principled) model building:
Infers knowledge from data
Generalises this to unseen data

Practical yet (often) principled science of inductive rather than deductive reasoning
Academic Context

An interdisciplinary field that develops both the mathematical foundations and practical applications of systems that learn from data.

[Venn diagram: ML at the intersection of Computer Science, Mathematics, Engineering, and Statistics]
Example: Image Recognition¹

Redmon et al, ‘You Only Look Once: Unified, Real-Time Object Detection’ [2015]

¹ Johnson et al, ‘DenseCap: Fully Convolutional Localization Networks for Dense Captioning’ [2015]
Example: Recommendations

Example: Neural Style²

² Gatys et al, ‘A Neural Algorithm of Artistic Style’ [2015]
Tasks
Supervised Learning
Training data is labelled and we seek to predict outputs given inputs
Classification - where outputs are discrete
Regression - where the outputs are real-valued

Unsupervised Learning
Training data is unlabelled and we seek structure
Dimensionality Reduction
Clustering

Reinforcement Learning
Exploration / Exploitation
Agent seeks to learn the optimal actions to take based on the
outcomes of past actions
Machine Learning: Paradigms

A Learning Framework

But Machine Learning is more than a menu of algorithms...

It also offers a framework for how we should create these algorithms...

It can provide a set of principled approaches to algorithm design, and hence a logic of induction, a way to reason with uncertainty...
Some Approaches to Learning³

[Diagram: approaches to learning (Non-Probabilistic; Probabilistic: Generative; Probabilistic: Frequentist; Bayesian; Distribution-Free; Uncertainty) with representative algorithms including SGD, Online Learning, the Perceptron, Logistic Regression, Naïve Bayes, G.P.’s, Bayesian Ridge Regression, Boosting, the SVM, and COLT]

³ N.B. This is illustrative - the same algorithm can often be generated by separate approaches.
...And Another⁴

⁴ https://xkcd.com/
Attributes of Machine Learning Systems

Components of a Machine Learning System


Representation
Output of a learning algorithm is a function (or hypothesis or
model)
We select this function from a set of functions which we provide to
the computer
This set is called the function (or hypothesis or model) class or
the representation of the learner

Evaluation
We provide the computer with an objective (or cost or loss) function
with which to distinguish good hypotheses from bad ones

Optimisation
We need a procedure for sorting through our hypothesis class, and
for selecting the one which produces the optimal results upon
evaluation
Attributes of Machine Learning Systems / Motivating Example

A Motivating Example

A simple motivating example.

We are interested in automatically identifying the species of a flea based on measurements taken of its body.

This is a classification task.
Data
For each species, we have the measurements of 21 fleas, meaning that we have 63 data points in total

We say that the species of the flea is our output or target variable

We take two measurements from each flea and we will refer to these measurements as our attributes or features, which make up our input variable

Since we have two measurements, we can plot our data in 2-dimensional feature space, and colour the data points to indicate their output class
[Figure: the training data plotted in 2-dimensional feature space; x1 = Maximal Aedeagus Width (µm), x2 = Aedeagus Front Angle (×7.5◦); classes: concinna, heptapotamica, heikertingeri]
Training Data
We apply our learning algorithm to a portion of this data (in this
case all of it) in order to train or learn a function (or model) which
we can use to classify fleas

We call this portion the training data, S

More formally, we might represent this training data as:

$$S = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})\}$$

where $x^{(i)} = [x_1^{(i)}, x_2^{(i)}]^T \in \mathbb{R}^2$, $y^{(i)} \in \{1, 2, 3\}$, and $n = 63$

Here $x^{(i)}$ are outcomes of a random variable, $X$, while $y^{(i)}$ are outcomes of a random variable, $Y$
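This notation maps directly onto code. A minimal sketch, assuming placeholder measurement values drawn at random over the plotted axis ranges (the real dataset is not reproduced here):

```python
import random

random.seed(0)

# Training data S = {(x^(i), y^(i))}: n = 63 fleas, two features each.
# The measurement values are illustrative placeholders, not the real data.
S = [((random.uniform(115, 170),      # x1: Maximal Aedeagus Width (µm)
       random.uniform(7.5, 18)),      # x2: Aedeagus Front Angle (x 7.5 deg)
      random.choice([1, 2, 3]))       # y in {1, 2, 3}: one label per species
     for _ in range(63)]

n = len(S)
x_1, y_1 = S[0]   # the first training pair (x^(1), y^(1))
print(n)          # 63
```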
Model

And we represent the function (or model) which we wish to learn as $f$, which maps from our inputs to the set of outputs, such that:

$$f : \mathbb{R}^2 \rightarrow \{1, 2, 3\}$$

Clearly we wish to learn a particular $f$ such that:

$$y \approx f(x)$$
A Simple Model

One of the simplest approaches to perform classification is to produce a set of rules which split our feature space up into distinct regions

For example, a classification rule for heptapotamica might be:

if ((x1 > 128 and x1 < 146) and (x2 > 7.5 and x2 < 12.5))

So we end up with a set of rules for each species, and we then apply these rules to a new flea to identify its species
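The heptapotamica rule can be written down directly. A minimal sketch; note it implements only the single rule given above, so it distinguishes heptapotamica from "not heptapotamica" rather than performing the full three-class task:

```python
def classify_heptapotamica(x1, x2):
    """Rectangular rule from the slide: True when the flea's measurements
    fall inside the heptapotamica region of feature space."""
    return (128 < x1 < 146) and (7.5 < x2 < 12.5)

# A flea with width 135 µm and front angle 10 (x 7.5 deg) matches the rule:
print(classify_heptapotamica(135, 10))   # True
print(classify_heptapotamica(160, 10))   # False
```

A full classifier would hold one such rule per species and report whichever rule fires.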
[Figure: the resulting decision regions plotted over the training data; x1 = Maximal Aedeagus Width (µm), x2 = Aedeagus Front Angle (×7.5◦)]
[Diagram: in the Training Phase, Labelled Data (after an optional Data Transformation) is fed to the Learner, comprising Representation, Evaluation, and Optimisation, which produces a Model; in the Testing / Prediction Phase, New Data is fed to the Model to give Predicted Outputs]
Attributes of Machine Learning Systems / Representation

Representation

Now let’s modify the flea classification setting, such that we now seek to classify a flea as concinna or not

Correspondingly our output set is modified such that $y \in \{0, 1\}$

And a particular rule-based $f_\theta$ is characterised by the parameters $\theta = [a, b, w, h]$, as follows:
[Figure: a rectangle in feature space with corners (a, b) and (a + w, b + h); x1 = Maximal Aedeagus Width (µm), x2 = Aedeagus Front Angle (×7.5◦)]
Since our hypothesis uses four parameters to represent our positive class, we can define the space of all possible combinations of these parameters as:

$$\mathcal{F} = \{f_\theta \mid \theta = [a, b, w, h], \; \forall \, [a, b, w, h] \in \mathbb{R}^4\}$$

This is our hypothesis space and is the space of all such possible rectangles

The learning process of our classifier is the task of searching $\mathcal{F}$ for the best possible hypothesis: $f_\theta \in \mathcal{F}$
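A single hypothesis fθ from this rectangle class can be sketched as follows, assuming for simplicity that w ≥ 0 and h ≥ 0 so the rectangle spans [a, a + w] × [b, b + h] (the example parameter values are hypothetical):

```python
def f_theta(x, theta):
    """Rectangle hypothesis: returns 1 (concinna) if x lies inside the
    rectangle defined by theta = [a, b, w, h], else 0.
    Assumes w >= 0 and h >= 0."""
    a, b, w, h = theta
    x1, x2 = x
    return 1 if (a <= x1 <= a + w) and (b <= x2 <= b + h) else 0

print(f_theta((135.0, 15.0), [130.0, 13.0, 20.0, 4.0]))  # 1: inside
print(f_theta((120.0, 10.0), [130.0, 13.0, 20.0, 4.0]))  # 0: outside
```

Each choice of θ picks out one member of the hypothesis space F; learning is the search over θ.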
Attributes of Machine Learning Systems / Evaluation

Error Evaluation

[Figure: a candidate rectangle hypothesis plotted over the training data; x1 = Maximal Aedeagus Width (µm), x2 = Aedeagus Front Angle (×7.5◦)]
Error Evaluation

In order to assess how good a particular model is we need a way of evaluating the error.

Clearly this error should take into account the training data and the function which we are evaluating

For this we need to first define a loss measure, $E$, a similarity mapping between two inputs, $a \in \mathbb{R}$, $b \in \mathbb{R}$:

$$E : \mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}$$
Error Evaluation

There are numerous functions that we can use as loss measures. Some common ones include:

Squared Error: $E(f_\theta(x), y) = (f_\theta(x) - y)^2$

Absolute Error: $E(f_\theta(x), y) = |f_\theta(x) - y|$

Misclassification Error: $E(f_\theta(x), y) = \mathbb{I}[y \neq f_\theta(x)]$

The choice of loss measure will depend on the task and on the representation being used.
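Each of these loss measures is a one-line function; a minimal sketch:

```python
def squared_error(f_x, y):
    """(f(x) - y)^2: penalises large deviations heavily."""
    return (f_x - y) ** 2

def absolute_error(f_x, y):
    """|f(x) - y|: penalises deviations linearly."""
    return abs(f_x - y)

def misclassification_error(f_x, y):
    """Indicator I[y != f(x)]: 1 for a wrong prediction, 0 otherwise."""
    return 1 if y != f_x else 0

print(squared_error(3.0, 1.0))        # 4.0
print(absolute_error(3.0, 1.0))       # 2.0
print(misclassification_error(1, 0))  # 1
```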
Error Evaluation

After we have defined a loss measure, we must then use it to define a loss function, $L$.

The loss function is a mapping from the loss measure and some aspect of the data (either in or out of sample), which aggregates the loss measure evaluated at the data points as an expectation.

The loss function allows us to evaluate how well a particular model performs with respect to a particular loss measure.
Error Evaluation
Evaluated in sample, for a realised dataset, $S$, we define the empirical loss, $L : (E, S, f_\theta) \mapsto L(E, S, f_\theta) \in \mathbb{R}$, as follows:

$$L(E, S, f_\theta) = \mathbb{E}_S[E(f_\theta(X), Y)] = \frac{1}{n} \sum_{i=1}^{n} E(f_\theta(x^{(i)}), y^{(i)})$$

Evaluated out of sample, for a data generating distribution, $\mathcal{D}$, characterised by a probability distribution function, $p_{X,Y}$, from which $S$ is drawn (i.e. $S \sim \mathcal{D}^n$), we define the generalisation loss, $L : (E, \mathcal{D}, f_\theta) \mapsto L(E, \mathcal{D}, f_\theta) \in \mathbb{R}$, as follows:

$$L(E, \mathcal{D}, f_\theta) = \mathbb{E}_{\mathcal{D}}[E(f_\theta(X), Y)] = \iint E(f_\theta(x), y) \, p_{X,Y}(x, y) \, dx \, dy$$
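The empirical loss is then just the loss measure averaged over the training pairs. A minimal sketch, using the misclassification error and an illustrative constant classifier (the data here is a hypothetical stand-in, not the flea measurements):

```python
def empirical_loss(E, S, f):
    """Empirical loss: (1/n) * sum_i E(f(x^(i)), y^(i)) over S = [(x, y), ...]."""
    return sum(E(f(x), y) for x, y in S) / len(S)

# Illustrative stand-ins: four labelled points, 0/1 loss, constant classifier
S = [((1.0, 2.0), 1), ((3.0, 4.0), 0), ((5.0, 6.0), 1), ((7.0, 8.0), 1)]
E = lambda f_x, y: 1 if y != f_x else 0   # misclassification error
f = lambda x: 1                           # always predict class 1

print(empirical_loss(E, S, f))  # 0.25: one of the four points is misclassified
```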
Error Evaluation

So, for example, for the squared error loss measure:

$$L(E, S, f_\theta) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} \left( f_\theta(x^{(i)}) - y^{(i)} \right)^2$$

$$L(E, \mathcal{D}, f_\theta) = \mathbb{E}_{\mathcal{D}} \left[ \frac{1}{2} (f_\theta(X) - Y)^2 \right]$$
Attributes of Machine Learning Systems / Optimisation

Optimisation
Now that we have a way of measuring the loss of a particular model we can proceed to select the optimal one, characterised by $\theta^*$, which exhibits minimal loss.

Given our representation (hypothesis class), we will typically use an optimisation approach coupled with a suitable evaluation method (loss function) to help us search for the optimal values of $\theta$. We can write this formally as⁵:

$$\theta^* = \underbrace{\operatorname{argmin}_{\theta}}_{\text{Optimisation}} \underbrace{L(E, S, \overbrace{f_\theta}^{\text{Representation}})}_{\text{Evaluation}}$$

⁵ N.B. Here we use the empirical loss for illustration. We shall look more carefully at the generalisation loss and its approximations in subsequent lectures.
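For the rectangle class, one crude but complete realisation of this recipe is an exhaustive grid search over θ, scoring each candidate rectangle with the empirical misclassification loss. A minimal sketch on hypothetical toy data (the data points and grid resolution are illustrative assumptions, not part of the lecture):

```python
from itertools import product

def f_theta(x, theta):
    # Representation: rectangle hypothesis, assuming w >= 0 and h >= 0
    a, b, w, h = theta
    return 1 if (a <= x[0] <= a + w) and (b <= x[1] <= b + h) else 0

def empirical_loss(S, theta):
    # Evaluation: average misclassification error over the dataset
    return sum(y != f_theta(x, theta) for x, y in S) / len(S)

# Toy labelled data: the positive class clusters inside [1, 2] x [1, 2]
S = [((1.5, 1.5), 1), ((1.2, 1.8), 1), ((0.2, 0.3), 0), ((2.8, 2.9), 0)]

# Optimisation: theta* = argmin_theta L(E, S, f_theta) over a coarse grid
grid = [0.0, 0.5, 1.0, 1.5, 2.0]
best_theta = min(product(grid, grid, grid, grid),
                 key=lambda theta: empirical_loss(S, theta))
print(empirical_loss(S, best_theta))  # 0.0: some grid rectangle separates the classes
```

Grid search is used purely for illustration; where the loss is differentiable, a gradient-based search over θ would replace it.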
Summary

Lecture Summary

1 We can characterise Machine Learning in a number of different ways: academic discipline; task; algorithm; etc.

2 It is important to remember the value of understanding the theoretical motivation of a learning algorithm

3 A useful guide in our examination of learning algorithms will be the Representation + Evaluation + Optimisation view

In the next lecture we will begin to discuss the mathematical tools that will be useful in understanding the origin of many of the algorithms which we will encounter over this term.
