Lecture B1 - Overview and Intro
• Toledo: only one course is activated, please register for H0E96A (Beginselen…)
Goals & materials
• Goals of the course:
• 1. provide an overview of machine learning theory and methods, including insight into how and why the different methods work
• 2. enable you to apply machine learning in non-trivial contexts
Background
• Course reserved for Master of CS / Master of AI
• We assume knowledge of
• Calculus
• Linear algebra
• Probability theory
• Statistics
• Programming & Algorithms
• Artificial Intelligence (H06U1A)
as seen in earlier courses in KU Leuven’s bachelor programs
Informatica / Burgerlijk Ingenieur - Computerwetenschappen
Teaching schedule
[Schedule table with columns Week, Wednesday, Thursday, Ex.; sessions marked ✔︎]
The Exam
• Written exam, closed book. Testing both knowledge and insight, both theory and practice.
• Q: “Up to what level of detail should I study this? Do I need to know everything that’s in the
reader?”
• A: You should
• know & understand everything mentioned in the lectures
• be able to solve problems of similar difficulty as those covered in the exercise sessions
• be able to answer questions that require you to reason about the concepts you’ve
seen
• be able to extrapolate: apply a concept in a different context
Reading materials are meant to help you digest the content of the lectures, not provide
additional content (unless mentioned otherwise)
You should be able to…
• Execute (parts of) algorithms we’ve seen on concrete data. E.g.: given some data,
• show the kernel matrix that an SVM learner computes
• perform a backpropagation step in a given neural network
• Reason about the behavior of an algorithm on a more abstract level (based on its
properties rather than on mimicking it)
• E.g.: an SVM has k support vectors out of n instances, and 0 training error: give a
lower bound on the accuracy estimate obtained using leave-one-out cross-validation (see the sketch after this list)
• What would happen if we change … in the algorithm we’ve seen?
• The answer is not given in the course materials, but you can infer it by reasoning
about the concepts that we have seen
• Explain the architecture of a particular type of model, and explain its advantages and
disadvantages
• …
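As an illustration of the SVM question above, here is a minimal sketch in Python, with made-up numbers, of the leave-one-out reasoning: with 0 training error, leaving out a non-support vector leaves the separator unchanged, so only the k support vectors can possibly be misclassified in leave-one-out cross-validation.

def loocv_accuracy_lower_bound(n_instances, n_support_vectors):
    # At most the k support vectors can be misclassified when left out,
    # so at least (n - k) of the n folds are classified correctly.
    return (n_instances - n_support_vectors) / n_instances

# Hypothetical numbers: n = 100 instances, k = 8 support vectors
print(loocv_accuracy_lower_bound(100, 8))  # 0.92, i.e., accuracy >= 92%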
This is a challenging course
• In some exam periods, <50% of students pass this exam. Mean score
often <10. (Max score still often 19 or 20)
• Perceived causes for low grades:
• Not reading the question carefully
• Tendency to reproduce rather than reason
• Tendency to study questions & answers, rather than the course
itself
• Superficial rather than deep understanding of course topics
• Standard investment in a 6-ECTS course: 180 hours (lectures,
exercises, self-study). Aim for 12 hours per week during the semester.
What is Machine
Learning?
Discussion time…
Learn from data
Find patterns in data and generalize
The RoboSail project
• Pieter Adriaans (Univ. of Amsterdam), around 2003: first auto-
pilot for sailing boats (www.robosail.com)
• No suitable mathematical theory existed => the system had to learn how to sail
• Boat full of AI technology (agents, sensors, ...), including
machine learning components
Language learning
• Children learn language by simply hearing sentences being used
in a certain context
• Can a computer do the same: given examples (sentence +
description of context), learn the meaning of words/sentences?
Autonomous cars
• Very first race with fully autonomous cars was in… 2004
• DARPA grand challenge : have autonomous cars race each other
on desert roads
• In 2004, no winner - the “best” car got only about 12 km
• In 2005, five cars made it to the finish (212 km)
The Robot Scientist
• King et al., Nature, 2004
• Scientific research, e.g., in drug discovery, is iterative:
• Determine what experiment to perform
• Perform the experiment
• Feed the results back into the next iteration
Automating manual tasks
• E.g.: nurse rostering in a hospital: need to accommodate non-obvious
constraints (e.g., leave enough time between shifts)
• Hard to automate, unless constraints can be learned from earlier examples
Other applications…
• Recommender systems: e.g., Google, Facebook, Amazon, … try
to show you ads you might like
• Email spam filters: by observing which mails you flag as “spam”,
try to learn your preferences
• Natural language processing: e.g., Sentiment analysis: is this
movie review mostly positive or negative?
• Vision: learn to recognize pedestrians, …
• … and many, many more
Definitions of machine learning?
• Tom Mitchell, 1996: Machine learning is the study of how to make programs
improve their performance on certain tasks from (own) experience
• “performance” = speed, accuracy, …
• “experience” = earlier observations
• “Improve performance” in the most general meaning: this includes learning
from scratch.
• Useful (in principle) for anything that we don’t know how to program -
computer “programs itself”
• Vision: recognizing faces, traffic signs, …
• Game playing, e.g., AlphaGo
• Link to artificial intelligence : computer solves hard problems autonomously
Machine learning vs. other AI
• In machine learning, the key is data
• Examples of questions & their answer
• Observations of earlier attempts to solve some problem
Machine learning and AI
• Many misconceptions about machine learning these days
• Since the mid-2000s, “Deep Learning” has received a lot of
attention: it revolutionized computer vision, speech recognition,
natural language processing
• Avalanche of new researchers drawn to the field, without knowledge
of the broader field of AI, or history of ML (“AI = ML = deep learning”)
• See, e.g., A. Darwiche, https://www.youtube.com/watch?v=UTzCwCic-Do
(also published in Communications of the ACM, October 2018)
Machine learning and AI
• There is still progress on all fronts, deep learning is just one of them
• This course reflects that viewpoint
• (schema below is incomplete, just serves to illustrate complexity of
scientific impact)
[Schema: tasks, techniques, models, and applications, and the links between them]
Basic concepts and
terminology
Machine learning
• ML in its typical form:
• Input = dataset
• Output = some kind of model
• ML in its most general form:
• input = knowledge
• output = a more general form of knowledge
• Learning = inferring general knowledge from more specific knowledge (observations
➔ model) = inductive inference
• Learning methods are often categorized according to the format of the input and
output, and according to the goal of the learning process (but there are many more
dimensions along which they can be categorized)
A typical task
• Given examples of pictures + label (saying what’s on the picture),
infer a procedure that will allow you to correctly label new pictures
• E.g.: learn to classify fish as “salmon” or “sea bass”
A generic approach
• Find informative features (here: lightness, width)
• Find a line/curve/hyperplane/… in this feature space that separates
the classes
Predictive versus descriptive
• Predictive learning : learn a model that can predict a particular property / attribute / variable
from inputs
• Many tasks are special cases of predictive learning
• E.g., face recognition: given a picture of a face, say who it is
• E.g., spam filtering: given an email, say whether it’s spam or not
Name of task: Learns a model that can …
• Concept learning / binary classification: Distinguish instances of class C from other instances
• Classification: Assign a class C (from a given set of classes) to an instance
• Regression: Assign a numerical value to an instance
• Multi-label classification: Assign a set of labels (from a given set) to an instance
• Multivariate regression: Assign a vector of numbers to an instance
• Multi-target prediction: Assign a vector of values (numerical, categorical) to an instance
• Ranking: Assign ≤ or > to a pair of instances
Function learning
• Task : learn a function X →Y that fits the given data (with X and Y sets
of variables that occur in the data)
• Such a function will obviously be useful for predicting Y from X
• May also be descriptive, if we can understand the function
• Often, some family of functions F is given, and we need to estimate
the parameters of the function f in F that best fits the data
• e.g., linear regression : determine a and b such that y = ax + b fits
the data as well as possible
• What does “fit the data” mean? Measured by a so-called loss function
• e.g., quadratic loss: Σ_{(x,y)∈D} (f(x) − y)², with f the learned function and D the dataset
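A minimal sketch of this, assuming the linear family y = ax + b and made-up data: the quadratic loss has a closed-form minimizer (ordinary least squares).

import numpy as np

# Made-up dataset D of (x, y) pairs
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# Minimize the quadratic loss sum over D of (a*x + b - y)^2 in closed form:
# stack columns [x, 1] and solve the least-squares system.
X = np.column_stack([x, np.ones_like(x)])
(a, b), *_ = np.linalg.lstsq(X, y, rcond=None)

loss = np.sum((a * x + b - y) ** 2)  # quadratic loss at the optimum
print(a, b, loss)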
Distribution learning
• Task: given a data set drawn from a distribution, estimate this distribution
• A commonly made distinction: parametric vs. non-parametric
• Parametric: a family of distributions is given (e.g., “Gaussian”), we only need to
estimate the parameters of the target distribution
• Non-parametric: no specific family is assumed
• A commonly made distinction: generative vs. discriminative
• Generative: learn the joint probability distribution (JPD) over all variables (once
you have that, you can generate new instances by random sampling from it)
• Discriminative: learn a conditional probability distribution of Y given X, for some
given set of variables X (called input variables) and Y (called target variables)
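A minimal sketch of parametric, generative distribution learning, assuming a 1-D Gaussian family and synthetic data: estimate the parameters, then generate new instances by sampling.

import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=1000)  # pretend this is our dataset

mu_hat = sample.mean()    # estimate of the mean
sigma_hat = sample.std()  # estimate of the standard deviation

# Generative use: sample new instances from the estimated distribution.
new_instances = rng.normal(mu_hat, sigma_hat, size=5)
print(mu_hat, sigma_hat, new_instances)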
These categorizations are somewhat
fuzzy…
• A descriptive pattern may be useful for prediction
• “Bank X always refuses loans to people who earn less than
1200 euros per month” (description)
• Bob earns 1100 euros per month => Bank X will not give him
a loan
• While functions are directly useful for prediction, a probability
distribution can be used just as well
• Given known information X, predict for Y the value with the highest conditional probability given X
Parametric vs. non-parametric
• Parametric: a family of functions (or distributions, or …) is given,
and each function is uniquely defined by the values of a fixed set
of parameters
• e.g. (function learning): linear regression
• e.g. (distribution learning): fitting a Gaussian
• Non-parametric: no specific family of functions is assumed
• Typically, we are searching a space that contains models with
varying structure, rather than just different parameter values
• This often requires searching a discrete space
• E.g.: decision trees, rules, …. (see later)
Link with “explainable AI”
• Explainable AI (XAI) refers to the study of AI systems that can explain their
decisions / whose decisions we can understand
• Two different levels here:
• We understand the (learned) model used for decision making
• We understand the individual decision
• E.g. “I could not get a loan because I earn too little”: we can understand this
decision even if we don’t know the whole decision process the bank uses
• A learned model that is not straightforward to interpret is called a black-box model
• Machine learning poses additional challenges for XAI, as it often learns
black-box models
Responsible AI : challenges
• Privacy-preserving data analysis
• We need lots of data to learn from; this may include personal data
• How can we guarantee that the analysis of these data will not violate
the privacy of the people whose data this is?
• Generally, when data is collected, consent is needed for a specific
purpose, and data must be used solely for that purpose — how can we
guarantee it won’t be abused?
• Learning “safe” models : models that will not violate certain constraints
that are imposed (including constraints on bias, discrimination, privacy, …)
Predictive learning
• A very large part of machine learning focuses on predictive
learning
• In the following, we zoom in on that part
Prediction: task definition
The prediction task, in general:
o Given: a description of some instance
o Predict: some property of interest (the “target”)
Examples:
o classify emails as spam / non-spam
o classify fish as salmon / bass
o forecast tomorrow’s weather based on today’s measurements
Training & prediction sets
• Training set: a set of examples, instance descriptions that
include the target property (a.k.a. labeled instances)
• Prediction set: a set of instance descriptions that do not include
the target property (“unlabeled” instances)
Inductive vs. transductive learning
We can consider as outcome of the learning process, either
• the predictions themselves: transductive learning
• or: a function that can predict the label of any unlabeled
instance: inductive learning
[Illustration, built up over three slides: a labeled training set {(x1,y1), (x2,y2), (x3,y3)} plus unlabeled instances x4 and x5. Transductive learning outputs labels for exactly these instances; inductive learning outputs a function f: X→Y, which is then applied to obtain f(x4) and f(x5).]
Interpretable vs. black-box
The predictive function or model learned from the data may be
represented in a format that we can easily interpret, or not
Overfitting and underfitting
• “Occam’s razor”: among equally accurate models, choose
the simpler one
• Trade-off: explain data vs. simplicity
• Both overfitting (the model also fits noise in the training data and generalizes poorly) and underfitting (the model is too simple to capture the real pattern) are harmful
Levels of supervision
• Supervised learning: learning a (predictive) model from labeled
instances (as in cats & dogs example)
Semi-supervised learning
• How can unlabeled examples help learn a better model?
[Illustration: two classes, called + and -, with instances represented in a 2-dimensional space; an unlabeled query point is marked “?”.]
[Same illustration, built up over two more slides: many unlabeled points (“.”) are added; they reveal cluster structure that suggests which class the “?” point belongs to.]
Unsupervised learning
• Can you see three classes here?
• Even though we don’t know the names of the classes, we still see some structure
(clusters) that we could use to predict which class a new instance belongs to
• Identifying this structure is called clustering
• From a predictive point of view, this is unsupervised learning
[Illustration: unlabeled points forming three visible clusters in a 2-dimensional space.]
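A minimal clustering sketch (k-means, on synthetic 2-D data): alternate between assigning points to their nearest center and moving each center to the mean of its cluster.

import numpy as np

rng = np.random.default_rng(1)
# Synthetic unlabeled data: three blobs in a 2-D feature space.
data = np.vstack([rng.normal(c, 0.3, size=(30, 2))
                  for c in [(0, 0), (3, 0), (1.5, 2.5)]])

k = 3
centers = data[rng.choice(len(data), k, replace=False)]  # random initialization
for _ in range(20):
    # Assign each point to the nearest center ...
    dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)
    # ... then move each center to the mean of its assigned points.
    centers = np.array([data[labels == j].mean(axis=0) for j in range(k)])
print(centers)  # approximately the three blob centers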
PU-learning
• PU-learning is a special case of semi-supervised learning
• PU stands for “positive / unlabeled”
• All the labeled examples belong to one class (called the
“positive” class)
• Example: learning the meaning of “kicking the ball” requires PU-learning, because when Mike kicks the ball this is sometimes mentioned, but when he does not kick the ball, it is never mentioned that he does not.
[Illustration: a few positive (+) examples among many unlabeled (.) instances.]
Weakly supervised learning
• Weakly supervised learning is a generalized form of semi-supervised learning
• Semi-supervised: for a single instance, we either know its label or we do not
• Weakly supervised: we may have partial information about a label
• e.g., it is certainly a member of a given set (= superset learning)
• e.g., at least one instance among a given set of instances has the label,
but we do not know which one (= multi-instance learning)
• e.g., we know two instances have the same label, but we don’t know
which label it is (= constraint-based clustering)
• …
Relationship between different
supervision settings
[Diagram: within predictive learning, a spectrum from supervised learning to unsupervised learning, with weakly supervised learning (including semi-supervised learning and PU-learning) in between.]
Format of input data
• Input is often assumed to be a set of instances that are all described using the
same variables (features, attributes)
• The data are “i.i.d.”: “independent and identically distributed”
• The training set can be seen as a random sample from one distribution
• The training set can be shown as a table (instances x variables) : tabular data
• This is also called the standard setting
Format of input data: tabular
Training set:
Sepal length  Sepal width  Petal length  Petal width  Class
5.1           3.5          1.4           0.2          Setosa
4.9           3.0          1.4           0.2          Setosa
7.0           3.2          4.7           1.4          Versicolor
6.3           3.3          6.0           2.5          Virginica
Format of input data: sequences
• Learning from sequences:
• 1 prediction per sequence?
• 1 prediction per element?
• 1 element in sequence can be …
• A number (e.g., time series)
• A symbol (e.g., strings: abababab: +, aabbaabb: -)
• A tuple
• A more complex structure
Format of input data: trees
• 1 prediction per tree / per node in the tree
• Nodes can be …
• Unlabeled
• Labeled with symbols (e.g., HTML/XML structures)
• …
[Tree figure: a <ul> node with <li> children; one <li> contains <b>“Address:” followed by a text field labeled +.]
E.g.: this tree indicates as “positive” a text field preceded by “Address:” inside a list (<li>) context
Format of input data: graph
• Example: Social network
• Target value known for some
nodes, not for others
• Predict node label
• Predict edge
• Predict edge label
• …
• Use network structure for
these predictions
Format of input data: raw data
• “Raw” data are in a format that seems simple (e.g., a vector of
numbers), but components ≠ meaningful features
• Example: photo (vector of pixels)
• Raw data often need to be processed in a non-trivial way to obtain
meaningful features; on the basis of these features, a function can be
learned
• This is what deep learning excels at
Format of input data: knowledge
• “Knowledge” can consist of facts, rules, definitions, ….
• We can represent knowledge about some domain in a
knowledge representation language (such languages are often
based on logic)
atm(m1,a1,o,2,3.43,-3.11,0.04).
atm(m1,a2,c,2,6.03,-1.77,0.67).
...
bond(m1,a2,a3,2).
bond(m1,a5,a6,1).
bond(m1,a6,a7,du).
...

hacc(M,A):- atm(M,A,o,2,_,_,_).
hacc(M,A):- atm(M,A,o,3,_,_,_).
hacc(M,A):- atm(M,A,s,2,_,_,_).
hacc(M,A):- atm(M,A,n,ar,_,_,_).
zincsite(M,A):- atm(M,A,du,_,_,_,_).
hdonor(M,A):- atm(M,A,h,_,_,_,_), not(carbon_bond(M,A)), !.
...
Data preprocessing
• Data may not be in a format that your learner can handle
• Data wrangling: bring it into the right format
• Even if it’s in a format your learner can handle (e.g., tabular), the
features it contains may not be very informative, or there may be
very few relevant features among many irrelevant ones.
• E.g.: individual pixels in an image are usually not very informative
• Feature selection: select among many input features the most
informative ones
• Feature construction: construct new features, derived from the
given ones
What learning method to use?
• Which learners are suitable for your problem, depends strongly
(but not solely!) on the structure of the input data
• Most learners use the standard format
• A set of instances, where each instance is described by a
fixed set of attributes (a.k.a. features, variables)
• also called attribute-value format or tabular format
• At the other extreme, inductive logic programming handles any
kind of knowledge that can be represented using clausal logic
• This includes sequences, graphs, …
What learning method to use?
• The data format and the learning task impose strong constraints
on which learning methods can be used
• Other aspects determine whether the method performs well:
• Inductive bias (see later)
• Ability to handle missing values
• Ability to handle noise
• Ability to handle high-dimensional data
• Ability to handle large datasets
• Ability to generalize from small datasets (avoid overfitting)
• We’ll cover many of these aspects at different points in the course
Missing data
Some training examples may have missing values… how to handle these?
Sepal length  Sepal width  Petal length  Petal width  Class
5.1           ?            1.4           0.2          Setosa
4.9           3.0          1.4           0.2          Setosa
7.0           3.2          ?             1.4          Versicolor
6.3           3.3          6.0           2.5          Virginica
Some options:
1. Leave out from training set
- Information loss…
2. Guess the missing value
- What if the guess is wrong?
3. Treat ‘?’ as a separate value
Note: missingness itself can be relevant! It may correlate with the class (e.g., exit polls), …
Statisticians distinguish MCAR, MAR, NMAR: Missing (Completely) At Random, or Not (At Random).
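A minimal sketch of option 2 (guessing the missing value) via mean imputation, assuming scikit-learn is available; the rows mirror the table above, with np.nan standing in for ‘?’.

import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[5.1, np.nan, 1.4, 0.2],
              [4.9, 3.0, 1.4, 0.2],
              [7.0, 3.2, np.nan, 1.4],
              [6.3, 3.3, 6.0, 2.5]])

# Replace each missing value by the mean of its column.
imputer = SimpleImputer(strategy="mean")
print(imputer.fit_transform(X))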
Output formats,
methods (overview)
Output formats
• The output of a learning system is a model
• Many different types of model exist
• The learning algorithm or method is strongly linked to the type of model
• High-level overviews of machine learning methods often categorize
them along this axis
Different views of the landscape
• Domingos: “five tribes”
• Flach: “three types of models”
• Bishop: “the world is Bayesian”
Parametrized functions
• Typically, a certain format for the functions is provided; e.g.:
linear functions of the inputs
• Within this set, we look for the parameter values that best fit the
data
• Standard example: linear regression
[Scatter plot: Petal.Width vs. Petal.Length for the iris data, with a fitted line y = ax + b; here width = 0.416*length - 0.363.]
Conjunctive concepts
• A conjunctive concept is expressed as a set of conditions, all of
which must be true
• “x has class C if and only if <condition1> and <condition2> and
… and <condition k>”
Rule sets
• A rule set is a set of rules of the form “if … then …” or “if … then
… else …”
• Example: definition of leap years
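The slide’s figure is not reproduced here; as a sketch, the leap-year definition written as an if-then-else rule set (standard Gregorian calendar rules):

def is_leap_year(year):
    if year % 400 == 0:    # if divisible by 400 then leap year
        return True
    elif year % 100 == 0:  # else if divisible by 100 then not a leap year
        return False
    elif year % 4 == 0:    # else if divisible by 4 then leap year
        return True
    else:                  # else not a leap year
        return False

print([y for y in (1900, 2000, 2023, 2024) if is_leap_year(y)])  # [2000, 2024]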
Decision trees
• A decision tree represents a stepwise procedure to arrive at
some decision
Neural networks
• A neural network is a complex structure of neurons, each of which aggregates multiple input signals into a single output signal
[Figure: a small feedforward network with inputs x, y, z, hidden units such as h11 = f(a11x + b11y + c11z), and an output Out.]
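A minimal sketch of the figure’s computation, with made-up weights: each neuron computes a weighted sum of its inputs and passes it through a nonlinearity f.

import numpy as np

def f(t):
    return 1.0 / (1.0 + np.exp(-t))  # sigmoid activation

x, y, z = 0.5, -1.0, 2.0
W_hidden = np.array([[0.2, -0.3, 0.7],   # weights (a11, b11, c11) of h11
                     [0.5, 0.1, -0.4]])  # weights of a second hidden neuron
h = f(W_hidden @ np.array([x, y, z]))    # h[0] = f(a11*x + b11*y + c11*z)
w_out = np.array([1.5, -2.0])
out = f(w_out @ h)                       # output neuron aggregates the hidden layer
print(h, out)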
Probabilistic graphical models
• A PGM represents a (high-dimensional) joint distribution over
multiple variables as a product of (low-dimensional) factors
• Different type of PGMs: Bayesian networks, Markov networks,
factor graphs, …
[Factor graph figure: variables A, B, C, D, E connected to factors such as f1(a), f4(c,d), f5(c,e); the joint distribution is the product of all factors.]
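A minimal sketch of the factorization idea over three binary variables, with made-up factor tables: the joint distribution is the normalized product of small factors.

import itertools

f1 = {0: 1.0, 1: 2.0}                                      # factor over A
f2 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}  # factor over (A, B)
f3 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}  # factor over (B, C)

joint = {(a, b, c): f1[a] * f2[(a, b)] * f3[(b, c)]
         for a, b, c in itertools.product((0, 1), repeat=3)}
Z = sum(joint.values())                 # normalization constant
joint = {v: p / Z for v, p in joint.items()}
print(joint[(1, 1, 0)])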
Search methods
• How do we find the most suitable model?
• Sometimes, there is a closed form solution (e.g., linear regression)
• If not, we typically need to search some hypothesis space
• Two very different types of spaces, each with their own search
methods :
• Discrete spaces (methods: hill-climbing, best-first, …)
• Continuous spaces (methods: gradient descent, …)
• Typically:
• Model structure not fixed in advance => discrete
• Fixed model structure, tune numerical parameters => continuous
Example: gradient descent in a
continuous space
[Figure: left, the data points (-1,10), (1,3), (2,0) in (x,y) space; right, the (a,b) parameter space, where each point (a,b) represents a line y = ax + b, color encodes the loss, and gradient descent steps toward the minimum.]
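A minimal gradient-descent sketch for exactly this setting: fit y = ax + b to the three points by repeatedly stepping against the gradient of the quadratic loss.

points = [(-1.0, 10.0), (1.0, 3.0), (2.0, 0.0)]

a, b = 0.0, 0.0  # starting point in (a, b) parameter space
lr = 0.05        # step size
for _ in range(1000):
    # Gradient of sum (a*x + b - y)^2 with respect to a and b:
    grad_a = sum(2 * x * (a * x + b - y) for x, y in points)
    grad_b = sum(2 * (a * x + b - y) for x, y in points)
    a, b = a - lr * grad_a, b - lr * grad_b
print(a, b)  # converges to roughly a = -3.36, b = 6.57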
Candidate Elimination: illustration
• A company produces intelligent robots. Some robots
misbehave. We suspect that one particular combination of
features is the cause for this misbehavior.
• For ease of discussion, we here assume robots have four
relevant characteristics:
• Color : B R M
• Body shape : S T
• Legs/wheels: L W
• #“eyes” : 1 2
• Find the combination that misbehaves
Candidate Elimination: illustration
• We will represent a hypothesis as a tuple <color, body, legs,
eyes> where color = B, R, M or ? (? means “any color”) etc.
• Hypothesis space: {B,R,M,?} x {S,T,?} x {L,W,?} x {1,2,?}
• Let S(h) be the set of robots characterized by a hypothesis h
• Hypothesis h1 is more general than h2 if and only if S(h2)⊆S(h1)
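A minimal sketch of this generality test for such tuple hypotheses: h1 is more general than h2 exactly when, position by position, h1 has ‘?’ or agrees with h2.

def covers(h, robot):
    # h covers a robot if every position is '?' or matches exactly.
    return all(hv == "?" or hv == rv for hv, rv in zip(h, robot))

def more_general(h1, h2):
    # S(h2) is a subset of S(h1) iff h1 generalizes h2 position by position.
    return all(a == "?" or a == b for a, b in zip(h1, h2))

print(more_general(("B", "?", "?", "?"), ("B", "S", "L", "1")))  # True
print(more_general(("B", "S", "?", "?"), ("B", "?", "L", "1")))  # False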
Search space is a lattice
[Lattice figure: the most general hypothesis <?,?,?,?> at the top; below it, partially specified hypotheses such as BS??, B?L?, ??L1; below those, fully specified hypotheses such as BSL1; and ⊥ at the bottom.]
Candidate Elimination
Observation 1: a robot that misbehaves
[Lattice figure: hypotheses that do not cover this example are eliminated.]
Candidate Elimination
Observation 2: a robot that does not misbehave
[Lattice figure: hypotheses that cover this example are eliminated.]
Candidate Elimination
• The candidate elimination algorithm illustrates
• Search in a discrete hypothesis space (with lattice structure)
• Search for all solutions, rather than just one, in an efficient manner
• Importance of generality ordering
• We’ll see many other learning approaches, all with their own pros & cons
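As a complement to the algorithm above, here is a brute-force sketch for the robot example (with made-up observations): instead of maintaining the version-space boundaries efficiently, it simply enumerates the whole hypothesis space and keeps every hypothesis consistent with the data.

from itertools import product

def covers(h, robot):
    return all(hv == "?" or hv == rv for hv, rv in zip(h, robot))

# The hypothesis space {B,R,M,?} x {S,T,?} x {L,W,?} x {1,2,?}
space = list(product("BRM?", "ST?", "LW?", "12?"))

observations = [                   # hypothetical data
    (("B", "S", "L", "2"), True),  # this robot misbehaves
    (("B", "T", "L", "2"), False), # this robot behaves
]
# Keep hypotheses that cover all misbehaving robots and no well-behaved ones.
version_space = [h for h in space
                 if all(covers(h, r) == label for r, label in observations)]
print(len(version_space), version_space[:5])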