
Machine Learning Foundations

(機器學習基石)

Lecture 1: The Learning Problem


Hsuan-Tien Lin (林軒田)
htlin@csie.ntu.edu.tw

Department of Computer Science & Information Engineering
National Taiwan University
(國立台灣大學資訊工程系)

Hsuan-Tien Lin (NTU CSIE) Machine Learning Foundations 0/27


The Learning Problem Course Introduction

Course Design (1/2)


Machine Learning: a mixture of theoretical and practical tools
• theory oriented
• derive everything deeply for solid understanding
• less interesting to general audience
• techniques oriented
• flash over the sexiest techniques broadly for shiny coverage
• too many techniques, hard to choose, hard to use properly

our approach: foundation oriented


Course Design (2/2)


Foundation Oriented ML Course
• mixture of philosophical illustrations, key theory, core techniques,
usage in practice, and hopefully jokes :-)
—what every machine learning user should know
• story-like:
• When Can Machines Learn? (illustrative + technical)
• Why Can Machines Learn? (theoretical + illustrative)
• How Can Machines Learn? (technical + practical)
• How Can Machines Learn Better? (practical + theoretical)

allows students to learn ‘future/untaught’ techniques or study deeper theory easily


Course History
NTU Version
• 15-17 weeks (2+ hours)
• highly-praised with English and blackboard teaching

Coursera Version
• 8 weeks of ‘foundation’ (this course) + 7 weeks of ‘techniques’ (coming course)
• Mandarin teaching to reach more audience in need
• slides teaching improved with Coursera’s quiz and homework mechanisms

goal: try making Coursera version even better than NTU version


Fun Time
Which of the following descriptions of this course is true?
1 the course will be taught in Taiwanese
2 the course will tell me the techniques that create the android
Lieutenant Commander Data in Star Trek
3 the course will be 15 weeks long
4 the course will be story-like

Reference Answer: 4
1 no, my Taiwanese is unfortunately not
good enough for teaching (yet)
2 no, although what we teach may serve as
foundations of those (future) techniques
3 no, unless you choose to join the next
course
4 yes, let’s begin the story

Roadmap
1 When Can Machines Learn?
  Lecture 1: The Learning Problem
    Course Introduction
    What is Machine Learning
    Applications of Machine Learning
    Components of Machine Learning
    Machine Learning and Other Fields
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?



The Learning Problem What is Machine Learning

From Learning to Machine Learning


learning: acquiring skill with experience accumulated from observations

observations → learning → skill

machine learning: acquiring skill with experience accumulated/computed from data

data → ML → skill

What is skill?


A More Concrete Definition


skill
⇔ improve some performance measure (e.g. prediction accuracy)

machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

An Application in Computational Finance

stock data → ML → more investment gain

Why use machine learning?



Yet Another Application: Tree Recognition

• ‘define’ trees and hand-program: difficult
• learn from data (observations) and recognize: a 3-year-old can do so
• ‘ML-based tree recognition system’ can be easier to build than hand-programmed system

ML: an alternative route to build complicated systems

The Machine Learning Route


ML: an alternative route to build complicated systems

Some Use Scenarios


• when humans cannot program the system manually
—navigating on Mars
• when humans cannot ‘define the solution’ easily
—speech/visual recognition
• when rapid decisions are needed that humans cannot make
—high-frequency trading
• when the system must be user-oriented on a massive scale
—consumer-targeted marketing

Give a computer a fish, you feed it for a day; teach it how to fish, you feed it for a lifetime. :-)


Key Essence of Machine Learning


machine learning: improving some performance measure with experience computed from data

data → ML → improved performance measure

1 exists some ‘underlying pattern’ to be learned
—so ‘performance measure’ can be improved
2 but no programmable (easy) definition
—so ‘ML’ is needed
3 somehow there is data about the pattern
—so ML has some ‘inputs’ to learn from

key essence: help decide whether to use ML



Fun Time
Which of the following is best suited for machine learning?
1 predicting whether the next cry of the baby girl happens at an
even-numbered minute or not
2 determining whether a given graph contains a cycle
3 deciding whether to approve a credit card for some customer
4 guessing whether the earth will be destroyed by the misuse of
nuclear power in the next ten years

Reference Answer: 3
1 no pattern
2 programmable definition
3 pattern: customer behavior;
definition: not easily programmable;
data: history of bank operation
4 arguably no (or not enough) data yet
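
To make the "programmable definition" point for option 2 concrete: whether a graph contains a cycle can be decided exactly by a short hand-written procedure, so no learning is needed. The following is a minimal sketch, assuming a directed graph stored as a dictionary from each node to its list of successors; the function name `has_cycle` and this representation are illustrative choices, not part of the lecture.

```python
def has_cycle(graph):
    """Return True if the directed graph (dict: node -> list of successors) has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # edge back into the current path: a cycle
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    # Start a depth-first search from every node that is still unvisited.
    return any(color[n] == WHITE and visit(n) for n in graph)
```

The rule is fully specified by hand and is correct for every input, which is exactly the situation in which machine learning adds nothing.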
The Learning Problem Applications of Machine Learning

Daily Needs: Food, Clothing, Housing, Transportation


data → ML → skill

1 Food (Sadilek et al., 2013)
• data: Twitter data (words + location)
• skill: properly estimate how likely a restaurant is to cause food poisoning
2 Clothing (Abu-Mostafa, 2012)
• data: sales figures + client surveys
• skill: give good fashion recommendations to clients
3 Housing (Tsanas and Xifara, 2012)
• data: characteristics of buildings and their energy load
• skill: predict energy load of other buildings closely
4 Transportation (Stallkamp et al., 2012)
• data: some traffic sign images and meanings
• skill: recognize traffic signs accurately

ML is everywhere!

Education
data → ML → skill

• data: students’ records on quizzes on a Math tutoring system
• skill: predict whether a student can give a correct answer to another quiz question

A Possible ML Solution
answer correctly ≈ ⟦recent strength of student > difficulty of question⟧
• give ML 9 million records from 3000 students
• ML determines (reverse-engineers) strength and difficulty
automatically

key part of the world-champion system from National Taiwan Univ. in KDDCup 2010
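
The decision rule in this solution can be sketched in a few lines. This is a toy illustration only: the numeric values and the names `predict_correct`, `student_strength` and `question_difficulty` are made up, and in the system described here ML estimates strength and difficulty from the records rather than having them supplied by hand.

```python
def predict_correct(strength, difficulty):
    """Return 1 if the student's recent strength exceeds the question's difficulty, else 0."""
    return int(strength > difficulty)

# Hypothetical values standing in for quantities that ML would estimate from data.
student_strength = {"alice": 0.8, "bob": 0.3}
question_difficulty = {"q1": 0.5, "q2": 0.9}

# Predict every (student, question) pair with the same threshold rule.
predictions = {
    (s, q): predict_correct(student_strength[s], question_difficulty[q])
    for s in student_strength
    for q in question_difficulty
}
```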


Entertainment: Recommender System (1/2)


data → ML → skill

• data: ratings that many users have given to some movies
• skill: predict how a user would rate an unrated movie

A Hot Problem
• competition held by Netflix in 2006
• 100,480,507 ratings that 480,189 users gave to 17,770 movies
• 10% improvement = 1 million dollar prize
• similar competition (movies → songs) held by Yahoo! in KDDCup 2011
• 252,800,275 ratings that 1,000,990 users gave to 624,961 songs

How can machines learn our preferences?


Entertainment: Recommender System (2/2)

A Possible ML Solution
• pattern:
rating ← viewer/movie factors
• learning:
known rating
→ learned factors
→ unknown rating prediction

[Figure: a viewer's factors (likes comedy? likes action? prefers blockbusters? likes Tom Cruise?) are matched against a movie's factors (comedy content, action content, blockbuster? Tom Cruise in it?); the contributions from each factor are added to give the predicted rating.]
key part of the world-champion (again!) system from National Taiwan Univ. in KDDCup 2011
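
One common way to read "match movie and viewer factors and add contributions from each factor" is as an inner product between two factor vectors. The sketch below works under that assumption, with hand-picked, hypothetical factor values; in practice the factors would be learned from the known ratings. The factor order follows the slide's example, and `predict_rating` is an illustrative name.

```python
# Factor order: comedy, action, blockbuster, Tom Cruise (all values are made up).
viewer_factors = [1.0, 0.0, 2.0, 1.0]  # how much the viewer likes each factor
movie_factors = [0.0, 1.0, 1.0, 1.0]   # how strongly the movie has each factor

def predict_rating(viewer, movie):
    """Match factors pairwise and add up the contribution of each one."""
    return sum(v * m for v, m in zip(viewer, movie))

rating = predict_rating(viewer_factors, movie_factors)
```

A factor the viewer likes but the movie lacks (comedy here) contributes nothing, while factors present on both sides raise the predicted rating.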

Fun Time
Which of the following fields cannot use machine learning?
1 Finance
2 Medicine
3 Law
4 none of the above

Reference Answer: 4
1 predict stock price from data
2 predict medicine effect from data
3 summarize legal documents from data
4 :-) Welcome to study this hot topic!



The Learning Problem Components of Machine Learning

Components of Learning: Metaphor Using Credit Approval

Applicant Information
age                 23 years
gender              female
annual salary       NTD 1,000,000
year in residence   1 year
year in job         0.5 year
current debt        200,000

unknown pattern to be learned: ‘approve credit card good for bank?’


Formalize the Learning Problem


Basic Notations
• input: x ∈ X (customer application)
• output: y ∈ Y (good/bad after approving credit card)
• unknown pattern to be learned ⇔ target function:
f : X → Y (ideal credit approval formula)
• data ⇔ training examples: D = {(x1, y1), (x2, y2), ..., (xN, yN)}
(historical records in bank)
• hypothesis ⇔ skill with hopefully good performance:
g : X → Y (‘learned’ formula to be used)

{(xn, yn)} from f → ML → g


Learning Flow for Credit Approval


unknown target function f : X → Y (ideal credit approval formula)
→ training examples D : (x1, y1), ..., (xN, yN) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)

• target f unknown
(i.e. no programmable definition)
• hypothesis g hopefully ≈ f
but possibly different from f
(perfection ‘impossible’ when f unknown)

What does g look like?



The Learning Model

training examples D : (x1, y1), ..., (xN, yN) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)

hypothesis set H (set of candidate formulas)
→ learning algorithm A

• assume g ∈ H = {hk}, i.e. approving if
• h1 : annual salary > NTD 800,000
• h2 : debt > NTD 100,000 (really?)
• h3 : year in job ≤ 2 (really?)
• hypothesis set H:
• can contain good or bad hypotheses
• up to A to pick the ‘best’ one as g

learning model = A and H
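
A minimal sketch of a learning model under strong simplifying assumptions: H contains just the three rules listed above, the training examples are made up for illustration, and A simply picks the hypothesis that makes the fewest mistakes on D. The field names and the functions `training_errors` and `learn` are hypothetical, and this selection rule is only one possible choice of A.

```python
# Each applicant x is a dict; y is +1 (good for the bank) or -1 (bad). Data is made up.
D = [
    ({"salary": 1_000_000, "debt": 200_000, "years_in_job": 0.5}, +1),
    ({"salary": 900_000, "debt": 50_000, "years_in_job": 5.0}, +1),
    ({"salary": 400_000, "debt": 300_000, "years_in_job": 1.0}, -1),
    ({"salary": 600_000, "debt": 20_000, "years_in_job": 8.0}, -1),
]

# Hypothesis set H: each h maps x to +1 (approve) or -1 (reject).
H = {
    "h1": lambda x: +1 if x["salary"] > 800_000 else -1,
    "h2": lambda x: +1 if x["debt"] > 100_000 else -1,
    "h3": lambda x: +1 if x["years_in_job"] <= 2 else -1,
}

def training_errors(h, data):
    """Count the training examples on which hypothesis h disagrees with the label."""
    return sum(h(x) != y for x, y in data)

def learn(hypotheses, data):
    """Algorithm A: return the name of the hypothesis with the fewest training errors."""
    return min(hypotheses, key=lambda name: training_errors(hypotheses[name], data))

best = learn(H, D)
g = H[best]  # the final hypothesis
```

On this toy data the salary rule h1 fits every record while h2 and h3 each miss two, so A returns h1 as g. With different data in D, the same A and H could return a different g.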



Practical Definition of Machine Learning


unknown target function f : X → Y (ideal credit approval formula)
→ training examples D : (x1, y1), ..., (xN, yN) (historical records in bank)
→ learning algorithm A
→ final hypothesis g ≈ f (‘learned’ formula to be used)

hypothesis set H (set of candidate formulas)
→ learning algorithm A

machine learning:
use data to compute hypothesis g
that approximates target f

Fun Time
How to use the four sets below to form a learning problem for
song recommendation?
S1 = [0, 100]
S2 = all possible (userid, songid) pairs
S3 = all formulas that ‘multiply’ user factors & song factors,
indexed by all possible combinations of such factors
S4 = 1,000,000 pairs of ((userid, songid), rating)
1 S1 = X , S2 = Y, S3 = H, S4 = D
2 S1 = Y, S2 = X , S3 = H, S4 = D
3 S1 = D, S2 = H, S3 = Y, S4 = X
4 S1 = X , S2 = D, S3 = Y, S4 = H

Reference Answer: 2
S4 --(A on S3)--> (g : S2 → S1)
The Learning Problem Machine Learning and Other Fields

Machine Learning and Data Mining

Machine Learning: use data to compute hypothesis g that approximates target f
Data Mining: use (huge) data to find property that is interesting

• if ‘interesting property’ is the same as ‘hypothesis that approximates target’
—ML = DM (usually what KDDCup does)
• if ‘interesting property’ is related to ‘hypothesis that approximates target’
—DM can help ML, and vice versa (often, but not always)
• traditional DM also focuses on efficient computation in large databases

difficult to distinguish ML and DM in reality



Machine Learning and Artificial Intelligence

Machine Learning: use data to compute hypothesis g that approximates target f
Artificial Intelligence: compute something that shows intelligent behavior

• g ≈ f is something that shows intelligent behavior
—ML can realize AI, among other routes
• e.g. chess playing
• traditional AI: game tree
• ML for AI: ‘learning from board data’

ML is one possible route to realize AI


Machine Learning and Statistics

Machine Learning: use data to compute hypothesis g that approximates target f
Statistics: use data to make inference about an unknown process

• g is an inference outcome; f is something unknown
—statistics can be used to achieve ML
• traditional statistics also focuses on provable results with math
assumptions, and cares less about computation

statistics: many useful tools for ML


Fun Time
Which of the following claims is not totally true?
1 machine learning is a route to realize artificial intelligence
2 machine learning, data mining and statistics all need data
3 data mining is just another name for machine learning
4 statistics can be used for data mining

Reference Answer: 3
While data mining and machine learning do
share a huge overlap, they are arguably not
equivalent because of the difference of focus.


Summary
1 When Can Machines Learn?
  Lecture 1: The Learning Problem
    Course Introduction: foundation oriented and story-like
    What is Machine Learning: use data to approximate target
    Applications of Machine Learning: almost everywhere
    Components of Machine Learning: A takes D and H to get g
    Machine Learning and Other Fields: related to DM, AI and Stats
  • next: a simple and yet useful learning model (H and A)
2 Why Can Machines Learn?
3 How Can Machines Learn?
4 How Can Machines Learn Better?