1-Introduction
Zijun Yao
Assistant Professor, EECS Department
The University of Kansas
Class and Office Hour
• E-mail: zyao@ku.edu
• Recommended Subject: EECS836 <Your Last Name> <Brief headline>
Course Coverage
• Fundamental concepts in machine learning
• Deep neural networks (MLP, CNN, RNN, Transformer, GenAI)
• How to train deep learning models with popular deep learning
frameworks such as PyTorch
• How machine learning is used in real-world applications
• Hands-on (e.g., computer vision, natural language processing,
recommender systems)
• Analyzing business cases
Syllabus and Course Schedule
• The course schedule is available under the Syllabus section on Canvas
Grading Policy
1. Attendance 4%
2. Assignments 26%
3. Project 20%
4. Exam I 25%
5. Exam II 25%
Total 100%
• Attendance: checked 4 times at random
• Assignments: late submissions will receive less credit
• Exams: there will be no make-up exams
• Team project: a team of at least 3 and up to 4 students; a proposal and a report will be required.
• The final grade is based on a curve
• Active class participation earns up to 3% credit
• Academic integrity: do NOT cheat on any homework or exam. Highly identical answers will require explanation.
Project
1. Group. Up to 4 students.
2. Proposal. Maximum 8 PPT pages of project proposal, dataset,
problem definition, data processing, models and (optional)
expected outcomes.
3. Report. Maximum 15-page report (single-space, 12-point font)
with sections for introduction, motivation, method, results,
and conclusion.
4. Project tutorial. Will be available in the following lectures
Project
• Topic: anything you’re interested in (sentiment analysis,
recommendation systems, healthcare, Covid-19, finance, security)
• Search the Internet for ideas
• Kaggle has rich resources including ideas, data and code
https://www.kaggle.com/
• Scope: not too broad, not too narrow (e.g., training a linear classifier
for a standard dataset)
Textbooks (Optional)
• Use the following books as the primary references
Course Expectation
• Develop a strong vocabulary and understanding of ML
techniques
• Make informed trade-offs on what ML approaches to use
• Communicate with confidence among developers and
consultants
• Hands-on experience via a staged progression of exercises
using application data
• Exposure to various AI/ML tools
• Approach business/research problems analytically by identifying
opportunities to derive actionable insights from data
What Do You Think of AI
Robotics
https://www.youtube.com/watch?v=fn3KWM1kuAw
Autonomous cars
https://www.youtube.com/watch?v=tlThdr3O5Qo
Alpha Go
What is AI?
The science of making machines (or computers) that
AI - Turing test (1950)
Machine Learning: Since 1990s
• Machine learning
• Support Vector Machine (1995)
• Graphical models
• Bayesian Network
• Topic Modeling (2002)
• Chess-playing case
• Deep Blue beats Kasparov (1997)
History of AI
Deep Learning: Since 2010s
• Two conditions boost AI
• Big data
• Computer power
• Deep learning (computer vision, NLP, robotics)
• 2012, AlexNet competed in the ImageNet Large Scale Visual
Recognition Challenge
• Demonstrate the power of deep learning
• Accurate training of DNN with GPUs
• 2014, generative adversarial network (GAN)
• 2016, AlphaGo defeated 18-time world champion Lee Sedol at the
game of Go
• 2020, AlphaFold, breakthrough AI solution to a 50-year-old grand
challenge in biology
• 2022, ChatGPT, the fastest-growing consumer software application
in history
• There are more: Midjourney, DALL-E, Stable Diffusion …
AI vs. ML vs. Deep Learning
• Artificial Intelligence: enable a machine to mimic human cognitive
functions such as learning and problem-solving.
• Machine Learning: allow a machine to use algorithms to automatically
learn from past data without being explicitly programmed
(a major application of AI)
• Deep Learning: use a layered structure of algorithms called an artificial
neural network (ANN) (a major technique of ML)
Machine Learning is Everywhere
• Speech technologies (e.g. Siri)
• Automatic speech recognition (ASR)
• Text-to-speech synthesis (TTS)
• Dialog systems
• Web search
Computer vision
• Robotic surgery and medical diagnosis
• Intelligent surveillance
• Self-driving cars
Tools for Predictions & Decisions
Reasons for Tremendous Advances in ML?
• Big data
• ImageNet has 14 million hand-annotated images
• Text data available on the Internet, e.g., Wikipedia
• ……
• Machine (deep) learning models
• AlexNet, Residual Nets, GANs, Attention, BERT
• Computer power
• GPUs
• Deep learning frameworks, e.g., PyTorch, TensorFlow
AlexNet in ImageNet Challenge (2012)
Machine Learning
• Machine learning aims to build a mathematical (statistical)
model based on sample data, known as "training data", to make
predictions or decisions
Machine Learning: Speech Recognition Example
[Figure: the machine learns from labeled examples (audio labeled “Hi”; images labeled “monkey”, “cat”, “dog”) and then answers: This is “cat”]
• Playing Go: f( [board image] ) = “5-5” (next move)
• Dialogue System: f( “How are you?” ) = “I am fine.” (what the user said → system response)
Image Recognition: Start with a Project
f( [image] ) = “cat”
Model: a set of functions f1, f2, …… with different parameters
• f1( [image 1] ) = “cat”, f2( [image 1] ) = “monkey”
• f1( [image 2] ) = “dog”, f2( [image 2] ) = “snake”
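Image Recognition: Framework
f( [image] ) = “cat”
• Step 1: Model, a set of functions f1, f2, ……
• Goodness of function f, judged on the training data: which function is better?
• Supervised learning: the training data are images labeled “monkey”, “cat”, “dog”
• Training picks a function from the set; testing applies it to a new image, e.g., output “cat”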
Step 0 - Problem Formulation
Application oriented
Step 0: What kind of function do you want to find?
• Different data
• Different tasks
Step 1: define a set of functions
Step 2: goodness of function
Step 3: pick the best function
Just like the three steps to put an elephant into the fridge……
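The three steps can be sketched in a few lines of plain Python. This is only an illustration: the toy data, the candidate slopes, and the helper names (`make_function`, `total_error`) are assumptions made up for this sketch, not part of the course material.

```python
# Hypothetical toy data: (input, label) pairs, made up for illustration.
training_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]

# Step 1: define a set of functions (here: lines y = w * x for a few slopes w).
candidate_slopes = [0.5, 1.0, 2.0, 3.0]

def make_function(w):
    """Return the candidate function f(x) = w * x."""
    return lambda x: w * x

# Step 2: goodness of function (lower total absolute error is better).
def total_error(f, data):
    return sum(abs(y - f(x)) for x, y in data)

# Step 3: pick the best function from the set.
best_slope = min(
    candidate_slopes,
    key=lambda w: total_error(make_function(w), training_data),
)
best_function = make_function(best_slope)
print(best_slope, best_function(4.0))
```

Real models replace the short list of slopes with a parameterized family and replace the exhaustive `min` with an optimizer.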
(Big) Data is Everywhere…
• Over 20 petabytes of data processed per day
• Twitter now sends and receives as many as 500 million “tweets” every day.
• As of January 2013, Facebook users had uploaded over 240 billion photos, with 350
million new photos every day.
• S3: 449B objects, peak 290k requests/second (7/2011); 1T objects (6/2012)
What is Data?
• Collection of data objects and their attributes
• An attribute is a property or ……
• Attribute is also known as variable, field, ……
• Example (columns are attributes, rows are objects):
Tid   Refund   Marital Status   Taxable Income   Cheat
4     Yes      Married          120K             No
• Types:
• Nominal
• Binary
• Ordinal
• Numeric
Attribute Values
Attribute Types
• Nominal: used for naming or labelling variables, without any quantitative value
• Categories, states, or “names of things”
• Hair_color = {auburn, black, blond, brown, grey, red, white}
• marital status, occupation, ID numbers, zip codes
• Binary
• Nominal attribute with only 2 states (0 and 1)
• e.g., medical test (positive vs. negative)
• Ordinal
• Values have a meaningful order (ranking) but magnitude between successive values is not
known.
• Size = {small, medium, large}, grades = {A, B, C, D, F}
• Numeric
• Continuous: real numbers such as speed
• Discrete: can only take certain values
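As a rough sketch of why the attribute type matters in practice, the snippet below turns one record into numbers, using a different encoding per type. The record, the field names, and the helper functions are hypothetical and exist only for this illustration.

```python
# Category lists taken from the examples above.
HAIR_COLORS = ["auburn", "black", "blond", "brown", "grey", "red", "white"]  # nominal
SIZES = ["small", "medium", "large"]  # ordinal, listed in increasing order

def encode_nominal(value, categories):
    """One-hot encoding: no order is implied between categories."""
    return [1 if value == c else 0 for c in categories]

def encode_binary(value, positive="positive"):
    """Binary attribute: map the two states to 1 and 0."""
    return 1 if value == positive else 0

def encode_ordinal(value, ordered_levels):
    """Rank index: keeps the order, but the gaps between ranks carry no meaning."""
    return ordered_levels.index(value)

# Hypothetical record with one attribute of each type.
record = {"hair_color": "brown", "test": "positive", "size": "medium", "speed": 61.5}

features = (
    encode_nominal(record["hair_color"], HAIR_COLORS)
    + [encode_binary(record["test"])]
    + [encode_ordinal(record["size"], SIZES)]
    + [record["speed"]]  # numeric (continuous): used as-is
)
print(features)
```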
Types of Data Sets
• Record
• Relational records
• Data matrix, e.g., numerical matrix, crosstabs
• Transaction data
• Example document-term matrix (term counts per document):
             team  coach  play  ball  score  game  win  lost  timeout  season
Document 1      3      0     5     0      2     6    0     2        0       2
Document 2      0      7     0     2      1     0    0     3        0       0
• Ordered
• Video data: sequence of images
• Temporal data: time-series
• Sequential Data: transaction sequences
• Genetic sequence data
Record Data
• Data that consists of a collection of records, each of which
consists of a fixed set of attributes
Tid   Refund   Marital Status   Taxable Income   Cheat
Graph Data
• Examples: Social networks, Generic graph, a Molecule, and Webpages
[Figure: example graphs, including a social network]
[Figure: time series]
Spatial Data
map images
Data Matrix
• If data objects have the same fixed set of numeric attributes, then the
data objects can be thought of as points in a multi-dimensional space,
where each dimension represents a distinct attribute
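One consequence of viewing data objects as points is that geometric notions such as distance become available. A minimal sketch, with a made-up two-row data matrix and a hypothetical helper `euclidean_distance`:

```python
import math

# Hypothetical data matrix: each row is a data object (a point),
# each column is one numeric attribute (a dimension).
data_matrix = [
    [1.0, 2.0, 3.0],
    [4.0, 6.0, 3.0],
]

def euclidean_distance(p, q):
    """Straight-line distance between two points with the same attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

distance = euclidean_distance(data_matrix[0], data_matrix[1])
print(distance)
```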
Image Data
Image Data: Color Images
Document Data
Step 0 - Problem Formulation
Task: Regression
Predict PM2.5: f( PM2.5 today, PM2.5 yesterday, ……. ) = PM2.5 tomorrow (scalar)
Training Data:
• Input: 9/01 PM2.5 = 63, 9/02 PM2.5 = 65; Output: 9/03 PM2.5 = 100
• Input: 9/12 PM2.5 = 30, 9/13 PM2.5 = 25; Output: 9/14 PM2.5 = 20
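The training pairs for this kind of regression task can be built by sliding a window over the readings. The sketch below uses the PM2.5 values from the slide; the function name `make_training_pairs` and the window size of 2 are assumptions for illustration.

```python
# PM2.5 readings from the slide.
early_september = [63, 65, 100]  # 9/01, 9/02, 9/03
mid_september = [30, 25, 20]     # 9/12, 9/13, 9/14

def make_training_pairs(series, window=2):
    """Turn a series into (input window, next value) pairs."""
    pairs = []
    for i in range(len(series) - window):
        pairs.append((series[i:i + window], series[i + window]))
    return pairs

training_data = make_training_pairs(early_september) + make_training_pairs(mid_september)
print(training_data)
```

Each pair holds two consecutive days as the input and the following day as the scalar output.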
Task: Classification
• Given a collection of records (training set)
• Each record is characterized by a tuple (x, y), where x is the attribute
set and y is the class label
• x: attribute, predictor, independent variable, input
• y: class, response, dependent variable, output
• Task:
– Learn a model that maps each attribute set x into one of the predefined
class labels y
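To make the (x, y) notation concrete, here is a minimal sketch that maps an attribute set x to a class label y. It uses 1-nearest-neighbor purely as a stand-in model; the slides do not prescribe this method, and the training records are made up.

```python
# Hypothetical training set of (x, y) tuples:
# x = (taxable income in K, married as 0/1), y = class label.
training_set = [
    ((125.0, 0), "No"),
    ((100.0, 1), "No"),
    ((95.0, 0), "Yes"),
    ((85.0, 0), "Yes"),
]

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def classify(x):
    """1-nearest-neighbor: return the label of the closest training record."""
    _, nearest_label = min(
        training_set, key=lambda pair: squared_distance(pair[0], x)
    )
    return nearest_label

print(classify((92.0, 0)), classify((120.0, 1)))
```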
Task: Classification
[Figure: two diagrams, each showing an input passed to a function f]
Binary Classification
Spam filtering: Function → Yes/No
Training Data: examples labeled Yes or No
(http://spam-filter-review.toptenreviews.com/)
Multi-Class Classification
Playing Go
• Function → next move (a position on the board)
• Each position is a class (19 x 19 classes)
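Treating each board position as a class means a classifier needs a way to convert between a class index and a board coordinate. A small sketch, assuming zero-based (row, column) coordinates and row-major numbering, neither of which comes from the slides:

```python
BOARD_SIZE = 19
NUM_CLASSES = BOARD_SIZE * BOARD_SIZE  # one class per board position

def position_to_class(row, col):
    """Map a zero-based (row, col) position to a class index."""
    return row * BOARD_SIZE + col

def class_to_position(class_index):
    """Inverse mapping: class index back to (row, col)."""
    return divmod(class_index, BOARD_SIZE)

print(NUM_CLASSES, position_to_class(4, 4), class_to_position(80))
```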
Step 1 – Function with Unknown Parameters
The function we want to find …
$y = f(\,\cdot\,)$: return on Monday?
Function with Unknown Parameters
Neural Network
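Neuron: a simple function
• Inputs $a_1, \dots, a_k, \dots, a_K$; weights $w_1, \dots, w_k, \dots, w_K$; bias $b$
• $z = a_1 w_1 + \dots + a_k w_k + \dots + a_K w_K + b$
• Output: $\sigma(z)$, where $\sigma$ is the activation function
• Sigmoid activation function: $\sigma(z) = \frac{1}{1 + e^{-z}}$
Example: inputs $(2, -1, 1)$, weights $(1, -2, -1)$, bias $1$
• $z = 2 \cdot 1 + (-1)(-2) + 1 \cdot (-1) + 1 = 4$
• $\sigma(4) \approx 0.98$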
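The neuron computation can be checked with a few lines of Python. The numbers are read from the slide's example figure; pairing inputs (2, -1, 1) with weights (1, -2, -1) and bias 1 is an assumption, chosen because it reproduces the z = 4 and output of about 0.98 shown on the slide. The function names are hypothetical.

```python
import math

def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, followed by the activation."""
    z = sum(a * w for a, w in zip(inputs, weights)) + bias
    return z, sigmoid(z)

# Assumed pairing of the slide's numbers (see the note above).
z, output = neuron(inputs=[2, -1, 1], weights=[1, -2, -1], bias=1)
print(z, round(output, 2))  # 4 0.98
```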
Input → Model → Output
• Input: a 16 x 16 = 256 pixel image, $x_1, x_2, \dots, x_{256}$ (Ink → 1, No ink → 0)
• Model: $f$, a network with Layer 1, Layer 2, ……, Layer L
• Output: $y_1, y_2, \dots, y_{10}$, one value per digit: $y_1 = 0.1$ (is 1), $y_2 = 0.7$ (is 2), ……, $y_{10} = 0.2$ (is 0)
• The largest value is $y_2$, so the output is “2”
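The input and output conventions of this digit example can be sketched as follows. Only y1 = 0.1, y2 = 0.7, and y10 = 0.2 appear on the slide; the other seven output values are placeholder zeros, and the helper names are hypothetical.

```python
# y1..y9 correspond to digits 1..9, and y10 corresponds to digit 0.
DIGITS = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "0"]

def image_to_input(image):
    """Flatten a 16 x 16 grid of booleans into 256 values (ink -> 1, no ink -> 0)."""
    return [1 if pixel else 0 for row in image for pixel in row]

def predict_digit(outputs):
    """Pick the digit whose output value is largest."""
    best_index = max(range(len(outputs)), key=lambda i: outputs[i])
    return DIGITS[best_index]

# Values from the slide for y1, y2, y10; the 0.0 entries are assumed placeholders.
outputs = [0.1, 0.7, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.2]
blank_image = [[False] * 16 for _ in range(16)]

print(len(image_to_input(blank_image)), predict_digit(outputs))
```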
Step 2 - Measure Error
Supervised Learning
Speech Recognition: x (audio) → y = “How are you”
• Supervised: x1 → y1: Hello, x2 → y2: Good, x3 → y3: I am fine
Image Recognition: x (image) → y = “cat”
• Supervised: x1, x2, x3, each with a label
Define Loss from Training Data
➢ Loss is a function of parameters: $L(b, w)$
➢ Loss: how good a set of values is.
Example: $L(-5, 1)$, i.e., $y = b + w x_1$ with $y = -5 + 1 x_1$. How good is it?
Data in 2024: 01/02/2024: 248.42, 01/03: 238.45, ……, 12/30/2024, 12/31
• Prediction: $\hat{y} = b + w x_1 = -5 + 1 \cdot 248.42 = 243.42$
• Label: $y = 238.45$
• Error: $e_1 = |y - \hat{y}| = 4.97$
Loss: $L = \frac{1}{N} \sum_n e_n$
Error Surface: $L$ plotted over $w$ and $b$ (the figure marks a region of large $L$)
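The loss computation can be reproduced in Python. The sketch assumes the error is the absolute difference between label and prediction, which matches the 4.97 on the slide; only the single data pair shown there is used, and the function names are hypothetical.

```python
def predict(x1, b, w):
    """Model with unknown parameters: y = b + w * x1."""
    return b + w * x1

def loss(data, b, w):
    """L(b, w) = (1/N) * sum of e_n, with e_n = |y - y_hat|."""
    errors = [abs(y - predict(x1, b, w)) for x1, y in data]
    return sum(errors) / len(errors)

# The one (x1, y) pair visible on the slide: 248.42 on 01/02/2024, 238.45 on 01/03.
data = [(248.42, 238.45)]

y_hat = predict(248.42, b=-5, w=1)
e1 = loss(data, b=-5, w=1)
print(round(y_hat, 2), round(e1, 2))  # 243.42 4.97
```

Evaluating `loss` for many (b, w) pairs is what produces the error surface.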
Step 3 - Optimization
Find the Best Function through Optimization
Network parameters $\theta = \{w_1, w_2, w_3, \cdots, b_1, b_2, b_3, \cdots\}$
[Figure: network layers, with $10^6$ weights marked]
Millions of parameters
Source of image: http://chico386.pixnet.net/album/photo/171572850
Optimization
$w^*, b^* = \arg\min_{w,b} L$
Gradient Descent
➢ (Randomly) Pick an initial value $w^0$
➢ Compute $\frac{\partial L}{\partial w}\big|_{w=w^0}$
• Negative: increase $w$
• Positive: decrease $w$
[Figure: loss $L$ plotted against $w$, with $w^0$ marked]
➢ Update: $w^1 \leftarrow w^0 - \eta \frac{\partial L}{\partial w}\big|_{w=w^0}$
• $\eta$: learning rate (a hyperparameter)
[Figure: loss $L$ plotted against $w$; the step from $w^0$ to $w^1$ is labeled $\eta \frac{\partial L}{\partial w}\big|_{w=w^0}$]
➢ Update $w$ iteratively: $w^0, w^1, w^2, \dots, w^T$
Do local minima truly cause the problem?
[Figure: loss $L$ plotted against $w$, with a local minimum and the global minimum marked]
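The effect of the starting point can be seen on a small one-dimensional example. The loss below is a made-up non-convex polynomial, chosen only because it has one local and one global minimum; it is not the loss from the slides, and the learning rate and step count are arbitrary assumptions.

```python
def loss(w):
    """Hypothetical non-convex loss with two minima."""
    return w ** 4 - 3 * w ** 2 + w

def gradient(w):
    """Derivative of the loss above with respect to w."""
    return 4 * w ** 3 - 6 * w + 1

def gradient_descent(w0, eta=0.01, steps=1000):
    """Repeat the update w <- w - eta * dL/dw, starting from w0."""
    w = w0
    for _ in range(steps):
        w = w - eta * gradient(w)
    return w

w_from_right = gradient_descent(w0=2.0)   # settles in the minimum on the right
w_from_left = gradient_descent(w0=-2.0)   # settles in the minimum on the left

print(w_from_right, loss(w_from_right))
print(w_from_left, loss(w_from_left))
```

Both runs stop where the gradient is close to zero, but they reach different values of the loss, so the result depends on the initial value.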
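Optimization
$w^*, b^* = \arg\min_{w,b} L$
• Result: $w^* = 0.97$, $b^* = 0.1k$, $L(w^*, b^*) = 0.48k$
• Each step on the error surface over $(w, b)$ moves by $(-\eta\, \partial L / \partial w,\; -\eta\, \partial L / \partial b)$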
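Putting the pieces together, the following is a minimal sketch of gradient descent over both parameters of $y = b + w x$. It is not the computation behind the numbers on the slide: the data are made up, mean squared error is swapped in for the absolute error so that the gradient is defined everywhere, and the learning rate and step count are arbitrary assumptions.

```python
# Hypothetical data generated from y = 1 + 2x, so the best fit is w = 2, b = 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

def loss(w, b):
    """Mean squared error of the model y = b + w * x on the data."""
    return sum((y - (b + w * x)) ** 2 for x, y in data) / len(data)

def gradients(w, b):
    """Partial derivatives of the loss with respect to w and b."""
    n = len(data)
    dL_dw = sum(-2 * x * (y - (b + w * x)) for x, y in data) / n
    dL_db = sum(-2 * (y - (b + w * x)) for x, y in data) / n
    return dL_dw, dL_db

def gradient_descent(w0, b0, eta=0.05, steps=2000):
    """Move (w, b) by (-eta * dL/dw, -eta * dL/db) at every step."""
    w, b = w0, b0
    for _ in range(steps):
        dL_dw, dL_db = gradients(w, b)
        w, b = w - eta * dL_dw, b - eta * dL_db
    return w, b

w_star, b_star = gradient_descent(w0=0.0, b0=0.0)
print(w_star, b_star, loss(w_star, b_star))
```

In a framework such as PyTorch the gradients would come from automatic differentiation instead of hand-written derivatives.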
Optimization and Gradient Descent
Summary: Steps in Machine Learning