Unit-5 AI ETC
• Playing Go
f( board position ) = “5-5” (next move)
• Dialogue System
f( “Hi” ) = “Hello”
(what the user said) (system response)
An activation function decides whether a neuron should be activated or not. That is, using simple mathematical operations, it decides whether the neuron’s input is important for the prediction or not.
Image Recognition:
Framework: f( image ) = “cat”
Step 1: define a set of candidate functions (a model), e.g. f1, f2, …
  f1( image of a cat ) = “cat”      f2( image of a cat ) = “monkey”
  f1( image of a dog ) = “dog”      f2( image of a dog ) = “snake”
Here f1 is clearly the better function.
Step 2: define the goodness of a function f, so that a good candidate such as f1 can be preferred over a bad one such as f2.
Supervised Learning
Framework: f( image ) = “cat”
Step 1: define a set of model functions f1, f2, …
Training: use labelled training data, e.g. images labelled “monkey”, “cat”, “dog”, to pick the best function.
Testing: apply the chosen function to a new image, e.g. f( new image ) = “cat”.
Three Steps for Learning
A single neuron is a simple function:

z = a1 w1 + … + ak wk + … + aK wK + b
a = σ(z)

where a1, …, aK are the inputs, w1, …, wK are the weights, b is the bias, and σ(·) is the activation function.
Neural Network
Neuron with sigmoid activation function:

σ(z) = 1 / (1 + e^(−z))

Example: inputs (1, −1), weights (1, −2), bias 1:
z = 1×1 + (−1)×(−2) + 1 = 4,  σ(4) ≈ 0.98
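To make the arithmetic concrete, here is a minimal Python sketch of this single neuron, using the inputs, weights, and bias from the example above:

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of inputs plus bias, then activation."""
    z = sum(a * w for a, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Example from the slide: inputs (1, -1), weights (1, -2), bias 1
print(neuron([1, -1], [1, -2], 1))   # z = 4, sigma(4) ~ 0.98
```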
Neural Network
Different connections lead to
different network structures
The neurons have different values of
weights and biases.
Weights and biases are network parameters 𝜃
Activation functions
• Transform a neuron’s input into its output
• Features of activation functions:
  • A squashing effect is required
  • Prevents accelerating growth of activation levels through the network
  • Simple and easy to calculate
Fully Connected Feedforward Network

Example (layer 1), with input vector (1, −1) and sigmoid activation σ(z) = 1 / (1 + e^(−z)):
• Neuron 1: weights (1, −2), bias 1 → z = 1×1 + (−1)×(−2) + 1 = 4 → σ(4) ≈ 0.98
• Neuron 2: weights (−1, 1), bias 0 → z = 1×(−1) + (−1)×1 + 0 = −2 → σ(−2) ≈ 0.12
Fully Connected Feedforward Network

Passing the input through all three layers:
• Input (1, −1): layer 1 outputs (0.98, 0.12), layer 2 outputs (0.86, 0.11), output layer gives (0.62, 0.83).
• Input (0, 0): layer 1 outputs (0.73, 0.50), layer 2 outputs (0.72, 0.12), output layer gives (0.51, 0.85).
This is a function with an input vector and an output vector:
f( (1, −1) ) = (0.62, 0.83)      f( (0, 0) ) = (0.51, 0.85)
Given parameters 𝜃, we define a function.
Given a network structure, we define a function set.
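A minimal NumPy sketch of such a fully connected feedforward pass; layer 1 matches the worked example above, while the layer-2 and output-layer weights are placeholders assumed for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, params):
    """Apply each layer in turn: a = sigma(W a + b)."""
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a

# theta: one (weights, bias) pair per layer
theta = [
    (np.array([[ 1.0, -2.0],
               [-1.0,  1.0]]), np.array([1.0, 0.0])),   # layer 1 (from the example)
    (np.array([[ 2.0, -1.0],
               [-1.0, -2.0]]), np.array([0.0, 0.0])),   # layer 2 (assumed values)
    (np.array([[ 3.0, -1.0],
               [-1.0,  4.0]]), np.array([0.0, 2.0])),   # output layer (assumed values)
]

print(feedforward(np.array([1.0, -1.0]), theta))
```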
Fully Connected Feedforward Network

Input layer: x1, x2, …, xN
Hidden layers: Layer 1, Layer 2, …, Layer L (every neuron in one layer is connected to every neuron in the next)
Output layer: y1, y2, …, yM

Deep = many hidden layers
Image classification error rate falls as networks get deeper:

Network        Year   Layers                     Error rate
AlexNet        2012   8                          16.4%
VGG            2014   19                         7.3%
GoogleNet      2014   22                         6.7%
Residual Net   2015   152 (special structure)    3.57%
Output Layer
• Softmax layer as the output layer

Ordinary layer:
y1 = σ(z1), y2 = σ(z2), … — in general, the outputs of the network can be any values.

Softmax layer:
yi = e^(zi) / Σ_{j=1..3} e^(zj)

Example: z = (3, 1, −3) → e^z ≈ (20, 2.7, 0.05) → y ≈ (0.88, 0.12, ≈0)
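A short sketch of this softmax computation, reproducing the worked example above:

```python
import numpy as np

def softmax(z):
    """Exponentiate each score and normalise so the outputs sum to 1."""
    e = np.exp(z - np.max(z))   # subtracting the max keeps the exponentials stable
    return e / e.sum()

print(softmax(np.array([3.0, 1.0, -3.0])))   # ~ [0.88, 0.12, 0.00]
```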
Example Application

Input: a 16 × 16 = 256 pixel image, flattened to x1, …, x256 (ink → 1, no ink → 0).
Output: y1, …, y10, where each dimension represents the confidence of a digit.
For example, y1 (“is 1”) = 0.1, y2 (“is 2”) = 0.7, …, y10 (“is 0”) = 0.2 → the image is “2”.
Example Application
• Handwriting Digit Recognition

What is needed is a function with a 256-dim input vector (x1, …, x256) and a 10-dim output vector (y1 “is 1”, y2 “is 2”, …, y10 “is 0”). A neural network is the machine that implements this function, e.g. outputting “2”.
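As an illustration, one possible way to write down a network of this shape is with Keras; only the 256-dim input and 10-dim softmax output are fixed by the problem, while the two hidden layers of 500 sigmoid units are an assumption for this sketch:

```python
import tensorflow as tf

# Hypothetical structure: 256 inputs -> two hidden layers -> 10 softmax outputs
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(256,)),                 # 16 x 16 pixels, flattened
    tf.keras.layers.Dense(500, activation="sigmoid"),    # hidden layer 1 (assumed size)
    tf.keras.layers.Dense(500, activation="sigmoid"),    # hidden layer 2 (assumed size)
    tf.keras.layers.Dense(10, activation="softmax"),     # one confidence value per digit
])
model.summary()
```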
Example Application

Choosing the network structure defines a function set containing the candidates for handwriting digit recognition:
Input layer (x1, …, xN) → hidden Layer 1, Layer 2, …, Layer L → output layer (y1 “is 1”, y2 “is 2”, …, y10 “is 0”).
The network ends in a softmax layer, with inputs x1, …, x256 (16 × 16 = 256 pixels, ink → 1, no ink → 0) and outputs y1 “is 1”, …, y10 “is 0”.

The learning target: for an image of the digit “1”, y1 should have the maximum value.

Given a set of parameters, the target output for “1” is (1, 0, …, 0): y1 as close to 1 as possible, y2 = 0, …, y10 = 0. The loss 𝑙 measures how far the network output is from this target.

Total loss over all training data:
𝐿 = Σ_{r=1..R} 𝑙_r
For each training example, compare the network output with its target:
x1 → NN → y1, target ŷ1, loss 𝑙1
x2 → NN → y2, target ŷ2, loss 𝑙2
x3 → NN → y3, target ŷ3, loss 𝑙3
…
xR → NN → yR, target ŷR, loss 𝑙R
Make the total loss L as small as possible: find the function in the function set, i.e. the network parameters 𝜽∗, that minimizes the total loss L.
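A small sketch of this loss computation; cross-entropy is a common choice for the per-example loss 𝑙, though the slides only require some distance between output and target:

```python
import numpy as np

def cross_entropy(y, y_hat):
    """Per-example loss l between softmax output y and one-hot target y_hat."""
    return -np.sum(y_hat * np.log(y + 1e-12))

def total_loss(outputs, targets):
    """Total loss L = sum of l_r over all R training examples."""
    return sum(cross_entropy(y, y_hat) for y, y_hat in zip(outputs, targets))

# Toy example: one image of "1"; the target is (1, 0, ..., 0)
y = np.array([0.62, 0.10, 0.03, 0.02, 0.02, 0.05, 0.04, 0.05, 0.04, 0.03])
y_hat = np.zeros(10); y_hat[0] = 1.0
print(cross_entropy(y, y_hat))   # small when y1 is close to 1
```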
Three Steps for Deep Learning
Network parameters 𝜃 = {𝑤1, 𝑤2, 𝑤3, ⋯, 𝑏1, 𝑏2, 𝑏3, ⋯}
A network easily has millions (~10^6) of weight and bias parameters.
Gradient Descent
Network parameters 𝜃 = {𝑤1, 𝑤2, ⋯, 𝑏1, 𝑏2, ⋯}
• Pick an initial value for each parameter w.
• Compute 𝜕𝐿/𝜕𝑤. If the slope is positive, decrease w; if it is negative, increase w.
• Update: w ← w − 𝜂 𝜕𝐿/𝜕𝑤, where 𝜂 is called the “learning rate”.
• Repeat until 𝜕𝐿/𝜕𝑤 is (approximately) zero.

Difficulties on the total loss surface:
• Very slow at a plateau (𝜕𝐿/𝜕𝑤 ≈ 0)
• Stuck at a saddle point (𝜕𝐿/𝜕𝑤 = 0)
• Stuck at a local minimum (𝜕𝐿/𝜕𝑤 = 0)
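A minimal sketch of one gradient-descent loop on a single parameter w, assuming some function grad_L(w) that returns 𝜕𝐿/𝜕𝑤 (in a real network this gradient comes from backpropagation, described next):

```python
def gradient_descent(w, grad_L, eta=0.1, steps=100):
    """Repeatedly move w against the slope of the loss: w <- w - eta * dL/dw."""
    for _ in range(steps):
        w = w - eta * grad_L(w)
    return w

# Toy example: L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3); the minimum is at w = 3
print(gradient_descent(w=0.0, grad_L=lambda w: 2 * (w - 3)))   # ~ 3.0
```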
For example, the same framework can be used for spam filtering: feed features of an e-mail (e.g. whether “Talk” appears, whether “free” appears) into a network whose output is 1/0 (Yes/No), where 1 (Yes) means spam and 0 (No) means not spam.
Backpropagation: an efficient way to compute 𝜕𝐿/𝜕𝑤 in a neural network.
Backpropagation
Back Propagation algorithm – Illustration

The illustration proceeds in four stages:
• Forward phase: propagate the input through the network, layer by layer, to compute every neuron’s output.
• Computing error: compare the network output with the target to obtain the error.
• Backward phase: propagate the error backwards through the layers to obtain 𝜕𝐿/𝜕𝑤 for every weight.
• Weight update: adjust each weight by w ← w − 𝜂 𝜕𝐿/𝜕𝑤.
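A compact NumPy sketch of these four stages for a one-hidden-layer sigmoid network with a squared-error loss; the layer sizes, training example, and learning rate are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)   # hidden layer (assumed: 2 -> 4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer (assumed: 4 -> 1)
x, t = np.array([1.0, -1.0]), np.array([1.0])   # one training example (assumed)
eta = 0.5                                       # learning rate (assumed)

for step in range(1000):
    # Forward phase: compute every neuron's output
    a1 = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ a1 + b2)

    # Computing error: squared-error loss between output and target
    L = 0.5 * np.sum((y - t) ** 2)

    # Backward phase: propagate the error to get dL/dw for every weight
    delta2 = (y - t) * y * (1 - y)              # error at the output layer
    delta1 = (W2.T @ delta2) * a1 * (1 - a1)    # error at the hidden layer
    dW2, db2 = np.outer(delta2, a1), delta2
    dW1, db1 = np.outer(delta1, x), delta1

    # Weight update: w <- w - eta * dL/dw
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1

print(L)   # the loss should be close to 0 after training
```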
Advantages & Disadvantages
Advantages
Disadvantages
ANN - When?