Unit-5 AI ETC

The document provides an overview of Artificial Neural Networks (ANNs) and their applications in various fields such as speech and image recognition, as well as game playing. It explains the role of activation functions, the structure of neural networks, and the process of training through steps like defining functions, assessing their goodness, and selecting the best function. Additionally, it discusses concepts like gradient descent, local minima, and backpropagation for efficient computation in neural networks.

Artificial Neural Networks

Machine Learning ≈ Looking for a Function

• Speech Recognition:  f(audio) = "How are you"
• Image Recognition:   f(image) = "Cat"
• Playing Go:          f(board position) = "5-5" (next move)
• Dialogue System:     f("Hi") = "Hello"   (what the user said → system response)
An Activation Function decides whether a neuron should be activated or not. That is, using simple mathematical operations, it decides whether the neuron's input is important to the network's prediction.

The role of the activation function is to derive an output from the set of input values fed to a node (or a layer).

But let's take a step back and clarify: what exactly is a node?

If we compare a neural network to our brain, a node is the analogue of a neuron: it receives a set of input signals, the external stimuli.
Image Recognition: the goal is a function f with f(image) = "cat".

Framework:

A set of functions (the Model): {f1, f2, ⋯}. For example, given a picture of a cat, f1 outputs "cat" while f2 outputs "money"; given a picture of a dog, f1 outputs "dog" while f2 outputs "snake". f1 is the better function.

Goodness of function f: supervised learning uses training data, i.e. function inputs (images) paired with the desired function outputs ("monkey", "cat", "dog").

Training and testing: using the training data, pick the "best" function f* (Steps 1-3 below); at test time, apply it to new inputs: f*(image) = "cat".
Three Steps for Learning

Step 1: define a set of functions  (the Neural Network)
Step 2: goodness of function
Step 3: pick the best function

Neural Network
Neuron: a simple function

z = a1w1 + ⋯ + akwk + ⋯ + aKwK + b
a = σ(z)

where a1, …, aK are the inputs, w1, …, wK are the weights, b is the bias, and σ is the activation function.
Neuron with the Sigmoid Function as activation:

σ(z) = 1 / (1 + e^(−z))

Example: inputs (1, −1), weights (1, −2), bias 1:

z = 1·1 + (−1)·(−2) + 1 = 4,   σ(4) ≈ 0.98
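
A minimal sketch of this single neuron in Python (NumPy for the vector arithmetic); the inputs, weights, and bias are the values from the worked example above.

    import numpy as np

    def sigmoid(z):
        """Sigmoid activation: squashes any real z into (0, 1)."""
        return 1.0 / (1.0 + np.exp(-z))

    def neuron(a, w, b):
        """A single neuron: weighted sum of inputs plus bias, then activation."""
        z = np.dot(w, a) + b
        return sigmoid(z)

    # Values from the worked example: inputs (1, -1), weights (1, -2), bias 1.
    a = np.array([1.0, -1.0])
    w = np.array([1.0, -2.0])
    b = 1.0
    print(neuron(a, w, b))  # z = 4, sigma(4) ~ 0.98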
Neural Network

Different connections lead to different network structures. The neurons have different values of weights and biases; weights and biases are the network parameters 𝜃.

Activation functions
• Transform a neuron's input into its output
• Desirable features of an activation function:
  • A squashing effect, to prevent accelerating growth of activation levels through the network
  • Simple and easy to calculate
Fully Connected Feedforward Network

With sigmoid activations, feeding the input (1, −1) through a three-layer example network gives hidden activations (0.98, 0.12) in layer 1, then (0.86, 0.11) in layer 2, and the outputs (0.62, 0.83). Feeding in (0, 0) instead gives (0.73, 0.5), then (0.72, 0.12), and the outputs (0.51, 0.85).

This is a function: input vector in, output vector out.

f(1, −1) = (0.62, 0.83)        f(0, 0) = (0.51, 0.85)

Given parameters 𝜃, the network defines a function. Given a network structure, it defines a function set.
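
A sketch of the forward pass in Python. The layer-1 weights and bias below reproduce the example's first layer ((1, −1) ↦ (0.98, 0.12)); the deeper layers' weights are hypothetical placeholders, since the slide does not list them.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def feedforward(x, layers):
        """Apply each (W, b) layer in turn: a <- sigmoid(W a + b)."""
        a = x
        for W, b in layers:
            a = sigmoid(W @ a + b)
        return a

    layers = [
        (np.array([[1.0, -2.0], [-1.0, 1.0]]), np.array([1.0, 0.0])),   # layer 1 (from the slide)
        (np.array([[2.0, -1.0], [-2.0, -1.0]]), np.array([0.0, 0.0])),  # hypothetical
        (np.array([[3.0, -1.0], [-1.0, 4.0]]), np.array([0.0, 0.0])),   # hypothetical
    ]

    x = np.array([1.0, -1.0])
    W1, b1 = layers[0]
    print(sigmoid(W1 @ x + b1))      # [0.98, 0.12], matching the slide's first layer
    print(feedforward(x, layers))    # full pass (placeholder layers, so values differ)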
Fully Connected Feedforward Network: general structure

Input layer:   x1, x2, …, xN
Hidden layers: Layer 1, Layer 2, …, Layer L, each made of neurons
Output layer:  y1, y2, …, yM

Every neuron in a layer is connected to every neuron in the next layer, so the input vector is transformed layer by layer into the output vector.
Deep = Many hidden layers

Network         Year    Layers    Error rate
AlexNet         2012      8        16.4%
VGG             2014     19         7.3%
GoogleNet       2014     22         6.7%
Residual Net    2015    152         3.57%   (special structure)
Output Layer

Ordinary layer as output: yi = σ(zi). In general, the outputs of the network can be any values, which may not be easy to interpret.

Softmax layer as the output layer: the outputs behave like a probability distribution:
◼ 1 > yi > 0
◼ Σi yi = 1

Softmax:   yi = e^(zi) / Σj e^(zj)

Worked example with (z1, z2, z3) = (3, 1, −3):
e^3 ≈ 20,  e^1 ≈ 2.7,  e^(−3) ≈ 0.05,  sum ≈ 22.75
(y1, y2, y3) ≈ (0.88, 0.12, ≈0)
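
A direct NumPy rendering of the softmax above; subtracting max(z) before exponentiating is a standard numerical-stability trick and does not change the result.

    import numpy as np

    def softmax(z):
        """Softmax: exponentiate, then normalize so the outputs sum to 1."""
        e = np.exp(z - np.max(z))  # subtract max for numerical stability
        return e / e.sum()

    z = np.array([3.0, 1.0, -3.0])
    print(softmax(z))  # ~ [0.88, 0.12, 0.002], matching the worked example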
Example Application: Handwriting Digit Recognition

Input: a 16 × 16 = 256-pixel image, encoded as a 256-dim vector x1, …, x256 (ink → 1, no ink → 0).
Output: a 10-dim vector y1, …, y10, where each dimension represents the confidence of a digit: y1 is the confidence that the image "is 1", y2 that it "is 2", …, y10 that it "is 0".

For example, if (y1, y2, …, y10) = (0.1, 0.7, …, 0.2), the image is "2".

What is needed is a function with a 256-dim vector as input and a 10-dim vector as output; the neural network serves as that function.
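
A small sketch of the input encoding: a 16 × 16 binary image (ink → 1) flattened into the 256-dim vector the network expects. The image here is randomly generated, purely for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    image = (rng.random((16, 16)) > 0.8).astype(float)  # fake "ink" pattern

    x = image.reshape(256)   # 256-dim input vector (ink -> 1, no ink -> 0)
    print(x.shape)           # (256,)
    # A digit recognizer is then any function mapping this 256-dim vector
    # to a 10-dim vector of per-digit confidences.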
Example Application

The network structure (input layer x1, …, xN; hidden layers 1 … L; output layer y1 "is 1", …, y10 "is 0") defines a function set containing the candidates for handwriting digit recognition.

You need to decide the network structure so that your function set contains a good function.
FAQ

• Q: How many layers? How many neurons for each layer?
  A: Trial and error, plus intuition.
• Q: Can we design the network structure?
  A: Yes, e.g. the Convolutional Neural Network (CNN).
• Q: Can the structure be automatically determined?
  A: Yes, but this is not widely studied yet.
Three Steps for Deep Learning

Step 1: define a set of functions
Step 2: goodness of function
Step 3: pick the best function


Training Data

• Preparing training data: images and their labels, e.g.

  "5"  "0"  "4"  "1"
  "9"  "2"  "1"  "3"

The learning target is defined on the training data. With the 256-dim input (16 × 16 = 256; ink → 1, no ink → 0) and a softmax output layer y1, …, y10, the target is:

• Input a "1": y1 has the maximum value.
• Input a "2": y2 has the maximum value.
Loss

A good function should make the loss over all examples as small as possible.

Given a set of parameters, feed in an image of "1": the target is y1 = 1 and every other output 0, so y1 should be as close to 1 as possible and y2, …, y10 as close to 0 as possible. The loss 𝑙 can be the square error or the cross entropy between the network output and the target.
Total Loss

For all R training examples (x1, ŷ1), …, (xR, ŷR), the total loss is

L = Σ_{r=1…R} 𝑙r

where 𝑙r is the loss between the network output yr and the target ŷr. We want L to be as small as possible: find the function in the function set, i.e. the network parameters 𝜽*, that minimizes the total loss L.
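
A sketch of the two loss choices mentioned above and the total loss over a batch; the `outputs` and `targets` arrays are hypothetical illustration data (3 classes instead of 10, for brevity).

    import numpy as np

    def square_error(y, t):
        """Square error between network output y and target t."""
        return np.sum((y - t) ** 2)

    def cross_entropy(y, t, eps=1e-12):
        """Cross entropy between softmax output y and one-hot target t."""
        return -np.sum(t * np.log(y + eps))

    def total_loss(outputs, targets, loss=cross_entropy):
        """L = sum of the per-example losses l_r over all R examples."""
        return sum(loss(y, t) for y, t in zip(outputs, targets))

    # Hypothetical data: R = 2 examples, 3 classes.
    outputs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
    targets = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    print(total_loss(outputs, targets))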
Three Steps for Deep Learning

Step 1: define a set of functions
Step 2: goodness of function
Step 3: pick the best function


How to pick the best function

Find network parameters 𝜽∗ that minimize total loss L


Enumerating all possible values is infeasible: the network parameters 𝜃 = {w1, w2, w3, ⋯, b1, b2, b3, ⋯} number in the millions. E.g. in a speech recognition network with 8 layers and 1000 neurons per layer, each pair of adjacent layers alone contributes 1000 × 1000 = 10^6 weights.
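
A two-line check of that parameter count, for the hidden layers of the speech-recognition example above:

    layers = [1000] * 8   # 8 hidden layers of 1000 neurons each
    n_params = sum(m * n + n for m, n in zip(layers, layers[1:]))
    print(n_params)       # 7,007,000 weights + biases between the hidden layers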
Gradient Descent

Network parameters 𝜃 = {w1, w2, ⋯, b1, b2, ⋯}. Find the network parameters 𝜽* that minimize total loss L. For a single parameter w:

➢ Pick an initial value for w (random, or from pre-training; random is usually good enough).
➢ Compute 𝜕L/𝜕w: if negative, increase w; if positive, decrease w.
➢ Update: w ← w − η 𝜕L/𝜕w, where η is called the "learning rate".
➢ Repeat until 𝜕L/𝜕w is approximately zero (i.e. the update is tiny).
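
A minimal sketch of this update rule on a one-parameter toy problem; the quadratic loss L(w) = (w − 3)² and the learning rate 0.1 are illustrative choices, not from the slides.

    def gradient_descent(dL_dw, w0, eta=0.1, tol=1e-6, max_steps=10_000):
        """Repeat w <- w - eta * dL/dw until the gradient is ~0."""
        w = w0
        for _ in range(max_steps):
            g = dL_dw(w)
            if abs(g) < tol:
                break
            w -= eta * g
        return w

    # Toy loss L(w) = (w - 3)^2, so dL/dw = 2 (w - 3); minimum at w = 3.
    print(gradient_descent(lambda w: 2 * (w - 3), w0=0.0))  # ~ 3.0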
Local Minima

Depending on the value of the network parameter w, gradient descent can be very slow at a plateau (𝜕L/𝜕w ≈ 0), get stuck at a saddle point (𝜕L/𝜕w = 0), or get stuck at a local minimum (𝜕L/𝜕w = 0).

• Gradient descent never guarantees the global minimum: different initial points (e.g. w1 vs. w2) reach different minima, so they give different results.
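
A quick illustration of that initialization sensitivity on a double-well loss L(w) = (w² − 1)²; the function and starting points are illustrative.

    def gradient_descent(dL_dw, w0, eta=0.1, steps=2000):
        """Same update rule as above: w <- w - eta * dL/dw."""
        w = w0
        for _ in range(steps):
            w -= eta * dL_dw(w)
        return w

    # Double-well loss L(w) = (w^2 - 1)^2 has two minima, at w = -1 and w = +1.
    dL_dw = lambda w: 4 * w * (w**2 - 1)

    print(gradient_descent(dL_dw, w0=-0.5))  # ~ -1.0
    print(gradient_descent(dL_dw, w0=+0.5))  # ~ +1.0  (a different minimum)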
Gradient Descent

This is the "learning" of machines in deep learning ……
Even AlphaGo uses this approach.
People imagine …… Actually …..

I hope you are not too disappointed.


For example, you can do …….

• Image Recognition: a network that maps images to labels such as "monkey", "cat", "dog".

• Spam filtering: features such as whether "talk" or "free" appears in the e-mail are fed to a network that outputs 1/0 (Yes/No): 1 (Yes) for spam, 0 (No) otherwise.
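
A tiny sketch of how such a feature vector might be built; the two-word vocabulary and the sample e-mail are made up for illustration.

    # Hypothetical bag-of-words features for spam filtering.
    vocabulary = ["talk", "free"]
    email = "free money, free talk"
    x = [1.0 if word in email else 0.0 for word in vocabulary]  # -> [1.0, 1.0]
    # A trained network would map x to 1 (spam) or 0 (not spam).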
Backpropagation: an efficient way to compute 𝜕L/𝜕w in a neural network.

Backpropagation

Back Propagation algorithm – Illustration (slides by K Kotecha)

[Figure sequence, images not reproduced: forward phase → computing error → backward phase → weight update]
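
A compact sketch of those four phases for a one-hidden-layer sigmoid network with square-error loss; the shapes, data, and learning rate are illustrative assumptions, not values from the slides.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    x = np.array([1.0, -1.0])          # input (illustrative)
    t = np.array([1.0, 0.0])           # target (illustrative)
    W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
    W2, b2 = rng.normal(size=(2, 2)), np.zeros(2)
    eta = 0.5

    for step in range(100):
        # Forward phase: compute activations layer by layer.
        z1 = W1 @ x + b1; a1 = sigmoid(z1)
        z2 = W2 @ a1 + b2; y = sigmoid(z2)

        # Computing error: square-error loss between output and target.
        loss = 0.5 * np.sum((y - t) ** 2)

        # Backward phase: propagate error derivatives back through the layers.
        delta2 = (y - t) * y * (1 - y)             # dL/dz2
        delta1 = (W2.T @ delta2) * a1 * (1 - a1)   # dL/dz1

        # Weight update: gradient descent step on every parameter.
        W2 -= eta * np.outer(delta2, a1); b2 -= eta * delta2
        W1 -= eta * np.outer(delta1, x);  b1 -= eta * delta1

    print(loss)  # decreases toward 0 as y approaches the target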
Advantages & Disadvantages

Advantages
• Massively parallel in nature
• Fault (noise) tolerant, because of the parallelism
• Can be designed to be adaptive

Disadvantages
• No clear rules or design guidelines for arbitrary applications
• No general way to assess the internal operation of the network (an ANN system is therefore seen as a "black box")
• Difficult to predict future network performance (generalization)

ANN: When?

◼ The input is high-dimensional, discrete or real-valued
◼ The target function is real-valued, discrete-valued, or vector-valued
◼ The data are possibly noisy
◼ The form of the target function is unknown
◼ Human readability of the result is not (very) important
◼ Long training time is acceptable
◼ Short classification/prediction time is required
