06 NeuralNetworks 2024
Neural Networks
Dr. Ioannis Patras
School of EECS
• Neural Networks:
– Inspired by algorithms that try to mimic the brain
– Formalized as systems of interconnected “neurons”
– Each neuron does a very simple calculation
– Many layers of neurons put together.
– The collective can do qualitatively more powerful
computation than any individual unit.
Neural Nets now!
• Until the late 2000s, they were largely shunned.
– Now they are back!
• Superhuman performance at:
– Character recognition
– Face recognition
– Object recognition
– Atari video game playing
– Go playing(!)
• Pervasive:
– Run OCR in banks/mail.
– Every Facebook photo upload (millions per day) goes
through a deep CNN.
– Power Cortana's, Google's, and Siri's speech recognition.
In the Media
DeepMind Neural Atari
…and scandal!
Brain vs Computers
• The brain is massively parallel, but slow:
– 100 billion (10^11) neurons
– 100 trillion (10^14-10^15) connections (synapses)
– ~10^2 operations per second
• Contrast fast serial computers:
– 1-10^3 processors
– 1-10 billion operations per second
Von Neumann computer versus biological neural system
Artificial neural networks have an architecture roughly similar to the biological neural system.
Can think of a logistic regression unit as a neuron (function) that multiplies the
input by the parameters (weights) and squashes the resulting sum through the
sigmoid.
• Logistic Regression Unit => Perceptron if the weight strength goes to infinity, so the
sigmoid => step (see the sketch below).
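To make this concrete, here is a minimal Python sketch (the weight and input values are made up for illustration): a logistic unit squashes the weighted sum of its inputs through a sigmoid, and replacing the sigmoid with a hard threshold gives a perceptron.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def logistic_unit(theta, x):
    """Weighted sum of the inputs squashed through the sigmoid."""
    a = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(a)

def perceptron_unit(theta, x):
    """Same weighted sum, but with a hard step instead of a sigmoid."""
    a = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 if a > 0 else 0.0

# Illustrative weights: theta[0] acts on the constant bias input x0 = 1.
theta = [-1.0, 2.0, 2.0]
x = [1.0, 0.3, 0.8]              # x0 = 1 (bias), then x1, x2
print(logistic_unit(theta, x))   # soft output in (0, 1)
print(perceptron_unit(theta, x)) # hard 0/1 decision
```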
Feed Forward Neural Net
• Connected set of logistic regression units
– Arranged in layers.
– Each unit’s output is a non-linear function (e.g.,
sigmoid, step function) of a linear combination of its
inputs.
• Input layer size:
– Size of input to classify/regress
• Output layer size
– Size of target for classification/regression
• Any number of hidden neurons
• Any number of hidden layers
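A rough sketch of this layered structure in Python (the layer sizes and random weights below are illustrative, not from the slides): a forward pass simply repeats the same unit-level computation layer by layer.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weights):
    """Propagate an input through a stack of layers.

    weights is a list of (W, b) pairs, one per layer; each unit applies a
    sigmoid to a linear combination of the previous layer's outputs.
    """
    h = x
    for W, b in weights:
        h = sigmoid(W @ h + b)
    return h

# Toy network: 2 inputs -> 3 hidden units -> 1 output (sizes are arbitrary).
rng = np.random.default_rng(0)
weights = [(rng.normal(size=(3, 2)), np.zeros(3)),
           (rng.normal(size=(1, 3)), np.zeros(1))]
print(forward(np.array([0.5, -1.0]), weights))
```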
A Sigmoid/Logistic Neuron
[Figure: a logistic neuron with weights on the inputs followed by a sigmoid activation]
$$\mathbf{x} = \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}, \qquad h_\theta(\mathbf{x}) = \sigma(\theta^\top \mathbf{x}) = \frac{1}{1 + e^{-\theta^\top \mathbf{x}}}$$
A Single Neuron
[Figure: a single neuron with inputs x0, x1, x2 and output hθ(x)]
A Neural Network
[Figures: feed-forward networks of increasing depth. Inputs x0, x1, x2 feed a layer of hidden units h¹ⱼ(x); in the deeper variants these feed further hidden layers h²ⱼ(x) and h³ⱼ(x) before the output hθ(x)]
Can you draw a linear decision boundary to separate the two classes?
How can a NN help?
• It can combine decision boundaries at each
level of the network.
– The decision of one layer can become the input of
the next
• This can make more complex boundaries
overall.
• Let's look at some examples…
Logical Functions
• Neural AND?
• Neural OR?
• Neural XOR?
Logistic Unit AND
[Figure: a single logistic unit computing AND: bias input x0 = 1 with weight W0 = −1.5, inputs x1, x2 ∈ {0, 1} with weights W1 = W2 = 1, output hw(x)]
The weighted sum −1.5 + x1 + x2 is positive only when x1 = x2 = 1, so hw(x) > 0.5 only for that input.
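A quick Python check of the slide's weights (W0 = −1.5, W1 = W2 = 1): the sigmoid output exceeds 0.5 only when both inputs are 1.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def and_unit(x1, x2, w=(-1.5, 1.0, 1.0)):
    """Logistic unit with the AND weights from the slide (bias input x0 = 1)."""
    a = w[0] * 1.0 + w[1] * x1 + w[2] * x2
    return sigmoid(a)

for x1 in (0, 1):
    for x2 in (0, 1):
        out = and_unit(x1, x2)
        print(x1, x2, round(out, 2), out > 0.5)  # above 0.5 only for (1, 1)
```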
Neural Net XOR
[Figure: a two-layer network computing XOR: a hidden OR unit (bias −0.5, weights 1, 1) and a hidden AND unit (bias −1.5, weights 1, 1) over x1, x2, combined by an output unit with weights +1 and −1 that computes [OR] AND [NOT AND]]
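A sketch of the same construction in Python. The hidden OR and AND weights follow the slide; the output unit's weights (+1 on the OR path, −1 on the AND path, bias −0.5) and the gain factor that pushes each sigmoid towards a step are assumptions added so the soft units behave like logic gates.

```python
import math

def unit(bias, w1, w2, x1, x2, gain=10.0):
    """Logistic unit; the gain scales all weights so the sigmoid acts like a step."""
    a = gain * (bias + w1 * x1 + w2 * x2)
    return 1.0 / (1.0 + math.exp(-a))

def xor_net(x1, x2):
    h_or  = unit(-0.5, 1.0, 1.0, x1, x2)   # ~1 unless both inputs are 0
    h_and = unit(-1.5, 1.0, 1.0, x1, x2)   # ~1 only when both inputs are 1
    # Output: [OR] AND [NOT AND]; these output weights are an illustrative choice.
    return unit(-0.5, 1.0, -1.0, h_or, h_and)

for x1 in (0, 1):
    for x2 in (0, 1):
        out = xor_net(x1, x2)
        print(x1, x2, round(out, 2), out > 0.5)  # above 0.5 only for (0,1) and (1,0)
```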
How to Come up With these Weights?
[Figure: a three-layer network: inputs x0, x1, x2 feed hidden units h¹ⱼ(x), then h²ⱼ(x), then output units h³ⱼ(x)]
[Figure: a two-layer network over inputs x1, x2, with first-layer units h¹ⱼ(x) feeding a second-layer unit]
If the units had no non-linearity, the second-layer output would just be
$$h^2_1(\mathbf{x}) = \sum_j w^2_{1j}\, h^1_j(\mathbf{x}) = \sum_j w^2_{1j} \sum_i w^1_{ji}\, x_i = \sum_i \Big( \sum_j w^2_{1j}\, w^1_{ji} \Big)\, x_i$$
• No matter how many layers….
– Still a simple linear model
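A small numerical check of this point (the matrices below are arbitrary): composing two purely linear layers W1 and W2 gives exactly the single linear map W2·W1, so depth adds nothing without a non-linearity.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # first linear layer: 2 inputs -> 3 units
W2 = rng.normal(size=(1, 3))   # second linear layer: 3 units -> 1 output
x = rng.normal(size=2)

two_linear_layers = W2 @ (W1 @ x)     # "deep" but purely linear network
one_linear_layer  = (W2 @ W1) @ x     # equivalent single linear model
print(np.allclose(two_linear_layers, one_linear_layer))  # True
```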
[Figure: the same network with a bias unit h¹₀ feeding the second layer]
With the sigmoid non-linearity, a second-layer unit instead computes
$$h^2_1(\mathbf{x}) = \sigma\!\left(w^2_{10} + w^2_{11}\, h^1_1(\mathbf{x}) + w^2_{12}\, h^1_2(\mathbf{x}) + w^2_{13}\, h^1_3(\mathbf{x})\right)$$
The sigmoid activation is $\sigma(a) = \frac{1}{1 + e^{-a}}$, with derivative $\sigma'(a) = \sigma(a)\,(1 - \sigma(a))$.
For a single example, the error at output unit k is
$$E_n(\mathbf{w}) = 0.5\,\big(h(a_k) - y\big)^2$$
and the output error term is
$$\delta_k = \frac{\partial E_n(\mathbf{w})}{\partial a_k} = h'(a_k)\,\big(h(a_k) - y\big)$$
For a regression output with identity activation, $h(a_k) = a_k$ and $h'(a_k) = 1$, so $\delta_k = h(a_k) - y$.
Output Error Term: Binary Classifier
• Output error term: $\delta_k = \dfrac{\partial E_n(\mathbf{w})}{\partial a_k}$
• For a binary classifier:
– Cost: squared deviation (or cross-entropy)
– Activation: sigmoid
With squared error $E_n(\mathbf{w}) = 0.5\,\big(h(a_k) - y\big)^2$ and a sigmoid output,
$$\delta_k = \frac{\partial E_n(\mathbf{w})}{\partial a_k} = h'(a_k)\,\big(h(a_k) - y\big) = h(a_k)\,\big(1 - h(a_k)\big)\,\big(h(a_k) - y\big)$$
"% #! ! "% #! !
= ! " #! = ! " # #"!
!
"$ "!
#"!
"$ "!
# $!
Summary: Binary Classification
• Iterate:
– Propagate forward to find activations a and h(a) of all
internal and output units: $a_j = \sum_i w_{ji}\, h(a_i)$
– Evaluate error term δ for output units:
$$\delta_k = h'(a_k)\,\big(h(a_k) - y_k\big)$$
– Backpropagate output δs to obtain internal δs.
! " = %# "$ " !" #!"! !
!
– Use δs to get a gradient update for each weight:
"% #! ! "% #! !
= ! " #! = ! " # #"!
!
"$ "!
#"!
"$ "!
# $!
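A compact sketch of one such training loop for a single-hidden-layer binary classifier, using squared error and sigmoid units as above (the network sizes, data, learning rate, and the omission of bias terms are simplifications for illustration):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy problem: 2 inputs -> 3 sigmoid hidden units -> 1 sigmoid output.
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 3))
x, y = np.array([0.5, -1.0]), 1.0
eta = 0.1  # learning rate

for step in range(100):
    # Forward propagation: activations a and outputs h(a) of every unit.
    a1 = W1 @ x;  h1 = sigmoid(a1)
    a2 = W2 @ h1; h2 = sigmoid(a2)

    # Output error term: delta_k = h'(a_k) (h(a_k) - y) with a sigmoid output.
    delta2 = h2 * (1 - h2) * (h2 - y)

    # Backpropagate to the hidden units: delta_j = h'(a_j) * sum_k w_kj delta_k.
    delta1 = h1 * (1 - h1) * (W2.T @ delta2)

    # Gradient update for each weight: dE/dw_ji = delta_j * h(a_i) (x_i for layer 1).
    W2 -= eta * np.outer(delta2, h1)
    W1 -= eta * np.outer(delta1, x)

print(sigmoid(W2 @ sigmoid(W1 @ x)))  # output moves towards the target y = 1
```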
Training Overview
• Implement forward propagation to get h(x) for
any x.
• Implement cost function computation.
• Implement backpropagation to compute
partial derivatives.
• Iterate forward & backward propagation.
Summary
• Training neural nets:
– Use backpropagation to minimise their cost
function.
– Errors at later nodes are backpropagated to
compute errors at earlier nodes.
– Errors at each node give gradient update for that
node.
Overview
• Neural Nets Intro
• From Neurons to Neural Nets
• Multi-Layer Neural Nets for Non-Linear
Functions
• Neural Net Prediction Details
• Training Neural Nets
• More considerations
Batch vs Online
• The updates on the past couple of slides are gradients
with respect to one single training example n.
– Online Gradient Descent:
• Iterate over data n, and weights w(i,j):
$$w_{ij} \leftarrow w_{ij} - \eta\, \frac{\partial E_n(\mathbf{w})}{\partial w_{ij}}$$
– Batch Gradient Descent:
• Iterate over weights w(i,j):
"%! "! !
$ "# = $ "# # $ !
! "$ "#
Gradient Checking
– When implementing gradient-descent algorithms like
backprop….
– May be useful to check correctness of your derivatives
numerically.
The backprop gradient and the numerical estimate below should be equal:
$$\frac{\partial E_n(\mathbf{w})}{\partial w_{ji}} = \delta_j\, h(a_i)$$
– Can implement finite difference numerical
differentiation to check them
$$\frac{\partial E_n(\mathbf{w})}{\partial w_{ij}} \approx \frac{E_n(w_{ij} + \epsilon) - E_n(w_{ij} - \epsilon)}{2\epsilon}$$
– I.e., perturb the current weight and re-compute the
network’s error. Get gradient from this change in error.
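A minimal sketch of such a check, assuming a function error(w) that runs forward propagation and returns En for a flat weight vector w (a placeholder for your own network code):

```python
import numpy as np

def numerical_gradient(error, w, eps=1e-5):
    """Finite-difference estimate of dEn/dw_i, perturbing each weight in turn.

    w is assumed to be a flat 1-D vector of all the network's weights.
    """
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (error(w_plus) - error(w_minus)) / (2 * eps)
    return grad

# Usage (grad_from_backprop is a hypothetical array from your backprop code):
# assert np.allclose(numerical_gradient(error, w), grad_from_backprop, atol=1e-4)
```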
Gradient Checking
– Aside: You could also implement gradient descent
by numerical differentiation, but slow!
• Each forward propagation costs O(W).
• Each weight must be perturbed individually at cost O(W).
– => Overall cost O(W²)
$$\frac{\partial E_n(\mathbf{w})}{\partial w_{ij}} \approx \frac{E_n(w_{ij} + \epsilon) - E_n(w_{ij} - \epsilon)}{2\epsilon}$$