
EECS708: Machine Learning

Neural Networks
Dr. Ioannis Patras
School of EECS

Slides thanks to: Dr. Tim Hospedales


Course Context
• Supervised Learning
– (Linear) regression
– Logistic Regression (Classification)
– Neural Networks
• Unsupervised
– Clustering
– Density Estimation
– Dimensionality reduction (partial)
• Advanced topics
– Deep Learning, Convolutional Neural Networks
– Ensemble Learning
Overview
• Neural Nets Intro
• From Neurons to Neural Nets
• Multi-Layer Neural Nets for Non-Linear
Functions
• Neural Net Prediction Details
• Training Neural Nets
• More considerations
Intro to Neural Nets
• The term ‘neural network’ has origins in attempts to find
mathematical representations of information processing in biological
systems.
– (McCulloch and Pitts, 1943; Widrow and Hoff, 1960; Rosenblatt, 1962; Rumelhart et al., 1986).

• Neural Networks:
– Inspired by algorithms that try to mimic the brain
– Formalized as systems of interconnected “neurons”
– Each neuron does a very simple calculation
– Many layers of neurons put together.
– The collective can do qualitatively more powerful
computation than any individual unit.
Neural Nets now!
• Until late 00’s, they were largely shunned.
– Now they are back!
• Superhuman performance at:
– Character recognition
– Face recognition
– Object recognition
– Atari video game playing
– Go playing(!)
• Pervasive:
– Run OCR in banks and mail sorting.
– Every Facebook photo upload (millions per day) goes
through a Deep CNN.
– Power Cortana, Google and Siri speech recognition.
In the Media
Deepmind Neural Atari
…and scandal!
Brain vs Computers
• Brain is massively parallel, but slow
– 100 billion (10^11) neurons
– 100 trillion (10^14–10^15) connections (synapses)
– ~10^2 operations per second
• Contrast fast serial computers
– 1–10^3 processors
– 1–10 billion operations per second
Von Neumann computer versus
biological neural system

Artificial neural
networks have an
architecture roughly
similar to the
biological neural
system



Real Neurons in Real Brains
• The Brain inspires NNs.
• In turn NNs and ML
could (??) help us
understand how the
brain works
• Brains do massively
parallel computation.
• Synapses change
strength (cf weight
change in ML model)
• Local signals used to
change the strength
Real Neurons in Real Brains
• Dendrites collect
incoming signals
• Polarization builds up in
cell…
• After reaching a charge
threshold, neuron fires
• Impulse travels down
Axon
• And into other neurons
via synapses
• …and repeat
Real Neurons in Real Brains
• Real neurons use
spikes not continuous
outputs
• However, they have a
minimum (0) and
maximum firing rate
• Artificial neurons can
roughly be considered
to simulate natural
neurons at firing rate
level
Artificial Neural Networks
• Nodes (cf neurons)
• Adaptive Weights (cf synaptic strength)
• Defined by:
– Interconnection pattern between layers of neurons
– The activation function that converts a neuron’s
weighted input to its output activation.
– The learning process for updating the interconnection
weights

• Can model arbitrarily complex non-linear functions of inputs.
Fixed Versus Adaptive Basis Functions
• Previously we used fixed linear combinations
of features.
• Limited by the curse of dimensionality
• In contrast, Neural Nets adapt the parameters
of the basis functions during training.
Overview
• Neural Nets Intro
• From Neurons to Neural Nets
• Multi-Layer Neural Nets for Non-Linear
Functions
• Neural Net Prediction Details
• Training Neural Nets
• More considerations
Perceptron: A Simple Model Neuron
• Uses the step function as activation
– Linear Threshold Unit (LTU)
Neuron as a Logistic Regression Unit

Can think of a logistic regression unit as a neuron (function) that multiplies the
input by the parameters (weights) and squashes the resulting sum through the
sigmoid.
• Logistic Regression Unit => Perceptron if the weight magnitudes go to infinity, so the
sigmoid => step function.
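As an illustration (not from the slides), a minimal Python/NumPy sketch of a single logistic-regression unit viewed as a neuron: multiply the input by the weights, sum, and squash through the sigmoid. The names and numbers are made up.

import numpy as np

def sigmoid(a):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w):
    # x: inputs including the bias term x0 = 1; w: weights of the same length
    a = np.dot(w, x)      # weighted sum of the inputs
    return sigmoid(a)     # output activation

# Example: x = [x0 = 1, x1, x2] with weights [w0, w1, w2]
print(neuron(np.array([1.0, 0.5, -0.2]), np.array([-1.0, 2.0, 3.0])))

Scaling all weights up by a large factor pushes the sigmoid towards a hard 0/1 step, which is the sense in which the logistic unit becomes a perceptron.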
Feed Forward Neural Net
• Connected set of logistic regression units
– Arranged in layers.
– Each unit’s output is a non-linear function (e.g.,
sigmoid, step function) of a linear combination of its
inputs.
• Input layer size:
– Size of input to classify/regress
• Output layer size
– Size of target for classification/regression
• Any number of hidden neurons
• Any number of hidden layers
A Sigmoid/Logistic Neuron
• Weights w = [w_0, w_1, w_2]^T, sigmoid activation:
x = [x_0, x_1, x_2]^T
h_w(x) = σ(w^T x) = 1 / (1 + e^(−w^T x))
A Single Neuron
[Diagram: a single neuron with inputs x_0, x_1, x_2 and output h_θ(x)]
A Neural Network
[Diagram: input layer (x_0, x_1, x_2) fully connected to an output layer of units h^1_1(x), h^1_2(x), h^1_3(x). Input Layer → Output Layer]


A Multi-Layer Neural Network
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → output layer (h^2_1, h^2_2, h^2_3)]


A Deep Neural Network
[Diagram: input layer (x_0, x_1, x_2) → hidden layer 1 (h^1_1, h^1_2, h^1_3) → hidden layer 2 (h^2_1, h^2_2, h^2_3) → output layer (h^3_1, h^3_2, h^3_3)]


Summary:
• Neural nets as interconnected layers of (e.g.,
sigmoid / logistic regression) neurons/units.
• Layer l+1’s inputs are layer l’s outputs.
• Can have any number of layers.
Overview
• Neural Nets Intro
• From Neurons to Neural Nets
• Multi-Layer Neural Nets for Non-Linear
Functions
• Neural Net Prediction Details
• Training Neural Nets
• More considerations
Linearly Non-Separable Problems
E.g., the famous XOR problem

Can you draw a linear decision boundary to separate the two classes?
How can a NN help?
• It can combine decision boundaries at each
level of the network.
– The decision of one layer can become the input of
the next
• This can make more complex boundaries
overall.
• Let’s look at some examples…
Logical Functions
• Neural AND?
• Neural OR?
• Neural XOR?
Logistic Unit AND
[Diagram: a single unit with bias input x_0 = 1 and inputs x_1, x_2 ∈ {0, 1}; weights w_0 = −1.5, w_1 = 1, w_2 = 1; output h_w(x)]
• Recall h_w(x) = 1 if w_1 x_1 + w_2 x_2 + w_0 > 0, otherwise h_w(x) = 0.
• What weights will give AND? (The weights above do: the sum exceeds 0 only when x_1 = x_2 = 1.)
Logistic Unit OR
[Diagram: a single unit with bias input x_0 = 1 and inputs x_1, x_2 ∈ {0, 1}; weights w_0 = −0.5, w_1 = 1, w_2 = 1; output h_w(x)]
• Recall h_w(x) = 1 if w_1 x_1 + w_2 x_2 + w_0 > 0, otherwise h_w(x) = 0.
• What weights will give OR? (The weights above do: the sum exceeds 0 whenever at least one input is 1.)
Logistic Unit XOR?
[Diagram: a single unit with bias input x_0 = 1 and inputs x_1, x_2 ∈ {0, 1}; output h_w(x)]
• Recall h_w(x) = 1 if w_1 x_1 + w_2 x_2 + w_0 > 0, otherwise h_w(x) = 0.
• What weights will give XOR? (No single unit can: XOR is not linearly separable.)
Neural Net XOR
• Combine an OR unit and an AND unit in a hidden layer, then combine their outputs: XOR = [OR] AND [NOT AND].
[Diagram: two-layer network with bias inputs x_0 = 1 and x^1_0 = 1.
Hidden “OR” unit: weights w_0 = −0.5, w_1 = 1, w_2 = 1.
Hidden “AND” unit: weights w_0 = −1.5, w_1 = 1, w_2 = 1.
Output “[OR] AND [NOT AND]” unit: bias −0.5, weight +1 from the OR unit, weight −1 from the AND unit.]
• The output fires exactly when the OR unit fires and the AND unit does not, i.e. when x_1 ≠ x_2.
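Below is a small Python/NumPy sketch (not from the slides) of this XOR network with step-activation units, using the weights as reconstructed above; the function names are illustrative.

import numpy as np

def step(a):
    # Linear threshold unit: fire iff the weighted sum is positive
    return (a > 0).astype(float)

def xor_net(x1, x2):
    x = np.array([1.0, x1, x2])                 # prepend bias input x0 = 1
    h_or  = step(np.dot([-0.5, 1.0, 1.0], x))   # hidden unit: OR
    h_and = step(np.dot([-1.5, 1.0, 1.0], x))   # hidden unit: AND
    h = np.array([1.0, h_or, h_and])            # hidden outputs plus hidden bias unit
    return step(np.dot([-0.5, 1.0, -1.0], h))   # output: [OR] AND [NOT AND]

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))    # prints the XOR truth table: 0, 1, 1, 0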
How to Come up With these Weights?
[Diagram: input layer (x_0, x_1, x_2) → hidden layer 1 (h^1_1, h^1_2, h^1_3) → hidden layer 2 (h^2_1, h^2_2, h^2_3) → output layer (h^3_1, h^3_2, h^3_3)]


Backpropagation Algorithm. Coming up next…
Summary:
• Single neurons can only do linear decision
boundaries.
• Networks of neurons can do (arbitrarily)
complicated non-linear decisions.
Overview
• Neural Nets Intro
• From Neurons to Neural Nets
• Multi-Layer Neural Nets for Non-Linear
Functions
• Neural Net Prediction Details
• Training Neural Nets
• More considerations
Typical Activations
• Each neuron computes a_j = Σ_i w_ji z_i + w_j0, then outputs h(a_j).

• Activation functions h(a):
– Perceptron (step-function activation): h(a) = +1 if a > 0, −1 if a < 0
• Issue: not differentiable w.r.t. the weights.
– Sigmoid: h(a) = 1 / (1 + e^(−a))
– tanh: h(a) = (e^a − e^(−a)) / (e^a + e^(−a))
– Identity: h(a) = a
[Diagram: a single unit with inputs x_0, x_1, x_2 and output h_w(x)]
Typical Activations
• Each neuron computes a_j = Σ_i w_ji z_i + w_j0, then outputs h(a_j).

• Activation functions h(a):
– Rectified Linear Unit (ReLU): h(a) = max(0, a)
• Widely used in Deep Neural Nets
• Accelerates convergence during training
– Leaky ReLU: like ReLU, but with a small non-zero slope for a < 0
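For reference, the activation functions above written as a short Python/NumPy sketch (illustrative; the leaky-ReLU slope alpha is a chosen constant, not something specified on the slides):

import numpy as np

def step(a):                     # perceptron / LTU: +1 if a > 0, else -1
    return np.where(a > 0, 1.0, -1.0)

def sigmoid(a):                  # 1 / (1 + e^-a), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):                     # (e^a - e^-a) / (e^a + e^-a), output in (-1, 1)
    return np.tanh(a)

def relu(a):                     # max(0, a)
    return np.maximum(0.0, a)

def leaky_relu(a, alpha=0.01):   # small slope alpha for a < 0
    return np.where(a > 0, a, alpha * a)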
Unpacking the Prediction Made
• What’s the prediction made?
h^1_1(x) = h(w^1_10 + w^1_11 x_1 + w^1_12 x_2)
h^1_2(x) = h(w^1_20 + w^1_21 x_1 + w^1_22 x_2)
h^1_3(x) = h(w^1_30 + w^1_31 x_1 + w^1_32 x_2)
h^2_1(x) = h(w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3)
• Written with sums over the units of each layer:
h^2_1(x) = h( Σ_{j=0..M} w^2_1j h^1_j(x) ) = h( Σ_{j=0..M} w^2_1j h( Σ_{i=0..D} w^1_ji x_i ) )
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_0 bias unit, h^1_1, h^1_2, h^1_3) → output layer (h^2_1). Input Layer → Hidden Layer → Output Layer]
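As an illustration (not from the slides), a minimal Python/NumPy forward-propagation sketch for the 2-3-1 network above; the weight values are random placeholders:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, w2):
    # x: inputs [x1, x2]; W1: 3x3 hidden weights (first column = biases); w2: 4 output weights
    z = np.concatenate(([1.0], x))     # prepend bias input x0 = 1
    h1 = sigmoid(W1 @ z)               # hidden activations h^1_j = h(sum_i w^1_ji x_i)
    h1 = np.concatenate(([1.0], h1))   # prepend hidden bias unit h^1_0 = 1
    return sigmoid(w2 @ h1)            # output h^2_1(x)

W1 = np.random.uniform(-0.1, 0.1, (3, 3))   # example weights, chosen at random
w2 = np.random.uniform(-0.1, 0.1, 4)
print(forward(np.array([0.5, -1.0]), W1, w2))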


Aside: Non-linearities are important.
• What’s the prediction made?
– Suppose we used the linear (identity) activation h(a) = a.
• What happens to the prediction?
h^2_1(x) = h( Σ_j w^2_1j h^1_j(x) ) = Σ_j w^2_1j Σ_i w^1_ji x_i = Σ_i w̃_i x_i   (for some effective weights w̃)
• No matter how many layers…
– it is still a simple linear model of the inputs.
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → output layer (h^2_1)]
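A quick numeric check of this point (an illustrative sketch; the matrices are arbitrary): with identity activations, two layers collapse to a single matrix of effective weights.

import numpy as np

rng = np.random.default_rng(0)
x  = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))     # "hidden layer" weights
w2 = rng.normal(size=(1, 4))     # "output layer" weights

deep_linear  = w2 @ (W1 @ x)     # two linear layers, no non-linearity
single_layer = (w2 @ W1) @ x     # one equivalent linear layer with weights w~ = w2 W1
print(np.allclose(deep_linear, single_layer))   # True: still a linear model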


Neural Nets can do many tasks
• Single output regression
• Multiple output regression
• Binary classification.
• Multiclass classification.
• Binary multi-label classification.

• … by changing the activation function of the output layer.
Neural Nets can do many tasks:
Single output regression
• Single output regression
– Identity output activation: h^2(a) = a
• Linear combination of the previous layer; range is (−∞, +∞).

h^1_1(x) = h(w^1_10 + w^1_11 x_1 + w^1_12 x_2)
h^1_2(x) = h(w^1_20 + w^1_21 x_1 + w^1_22 x_2)
h^1_3(x) = h(w^1_30 + w^1_31 x_1 + w^1_32 x_2)
h^2_1(x) = w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → single output unit h^2_1(x)]


Neural Nets can do many tasks:
Multiple output regression
• Multiple output regression
– Identity output activation: h^2(a) = a
• Linear combination of the previous layer; range is (−∞, +∞).
– Use multiple output units.

h^2_1(x) = w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3
h^2_2(x) = w^2_20 + w^2_21 h^1_1 + w^2_22 h^1_2 + w^2_23 h^1_3
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → two output units h^2_1(x), h^2_2(x)]


Neural Nets can do many tasks:
Binary classification
• Binary classification
– Sigmoid output activation: h^2(a) = 1 / (1 + e^(−a))
• Linear combination of the previous layer squashed into the range [0, 1].

h^2_1(x) = σ(w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3)
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → single sigmoid output h^2_1(x)]


Neural Nets can do many tasks:
Multiclass/Multi-label classification
• Multiclass classification
– Sigmoid output activation: h^2(a) = 1 / (1 + e^(−a))
• Linear combination of the previous layer squashed into the range [0, 1].
– Use multiple output units in 1-of-K encoding.

h^2_1(x) = σ(w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3)
h^2_2(x) = σ(w^2_20 + w^2_21 h^1_1 + w^2_22 h^1_2 + w^2_23 h^1_3)
h^2_3(x) = σ(w^2_30 + w^2_31 h^1_1 + w^2_32 h^1_2 + w^2_33 h^1_3)
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → three sigmoid output units]


Neural Nets can do many tasks:
Multiclass/Multi-label classification
• Multiclass classification
– Sigmoid output activation: h^2(a) = 1 / (1 + e^(−a))
• Linear combination of the previous layer squashed into the range [0, 1].
– Use multiple output units in 1-of-K encoding.
• I.e., the training set would use targets such as
– y = [1, 0, 0] : person
– y = [0, 1, 0] : car
– y = [0, 0, 1] : truck
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → three sigmoid output units]


Summary:
• Non-linear activations are essential
• Different activations and different structures
for different tasks
Overview
• Neural Nets Intro
• From Neurons to Neural Nets
• Multi-Layer Neural Nets for Non-Linear
Functions
• Neural Net Prediction Details
• Training Neural Nets
• More considerations
Cost Functions: Regression
• As with linear & logistic regression, to train a
Neural Net we’ll need a cost function.
– And training data D={x,y} pairs….
• Regression prediction:
h^2_1(x) = h(w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3)
         = w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3    (identity output activation)
• Regression cost: MSE
E(w) = (1 / 2N) Σ_{n=1..N} ( h(x_n) − y_n )^2
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → output h^2_1(x)]


Cost Functions: Classification
• As with linear & logistic regression, to train a
Neural Net we’ll need a cost function.
– And training data D={x,y} pairs….
• Classification prediction:
h^2_1(x) = h(w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3)
         = σ(w^2_10 + w^2_11 h^1_1 + w^2_12 h^1_2 + w^2_13 h^1_3)    (sigmoid output activation)
• Classification cost: cross-entropy
E(w) = −(1 / N) Σ_{n=1..N} [ y_n log h(x_n) + (1 − y_n) log(1 − h(x_n)) ]
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → output h^2_1(x)]
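As a quick sketch (Python/NumPy, not from the slides), the two costs might be computed as follows, assuming `predictions` holds the network outputs h(x_n) and `targets` the labels y_n:

import numpy as np

def mse_cost(predictions, targets):
    # Regression: E(w) = 1/(2N) * sum_n (h(x_n) - y_n)^2
    return 0.5 * np.mean((predictions - targets) ** 2)

def cross_entropy_cost(predictions, targets, eps=1e-12):
    # Binary classification: E(w) = -1/N * sum_n [ y_n log h(x_n) + (1-y_n) log(1-h(x_n)) ]
    p = np.clip(predictions, eps, 1 - eps)   # avoid log(0)
    return -np.mean(targets * np.log(p) + (1 - targets) * np.log(1 - p))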


Computing the Cost
• To train NN, find weights to minimise the cost.
• We know how to write code to compute E(w)
– 1. Do the “forward propagation” and make
predictions.
– 2. Use predictions to compute cost.
• …. But how to minimise it efficiently?
Regression:      h^2(x) = w^2 h^1(W^1 x)         E(w) = (1 / 2N) Σ_n ( h(x_n) − y_n )^2
Classification:  h^2(x) = σ( w^2 h^1(W^1 x) )    E(w) = −(1 / N) Σ_n [ y_n log h(x_n) + (1 − y_n) log(1 − h(x_n)) ]
Minimising the Cost
• Cost is Non-Convex
Neural Net Cost is Non Convex
• Cost is Non-Convex:
– Many local minima; no guarantee of finding the global minimum.
– Requires iterative gradient methods to train.
– Gradient descent will only converge to a local minimum.
How to Optimise Cost?
• For Linear/Logistic regression, each weight
had a straightforward derivative.
• For neural net, complicated by many layers of
interdependent weights.

Regression:      h^2(x) = w^2 h^1(W^1 x)         E(w) = (1 / 2N) Σ_n ( h(x_n) − y_n )^2
Classification:  h^2(x) = σ( w^2 h^1(W^1 x) )    E(w) = −(1 / N) Σ_n [ y_n log h(x_n) + (1 − y_n) log(1 − h(x_n)) ]
Backpropagation Algorithm
• Derivative of the cost w.r.t. any weight w_ji:
– Depends on the activation a_j: the linear combination of
inputs before the non-linearity.
– Activation:
• Every neuron computes something like a_j = Σ_i w_ji z_i
– (the z_i may be inputs or a previous layer’s outputs)
• and then applies the non-linearity h(a_j).
– Weight update via activations (chain rule):
∂E_n/∂w_ji = (∂E_n/∂a_j) (∂a_j/∂w_ji) = (∂E_n/∂a_j) z_i = δ_j z_i
Error term: δ_j ≡ ∂E_n/∂a_j
Backpropagation Algorithm
• Iterate:
– Propagate forward to find activations a and h(a) of
all internal and output units. [Done]
– Evaluate error term δ for output units [TBD].
– Backpropagate output δs to obtain internal δs:
δ_j = h'(a_j) Σ_k w_kj δ_k      (the activation h must be differentiable!)
– Use δs to get a gradient update for each weight:
∂E_n/∂w_ji = δ_j z_i
Output Error Term: Regression
• Output error term: δ_k = ∂E_n/∂a_k
• For regression (linear output):
– Cost: squared deviation: E_n(w) = 0.5 ( h(a_k) − y )^2
– Activation: identity: h(a_k) = a_k, so h'(a_k) = 1

∂E_n/∂a_k = h'(a_k) ( h(a_k) − y )   ⇒   δ_k = ( h(a_k) − y )
Output Error Term: Binary Classifier
• Output error term: δ_k = ∂E_n/∂a_k
• For a binary classifier:
– Cost: squared deviation (or cross-entropy): E_n(w) = 0.5 ( h(a_k) − y )^2
– Activation: sigmoid: h(a_k) = σ(a_k) = 1 / (1 + e^(−a_k)), so h'(a_k) = h(a_k) (1 − h(a_k))

∂E_n/∂a_k = h'(a_k) ( h(a_k) − y )   ⇒   δ_k = h(a_k) (1 − h(a_k)) ( h(a_k) − y )
Summary: Regression
• Iterate:
– Propagate forward to find activations a and h(a) of
all internal and output units: h^2(x) = w^2 h^1(W^1 x)
– Evaluate error term δ for output units: δ_k = ( h(a_k) − y )
– Backpropagate output δs to obtain internal δs:
δ_j = h'(a_j) Σ_k w_kj δ_k
– Use δs to get a gradient update for each weight:
∂E_n/∂w_ji = δ_j z_i      (z_i is the input feeding that weight: x_i in the first layer, h^1_i in the second)
Summary: Binary Classification
• Iterate:
– Propagate forward to find activations a and h(a) of all
internal and output units: h^2(x) = σ( w^2 h^1(W^1 x) )
– Evaluate error term δ for output units:
δ_k = h'(a_k) ( h(a_k) − y ) = h(a_k) (1 − h(a_k)) ( h(a_k) − y )
– Backpropagate output δs to obtain internal δs:
δ_j = h'(a_j) Σ_k w_kj δ_k
– Use δs to get a gradient update for each weight:
∂E_n/∂w_ji = δ_j z_i      (z_i is the input feeding that weight: x_i in the first layer, h^1_i in the second)
Training Overview
• Implement forward propagation to get h(x) for
any x.
• Implement cost function computation.
• Implement backpropagation to compute
partial derivatives.
• Iterate forward & backward propagation.
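Putting the recipe together, here is a compact, illustrative training sketch in Python/NumPy for a one-hidden-layer binary classifier with sigmoid units and squared-error loss; the layer size, learning rate, number of epochs and data are made up for the example.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train(X, y, n_hidden=3, lr=0.5, epochs=2000, eps=0.5):
    rng = np.random.default_rng(0)
    N, D = X.shape
    W1 = rng.uniform(-eps, eps, (n_hidden, D + 1))   # hidden weights (column 0 = bias)
    w2 = rng.uniform(-eps, eps, n_hidden + 1)        # output weights (index 0 = bias)
    Xb = np.hstack([np.ones((N, 1)), X])             # prepend bias input x0 = 1
    for _ in range(epochs):
        for n in range(N):                           # online gradient descent
            # forward propagation
            h1 = sigmoid(W1 @ Xb[n])                 # hidden activations
            h1b = np.concatenate(([1.0], h1))
            out = sigmoid(w2 @ h1b)                  # output h^2(x)
            # output error term (squared error + sigmoid): delta_k = h'(a_k)(h(a_k) - y)
            delta_out = out * (1 - out) * (out - y[n])
            # backpropagate to hidden error terms: delta_j = h'(a_j) * w_kj * delta_k
            delta_hid = h1 * (1 - h1) * (w2[1:] * delta_out)
            # gradient updates: dE_n/dw_ji = delta_j * z_i
            w2 -= lr * delta_out * h1b
            W1 -= lr * np.outer(delta_hid, Xb[n])
    return W1, w2

# Example usage: try to learn XOR from the four possible inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)
W1, w2 = train(X, y)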
Summary
• Training neural nets:
– Use backpropagation to minimise their cost
function.
– Errors at later nodes are backpropagated to
compute errors at earlier nodes.
– Errors at each node give gradient update for that
node.
Overview
• Neural Nets Intro
• From Neurons to Neural Nets
• Multi-Layer Neural Nets for Non-Linear
Functions
• Neural Net Prediction Details
• Training Neural Nets
• More considerations
Batch vs Online
• The updates on the past couple of slides are gradients
w.r.t. one single training example n.
– Online Gradient Descent:
• Iterate over examples n and weights w_ij:
w_ij ← w_ij − η ∂E_n/∂w_ij
– Batch Gradient Descent:
• Iterate over weights w_ij, summing the gradient over all examples:
w_ij ← w_ij − η Σ_n ∂E_n/∂w_ij
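The two schemes as a small runnable Python/NumPy sketch; the per-example gradient `grad` is an illustrative stand-in (a linear model with squared error), not the network gradient from the slides.

import numpy as np

def grad(w, x, y):
    # Illustrative per-example gradient dE_n/dw, here for a linear model with squared error
    return (w @ x - y) * x

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(20, 3)), rng.normal(size=20)
eta = 0.1

# Online (stochastic) gradient descent: update after every single example n
w_online = np.zeros(3)
for n in range(len(Y)):
    w_online = w_online - eta * grad(w_online, X[n], Y[n])   # w <- w - eta * dE_n/dw

# Batch gradient descent: sum the per-example gradients, then take one step
w_batch = np.zeros(3)
w_batch = w_batch - eta * sum(grad(w_batch, X[n], Y[n]) for n in range(len(Y)))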
Gradient Checking
– When implementing gradient-descent algorithms like
backprop…
– It may be useful to check the correctness of your derivatives
numerically against the backprop gradient ∂E_n/∂w_ji = δ_j z_i (they should be equal).
– Can implement finite-difference numerical
differentiation to check them:
∂E_n/∂w_ji ≈ [ E_n(w_ji + ε) − E_n(w_ji − ε) ] / (2ε)
– I.e., perturb the current weight and re-compute the
network’s error. Get the gradient from this change in error.
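A sketch of this finite-difference check in Python/NumPy; `cost` stands for the full forward-pass-plus-cost computation E_n(w), and a toy quadratic cost is used here so the snippet runs on its own.

import numpy as np

def numerical_gradient(cost, w, eps=1e-5):
    # Perturb each weight in turn and re-compute the cost:
    # dE/dw_i ~ (E(w_i + eps) - E(w_i - eps)) / (2 eps)
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (cost(w_plus) - cost(w_minus)) / (2 * eps)
    return grad

# Toy check: cost(w) = 0.5 * ||w||^2, whose true gradient is w itself
w = np.array([0.3, -1.2, 2.0])
cost = lambda w: 0.5 * np.sum(w ** 2)
print(np.allclose(numerical_gradient(cost, w), w))   # should print True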
Gradient Checking
– Aside: You could also implement gradient descent
by numerical differentiation, but slow!
• Each forward propagation costs O(W).
• Each weight must be perturbed individually at cost O(W).
– => Overall cost O(W^2)
∂E_n/∂w_ij ≈ [ E_n(w_ij + ε) − E_n(w_ij − ε) ] / (2ε)

– I.e., perturb the current weight and re-compute the
network’s error. Get the gradient from this change in error.
Initialization
• For gradient methods need to pick initial
weight vectors w.
– Zero initialization:
• Every hidden unit gets the same input and computes the same output.
• Backprop then gives them all identical updates: nothing breaks the symmetry.
• The network never learns anything useful.
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → output layer (h^2_1)]


Initialization => Symmetry Breaking
• For gradient methods need to pick initial
weight vectors w.
– Zero initialization:
• Every hidden unit gets the same input and computes the same output.
• Backprop then gives them all identical updates: nothing breaks the symmetry.
• The network never learns anything useful.
• Solution:
– Initialise each weight randomly in [−ε, ε].
[Diagram: input layer (x_0, x_1, x_2) → hidden layer (h^1_1, h^1_2, h^1_3) → output layer (h^2_1)]


Local Minima
• Neural Net costs are not Convex
• Solutions:
1. Accept local minima.
2. Online rather than batch gradient descent may help
jitter out of minima.
3. Repeatedly restart from different random initial
conditions, take the best performing network.
(Expensive)
4. …momentum, etc.
Overfitting in Neural Nets
• Neural Nets can have lots of parameters
(weights)
– 10s of millions in practice!
– Overfitting is a risk, especially if limited data.
• Solutions: Regularization:
– Use L2 regularizer/weight decay as we did with
linear/logistic regression (needs update to
gradients)
Overfitting in Neural Nets
• Neural Nets can have lots of parameters
(weights)
– 10s of millions in practice!
– Overfitting is a risk, especially if limited data.
• Solutions: Early Stopping:
– At each iteration of gradient descent, check the
network cost on validation set.
– Stop once validation error starts to increase
(although train error will still be decreasing)
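A sketch of the early-stopping loop in Python; `train_step` and `validation_cost` are illustrative stand-ins (not defined on the slides) for one gradient-descent update and the validation-set cost.

def early_stopping_train(w, train_step, validation_cost, patience=5, max_iters=10000):
    # Keep the weights that achieved the lowest validation cost so far;
    # stop once validation cost has not improved for `patience` iterations.
    best_w, best_cost, bad_iters = w, validation_cost(w), 0
    for _ in range(max_iters):
        w = train_step(w)                 # one gradient-descent update on the training set
        cost = validation_cost(w)
        if cost < best_cost:
            best_w, best_cost, bad_iters = w, cost, 0
        else:
            bad_iters += 1
            if bad_iters >= patience:     # validation error has started to increase
                break
    return best_w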
NN: Design Decisions
• Network Architecture
– # of Hidden Layers.
– # of Hidden Nodes.
• (# of input & output relatively easy)

• Cost function choice


• Activation function choice
• Learning rate
Training Overview (2)
• Choose architecture
• Initialize weights to small random numbers
• Implement forward propagation to get h(x) for any x.
• Implement cost function computation.
• Implement backpropagation to compute partial
derivatives.
• Iterate forward & backward propagation.
• Use gradient checking to debug.
• Disable gradient checking once debugged.
• Early Stop training once validation error increasing.
• (Repeatedly retrain to try and find different local
minima)
Overview
• Neural Nets Intro
• From Neurons to Neural Nets
• Multi-Layer Neural Nets for Non-Linear
Functions
• Neural Net Prediction Details
• Training Neural Nets
• More considerations
• Resurgence and Deep Neural Nets
NN Example - EECS Research
Best Paper Prize: BMVC 2015!
• Our AI system plays Pictionary at a superhuman level!
– Key components: Neural Net => Metric Learning =>
Nearest Neighbor.
NN Example - EECS Research
Best Paper Prize: BMVC 2015!
• Specs:
– Eight layers (five convolutional).
– “Only” 8.5 million parameters (cf. 144 million).
– Trained on 300M images with a 32-core Xeon CPU: 80
hours, or a K40 GPU: 10 hours.
NN Example - EECS Research
Reported in BBC, Popular Science, Business Insider, and many more….
