TensorFlow NN
Overview
Neural networks are representation-based machine learning algorithms.
Whales: Fish or Mammals?
Members of the infraorder Cetacea look like fish, swim like fish, and move with fish, yet they are mammals, not fish.
ML-based Classifier
Training: feed in a large corpus of correctly classified data.
Prediction: use the model to classify new instances it has not seen before.
Training the ML-based Classifier
The corpus flows through the classifier, and feedback from the loss function (or cost function) improves the model parameters.
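A minimal Keras sketch of this loop; the corpus here is a random placeholder, not real data:

```python
import numpy as np
import tensorflow as tf

# Placeholder corpus: 100 instances with 4 features each, labelled 0/1.
features = np.random.rand(100, 4).astype("float32")
labels = np.random.randint(0, 2, size=(100,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation="softmax", input_shape=(4,)),
])

# The loss function provides the feedback that improves the model parameters.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Training: feed in a corpus of correctly classified data.
model.fit(features, labels, epochs=5, verbose=0)

# Prediction: classify instances the model has not seen before.
print(model.predict(features[:3]))
```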
ML-based Binary Classifier
The corpus is fed into the classifier, which produces an output: a label.
Understanding Deep Learning
“Traditional” ML-based Binary Classifier
Corpus → Feature Selection by Experts → Classification Algorithm → ML-based Classifier
“Traditional” ML-based systems still rely on experts to decide what features to pay attention to.
“Representation” ML-based Binary Classifier
Corpus → Feature Selection Algorithm → Classification Algorithm → ML-based Classifier
“Representation” ML-based systems figure out by themselves what features to pay attention to.
“Deep Learning” systems are one type of representation learning system.
Deep Learning and Neural Networks
Deep learning: algorithms that learn what features matter. Neural networks: the most common class of deep learning algorithms. Neurons: the simple building blocks that actually “learn”.
“Deep Learning”-based Binary Classifier
A corpus of images flows through the layers of a neural network (Layer 1, Layer 2, …, Layer N) to the ML-based classifier. Successive layers detect progressively higher-level features: pixels, then edges, then corners, then object parts.
Neural Networks Introduced
The Computational Graph
The nodes in the computation graph are neurons (simple building blocks); the edges are data items called tensors.
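For example, a tiny graph with illustrative values makes the distinction concrete:

```python
import tensorflow as tf

# Edges of the computation graph carry tensors; nodes are operations (neurons).
x = tf.constant([[1.0, 2.0]])      # a 1x2 tensor flowing along an edge
W = tf.constant([[3.0], [4.0]])    # a weight tensor
b = tf.constant([0.5])             # a bias tensor

y = tf.matmul(x, W) + b            # a node combining its incoming tensors
print(y)                           # tf.Tensor([[11.5]], shape=(1, 1), dtype=float32)
```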
Neuron as a Learning Unit
A Neural Network
Inputs X1, X2, …, Xi, …, Xn feed a mathematical function that produces the output Y. For an active neuron, a change in the inputs should trigger a corresponding change in the outputs.
Operation of a Single Neuron
Each neuron computes a mathematical function, and the outputs of one layer's neurons feed into the neurons of the next layer.
Once a neural network is trained, every edge carries a weight (W1, W2, W3, …) that helps the network make predictions.
Inputs X1, …, Xn arrive with weights W1, …, Wn and a bias b. The neuron applies an affine transformation, Wx + b, followed by an activation function, here max(Wx + b, 0).
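A sketch of that computation in TensorFlow, with made-up inputs and weights:

```python
import tensorflow as tf

x = tf.constant([0.5, -1.0, 2.0])   # inputs X1..Xn
W = tf.constant([0.8, 0.3, -0.5])   # weights W1..Wn
b = tf.constant(0.1)                # bias b

affine = tf.reduce_sum(W * x) + b   # affine transformation: Wx + b
output = tf.maximum(affine, 0.0)    # activation function: max(Wx + b, 0)
print(float(affine), float(output)) # approx -0.8 and 0.0: the ReLU clips the negative value
```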
Regression: The Simplest Neural Network
A single neuron maps a set of points to a regression line.
For regression the activation is the identity function, so the neuron's output is simply the affine transformation of its inputs: y = Wx + b.
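A minimal sketch, assuming Keras: a Dense(1) layer with no activation fits y = Wx + b to synthetic points drawn from y = 3x + 2:

```python
import numpy as np
import tensorflow as tf

# Synthetic points around the line y = 3x + 2.
x = np.linspace(-1.0, 1.0, 50).astype("float32").reshape(-1, 1)
y = 3.0 * x + 2.0 + np.random.normal(0.0, 0.1, x.shape).astype("float32")

# One neuron, no activation: the output is the affine transformation itself.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.1), loss="mse")
model.fit(x, y, epochs=200, verbose=0)

W, b = model.layers[0].get_weights()
print(W, b)   # should approach [[3.0]] and [2.0]
```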
Logistic Regression with One Neuron
A first affine transformation computes x' = W1x + b1. A softmax stage with its own parameters W2 and b2 then turns x' into class probabilities:
p(Y = True) = 1 / (1 + e^-(W2x' + b2))
p(Y = False) = 1 / (1 + e^(W2x' + b2)) = 1 - p(Y = True)
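A hedged sketch of the same structure in Keras, with synthetic data where class index 1 stands for True:

```python
import numpy as np
import tensorflow as tf

# Synthetic data: the label is True (1) exactly when the feature is positive.
x = np.random.normal(0.0, 1.0, (200, 1)).astype("float32")
labels = (x[:, 0] > 0).astype("int32")

# One neuron's worth of logistic regression: affine transformation + softmax.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation="softmax", input_shape=(1,)),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, labels, epochs=100, verbose=0)

# Columns are [p(Y = False), p(Y = True)]; for x = 2.0, True should dominate.
print(model.predict(np.array([[2.0]], dtype="float32")))
```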
Logistic Regression
The regression curve p(y) = 1 / (1 + e^-(A + Bx)) fits the observed points (x1, y1), (x2, y2), …, (xn, yn).
SoftMax for True/False Classification
Given input x, the softmax function produces the two probabilities:
p(Y = True) = 1 / (1 + e^-(Wx + B))
p(Y = False) = 1 / (1 + e^(Wx + B))
Linear vs. Logistic Regression with One Neuron
Linear regression: 1-dimensional feature vector, Shape(W) = [1, 1]; the output is a regression line.
Logistic regression: 1-dimensional feature vector, Shape(W) = [1, 2], Shape(b) = [2]; the output is an S-curve.
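These shapes can be checked directly in Keras; the layers below are built but not trained:

```python
import tensorflow as tf

linear = tf.keras.layers.Dense(1)    # regression line
logistic = tf.keras.layers.Dense(2)  # two-class softmax (S-curve)

# Build each layer for a 1-dimensional feature vector.
linear.build(input_shape=(None, 1))
logistic.build(input_shape=(None, 1))

print(linear.kernel.shape, linear.bias.shape)      # (1, 1) (1,)
print(logistic.kernel.shape, logistic.bias.shape)  # (1, 2) (2,)
```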
SoftMax N-category Classification
The softmax function outputs one probability per category: P(Y = Y1), P(Y = Y2), …, P(Y = YN).
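A small example with illustrative scores, showing that the N outputs sum to 1:

```python
import tensorflow as tf

scores = tf.constant([2.0, 1.0, 0.1])   # raw scores for N = 3 categories
probs = tf.nn.softmax(scores)           # one probability per category
print(probs.numpy())                    # approx [0.659 0.242 0.099]
print(float(tf.reduce_sum(probs)))      # 1.0
```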
Multilabel Digit Classification
One-versus-all: Train 10 binary classifiers
- 0-detector, 1-detector…
ReLU Activation
The most common form of the activation function is the ReLU: ReLU(x) = max(0, x).
Tanh Activation
The tanh activation squashes its input into the range (-1, 1); like the sigmoid, it saturates for large positive and negative inputs.
Example: Training for Linear Regression
A single neuron with no activation function maps a set of points to a regression line.
The neuron applies the affine transformation to inputs X1, …, Xn with weights W1, …, Wn and bias b, so its prediction is y = Wx + b.
The “Best” Regression Line
Which values of W and b give the line that best fits the points?
Minimizing MSE
Starting from an initial value of the MSE, we want the MSE to be as small as possible: the best values of W and b are those that give the smallest MSE.
Start Somewhere
Initial values: we have to start somewhere. The initial values of W and b determine the initial value of the MSE.
“Gradient Descent”
From the initial value of the MSE, an optimization algorithm converges on the “best” values of W and b.
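A minimal gradient-descent sketch in TensorFlow: synthetic data from y = 2x + 1, arbitrary initial values, and an illustrative learning rate:

```python
import tensorflow as tf

x = tf.constant([[0.0], [1.0], [2.0], [3.0]])
y_true = 2.0 * x + 1.0                  # data generated from y = 2x + 1

W = tf.Variable([[0.1]])                # initial value of W
b = tf.Variable([0.0])                  # initial value of b
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

for step in range(200):
    with tf.GradientTape() as tape:
        y_pred = tf.matmul(x, W) + b
        mse = tf.reduce_mean(tf.square(y_pred - y_true))
    # Step downhill along the gradient of the MSE with respect to W and b.
    optimizer.apply_gradients(zip(tape.gradient(mse, [W, b]), [W, b]))

print(W.numpy(), b.numpy(), float(mse))  # W -> 2, b -> 1, MSE -> 0
```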
Training via Back Propagation
The error at the classifier's output drives an optimiser, which adjusts the weights and biases backwards through the layers (object parts, corners, edges, pixels). Back propagation allows the weights and biases of the neurons to converge to their final values.
Hyperparameters
Decisions in Traditional ML Models
Measure using validation datasets to find the best possible model; hyperparameter tuning generates the model, which is then evaluated using validation datasets.
Vanishing and Exploding Gradients, Dying Neurons
Back Propagation
It fails if the gradients vanish (weight updates become vanishingly small) or explode (weight updates become unstably large).
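A tiny sketch of the vanishing case, assuming the sigmoid as the activation and an illustrative input of 10.0:

```python
import tensorflow as tf

# For large inputs the sigmoid saturates, so its gradient is nearly zero
# and back propagation has almost no signal to update the weights with.
x = tf.Variable(10.0)                   # illustrative large activation
with tf.GradientTape() as tape:
    y = tf.sigmoid(x)
print(float(tape.gradient(y, x)))       # approx 4.5e-05: a vanishing gradient
```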
Xavier and He Initialization
Proper initialisation draws the weights from either:
- Normal distribution: mean 0, standard deviation based on num_inputs and num_outputs for that layer
- Uniform distribution: …
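In Keras, both schemes are available as built-in initializers; the layer widths below are arbitrary:

```python
import tensorflow as tf

# Xavier (Glorot) initialization suits saturating activations such as tanh;
# He initialization suits the ReLU family.
xavier_layer = tf.keras.layers.Dense(
    64, activation="tanh", kernel_initializer="glorot_normal")
he_layer = tf.keras.layers.Dense(
    64, activation="relu", kernel_initializer="he_normal")
```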
Unresponsive Neurons
Saturating Activation Functions
Sigmoid- and tanh-style activations saturate for very large and very small values of the input; the ReLU saturates for very small (negative) values of the input. A saturated neuron passes almost no gradient and becomes unresponsive.
ELU Activation
The ELU behaves like the identity for positive inputs but keeps a smooth, non-zero gradient for negative inputs, which helps avoid dying neurons.
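A quick comparison with the ReLU on illustrative inputs:

```python
import tensorflow as tf

x = tf.constant([-2.0, -0.5, 0.0, 1.5])
print(tf.nn.relu(x).numpy())   # [0.  0.  0.  1.5]  (flat for x < 0)
print(tf.nn.elu(x).numpy())    # approx [-0.86 -0.39  0.  1.5]
```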
Connecting the Dots
A curve has a “good fit” if the distances of the points from the curve are small.
But given a new set of points, this curve might perform quite poorly. The original points were the “training data”; the new points are the “test data”.
Overfitting
The model does very well on the training data, but poorly with real (test) data.
Cause of Overfitting
Bias
- Low bias: few assumptions about the underlying data; the training data is all-important, and the model parameters count for little.
- High bias: more assumptions about the underlying data; the model parameters are all-important, and the training data counts for little.
Variance
- High variance: the model changes significantly when the training data changes (it varies too much with changing training data).
- Low variance: the model doesn't change much when the training data changes (it is not very sensitive to the training data).
Bias-Variance Trade-off
- Regression (tends toward high bias)
- Decision trees (tend toward high variance)
Preventing Overfitting
- Regularisation
- Cross-validation
- Ensemble learning
- Dropout
- “Hyperparameter tuning”
Dropout
Specify a fraction of neurons that will stay off in each training step. During actual usage in test mode, the full dense neural network is used.
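A minimal Keras sketch; the layer sizes and the 784-feature input are illustrative, not from the slides:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    # 30% of this layer's outputs are switched off on each training step;
    # in test mode Keras automatically uses the full dense network.
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```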