Lecture_09_slides_-_after
Why deep learning?
Logistic regression review
Inputs x1, …, xd with weights w1, …, wd:
ŷ = σ(w^T x + b)
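As a concrete refresher, a minimal NumPy sketch of this computation, ŷ = σ(w^T x + b), for a single example (the dimensions and random values are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(z):
    """Logistic function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

d = 5                          # number of input features (illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(d, 1))    # input example, shape (d, 1)
w = rng.normal(size=(d, 1))    # weight vector, shape (d, 1)
b = 0.0                        # bias

y_hat = sigmoid(w.T @ x + b)   # predicted probability, shape (1, 1)
print(y_hat.item())
```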
Why deep learning?
Limitations of logistic regression
“Fully-connected” layers
Each neuron of a layer is connected to all neurons of the following layer.
Neural networks
Inside a neuron
g = activation function. A neuron computes a = g(w^T x + b).
Connections to biological neurons
Source: towardsdatascience.com
Applications - nowadays everywhere!
Applications - examples in IGM
Neural networks
Representation
Notation: a_i^[l] denotes the activation of node i in layer l.
Input: x = (x1, x2, x3)^T    Shape (3, 1)
Weight vector for the first node of the first layer:
w_1^[1] = (w_{1,1}^[1], w_{1,2}^[1], w_{1,3}^[1])^T    Shape (3, 1)
Apply activation:
z_3^[1] = w_3^[1]T x + b_3^[1],   a_3^[1] = g^[1](z_3^[1])
z_4^[1] = w_4^[1]T x + b_4^[1],   a_4^[1] = g^[1](z_4^[1])
Vector notation:
W^[1] = [ w_1^[1]  w_2^[1]  w_3^[1]  w_4^[1] ]    Shape (3, 4)
b^[1] = (b_1^[1], b_2^[1], b_3^[1], b_4^[1])^T    Shape (4, 1)
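A minimal NumPy sketch of this first layer in vector notation, with W^[1] of shape (3, 4) whose columns are the per-node weight vectors (the choice of tanh for g^[1] and the random values are assumptions for illustration):

```python
import numpy as np

def g1(z):
    """Activation of layer 1; the slides keep g^[1] generic, tanh is assumed here."""
    return np.tanh(z)

rng = np.random.default_rng(0)
x  = rng.normal(size=(3, 1))   # input x, shape (3, 1)
W1 = rng.normal(size=(3, 4))   # W^[1]: columns w_1^[1], ..., w_4^[1], shape (3, 4)
b1 = rng.normal(size=(4, 1))   # b^[1], shape (4, 1)

z1 = W1.T @ x + b1             # z^[1] = W^[1]T x + b^[1], shape (4, 1)
a1 = g1(z1)                    # a^[1] = g^[1](z^[1]), one activation per node
print(a1.shape)                # (4, 1)
```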
NN - Activation functions
Rectified Linear Unit (ReLU):
▪ Saturates in only one direction, so it suffers less from the vanishing gradient problem
▪ Commonly used in practice
▪ Leaky ReLU and other ReLU variants (ELU, SELU, GELU, Swish, etc.) are sometimes used over ReLU
NN - Activation functions
Derivatives
Sigmoid:
σ(x) = 1 / (1 + e^(−x))
d/dx σ(x) = σ(x)(1 − σ(x))

Tanh:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
d/dx tanh(x) = 1 − tanh^2(x)

Rectified Linear Unit (ReLU):
ReLU(x) = {0 if x ≤ 0; x if x > 0} = max(0, x)
d/dx ReLU(x) = {0 if x < 0; 1 if x > 0}
Note: The derivative of ReLU is undefined at x = 0. By convention, it is set to 0.
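These functions and their derivatives translate directly into NumPy; a short sketch (ReLU'(0) is set to 0, following the convention above):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # sigma'(x) = sigma(x)(1 - sigma(x))

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # tanh'(x) = 1 - tanh(x)^2

def relu(x):
    return np.maximum(0.0, x)     # ReLU(x) = max(0, x)

def d_relu(x):
    # 0 for x < 0, 1 for x > 0; the undefined point x = 0 is set to 0 by convention.
    return (x > 0).astype(float)

x = np.array([-2.0, 0.0, 3.0])
print(relu(x), d_relu(x))         # [0. 0. 3.] [0. 0. 1.]
print(d_sigmoid(0.0))             # 0.25
```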
Training neural nets
Neural networks
Training
Forward pass of 2 layer NN (for a single example):
z^[1] = W^[1]T x + b^[1]
a^[1] = g^[1](z^[1])
z^[2] = W^[2]T a^[1] + b^[2]
ŷ = a^[2] = g^[2](z^[2])

Combined:
ŷ = g^[2](W^[2]T g^[1](W^[1]T x + b^[1]) + b^[2])
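A direct NumPy transcription of this 2-layer forward pass for a single example (the layer sizes, the choice of ReLU and sigmoid for g^[1] and g^[2], and the random parameters are assumptions for illustration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x  = rng.normal(size=(3, 1))   # single input example, shape (3, 1)
W1 = rng.normal(size=(3, 4))   # W^[1], shape (3, 4)
b1 = np.zeros((4, 1))          # b^[1], shape (4, 1)
W2 = rng.normal(size=(4, 1))   # W^[2], shape (4, 1)
b2 = np.zeros((1, 1))          # b^[2], shape (1, 1)

z1 = W1.T @ x + b1             # z^[1] = W^[1]T x + b^[1]
a1 = relu(z1)                  # a^[1] = g^[1](z^[1])
z2 = W2.T @ a1 + b2            # z^[2] = W^[2]T a^[1] + b^[2]
y_hat = sigmoid(z2)            # y_hat = a^[2] = g^[2](z^[2])
print(y_hat.item())
```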
Loss function
Gradient descent
Need to compute ∂L/∂W^[i] and ∂L/∂b^[i]
=> Gradient of the loss with respect to the weights
Forward pass: Compute the output of a neural network for a given input
Backward pass: Compute the derivatives of the loss with respect to the network parameters
During training, you need both the forward pass and the backward pass.
Inference: the process of using a trained machine learning model for prediction
Computing gradients
Back propagation
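As an illustration of backpropagation, a sketch that applies the chain rule by hand to the 2-layer network above, assuming a binary cross-entropy loss, a sigmoid output and a ReLU hidden activation (none of these concrete choices come from the slides):

```python
import numpy as np

def relu(z):    return np.maximum(0.0, z)
def d_relu(z):  return (z > 0).astype(float)     # ReLU'(0) := 0 by convention
def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))    # single example
y = 1.0                        # its label

W1 = rng.normal(size=(3, 4)); b1 = np.zeros((4, 1))
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))

# Forward pass: cache the intermediate values needed by the backward pass.
z1 = W1.T @ x + b1
a1 = relu(z1)
z2 = W2.T @ a1 + b2
y_hat = sigmoid(z2)
loss = -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))  # binary cross-entropy

# Backward pass: chain rule from the output back towards the input.
dz2 = y_hat - y                # dL/dz^[2] for sigmoid + cross-entropy
dW2 = a1 @ dz2.T               # dL/dW^[2], shape (4, 1)
db2 = dz2                      # dL/db^[2]
da1 = W2 @ dz2                 # dL/da^[1], shape (4, 1)
dz1 = da1 * d_relu(z1)         # dL/dz^[1]
dW1 = x @ dz1.T                # dL/dW^[1], shape (3, 4)
db1 = dz1                      # dL/db^[1]

print(loss.item(), dW1.shape, dW2.shape)
```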
Neural networks
Forward pass
Loop:
1. Sample a batch of data
2. Forward pass to get the loss
3. Backward pass to calculate gradient
4. Update parameters using the gradient
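A minimal NumPy version of this loop for the logistic regression model from the start of the lecture (the toy dataset, learning rate and batch size are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))            # toy inputs
y = (X[:, :1] > 0).astype(float)       # toy binary labels, shape (n, 1)
w = np.zeros((d, 1)); b = 0.0          # parameters
lr, batch_size = 0.1, 32

for step in range(500):
    # 1. Sample a batch of data
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    # 2. Forward pass to get the loss (binary cross-entropy)
    y_hat = np.clip(sigmoid(Xb @ w + b), 1e-7, 1 - 1e-7)
    loss = -np.mean(yb * np.log(y_hat) + (1 - yb) * np.log(1 - y_hat))
    # 3. Backward pass to calculate the gradient
    dz = (y_hat - yb) / batch_size
    dw = Xb.T @ dz
    db = dz.sum()
    # 4. Update parameters using the gradient
    w -= lr * dw
    b -= lr * db

print("final loss:", float(loss))
```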
Deep learning frameworks are used to efficiently define and train neural networks
• Support for many types of layers, activations, loss functions, optimizers, …
• Backpropagation computed automatically (e.g. loss.backward() in PyTorch)
• GPU support for faster training
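For comparison, a sketch of the same training loop in PyTorch, where backpropagation is handled by loss.backward() and the parameter update by an optimizer (the architecture, data and hyperparameters are arbitrary choices for illustration):

```python
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(200, 5)                # toy inputs
y = (X[:, :1] > 0).float()             # toy binary labels, shape (200, 1)

model = nn.Sequential(                 # a small 2-layer fully-connected network
    nn.Linear(5, 4), nn.ReLU(),
    nn.Linear(4, 1), nn.Sigmoid(),
)
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for step in range(500):
    # 1. Sample a batch of data
    idx = torch.randint(0, X.shape[0], (32,))
    xb, yb = X[idx], y[idx]
    # 2. Forward pass to get the loss
    loss = loss_fn(model(xb), yb)
    # 3. Backward pass to calculate the gradient (backpropagation)
    optimizer.zero_grad()
    loss.backward()
    # 4. Update parameters using the gradient
    optimizer.step()

print("final loss:", loss.item())
```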
k-means
Questions from problem set
Data covariance matrix