
Foundations of Machine Learning

DSA 5102X • Lecture 5

Soufiane Hayou
Department of Mathematics
Homework 1
General comments
• Before submission, reset the kernel (Esc-0-0 + Enter) and run all cells in your notebook. Make sure the notebook runs without errors
• Normalize inputs for the Gaussian kernel (see the sketch below)

• Visualization plots
• Discover relationships
• Formulate hypothesis
• Select variables
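On the normalization point above: a minimal NumPy sketch (the data, bandwidth, and variable names are hypothetical, not from the homework) of standardizing features before applying a Gaussian kernel:

```python
import numpy as np

# Hypothetical data: 100 samples, two features on very different scales,
# so unnormalized squared distances would be dominated by the second feature
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 100, 100)])

# Standardize each feature to zero mean and unit variance
Xn = (X - X.mean(axis=0)) / X.std(axis=0)

def gaussian_kernel(a, b, bandwidth=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))
    return np.exp(-np.sum((a - b) ** 2) / (2 * bandwidth ** 2))

print(gaussian_kernel(Xn[0], Xn[1]))
```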
Last time
We introduced
• Neural networks
• Universal/nonlinear approximation
• Shallow and deep networks
• Optimization of deep neural networks
• Gradient Descent
• SGD, interaction with learning rates and batch sizes
• Backpropagation algorithm
Today, we will look at some modern architectures that are very useful for practical applications, and the key ideas behind them.
What’s Wrong with FCNNs?
Recall The Fully Connected NN
Architecture
Permutations
A permutation of $n$ objects is a one-to-one transformation of these objects. In other words, it is a bijection on $\{1, 2, \dots, n\}$.

Example: the permutation $p$ on $\{1,2,3,4,5,6,7,8,9\}$ with $p(1)=3$, $p(2)=9$, $\dots$ maps

$\{1,2,3,4,5,6,7,8,9\} \longmapsto \{3,9,1,5,4,6,8,7,2\}$
Permutation Invariance
Observe that the FCNN hypothesis space $\mathcal{H}$ has the following invariance property:

Suppose $f \in \mathcal{H}$. For any permutation $p$ on the indices, define $(Px)_i = x_{p(i)}$; then the function $f \circ P \in \mathcal{H}$ as well (the permutation can be absorbed into the first-layer weights).

In other words, we do not care about permuting the signal's components, since if we can fit one permutation, we can fit any permutation! Is this sensible?
[Figure: the same image shown under three random permutations of its pixels]
Limitation of FCNN
The permutation invariance property loses spatial/temporal structure in the data!
Convolutional Neural Networks
Convolutions
Convolution is a basic concept in signal processing and harmonic analysis.

Given two functions $f$ and $g$ on $\mathbb{R}$, we define their convolution as

$(f * g)(x) = \int_{-\infty}^{\infty} f(y)\, g(x - y)\, dy$

Basic properties
• Commutativity: $f * g = g * f$
• Linearity: $(\alpha f + \beta h) * g = \alpha (f * g) + \beta (h * g)$
What do convolutions do?
Convolution with a smooth bump smooths out the signal.

Is this the only thing convolutions can do?


Discrete Convolutions
Real signals are rarely truly continuous
• Inherently discrete
• Discrete samples from continuous data

In fact, discrete samples are often enough to reconstruct continuous signals, as long as the latter are band-limited! [Shannon, 1949]
Discrete Convolutions
Let $w, x : \mathbb{Z} \to \mathbb{R}$ be discrete signals.

We define the discrete convolution as

$(w * x)(k) = \sum_{i=-\infty}^{\infty} w(i)\, x(k - i)$
Convolution vs Cross-Correlation
Discrete convolution:

$(w * x)(k) = \sum_i w(i)\, x(k - i)$

Discrete cross-correlation (flip $x(k - i)$ to $x(k + i)$):

$(w \star x)(k) = \sum_i w(i)\, x(k + i)$

In deep learning, "convolution" layers typically compute cross-correlation; since the filter weights are learned, the distinction is immaterial in practice.
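A quick NumPy check of the two conventions (an asymmetric filter makes the flip visible; `np.convolve` and `np.correlate` are NumPy's built-ins, not lecture notation):

```python
import numpy as np

w = np.array([1, 2, 3])          # asymmetric filter, so the flip matters
x = np.array([3, 1, 4, 1, 5])

print(np.convolve(w, x, mode="valid"))   # true convolution (flips w): [15 12 19]
print(np.correlate(x, w, mode="valid"))  # cross-correlation (no flip): [17 12 21]
```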


Finite Convolutions and Boundary Conditions
Practical signals are finite, so we need to truncate.

Assume $w(i) = 0$ for $i \notin \{0, \dots, m\}$, with

$(w * x)(k) = \sum_{i=0}^{m} w(i)\, x(k + i)$
Finite Convolutions and Boundary Conditions
Practical signals are finite, so we need to truncate.
• Circular convolutions

w = [1 2 1]
x = [5 3 1 4 1 5 3]  (the signal [3 1 4 1 5], padded circularly)
w ∗ x = [12 9 10 11 14]
Finite Convolutions and Boundary Conditions
Practical signals are finite, so we need to truncate.
• Valid convolutions

w = [1 2 1]
x = [3 1 4 1 5]
w ∗ x = [9 10 11]
Finite Convolutions and Boundary Conditions
Practical signals are finite, so we need to truncate.
• Zero-padded convolutions with "same" padding

w = [1 2 1]
x = [0 3 1 4 1 5 0]  (the signal [3 1 4 1 5], zero-padded)
w ∗ x = [7 9 10 11 11]
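The three boundary conditions above can be checked with a few lines of NumPy; this is a sketch using the slides' convention $(w * x)(k) = \sum_i w(i)\, x(k + i)$:

```python
import numpy as np

w = np.array([1, 2, 1])
x = np.array([3, 1, 4, 1, 5])

def corr1d(w, x):
    # (w * x)(k) = sum_i w(i) x(k+i), over all k with full overlap
    return np.array([w @ x[k:k + len(w)] for k in range(len(x) - len(w) + 1)])

print(corr1d(w, x))                                   # valid:    [ 9 10 11]
print(corr1d(w, np.pad(x, 1)))                        # same:     [ 7  9 10 11 11]
print(corr1d(w, np.concatenate([x[-1:], x, x[:1]])))  # circular: [12  9 10 11 14]
```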
Convolution in 2D
When dealing with image data, we often use convolutions in 2D.

Now $w$ and $x$ are matrices, and $w * x$ is also a matrix:

$(w * x)(k, l) = \sum_i \sum_j w(i, j)\, x(k + i,\, l + j)$

We call
• $w$: convolution filter or kernel
• $x$: input signal
Example: Convolutions in 2D

https://towardsdatascience.com/intuitively-understanding-convolutions-for-deep-learning-1f6f42faee1
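As a complement to the linked animation, a minimal sketch of the 2D formula above (the filter and image values here are made up for illustration):

```python
import numpy as np

def corr2d(w, x):
    # (w * x)(k, l) = sum_{i,j} w(i, j) x(k+i, l+j), "valid" boundary
    m, n = w.shape
    K, L = x.shape[0] - m + 1, x.shape[1] - n + 1
    out = np.zeros((K, L))
    for k in range(K):
        for l in range(L):
            out[k, l] = np.sum(w * x[k:k + m, l:l + n])
    return out

x = np.array([[0, 0, 1, 1],      # a toy "image" with a vertical edge
              [0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 0, 1, 1]], dtype=float)
w = np.array([[-1, 0, 1],        # a filter that responds to vertical edges
              [-1, 0, 1],
              [-1, 0, 1]], dtype=float)
print(corr2d(w, x))              # [[3. 3.], [3. 3.]]: strong response at the edge
```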
Image Input Data
Most basic: grayscale
• $x \in \mathbb{R}^{d_1 \times d_2}$ (a matrix)

Colored (RGB) data:
• $x \in \mathbb{R}^{d_1 \times d_2 \times c}$ with $c = 3$
• $c$ is the number of image channels
Convolutional Layer
A convolutional layer is a basic building block of convolutional neural networks. It is of the form

$h = \sigma(w * x + b)$

Compare with the fully connected case

$h = \sigma(W x + b)$
Why Convolutions?
Weight sharing: a fully connected layer has an independent weight $W_{ij}$ for every input–output pair, whereas a convolutional layer reuses the same few filter weights $w(i)$ at every output position.

Still matrix multiplication, but with shared weights! (See the sketch below.)
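To make the weight-sharing point concrete, here is a sketch writing the valid convolution from before as multiplication by a banded matrix whose rows all contain the same three weights:

```python
import numpy as np

w = np.array([1, 2, 1])          # 3 shared weights
x = np.array([3, 1, 4, 1, 5])
n_out = len(x) - len(w) + 1      # "valid" output length

# Equivalent dense matrix: the filter shifted by one position per row
W = np.zeros((n_out, len(x)))
for k in range(n_out):
    W[k, k:k + len(w)] = w

print(W)       # [[1. 2. 1. 0. 0.]
               #  [0. 1. 2. 1. 0.]
               #  [0. 0. 1. 2. 1.]]
print(W @ x)   # [ 9. 10. 11.] -- same as the valid convolution above
```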


Effective Feature Extractors

[Figure: two filters $w_1$ and $w_2$ applied to the same image $x$; $w_1 * x$ and $w_2 * x$ extract different features]
Equivariance
Let $T$ be some transformation on the space of input signals. We say that a function $f$ is equivariant with $T$ if

$f(T(x)) = T(f(x))$

Observe: convolutions are equivariant with translations!

What is the significance of this?
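A small numerical check of translation equivariance, using a circular convolution so that translation is exact (filter and signal values are arbitrary):

```python
import numpy as np

w = np.array([1, 2, 1])
x = np.array([3, 1, 4, 1, 5])

def circ_corr(w, x):
    # circular (w * x)(k) = sum_i w(i) x((k+i) mod len(x))
    xp = np.concatenate([x, x[:len(w) - 1]])
    return np.array([w @ xp[k:k + len(w)] for k in range(len(x))])

T = lambda x: np.roll(x, 1)      # T = translation by one step

print(circ_corr(w, T(x)))        # f(T(x)): [12  9 10 11 14]
print(T(circ_corr(w, x)))        # T(f(x)): [12  9 10 11 14] -- equivariant
```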


Invariance
On the other hand, we say a function $f$ is invariant with respect to $T$ if

$f(T(x)) = f(x)$

Examples: global max or average pooling is invariant to translations.

Other examples?
Equivariance and Invariance
For some transformation $T$, if $g$ is equivariant and $f$ is invariant, then $f \circ g$ is invariant:

$f(g(T(x))) = f(T(g(x))) = f(g(x))$

Furthermore, if $g_1, \dots, g_L$ are equivariant, then

$f \circ g_L \circ \cdots \circ g_1$

is also invariant.

Convolution layers can help build complex (approximately) translation invariant models!
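The composition argument can also be checked numerically: a circular convolution (equivariant) followed by a global max (invariant) gives a translation invariant function (values again arbitrary):

```python
import numpy as np

w = np.array([1, 2, 1])
x = np.array([3, 1, 4, 1, 5])

def g(x):
    # circular convolution with w: equivariant with translations
    xp = np.concatenate([x, x[:len(w) - 1]])
    return np.array([w @ xp[k:k + len(w)] for k in range(len(x))])

f = np.max  # global max pooling: invariant to translations

print(f(g(x)), f(g(np.roll(x, 2))))  # 14 14 -- f(g(.)) is translation invariant
```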
Shrinking/Focusing the Hypothesis Space

[Figure: the hypothesis space $\mathcal{H}$ shrunk to the subset $\mathcal{H}' = \{f : f(T(x)) = f(x)\}$ of invariant functions containing the target $f$]
Pooling Layers
Max pooling in 1D with stride 2:

x = [3 1 4 1 5 9]  →  [3 4 9]

Max pooling in 2D with stride 2:

x =
[3 1 4 1]
[5 9 2 6]  →  [9 6]
[5 3 5 8]     [9 9]
[9 7 9 3]
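A minimal sketch of 1D max pooling matching the example above (window size and stride both 2):

```python
import numpy as np

def max_pool_1d(x, size=2, stride=2):
    # slide a window of `size` over x in jumps of `stride`, keeping the max
    return np.array([x[k:k + size].max()
                     for k in range(0, len(x) - size + 1, stride)])

print(max_pool_1d(np.array([3, 1, 4, 1, 5, 9])))  # [3 4 9]
```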
A Typical CNN Architecture

[Figure: input → Conv 1 (filters) → Max Pool → Conv 2 (filters) → Max Pool → Flatten → FCNN → output probabilities, e.g. (0.9, 0.1)]


Demo: CNNs for Image Classification
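The demo itself is not reproduced here; as a minimal sketch of the typical architecture above in PyTorch (all sizes hypothetical, e.g. 28×28 grayscale inputs and 10 classes):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # Conv 1: 16 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # Conv 2: 32 filters
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                    # FCNN head -> class scores
)

x = torch.randn(8, 1, 28, 28)  # a dummy batch of 8 images
print(model(x).shape)          # torch.Size([8, 10])
```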
Recurrent Neural Networks
Time Series Data
Another type of data with structure: a sequence $x_1, x_2, \dots, x_\tau$.

The outputs may also be temporal: $y_1, y_2, \dots, y_\tau$.

We need to model the relationship between $\{x_t\}$ and $\{y_t\}$. Say $y_\tau = F_\tau(x_1, \dots, x_\tau)$, and we want to approximate/learn $F_\tau$.
The Recurrent Architecture
Recurrent Neural Networks (RNNs) model the relationship between $\{x_\tau\}$ and $\{y_\tau\}$ as

$h_{\tau+1} = g(h_\tau, x_{\tau+1}, \theta), \qquad o_\tau = u(h_\tau, \phi)$

• $\{h_\tau\}$ are the hidden states. Their purpose is to make the system memoryless: the next state depends only on the current state and the current input
• $\theta$ and $\phi$ are the trainable parameters
The Recurrent Architecture

$h_{\tau+1} = g(h_\tau, x_{\tau+1}, \theta)$
$o_\tau = u(h_\tau, \phi)$

[Figure: recurrent cell: input $x_{\tau+1}$ and state $h_\tau$ produce the output $o_\tau$, which is compared against the target $y_\tau$]

Minimize a loss over time, e.g. $\sum_\tau \ell(o_\tau, y_\tau)$: the parameters are trained so that the outputs $o_\tau$ are close to the targets $y_\tau$.


How to Optimize RNNs?

[Figure: the RNN unrolled in time: $h_0 \to h_1 \to \cdots \to h_\tau$ via $h_t = g(h_{t-1}, x_t, \theta)$, with outputs $o_t = u(h_t, \phi)$ compared against the targets $y_t$; gradients are backpropagated through the unrolled graph]
Important Observations
• Parameters are shared in time, as opposed to space as in CNNs

• Temporal/causal structure is preserved

• The hidden states can be made large to give the system some form of memory
• By changing the form of the hidden states and/or the functions $g$ and $u$, we can obtain many variants, including Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM); a minimal sketch of the plain recurrence follows below
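A minimal NumPy sketch of the plain recurrence $h_{\tau+1} = g(h_\tau, x_{\tau+1}, \theta)$, $o_\tau = u(h_\tau, \phi)$ (all sizes, initializations, and the tanh nonlinearity are illustrative choices, not fixed by the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_h, d_o = 5, 8, 3            # hypothetical input/hidden/output sizes

# theta = (W_h, W_x, b), phi = (W_o, c): the trainable parameters
W_h = rng.normal(scale=0.1, size=(d_h, d_h))
W_x = rng.normal(scale=0.1, size=(d_h, d_x))
b = np.zeros(d_h)
W_o = rng.normal(scale=0.1, size=(d_o, d_h))
c = np.zeros(d_o)

def g(h, x):                       # h_{tau+1} = g(h_tau, x_{tau+1}, theta)
    return np.tanh(W_h @ h + W_x @ x + b)

def u(h):                          # o_tau = u(h_tau, phi)
    return W_o @ h + c

h = np.zeros(d_h)                  # initial hidden state h_0
xs = rng.normal(size=(10, d_x))    # a dummy length-10 input sequence
for x in xs:
    h = g(h, x)                    # the same parameters at every time step
    o = u(h)                       # compare o against the target y_tau when training
print(o.shape)                     # (3,)
```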
Demo: RNNs for Text Generation
Summary
Convolutional Neural Networks
• Suitable for data with spatial structure (e.g. images)
• Convolution and pooling build equivariance and invariance

Recurrent Neural Networks
• Suitable for data with temporal structure (e.g. text)

They can build powerful models!


Project
• Use at least one of fully connected NN, CNN, or RNN models in your project.

• Due to limited computing resources, I am not looking for high accuracy or strong performance on large network structures. Rather, I am looking for correct usage, so try it on simple models if you like.
Test
• Style: similar to Homework 2

• Duration: 1 hour 30 mins

• Date and time:
  • Main slot: 2:00pm to 3:30pm on 2 Oct 2021

• Practice: test questions/answers from last year will be uploaded in the next few days.
