
Autoencoders

Di He
Outline
• Basics

• Variational Autoencoder

• Denoising Autoencoder

• Vector Quantized VAE

2
What is autoencoder?

• An autoencoder is a feed-forward neural network whose job is to take an input x


and predict x.

3
What is autoencoder?

• An autoencoder is a feed-forward neural network whose job is to take an input x


and predict x.

• Trivial (short-cut) solutions exist: a neural network can learn the identity mapping 𝑥 = 𝑓(𝑥)

𝑥 𝑥

4
What is autoencoder?

• An autoencoder is a feed-forward neural network whose job is to take an input x


and predict x.

• Trivial (short-cut) solutions exist: a neural network can learn the identity mapping 𝑥 = 𝑓(𝑥)

• Bottleneck architecture

𝑥 𝑥

5
What is autoencoder?

• An autoencoder is a feed-forward neural network whose job is to take an input x


and predict x.

• Trivial (short-cut) solutions exist: a neural network can learn the identity mapping 𝑥 = 𝑓(𝑥)

• Bottleneck architecture

𝑥 𝑥

encoder

6
What is autoencoder?

• An autoencoder is a feed-forward neural network whose job is to take an input x


and predict x.

• Trivial (short-cut) solutions exist: a neural network can learn the identity mapping 𝑥 = 𝑓(𝑥)

• Bottleneck architecture

𝑥 𝑥

encoder decoder

7
Why autoencoder?

• Map high-dimensional data to two dimensions for visualization

𝑥 𝑥

encoder decoder

8
Why autoencoder?

• Map high-dimensional data to two dimensions for visualization

• Data compression (i.e. reducing communication cost)

𝑥 𝑥

encoder decoder

9
Why autoencoder?

• Map high-dimensional data to two dimensions for visualization

• Data compression (i.e. reducing communication cost)

• Unsupervised representation learning (i.e., pre-training)

𝑥 𝑥

encoder decoder

10
Why autoencoder?

• Map high-dimensional data to two dimensions for visualization

• Data compression (i.e. reducing communication cost)

• Unsupervised representation learning (i.e., pre-training)

• Generative modelling
𝑥 𝑥

encoder decoder

11
The simplest autoencoder

• The simplest kind of autoencoder has one hidden layer with linear activations.
ℎ = 𝑈𝑥,  𝑈 ∈ 𝑅^{𝑘×𝑑},  𝑥 ∈ 𝑅^{𝑑×1}

output = 𝑉ℎ,  𝑉 ∈ 𝑅^{𝑑×𝑘}

encoder decoder

12
The simplest autoencoder

• The simplest kind of autoencoder has one hidden layer with linear activations.
ℎ = 𝑈𝑥,  𝑈 ∈ 𝑅^{𝑘×𝑑},  𝑥 ∈ 𝑅^{𝑑×1}

output = 𝑉ℎ,  𝑉 ∈ 𝑅^{𝑑×𝑘}

output = 𝑉𝑈𝑥

encoder decoder

13
The simplest autoencoder

• The simplest kind of autoencoder has one hidden layer with linear activations.
output = 𝑉𝑈𝑥,  𝑈 ∈ 𝑅^{𝑘×𝑑},  𝑉 ∈ 𝑅^{𝑑×𝑘}
𝑥 U V

encoder decoder

14
The simplest autoencoder

• The simplest kind of autoencoder has one hidden layer with linear activations.
output = 𝑉𝑈𝑥,  𝑈 ∈ 𝑅^{𝑘×𝑑},  𝑉 ∈ 𝑅^{𝑑×𝑘}
𝑥 U V
• Note
• This network is linear
encoder decoder
• We usually set 𝑘 ≪ 𝑑 (if 𝑘 = 𝑑, we can make 𝑉𝑈 = 𝐼,
which is meaningless)

• when 𝑘 ≪ 𝑑, we are actually performing dimensionality reduction on the data 𝑥

15
The simplest autoencoder

• The simplest kind of autoencoder has one hidden layer with linear activations.
output = 𝑉𝑈𝑥,  𝑈 ∈ 𝑅^{𝑘×𝑑},  𝑉 ∈ 𝑅^{𝑑×𝑘}
𝑥 U V
• How to determine 𝑉 and 𝑈

minimize ‖𝑉𝑈𝑥 − 𝑥‖_𝑝,  where 𝑝 is usually set to 2

encoder decoder

16
The simplest autoencoder

• The simplest kind of autoencoder has one hidden layer with linear activations.
output = 𝑉𝑈𝑥,  𝑈 ∈ 𝑅^{𝑘×𝑑},  𝑉 ∈ 𝑅^{𝑑×𝑘}
𝑥 U V
• How to determine 𝑉 and 𝑈

minimize ‖𝑉𝑈𝑥 − 𝑥‖_𝑝,  where 𝑝 is usually set to 2

encoder decoder

minimize ‖𝑉𝑈𝑋 − 𝑋‖²,  where 𝑋 stacks all data points

17
The simplest autoencoder

• The simplest kind of autoencoder has one hidden layer with linear activations.
output = 𝑉𝑈𝑥,  𝑈 ∈ 𝑅^{𝑘×𝑑},  𝑉 ∈ 𝑅^{𝑑×𝑘}
𝑥 U V
• How to determine 𝑉 and 𝑈

minimize ‖𝑉𝑈𝑋 − 𝑋‖²
encoder decoder

• There are many optimal solutions: if 𝑈* and 𝑉* are a solution, then 2𝑈* and 𝑉*/2 are also a solution

• This problem is well known as Principal Component Analysis (PCA)

• You don’t need to solve this problem by gradient descent. There’s a closed-form solution

18
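As a concrete illustration (not from the slides), here is a minimal NumPy sketch of the closed-form solution: the top-k right singular vectors of the centered data give an optimal encoder 𝑈 and decoder 𝑉 for the linear autoencoder. The random test data and array shapes are illustrative assumptions.

```python
import numpy as np

def linear_autoencoder(X, k):
    """Closed-form linear autoencoder (PCA). X: (n_samples, d)."""
    Xc = X - X.mean(axis=0)                  # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    U = Vt[:k]                               # encoder: (k, d), h = U x
    V = Vt[:k].T                             # decoder: (d, k), output = V h
    return U, V

# usage: reconstruction error of the rank-k linear autoencoder
X = np.random.randn(1000, 20)
U, V = linear_autoencoder(X, k=5)
X_hat = (X - X.mean(axis=0)) @ U.T @ V.T + X.mean(axis=0)
print(np.mean((X - X_hat) ** 2))
```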
More about autoencoder

• An autoencoder generally learns 𝑓(𝑔(𝑥)) ≈ 𝑥
• Function 𝑔 is the encoder
• Function 𝑓 is the decoder
• ℎ = 𝑔(𝑥) is called the code / the representation / the latent variable of 𝑥

encoder decoder

19
More about autoencoder

• An autoencoder generally learns 𝑓(𝑔(𝑥)) ≈ 𝑥
• Function 𝑔 is the encoder
• Function 𝑓 is the decoder
• ℎ = 𝑔(𝑥) is called the code / the representation / the latent variable of 𝑥

encoder decoder
• 𝑓 and 𝑔 shouldn't be too complex or powerful
• To avoid learning a trivial copy (encoder) and paste (decoder) solution
• To avoid overfitting

20
More about autoencoder

• An autoencoder generally learns 𝑓(𝑔(𝑥)) ≈ 𝑥
• Function 𝑔 is the encoder
• Function 𝑓 is the decoder
• ℎ = 𝑔(𝑥) is called the code / the representation / the latent variable of 𝑥

encoder decoder
• f and g can be shallow neural networks
• All the parameters should be trained by gradient descent

21
More about autoencoder

• An autoencoder generally learns 𝑓(𝑔(𝑥)) ≈ 𝑥
• Function 𝑔 is the encoder
• Function 𝑓 is the decoder
• ℎ = 𝑔(𝑥) is called the code / the representation / the latent variable of 𝑥

encoder decoder
• Autoencoders are data-specific and learned
• This is different from compression methods like MP3 or JPEG
• An autoencoder learned on “cat images” may fail on “dog images”

22
More about autoencoder

• An autoencoder generally learns 𝑓(𝑔(𝑥)) ≈ 𝑥
• Function 𝑔 is the encoder
• Function 𝑓 is the decoder
• ℎ = 𝑔(𝑥) is called the code / the representation / the latent variable of 𝑥

encoder decoder
• Autoencoders learn useful properties of data
• PCA learns principal components

23
Vanilla autoencoder is not a generative model

𝑥 encoder ℎ decoder 𝑥′

24
How to modify an autoencoder into a
generative model

𝑥 encoder ℎ decoder 𝑥′
• 𝑥 follows a distribution (the data distribution), but it is unknown

25
How to modify an autoencoder into a
generative model

𝑥 encoder ℎ decoder 𝑥′
• 𝑥 follows a distribution (the data distribution), but it is unknown

• If after training, ℎ follows a known distribution (e.g., standard Gaussian


distribution), it will be perfect!

26
How to modify an autoencoder into a
generative model

𝑥 encoder ℎ decoder 𝑥′

27
How to modify an autoencoder into a
generative model

𝑁(0, 𝐼) → ℎ → decoder → 𝑥′

When the encoder is replaced by random noise (i.e., ℎ is sampled directly from 𝑁(0, 𝐼)), the decoder becomes a generative model!

28
The remaining challenge

𝑥 encoder ℎ decoder 𝑥′

How to make 𝒉 follow a known distribution after training?

29
The first step

𝑥 encoder ℎ decoder 𝑥′

How to make 𝒉 follow a known distribution after training?

30
Stochastic latent representation

• Autoencoder
• Encoder: ℎ = 𝑔(𝑥)
• Decoder: 𝑥′ = 𝑓(ℎ)

𝑥 encoder ℎ decoder 𝑥′
• 𝑓 and 𝑔 are deterministic functions

31
Stochastic latent representation

• Autoencoder
• Encoder: ℎ = 𝑔(𝑥)
• Decoder: 𝑥′ = 𝑓(ℎ)
• 𝑓 and 𝑔 are deterministic functions

• Variational Autoencoder
• Encoder: ℎ ∼ 𝑔(𝑥)
• Decoder: 𝑥′ = 𝑓(ℎ)
• 𝑔 is a stochastic function, 𝑓 is a deterministic function
• ℎ is a random variable

𝑥 encoder ℎ decoder 𝑥′

32
Stochastic latent representation

• Variational Autoencoder
• Encoder: ℎ ∼ 𝑔(𝑥)
• Decoder: 𝑥′ = 𝑓(ℎ)
• 𝑔 is a stochastic function, 𝑓 is a deterministic function, ℎ is a random variable

• Pre-define a parametric distribution family
• e.g., Gaussian with mean and std
• 𝑔 outputs the distribution parameters, i.e., the mean and std

33
Examples

28×28

34
Examples

28×28

Input: 784 dimensions

35
Examples

… …

28×28

First hidden layer: 256 dimensions

36
Examples

… … …

28×28

Second hidden layer:


• 50 dimensions to predict mean
• 50 dimensions to predict variance

37
Examples

code

… … … …

28×28
Generate the code ℎ from a Gaussian distribution using the learned mean and variance

38
Examples

… … … …

28×28

encoder

39
Examples (code, incomplete)

… … … …

28×28

encoder

40
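The code shown on this slide did not survive extraction. Below is a minimal PyTorch sketch of the encoder described above (784 → 256 → 50-dimensional mean and 50-dimensional variance). The layer sizes follow the slides; the class name, activation, and log-variance parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, in_dim=784, hidden_dim=256, latent_dim=50):
        super().__init__()
        self.fc = nn.Linear(in_dim, hidden_dim)             # 784 -> 256
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # predicts the mean
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # predicts log-variance

    def forward(self, x):                                   # x: (batch, 784)
        h = torch.relu(self.fc(x))
        return self.fc_mu(h), self.fc_logvar(h)
```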
Examples

… … … … … …

28×28

encoder

42
Examples

… … … … … …

28×28

encoder decoder

43
Examples (code, incomplete)

… … … … … …

28×28

encoder decoder

45
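Again, the slide's code is missing from the extracted text. A minimal PyTorch sketch of a matching decoder (50 → 256 → 784); as the next slides note, the decoder need not mirror the encoder, so this symmetric MLP is just one illustrative choice.

```python
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, latent_dim=50, hidden_dim=256, out_dim=784):
        super().__init__()
        self.fc1 = nn.Linear(latent_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, h):                       # h: (batch, 50)
        z = torch.relu(self.fc1(h))
        return torch.sigmoid(self.fc2(z))       # pixel intensities in [0, 1]
```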
Examples

… … … … … …

28×28

encoder decoder
• The encoder and decoder are not necessarily symmetric

47
Examples

… … … … … …

28×28

encoder decoder
• The encoder and decoder are not necessarily MLP

48
Examples

• The encoder and decoder are not necessarily MLP


https://www.mathworks.com/help/deeplearning/ug/train-a-variational-autoencoder-vae-to-generate-images.html

49
The second step

… … … … … …

28×28

encoder decoder
How to make 𝒉 follow a known distribution after training?

50
Training VAE

• Variational Autoencoder: forward process

• Input: 𝑥
• [𝜇, 𝜎] = 𝑔(𝑥, 𝜃_enc)
• ℎ ∼ Normal(𝜇, 𝜎²)
• Output: 𝑥′ = 𝑓(ℎ, 𝜃_dec)

51
Training VAE

• Variational Autoencoder: forward process

• Input: 𝑥
• [𝜇, 𝜎] = 𝑔(𝑥, 𝜃_enc)
• ℎ ∼ Normal(𝜇, 𝜎²)
• Output: 𝑥′ = 𝑓(ℎ, 𝜃_dec)

• Loss design: evaluating the difference between 𝑥 and 𝑥′

52
Training VAE

• Variational Autoencoder: forward process

• Input: 𝑥
• [𝜇, 𝜎] = 𝑔(𝑥, 𝜃_enc)
• ℎ ∼ Normal(𝜇, 𝜎²)
• Output: 𝑥′ = 𝑓(ℎ, 𝜃_dec)

• Loss design: evaluating the difference between 𝑥 and 𝑥′


• Mean squared error: 𝐿 = ‖𝑥 − 𝑥′‖²
• Other loss functions can be applied

53
Training VAE

• Variational Autoencoder: forward process

• Input: 𝑥
• [𝜇, 𝜎] = 𝑔(𝑥, 𝜃_enc)
• ℎ ∼ Normal(𝜇, 𝜎²)
• Output: 𝑥′ = 𝑓(ℎ, 𝜃_dec)

• Loss design: evaluating the difference between 𝑥 and 𝑥′


• Mean squared error: 𝐿 = ‖𝑥 − 𝑥′‖²
• Other loss functions can be applied

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates

54
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑑𝑒𝑐 ?

55
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑑𝑒𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²

56
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑑𝑒𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²

∂𝐿/∂𝜃_dec = (∂𝐿/∂𝑓)·(∂𝑓/∂𝜃_dec)

57
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑑𝑒𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²

∂𝐿/∂𝜃_dec = (∂𝐿/∂𝑓)·(∂𝑓/∂𝜃_dec)

• 𝐿 is differentiable with respect to 𝑓

58
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑑𝑒𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²

∂𝐿/∂𝜃_dec = (∂𝐿/∂𝑓)·(∂𝑓/∂𝜃_dec)

• 𝐿 is differentiable with respect to 𝑓


• 𝑓 is differentiable (almost everywhere) with respect to 𝜃𝑑𝑒𝑐

59
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑒𝑛𝑐 ?

61
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑒𝑛𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²,  where ℎ ∼ Normal(𝑔(𝑥, 𝜃_enc))

62
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑒𝑛𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²,  where ℎ ∼ Normal(𝑔(𝑥, 𝜃_enc))

• How to compute ∂𝐿/∂𝜃_enc?

63
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑒𝑛𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²,  where ℎ ∼ Normal(𝑔(𝑥, 𝜃_enc))

• How to compute ∂𝐿/∂𝜃_enc?

• 𝐿 is differentiable with respect to ℎ

64
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑒𝑛𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²,  where ℎ ∼ Normal(𝑔(𝑥, 𝜃_enc))

• How to compute ∂𝐿/∂𝜃_enc?

• 𝐿 is differentiable with respect to ℎ


• 𝑔 is differentiable with respect to 𝜃_enc

65
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑒𝑛𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²,  where ℎ ∼ Normal(𝑔(𝑥, 𝜃_enc))

• How to compute ∂𝐿/∂𝜃_enc?

• 𝐿 is differentiable with respect to ℎ


• 𝑔 is differentiable with respect to 𝜃_enc
• But 𝒉 is NOT differentiable with respect to 𝒈: the sampling step breaks the gradient!

66
Training VAE

• Parameter (𝜃𝑒𝑛𝑐 and 𝜃𝑑𝑒𝑐 ) updates


• How to update parameter 𝜃𝑒𝑛𝑐 ?

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²,  where ℎ ∼ Normal(𝑔(𝑥, 𝜃_enc))

• How to compute ∂𝐿/∂𝜃_enc?
• 𝐿 is differentiable with respect to ℎ
• 𝑔 is differentiable with respect to 𝜃_enc
• But 𝒉 is NOT differentiable with respect to 𝒈: the sampling step breaks the gradient!
  ◆ A sample from N(0, 1) -> 0.32
  ◆ A sample from N(1e-6, 1) -> -0.17
  ◆ 𝑔 changes from 0 to 1e-6, while ℎ jumps from 0.32 to -0.17: not continuous

67
Key tech: Reparameterization trick(重参数化)

• Core technique: rearrange the order of rescaling and sampling

[diagram: the encoder produces 𝜇(𝜃_enc) and 𝜎(𝜃_enc); a noise source N(0, I)]

68
Key tech: Reparameterization trick(重参数化)

• Core technique: rearrange the order of rescaling and sampling

[diagram: rescale N(0, I) by 𝜇(𝜃_enc), 𝜎(𝜃_enc) to get N(𝜇(𝜃_enc), 𝜎(𝜃_enc))]

69
Key tech: Reparameterization trick(重参数化)

• Core technique: rearrange the order of rescaling and sampling

[diagram: rescale N(0, I) to N(𝜇(𝜃_enc), 𝜎(𝜃_enc)), then sample ℎ from it]

70
Key tech: Reparameterization trick(重参数化)

• Core technique: rearrange the order of rescaling and sampling

[diagram: the encoder produces 𝜇(𝜃_enc) and 𝜎(𝜃_enc); sample 𝜖 from N(0, I)]

72
Key tech: Reparameterization trick(重参数化)

• Core technique: rearrange the order of rescaling and sampling

[diagram: sample 𝜖 from N(0, I), then rescale: 𝜇(𝜃_enc) + 𝜖·𝜎(𝜃_enc)]

73
Key tech: Reparameterization trick(重参数化)

• Core technique: rearrange the order of rescaling and sampling

[diagram: sample 𝜖 from N(0, I), then rescale: 𝜇(𝜃_enc) + 𝜖·𝜎(𝜃_enc) ∼ N(𝜇(𝜃_enc), 𝜎(𝜃_enc))]

74
Key tech: Reparameterization trick(重参数化)

• Core technique: rearrange the order of rescaling and sampling

[diagram: sample 𝜖 from N(0, I), then rescale: 𝜇(𝜃_enc) + 𝜖·𝜎(𝜃_enc) ∼ N(𝜇(𝜃_enc), 𝜎(𝜃_enc)) — this reordering is the reparameterization]

75
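A minimal PyTorch sketch of the reparameterization trick above. Assuming the encoder outputs the log-variance (a common numerical convenience, not stated on the slides), the sample ℎ stays differentiable with respect to 𝜇 and 𝜎:

```python
import torch

def reparameterize(mu, logvar):
    """h = mu + eps * sigma with eps ~ N(0, I); gradients flow to mu and logvar."""
    sigma = torch.exp(0.5 * logvar)
    eps = torch.randn_like(sigma)   # the only stochastic step; no gradient needed through it
    return mu + eps * sigma

mu = torch.zeros(4, 50, requires_grad=True)
logvar = torch.zeros(4, 50, requires_grad=True)
h = reparameterize(mu, logvar)
h.sum().backward()                  # both mu.grad and logvar.grad are populated
```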
VAE: implementation

• Variational Autoencoder
• Input: 𝑥
• [𝜇, 𝜎] = 𝑔(𝑥, 𝜃_enc)
• 𝜖 ∼ Normal(0, 𝐼)
• ℎ = 𝜇 + 𝜖𝜎
• Output: 𝑥′ = 𝑓(ℎ, 𝜃_dec)

𝐿 = ‖𝑥 − 𝑥′‖²
  = ‖𝑥 − 𝑓(ℎ, 𝜃_dec)‖²
  = ‖𝑥 − 𝑓(𝑔_𝜇(𝑥, 𝜃_enc) + 𝝐·𝑔_𝜎(𝑥, 𝜃_enc), 𝜃_dec)‖²

• All parameters are trained by gradient descent end-to-end

76
The third step

… … … … … …

28×28

encoder decoder
How to make 𝒉 follow a known distribution after training?

80
Why do we need a known distribution?

[diagram: Distribution 1, Distribution 2, … — the codes ℎ produced by the encoder follow unknown distributions]

encoder

81
Why do we need a known distribution?

… …

decoder

82
VAE: implementation

• Variational Autoencoder
• Input: 𝑥
• [𝜇, 𝜎] = 𝑔(𝑥, 𝜃_enc)
• 𝜖 ∼ Normal(0, 𝐼)
• ℎ = 𝜇 + 𝜖𝜎
• Output: 𝑥′ = 𝑓(ℎ, 𝜃_dec)

• Reconstruction loss 𝐿 = ‖𝑥 − 𝑥′‖²

83
VAE: implementation

• Variational Autoencoder
• Input: 𝑥
• [𝜇, 𝜎] = 𝑔(𝑥, 𝜃_enc)
• 𝜖 ∼ Normal(0, 𝐼)
• ℎ = 𝜇 + 𝜖𝜎
• Output: 𝑥′ = 𝑓(ℎ, 𝜃_dec)

• Reconstruction loss 𝐿 = ‖𝑥 − 𝑥′‖²
• Regularization loss 𝐿_r = KL(𝑁(𝜇(𝑥), 𝜎(𝑥)) ‖ 𝑁(0, 𝐼))

84
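For the diagonal-Gaussian encoder used here, the regularization term has a standard closed form (a known result, not written out on the slide):

```latex
\mathrm{KL}\!\left(N(\mu, \operatorname{diag}(\sigma^2)) \,\|\, N(0, I)\right)
  = \frac{1}{2} \sum_{j=1}^{k} \left(\mu_j^2 + \sigma_j^2 - 1 - \log \sigma_j^2\right)
```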
VAE: implementation

• Variational Autoencoder
• Input: 𝑥
• [𝜇, 𝜎] = 𝑔(𝑥, 𝜃_enc)
• 𝜖 ∼ Normal(0, 𝐼)
• ℎ = 𝜇 + 𝜖𝜎
• Output: 𝑥′ = 𝑓(ℎ, 𝜃_dec)

• Reconstruction loss 𝐿 = ‖𝑥 − 𝑥′‖²
• Regularization loss 𝐿_r = KL(𝑁(𝜇(𝑥), 𝜎(𝑥)) ‖ 𝑁(0, 𝐼))
• Overall loss = 𝐿 + 𝜆𝐿_r

85
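Putting the pieces together, here is a minimal end-to-end PyTorch sketch of one VAE training step with the two loss terms. Layer sizes follow the slides (784 → 256 → 50); the MSE reconstruction loss, the λ weight, and the log-variance parameterization are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim=784, hidden=256, latent=50):
        super().__init__()
        self.enc = nn.Linear(in_dim, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.dec = nn.Sequential(nn.Linear(latent, hidden), nn.ReLU(),
                                 nn.Linear(hidden, in_dim), nn.Sigmoid())

    def forward(self, x):
        e = torch.relu(self.enc(x))
        mu, logvar = self.mu(e), self.logvar(e)
        h = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        return self.dec(h), mu, logvar

def vae_loss(x, x_rec, mu, logvar, lam=1.0):
    rec = F.mse_loss(x_rec, x, reduction="sum")                   # reconstruction loss L
    kl = 0.5 * torch.sum(mu.pow(2) + logvar.exp() - 1.0 - logvar) # KL(N(mu, sigma^2) || N(0, I))
    return rec + lam * kl

model = VAE()
x = torch.rand(32, 784)                 # a batch of flattened 28x28 images
x_rec, mu, logvar = model(x)
loss = vae_loss(x, x_rec, mu, logvar)
loss.backward()                         # encoder and decoder are trained end-to-end
```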
Summary

• Variational Autoencoder
• Bottleneck architecture
• Stochastic code
• Training
• Reparameterization trick
• Two training terms
• Inference
• Sample Gaussian noise
• Feed the noise into the decoder to generate images

86
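A small sketch of the inference procedure in the summary: sample Gaussian noise and feed it through the decoder. The stand-in decoder below is untrained and only uses the layer sizes from the slides; in practice you would use the trained decoder.

```python
import torch
import torch.nn as nn

decoder = nn.Sequential(nn.Linear(50, 256), nn.ReLU(),
                        nn.Linear(256, 784), nn.Sigmoid())   # stand-in for a trained decoder

with torch.no_grad():
    h = torch.randn(16, 50)                    # sample Gaussian noise
    images = decoder(h).view(16, 1, 28, 28)    # decode the noise into image-shaped outputs
```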
Probabilistic view of VAE

87
Probabilistic view of VAE

• Most works assume data (𝑥) is sampled from a fixed but unknown distribution
• Some works care about “how the data is generated”

88
Probabilistic view of VAE

• Most works assume data (𝑥) is sampled from a fixed but unknown distribution
• Some works care about “how the data is generated”

• Data generation process


• Assume there is a latent code 𝑧 that guides the generation of 𝑥

89
Probabilistic view of VAE

• Most works assume data (𝑥) is sampled from a fixed but unknown distribution
• Some works care about “how the data is generated”

• Data generation process


• Assume there is a latent code 𝑧 that guides the generation of 𝑥
• Assume 𝑧 follows a distribution 𝑝(𝑧), called the prior distribution

90
Probabilistic view of VAE

• Most works assume data (𝑥) is sampled from a fixed but unknown distribution
• Some works care about “how the data is generated”

• Data generation process


• Assume there is a latent code 𝑧 that guides the generation of 𝑥
• Assume 𝑧 follows a distribution 𝑝(𝑧), called the prior distribution

◼ Sample 𝑧𝑖 from 𝑝(𝑧)


◼ Sample 𝑥𝑖 from 𝑝(𝑥|𝑧𝑖 )
◼ Obtain dataset 𝐷 = {𝑥𝑖 }

91
Probabilistic view of VAE

• Most works assume data (𝑥) is sampled from a fixed but unknown distribution
• Some works care about “how the data is generated”

• Data generation process


• Assume there is a latent code 𝑧 that guides the generation of 𝑥
• Assume 𝑧 follows a distribution 𝑝(𝑧), called the prior distribution

◼ Sample 𝑧𝑖 from 𝑝(𝑧)  We have dataset 𝐷 = {𝑥𝑖 }


◼ Sample 𝑥𝑖 from 𝑝(𝑥|𝑧𝑖 )  How to learn 𝑝(𝑥|𝑧)?
◼ Obtain dataset 𝐷 = {𝑥𝑖 }

92
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧

◼ Sample 𝑧_i from 𝑝(𝑧)
◼ Sample 𝑥_i from 𝑝(𝑥|𝑧_i)
◼ Obtain dataset 𝐷 = {𝑥_i}

 We have dataset 𝐷 = {𝑥_i}
 How to learn 𝑝(𝑥|𝑧)?

93
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

◼ Sample 𝑧_i from 𝑝(𝑧)
◼ Sample 𝑥_i from 𝑝(𝑥|𝑧_i)
◼ Obtain dataset 𝐷 = {𝑥_i}

 We have dataset 𝐷 = {𝑥_i}
 How to learn 𝑝(𝑥|𝑧)?

94
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)
Usually computationally intractable (it involves a high-dimensional integral with non-linear functions)

◼ Sample 𝑧_i from 𝑝(𝑧)
◼ Sample 𝑥_i from 𝑝(𝑥|𝑧_i)
◼ Obtain dataset 𝐷 = {𝑥_i}

 We have dataset 𝐷 = {𝑥_i}
 How to learn 𝑝(𝑥|𝑧)?

95
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

• A known fact (assumption)

96
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

• A known fact (assumption)

• Given any 𝑧, only a small region of points in the domain 𝑋 have non-zero 𝑝(𝑥|𝑧)

97
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

• A known fact (assumption)

• Given any 𝑧, only a small region of points in the domain 𝑋 have non-zero 𝑝(𝑥|𝑧)

• The integral only has non-zero values in a small region of points

98
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

• A known fact (assumption)

• Given any 𝑧, only a small region of points in the domain 𝑋 have non-zero 𝑝(𝑥|𝑧)

• The integral only has non-zero values in a small region of points

• Assume we have another function/distribution 𝑞_θ₂(𝑧|𝑥) that can find the region

99
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         = ∫_z 𝑞(𝑧|𝑥) log [(𝑝(𝑥,𝑧)/𝑞(𝑧|𝑥)) · (𝑞(𝑧|𝑥)/𝑝(𝑧|𝑥))] 𝑑𝑧
         = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑞(𝑧|𝑥)/𝑝(𝑧|𝑥)] 𝑑𝑧

100
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         = ∫_z 𝑞(𝑧|𝑥) log [(𝑝(𝑥,𝑧)/𝑞(𝑧|𝑥)) · (𝑞(𝑧|𝑥)/𝑝(𝑧|𝑥))] 𝑑𝑧
         = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑞(𝑧|𝑥)/𝑝(𝑧|𝑥)] 𝑑𝑧

The second term is ≥ 0

106
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         = ∫_z 𝑞(𝑧|𝑥) log [(𝑝(𝑥,𝑧)/𝑞(𝑧|𝑥)) · (𝑞(𝑧|𝑥)/𝑝(𝑧|𝑥))] 𝑑𝑧
         = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑞(𝑧|𝑥)/𝑝(𝑧|𝑥)] 𝑑𝑧

The first term is the evidence lower bound (ELBO); the second term is ≥ 0

107
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         ≥ ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧
         = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑧)𝑝(𝑥|𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧
         = ∫_z 𝑞(𝑧|𝑥) log 𝑝(𝑥|𝑧) 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧

108
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         ≥ ∫_z 𝑞(𝑧|𝑥) log 𝑝(𝑥|𝑧) 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧

111
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         ≥ ∫_z 𝑞(𝑧|𝑥) log 𝑝(𝑥|𝑧) 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧

The first term is 𝐸_{𝑧∼𝑞(𝑧|𝑥)}[log 𝑝(𝑥|𝑧)]

112
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         ≥ ∫_z 𝑞(𝑧|𝑥) log 𝑝(𝑥|𝑧) 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧

The first term is 𝐸_{𝑧∼𝑞(𝑧|𝑥)}[log 𝑝(𝑥|𝑧)]
Function 𝑞 is the encoder and function 𝑝 is the decoder; this term is the reconstruction performance

113
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         ≥ ∫_z 𝑞(𝑧|𝑥) log 𝑝(𝑥|𝑧) 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧

The first term is 𝐸_{𝑧∼𝑞(𝑧|𝑥)}[log 𝑝(𝑥|𝑧)]; the second term is −KL(𝑞(𝑧|𝑥) ‖ 𝑝(𝑧))

115
Probabilistic view of VAE

• Maximum likelihood method


𝑃(𝑥) = ∫ 𝑝_θ₁(𝑥|𝑧) 𝑝(𝑧) 𝑑𝑧        maximize ∑ log 𝑃(𝑥_i)

log 𝑃(𝑥) = ∫_z 𝑞(𝑧|𝑥) log 𝑃(𝑥) 𝑑𝑧 = ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑥,𝑧)/𝑝(𝑧|𝑥)] 𝑑𝑧
         ≥ ∫_z 𝑞(𝑧|𝑥) log 𝑝(𝑥|𝑧) 𝑑𝑧 + ∫_z 𝑞(𝑧|𝑥) log [𝑝(𝑧)/𝑞(𝑧|𝑥)] 𝑑𝑧

The first term is 𝐸_{𝑧∼𝑞(𝑧|𝑥)}[log 𝑝(𝑥|𝑧)]; the second term is −KL(𝑞(𝑧|𝑥) ‖ 𝑝(𝑧))

The KL term is the regularization term

116
Summary

• Variational Autoencoder
• Bottleneck architecture
• Stochastic code
• Training
• Reparameterization trick
• Two training terms
• Inference
• Sample Gaussian noise
• Feed the noise into the decoder to generate images
• Neural-network view and probabilistic view of VAE

117
VAE theory and application

118
Problems in VAE

• VAE usually cannot go deep (check David Wipf’s work)

• The dimension of the latent code is sensitive (check David Wipf’s work)

• VAE cannot do density estimation, i.e., accurately calculate 𝑃(𝑥)

• VAE is usually used as a component of a system rather than as a standalone model

119
Extension: Denoising autoencoder
VAE injects noise in the representation

… … … … … …

28×28

encoder decoder

121
DAE injects noise in the input

… … … … …

28×28

122
Short-cut solutions exist?

… … … … …

28×28

• Vanilla autoencoder (clean input)
  • Identity mapping has zero loss
  • Needs a bottleneck architecture

• Denoising autoencoder (corrupted input)
  • Identity mapping has non-zero loss
  • Usually used with very deep models

123
How to inject noise?

• Mask as noise

Masked Autoencoders Are Scalable Vision Learners (MAE)

124
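A minimal sketch of mask-as-noise corruption for a denoising autoencoder, assuming MNIST-sized inputs; the patch size and mask ratio are illustrative, not values from the slides or the MAE paper.

```python
import torch

def mask_patches(images, patch=4, mask_ratio=0.75):
    """Zero out a random subset of non-overlapping patches (mask-as-noise)."""
    b, c, h, w = images.shape
    keep = torch.rand(b, h // patch, w // patch) > mask_ratio     # True = keep this patch
    mask = keep.repeat_interleave(patch, 1).repeat_interleave(patch, 2)
    return images * mask.unsqueeze(1).float(), mask

x = torch.rand(8, 1, 28, 28)          # a batch of MNIST-sized images
corrupted, mask = mask_patches(x)     # DAE input is `corrupted`; the target is the original x
```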
How to inject noise?

• Mask as noise

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

125
Summary

• Denoising Autoencoder
• Known as one of the most efficient pre-training (self-supervised) methods

126
Summary

• Denoising Autoencoder
• Known as one of the most efficient pre-training (self-supervised) methods

• Masking is the standard choice and you can explore more (additive noise)

127
Summary

• Denoising Autoencoder
• Known as one of the most efficient pre-training (self-supervised) methods

• Masking is the standard choice and you can explore more (additive noise)

• Trained with very deep models and huge amounts of data

128
Summary

• Denoising Autoencoder
• Known as one of the most efficient pre-training (self-supervised) methods

• Masking is the standard choice and you can explore more (additive noise)

• Trained with very deep models and huge amounts of data

• Can be used with ANY data type

129
Summary

• Denoising Autoencoder
• Known as one of the most efficient pre-training (self-supervised) methods

• Masking is the standard choice and you can explore more (additive noise)

• Trained with very deep models and huge amounts of data

• Can be used with ANY data type

• Usually not considered a generative model, although it can generate the missing part of the data

130
Extension: Vector-quantized VAE
Background

• VQVAE is usually used inside a larger system rather than as a standalone model

• VQVAE is widely used (and important to a certain extent) in the development of


large-scale vision-language models

132
Motivation

• Language is formed by (a sequence of)


word tokens
• A small fixed vocabulary
• Short length
• Discrete
• Processed as vectors
• One-hot vector
• Real vector

133
Motivation

• Language is formed by (a sequence of) word tokens
  • A small fixed vocabulary
  • Short length
  • Discrete
  • Processed as vectors (one-hot or real vectors)

• An image is formed by (a sequence of) pixels
  • No vocabulary
  • Long length
  • Viewed as continuous
  • Processed as RGB vectors

134
Motivation

• Language is formed by (a sequence of) word tokens
  • A small fixed vocabulary
  • Short length
  • Discrete
  • Processed as vectors (one-hot or real vectors)

• An image is formed by (a sequence of) pixels
  • No vocabulary
  • Long length
  • Viewed as continuous
  • Processed as RGB vectors

The data-type difference makes it hard to


jointly train a vision-language model

135
Goal of VQVAE

• Process an image into a sequence of “word tokens”

136
Goal of VQVAE

• Process an image into a sequence of “word tokens”

• How to construct a small fixed vocabulary for an image

• How to define the length of the sequence

• How to transform RGBs into tokens?

137
VQVAE model

… … …

28×28

encoder

138
VQVAE model

… … …   𝑧_e(𝑥) ∈ 𝑅^𝑑

28×28

encoder

139
VQVAE model

… … …   𝑧_e(𝑥) ∈ 𝑅^𝑑

28×28


encoder

codebook: 𝑒_1, 𝑒_2, …, 𝑒_𝐾 ∈ 𝑅^𝑑 (𝐾 entries)

140
VQVAE model

… … … 𝑧𝑒 𝑥 𝑧𝑑 𝑥

28×28


encoder

codebook: 𝑒_1, 𝑒_2, …, 𝑒_𝐾 ∈ 𝑅^𝑑 (𝐾 entries)

141
VQVAE model

• 𝑧_d(𝑥) is the nearest neighbor of 𝑧_e(𝑥) in the codebook:
  𝑧_d(𝑥) = 𝑒_q,  where 𝑞 = argmin_k ‖𝑒_k − 𝑧_e(𝑥)‖

… … …   𝑧_e(𝑥)   𝑧_d(𝑥)

28×28


encoder

codebook: 𝑒_1, 𝑒_2, …, 𝑒_𝐾 ∈ 𝑅^𝑑 (𝐾 entries)

142
VQVAE model

… … … 𝑧𝑒 𝑥 𝑧𝑑 𝑥 … …

28×28


encoder

codebook: 𝑒_1, 𝑒_2, …, 𝑒_𝐾 ∈ 𝑅^𝑑 (𝐾 entries)

143
Optimization terms

… … … 𝑧𝑒 𝑥 𝑧𝑑 𝑥 … …

28×28


encoder

codebook: 𝑒_1, 𝑒_2, …, 𝑒_𝐾 ∈ 𝑅^𝑑 (𝐾 entries)

144
Gradient computation of encoder

𝑧_d(𝑥) = 𝑒_q,  where 𝑞 = argmin_k ‖𝑒_k − 𝑧_e(𝑥)‖

… … … 𝑧𝑒 𝑥 𝑧𝑑 𝑥 … …

28×28


encoder

codebook: 𝑒_1, 𝑒_2, …, 𝑒_𝐾 ∈ 𝑅^𝑑 (𝐾 entries)

145
Key problem

• If we use ‖𝑥 − 𝑔(𝑧_d)‖ as the loss, the encoder parameters cannot be updated

𝑧𝑒 𝑥 𝑧𝑑 𝑥

146
Key problem

• If we use ‖𝑥 − 𝑔(𝑧_d)‖ as the loss, the encoder parameters cannot be updated

• If we use ‖𝑥 − 𝑔(𝑧_e)‖ as the loss, the objective has changed and the gradient is not correct

𝑧𝑒 𝑥 𝑧𝑑 𝑥

147
Key problem

• If we use ‖𝑥 − 𝑔(𝑧_d)‖ as the loss, the encoder parameters cannot be updated

• If we use ‖𝑥 − 𝑔(𝑧_e)‖ as the loss, the objective has changed and the gradient is not correct

• Solution: set 𝑧_d = 𝑧_e + StopGrad(𝑧_d − 𝑧_e)

StopGrad(): gradients of the argument are never calculated

𝑧_e(𝑥)   𝑧_d(𝑥)

148
The straight-through estimator

• Loss term ‖𝑥 − 𝑔(𝑧_e + StopGrad(𝑧_d − 𝑧_e))‖

𝑧𝑒 𝑥 𝑧𝑑 𝑥

149
The straight-through estimator

• Loss term ‖𝑥 − 𝑔(𝑧_e + StopGrad(𝑧_d − 𝑧_e))‖

• Correct forward process

𝑧𝑒 𝑥 𝑧𝑑 𝑥

150
The straight-through estimator

• Loss term ‖𝑥 − 𝑔(𝑧_e + StopGrad(𝑧_d − 𝑧_e))‖

• Correct forward process

• Correct backward process of decoder parameters

𝑧𝑒 𝑥 𝑧𝑑 𝑥

151
The straight-through estimator

• Loss term ‖𝑥 − 𝑔(𝑧_e + StopGrad(𝑧_d − 𝑧_e))‖

• Correct forward process

• Correct backward process of decoder parameters

• Computable updates for encoder parameters

𝑧𝑒 𝑥 𝑧𝑑 𝑥

152
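A tiny PyTorch check of the straight-through trick above: the forward value equals 𝑧_d, yet the gradient of the loss reaches the encoder output 𝑧_e. The tensor shapes are arbitrary illustrative choices.

```python
import torch

z_e = torch.randn(4, 64, requires_grad=True)     # encoder output
z_d = torch.randn(4, 64)                         # nearest codebook entries (no gradient path)

z_st = z_e + (z_d - z_e).detach()                # forward value equals z_d
z_st.sum().backward()

print(torch.allclose(z_st, z_d))                 # True: forward pass uses the quantized value
print(z_e.grad.abs().sum() > 0)                  # True: the gradient flows back to z_e
```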
Update of the codebook

• We hope the codebook is meaningful

𝑧𝑒 𝑥

codebook
𝑧𝑒 𝑥 𝑧𝑑 𝑥

153
Update of the codebook

• We hope the codebook is meaningful

• Codebook loss

‖𝑧_e − StopGrad(𝑧_d)‖ + 𝛽‖𝑧_d − StopGrad(𝑧_e)‖

𝑧𝑒 𝑥 𝑧𝑑 𝑥

154
Update of the codebook

• We hope the codebook is meaningful

• Codebook loss

‖𝑧_e − StopGrad(𝑧_d)‖ + 𝛽‖𝑧_d − StopGrad(𝑧_e)‖

• VQVAE loss = reconstruction loss + codebook loss

𝑧𝑒 𝑥 𝑧𝑑 𝑥

155
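Combining the nearest-neighbor lookup, the straight-through estimator, and the codebook loss, here is a minimal PyTorch sketch of the quantization module. The loss weighting follows the slide's codebook loss; the codebook size, dimension, and β value are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=8192, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)   # e_1, ..., e_K
        self.beta = beta

    def forward(self, z_e):                            # z_e: (batch, dim) encoder outputs
        dists = torch.cdist(z_e, self.codebook.weight) # distance to every codebook entry
        q = dists.argmin(dim=1)                        # nearest-neighbor indices
        z_d = self.codebook(q)                         # quantized vectors
        # codebook loss: ||z_e - sg(z_d)||^2 + beta * ||z_d - sg(z_e)||^2  (as on the slide)
        codebook_loss = F.mse_loss(z_e, z_d.detach()) + self.beta * F.mse_loss(z_d, z_e.detach())
        z_st = z_e + (z_d - z_e).detach()              # straight-through estimator
        return z_st, codebook_loss, q

vq = VectorQuantizer()
z_e = torch.randn(16, 64)
z_st, codebook_loss, indices = vq(z_e)   # feed z_st to the decoder; add codebook_loss to the total loss
```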
Improving capacity

• In practice we set K = 8192; does that mean there are only 8192 possible generation results?

156
Improving capacity

• In practice we set K = 8192; does that mean there are only 8192 possible generation results?

[diagram: a deep CNN / patch-level network produces a sequence of latents 𝑧_e^1(𝑥), …, 𝑧_e^𝐿(𝑥), each quantized to 𝑧_d^1(𝑥), …, 𝑧_d^𝐿(𝑥)]
No: with a sequence of 𝐿 quantized codes, there are 𝐾^𝐿 possible combinations.

157
Summary
• Variational Autoencoder

• Denoising Autoencoder

• Vector Quantized VAE

158
Thanks dihe@pku.edu.cn
