12. Advanced DL Topics
Transfer Learning
Two data distributions P1 and P2: a large dataset is drawn from P1, but only a small dataset from P2. Use what has been learned in the first setting for the second.
Transfer Learning
Train a network on ImageNet, then transfer it to a new dataset with C classes: the early convolutional layers are kept FROZEN as a feature extractor, and only the final layer(s) are retrained (see the sketch below).
Source: http://cs231n.stanford.edu/slides/2016/winter1516_lecture11.pdf
[Donahue et al., ICML’14] DeCAF,
[Razavian et al., CVPRW’14] CNN Features off-the-shelf
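A minimal PyTorch sketch of this recipe, assuming a torchvision ResNet-18 backbone and a hypothetical class count C; the frozen layers act as a fixed feature extractor while only the new final layer is trained:

```python
import torch.nn as nn
from torchvision import models

C = 10  # hypothetical number of classes in the new (small) dataset

# Backbone pretrained on the large source dataset (ImageNet).
model = models.resnet18(weights="IMAGENET1K_V1")

# FROZEN: keep the learned feature extractor fixed.
for param in model.parameters():
    param.requires_grad = False

# TRAIN: replace the final layer for the C-class target task.
model.fc = nn.Linear(model.fc.in_features, C)  # new layer is trainable by default
```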
Representation Learning
$A_t = \theta_c A_{t-1} + \theta_x x_t$, where $A_t$ is the hidden state, $A_{t-1}$ the previous hidden state, and $x_t$ the input.
Source: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
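A minimal NumPy sketch of this recurrent update; the tanh nonlinearity is an assumption (a vanilla RNN typically applies one), while the formula above shows only the linear part:

```python
import numpy as np

def rnn_step(A_prev, x_t, theta_c, theta_x):
    """One recurrent update: A_t = tanh(theta_c @ A_{t-1} + theta_x @ x_t)."""
    return np.tanh(theta_c @ A_prev + theta_x @ x_t)

# Hypothetical sizes: hidden dim 4, input dim 3.
rng = np.random.default_rng(0)
theta_c, theta_x = rng.normal(size=(4, 4)), rng.normal(size=(4, 3))
A = np.zeros(4)
for x_t in rng.normal(size=(5, 3)):  # unroll over a length-5 sequence
    A = rnn_step(A, x_t, theta_c, theta_x)
```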
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right) V$
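The formula translates directly into a few lines of NumPy; this is a generic sketch of scaled dot-product attention, not tied to any particular library:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stabilized
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)
    return softmax(scores, axis=-1) @ V
```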
Graph neural networks operate on nodes and edges, learning node embeddings and edge embeddings.
[Battaglia et al., '18] Relational inductive biases, deep learning, and graph networks
'Node to edge' updates
• At every message passing step, first aggregate the messages from the neighbors $\mathcal{N}(i)$ of node $i$ with an order-invariant operation (e.g., sum, mean, max): $m_i = \bigoplus_{j \in \mathcal{N}(i)} \mathrm{message}(h_j, h_i)$ (a code sketch follows below)
https://www.deepmind.com/blog/traffic-prediction-with-advanced-graph-neural-networks
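A minimal sketch of one such aggregation step, with a hypothetical message function (here simply the sender's embedding); the only requirement from the slide is that the aggregation over neighbors is order invariant:

```python
import numpy as np

def message_passing_step(h, edges, agg=np.sum):
    """h: (num_nodes, d) node embeddings; edges: list of (src, dst) pairs."""
    h_new = h.copy()
    for i in range(h.shape[0]):
        # Messages arriving at node i from its neighbors.
        msgs = [h[j] for (j, dst) in edges if dst == i]
        if msgs:
            h_new[i] = agg(np.stack(msgs), axis=0)  # order-invariant: sum/mean/max
    return h_new
```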
Can we do better?
[Badrinarayanan et al., TPAMI‘16] SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation
Generative Models
• Given training data, how can we generate new samples from the same distribution?
(Figure: generated images vs. real images)
Figure copyright and adapted from Ian Goodfellow, Tutorial on Generative Adversarial Networks, 2017
Variational Autoencoders
The encoder $q_\phi(z|x)$ maps an input $x$ to the parameters $\mu_{z|x}$ and $\Sigma_{z|x}$ of a Gaussian over the latent code; a latent vector $z \sim \mathcal{N}(\mu_{z|x}, \Sigma_{z|x})$ is sampled; the decoder $p_\theta(x|z)$ reconstructs $\hat{x}$ from $z$.
Source: https://bit.ly/37ctFMS
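A minimal PyTorch sketch of this encoder → sample → decoder pipeline, with hypothetical dimensions; the sampling step uses the reparameterization trick so that gradients flow back to $\phi$:

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):  # hypothetical sizes
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # q_phi(z|x): outputs mu and log-variance
        self.dec = nn.Linear(z_dim, x_dim)      # p_theta(x|z)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        eps = torch.randn_like(mu)
        z = mu + (0.5 * logvar).exp() * eps     # z ~ N(mu, Sigma), reparameterized
        return self.dec(z), mu, logvar
```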
Traversing a single latent dimension varies one attribute of the output, e.g., the degree of smile.
Source: https://github.com/hindupuravinash/the-gan-zoo
Test time: decode a 'random' vector sampled from the latent space $z$ into an output image, where $\dim(z) < \dim(x)$.
Generative Adversarial Networks
The generator $G$ maps a latent vector $z$ to a fake sample $G(z)$; the discriminator $D$ outputs $D(x)$ for real samples and $D(G(z))$ for generated ones.
• Minimax Game:
– G minimizes probability that D is correct
– Equilibrium is saddle point of discriminator loss
• D provides supervision (i.e., gradients) for G
[Goodfellow et al., NIPS‘14] Generative Adversarial Networks
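In Goodfellow et al.'s formulation the game is $\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z \sim p(z)}[\log(1 - D(G(z)))]$. A minimal PyTorch sketch of one alternating update, assuming G and D are given modules with D producing sigmoid probabilities (the generator uses the common non-saturating heuristic):

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, x_real, opt_g, opt_d, z_dim=64):
    z = torch.randn(x_real.size(0), z_dim)
    x_fake = G(z)

    # Discriminator update: push D(x) -> 1 on real data, D(G(z)) -> 0 on fakes.
    d_real, d_fake = D(x_real), D(x_fake.detach())
    d_loss = F.binary_cross_entropy(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: minimize the probability that D is correct on fakes;
    # D's judgment provides the gradients that supervise G.
    d_fake = D(x_fake)
    g_loss = F.binary_cross_entropy(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```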
GAN Applications
[Brock et al., ICLR'19] BigGAN: Large Scale GAN Training for High Fidelity Natural Image Synthesis
StyleGAN: Face Image Generation
[Karras et al., '18] StyleGAN: A Style-Based Generator Architecture for Generative Adversarial Networks
[Karras et al., '19] StyleGAN2: Analyzing and Improving the Image Quality of StyleGAN
CycleGAN: Unpaired Image-to-Image Translation
[Zhu et al., ICCV'17] CycleGAN: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
SPADE: GAN-Based Image Editing
[Park et al., CVPR'19] SPADE: Semantic Image Synthesis with Spatially-Adaptive Normalization
Texturify: 3D Texture Generation
[Siddiqui et al., ECCV'22] Texturify: Generating Textures on 3D Shape Surfaces
Diffusion Models
• What is diffusion?
• Efficiency?
Forward Diffusion
$x_t = \sqrt{1-\beta_t}\, x_{t-1} + \sqrt{\beta_t}\, \epsilon_{t-1}$
$\;\;\;= \sqrt{\alpha_t \alpha_{t-1}}\, x_{t-2} + \sqrt{1 - \alpha_t \alpha_{t-1}}\, \bar{\epsilon}_{t-2}$
$\;\;\;= \dots$
$\;\;\;= \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon_0$
$x_t \sim q(x_t \mid x_0) = \mathcal{N}\big(x_t;\; \sqrt{\bar{\alpha}_t}\, x_0,\; (1-\bar{\alpha}_t)\mathbf{I}\big)$, where $\alpha_t = 1-\beta_t$ and $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$.
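Because this marginal $q(x_t \mid x_0)$ is Gaussian in closed form, $x_t$ can be sampled in one step instead of simulating $t$ noising steps. A minimal sketch (scalar timestep t for clarity; the schedule values are hypothetical):

```python
import torch

def q_sample(x0, t, betas):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) x0, (1 - a_bar_t) I) directly."""
    alpha_bar = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t = prod_s (1 - beta_s)
    eps = torch.randn_like(x0)
    return alpha_bar[t].sqrt() * x0 + (1.0 - alpha_bar[t]).sqrt() * eps

# Hypothetical linear schedule with T = 1000 steps.
betas = torch.linspace(1e-4, 0.02, 1000)
x0 = torch.randn(8, 3, 32, 32)  # a batch of (placeholder) images
x_500 = q_sample(x0, t=500, betas=betas)
```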
Reverse Diffusion
• As $T \to \infty$, $x_T$ becomes a Gaussian distribution
[Ho et al., NeurIPS'20] Denoising Diffusion Probabilistic Models
Training a Diffusion Model
• Optimize a variational bound on the negative log-likelihood of the training data:
$L_{\mathrm{VLB}} = \mathbb{E}_q\Big[\underbrace{D_{KL}\big(q(x_T \mid x_0) \,\|\, p_\theta(x_T)\big)}_{L_T} + \sum_{t=2}^{T} \underbrace{D_{KL}\big(q(x_{t-1} \mid x_t, x_0) \,\|\, p_\theta(x_{t-1} \mid x_t)\big)}_{L_{t-1}} \underbrace{- \log p_\theta(x_0 \mid x_1)}_{L_0}\Big]$
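In practice, Ho et al. train with a simplified objective: sample a timestep, noise $x_0$ in closed form, and regress the added noise. A minimal sketch, where `model(x_t, t)` is a hypothetical noise-prediction network and $x_0$ is assumed flattened to shape (batch, dim):

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, betas):
    """L_simple = E_{t, x0, eps} || eps - eps_theta(x_t, t) ||^2."""
    T = betas.shape[0]
    t = torch.randint(0, T, (x0.shape[0],))                  # random timestep per sample
    a_bar = torch.cumprod(1.0 - betas, dim=0)[t].view(-1, 1) # \bar{alpha}_t, broadcastable
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps     # closed-form forward step
    return F.mse_loss(model(x_t, t), eps)
```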
• Supervised Learning: labeled data; find a mapping from input to label (e.g., classification, regression)
• Unsupervised Learning: unlabeled data; find structure in the data (e.g., clustering, anomaly detection)
• Reinforcement Learning: sequential data; learning by interaction with the environment
Source: Deepmind.com
The agent interacts with the environment: at every time step $t$ it receives an observation $o_t$ and a reward $r_t$, and responds with an action (see the loop sketched below).
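A minimal interaction loop, assuming a Gymnasium-style environment API and a placeholder random policy:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
history = []  # h_t: full sequence of observations, actions, rewards
for t in range(500):
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    history.append((obs, action, reward))
    if terminated or truncated:         # episode over: start a new one
        obs, info = env.reset()
```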
Characteristics of RL
• Sequential, non-i.i.d. data (time matters)
• The state is a function of the history of observations, actions, and rewards: $s_t = f(h_t)$
RL algorithms fall into two families: Model-Free RL and Model-Based RL.
[Mnih et al., NIPS'13] Playing Atari with Deep Reinforcement Learning
RL Milestones: AlphaStar (StarCraft II)
• Model: Transformer network with an LSTM core
• Trained on 200 years of StarCraft II play for 14 days
• 16 Google v3 TPUs
• December 2018: beats MaNa, a professional StarCraft player (world rank 13)
• Linear vs. logistic regression
• Loss functions
– Comparison & effects
• Parameter search & interpretation
Source: http://ruder.io/optimizing-gradient-descent/,
https://srdas.github.io/DLBook/ImprovingModelGeneralization.html,
http://cs231n.github.io/neural-networks-3/
Typology of Neural Networks
• CNNs: output size $\left(\frac{N + 2P - F}{S} + 1\right) \times \left(\frac{N + 2P - F}{S} + 1\right)$ (see the helper below)
• Autoencoders
• RNNs
• GANs: real or fake pair?
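The CNN output-size formula as a one-line helper; the example values are hypothetical:

```python
def conv_output_size(N, F, P=0, S=1):
    """Spatial output size of a convolution: (N + 2P - F) / S + 1."""
    return (N + 2 * P - F) // S + 1

# e.g., 32x32 input, 5x5 filter, padding 2, stride 1 -> 32x32 output
assert conv_output_size(32, 5, P=2, S=1) == 32
```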
• ML for 3D Geometry (Niessner)
• DL for 3D Vision (Dai)
• Syllabus
– Advanced architectures, e.g., Siamese neural networks
– Variational Autoencoders
– Generative models, e.g., GANs
– Multi-dimensional CNNs
– Graph neural networks
– Domain adaptation
• https://phillipi.github.io/pix2pix/
• http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture13.pdf