7: Knowledge Distillation
Recap
Neural network efficiency techniques:
● Quantization
● Pruning
● Knowledge Distillation
● AutoML
[Figure: a large teacher network transfers knowledge extracted from the training data to a small student network.]
● Small DNNs are easier to deploy
● What is the “knowledge” being transferred?
○ Classification: e.g. softmax class probabilities
● Proposed by Caruana et al. (2006)
● Generalized by Hinton et al. (2015)
Learning Outcomes
Knowledge Distillation
[Figure: a neural network produces a 5x1 vector of logits z, which a softmax turns into a 5x1 vector of class probabilities q.]
[Figure: the student (a small neural network) takes a 224 x 224 x 3 image and produces softmax outputs; a loss function compares them to the (soft) targets produced by the trained teacher (a large neural network) on the transfer set, a subset of the training data, and the loss is backpropagated through the student.]
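As a rough sketch (not the lecture's code), the soft targets for the transfer set could be precomputed like this; teacher and transfer_loader are assumed placeholder names for the trained teacher and a DataLoader over the transfer set:

import torch
import torch.nn.functional as F

@torch.no_grad()
def compute_soft_targets(teacher, transfer_loader):
    # run the trained teacher over the transfer set and keep its softmax outputs
    teacher.eval()
    targets = []
    for images, _ in transfer_loader:
        logits = teacher(images)                  # logits z (batch x num_classes)
        targets.append(F.softmax(logits, dim=1))  # class probabilities q
    return torch.cat(targets)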
The same logits z, softened with different temperatures:
z      q [T=1]   q [T=10]
-1.1   0.007     0.171
 1.4   0.087     0.219
 3.7   0.880     0.276
 0.1   0.024     0.193
-3.0   0.001     0.141
[Plot: probability vs. class for T=1, T=5, and T=10; higher temperatures flatten the distribution.]
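For reference, the temperature softmax used here is q_i = exp(z_i / T) / Σ_j exp(z_j / T); the tiny snippet below (a sketch, not lecture code) reproduces the behaviour shown in the table:

import torch
import torch.nn.functional as F

z = torch.tensor([-1.1, 1.4, 3.7, 0.1, -3.0])   # example logits from the table
for T in (1, 5, 10):
    q = F.softmax(z / T, dim=0)                 # softened class probabilities
    print(f"T={T}:", [round(p, 3) for p in q.tolist()])
# higher T flattens the distribution: T=1 is sharply peaked at the 3.7 logit,
# while T=10 is close to uniform (compare with the q columns above)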
[Figure: the same student/teacher setup, now with temperature T=5 applied to both the teacher's and the student's softmax when producing the (soft) targets and the student's predictions.]
Which temperature should be used during inference?
Distillation Loss + Student Loss
[Figure: the student network produces two softmax outputs from the same image: a hard prediction at T=1, compared against the one-hot hard labels (e.g. 0 0 1 0 0) to give the student loss, and soft predictions compared against the soft targets produced by the teacher (a large neural network) with its softmax at T=5 to give the distillation loss.]
[Figure: the total training objective is a weighted combination of the distillation loss and the student loss.]
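Written out (a standard formulation following Hinton et al. (2015); the symbols z_s, z_t, y, α, and T are introduced here and correspond to the student logits, teacher logits, hard labels, and the ALPHA and T hyper-parameters in the distillation code below):

\mathcal{L} \;=\; \alpha \, T^{2}\, \mathrm{KL}\!\big(\mathrm{softmax}(z_t / T)\,\big\|\,\mathrm{softmax}(z_s / T)\big) \;+\; (1-\alpha)\,\mathrm{CE}\big(\mathrm{softmax}(z_s),\, y\big)

The T^2 factor keeps the gradient magnitude of the soft term comparable to that of the hard term.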
import torch
import torch.nn.functional as F
import torchvision
# forward
outputs = net(inputs)
# loss
loss = F.cross_entropy(outputs, labels)
# backward + optimize
loss.backward()
optimizer.step()
# distillation version: load student + teacher, set hyper-params
student = torchvision.models.mobilenet_v2()
teacher = torchvision.models.resnet101()  # load (already trained) teacher model
teacher.eval()
optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
T, ALPHA = 5, 0.3  # distillation hyper-params: temperature and loss-mixing weight

# forward through student and (frozen) teacher
outputs = student(inputs)
with torch.no_grad():
    teacher_outputs = teacher(inputs)
# hard-label loss + one common choice of distillation loss (KL between softened outputs)
hard_loss = F.cross_entropy(outputs, labels)
dist_loss = F.kl_div(F.log_softmax(outputs / T, dim=1),
                     F.softmax(teacher_outputs / T, dim=1),
                     reduction='batchmean') * T * T  # T^2 keeps gradient scale comparable
loss = ALPHA * dist_loss + (1. - ALPHA) * hard_loss  # combined hard + distillation loss
# backward + optimize
loss.backward()
optimizer.step()
Learning Outcomes
[Figure: a teacher network distilled into a small student network via the distillation loss.]
The teacher can be more complicated to boost accuracy:
● Ensembles (see the sketch below):
○ Different initializations
○ Different model architectures
[Figure: an ensemble of large teacher networks distilled into one small student through the distillation loss.]
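One simple way to distill from an ensemble (a sketch with my own naming, assuming a list teachers of trained models) is to average the temperature-softened teacher distributions and use the mean as the soft target:

import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_soft_targets(teachers, images, T=5):
    # average the softened class distributions of all ensemble members
    probs = [F.softmax(t(images) / T, dim=1) for t in teachers]
    return torch.stack(probs).mean(dim=0)

# the averaged distribution then replaces the single-teacher target in the
# distillation loss from the earlier code, e.g.
# dist_loss = F.kl_div(F.log_softmax(student(images) / T, dim=1),
#                      ensemble_soft_targets(teachers, images, T),
#                      reduction='batchmean') * T * T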
● Specialists [1]:
○ Divide the classes across different models
■ The Google JFT dataset has 15000 classes
○ One generalist NN trained on all the data
○ Top-k classes from the generalist are further refined by specialists
○ How to choose specialist classes?
[Figure: a generalist network and several specialist networks (all large neural networks); the relevant specialists are selected and their outputs combined with the generalist's by minimizing KL divergence.]
● Feature-based (see the sketch below):
○ Take the outputs/weights of one or more “hint layers” and minimize e.g. an MSE loss between student and teacher
○ More advanced: minimize the difference in attention maps between student and teacher
● Relation-based:
○ Correlations between feature maps, e.g. the Gramian (Gram matrix) between two features
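A minimal sketch of the hint-layer idea (the HintLoss name, the 1x1 adapter, and the forward-hook capture are my assumptions, not the lecture's code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class HintLoss(nn.Module):
    """MSE between a student hint-layer feature map and the teacher's."""
    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # 1x1 conv adapter in case student and teacher feature widths differ
        self.adapt = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        return F.mse_loss(self.adapt(student_feat), teacher_feat.detach())

# usage sketch: student_feat / teacher_feat would be captured with forward hooks
# on the chosen hint layers and the hint term added to the total loss, e.g.
# loss = ALPHA * dist_loss + (1 - ALPHA) * hard_loss + BETA * hint(s_feat, t_feat)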