Week 3
deeplearning.ai
Overview
deeplearning.ai
Week 3
Week 3 topics: Question Answering, Transfer learning, BERT, T5.
Question Answering comes in two forms, each with its own model:
● Context-based: the answer is extracted from a given passage.
● Closed book: the model answers without being given a passage.
Not just the model
Classical training: data goes into a model for training, and data (e.g. a course review) goes into the trained model for inference.
Transfer Learning: Different Tasks
A pre-trained model is trained again on a “downstream” task (e.g. course reviews) before being used for inference.
Pre-Training
Pre-training task, Sentiment Classification: “Watching the movie is like ...” → Model.
Training on the downstream task, Question Answering:
“When’s my birthday?” → “Umm…”
“When is Pi Day?” → “March 14!”
BERT: Bi-directional Context
“Learning from deeplearning.ai is like watching the sunset with my best friend!”
● Uni-directional: only the context on one side of a word is used.
● Bi-directional: context from both sides of the word is used.
T5: Single task vs. Multi task
● Single task: one model per task (Model 1, Model 2), each given inputs like “Studying with deeplearning.ai was ...”.
● Multi task: a single model handles all of the tasks.
T5: more data, better performance
● English Wikipedia: ~13 GB
● C4 (Colossal Clean Crawled Corpus): ~800 GB
Transfer Learning in NLP
deeplearning.ai
Desirable Goals
Transfer Learning!
● Improve predictions
● Small datasets
Transfer learning options
1 2
Pre-train
Transfer Model
data
Train
Model Labeled
data Feature- Unlabeled
based
3
prediction Fine- Pre-training task
tuning
Language modeling
Masked words
Next sentence
Option 1: General-purpose learning
Word embeddings: a CBOW model predicts a word from its context, e.g. “I am ____ because I am learning” → “happy”. The embeddings then serve as input “features” for downstream tasks such as translation.
Option 1: Feature-based vs. Fine-Tuning
● Feature-based: pre-train, take the learned features, and train a new model that makes the prediction.
● Fine-tuning: pre-train, then fine-tune the same model on the downstream task and use it for prediction.
Option 1: Fine-tune by adding a layer
Pre-train on plentiful data (e.g. movie reviews), add a new output layer, and fine-tune on the downstream data (e.g. course reviews).
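A minimal sketch of the two options, assuming the Hugging Face transformers library (introduced later this week); the checkpoint name and label count are illustrative, not taken from the slide:

```python
# A minimal sketch of feature-based transfer vs. fine-tuning, assuming the
# Hugging Face transformers library; checkpoint and num_labels are illustrative.
from transformers import AutoModelForSequenceClassification

# Pre-trained encoder plus a freshly added classification layer on top.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

feature_based = True  # set to False to fine-tune the whole model instead
if feature_based:
    # Feature-based: freeze the pre-trained weights; only the new layer trains.
    for param in model.base_model.parameters():
        param.requires_grad = False
# Fine-tuning: leave every parameter trainable and train end to end.
```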
Option 2: Pre-train data
Data and performance: more pre-training data generally means better model performance.
Option 2: Labeled vs. Unlabeled data
Pre-training data can be labeled or unlabeled; labeled data is scarce, while unlabeled text is abundant.
Option 3: Self-supervised pre-training task
From unlabeled data, create the inputs (features) and the targets (labels) yourself.
Option 3: Self-supervised task, example
Unlabeled data: “Learning from deeplearning.ai is like watching the sunset with my best friend.”
Input: the sentence with the last word held out. Target: “friend”.
The model predicts the target and its weights are updated. This is language modeling.
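A minimal sketch of turning that unlabeled sentence into a self-supervised (input, target) pair; the helper name is hypothetical, not from any library:

```python
# Create a self-supervised (input, target) pair as in the example above.
# make_lm_pair is a hypothetical helper for illustration only.
def make_lm_pair(sentence: str):
    tokens = sentence.rstrip(".!").split()
    # Input (features): everything except the held-out word.
    # Target (label): the word the model must predict.
    return " ".join(tokens[:-1]), tokens[-1]

text = "Learning from deeplearning.ai is like watching the sunset with my best friend."
features, label = make_lm_pair(text)
print(features)  # "Learning from deeplearning.ai is like watching the sunset with my best"
print(label)     # "friend"
```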
Fine-tune a model for each downstream task
Pre-training gives one model; training on each downstream task then gives one fine-tuned model per task.
Predicting the words “on” “the” “right” “side” one at a time:
● A bi-directional LSTM (an RNN with an encoder and a decoder) reads context from both directions to predict a word such as “right”.
● A uni-directional decoder sees only the context to its left. Why not bi-directional?
Transformer
A Transformer can be built as an encoder-decoder (with attention in both), a decoder only, or an encoder only.
“The legislators believed that they were on the _____ side of history, so they changed the law.”
Filling in that blank needs bi-directional context.
Transformer + Bi-directional Context
The encoder-only form gives bi-directional context; it can also take a pair of sentences, Sentence “A” and Sentence “B”, and ask whether B follows A.
T5: Multi-task
A single model handles many tasks on inputs such as “Studying with deeplearning.ai was ...”. How?
T5: Text-to-Text
Every task becomes text in, text out: input “Classify: Learning from deeplearning.ai is like...” → output “5 stars”. The same format covers tasks such as classification and next sentence prediction.
Bidirectional Encoder Representations from Transformers (BERT)
deeplearning.ai
Outline
● Learn about the BERT architecture
● ...
BERT
● A multi-layer bidirectional Transformer
● Positional embeddings
● BERT_base:
  ● 12 layers (12 transformer blocks)
  ● 12 attention heads
  ● 110 million parameters
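A quick way to check these numbers, assuming the Hugging Face transformers library and the "bert-base-uncased" checkpoint (the slide does not name a specific checkpoint):

```python
# Inspect a BERT_base checkpoint; library and checkpoint name are assumptions.
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")
print(model.config.num_hidden_layers)    # 12 transformer blocks
print(model.config.num_attention_heads)  # 12 attention heads
print(sum(p.numel() for p in model.parameters()))  # roughly 110 million parameters
```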
BERT pre-training
Input representation: the sum of three embeddings per token.
● Token embeddings: [CLS] my dog is cute [SEP] he likes play ##ing [SEP]
● Segment embeddings: A A A A A A B B B B B
● Position embeddings: 0 1 2 3 4 5 6 7 8 9 10
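A minimal sketch of producing the token and segment ids above, assuming the Hugging Face tokenizer for "bert-base-uncased"; the position embeddings are added inside the model from positions 0, 1, 2, ...:

```python
# Tokenize a sentence pair the way BERT expects; library is an assumption.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("my dog is cute", "he likes playing")

# Wordpiece tokens, with [CLS] at the start and [SEP] after each sentence.
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# Segment ids: 0 for every token of sentence A, 1 for sentence B.
print(encoded["token_type_ids"])
```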
Visualizing the output
The [CLS] position feeds the next sentence prediction (NSP) head; every token position of Masked sentence A ([CLS] Tok 1 ... Tok N [SEP]) and Masked sentence B (Tok 1 ... Tok M [SEP]) feeds the masked language model (Masked LM) head.
● [SEP]: a special separator token
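A minimal sketch of the masked-LM head in action, assuming the Hugging Face fill-mask pipeline with the "bert-base-uncased" checkpoint:

```python
# Ask BERT's masked-LM head to fill in a masked word; library is an assumption.
from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
candidates = unmasker(
    "Learning from deeplearning.ai is like watching the sunset with my best [MASK]."
)
for c in candidates:
    print(c["token_str"], c["score"])  # predicted words with their probabilities
```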
Summary
● BERT objective
● Model inputs/outputs
Fine-tuning BERT
deeplearning.ai
Fine-tuning BERT: Outline
Pre-train BERT once, then fine-tune it on each downstream task:
● MNLI: Hypothesis and Premise go in as Sentence A and Sentence B
● NER: Sentence A in, Tags out
● SQuAD: Question in, Answer out
Inputs summary: Sentence A / Sentence B, Hypothesis / Premise, Sentence / Entities, ⋮
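A minimal sketch of giving the same pre-trained checkpoint a different head per downstream task, assuming the Hugging Face transformers library; the checkpoint name and label counts are illustrative:

```python
# One pre-trained checkpoint, three task-specific heads; names are assumptions.
from transformers import (
    AutoModelForSequenceClassification,  # MNLI: hypothesis/premise -> label
    AutoModelForTokenClassification,     # NER: sentence -> per-token tags
    AutoModelForQuestionAnswering,       # SQuAD: question + context -> answer span
)

checkpoint = "bert-base-uncased"
mnli_model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)
ner_model = AutoModelForTokenClassification.from_pretrained(checkpoint, num_labels=9)
squad_model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
```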
Transformer: T5
deeplearning.ai
Outline
● Classification
● Summarization
● Question Answering (Q&A)
● Sentiment
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer, Raffel et al., 2020
Transformer - T5 Model
Pre-training: take the original text, mask out spans to form the inputs, and use the masked-out spans as the targets. (Raffel et al., 2020)
Model Architecture
Three attention patterns over inputs x1 x2 x3 x4 and targets y1 y2:
● Encoder-decoder: the encoder sees the full input; the decoder attends causally while producing y1 y2.
● Language model: a single decoder stack attends causally over the whole sequence.
● Prefix LM: fully-visible attention over the input prefix, causal attention over the targets y1 y2.
(Raffel et al., 2020)
Model Architecture
● Encoder/decoder: the form T5 uses. (Raffel et al., 2020)
Summary
● Prefix LM attention
● Model architecture
● Pre-training T5 (MLM)
Multi-task Training Strategy
deeplearning.ai
Multi-task training strategy
A task prefix in the input text tells T5 which task to perform:
“translate English to German: That is good.” → “Das ist gut”
“cola sentence: The course is jumping well.” → “not acceptable”
(Raffel et al., 2020)
Input and Output Format
● Machine translation:
  translate English to German: That is good.
● Predict entailment, contradiction, or neutral:
  mnli premise: I hate pigeons hypothesis: My feelings towards pigeons are filled with animosity. target: entailment
● Winograd schema:
  The city councilmen refused the demonstrators a permit because *they* feared violence
(Raffel et al., 2020)
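A minimal sketch of the text-to-text format in code, assuming the Hugging Face "t5-small" checkpoint (an assumption; the released T5 models all use these task prefixes):

```python
# Run a task prefix through T5; checkpoint choice is an assumption.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The prefix selects the task; the model reads text and writes text.
inputs = tokenizer("translate English to German: That is good.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # e.g. "Das ist gut."
```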
Multi-task Training Strategy
Fine-tuning methods are compared across benchmarks: GLUE, CNNDM, SQuAD, SGLUE, and translation to German (EnDe), French (EnFr), and Romanian (EnRo). (Raffel et al., 2020)
Data Training Strategies
● Examples-proportional mixing: sample from each dataset in proportion to its size (Data 1, Data 2 → Sample 1, Sample 2).
● Equal mixing: sample from each dataset with equal probability.
● Temperature-scaled mixing: interpolate between the two by raising the mixing rates to 1/T and renormalizing.
● Gradual unfreezing vs. adapter layers
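A minimal sketch of the three mixing rates, following the definitions in Raffel et al. (2020); the dataset sizes and the limit K below are illustrative numbers:

```python
# Compute multi-task mixing rates; sizes and K are made-up illustrative values.
def mixing_rates(sizes, K=2**19, temperature=1.0):
    # Examples-proportional mixing: r_m = min(e_m, K) / sum_n min(e_n, K)
    capped = [min(e, K) for e in sizes]
    rates = [c / sum(capped) for c in capped]
    # Temperature-scaled mixing: raise each rate to 1/T and renormalize.
    # T = 1 keeps proportional mixing; a large T approaches equal mixing.
    scaled = [r ** (1.0 / temperature) for r in rates]
    return [s / sum(scaled) for s in scaled]

sizes = [1_000_000, 50_000, 5_000]             # three hypothetical datasets
print(mixing_rates(sizes, temperature=1.0))    # examples-proportional (with cap K)
print(mixing_rates(sizes, temperature=100.0))  # close to equal mixing
```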
Q&A
GLUE Benchmark
deeplearning.ai
General Language Understanding Evaluation (GLUE)
● A collection used to train, evaluate, and analyze natural language understanding systems
● Datasets of different genres, sizes, and difficulties
● Leaderboard
Tasks Evaluated on
● Is a sentence grammatical or not?
● Sentiment
● Paraphrase
● Similarity
● Are two questions duplicates?
● Answerability
● Contradiction
● Entailment
● Winograd (co-reference)
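Most of these tasks can be pulled directly from the GLUE collection; a minimal sketch assuming the Hugging Face datasets library:

```python
# Load one GLUE task; "cola" is the grammatical-acceptability task above.
from datasets import load_dataset

cola = load_dataset("glue", "cola")
print(cola["train"][0])  # {'sentence': ..., 'label': 0 or 1, 'idx': ...}
```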
General Language Understanding Evaluation
● Drive research
● Model agnostic
Transformer encoder
(Figure: Inputs → Input Embedding → Positional Encoding → Multi-Head Attention → Add & Norm → Feed Forward → Add & Norm)

Feedforward block:
[
    LayerNorm,
    dense,
    activation,
    dropout_middle,
    dense,
    dropout_final,
]

Encoder block:
[
    Residual(
        LayerNorm,
        attention,
        dropout_,
    ),
    Residual(
        feed_forward,
    ),
]
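The listings above follow Trax's combinator style, so here is a minimal runnable sketch under that assumption; the layer sizes and dropout rate are illustrative values, not taken from the slides:

```python
# A minimal Trax sketch of the feedforward and encoder blocks listed above;
# d_model, d_ff, n_heads and dropout are illustrative.
from trax import layers as tl

def FeedForward(d_model=512, d_ff=2048, dropout=0.1, mode='train'):
    # LayerNorm -> dense -> activation -> dropout_middle -> dense -> dropout_final
    return [
        tl.LayerNorm(),
        tl.Dense(d_ff),
        tl.Relu(),
        tl.Dropout(rate=dropout, mode=mode),
        tl.Dense(d_model),
        tl.Dropout(rate=dropout, mode=mode),
    ]

def EncoderBlock(d_model=512, d_ff=2048, n_heads=8, dropout=0.1, mode='train'):
    # Residual(LayerNorm, attention, dropout) followed by Residual(feed_forward)
    return [
        tl.Residual(
            tl.LayerNorm(),
            tl.Attention(d_model, n_heads=n_heads, dropout=dropout, mode=mode),
            tl.Dropout(rate=dropout, mode=mode),
        ),
        tl.Residual(
            FeedForward(d_model, d_ff, dropout, mode),
        ),
    ]
```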
Context: Since the end of the Second World War, France has become an ethnically diverse country. Today, approximately five percent of the French population is non-European and non-white. This does not approach the number of non-white citizens in the United States (roughly 28–37%, depending on how Latinos are classified; see Demographics of the United States). Nevertheless, it amounts to at least three million people, and has forced the issues of ethnic diversity onto the French policy agenda. France has developed an approach to dealing with ethnic problems that stands in contrast to that of many advanced, industrialized countries. Unlike the United States, Britain, or even the Netherlands, France maintains a "color-blind" model of public policy. This means that it targets virtually no policies directly at racial or ethnic groups. Instead, it uses geographic or class criteria to address issues of social inequalities. It has, however, developed an extensive anti-racist policy repertoire since the early 1970s. Until recently, French policies focused primarily on issues of hate speech, going much further than their American counterparts, and relatively less on issues of discrimination in jobs, housing, and in provision of goods and services.
● Process the data to get the required inputs and outputs: "question: Q context: C" as the input and "A" as the target.
  T5 uses the same format for other tasks, e.g. "cola sentence: The course is jumping well." → "not acceptable", or "stsb sentence1: The rhino ..."
● Use the Transformers library for Q&A: Context + Questions → Answers.
Hugging Face: Fine-Tuning Transformers
● Datasets: one thousand
● Tokenizer
● Model checkpoints: more than 14 thousand
● Trainer, evaluation metrics
Checkpoint: a set of learned parameters for a model, obtained using a training procedure for some task.
The end result is human-readable output.
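A minimal sketch of loading a checkpoint and its matching tokenizer, assuming the Hugging Face transformers library; the checkpoint name below is just one of the many question-answering checkpoints on the Hub:

```python
# Load a checkpoint and its tokenizer; checkpoint choice is an assumption.
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

checkpoint = "distilbert-base-cased-distilled-squad"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)              # text -> token ids
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)  # learned parameters
```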
Hugging Face: Using Transformers
deeplearning.ai
Using Transformers
Pipelines: 1. Pre-processing your inputs, then running the model and turning its output into a human-readable result.
Q&A pipeline: Context + Questions → Answers.
Pipeline initialization: choose a task (from the supported tasks) and a model checkpoint; the matching tokenizer comes with it.
● Model checkpoints: more than 14 thousand
● Trainer, evaluation metrics
● Datasets: one thousand; load them using just one function
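A minimal sketch of a question-answering pipeline, assuming the Hugging Face transformers library; the checkpoint, question, and context strings are illustrative:

```python
# Build and call a question-answering pipeline; inputs are illustrative.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="What model of public policy does France maintain?",
    context="Unlike the United States, Britain, or even the Netherlands, "
            "France maintains a 'color-blind' model of public policy.",
)
print(result["answer"], result["score"])  # human-readable answer plus a confidence score
```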
Fine-tuning settings passed to the Trainer include:
● Number of epochs
● Warm-up steps
● Weight decay
● ...
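A minimal sketch of wiring these settings into the Hugging Face Trainer; model, train_dataset, eval_dataset, and compute_metrics are assumed to be defined elsewhere (e.g. a checkpoint and a dataset loaded as shown above):

```python
# Pass the fine-tuning settings above to the Trainer; surrounding objects
# (model, datasets, metrics function) are assumed to exist already.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="finetuned-model",
    num_train_epochs=3,   # number of epochs
    warmup_steps=500,     # warm-up steps for the learning-rate schedule
    weight_decay=0.01,    # weight decay
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,  # evaluation metrics
)
trainer.train()
```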