Why NSP?
Masking focuses on predicting words from surrounding contexts so as to
produce effective word-level representations.
Many applications require understanding the relationship between two sentences, e.g.,
▶ paraphrase detection (detecting if two sentences have similar meanings),
▶ entailment (detecting if the meanings of two sentences entail or contradict
each other)
▶ discourse coherence (deciding if two neighboring sentences form a
coherent discourse)
Finite vocabulary assumptions make even less sense in many languages with
complex morphology
When do we stop?
When the desired vocabulary size is met.
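As an illustration of this stopping criterion, here is a minimal BPE-style merge loop: a toy sketch rather than any library's implementation, with the corpus, function name, and target size made up for the example.

```python
from collections import Counter

def learn_bpe_merges(word_freqs, target_vocab_size):
    """Toy BPE-style learner: repeatedly merge the most frequent adjacent
    symbol pair until the vocabulary reaches the desired size."""
    # Start with each word as a sequence of characters.
    corpus = {tuple(word): freq for word, freq in word_freqs.items()}
    vocab = {ch for word in corpus for ch in word}
    merges = []

    while len(vocab) < target_vocab_size:
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in corpus.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # nothing left to merge
        best = max(pairs, key=pairs.get)
        merges.append(best)
        vocab.add(best[0] + best[1])
        # Apply the chosen merge to every word in the corpus.
        new_corpus = {}
        for word, freq in corpus.items():
            merged, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_corpus[tuple(merged)] = new_corpus.get(tuple(merged), 0) + freq
        corpus = new_corpus
    return vocab, merges

# Stop once the vocabulary reaches the desired size (20 symbols here).
vocab, merges = learn_bpe_merges({"low": 5, "lower": 2, "newest": 6, "widest": 3}, 20)
print(len(vocab), merges)
```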
With RNNs, we used the hidden layer associated with the final input element to stand for the entire sequence. In BERT, the [CLS] token plays the role of a sentence embedding.
This unique token is added to the vocabulary and is prepended to the
start of all input sequences, both during pretraining and encoding.
The output vector C ∈ R^{d_h} in the final layer of the model for the [CLS] input serves as the input to a classifier head.
The only new parameters introduced during fine-tuning are classification layer weights W_C ∈ R^{K×d_h}, where K is the number of labels.
y = softmax(W_C C)
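A minimal sketch of such a classifier head in PyTorch; this is a hedged illustration of the formula above, not BERT's actual fine-tuning code, and the dimension d_h = 768 and the class/variable names are assumptions.

```python
import torch
import torch.nn as nn

class ClsHead(nn.Module):
    """Maps the final-layer [CLS] vector C (d_h-dim) to K label probabilities."""
    def __init__(self, d_h: int, num_labels: int):
        super().__init__()
        # W_C in R^{K x d_h}: the only new parameters learned during fine-tuning.
        self.W_C = nn.Linear(d_h, num_labels, bias=False)

    def forward(self, cls_vector: torch.Tensor) -> torch.Tensor:
        # y = softmax(W_C C)
        return torch.softmax(self.W_C(cls_vector), dim=-1)

head = ClsHead(d_h=768, num_labels=3)   # e.g. 3 NLI labels
C = torch.randn(1, 768)                 # stand-in for the [CLS] output vector
print(head(C))                          # probability distribution over the 3 labels
```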
Example: MultiNLI
Pairs of sentences are given one of three labels: entails, contradicts, or neutral.
These labels describe a relationship between the meaning of the first
sentence (the premise) and the second sentence (the hypothesis).
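A hedged sketch of how such a premise/hypothesis pair is packed into a single BERT input, using the Hugging Face transformers library; the checkpoint name, label count, and example sentences are illustrative, and the classification head here is untrained.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3)  # entails / contradicts / neutral

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# The tokenizer packs the pair as: [CLS] premise [SEP] hypothesis [SEP]
inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1))  # distribution over the 3 labels
```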
Here, the final output vector corresponding to each input token is passed
to a classifier that produces a softmax distribution over the possible set of
tags.
The set of weights to be learned for this additional layer is W_K ∈ R^{k×d_h}, where k is the number of possible tags for the task.
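A minimal PyTorch sketch of this per-token classifier; the shapes, tag count, and names are illustrative assumptions rather than the slides' own code.

```python
import torch
import torch.nn as nn

class TokenTagHead(nn.Module):
    """Applies W_K in R^{k x d_h} to every token's final-layer vector,
    giving a softmax distribution over k tags for each token."""
    def __init__(self, d_h: int, num_tags: int):
        super().__init__()
        self.W_K = nn.Linear(d_h, num_tags, bias=False)

    def forward(self, token_vectors: torch.Tensor) -> torch.Tensor:
        # token_vectors: (batch, seq_len, d_h) -> (batch, seq_len, k)
        return torch.softmax(self.W_K(token_vectors), dim=-1)

head = TokenTagHead(d_h=768, num_tags=17)   # e.g. 17 POS tags (illustrative)
H = torch.randn(1, 12, 768)                 # final-layer vectors for 12 tokens
print(head(H).shape)                        # torch.Size([1, 12, 17])
```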
POS Tagging
Supervised training data for tasks like named entity recognition (NER) is
typically in the form of BIO tags associated with text segmented at the word
level. For example: