Quan Thanh Tho, “Modern Approaches in Natural Language Processing”, VNU Journal of Science: Computer Science and Communication Engineering, 2022.
Agenda
• Sequence data and sequence models
• Seq2Seq and attention
• Transformer model
• BERT and other variants
• Applications in NLP
Sequence data and sequence models
Sequence data
A series of data points that depend on each other
• Length can be varied
• Positions matter
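A minimal sketch of what this looks like in practice: two toy sentences of different lengths, mapped to made-up token IDs and padded to a common length. The vocabulary and IDs below are purely illustrative.

```python
# Illustrative only: two sentences tokenized into word IDs of different lengths.
sentences = [["the", "cat", "sat"], ["the", "dog", "chased", "the", "cat"]]
vocab = {"<pad>": 0, "the": 1, "cat": 2, "sat": 3, "dog": 4, "chased": 5}

ids = [[vocab[w] for w in s] for s in sentences]      # variable lengths: 3 and 5
max_len = max(len(s) for s in ids)
padded = [s + [vocab["<pad>"]] * (max_len - len(s)) for s in ids]

print(padded)   # [[1, 2, 3, 0, 0], [1, 4, 5, 1, 2]]
# Positions matter: swapping "dog" and "cat" changes the meaning of the sequence.
```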
The Problem with Standard Networks
RNN comes to the rescue
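Below is a minimal sketch (not the lecture's exact model) of the recurrence a vanilla RNN applies: the same weights are reused at every timestep, so the network can consume sequences of any length. All sizes and weights here are arbitrary.

```python
import numpy as np

# Minimal vanilla RNN cell: h_t = tanh(W_x x_t + W_h h_{t-1} + b).
# Dimensions are arbitrary; weights are random for illustration.
rng = np.random.default_rng(0)
d_in, d_hid = 4, 8
W_x = rng.normal(size=(d_hid, d_in)) * 0.1
W_h = rng.normal(size=(d_hid, d_hid)) * 0.1
b = np.zeros(d_hid)

def rnn_forward(xs):
    """Process a variable-length sequence, reusing the same weights at every step."""
    h = np.zeros(d_hid)
    for x in xs:                        # sequence length can vary
        h = np.tanh(W_x @ x + W_h @ h + b)
    return h                            # final hidden state summarizes the sequence

sequence = rng.normal(size=(5, d_in))   # a toy sequence of 5 timesteps
print(rnn_forward(sequence).shape)      # (8,)
```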
Intuition
Seq2Seq architecture
NLP researchers applied this idea to design a structure dubbed Sequence-to-Sequence (Seq2Seq), which extends the AutoEncoder architecture.
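A minimal encoder-decoder sketch, assuming toy vocabulary and hidden sizes and GRU units (any recurrent cell would do): the encoder compresses the whole source sequence into one context vector, which initializes the decoder.

```python
import torch
import torch.nn as nn

# A minimal GRU-based Seq2Seq sketch with assumed toy sizes.
class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=100, tgt_vocab=100, emb=32, hid=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.GRU(emb, hid, batch_first=True)
        self.decoder = nn.GRU(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        _, context = self.encoder(self.src_emb(src_ids))     # context: (1, B, hid)
        dec_states, _ = self.decoder(self.tgt_emb(tgt_ids), context)
        return self.out(dec_states)                          # (B, T_tgt, tgt_vocab)

model = Seq2Seq()
logits = model(torch.randint(0, 100, (2, 7)), torch.randint(0, 100, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 100])
```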
Seq2Seq: The bottleneck problem
Seq2Seq with attention
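A sketch of the attention step, assuming Luong-style dot-product scoring: at each decoding step, the current decoder state scores every encoder state, and the softmax-weighted sum of encoder states becomes the context vector. Shapes below are illustrative.

```python
import torch
import torch.nn.functional as F

# Sketch of dot-product attention in one RNN Seq2Seq decoder step.
# B = batch size, S = source length, H = hidden size (toy values).
B, S, H = 2, 6, 64
encoder_states = torch.randn(B, S, H)   # one vector per source position
decoder_state = torch.randn(B, H)       # current decoder hidden state (the "query")

scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (B, S)
weights = F.softmax(scores, dim=1)                                         # attention distribution
context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (B, H) weighted sum

print(weights.shape, context.shape)  # torch.Size([2, 6]) torch.Size([2, 64])
```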
Seq2Seq with another bottleneck
Transformer model
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.), Advances in Neural Information Processing Systems. Curran Associates, Inc.
Inside an Encoder Block
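A minimal sketch of one encoder block, with assumed sizes (d_model=64, 4 heads, feed-forward width 256): multi-head self-attention followed by a position-wise feed-forward network, each wrapped in a residual connection and LayerNorm.

```python
import torch
import torch.nn as nn

# Minimal Transformer encoder block sketch (sizes are illustrative assumptions).
class EncoderBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                       # x: (B, T, d_model)
        attn_out, _ = self.attn(x, x, x)        # queries, keys, values all come from x
        x = self.norm1(x + attn_out)            # residual connection + LayerNorm
        x = self.norm2(x + self.ff(x))          # residual connection + LayerNorm
        return x

block = EncoderBlock()
print(block(torch.randn(2, 10, 64)).shape)      # torch.Size([2, 10, 64])
```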
Scaled Dot Product Attention
RNN-based Seq2Seq with attention:
• Keys and Values are the same: the encoder hidden states
• Queries are provided by the decoder
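The scaled dot-product attention of Vaswani et al. (2017), Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, written out directly; the tensor shapes are toy values chosen for illustration.

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V  (Vaswani et al., 2017)."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (..., T_q, T_k)
    weights = F.softmax(scores, dim=-1)                  # each row sums to 1
    return weights @ V, weights

# Toy shapes: 1 batch, 5 query positions, 6 key/value positions, d_k = d_v = 8.
Q, K, V = torch.randn(1, 5, 8), torch.randn(1, 6, 8), torch.randn(1, 6, 8)
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)   # torch.Size([1, 5, 8]) torch.Size([1, 5, 6])
```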
Self-Attention in Transformer
• Attention maps a query and a set of key-value pairs to an output
• The query, keys, values, and output are all vectors
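A single-head self-attention sketch: queries, keys, and values are all learned linear projections of the same input sequence. The sizes and the random input are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

# Single-head self-attention: Q, K, V are projections of the same sequence x.
d_model = 64
W_q, W_k, W_v = (nn.Linear(d_model, d_model) for _ in range(3))

x = torch.randn(2, 10, d_model)          # (batch, sequence length, d_model)
Q, K, V = W_q(x), W_k(x), W_v(x)         # same source sequence for queries, keys, values

scores = Q @ K.transpose(-2, -1) / math.sqrt(d_model)
weights = F.softmax(scores, dim=-1)      # each token attends over all tokens
out = weights @ V                        # one output vector per token
print(out.shape)                         # torch.Size([2, 10, 64])
```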
Self-Attention
Image source: https://jalammar.github.io/illustrated-transformer/
BERT and other variants
Transformer-based Language Models
BERT
• Bidirectional Encoder Representations from Transformers.
• Uses the Transformer Encoder architecture.
• Introduced in 2018 by Google AI.
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.), Proceedings of NAACL-HLT 2019.
Architecture
Pretraining
• Two unsupervised tasks:
1. Masked Language Model
2. Next Sentence Prediction
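The Masked Language Model objective can be illustrated at inference time with the Hugging Face fill-mask pipeline and the public bert-base-uncased checkpoint (assuming the transformers library is installed and the model can be downloaded).

```python
from transformers import pipeline

# Masked Language Model objective, illustrated with the `fill-mask` pipeline
# and the public `bert-base-uncased` checkpoint.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for pred in fill_mask("The capital of France is [MASK]."):
    print(f"{pred['token_str']:>10}  {pred['score']:.3f}")
```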
Text Classification
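One common downstream use is sketched below with a Hugging Face pipeline; the distilbert-base-uncased-finetuned-sst-2-english checkpoint is chosen purely as an illustrative fine-tuned encoder classifier, not as the lecture's own model.

```python
from transformers import pipeline

# Text classification with a BERT-family encoder fine-tuned for sentiment analysis.
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

print(classifier("The lecture on Transformers was excellent."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```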
GPT
• Generative Pre-trained Transformer
• Uses the Transformer Decoder architecture.
• Introduced in 2018 by OpenAI.
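A decoder-only model generates text by continuing a prompt token by token; a minimal illustration with the public gpt2 checkpoint via the Hugging Face pipeline (again assuming transformers is available).

```python
from transformers import pipeline

# Decoder-only (GPT-style) generation: the model continues a prompt token by token.
generator = pipeline("text-generation", model="gpt2")

print(generator("Natural language processing is", max_new_tokens=20)[0]["generated_text"])
```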
Typical NLP pipeline
DL-based NLP pipeline
Pre-trained Neural Language Model
LM-based NLP pipeline
From BERT to BART
• BERT is not a full Seq2Seq model (i.e. not a generative model)
• BART is introduced as an extension/complement
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and
Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation,
and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages
7871–7880, Online. Association for Computational Linguistics.
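Because BART has a full encoder-decoder, it can generate text. A small illustration with the public facebook/bart-large-cnn summarization checkpoint, chosen only as a convenient example of a Seq2Seq task that BERT alone cannot perform.

```python
from transformers import pipeline

# BART as a generative Seq2Seq model, using a public summarization checkpoint.
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

article = ("BERT is an encoder-only model pretrained with masked language modeling, "
           "while BART combines a bidirectional encoder with an autoregressive decoder, "
           "making it suitable for generation tasks such as summarization and translation.")
print(summarizer(article, max_length=40, min_length=10)[0]["summary_text"])
```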
From PhoBERT to BARTPho
BARTPho for Vietnamese translation applications
• Pretrained on Vietnamese text
• Implicitly handles the "aligning" task
• More powerful when the target language is linguistically similar to Vietnamese (Chinese, Bahnaric, etc.)
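A minimal loading example, mirroring the usage published with BARTPho: vinai/bartpho-syllable is the released checkpoint, and actual translation (e.g. Vietnamese-Bahnaric) would still require fine-tuning on a parallel corpus, so this only extracts features as a sanity check.

```python
from transformers import AutoModel, AutoTokenizer

# Load the released BARTPho checkpoint from the Hugging Face Hub.
tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")
bartpho = AutoModel.from_pretrained("vinai/bartpho-syllable")

inputs = tokenizer("Chúng tôi là những nghiên cứu viên.", return_tensors="pt")
features = bartpho(**inputs)              # encoder-decoder hidden states
print(features.last_hidden_state.shape)   # (1, sequence length, hidden size)
```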
A Demo to Conclude
• https://www.ura.hcmut.edu.vn/bahnar/nmt
Thank you