
Seq2seq model and application for machine translation

Nguyen Van Vinh
UET, VNU-Hanoi

1
Content

• Introduction to Machine Translation
• The seq2seq model
• Attention mechanism
• Practice: machine translation with the seq2seq model

2
Machine Translation

Machine Translation (MT) is the task of translating a sentence x from one language (the source language) to a sentence y in another language (the target language).

x: L'homme est né libre, et partout il est dans les fers
y: Man is born free, but everywhere he is in chains

3
What is Neural Machine Translation?

• Neural Machine Translation (NMT) is a way to do Machine Translation with a single neural network

• The neural network architecture is called sequence-to-sequence (aka seq2seq) and it involves two RNNs (LSTMs)

4
Encoder-decoder Framework
[Figure: an Encoder (RNN/LSTM) feeds its encoding to a Decoder (RNN/LSTM).]

Source: "Sequence to Sequence Learning with Neural Networks", 2014

5
Encoder-decoder Framework

[Figure: the encoder compresses the source sentence into a K-dimensional context vector, which conditions the words generated by the translation system.]

Source: "Sequence to Sequence Learning with Neural Networks", 2014

6
Encoder-decoder Framework

7
Encoder-decoder Framework
[Figure: all inputs are aggregated in this single vector.]

In seq2seq, the decoder's state depends only on the previous state and the previous output.
8
Neural Machine Translation
The sequence-to-sequence model

[Figure: the Encoder RNN reads the source sentence "les pauvres sont démunis" (input) and produces an encoding of the source sentence, which provides the initial hidden state for the Decoder RNN. The Decoder RNN is a Language Model that generates the target sentence "the poor don't have any money <END>" (output) conditioned on the encoding, taking the argmax over the vocabulary at each step. Note: this diagram shows test-time behavior: the decoder output is fed in as the next step's input.]
9
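To make the diagram concrete, here is a minimal PyTorch sketch of the two RNNs. It is illustrative only, not the exact model from the slides: the GRU cells, the embedding and hidden sizes, and the class names (EncoderRNN, DecoderRNN) are assumptions.

```python
import torch
import torch.nn as nn

class EncoderRNN(nn.Module):
    """Reads the source sentence and returns its final hidden state (the encoding)."""
    def __init__(self, src_vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(src_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src_ids):                 # src_ids: (batch, src_len)
        emb = self.embed(src_ids)
        outputs, hidden = self.rnn(emb)         # hidden: (1, batch, hid_dim)
        return outputs, hidden                  # hidden initializes the decoder

class DecoderRNN(nn.Module):
    """A conditional language model: predicts the next target word given the previous ones."""
    def __init__(self, tgt_vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(tgt_vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.proj = nn.Linear(hid_dim, tgt_vocab_size)

    def forward(self, prev_ids, hidden):        # prev_ids: (batch, tgt_len)
        emb = self.embed(prev_ids)
        outputs, hidden = self.rnn(emb, hidden)
        logits = self.proj(outputs)             # (batch, tgt_len, tgt_vocab_size)
        return logits, hidden
```

At test time, the decoder starts from <START> with the encoder's final hidden state and feeds each argmax prediction back in as the next input, exactly as the diagram shows.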
Training of NMT system
• As in other RNN models, we can train by minimizing the loss between what we predict at each step and its ground-truth value.

10
Training of NMT system

11
12
Training of NMT system

[Figure: the Encoder RNN reads the source sentence "les pauvres sont démunis" (from corpus) and the Decoder RNN is fed the target sentence "<START> the poor don't have any money" (from corpus). The per-step losses are the negative log probabilities of the gold words, $J_1 = -\log P(\text{"the"}), \dots, J_7 = -\log P(\text{<END>})$, and the total loss is their average: $J = \frac{1}{T}\sum_{t=1}^{T} J_t$, with $J_t = -\log P(y_t \mid y_{<t}, x)$.]

Seq2seq is optimized as a single system.
Backpropagation operates "end to end".
13
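Below is a hedged sketch of one training step under this loss, reusing the hypothetical EncoderRNN/DecoderRNN classes from the earlier sketch. Teacher forcing (feeding the gold previous word rather than the model's prediction) and the choice of optimizer are assumptions; the slides only state that the per-step negative log likelihoods are averaged and backpropagated end to end.

```python
import torch
import torch.nn as nn

def training_step(encoder, decoder, optimizer, src_ids, tgt_ids, pad_id):
    """One end-to-end update: J = (1/T) * sum_t -log P(y_t | y_<t, x)."""
    optimizer.zero_grad()
    _, hidden = encoder(src_ids)                     # encode the source sentence
    decoder_input = tgt_ids[:, :-1]                  # <START> the poor ... any
    gold_output = tgt_ids[:, 1:]                     # the poor ... money <END>
    logits, _ = decoder(decoder_input, hidden)       # teacher forcing
    loss = nn.functional.cross_entropy(              # mean of per-step -log probs
        logits.reshape(-1, logits.size(-1)),
        gold_output.reshape(-1),
        ignore_index=pad_id,                         # don't count padding positions
    )
    loss.backward()                                  # backprop "end to end"
    optimizer.step()
    return loss.item()
```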
Better-than-greedy decoding?

• Greedy decoding has no way to undo decisions! (a decoding sketch follows after this slide)
  • les pauvres sont démunis (the poor don't have any money)
  • → the ____
  • → the poor ____
  • → the poor are ____
• Better option: use beam search (a search algorithm) to explore several hypotheses and select the best one

14
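A minimal sketch of greedy decoding, to show why it cannot undo decisions: each step commits to the single argmax word and only that prefix is ever extended. The decoder_step(prev_id, state) callable, returning per-word log probabilities and the next decoder state, is a hypothetical interface, not an API from the slides.

```python
import torch

def greedy_decode(decoder_step, init_state, start_id, end_id, max_len=50):
    """Commit to the argmax word at every step; earlier choices are never revisited."""
    state, prev_id = init_state, start_id
    output = []
    for _ in range(max_len):
        log_probs, state = decoder_step(prev_id, state)   # (vocab,) log P(y_t | y_<t, x)
        prev_id = int(torch.argmax(log_probs))            # single best word, no backtracking
        if prev_id == end_id:
            break
        output.append(prev_id)
    return output
```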
Decoder based on Beam search

15
Decoder based on Beam search: Example
Beam size = 2

[Figure, built up over slides 16–23: beam search keeps the 2 highest-scoring partial hypotheses at every step. From <START> the beam keeps "the" and "a"; expanding them with "poor"/"people" and "poor"/"person" leaves "the poor" and "a poor" as the best two; later steps consider continuations such as "are"/"don't" and "person"/"but", then "always"/"not" and "have"/"take", then "in"/"with" and "any"/"enough", and finally "money"/"funds". The highest-scoring complete hypothesis is "the poor don't have any money".]
Beam search: stopping criterion
• In greedy decoding, we usually decode until the model produces an <END>
  Example: <START> he hit me with a pie <END>
• In beam search decoding, different hypotheses may produce <END> tokens on different timesteps
  • When a hypothesis produces <END>, that hypothesis is complete.
  • Place it aside and continue exploring other hypotheses via beam search.
• Usually we continue beam search until:
  • We reach timestep T (where T is some pre-defined cutoff), or
  • We have at least n completed hypotheses (where n is a pre-defined cutoff); a beam search sketch follows below

24
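Here is a hedged sketch of beam search with the stopping criteria above: hypotheses that emit <END> are set aside as complete, and search stops at the step cutoff or once enough hypotheses have completed. The decoder_step(prev_id, state) interface is the same hypothetical one as in the greedy sketch, and the length normalization applied when comparing finished hypotheses is a common convention that this slide does not discuss.

```python
import torch

def beam_search(decoder_step, init_state, start_id, end_id,
                beam_size=2, max_steps=50, n_complete=2):
    # Each hypothesis: (score = sum of log probs, token list, decoder state)
    beams = [(0.0, [start_id], init_state)]
    completed = []
    for _ in range(max_steps):
        candidates = []
        for score, tokens, state in beams:
            log_probs, new_state = decoder_step(tokens[-1], state)
            top_lp, top_id = torch.topk(log_probs, beam_size)      # expand each hypothesis
            for lp, idx in zip(top_lp.tolist(), top_id.tolist()):
                candidates.append((score + lp, tokens + [idx], new_state))
        candidates.sort(key=lambda c: c[0], reverse=True)           # best candidates first
        beams = []
        for score, tokens, state in candidates[:beam_size * 2]:
            if tokens[-1] == end_id:
                completed.append((score / len(tokens), tokens))     # set aside, length-normalized
            elif len(beams) < beam_size:
                beams.append((score, tokens, state))                # keep beam_size open hypotheses
        if len(completed) >= n_complete or not beams:
            break
    if not completed:                                               # fall back to best open hypothesis
        completed = [(score / len(tokens), tokens) for score, tokens, _ in beams]
    return max(completed, key=lambda c: c[0])[1]
```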
Advantage of NMT
Compared to SMT, NMT has many advantages:
• Better performance
  • More fluent
  • Better use of context
  • Better use of phrase similarities
• A single neural network to be optimized end-to-end
  • No subcomponents to be individually optimized
• Requires much less human engineering effort
  • No feature engineering
  • Same method for all language pairs
25
Weakness of NMT?
Compared to SMT:

• NMT is less interpretable
  • Hard to debug
• NMT is difficult to control
  • For example, can't easily specify rules or guidelines for translation
  • Safety concerns!

26
How to evaluate an MT system?
BLEU (Bilingual Evaluation Understudy)

• BLEU compares the machine-written translation to one or several human-written translation(s), and computes a similarity score based on:
  • n-gram precision (usually up to 3- or 4-grams)
  • Penalty for too-short system translations

• BLEU is useful but imperfect (a small usage sketch follows after this slide)
  • There are many valid ways to translate a sentence
  • So a good translation can get a poor BLEU score because it has low n-gram overlap with the human translation ☹
27
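For reference, a small sketch of computing corpus BLEU with the sacrebleu package; the package choice and the toy sentences are assumptions, not part of the slides.

```python
import sacrebleu

# System outputs and one reference translation per sentence (toy example).
hypotheses = ["the poor don't have any money"]
references = [["the poor are completely destitute"]]   # list of reference streams

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")   # low score despite a reasonable translation:
                                    # little n-gram overlap with the single reference
```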
Sequence-to-sequence: the bottleneck problem

[Figure: the Encoder RNN encodes the source sentence "les pauvres sont démunis" (input) into a single encoding, and the Decoder RNN generates the target sentence "the poor don't have any money <END>" (output) from it.]

Problems with this architecture?

28
Sequence-to-sequence: the bottleneck problem
[Figure: same architecture as the previous slide. The single encoding of the source sentence needs to capture all information about the source sentence. Information bottleneck!]

29
Visualizing sentence embeddings
Attention

• Attention: a solution to the bottleneck problem.

• Main idea: on each step of the decoder, use a direct connection to the encoder to focus on a particular part of the source sequence

31
Encoder-Decoder with Attention (sequence-to-sequence with attention)

[Figure, built up over slides 32–43: the Encoder RNN reads the source sentence "les pauvres sont démunis" (input); the Decoder RNN starts from <START> and generates "the poor don't have any money" one word at a time.]

• Attention scores: computed as the dot product of the decoder hidden state with each encoder hidden state. On the first decoder timestep, we're mostly focusing on the first encoder hidden state ("les").
• Attention distribution: take a softmax to turn the scores into a probability distribution.
• Attention output: use the attention distribution to take a weighted sum of the encoder hidden states. The attention output mostly contains information from the hidden states that received high attention.
• The attention output is combined with the decoder hidden state to predict the next word; repeating this at each timestep generates "the", "poor", "don't", "have", "any", "money" in turn.


"Neural Machine Translation by Jointly Learning to Align and Translate"

[Figure-only slides]

Source: Bahdanau et al., ICLR 2015, https://arxiv.org/abs/1409.0473


Attention: Formula

• We have encoder hidden states (values) $h_1, \dots, h_N \in \mathbb{R}^h$
• On timestep $t$, we have the decoder hidden state (query) $s_t \in \mathbb{R}^h$
• We get the attention scores for this step (there are multiple ways to do this): $e^t = [s_t^\top h_1, \dots, s_t^\top h_N] \in \mathbb{R}^N$
• We take softmax to get the attention distribution for this step (this is a probability distribution and sums to 1): $\alpha^t = \mathrm{softmax}(e^t) \in \mathbb{R}^N$
• We use $\alpha^t$ to take a weighted sum of the encoder hidden states to get the attention output: $a_t = \sum_{i=1}^{N} \alpha_i^t h_i \in \mathbb{R}^h$
• Finally we concatenate the attention output with the decoder hidden state $[a_t; s_t]$ and proceed as in the non-attention seq2seq model (a code sketch follows below)
46
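A minimal sketch of one decoder timestep with dot-product attention, following the formula above; the tensor shapes and variable names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def attention_step(s_t, encoder_states):
    """s_t: (batch, h) decoder state (query); encoder_states: (batch, N, h) values."""
    scores = torch.bmm(encoder_states, s_t.unsqueeze(2)).squeeze(2)   # e^t = [s_t . h_i], (batch, N)
    alpha = F.softmax(scores, dim=-1)                                 # attention distribution, sums to 1
    a_t = torch.bmm(alpha.unsqueeze(1), encoder_states).squeeze(1)    # weighted sum of values, (batch, h)
    return torch.cat([a_t, s_t], dim=-1), alpha                       # [a_t; s_t] and the distribution

# Example: batch of 1, source length N=4 ("les pauvres sont démunis"), hidden size h=512
s_t = torch.randn(1, 512)
H = torch.randn(1, 4, 512)
context, alpha = attention_step(s_t, H)
print(context.shape, alpha.shape)   # torch.Size([1, 1024]) torch.Size([1, 4])
```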
Attention scoring function
• q is the query and k is the key
• Multi-layer Perceptron (Bahdanau et al. 2015): $a(q, k) = w_2^\top \tanh(W_1 [q; k])$
  • Flexible, often very good with large data
• Bilinear (Luong et al. 2015): $a(q, k) = q^\top W k$
• Dot Product (Luong et al. 2015): $a(q, k) = q^\top k$
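The three scoring functions written out as a small sketch; the dimensions and parameter names are assumptions for illustration.

```python
import torch
import torch.nn as nn

h = 512
W1 = nn.Linear(2 * h, h)                        # MLP layer 1
w2 = nn.Linear(h, 1, bias=False)                # MLP layer 2
W = nn.Parameter(torch.randn(h, h) / h ** 0.5)  # bilinear matrix

def score_mlp(q, k):       # Multi-layer Perceptron (Bahdanau et al. 2015)
    return w2(torch.tanh(W1(torch.cat([q, k], dim=-1)))).squeeze(-1)

def score_bilinear(q, k):  # Bilinear (Luong et al. 2015)
    return (q @ W * k).sum(dim=-1)

def score_dot(q, k):       # Dot product (Luong et al. 2015)
    return (q * k).sum(dim=-1)
```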


Attention is so great (Bahdanau et al., 16,054 citations)
• Attention significantly improves NMT performance
  • It's very useful to allow the decoder to focus on certain parts of the source
• Attention provides a more "human-like" model of the MT process
  • You can look back at the source sentence while translating, rather than needing to remember it all
• Attention solves the bottleneck problem
  • Attention allows the decoder to look directly at the source; bypass the bottleneck
• Attention helps with the vanishing gradient problem
• Attention provides some interpretability
  • By inspecting the attention distribution, we can see what the decoder was focusing on
  • We get alignment for free!
  • This is cool because we never explicitly trained an alignment system
  • The network just learned alignment by itself
48
Key developments in attention

[Figure: key developments in attention, including GPT-3 and ChatGPT.]

Attention is a general Deep Learning technique

50
Evolution of the MT system over time
[Edinburgh En-De WMT newstest2013 Cased BLEU; NMT 2015 from U. Montréal]

Source: http://www.meta-net.eu/events/meta-forum-2016/slides/09_sennrich.pdf
51
NMT: the first big story of NLP Deep Learning
Neural Machine Translation went from a fringe research activity in 2014 to the leading standard method in 2016

• 2014: First seq2seq paper published [Sutskever et al., 2014]

• 2016: Google Translate switches from SMT to NMT, and by 2018 everyone has

• This is amazing!
  • SMT systems, built by hundreds of engineers over many years, were outperformed by NMT systems trained by a small group of engineers in a few months
52
MT solved?

• Nope!
• Many difficulties remain:
• Out-of-vocabulary words (unknown words)
• Domain mismatch between train and test data
• Maintaining context over longer text
• Low-resource language pairs (hallucinations)

53
MT solved?

• Nope!
• Using common sense is still hard

54
Seq2seq is flexible and efficient!
• Seq2Seq is useful not only for Machine Translation
• Many NLP tasks can be phrased as sequence-to-sequence:
  • Summarization (long text → short text)
  • Dialogue (previous utterances → next utterance)
  • Parsing (input text → output parse as sequence)
  • Code generation (natural language → Python code)
  • OCR (image of characters → text sequence)
  • ASR (acoustic sequence → text sequence)

55
Machine Translation Problem
• Automatic translation from English sentences to Vietnamese sentences
• Training and testing data: IWSLT 2015
Example:
  Input: I like a blue book
  Output: Tôi thích quyển sách màu xanh
Apply seq2seq + attention for NMT

Goals to be achieved:
1. Train a seq2seq model for the English-Vietnamese translation problem.
2. Evaluate the translation system with the BLEU score.
3. View the attention score matrix to better understand the model.
The tasks to be implemented (a pipeline sketch follows below):
1. Data preprocessing
2. Create training data
3. Write the encoder, decoder, and attention modules
4. Train the model
5. Translate new source sentences
6. Show attention and BLEU score
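A hedged skeleton of the first pipeline steps above. The file names, the whitespace tokenizer, and the vocabulary cutoff are assumptions for illustration; IWSLT 2015 data would normally be tokenized more carefully (e.g. with a subword tokenizer).

```python
from collections import Counter

def read_parallel(src_path, tgt_path):
    """Steps 1-2: read and lightly preprocess a parallel corpus (hypothetical file names)."""
    with open(src_path, encoding="utf-8") as f_src, open(tgt_path, encoding="utf-8") as f_tgt:
        pairs = [(s.strip().lower().split(), t.strip().lower().split())
                 for s, t in zip(f_src, f_tgt) if s.strip() and t.strip()]
    return pairs

def build_vocab(sentences, min_freq=2):
    """Map frequent tokens to ids; reserve ids for special symbols."""
    counts = Counter(tok for sent in sentences for tok in sent)
    vocab = {"<pad>": 0, "<unk>": 1, "<start>": 2, "<end>": 3}
    for tok, c in counts.most_common():
        if c >= min_freq:
            vocab[tok] = len(vocab)
    return vocab

def numericalize(sent, vocab):
    """Wrap a sentence in <start>/<end> and convert tokens to ids."""
    return [vocab["<start>"]] + [vocab.get(t, vocab["<unk>"]) for t in sent] + [vocab["<end>"]]

# pairs = read_parallel("train.en", "train.vi")   # hypothetical IWSLT 2015 file names
# src_vocab = build_vocab([s for s, _ in pairs])
# tgt_vocab = build_vocab([t for _, t in pairs])
# Steps 3-6: feed the ids to the encoder/decoder/attention sketches above,
# train, translate new sentences, and report BLEU and attention matrices.
```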
Conclusion

• Sequence-to-sequence is the architecture behind many current NLP problems such as NMT, text generation, ...

• Attention is a great way to focus on particular parts of the input
  • It improved the Seq2Seq model a lot!
  • It is the foundation of the Transformer model (now dominant!)

58
References

• Speech and Language Processing, 2023 (https://web.stanford.edu/~jurafsky/slp3/)
• Machine Translation and Sequence-to-Sequence Models, Neubig, 2019, CMU
• Some slides from Stanford University (2023), MIT, …
