The document discusses problems with traditional encoder-decoder recurrent neural networks (RNNs) for tasks like machine translation. It introduces the attention mechanism as a solution. With attention, the encoder outputs are connected to the decoder via a learned attention layer, allowing the decoder to focus on relevant parts of the input at each step. This addresses the problem of having to compress all input information into a single fixed-length vector. The mathematics of calculating attention weights at each decoder step are also described.
Why does the attention mechanism work when training encoder-decoder recurrent neural networks?

Traditional encoder-decoder model for recurrent neural networks

Problems in this approach
We ask the encoder to learn what was spoken (the input data); that information is captured in the activation of the last input cell. Thereafter, using only this activation value and a start sequence (which is usually zero), we want to produce the output sequence. [NOTE: A SINGLE RNN LAYER WITH INPUT AND OUTPUT WILL NOT BE POSSIBLE HERE, BECAUSE THE LENGTH OF THE INPUT SEQUENCE CAN DIFFER FROM THE LENGTH OF THE OUTPUT SEQUENCE!] Thus the entire information from the input is represented by a SINGLE vector: the activation of the last encoder cell.

Problems in this approach (continued)
This approach is passable for very short input sequences, but for long input sequences it performs very poorly: translation relies on reading a complete sentence and compressing all of its information into a fixed-length vector. As we can imagine, a sentence with hundreds of words represented by a single vector will surely lead to information loss, inadequate translation, and so on. (Vanishing gradients are not the only reason; even when the RNNs are replaced by LSTMs, the problem persists. Making a single vector model the entire information of a sequence is too much to ask, and this method does not scale.)
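To make this bottleneck concrete, here is a minimal sketch of the traditional setup in PyTorch (this code is not from the slides; all sizes and variable names are illustrative): the decoder receives the source sequence only through the encoder's final state.

```python
import torch
import torch.nn as nn

# Illustrative sizes; the point is the interface, not the numbers.
emb, hid, T_in, T_out = 32, 64, 50, 8

encoder = nn.LSTM(emb, hid, batch_first=True)
decoder = nn.LSTM(emb, hid, batch_first=True)

src = torch.randn(1, T_in, emb)       # embedded input sequence (batch of 1, 50 time steps)
_, (h_n, c_n) = encoder(src)          # all that survives of the input: one (h, c) pair

# The decoder sees the source ONLY through this single fixed-size state,
# whether the source had 5 words or 500.
start = torch.zeros(1, T_out, emb)    # start sequence / previous-output embeddings
dec_out, _ = decoder(start, (h_n, c_n))
```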
What is the solution? The ATTENTION learning mechanism.

Attention Mechanism
I will explain with the example of a machine translation task. The input is represented as x<t'>, where t' denotes an input time step. The encoder is a bi-directional LSTM with activations called a<t'>. The decoder is a simple LSTM with activations called s<t>. It is important to note that I label the input sequence with time index t' and the output sequence with time index t, which again highlights the observation that, in the general case, the length of the output differs from the length of the input. The encoder and decoder are connected to each other via an attention layer, as described in the next slide. Observe the difference in architecture between the previous and current systems. Due to poor support for mathematical notation in PowerPoint Online, I am forced to draw the structure by hand :P

The overall model: Decoder | Attention | Encoder

The mathematics of the interconnection of encoder to decoder via the attention layer

Mathematics of the attention weights
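The equations on these last slides are drawn by hand and are not captured in the text. The sketch below assumes the standard additive (Bahdanau-style) formulation, which fits the notation used here: at decoder step t, an alignment score e<t,t'> = v_a · tanh(W_a s<t-1> + U_a a<t'>) is computed for every input step t', the attention weights alpha<t,t'> are the softmax of these scores over t', and the context vector c<t> = sum over t' of alpha<t,t'> a<t'> is fed into the decoder LSTM at step t. The parameter names W_a, U_a, v_a and all sizes are illustrative.

```python
import torch

# Illustrative sizes: source length, encoder (BiLSTM) activation size, decoder size, score size
T_in, n_a, n_s, n_e = 7, 2 * 64, 128, 32

# Attention parameters (hypothetical names); in the full model they are learned jointly with the LSTMs.
W_a = torch.randn(n_s, n_e)
U_a = torch.randn(n_a, n_e)
v_a = torch.randn(n_e)

a = torch.randn(T_in, n_a)   # encoder activations a<t'>, one row per input step t'
s_prev = torch.randn(n_s)    # previous decoder activation s<t-1>

# Alignment scores e<t, t'> = v_a . tanh(W_a s<t-1> + U_a a<t'>), one score per input step
e = torch.tanh(s_prev @ W_a + a @ U_a) @ v_a   # shape (T_in,)

# Attention weights: softmax over the input positions t', so they are positive and sum to 1
alpha = torch.softmax(e, dim=0)

# Context vector c<t> = sum_t' alpha<t, t'> * a<t'>, fed to the decoder at output step t
context = alpha @ a                            # shape (n_a,)
```

Repeating this step for every output position t gives a T_out x T_in matrix of weights alpha<t, t'>, which is the matrix visualized under "Advantage of attention" below. Because every quantity here is a differentiable function of the model's parameters, the attention layer trains jointly with the encoder and decoder by ordinary backpropagation.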
Implement the whole model as discussed in the last few slides and train it with gradient descent (backpropagation). [This will be straightforward to do, because the gradients are easy to find: all variables have been neatly expressed as differentiable mathematical functions of the remaining variables of the model.] Finally, while working out the output for a particular time step, the network pays attention to the appropriate parts of the input, as learnt by the attention layer.

Advantage of attention
A big advantage of attention is that it gives us the ability to interpret and visualize what the model is doing. For example, by visualizing the attention weight matrix while a sentence is translated, we can understand how the model is translating (as shown for the task of machine translation).
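As an illustration, here is a small sketch of how such an attention weight matrix could be plotted. The tokens and weights are made up, and matplotlib is assumed purely for display: each row is an output step t, each column an input step t', and the weights in each row sum to 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical source/target tokens and random attention weights, purely for illustration.
src_tokens = ["je", "suis", "étudiant", "<eos>"]
tgt_tokens = ["i", "am", "a", "student", "<eos>"]
alpha = np.random.dirichlet(np.ones(len(src_tokens)), size=len(tgt_tokens))  # each row sums to 1

plt.imshow(alpha, cmap="gray")
plt.xticks(range(len(src_tokens)), src_tokens, rotation=45)
plt.yticks(range(len(tgt_tokens)), tgt_tokens)
plt.xlabel("input (source) time step t'")
plt.ylabel("output (target) time step t")
plt.colorbar(label="attention weight")
plt.show()
```

In a trained translation model, the high weights typically trace out the alignment between source and target words, which is what makes the matrix readable as an explanation of how the model is translating.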
Places where attention is used
Image captioning (this idea was first introduced in this domain)