DAB311 DL Week 11 RNN

The document discusses the need for sequential modeling in machine learning, highlighting the limitations of Fully Connected Networks (FCN) and Convolutional Neural Networks (CNN) in handling fixed input dimensions and lack of memory. It emphasizes the importance of sequential models for time series data, such as video and autonomous vehicle data, which exhibit periodic cycles, trends, and sudden changes. The motivation for using sequential models is to effectively capture and analyze temporal patterns in data.


Sequential Modelling

Week 11
Need for sequential modelling
• Fully Connected Network (FCN)
  • Fixed input dimension, e.g. input [x1 x2 x3 x4 … xn]
  • If the input is shorter than n → pad the remaining positions with zeros
  • If the input is longer than n → the extra input data is ignored (truncated); see the sketch below

• Convolutional Neural Network (CNN)
  • Carries spatial information
  • Good for image data

FCN and CNN:
▪ Produce an output for a given snapshot; the next set of inputs is treated as a new snapshot
▪ Fixed input dimension
▪ Do not carry memory
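A minimal sketch of the fixed-input workaround above, assuming a target length n: shorter sequences are zero-padded and longer ones are truncated. The function name and values are illustrative only.

```python
def to_fixed_length(x, n):
    """Pad a sequence with zeros (if shorter than n) or truncate it (if longer)."""
    if len(x) < n:
        return x + [0.0] * (n - len(x))   # pad the missing positions with zeros
    return x[:n]                          # ignore (drop) the extra input data

print(to_fixed_length([1.0, 2.0, 3.0], 5))                      # [1.0, 2.0, 3.0, 0.0, 0.0]
print(to_fixed_length([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 5))       # only the first 5 values survive
```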
Need for sequential modelling (cont’d)
• Motivation for sequential models:
➢ Time series data
  o Periodic cycles
  o Trends
  o Regularity
  o Sudden spikes/drops
  Examples of time series data: video, autonomous vehicle (object state), electric circuit, temperature variation, stock price

➢ Natural language:
  o Email auto-complete
  o Translation (e.g., English to French)
  o Sentiment analysis
Need for sequential modelling (cont’d)
• Example sentences with word-level token IDs:
  Today(1) is(2) the(3) coolest(4) temperature(5) in(6) Windsor(7)
  The(1) historical(8) average(9) temperature(5) in(6) November(10) is(2) 12(11) degree(13) Celsius(14)

NLP:
▪ Varying input size (the two sentences have different lengths, and repeated words such as "temperature", "in", and "is" reuse the same token IDs)

Tokenization is the process of breaking down text into smaller, manageable pieces called "tokens".

Word tokenization – ["I", "love", "NLP"]

Character tokenization – ["N", "L", "P"]

A token ID is a numerical identifier assigned to each token during the tokenization process.
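A minimal sketch of word tokenization, character tokenization, and token-ID assignment; the mapping below is built on the fly for illustration and is not a real model's vocabulary.

```python
text = "I love NLP"

word_tokens = text.split()          # ['I', 'love', 'NLP']
char_tokens = list("NLP")           # ['N', 'L', 'P']

# Assign a numerical token ID to each unique word token.
token_ids = {tok: i for i, tok in enumerate(sorted(set(word_tokens)))}
print(word_tokens, char_tokens, token_ids)
```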
Need for sequential modelling (cont’d)
• Example 1: loan application (sequence does not matter)
  Inputs: salary, credit score, experience, age
  Outputs: loan rejected, loan granted, needs verification

• Example 2: sentiment of the sentence "I like this dish" (sequence matters)
  Outputs: neutral, positive, negative

Problems:
• Varying input size
• Too much computation
• No parameter sharing
Recurrent Neural Network (RNN)

• An RNN is usually drawn unwrapped (unrolled) through time: the same network is applied at every time step, passing its hidden state forward.

• Simple RNN: one hidden layer
• Deep RNN: many hidden layers

Issues with RNNs
• Vanishing gradient
• Exploding gradient
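A minimal sketch of the recurrence itself, assuming NumPy: the hidden state is updated at every time step as h = tanh(W_xh x_t + W_hh h_prev + b), with the same weights shared across all steps. Repeatedly multiplying by W_hh during backpropagation through time is what causes the vanishing and exploding gradient issues above. Sizes and random weights are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 4, 8, 10

W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))  # hidden-to-hidden weights, shared across time
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                       # initial hidden state
for t in range(seq_len):
    x_t = rng.normal(size=input_size)           # one time step of the input sequence
    h = np.tanh(W_xh @ x_t + W_hh @ h + b)      # the hidden state carries memory forward

print(h.shape)   # (8,)
```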

Long Short-Term Memory (LSTM)


• LSTMs introduce special units called memory cells to store information across time steps in a sequence. These
cells can maintain their state (memory) over a longer period of time than traditional RNN units.

• The memory cells are controlled by three gates: input gate, forget gate, and output gate. These gates allow
LSTMs to decide which information to keep, which to discard, and which new information to add.

Gated Recurrent Unit (GRU)


• GRUs are similar to Long Short-Term Memory (LSTMs) but have a simpler structure and fewer parameters,
making them computationally more efficient.
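A quick way to see the "fewer parameters" point, assuming PyTorch is available: for the same input and hidden sizes, an LSTM layer stores four gate blocks while a GRU stores three, so the GRU has roughly three quarters of the parameters.

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=64, hidden_size=128)
gru = nn.GRU(input_size=64, hidden_size=128)

count = lambda m: sum(p.numel() for p in m.parameters())
print("LSTM parameters:", count(lstm))   # 4 gate blocks
print("GRU parameters:", count(gru))     # 3 gate blocks, roughly 3/4 of the LSTM
```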
Large Language Models (LLM)
• Language Models:
➢ Basic NLP tasks (answering questions, translation, sentiment analysis)

• An LLM is a form of Generative Artificial Intelligence (GenAI – able to generate new content)

• An LLM is a Neural Network designed to
➢ Understand
➢ Generate
➢ Respond
to human-like text

• Deep NN trained on a massive (large) amount of data

Why do we call them Large Language Models?

• Trained on a massive amount of data
• Billions of parameters
Large Language Models (cont’d.)
LLM vs Earlier NLP (or simple LM) Models
• NLP/LM:
➢ Very specific tasks (e.g., translation, sentiment analysis)
➢ Not able to write an email from given instructions

• LLM:
➢ Can do a wide range of NLP tasks
➢ Able to write an email for a given set of instructions, and more

• Why are LLMs so good compared to earlier NLP/LM models?

TRANSFORMER ARCHITECTURE
➢ Not all LLMs are transformers
➢ Not all transformers are LLMs
Large Language Models (cont’d.)

• Generative Artificial Intelligence (GenAI): generates new content

• LLMs typically deal with text, but do they have to be limited to text only? NO

• GPT-4 is a multimodal model that can process text and images; however, it is referred to as an LLM because its primary focus and fundamental design are around text-based tasks.
• Waymo's multimodal end-to-end model refers to their integrated approach for autonomous driving, where multiple types of data inputs (camera, radar, and lidar) are processed together to make driving decisions.
Use Cases of LLM
o Machine translation: LLMs can be used to translate text from one language to another.

o Content generation: LLMs can generate new text, such as fiction, articles, and even computer
code.

o Sentiment analysis: LLMs can be used to analyze the sentiment of a piece of text, such as
determining whether it is positive, negative, or neutral.

o Text summarization: LLMs can be used to summarize a long piece of text, such as an article or a
document.

o Chatbots and virtual assistants: LLMs can be used to power chatbots and virtual assistants,
such as OpenAI's ChatGPT or Google's Gemini (formerly called Bard).

o Knowledge retrieval: LLMs can be used to retrieve knowledge from vast volumes of text in
specialized areas such as medicine or law.
Stages of Building LLMs

Note: huge computational cost (e.g., GPT-3 training cost is approximately 4.6 million dollars)

▪ Stage 1: Implementing the LLM architecture and data preparation process. This stage involves preparing and sampling the text data and understanding the basic mechanisms behind LLMs.

▪ Stage 2: Pretraining an LLM to create a foundation model. This stage involves pretraining the LLM on unlabeled data, typically a large, diverse data set (also known as a general data set).

▪ Stage 3: Fine-tuning the foundation model to become a personal assistant or text classifier. This stage involves fine-tuning the pretrained LLM on labeled data, which can be either an instruction dataset or a dataset with class labels.

Why is fine-tuning important?
• Train on your specific data set
• Customize for your application or organization (e.g., health care, airline, law firm, educational institute, etc.)
Simplified Transformer Architecture
• An encoder that processes the input text and produces an embedding representation (a numerical representation that captures many different factors in different dimensions) of the text
  • Encodes the input text into vectors

• A decoder that can use the encoded representation to generate the translated text one word at a time
  • Generates output text from the encoded vectors

Self-attention mechanism:
• Key part of transformers; it allows the model to weigh the importance of different words/tokens relative to each other
• Enables the model to capture long-range dependencies
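A minimal sketch of scaled dot-product self-attention, the weighting mechanism described above; the query, key, and value projections here are random matrices rather than trained model weights.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Each token attends to every other token; the softmax weights say how important each one is."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])                    # similarity of every token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V                                         # weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16
X = rng.normal(size=(seq_len, d_model))                        # one embedded input sequence
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)                  # (5, 16)
```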
Transformer Architecture

Attention Is All You Need

https://arxiv.org/pdf/1706.03762
BERT vs GPT Architecture

• Bidirectional Encoder Representations from Transformers (BERT): the encoder segment exemplifies BERT-like LLMs, which focus on masked word prediction and are primarily used for tasks like text classification
  • Predict hidden (masked) words in a given sentence

• Generative Pre-trained Transformer (GPT): the decoder segment showcases GPT-like LLMs, designed for generative tasks and producing coherent text sequences
  • Generate new words


GPT Architecture
• The GPT architecture employs only the decoder portion of the original transformer.

• It is designed for unidirectional, left-to-right processing, making it well suited for text generation and next-word prediction tasks.

• It generates text in an iterative fashion, one word at a time.
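A minimal sketch of that iterative loop: the model scores the next token given everything generated so far, the chosen token is appended, and the process repeats. The next_token_logits function below is a hypothetical stand-in for a real model's forward pass, not an actual GPT call.

```python
import numpy as np

VOCAB_SIZE = 50257                                  # GPT-2's vocabulary size, used here only as a plausible number

def next_token_logits(context):
    """Hypothetical stand-in for a trained model's forward pass over the context."""
    rng = np.random.default_rng(len(context))       # deterministic toy scores, NOT real predictions
    return rng.normal(size=VOCAB_SIZE)

context = [464, 2068]                               # token IDs given / generated so far
for _ in range(5):                                  # generate five more tokens
    logits = next_token_logits(context)
    next_id = int(np.argmax(logits))                # greedy choice of the most likely next token
    context.append(next_id)                         # the new token becomes part of the next step's input
print(context)
```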
GPT Architecture (cont’d.)
Working with text
Embedding / Vector Embedding

Words corresponding to similar concepts often appear close to each other in the embedding space. For instance, different types of birds appear closer to each other in the embedding space than to countries and cities.
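A minimal sketch of the "close in embedding space" idea using cosine similarity; the 3-dimensional vectors below are made up for illustration, whereas real embeddings have hundreds or thousands of dimensions.

```python
import numpy as np

embeddings = {                      # toy, hand-made vectors (not from a real model)
    "sparrow": np.array([0.9, 0.8, 0.1]),
    "eagle":   np.array([0.8, 0.9, 0.2]),
    "Canada":  np.array([0.1, 0.2, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["sparrow"], embeddings["eagle"]))   # high: both are birds
print(cosine(embeddings["sparrow"], embeddings["Canada"]))  # lower: different concepts
```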
Tokenizing Texts

Here, we split an input text into individual tokens, which are either words or special characters, such as punctuation characters.
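A minimal sketch of this splitting step using a regular expression that keeps punctuation characters as separate tokens; the exact pattern is one reasonable choice, not the only one.

```python
import re

text = "Hello, world. Is this a token?"
tokens = re.split(r'([,.:;?_!"()\']|--|\s)', text)   # split on punctuation and whitespace, keeping them
tokens = [t for t in tokens if t.strip()]            # drop empty strings and bare whitespace
print(tokens)   # ['Hello', ',', 'world', '.', 'Is', 'this', 'a', 'token', '?']
```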
Converting Tokens into Token IDs

We build a vocabulary by tokenizing the entire text in a training dataset into individual tokens. These individual tokens are then sorted alphabetically, and duplicate tokens are removed. The unique tokens are then aggregated into a vocabulary that defines a mapping from each unique token to a unique integer value. The depicted vocabulary is purposefully small and contains no punctuation or special characters for simplicity.
Converting Tokens into Token IDs (cont’d.)
Starting with a new text sample, we tokenize the text and use the vocabulary to convert the text tokens into token IDs. The vocabulary is built from the entire training set and can be applied to the training set itself and to any new text samples. The depicted vocabulary contains no punctuation or special characters for simplicity.
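A minimal sketch of both steps: building the vocabulary from a toy training text and then using it to convert a new text sample into token IDs.

```python
import re

def tokenize(text):
    tokens = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [t for t in tokens if t.strip()]

training_text = "the quick brown fox jumps over the lazy dog ."
# Sort the unique tokens alphabetically and map each one to a unique integer.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokenize(training_text))))}

new_sample = "the lazy fox ."
token_ids = [vocab[tok] for tok in tokenize(new_sample)]
print(token_ids)                                   # IDs looked up in the training vocabulary
```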
Adding special context tokens

We add special tokens to a vocabulary to deal with certain contexts. For instance, we add an <|unk|> token to represent new and unknown words that were not part of the training data and thus not part of the existing vocabulary. Furthermore, we add an <|endoftext|> token that we can use to separate two unrelated text sources.
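A minimal sketch of extending a toy vocabulary with <|endoftext|> and <|unk|> and falling back to <|unk|> for words that never appeared in the training data.

```python
vocab = {"brown": 0, "dog": 1, "fox": 2, "lazy": 3, "the": 4}   # toy vocabulary
for special in ("<|endoftext|>", "<|unk|>"):                    # append special tokens at the end
    vocab[special] = len(vocab)

def encode(tokens, vocab):
    unk_id = vocab["<|unk|>"]
    return [vocab.get(tok, unk_id) for tok in tokens]           # unknown words map to <|unk|>

text_a = ["the", "lazy", "fox"]
text_b = ["the", "purple", "unicorn"]              # 'purple' and 'unicorn' are not in the vocabulary
print(encode(text_a + ["<|endoftext|>"] + text_b, vocab))
```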
Byte Pair Encoding (BPE)
The BPE tokenizer was used to train LLMs such as GPT-2, GPT-3, and the original model used in ChatGPT.

BPE tokenizers break down unknown words into subwords and individual characters. This way, a BPE tokenizer can parse any word and doesn’t need to replace unknown words with special tokens, such as <|unk|>.
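The GPT-2 BPE tokenizer can be tried directly through OpenAI's tiktoken package, assuming it is installed; note how the made-up word below is split into subword pieces instead of being replaced by <|unk|>.

```python
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
ids = tokenizer.encode("Hello, do you like someunknownPlace?", allowed_special={"<|endoftext|>"})
print(ids)                              # subword token IDs, including pieces of 'someunknownPlace'
print(tokenizer.decode(ids))            # round-trips back to the original text
```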
