DAB311 DL Week 11 RNN
Need for sequential modelling
• Fully Connected Network (FCN)
• Fixed input dimension
• e.g.: input [x1 x2 x3 x4 ……… xn ]
• If input size is < n → pad the remaining positions with zeros
• If input size is > n → ignore (truncate) the extra input data (see the sketch after this slide)
➢ Natural Language:
o Email auto-complete
o Translation (e.g.: English to French)
o Sentiment analysis
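A minimal sketch (plain Python; the function name is illustrative) of how an FCN's fixed input size n forces padding or truncation:

def to_fixed_length(seq, n):
    # Force a variable-length sequence into the FCN's fixed input size n.
    if len(seq) < n:
        return seq + [0] * (n - len(seq))  # shorter input: pad with zeros
    return seq[:n]                         # longer input: extra data is ignored

print(to_fixed_length([4, 7, 9], 5))           # [4, 7, 9, 0, 0]
print(to_fixed_length([4, 7, 9, 1, 2, 8], 5))  # [4, 7, 9, 1, 2]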
Need for sequential modelling (cont’d)
NLP example (tokenization):
• "Today is the coolest temperature in Windsor"
  Token IDs: 1 2 3 4 5 6 7
• "The historical average temperature in November is 12 degree Celsius"
  Token IDs: 1 8 9 5 6 10 2 11 13 14
▪ Varying input size: different sentences yield different numbers of tokens.
Tokenization is the process of breaking down text into smaller, manageable pieces called "tokens."
A token ID is a numerical identifier assigned to each token during the tokenization process; repeated tokens (e.g. "temperature" → 5, "in" → 6, "is" → 2) reuse the same ID.
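A toy word-level tokenizer sketch (hypothetical helper; real tokenizers such as BPE work on subwords) that assigns each new token the next free ID and reuses IDs for repeated tokens, in the spirit of the example above:

def tokenize(text, vocab):
    ids = []
    for tok in text.split():             # break text into word tokens
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1  # new token: assign the next ID
        ids.append(vocab[tok])           # repeated tokens reuse their ID
    return ids

vocab = {}
print(tokenize("Today is the coolest temperature in Windsor", vocab))
# [1, 2, 3, 4, 5, 6, 7]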
Need for sequential modelling (cont’d)
• Example 1 (tabular data: sequence does not matter): inputs such as salary, credit score, experience, and age predict an outcome of "loan granted", "loan rejected", or "needs verification". The order in which the features are presented is irrelevant.
• Example 2 (text: sequence matters): "I like this dish" is classified as positive, negative, or neutral sentiment; here the order of the words changes the meaning.
Problems with using an FCN for sequential data:
• Varying input size
• Too much computation
• No parameter sharing
Recurrent Neural Network (RNN)
• Long Short-Term Memory (LSTM) networks are a variant of RNNs whose memory cells are controlled by three gates: input gate, forget gate, and output gate. These gates allow LSTMs to decide which information to keep, which to discard, and which new information to add (see the cell-step sketch below).
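A minimal sketch of one LSTM cell step written out by hand in PyTorch to show the three gates; the packed weight layout (W, U, b) is illustrative, not torch.nn.LSTM's exact internals:

import torch

def lstm_cell_step(x, h_prev, c_prev, W, U, b):
    # x: (input_size,); h_prev, c_prev: (hidden_size,)
    # W: (4*hidden, input_size); U: (4*hidden, hidden); b: (4*hidden,)
    gates = W @ x + U @ h_prev + b
    i, f, o, g = gates.chunk(4)  # three gates plus the candidate content
    i = torch.sigmoid(i)         # input gate: which new information to add
    f = torch.sigmoid(f)         # forget gate: which information to discard
    o = torch.sigmoid(o)         # output gate: which information to expose
    g = torch.tanh(g)            # candidate cell content
    c = f * c_prev + i * g       # update the memory cell
    h = o * torch.tanh(c)        # new hidden state
    return h, c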
• An LLM is a form of Generative Artificial Intelligence (GenAI: able to generate new content)
• LLM:
➢ Can perform a wide range of NLP tasks
➢ Able to write an email for a given set of instructions, and more
TRANSFORMER ARCHITECTURE
➢ Not all LLMs are transformers
➢ Not all transformers are LLMs
Large Language Models (cont’d.)
• LLMs typically deal with text, but do they have to be limited to text only? No.
• GPT-4 is a multimodal model that can process text and images; however, it is referred to as an LLM because its primary focus and fundamental design are around text-based tasks.
• Waymo's multimodal end-to-end model refers to their integrated approach to autonomous driving, where multiple types of data inputs (camera, radar, and lidar) are processed together to make driving decisions.
Use Cases of LLM
o Machine translation: LLMs can be used to translate text from one language to another.
o Content generation: LLMs can generate new text, such as fiction, articles, and even computer
code.
o Sentiment analysis: LLMs can be used to analyze the sentiment of a piece of text, such as
determining whether it is positive, negative, or neutral.
o Text summarization: LLMs can be used to summarize a long piece of text, such as an article or a
document.
o Chatbots and virtual assistants: LLMs can be used to power chatbots and virtual assistants,
such as OpenAI's ChatGPT or Google's Gemini (formerly called Bard).
o Knowledge retrieval: LLMs can be used to retrieve knowledge from vast volumes of text in
specialized areas such as medicine or law.
Stages of Building LLMs
Note: training has a huge computational cost (e.g., GPT-3's training cost is approximately 4.6 million dollars).
▪ Stage 1: Implementing the LLM architecture and the data preparation process. This stage involves preparing and sampling the text data and understanding the basic mechanisms behind LLMs (a sampling sketch follows below).
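A hedged sketch of the Stage 1 sampling step (function and parameter names are hypothetical): slide a window over the token IDs so each input chunk is paired with the same chunk shifted one position right, giving next-token-prediction targets:

def sample_windows(token_ids, context_size=4, stride=1):
    pairs = []
    for i in range(0, len(token_ids) - context_size, stride):
        inputs = token_ids[i : i + context_size]
        targets = token_ids[i + 1 : i + context_size + 1]  # shifted by one token
        pairs.append((inputs, targets))
    return pairs

print(sample_windows([1, 2, 3, 4, 5, 6, 7]))
# [([1, 2, 3, 4], [2, 3, 4, 5]), ([2, 3, 4, 5], [3, 4, 5, 6]), ([3, 4, 5, 6], [4, 5, 6, 7])]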
Self-attention mechanism:
• Key part of transformers; allows the model to weigh the importance of different words/tokens relative to each other.
• Enables the model to capture long-range dependencies (see the sketch below).
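A minimal sketch of scaled dot-product self-attention in PyTorch (single head, no mask; the toy sizes are illustrative):

import math
import torch

def self_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k). Every token scores every other token,
    # which is how self-attention captures long-range dependencies.
    scores = Q @ K.T / math.sqrt(Q.shape[-1])  # pairwise token similarities
    weights = torch.softmax(scores, dim=-1)    # per-token importance weights
    return weights @ V                         # weighted mix of value vectors

x = torch.randn(7, 16)         # 7 tokens, 16-dim embeddings (toy sizes)
out = self_attention(x, x, x)  # "self": Q, K, V come from the same tokens
print(out.shape)               # torch.Size([7, 16])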
Transformer Architecture
Vaswani et al., "Attention Is All You Need": https://arxiv.org/pdf/1706.03762
BERT vs. GPT Architecture
• BERT uses the transformer's encoder stack and reads context bidirectionally; GPT uses the decoder stack and generates text left to right (see the masking sketch below).
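A sketch of the key masking difference (PyTorch; toy sizes): BERT-style encoders attend bidirectionally, while GPT-style decoders apply a causal mask so each token only attends to earlier tokens:

import torch

T = 5                                       # toy sequence length
scores = torch.randn(T, T)                  # raw attention scores
causal_mask = torch.tril(torch.ones(T, T))  # GPT-style: lower-triangular
gpt_scores = scores.masked_fill(causal_mask == 0, float("-inf"))
# BERT-style attention softmaxes `scores` directly (both directions);
# GPT-style softmaxes `gpt_scores`, so each row only mixes past tokens.
print(torch.softmax(gpt_scores, dim=-1))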