Transformer Basics
▪ A large language model is a type of machine learning model that is trained on a large corpus of text
data to generate outputs for various natural language processing (NLP) tasks, such as text generation,
question answering, and machine translation.
▪ Large language models are typically based on deep learning neural networks such as the Transformer
architecture and are trained on massive amounts of text data, often involving billions of words. Larger
models, such as Google’s BERT, are trained on large datasets drawn from many data sources, which
allows them to produce useful output for many tasks.
[Diagram: text input → language model → text output, or a numeric representation of the text useful for other systems]
USE CASES OF LLMs
Large language models can be applied to a variety of use cases and industries,
including healthcare, retail, tech, and more. The following use cases exist across
all industries (a minimal usage sketch follows the list):
Text generation
Sentiment analysis
Chatbots
Question answering
Code generation
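As a quick, hedged illustration of the text-generation and sentiment-analysis use cases above, the sketch below uses the Hugging Face transformers pipeline API; the specific checkpoints (gpt2 and distilbert-base-uncased-finetuned-sst-2-english) are illustrative choices, not part of the original slides.

# Minimal sketch: applying pretrained language models to two common use cases.
# The model names below are illustrative defaults; any compatible checkpoint works.
from transformers import pipeline

# Text generation: continue a prompt with a small GPT-style model.
generator = pipeline("text-generation", model="gpt2")
print(generator("Large language models can", max_new_tokens=30)[0]["generated_text"])

# Sentiment analysis: classify the polarity of a sentence.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Transformer models have made NLP workflows much simpler."))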
Transformer Components
TRANSFORMER MODELS
Transformers largely replaced RNN models following the publication of Attention Is All You Need by Google in 2017
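The key building block introduced in Attention Is All You Need is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal sketch in plain PyTorch follows (a single unbatched head with illustrative variable names, not any particular library's implementation):

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (seq_len, d_k) tensors for a single attention head.
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # (seq_len, seq_len) similarity scores
    weights = F.softmax(scores, dim=-1)            # attention weights for each query position
    return weights @ V                             # weighted sum of value vectors

# Toy usage: 5 tokens, an 8-dimensional head.
Q, K, V = torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8)
out = scaled_dot_product_attention(Q, K, V)        # shape (5, 8)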
CATEGORIES OF TRANSFORMER MODELS
[Diagram: Transformer architecture, showing inputs, outputs (shifted right), and output probabilities]
GPT
▪ Originally published by OpenAI in 2018, followed by GPT-2 in 2019, and GPT-3 in 2020.
▪ Architecture is also a single stack like BERT, but is a traditional left-to-right language model
▪ Can be used for generating larger blocks of text (e.g. chatbots), but can also be used for question answering
▪ Has been the model family we have focused on most with Megatron
▪ Faster to train than BERT since each iteration gets signal from every token in the sequence (see the sketch below)
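A hedged illustration of the last point: in a left-to-right (causal) language model, every position predicts the next token, so the training loss collects a signal from every token rather than only from the masked subset used in BERT-style pretraining. A minimal sketch, assuming a generic decoder model that maps token IDs to per-position logits:

import torch
import torch.nn.functional as F

def causal_lm_loss(logits, input_ids):
    # logits:    (batch, seq_len, vocab_size) from a left-to-right decoder
    # input_ids: (batch, seq_len) token IDs
    # Position t predicts token t+1, so shift logits and labels by one step.
    shift_logits = logits[:, :-1, :]
    shift_labels = input_ids[:, 1:]
    # Cross-entropy over every shifted position: all tokens contribute training
    # signal, unlike masked-LM training where only the masked tokens do.
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
    )

# Toy usage with random logits standing in for a model's output.
batch, seq_len, vocab = 2, 10, 100
logits = torch.randn(batch, seq_len, vocab)
input_ids = torch.randint(0, vocab, (batch, seq_len))
loss = causal_lm_loss(logits, input_ids)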
WHEN DO LARGE LANGUAGE MODELS MAKE SENSE?
                           Traditional NLP approach           Large language models
Requires labelled data     Yes                                No
Parameters                 100s of millions                   Billions to trillions
Desired model capability   Specific (one model per task)      General (model can do many tasks)
Training frequency         Retrain frequently with            Never retrain, or retrain minimally
                           task-specific training data

▪ Zero-shot (or few-shot) learning (see the prompt sketch below)
▪ Painful and impractical to get a large corpus of labelled data
▪ Models can learn new tasks
▪ If you want models with “common sense” that generalize well to new tasks
▪ A single model can serve all use cases
▪ At scale you avoid the cost and complexity of many models, saving cost in data curation, training, and managing deployment
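To make the zero-shot and few-shot point concrete, the sketch below builds example prompts: the "training data" for a new task is either a handful of in-context examples or just an instruction, so no task-specific labelled corpus or retraining is needed. The review texts and tasks are illustrative, not from the original slides.

# Few-shot prompting: a handful of labelled examples are embedded directly in the
# prompt, so a single general-purpose model can be reused for a brand-new task.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and everything just worked.
Sentiment:"""

# Zero-shot prompting: no examples at all, only an instruction.
zero_shot_prompt = (
    "Translate the following sentence to French: "
    "'The model was never fine-tuned for this task.'"
)

# In practice these prompts would be sent to one large, general-purpose model
# (for example through an inference API); no per-task retraining is required.
print(few_shot_prompt)
print(zero_shot_prompt)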
DISTRIBUTED TRAINING
Data, Pipeline and Tensor Parallelism
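The slide names the three parallelism strategies without detail. As a hedged illustration of tensor parallelism, the sketch below simulates Megatron-style column-parallel splitting of a linear layer's weight matrix; here the shards are just chunks on one machine, whereas in real distributed training each shard would live on a different GPU and a collective communication step would combine the partial results.

import torch

def column_parallel_linear(x, W, num_shards=2):
    # Tensor parallelism: split the weight matrix column-wise across shards,
    # compute each partial output independently, then combine the pieces.
    # In Megatron-style training each shard lives on its own GPU and the final
    # concatenation is replaced by a collective communication operation.
    shards = torch.chunk(W, num_shards, dim=1)         # each: (d_in, d_out / num_shards)
    partial_outputs = [x @ shard for shard in shards]  # computed in parallel on separate devices
    return torch.cat(partial_outputs, dim=-1)          # (batch, d_out), identical to x @ W

# Sanity check: the sharded computation matches the unsharded linear layer.
x = torch.randn(4, 16)   # batch of 4, hidden size 16
W = torch.randn(16, 32)  # full weight matrix
assert torch.allclose(column_parallel_linear(x, W), x @ W, atol=1e-5)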
CHALLENGES
Compute-, cost-, and time-intensive workload: significant capital investment and large-scale compute
infrastructure are necessary to develop and maintain LLMs.
Technical expertise: due to their scale, training and deploying large language models are very
difficult.
THANK YOU!