The Significance of LLM Tokenization
Proprietary content. © Great Learning. All Rights Reserved. Unauthorized use or distribution prohibited.
This file is meant for personal use by vaniyellamraju@gmail.com only. Sharing or publishing the contents in part or full is liable for legal action.
Agenda
In this session, we will discuss:
● Problems caused by Tokenization in LLMs
● Mitigating Tokenization Problems
The Significance of Tokenization
Input 1: Tokenization Agnostic
Ask the LLM to perform word games on a full word.
Can you reverse the letters of the word “Performance”?
Output 1: Poor Performance
The LLM will likely fail to perform the task correctly on the full word.
Sure, the reversed letters of the word “Performance” are “emerfnsreP”.
(The correct reversal is “ecnamrofreP”.)
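To see why the full word trips the model up, here is a minimal sketch of subword tokenization. It assumes a hypothetical toy vocabulary (`TOY_VOCAB`) and uses greedy longest-match splitting as a simplified stand-in for a real learned tokenizer such as BPE; how a production tokenizer actually splits “Performance” depends on its own vocabulary.

```python
import string

# Hypothetical toy vocabulary: a few multi-character subwords plus every single
# letter and the hyphen. A real tokenizer learns a much larger vocabulary.
TOY_VOCAB = {"Perform", "ance", "form", "er"} | set(string.ascii_letters + "-")


def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text by repeatedly taking the longest vocabulary entry that matches.

    Greedy longest-match is a simplified stand-in for BPE-style tokenizers.
    """
    max_len = max(len(token) for token in vocab)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocab entry matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab:
                tokens.append(piece)
                i += size
                break
        else:
            # Character not in the vocabulary: emit it on its own.
            tokens.append(text[i])
            i += 1
    return tokens


if __name__ == "__main__":
    # The full word collapses into two multi-letter tokens.
    print(greedy_tokenize("Performance", TOY_VOCAB))  # ['Perform', 'ance']
    # The hyphen-separated form yields one token per letter or hyphen (21 tokens).
    print(greedy_tokenize("P-E-R-F-O-R-M-A-N-C-E", TOY_VOCAB))
```

Under these assumptions the model never receives the letters of “Performance” individually, so reversing them depends on what it has learned about each token's spelling rather than on simply reading the input.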
Input 2: Tokenization Cognizant
Ask the LLM to do word games by separating the letters of the word.
Can you reverse the letters of the word “P-E-R-F-O-R-M-A-N-C-E”?
Output 2: Better Performance
The LLM will now be more likely to complete the word task successfully.
The reversed letters are E-C-N-A-M-R-O-F-R-E-P.
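The tokenization-cognizant prompt can be produced mechanically. The sketch below is a hypothetical preprocessing helper (the names `spell_out`, `reverse_letters_prompt`, and `expected_reversal` are illustrative, not from any library); it assumes hyphens as the separator and uppercase letters, mirroring the example above. Separators make single-letter tokens more likely but do not guarantee them, since the result depends on the model's tokenizer.

```python
def spell_out(word: str, sep: str = "-") -> str:
    """Uppercase the word and put a separator between its letters."""
    return sep.join(word.upper())


def reverse_letters_prompt(word: str, sep: str = "-") -> str:
    """Build the letter-separated word-game prompt from Input 2."""
    return f'Can you reverse the letters of the word "{spell_out(word, sep)}"?'


def expected_reversal(word: str, sep: str = "-") -> str:
    """Compute the correct answer in code, for checking the model's reply."""
    return sep.join(reversed(word.upper()))


if __name__ == "__main__":
    print(reverse_letters_prompt("Performance"))
    # Can you reverse the letters of the word "P-E-R-F-O-R-M-A-N-C-E"?
    print(expected_reversal("Performance"))
    # E-C-N-A-M-R-O-F-R-E-P
```

Computing the reversal in code also gives a ground truth to compare against the model's output, which helps when testing whether the separator trick actually improves results for a given model.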
The Significance of Tokenization (Cont.)
Input 1
The Significance of Tokenization (Cont.)
Input 2
Summary
Here’s a brief recap:
● Tokenization can cause Large Language Models (LLMs) to misinterpret words in ways human reviewers would not, because the model sees a word as a few multi-character tokens rather than as individual letters. This matters most in tasks such as word games, which require letter-level tokenization.
● A practical way to mitigate these issues is to add special separator characters (such as hyphens) between letters, so that the LLM is more likely to process each letter as an individual token.