5) Lecture Feb 11, 13, 17 & 18
Language Models - N-Grams
Language Model
Conditional Probability
P(w|h) // probability of a word w given a history h
e.g., P(the | its water is so transparent that)
One possible way is to compute the probability directly from counts in the corpus:
P(the|its water is so transparent that) = C(its water is so transparent that the) / C(its water is so transparent that)
- Even adding a single word to the history can drive the count to zero, which leads to a data sparsity problem.
Solution:
One possible way is to compute the probability of the whole sequence directly from the corpus:
P(its water is so transparent that) = C(its water is so transparent that) / C(all 6-word sequences in the corpus)
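A minimal sketch of this counting approach, assuming the corpus is available as a list of tokens (the toy corpus and helper names below are illustrative, not from the lecture):

tokens = "its water is so transparent that the fish can be seen".split()   # toy corpus

def count_sequence(tokens, seq):
    # Count how often the word sequence `seq` occurs in `tokens`.
    n = len(seq)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == seq)

history = "its water is so transparent that".split()
numerator = count_sequence(tokens, history + ["the"])     # C(its water is so transparent that the)
denominator = count_sequence(tokens, history)             # C(its water is so transparent that)
print(numerator / denominator if denominator else 0.0)    # 1.0 on this toy corpus; such counts are almost always 0 on real data (sparsity)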
Simplify using the Chain Rule, which shows the relation between conditional and joint probability.
A word depends only on the last few (n) words of the history, not the entire
sequence (Markov Assumption).
N=3 -> Trigram (looks 2 words into the past)
N=2 -> Bigram (looks 1 word into the past)
N=1 -> Unigram (only the word itself)
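Written out in the notation used above, the chain rule and the bigram approximation are:
P(w1 w2 ... wn) = P(w1) * P(w2|w1) * P(w3|w1 w2) * ... * P(wn|w1 ... wn-1)
Bigram (Markov) approximation: P(wn|w1 ... wn-1) ≈ P(wn|wn-1)
MLE estimate from counts: P(wn|wn-1) = C(wn-1 wn) / C(wn-1)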
// Conditional Probability
What are the following bigram probabilities? (A counting sketch follows the list.)
- P(I|<s>)
- P(Sam|<s>)
- P(am|I)
- P(</s>|Sam)
- P(Sam|am)
- P(do|I)
- P(<s> I am Sam </s>)
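A short sketch of how these could be computed, assuming the classic three-sentence "I am Sam" mini-corpus (if the lecture used a different corpus, substitute it below):

from collections import Counter

sentences = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]   # assumed mini-corpus

unigrams, bigrams = Counter(), Counter()
for s in sentences:
    words = ["<s>"] + s.split() + ["</s>"]
    unigrams.update(words)
    bigrams.update(zip(words, words[1:]))

def p_bigram(w, prev):
    # MLE bigram probability P(w | prev) = C(prev w) / C(prev)
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

print(p_bigram("I", "<s>"))    # P(I|<s>)  = 2/3
print(p_bigram("Sam", "am"))   # P(Sam|am) = 1/2
print(p_bigram("do", "I"))     # P(do|I)   = 1/3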
Bi-gram Counts
Uni-gram counts
Log probabilities:
- For longer sequences, multiplying many probabilities results in numerical underflow.
- Working with log probabilities solves this; taking the exponent recovers the probability if needed.
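A quick illustration of the underflow problem and the log-space fix (the per-word probabilities are made up for illustration):

import math

probs = [0.01] * 200          # hypothetical per-word probabilities of a long sequence

product = 1.0
for p in probs:
    product *= p
print(product)                # 0.0 -- the true value 1e-400 underflows in float arithmetic

log_prob = sum(math.log(p) for p in probs)
print(log_prob)               # about -921.03, perfectly representable
# math.exp(log_prob) would give the probability back, but here it is too small to represent.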
N-Gram Language Model
Performance Evaluation: measure whether the difference between two language models is statistically significant.
- Extrinsic Evaluation: measures how much the downstream application improves, for example, next-word
prediction.
- Intrinsic Evaluation: measures the quality of the language model on its own. Perplexity (PP)
is one such measure.
- In practice, we often divide the data into training, development, and test sets. We report the performance
score on the test set.
N-Gram Language Model (Intrinsic Evaluation)
- W represents the complete word sequence of the test data; <s> and </s> are inserted before and
after each sentence before computing probabilities.
- The perplexity of a language model on a test set is the inverse probability of the test set, normalized by
the number of words.
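As a formula, PP(W) = P(w1 w2 ... wN)^(-1/N), which in log space becomes exp(-(1/N) * sum of log P). A minimal sketch, assuming per-token log probabilities have already been computed as above:

import math

def perplexity(log_probs):
    # log_probs: natural-log probability of each token in the test set (N tokens total)
    n = len(log_probs)
    return math.exp(-sum(log_probs) / n)

# e.g., 4 tokens, each assigned probability 0.1 by the model:
print(perplexity([math.log(0.1)] * 4))   # 10.0 -- lower perplexity is better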
Sentences randomly generated from four n-gram models trained on Shakespeare's works.
The longer the context, the more coherent the generated sentences.
Issues with LMs
● LMs do better and better as we increase the size of the training corpus.
● Differences between training and test data matter: a model trained on Shakespeare's text would not perform well at predicting Wall Street Journal text.
● A similar domain helps: to build a language model for translating legal documents, we need a training corpus of legal documents.
● Data sparsity is still a prevalent issue.
N-Gram Language Model
- Data Sparsity
- Witnessing zero-probability or very low-probability n-grams. Example counts:
denied the allegations: 5
denied the speculation: 2
denied the rumors: 1
denied the report: 1
Got the offer: 1
- Data Sparsity
- In other cases we have to deal with words we haven't seen before, which we call
unknown words, or out-of-vocabulary (OOV) words.
- The percentage of OOV words that appear in the test set is called the OOV rate.
- Such words are often tagged as <UNK>.
- How do we compute the probability of <UNK>?
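One common recipe (a sketch; the min_count threshold of 2 is an assumption, not from the lecture): fix a vocabulary from the training data, replace every out-of-vocabulary training token with <UNK>, estimate <UNK>'s probability like any other word, and map test-time OOV tokens to <UNK> as well.

from collections import Counter

def build_vocab(train_tokens, min_count=2):
    # Keep words seen at least `min_count` times; everything else will become <UNK>.
    counts = Counter(train_tokens)
    return {w for w, c in counts.items() if c >= min_count}

def replace_oov(tokens, vocab):
    return [w if w in vocab else "<UNK>" for w in tokens]

train = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(train)                          # {'the', 'cat'}
print(replace_oov("the dog sat".split(), vocab))    # ['the', '<UNK>', '<UNK>']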
N-Gram Language Model
- Unknown Words
- Smoothing: move a small amount of probability mass from the seen terms to the unseen
terms.
1. Add-k smoothing: add a fractional count k (e.g., 0.5, 0.05, or 0.01) to every count.
- Gale and Church (1994) showed that add-k smoothing leads to poor variance and
inappropriate discounts.
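A sketch of add-k smoothing for bigram probabilities; the toy counts mirror the "denied the ..." example above, and the vocabulary size V is assumed:

from collections import Counter

def add_k_bigram_prob(w, prev, bigrams, unigrams, vocab_size, k=0.5):
    # Add-k smoothed bigram probability: P(w | prev) = (C(prev w) + k) / (C(prev) + k * V)
    return (bigrams[(prev, w)] + k) / (unigrams[prev] + k * vocab_size)

unigrams = Counter({"the": 9})
bigrams = Counter({("the", "allegations"): 5, ("the", "speculation"): 2,
                   ("the", "rumors"): 1, ("the", "report"): 1})
V = 10   # assumed vocabulary size

print(add_k_bigram_prob("allegations", "the", bigrams, unigrams, V))   # seen bigram: 5.5/14
print(add_k_bigram_prob("offer", "the", bigrams, unigrams, V))         # unseen bigram: 0.5/14, no longer zero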
Better mechanisms of Smoothing
Unknown Words
- Backoff: use less context: for a 4-gram, use the trigram; for a trigram, the bigram; for a bigram, the
unigram.
- We “back off” to a lower-order n-gram if we have zero evidence for a higher-order n-gram.
- Katz Backoff: We rely on a discounted probability P∗ if we’ve seen this n-gram before. Otherwise,
we recursively back off to the Katz probability for the shorter-history (N-1)-gram.
- Interpolation: combine different-order n-grams by linearly interpolating all the models.
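A sketch of simple linear interpolation for a trigram model; the lambda weights are placeholders (in practice they are tuned on the development set), and p_uni/p_bi/p_tri stand in for the component MLE estimates:

def interpolated_trigram_prob(w, w1, w2, p_uni, p_bi, p_tri, lambdas=(0.1, 0.3, 0.6)):
    # P_hat(w | w1 w2) = l1*P(w) + l2*P(w | w2) + l3*P(w | w1 w2), with l1 + l2 + l3 = 1
    l1, l2, l3 = lambdas
    return l1 * p_uni(w) + l2 * p_bi(w, w2) + l3 * p_tri(w, w1, w2)

# Usage with made-up component probabilities:
print(interpolated_trigram_prob("the", "its", "water",
                                p_uni=lambda w: 0.05,
                                p_bi=lambda w, prev: 0.2,
                                p_tri=lambda w, a, b: 0.0))   # 0.1*0.05 + 0.3*0.2 + 0.6*0.0 = 0.065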