
EE782 Advanced Topics in Machine Learning

05 Introduction to NLP
Amit Sethi
Electrical Engineering, IIT Bombay
Module objectives

• Review what NLP is

• Understand some key problems in NLP

• Appreciate earlier frameworks used for NLP

• Some example solutions to NLP problems


Outline

• NLP basics

• Pre-processing in NLP

• Language model with an example

• From words to vectors

• Some applications
What is Natural Language Processing?
• NLP is the analysis or generation of natural language text using computers, for example:
  – Machine translation
  – Spell check (autocorrect)
  – Automated query answering
  – Speech parsing (a problem that overlaps with ASR)
• NLP is based on:
  – Probability and statistics
  – Machine learning
  – Linguistics
  – Common sense
Why do NLP?
• Language is one of the defining characteristics of our species
• A large body of knowledge can be organized and easily accessed using NLP
• The original conception of the Turing test was based on NLP
Some standard terms
• Corpus: A body of text samples

• Document: A text sample

• Vocabulary: A list of words used in the corpus

• Language model: How the words are supposed to be organized


Examples of NLP tasks
• Corpus → Extract documents
• Document → Extract sentences
• Sentences → Extract tokens
• Tokens → Normal, stem, lemma forms
• Sentence, tokens → PoS tagging, NER
• Sentence, tokens → Parsing, e.g. chunking, chinking, syntax tree
• Document → Classification, e.g. sentiment analysis, topic extraction
• Sentence → Text synthesis, e.g. translation, Q&A
Example: text classification
• Sentiment analysis – positive or negative
  – "This is a ridiculously priced toothbrush. Seriously, no way to get around it. It is absurdly priced and I'm almost embarrassed to be admitting that I bought it. With that said... Wow, this thing is amazing."
  – "These pens make me feel so feminine and desirable. I can barely keep the men away when I'm holding one of these in my dainty hand. My husband has started to take fencing lessons just to keep the men away."

Text source: Amazon.com reviews


Example: Named entity recognition
• A real-world person, place, or object that can be given a proper noun:
  – "India posted a score of 256/8 in their allotted 50 overs in the third and deciding ODI of the series. Virat Kohli was the top-scorer for the men in blue with a classy 71, while Adil Rashid and David Willey picked up three wickets each."
  – India → Place, Virat Kohli → Person, …

Text source: Reuters


Example: PoS and Parsing text

[Figure: example syntax tree and PoS tags omitted] Tool source: http://mshang.ca/syntree/


Importance of context of a word
• “We were on a crash course.”

• Crash can mean an accident, a percussion strike, or a collapse.

• Course can mean a study plan, or a path.


Challenges in NLP

• Large vocabulary
• Multiple meanings
• Many word forms
• Synonyms
• Sarcasm, jokes, idioms, figures of speech
• Fluid style and usage
Basic text classification using ML
• Text sample: variable number of words
• Pre-processing: tokenization, normalization, etc.
• Feature: fixed-length vector
• Class: discrete set
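A minimal sketch of this pipeline using scikit-learn (assumed available); the texts, labels, and choice of classifier below are illustrative placeholders, not from the lecture:

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# pre-processing + fixed-length feature vector, followed by a discrete classifier
clf = Pipeline([
    ("features", TfidfVectorizer(lowercase=True, stop_words="english")),
    ("classifier", LogisticRegression(max_iter=1000)),
])

train_texts = ["this toothbrush is absurdly priced but amazing",
               "these pens are flimsy and terrible"]
train_labels = ["positive", "negative"]
clf.fit(train_texts, train_labels)
print(clf.predict(["an amazing pen"]))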
Outline

• NLP basics

• Pre-processing in NLP

• Language model with an example

• From words to vectors

• Some applications
Tokenization
• Chopping up text into pieces called tokens
• Usually, each word is a token
§ Jeevan / saved / the / puppy
• How do you tokenize?
§ Split up at all non-alpha-numeric characters
§ What about apostrophes?
§ What about two-word entities, e.g. “New Delhi”?
§ What about compound words in Sanskrit and German?
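A minimal regex-based tokenizer sketch (standard library only), which also shows the apostrophe problem raised above:

import re

def tokenize(text):
    # keep runs of letters/digits; everything else (including apostrophes) splits tokens
    return re.findall(r"\w+", text.lower())

print(tokenize("Jeevan saved the puppy in New Delhi; it wasn't hurt."))
# ['jeevan', 'saved', 'the', 'puppy', 'in', 'new', 'delhi', 'it', 'wasn', 't', 'hurt']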
Stop words
• Words that are common
• Non-selective (excluding negation)
• Examples:
§ Articles: a, an, the
§ Common verbs: is, was, are
§ Pronouns: he, she, it
§ Conjunctions: for, and
§ Prepositions: at, on, with
§ Need not be used to classify text
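A short sketch of stop-word filtering with NLTK's built-in list (assuming the 'stopwords' corpus has been downloaded via nltk.download):

from nltk.corpus import stopwords

stop = set(stopwords.words("english"))
tokens = ["jeevan", "saved", "the", "puppy"]
print([t for t in tokens if t not in stop])   # ['jeevan', 'saved', 'puppy']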
Normalization
§ Words appear in many forms:
§ School, school, schools
§ U.S.A, USA, U.S., US
§ But not “us”
§ Windows vs. windows/window
§ These need not be considered separate terms
§ Normalization is counting equivalent forms as one term
Stemming and Lemmatization
§ Stemming – chopping off the ends of words
  § Nannies becomes nanni (rule: *ies → *i)
  § Caresses becomes caress (rule: *sses → *ss)
  § This is a heuristic approach
§ Finding the lemma of a word is the more exact task
  § Nannies should become nanny
  § Privatization should become private
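A sketch of both operations with NLTK (assuming the WordNet data is downloaded); the stemmer applies suffix-chopping rules, while the lemmatizer looks up dictionary forms:

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

print(stemmer.stem("nannies"), stemmer.stem("caresses"))                  # expected: nanni caress
print(lemmatizer.lemmatize("nannies"), lemmatizer.lemmatize("caresses"))  # expected: nanny caress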
Word vectors
• “India posted a score of 256/8 in their allotted 50 overs
in the third and deciding ODI of the series. Virat Kohli
was the top-scorer for men in blue with a classy 71,
while Adil Rashid and David Willey picked up three
wickets each”
• One-hot encoding (or 1-of-N encoding)

Text source: Reuters
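A minimal one-hot encoding sketch over a toy vocabulary (plain Python, no libraries assumed):

tokens = "india posted a score of 256/8 in their allotted 50 overs".split()
vocab = sorted(set(tokens))
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    vec = [0] * len(vocab)     # one dimension per vocabulary word
    vec[index[word]] = 1       # a single 1 at the word's position
    return vec

print(one_hot("score"))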


Bag-of-words as a feature
§ "India posted a score of 256/8 in their allotted 50 overs in the third and deciding ODI of the series. Virat Kohli was the top-scorer for men in blue with a classy 71, while Adil Rashid and David Willey picked up three wickets each"
§ Represent the document by the count of each vocabulary word
  § The counts can be normalized
  § The words can be standardized, e.g. score and scorer
  § What about uninformative words?

Text source: Reuters
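A bag-of-words sketch using scikit-learn's CountVectorizer (assumed available); normalization and word standardization would be applied on top of these raw counts:

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat is chasing the mouse", "the dog chased the cat"]
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)      # sparse document-term count matrix
print(vectorizer.get_feature_names_out())
print(counts.toarray())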
TF-IDF as a feature
§ Term frequency – inverse document frequency
§ TF: f(t, d) is the count of term t in document d
  § Usually normalized in some way, e.g. tf(t, d) = f(t, d) / Σ_{t′ ∈ d} f(t′, d)
§ IDF penalizes terms that occur often in all documents, e.g. "the":
  § idf(t, D) = log( |D| / (1 + |{d ∈ D : t ∈ d}|) )
§ TF-IDF is the product: tfidf(t, d, D) = tf(t, d) × idf(t, D)
§ Form a vector of TF-IDF values for various terms
§ Which terms?
Examples of TF-IDF
§ Suppose the word dog appears 4 times in a document of 1000 words
  § TF = 4/1000 = 0.004
§ Suppose dog appears in 50 of 1 million documents
  § IDF = log(1,000,000 / 50) ≈ 4.3
  § So, TF-IDF = 0.004 × 4.3 ≈ 0.0172
§ Now suppose the word is appears 50 times in a document of 1000 words
  § TF = 50/1000 = 0.05
§ Suppose is appears in 40,000 of 1 million documents
  § IDF = log(1,000,000 / 40,000) ≈ 1.398
  § So, TF-IDF = 0.05 × 1.398 ≈ 0.0699

Without IDF, dog would not be able to compete with is.
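The worked example above can be reproduced in a few lines of plain Python; base-10 logarithms are assumed, since they match the slide's numbers (the 1+ smoothing term from the IDF definition is omitted here, as in the example):

import math

def tf_idf(term_count, doc_len, docs_with_term, total_docs):
    tf = term_count / doc_len
    idf = math.log10(total_docs / docs_with_term)
    return tf * idf

print(tf_idf(4, 1000, 50, 1_000_000))       # dog: ~0.0172
print(tf_idf(50, 1000, 40_000, 1_000_000))  # is:  ~0.0699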


We can then use traditional ML methods
• Text: “India posted a score of 256/8 in their allotted 50 overs in the third
and deciding ODI of the series. Virat Kohli was the top-scorer for men in
blue with a classy 71, while Adil Rashid and David Willey picked up three
wickets each”

• Sum the one-hot word vectors to form a document-level feature:

• Topic: “Cricket”

Text source: Reuters


Outline

• NLP basics

• Pre-processing in NLP

• Language model with an example

• From words to vectors

• Some applications
Language model: predicting words

§ Can you predict the next word?

The stocks fell again today for a third day ________ in this week.
§ Clearly, we can narrow down the choice of
next word, and sometimes even get it right.
§ How?
§ Domain knowledge: third day vs. third minute
§ Syntactic knowledge: a …<adjective | noun>
A language model is perhaps
fundamental to how our mind works
• Even illiterate people can predict the next spoken word with
some certainty in their native language
• This comes from experience with lots of conversational
sentences
• Can a machine gain such “experience?”
• How would such “experience” be modeled?
• What can it be used for?
A probabilistic model of language
• What is the probability of a word? Which words are highly likely?
  – A, an, the, he, she, it: P(w_m)
  – What about "obsequious?"
  – …
• What is the probability of a word given its:
  – previous word? P(w_m | w_{m−1})
  – previous two words? P(w_m | w_{m−1}, w_{m−2})
  – previous three words? P(w_m | w_{m−1}, w_{m−2}, w_{m−3})
  – …
An example: Guess the word!
• *** *** ****** **** ** ?
• *** *** ****** **** me ?
• *** *** ****** pick me ?
• *** *** please pick me ?
• *** you please pick me ?
• Can you please pick me ?
• Can you please pick me up?
N-gram: Markovian assumption
• The information provided by the immediately
previous word(s) is the most useful for
prediction
• We need not use more than n previous words
Unigram: P(w_m | w_{m−1}, w_{m−2}, ..., w_1) ≈ P(w_m)
Bigram:  P(w_m | w_{m−1}, w_{m−2}, ..., w_1) ≈ P(w_m | w_{m−1})
Trigram: P(w_m | w_{m−1}, w_{m−2}, ..., w_1) ≈ P(w_m | w_{m−1}, w_{m−2})
n-gram:  P(w_m | w_{m−1}, w_{m−2}, ..., w_1) ≈ P(w_m | w_{m−1}, w_{m−2}, ..., w_{m−n+1})

• This simplifies our model
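A minimal bigram model sketch with maximum-likelihood estimates from raw counts; the two-sentence corpus is a stand-in for Shakespeare or the Wall Street Journal:

from collections import Counter, defaultdict

corpus = ["the stocks fell again today", "the stocks rose today"]
bigram_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, cur in zip(words, words[1:]):
        bigram_counts[prev][cur] += 1

def p_bigram(cur, prev):
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][cur] / total if total else 0.0

print(p_bigram("stocks", "the"))   # 1.0: 'the' is always followed by 'stocks' in this corpus
print(p_bigram("fell", "stocks"))  # 0.5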


How many n-grams are there?
• About 20,000 words (unigrams)
• So, about 400,000,000 (20,000²) possible bigrams, and
• about 8,000,000,000,000 (20,000³) possible trigrams

• But, are all the bigrams and trigrams equally likely?


– The is a common word.
– The the does not even make sense.
• Yet, we want n to be small
Learn N-grams through examples
• Examples from corpora
– Shakespeare
– Wall Street Journal
– Thomson Reuters
• Depending on the corpus, the machine will learn that vocabulary and style; it can even sound like Shakespeare
– Where art thou ****
– Where art thou my ****
– Where art thou my forlorn ****
– Where art thou my forlorn prince?
How does this help us?

• Automatic speech recognition (ASR)


– “There was a bay-er behind the bushes”
– Did she say bear or bare or beer or bar?
– Noun, adjective, verb?
– Or simply use the previous words
– This requires many, many examples such that
all n-grams that we are ever likely to
encounter are seen with reliable frequencies
It also helps spell check software

• Context for the word being checked


• Two types of spelling mistakes:
– Non words
• “There was a baer behind the bushes”
– Wrong words
• “There was a bare behind the bushes”
• Both benefit from a language model
Typical causes of spelling mistakes

• Exchanging two letters, e.g. baer


• Typing the wrong key, e.g. bwar
• Missing a letter, e.g. bar (for bear)
• Adding an extra letter, e.g. beear
• Wrong homophone, e.g. bare or beer
• OCR errors, e.g. bcar
Let us model word distortion
• What is the probability of exchanging two letters?
• What is the probability of typing the wrong key?
– Does it depend on the distance from the right key on
keyboard?
• What is the probability of missing a letter?
• …

The distortion model is called the channel model


Channel model example: edit distance
• How many additions, deletions?
– BEAR: (1) FEAR
– BEAR: (1) FEAR, (2) F-AR
– BEAR: (1) FEAR, (2) F-AR, (3) FARE
• Should additions and deletions have equal
weight?
• What about exchange of two letters?
• What about pressing wrong neighboring key?

Channel model: P(typed word | candidate word)


Putting the two models together
• Bayes theorem and the chain rule to the rescue:
  P(A, B) = P(A|B) × P(B) = P(B|A) × P(A)
• Let W′ be the typed word, F the phrase before it, and W a candidate word
• Find the W that maximizes P(W | W′, F), i.e. the candidate's probability given the observed data:
  P(W | W′, F) = P(W, W′, F) / P(W′, F)
              ∝ P(W, W′, F)
              = P(W′ | W, F) × P(W, F)
              = P(W′ | W, F) × P(W | F) × P(F)
              ∝ P(W′ | W) × P(W | F)
              = Channel model × Language model
• That is, the best candidate is the one most likely to have produced the observed distortion AND to make sense language-wise
A probabilistic model of spell check has two parts
• Noisy channel model: P(w′_m | w_m)
  – Could be based on the edit distance between strings
  – E.g. a Gaussian function of edit distance
• Markov language model: P(w_m | w_{m−1}, ..., w_{m−n+1})
  – This could be an n-gram model
  – It gives the relative probability of each candidate word given the preceding n−1 words
• The correct word maximizes the product:
  arg max_{w_m} P(w′_m | w_m) × P(w_m | w_{m−1}, ..., w_{m−n+1})
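A toy noisy-channel sketch that combines the two parts: the channel model is taken here as exp(−edit distance), one possible monotone choice, and the language model is a hand-made bigram table (all numbers are made up for illustration):

import math

# made-up P(candidate | previous word) values standing in for a learned bigram model
bigram_lm = {("a", "bear"): 0.01, ("a", "bare"): 0.0001,
             ("a", "beer"): 0.002, ("a", "bar"): 0.003}

def edit_distance(a, b):
    # standard dynamic-programming Levenshtein distance
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

def correct(typed, prev_word):
    candidates = [w for (p, w) in bigram_lm if p == prev_word]
    def score(w):
        channel = math.exp(-edit_distance(typed, w))   # P(typed | candidate), decreasing in edit distance
        language = bigram_lm[(prev_word, w)]           # P(candidate | previous word)
        return channel * language
    return max(candidates, key=score)

print(correct("baer", "a"))   # 'bear': the best trade-off between channel and language model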
Hidden Markov model
[Graphical model: a chain of hidden states s_{t−3} → s_{t−2} → s_{t−1} → s_t, each emitting an observation x_{t−3}, x_{t−2}, x_{t−1}, x_t]

• A discrete set of hidden states


• Each state depends only on the previous state
• A discrete set of observations
• Each observation only depends on the current
state
• Inference is based on maximum likelihood
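Maximum-likelihood inference in such an HMM is usually done with the Viterbi algorithm; below is a compact sketch with toy transition and emission tables (illustrative values, not from the lecture):

def viterbi(obs, states, start_p, trans_p, emit_p):
    # best[t][s]: probability of the best state path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prob, prev = max((best[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states)
            best[t][s], back[t][s] = prob, prev
    # trace back the most likely state sequence
    state = max(best[-1], key=best[-1].get)
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return list(reversed(path))

states = ("Noun", "Verb")
start_p = {"Noun": 0.6, "Verb": 0.4}
trans_p = {"Noun": {"Noun": 0.3, "Verb": 0.7}, "Verb": {"Noun": 0.8, "Verb": 0.2}}
emit_p = {"Noun": {"dogs": 0.7, "bark": 0.3}, "Verb": {"dogs": 0.1, "bark": 0.9}}
print(viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p))  # ['Noun', 'Verb']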
Role of linguistics in NLP, an example
• What if an n-gram wasn't in the corpus?
• Knowledge of parts of speech (POS) can help
• Another NLP problem: POS tagging
• Linguistics uncovers language syntax, grammar, and POS patterns
• Now word choices can be limited by POS for ASR or spell check
  – No bare! There was a **** behind the bushes
[Figure: syntax tree (S → NP VP, with P, V, A, N, PP nodes) omitted]


Outline

• NLP basics

• Pre-processing in NLP

• Language model with an example

• From words to vectors

• Some applications
Encoding
• Moving from sparse (e.g. one-hot) to dense vectors

• Each dimension could represent attributes such as geography, gender, PoS, etc.
Why do we need dense vectors?
• Sparse (one-hot) vectors are high dimensional

• Sparse vectors do not have a neighborhood or directional relationship between words

• Inserting a new word into the vocabulary will lead to catastrophic changes in the input space
CBOW and Skip-Gram
§ Example: It was a cat that made all the noise
§ In continuous bag-of-words (CBOW), we try to predict a word given its surrounding context (e.g. positions ± 2):
  § (was → cat), (a → cat), (that → cat), (made → cat)
§ In a skip-gram model, we try to predict the contextual words (e.g. positions ± 2) given a particular word:
  § (cat → was), (cat → a), (cat → that), (cat → made)
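A small sketch that generates the (centre, context) skip-gram pairs for a ±2 window, matching the example sentence above:

def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, centre in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs

sentence = "it was a cat that made all the noise".split()
print([p for p in skipgram_pairs(sentence) if p[0] == "cat"])
# [('cat', 'was'), ('cat', 'a'), ('cat', 'that'), ('cat', 'made')]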
Visualizing CBOW and Skip-Gram
[Architecture diagram: CBOW takes the one-hot encodings of the words from ± w positions (was, a, that, made), passes them through a shared weight matrix to a dense encoding, and a softmax layer predicts the one-hot encoding of the centre word (cat). Skip-gram reverses the arrangement: the one-hot encoding of cat is mapped through a weight matrix to a dense encoding, and a softmax layer predicts the one-hot encodings of the surrounding words (was, a, that, made).]

Adapted from: https://arxiv.org/pdf/1301.3781.pdf
How it is trained
• The objective is to maximize the probability of actual skip-grams, while minimizing the probability of non-existent (negatively sampled) skip-grams
• Model P((w, c) is a real pair) = σ(v_c · v_w), where σ is the sigmoid function
• Maximize Σ_{(w,c) ∈ D} log σ(v_c · v_w) + Σ_{(w,c) ∈ D′} log σ(−v_c · v_w), where D is the set of observed (word, context) pairs and D′ is a set of negative samples
Adapted from: https://arxiv.org/pdf/1402.3722.pdf


Importance of negative sampling
• If we just had positive pairs, the objective would reduce to Σ_{(w,c) ∈ D} log σ(v_c · v_w)

• Then, we could trivially maximize it by making every v_c · v_w a very large number (e.g. by making all vectors identical and long), without learning anything useful
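In practice the whole skip-gram-with-negative-sampling setup is available off the shelf, e.g. in gensim (assumed installed; parameter names follow gensim 4.x). The two sentences below are a toy placeholder corpus:

from gensim.models import Word2Vec

sentences = [["it", "was", "a", "cat", "that", "made", "all", "the", "noise"],
             ["the", "cat", "is", "chasing", "the", "mouse"]]
model = Word2Vec(sentences, vector_size=50, window=2,
                 sg=1,        # 1 = skip-gram, 0 = CBOW
                 negative=5,  # number of negative samples per positive pair
                 min_count=1, epochs=50)
print(model.wv["cat"].shape)  # (50,)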
Other considerations
• Make it more likely to drop frequent words (subsampling)
  – Probability of keeping word w_i decreases with its frequency f(w_i), e.g. P(keep w_i) = sqrt(t / f(w_i)) for a small threshold t
  – This effectively counters stop words such as 'the'
• Negative sampling is based on frequency
  – E.g., probability of drawing w_i as a negative sample: P(w_i) = f(w_i) / Σ_{j=1}^{|V|} f(w_j)
  – Practically, raising frequencies to the 3/4 power worked better: P(w_i) = f(w_i)^{3/4} / Σ_{j=1}^{|V|} f(w_j)^{3/4}
The new vectors can directly be used to find analogs
• E.g. v_prince − v_boy + v_girl ≈ v_princess

[Figure: 2-D sketch showing the offset from boy to girl parallel to the offset from prince to princess]
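A usage sketch of the analogy query, assuming `model` is a gensim Word2Vec model trained on a corpus large enough to contain these words (the toy model above would not be):

result = model.wv.most_similar(positive=["prince", "girl"], negative=["boy"], topn=1)
print(result)  # for a well-trained model, 'princess' should rank near the top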
Word2Vec example results

[Table of example analogy results omitted] Source: https://arxiv.org/pdf/1301.3781.pdf
Word2vec design choices
§ Dimension of the vector
§ Large dimension is more expressive
§ Small dimension trains faster
§ No incremental gain after a particular dimension
§ Number of negative samples
§ Increases the search space
§ Gives better models
§ Neural network architecture
§ Hidden units to convert the one-hot input into a dense vector
Co-occurrence matrix
Raw counts within a certain window:
          cat   is   chasing  the  mouse
cat        0     2      0      3     0
is         2     0      2      0     0
chasing    0     2      0      3     4
the        3     0      3      0     5
mouse      0     0      4      5     0

Counts converted to probabilities (row-normalized):
          cat    is   chasing   the   mouse
cat      0.00   0.40    0.00   0.60   0.00
is       0.50   0.00    0.50   0.00   0.00
chasing  0.00   0.22    0.00   0.33   0.44
the      0.27   0.00    0.27   0.00   0.45
mouse    0.00   0.00    0.44   0.56   0.00

• Consider two similar words (cat, kitty) with similar contexts

• Their co-occurrence vectors will be similar
• The co-occurrence matrix will therefore be (approximately) low rank
• We can factorize the co-occurrence matrix using SVD: C = U Σ Vᵀ
• SVD is an expensive operation for a large vocabulary
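A sketch of this count-based route: build a windowed co-occurrence matrix and factorize it with numpy's SVD (toy sentence, window of ±2):

import numpy as np

tokens = "the cat is chasing the mouse".split()
vocab = sorted(set(tokens))
idx = {w: i for i, w in enumerate(vocab)}
window = 2

C = np.zeros((len(vocab), len(vocab)))
for i, w in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            C[idx[w], idx[tokens[j]]] += 1

U, S, Vt = np.linalg.svd(C)       # C = U diag(S) Vt
embeddings = U[:, :2] * S[:2]     # keep the top-2 singular directions as dense word vectors
print(dict(zip(vocab, embeddings.round(2))))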
GloVe: Global Vectors
§ GloVe captures word-word co-occurrences over the entire corpus better
§ Let X_ij be the number of times word j occurs in the context of word i
§ Let X_i = Σ_j X_ij
§ And, let P_ij = X_ij / X_i be the co-occurrence probability
§ What GloVe models is F((w_i − w_j)ᵀ w̃_k) = P_ik / P_jk

"GloVe: Global Vectors for Word Representation" by Jeffrey Pennington, Richard Socher, Christopher D. Manning
GloVe explanation
§ Cost function: J = Σ_{i,j=1}^{|V|} f(X_ij) (w_iᵀ w̃_j + b_i + b̃_j − log X_ij)²

§ For words i and j, X_ij is their co-occurrence count and P_ij the co-occurrence probability

§ And f(x) is a weighting function, e.g. f(x) = (x / x_max)^α for x < x_max and 1 otherwise

  § It suppresses rare co-occurrences

  § And prevents frequent co-occurrences from taking over

"GloVe: Global Vectors for Word Representation" by Jeffrey Pennington, Richard Socher, Christopher D. Manning
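A one-function sketch of the weighting term, using the defaults reported in the GloVe paper (x_max = 100, α = 3/4):

def glove_weight(x, x_max=100.0, alpha=0.75):
    # small for rare co-occurrences, capped at 1 for frequent ones
    return (x / x_max) ** alpha if x < x_max else 1.0

print(glove_weight(1), glove_weight(100), glove_weight(1000))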
GloVe is more accurate than word2vec

[Comparison figure omitted] The accuracy shown is on the word analogy task.

"GloVe: Global Vectors for Word Representation" by Jeffrey Pennington, Richard Socher, Christopher D. Manning
Outline

• NLP basics

• Pre-processing in NLP

• Language model with an example

• From words to vectors

• Some applications
Application: PoS tagging
• Goal: Find part-of-speech of each word
• Application: Use in language model to structure
sentences better
• Example:
Amit found the tray and started to bring it to the guest
NNP VBD DT NN CC VBD TO VB PRP IN DT NN

• Certain regular expressions can be helpful


• For example, words ending with *ing are usually verbs
• Corpora with tagged words can be used
• For example, Brown corpus
Examples of tags
• Nouns
  • Singular noun → NN (cat)
  • Plural noun → NNS (cats)
  • Proper noun → NNP (Garfield)
  • Personal pronoun → PRP (he)
• Verbs
  • Base verb → VB (sleep)
  • Gerund → VBG (sleeping)
• Preposition → IN (over)
• Adjectives
  • Basic → JJ (bad)
  • Comparative → JJR (worse)
• Adverbs
  • Basic → RB (quickly)
• Determiners
  • Basic → DT (a, an, the)
  • WH → WDT (which, who)
• Coordinating conjunction → CC (and, or, however)
Some PoS Tagging Challenges
• Ambiguity that needs context
– It is a quick read (NN)
– I like to read (VB)

• Differences in numbers of tags


• Brown has 87 tags
• British National Corpus has 61 tags
• Penn Treebank has 45 tags (several merged)
Approaches to PoS Tagging
• Learn from corpora
• Use regular expressions
– Words ending with ‘ed’ or ‘ing’ are likely to be of a
certain kind
• Use context
– POS of preceding words and grammar structure
– For example, n-gram approaches
• Map untagged words using an embedding
• Use recurrent neural networks
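A corpus-trained tagger is available off the shelf in NLTK (assuming the tokenizer and tagger resources have been downloaded with nltk.download); tags follow the Penn Treebank set listed earlier:

import nltk

tokens = nltk.word_tokenize("Amit found the tray and started to bring it to the guest")
print(nltk.pos_tag(tokens))
# e.g. [('Amit', 'NNP'), ('found', 'VBD'), ('the', 'DT'), ('tray', 'NN'), ...]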
Application: Named entity recognition
• Something which has a name:
• Person, place, thing, time
• Example:
• Thereafter, Amit went to D-Mart
• Amit → person, D-Mart → place
• Application:
• Tag texts for relevance and search
Some challenges with NER
• Different entities sharing the same name
• Manish Jindal  Person
• Jindal Steel  Thing (company)
• Common words that are also names
• Do you want it with curry or dry
• Tyler Curry
• Ambiguity in the order, abbreviation, style
• Jindal, Manish
• Dept. of Electrical Engineering
• De Marzo, DeMarzo
Approaches to NER
• Match to an NE in a tagged corpus
– Fast, but cannot deal with ambiguities
• Rule based
– E.g. capitalization of first letter
– Does not always work, especially between different types of proper
nouns
• Recurrent neural network based
– Learn from a NE tagged corpus
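A sketch of the learned approach using a pretrained spaCy pipeline (assuming the 'en_core_web_sm' model is installed):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Thereafter, Amit went to D-Mart in Mumbai.")
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Amit', 'PERSON'), ('D-Mart', 'ORG'), ('Mumbai', 'GPE')]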
