The document provides an overview of natural language processing, including its history, common tasks, challenges, and evolution. It discusses key NLP concepts such as tokenization, stemming, lemmatization, part-of-speech tagging, and regular expressions. The future of NLP is predicted to incorporate other signals, such as biometrics, and to enable more human-like computer interactions.



MODULE 1
Natural Language Processing

• Introduction: past, present, and future of NLP; classical problems in text processing; necessary math concepts for NLP; regular expressions in NLP; basic text processing: lemmatization, stop words, tokenization, stemming, etc.; spelling error correction: minimum edit distance, Bayesian method.
What is NLP?
Need For NLP
• In neuropsychology, linguistics, and the philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition, without conscious planning or premeditation.
• Natural languages can take different forms, such as speech or signing.
• They are distinguished from constructed and formal languages, such as those used to program computers or to study logic.
Common NLP Tasks (Applications of NLP):
❖ Cars with an automatic speech recognition system
❖ Capture words in soundtrack from millions of hours of
video on the web
❖ Cross language information retrieval and translation-
Google Translator
❖ Automatic essay analyzers; tools to summarize research papers
❖ Automated interactive virtual agents, e.g. tutors
Even Human-Made Systems Made Blunders
Chatbots: Tay (@TayandYou)
Goal of NLP
Approaches to NLP
(Lexical Dictionary)
Heuristic Method:
• Any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, short-term goal or approximation.

• Advantages: i. quick approaches (reasonably accurate)
              ii. fewer errors

Current scenario: even when using ML and DL, we still often rely on heuristic approaches.
• Rule-based algorithms
• Probability-based methods
• Classification algorithms
• Linear discriminant analysis and topic modelling
• Part-of-speech tagging
• Sequence learning
Data Acquisition:
Text preparation:
Steps For cleaning:
i. Removing HTML tags:
Unicode Normalization & Spell Checking:
Basic preprocessing:
Tokenization:
NLP Pipeline
Feature Engineering:
Modelling:
Deployment:
Alternative Views on NLP

□ Computational models of human language processing


❖ Programs that operate internally the way humans do
□ Computational models of human communication
❖ Programs that interact like humans
□ Computational systems that efficiently process text and speech
Complex Language Behavior Requires
□ Phonetics and Phonology
❖ Words as sequences of sounds
❖ How the sound is realized acoustically

□ Morphology
❖ The way words break down into component parts that
carry meanings like: singular vs plural or I’m vs I am
□ Syntax
❖ Structural knowledge to properly string together the words
that constitute the response

❖ Scrambled: I’m I do, sorry that afraid Dave I’m I can’t

❖ With syntactic knowledge: I’m sorry Dave, I’m afraid I can’t do that


□ Semantics
❖ Lexical semantics: the meaning of individual words
❖ Compositional semantics: knowledge about how the meanings of words combine

❖ How much Chinese silk was exported to Western Europe by the end of the 18th century?
□ Pragmatics
❖ The kinds of actions that speakers intend by their use of sentences
□ Discourse
❖ Knowledge about linguistic units larger than a single utterance

❖ How many students graduated that year?

"that year" may be:
▪ when the first batch graduated, or
▪ when COVID-19 hit the world
Ambiguity

What Makes Natural Language Processing Difficult?
□ Ambiguity at the word level
□ Ambiguity at the sentence level
□ Morphological ambiguity --> part-of-speech tagging
□ Semantic ambiguity --> lexical disambiguation
□ Syntactic ambiguity --> probabilistic parsing

□ Ambiguity at the meaning level
The Evolution of Natural Language Processing
History of NLP

• In 1952, Bell Labs created Audrey, the first speech recognition system. It could recognize all
ten numerical digits.

• Harpy, developed at Carnegie Mellon University under DARPA funding in the 1970s, was the first system to recognize over a thousand words.
The Evolution of Natural Language Processing:
Current trends in NLP

□ Speech-to-text conversion
□ Text-to-speech conversion
□ NLP integrated with deep learning and machine learning has enabled chatbots and virtual assistants to carry out complicated interactions.
□ NLP in healthcare can monitor treatments and analyze reports and health records.
□ Cognitive analytics and NLP are combined to automate routine tasks.
Chatbot types
Various NLP Algorithms

□ Bag of words: this model counts the frequency of each unique word in a document.

□ TF-IDF: TF (term frequency) is the number of times a given term appears in a document divided by the total number of terms in that document; IDF (inverse document frequency) discounts terms that appear in many documents.

□ Co-occurrence matrix: introduced to address semantic ambiguity. It tracks the context of the text but requires a lot of memory to store all the data.

□ Transformer models: encoder-decoder models that use attention, letting machines imitate human attention and train faster. BERT, developed by Google, revolutionized NLP.
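The bag-of-words and TF-IDF calculations above can be sketched in plain Python (a minimal illustration with a made-up two-document corpus; real pipelines typically use a library such as scikit-learn):

```python
import math
from collections import Counter

docs = [
    "natural language processing is fun",
    "language models process natural text",
]

# Bag of words: frequency of each unique word per document
bows = [Counter(d.split()) for d in docs]

def tf_idf(term, doc_index):
    # TF: occurrences of the term / total terms in the document
    tf = bows[doc_index][term] / sum(bows[doc_index].values())
    # IDF: log of (number of documents / documents containing the term)
    df = sum(1 for bow in bows if term in bow)
    return tf * math.log(len(docs) / df)

# "processing" occurs only in doc 0, so it gets a non-zero weight there;
# "natural" occurs in every document, so its IDF (hence TF-IDF) is 0.
print(tf_idf("processing", 0))
print(tf_idf("natural", 0))
```

Note how the IDF term automatically zeroes out words shared by all documents, which is why TF-IDF is preferred over raw counts for retrieval.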
The Evolution of Natural Language Processing:
Future Predictions of NLP

□ Non-verbal communication, like body language, gestures, and facial expressions: use biometrics such as facial recognition and retina scanners.

□ Creation of humanoid robots by integrating NLP with biometrics; computer-human interaction will move toward computer-human communication.

□ NLP evolution can create robots that can see, touch, hear, and speak, much like humans.
Regular Expressions in NLP
□ A language for specifying text search strings.
□ An RE is a sequence of characters mainly used to find or replace patterns present in text.
□ An RE is a set of characters used to find a substring in a given string.
□ An RE is a formula in a special language that can be used to specify simple classes of strings -> a sequence of symbols.
□ An RE is an algebraic expression for characterizing a set of strings.
□ An RE is an instruction given to a function on what or how to match, search, or replace a set of strings.
□ An RE requires two things:
❖ Pattern: what we wish to search for
❖ Corpus: the text within which we search
The brackets [] specify a disjunction of characters.
The question mark ? marks optionality of the previous expression.

The caret ^ negates a character class inside brackets, anchors the start of a line elsewhere, or, escaped as \^, just means a literal ^.
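A quick sketch of these three operators using Python's `re` module:

```python
import re

# [] : a disjunction of characters - [Ww] matches either case
assert re.search(r"[Ww]oodchuck", "How much wood would a Woodchuck chuck")

# ? : optionality of the previous expression - matches color and colour
assert re.fullmatch(r"colou?r", "color")
assert re.fullmatch(r"colou?r", "colour")

# ^ inside brackets negates: [^0-9] matches any non-digit character
assert re.search(r"[^0-9]", "abc123")

# ^ outside brackets anchors the start of the string
assert re.match(r"^The", "The dog barked")
print("all regex checks passed")
```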
Kleene * : "zero or more occurrences of the immediately previous character or regular expression".

Consider the language of certain sheep, which consists of strings that look like the following:
baa!
baaa!
baaaa!
baaaaa!
...
The sheep language: /baaa*!/

Kleene + : "one or more occurrences of the immediately preceding character or regular expression".
The sheep language: /baa+!/
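Both sheep-language patterns can be checked with Python's `re`:

```python
import re

sheep = ["baa!", "baaa!", "baaaa!", "baaaaa!"]

# Kleene * : /baaa*!/ - "baa" followed by zero or more extra a's
assert all(re.fullmatch(r"baaa*!", s) for s in sheep)

# Kleene + : /baa+!/ - "ba" followed by one or more a's (same language)
assert all(re.fullmatch(r"baa+!", s) for s in sheep)

# "ba!" has too few a's for either pattern
assert re.fullmatch(r"baaa*!", "ba!") is None
print("sheep language verified")
```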
Anchors

Anchors are special characters that anchor regular expressions to particular places in a string: ^ (start of line), $ (end of line), \b (word boundary), \B (non-word-boundary).
Disjunction, Grouping, and Precedence

The order of RE operator precedence, from highest to lowest: parentheses (), counters (* + ? {}), sequences and anchors, and finally disjunction (|).
Disjunction: the pipe symbol | matches either of the patterns it separates.

Negation in disjunction: [^...] matches any single character not listed inside the brackets.
Aliases for common sets of characters: \d (digit), \D (non-digit), \w (word character), \W (non-word character), \s (whitespace), \S (non-whitespace).

Regular expression operators for counting: * (zero or more), + (one or more), ? (zero or one), {n} (exactly n), {n,m} (from n to m), {n,} (at least n).

Some characters need to be backslashed to be matched literally: * . ? + [ ] ( ) \
Substitution, Capture Groups, and ELIZA

Substitutions and capture groups are very useful in implementing simple chatbots like ELIZA (Weizenbaum, 1966).
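A minimal sketch of an ELIZA-style substitution using a capture group (the rule and wording here are illustrative, not Weizenbaum's actual rule set):

```python
import re

# Capture what follows "I'm" and echo it back in the response.
# The parentheses form a capture group, referenced as \1 in the substitution.
def eliza(utterance):
    return re.sub(r".*I'm (depressed|sad|happy).*",
                  r"WHY DO YOU THINK YOU ARE \1?",
                  utterance, flags=re.IGNORECASE)

print(eliza("I'm depressed much of the time"))
# WHY DO YOU THINK YOU ARE depressed?
```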
Evaluate the Regular Expressions

• the set of all alphabetic strings;

• the set of all lower case alphabetic strings ending in a b;

• the set of all strings from the alphabet a, b such that each a is immediately preceded by and immediately followed by a b;

• the set of all strings with two consecutive repeated words (e.g., "Humbert Humbert" and "the the" but not "the bug" or "the big bug");
• all strings that start at the beginning of the line with an integer and end at the end of the line with a word;

• all strings that have both the word grotto and the word raven in them (but not, e.g., words like grottos that merely contain the word grotto);

• write a pattern that places the first word of an English sentence in a register. Deal with punctuation.
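One possible set of answers to the first few exercises, sketched and checked with Python's `re` (many equivalent patterns exist):

```python
import re

# all alphabetic strings
assert re.fullmatch(r"[a-zA-Z]+", "Humbert")

# lower-case alphabetic strings ending in a b
assert re.fullmatch(r"[a-z]*b", "absorb")
assert not re.fullmatch(r"[a-z]*b", "bee")

# strings over {a, b} where each a is immediately preceded by and
# followed by a b (only-b strings and the empty string also qualify)
ab = r"(b+(ab+)*)?"
assert re.fullmatch(ab, "bab") and re.fullmatch(ab, "bbb")
assert not re.fullmatch(ab, "ab") and not re.fullmatch(ab, "ba")

# two consecutive repeated words, bounded so "the theory" does not match
doubled = r".*\b([A-Za-z]+)\s+\1\b.*"
assert re.fullmatch(doubled, "Humbert Humbert sat down")
assert not re.fullmatch(doubled, "the big bug")
print("exercise patterns behave as described")
```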
Text Normalization

• Tokenizing (segmenting) words
• Normalizing word formats
• Segmenting sentences
Text Tokenization
• Break off punctuation as a separate token:
□ commas are a useful piece of information for parsers,
□ periods help indicate sentence boundaries.
But watch out for:
✔ punctuation that occurs word-internally, as in m.p.h., Ph.D., Dr., U.S.;
✔ prices ($45.55) and dates (01/02/06);
✔ URLs (http://www.stanford.edu);
✔ Twitter hashtags (#nlproc);
✔ email addresses (someone@cs.colorado.edu);
✔ commas used inside numbers in English, every three digits: 555,500.50.
• A tokenizer can also be used to expand clitic contractions that are marked by apostrophes, for example converting what're to the two tokens what are, and we're to we are.
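A toy regex tokenizer in this spirit: it keeps word-internal periods and apostrophes together, splits off other punctuation, and expands a few clitic contractions from a small hand-written table (the table and the pattern are illustrative only; real tokenizers are far more thorough):

```python
import re

# Illustrative clitic table; a real tokenizer would use a fuller list.
# Note that the lowercase lookup also lowercases expanded forms.
CLITICS = {"what're": ["what", "are"], "we're": ["we", "are"],
           "can't": ["can", "not"]}

def tokenize(text):
    tokens = []
    # word characters, optionally joined by internal . or ' (m.p.h., we're),
    # with an optional trailing period; otherwise a single punctuation mark
    for tok in re.findall(r"\w+(?:[.']\w+)*\.?|[^\w\s]", text):
        tokens.extend(CLITICS.get(tok.lower(), [tok]))
    return tokens

print(tokenize("We're flying at 55 m.p.h., Dave!"))
# ['we', 'are', 'flying', 'at', '55', 'm.p.h.', ',', 'Dave', '!']
```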
Text Tokenization - Byte-Pair Encoding

• Subword tokenization deals with the unknown-word problem. (A morpheme is the smallest meaning-bearing unit of a language.)
• Subwords can be arbitrary substrings, or they can be meaning-bearing units like the morphemes -est or -er.

• Most tokenization schemes have two parts:
  • a token learner, trained on a raw training corpus, and
  • a token segmenter, applied to a raw test sentence.

• Common subword schemes: byte-pair encoding, unigram language modeling, and SentencePiece.
• BPE begins with a vocabulary that is just the set of all individual characters.

• It continues to count and merge, creating new, longer and longer character strings, until k merges have been done, creating k novel tokens.
• Example: an input corpus of 18 word tokens, with counts for each word. First count all pairs of adjacent symbols; the most frequent is the pair (e, r).
• Once the vocabulary is learned, the token segmenter is used to tokenize a test sentence.
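The learner half of BPE can be sketched compactly as follows (the corpus below is a made-up toy, and the tie-breaking rule is a simplification of real implementations such as Sennrich et al.'s):

```python
from collections import Counter

def bpe_learn(corpus, k):
    """Token learner: start from single characters, perform k merges.
    corpus maps word -> count; '_' marks the end of a word."""
    vocab = {tuple(word) + ("_",): n for word, n in corpus.items()}
    merges = []
    for _ in range(k):
        pairs = Counter()
        for symbols, n in vocab.items():           # count adjacent pairs
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += n
        if not pairs:
            break
        best = max(pairs, key=pairs.get)           # most frequent pair
        merges.append(best)
        new_vocab = {}
        for symbols, n in vocab.items():           # merge that pair everywhere
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1]); i += 2
                else:
                    out.append(symbols[i]); i += 1
            new_vocab[tuple(out)] = n
        vocab = new_vocab
    return merges

# toy corpus in the spirit of the textbook example
print(bpe_learn({"low": 5, "lower": 2, "newer": 6, "wider": 3}, 3))
```

On this toy corpus the first merge is (e, r), since that pair has the highest total count; subsequent merges build longer subwords such as er_ out of earlier ones.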
Word Normalization, Lemmatization and Stemming

• Word normalization:
□ putting words/tokens in a standard format,
□ choosing a single normal form for words with multiple forms.
• Needed for sentiment analysis, text classification, information extraction, machine translation, and other tasks.

Case folding (normalization):
□ mapping everything to lower case,
□ so that two morphologically different forms of a word behave similarly:
India, to India, of India, Indian, India's, for India
Lemmatization is the task of
▪ determining that two words have the same root,
▪ despite their surface differences.
am, are, and is have the shared lemma 'be'.

❖ He is reading detective stories
□ He be read detective story.
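In practice lemmatization is done with a morphological analyzer or a library such as spaCy or NLTK; the idea can be sketched with a toy lookup table (the entries below are illustrative only):

```python
# Toy lemma dictionary; a real lemmatizer performs full morphological parsing
LEMMAS = {"am": "be", "are": "be", "is": "be",
          "reading": "read", "stories": "story"}

def lemmatize(tokens):
    # map each token to its lemma, falling back to the lowercased token
    return [LEMMAS.get(t.lower(), t.lower()) for t in tokens]

print(lemmatize("He is reading detective stories".split()))
# ['he', 'be', 'read', 'detective', 'story']
```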
Lemmatization requires complete morphological parsing of the word. Morphology is the study of the way words are built up from smaller meaning-bearing units called morphemes.

❖ Two broad classes of morphemes can be distinguished:
• stems: the central morpheme of the word, supplying the main meaning; and
• affixes: adding "additional" meanings of various kinds.
• fox: one morpheme (the morpheme fox)
• cats: two morphemes (the morpheme cat and the morpheme -s)
Stemming
• A simpler but cruder method, which mainly consists of chopping off word-final affixes.

• Text to which the Porter stemmer is applied:
This was not the map we found in Billy Bones's chest, but an accurate copy, complete in all things-names and heights and soundings-with the single exception of the red crosses and the written notes.

• Stemmed output:
Thi wa not the map we found in Billi Bone s chest but an accur copi complet in all thing name and height and sound with the singl except of the red cross and the written note
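The "chopping" idea can be sketched with a few suffix rules (illustrative only; the real Porter stemmer has several ordered steps with conditions on the stem):

```python
# A tiny suffix-stripping cascade in the spirit of the Porter stemmer.
# Rules are tried in order; the first applicable one wins.
RULES = [("sses", "ss"), ("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

def stem(word):
    for suffix, repl in RULES:
        # require a reasonably long remainder so "this" -> "thi", not ""
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)] + repl
    return word

print([stem(w) for w in ["grasses", "ponies", "reading", "maps", "this"]])
# ['grass', 'poni', 'read', 'map', 'thi']
```

Note how "this" becomes "thi", matching the crude output seen in the stemmed passage above: stemming trades accuracy for simplicity.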
• The Porter stemmer applies a cascade of rewrite rules, e.g. ATIONAL -> ATE (relational -> relate) and SSES -> SS (grasses -> grass).
Stop Words

Stop Words - Applications
❖ Supervised machine learning: removing stop words from the feature space
❖ Clustering: removing stop words prior to generating clusters
❖ Information retrieval: preventing stop words from being indexed
❖ Text summarization: excluding stop words from contributing to summarization scores, and removing stop words when computing ROUGE scores

Removing stop words can be detrimental, however; for instance, in sentiment analysis, words like "not" carry signal.
Stop Words - Types

❖ Determiners: determiners tend to mark nouns; a determiner is usually followed by a noun.
Examples: the, a, an, another
❖ Coordinating conjunctions: these connect words, phrases, and clauses.
Examples: for, and, nor, but, or, yet, so
❖ Prepositions: these express temporal or spatial relations.
Examples: in, under, towards, before
Stop Words - Benefits
Key benefits of removing stop words:

❖ On removing stop words, the dataset size decreases, and the time to train the model also decreases.
❖ Removing stop words can potentially improve performance, as fewer and only meaningful tokens remain; this can increase classification accuracy.
❖ Even search engines like Google remove stop words for fast and relevant retrieval of data from the database.
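Stop-word removal itself is a one-liner once a list is chosen (the list below is a small illustrative subset; real lists, e.g. NLTK's, are much longer):

```python
# Small illustrative stop-word list
STOP_WORDS = {"the", "a", "an", "and", "or", "in", "of", "is", "to"}

def remove_stop_words(text):
    # lowercase, split on whitespace, drop any token in the stop list
    return [t for t in text.lower().split() if t not in STOP_WORDS]

print(remove_stop_words("The quick brown fox is in the barn"))
# ['quick', 'brown', 'fox', 'barn']
```

For sentiment analysis one would keep negations like "not" out of the stop list, for the reason noted above.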
Regular Expressions for Text Normalization (e.g. with spaCy)
Minimum Edit Distance
• One application is coreference, the task of deciding whether two strings refer to the same entity:

❖ Stanford President Marc Tessier-Lavigne
❖ Stanford University President Marc Tessier-Lavigne
• The minimum edit distance between two strings is the minimum number of editing operations
o insertion
o deletion
o substitution
needed to transform one string into the other.

• delete an i,
• Substitute e for n,
• Substitute x for t,
• Insert c,
• Substitute u for n

□ d for deletion,
□ s for
substitution, 116
□ i for insertion.
• Under an alternative cost model, each insertion or deletion has a cost of 1 and substitutions are not allowed (equivalent to giving a substitution a cost of 2, i.e. one deletion plus one insertion).
How to Find the Minimum Edit Distance?
Dynamic Programming for Minimum Edit Distance

❖ Dynamic programming: a tabular computation of D(n, m).
❖ Solving problems by combining solutions to subproblems.
Define D[i, j] as the edit distance between X[1..i] and Y[1..j], i.e., between the first i characters of X and the first j characters of Y.
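The table is filled with the standard recurrence D[i,j] = min(D[i-1,j] + del-cost, D[i,j-1] + ins-cost, D[i-1,j-1] + sub-cost), where the sub-cost is 0 when the two characters match. A direct Python sketch of this tabular computation:

```python
def min_edit_distance(source, target, ins=1, dele=1, sub=2):
    """Tabular computation of D(n, m); the default sub=2 follows the
    cost model above (equivalent to disallowing substitutions)."""
    n, m = len(source), len(target)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + dele          # delete all of source
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins           # insert all of target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = min(
                D[i - 1][j] + dele,           # deletion
                D[i][j - 1] + ins,            # insertion
                D[i - 1][j - 1]               # substitution or copy
                + (0 if source[i - 1] == target[j - 1] else sub),
            )
    return D[n][m]

print(min_edit_distance("intention", "execution"))          # 8
print(min_edit_distance("intention", "execution", sub=1))   # 5 (Levenshtein)
```

Keeping backpointers alongside each cell would additionally recover the alignment (the d/s/i operation sequence), not just the distance.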
Viterbi Algorithm for Minimum Edit Distance

❖ The Viterbi algorithm is a probabilistic extension of minimum edit distance.
❖ Instead of computing the "minimum edit distance" between two strings, Viterbi computes the "maximum probability alignment" of one string with another.
BAYESIAN APPROACH TO SPELLING CORRECTION
'Noisy channels'
❖ In a number of tasks involving natural language, the problem can be viewed as recovering an 'original signal' distorted by a 'noisy channel':
– Speech recognition
– Spelling correction
– OCR / handwriting recognition
– (less felicitously perhaps) pronunciation variation
Spelling Errors
Types of Spelling Errors

Damerau (1964): 80% of all misspelled words (non-word errors) are caused by SINGLE-ERROR MISSPELLINGS.
Dealing with Spelling Errors

Noisy Channel Model
Bayesian Inference

❖ 'Bayesian inference' is the name given to techniques typically used in diagnostics to identify the CAUSE of certain OBSERVATIONS.
❖ The name 'Bayesian' comes from the fact that Bayes' rule is used to 'turn a problem around': from the probability of the OBSERVATIONS given a CAUSE to the posterior probability of the CAUSE given the OBSERVATIONS.
❖ Using Bayes' rule, this probability can be 'turned around':

P(cause | observation) = P(observation | cause) P(cause) / P(observation)
❖ In this approach, Bayes' theorem is used to compute the probability that the intended word is w when the typist has in fact typed x, e.g.:

P(the | thme)

This is called the posterior probability of w being the intended word.
❖ The word in the dictionary with the highest posterior probability is chosen as the intended word. In practice we take a set of candidates C for the input word x and maximize over C:

ŵ = argmax_{w ∈ C} P(x | w) P(w)
❖ We can also rank candidates according to log-posterior instead of posterior probability:

ŵ = argmax_{w ∈ C} [ log P(x | w) + log P(w) ]
Steps to develop the algorithm
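A minimal noisy-channel speller in the spirit of these slides (and of Norvig's well-known essay): generate candidates within one edit of the input, then pick the candidate maximizing the posterior. The word counts below are made up, and the channel model P(x|w) is simplified to uniform over single-edit candidates, so the ranking reduces to the prior P(w); real systems estimate P(x|w) from confusion matrices:

```python
from collections import Counter

# Toy unigram counts standing in for P(w); a real system uses a large corpus
COUNTS = Counter({"the": 1000, "they": 200, "them": 150, "thee": 2})
TOTAL = sum(COUNTS.values())
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def candidates(x):
    """Dictionary words within one edit of x (the candidate set C)."""
    splits = [(x[:i], x[i:]) for i in range(len(x) + 1)]
    edits = {L + R[1:] for L, R in splits if R}                          # deletion
    edits |= {L + c + R[1:] for L, R in splits if R for c in ALPHABET}   # substitution
    edits |= {L + c + R for L, R in splits for c in ALPHABET}            # insertion
    edits |= {L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1}  # transposition
    return {w for w in edits if w in COUNTS}

def correct(x):
    """Rank candidates by the prior P(w), the channel being uniform here."""
    C = candidates(x) or {x}
    return max(C, key=lambda w: COUNTS[w] / TOTAL)

print(correct("thme"))  # "the": most frequent word within one edit
```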
The best suggestion for the correct word given the incorrect spelling "acress" can be calculated using Bayes' rule; the candidates within one edit include words such as actress, cress, caress, access, across, and acres.
