
NLP and its applications

By Vivek Anand
Introduction
• What is NLP?
• Why is NLP important?
• Core concepts:
• Preprocessing
• Tokenization
• Parts of Speech tagging
• Vectorization
Main topics
• Text Preprocessing
• Text Vectorization
• Potential Use Cases:
  • Sentiment Analysis
  • Spam Detection
  • Topic Modelling
  • Neural Networks for Text Classification
  • Chatbots
  • Text Summarization
WHAT IS NLP??
Natural Language Processing

• NLP is a challenging field of AI
  • Unstructured data
  • Inconsistent
• NLP is a field of Computer Science, AI, and Linguistics concerning interactions between humans and computers
• NLP enables computers to understand, interpret, and generate human language
Why NLP?
• Two major sources of information: Vision (photos & videos) and Text
• Most of the qualitative inputs available online are in the form of text (blogs, transcripts, tweets, comments, medical reports, etc.)
• The principal method of human communication is language, often in the form of text
• Language used by humans is called natural language, in contrast to an artificial language or computer code
• NLP is essential in making computers understand how humans speak and communicate, so that computers can engage back in communication through speech and writing
NLP History
NLP Today
• Text Summarization
• Text Generation
• Language translation
• Attention-based models; LSTMs
• Large language models
• Speech recognition
Real Time use cases – SPAM/HAM
• Spam vs. Ham detection
• A filter applies specific rules to emails and messages so that spam is sent straight to the spam folder, while legitimate mail ("ham") reaches the inbox.
Real Time use cases – Sentiment Analysis
• Sentiment Analysis
• Opinion mining / sentiment analysis is an NLP technique to determine whether data is positive, neutral, or negative. It is used by businesses to see in which of the three categories customers perceive a product or service.
Real Time use cases – Text Summarization
• Use of ML algorithms to condense large bodies of text into their most important parts.
• Extractive (selects the most important existing sentences)
• Abstractive (generates new sentences that capture the meaning)
Real Time use cases – Text Generation
• For example, Input = “Deep learning has revolutionised AI”
• Generated text from trained ML Model = “Deep learning has
revolutionized AI and researchers are testing it for new roles such
as predictive intelligence and social networking. But the company
is also looking for ways to improve its AI”
Real Time use cases – Topic model
• Topic modeling is an unsupervised machine learning technique to
identify hidden topics in a collection of documents.
• Automatically discover abstract "topics" that best describe the
content of a text corpus
Real Time use cases – Language Translation
• Language translation, or machine translation, enables computers to understand one language and translate it into other languages
NLP Terminology
• Corpus: the text under study, e.g., entire articles or many Twitter feeds.
• Document: a single unit of the corpus, such as an entire sentence in a paragraph.
• Vocabulary: the collection of words considered; not necessarily all words, since a subset may be selected depending on the experiment. Pre-trained models have set vocabularies.
• Words: the individual tokens that make up documents.
Summary of learnings
• Cleaning: special character removal, numbers, alphanumerics, lowercasing, stop words, white spaces, stemming, lemmatization, spellings
• Transformation (Text → Quantitative): bag of words, index mapping, TF-IDF, Word2vec (shallow NN), GloVe (matrix factorization)
• Modelling: logistic regression, Naïve Bayes, decision trees, KNN, SVM, NLP classifiers
Text Preprocessing - Tokenization
• Tokenization is the process of breaking a piece of text into smaller
units, called tokens, which can be words, phrases, or even
characters. These tokens are the building blocks for further
natural language processing tasks.
• Paragraph → Sentence tokenization
• Sentence → Word-based tokenization
• Words → Character-based tokenization
• Subword tokenization → used by models like BERT, e.g., "Unbelievable" → ["un", "believe", "able"]
Tokenization
Tokenization is the breaking down of text into smaller fragments or components, also known as TOKENS.

Each word in the tokenized list is called a token, and collectively the tokens are called a "bag of words".
A simple version of this can be performed in Python using s.split(), as sketched below.
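A minimal sketch of whitespace tokenization with Python's built-in str.split(); the sample sentence is just an illustration:

s = "He likes the course"
tokens = s.split()  # splits on whitespace by default
print(tokens)       # ['He', 'likes', 'the', 'course']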
How does it work?
• Tokenization is the first step in extracting information from text. The tokens are then converted into numbers, and those numbers (in various formats) are fed into machine learning models.
Challenges
1. Punctuation
2. Case Sensitivity
3. Accented Characters
Punctuation
• "course." and "course?" are two different tokens, which causes redundancy. It also requires additional data to train the model to learn the difference.
• Dimensionality becomes large.
Accented characters
• Words with different accents can be mistaken for separate tokens despite meaning the same thing
• For example, "résumé" vs. "resume"
• They can be converted into plain ASCII characters to standardize them, as sketched below
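A minimal sketch of accent removal using Python's standard unicodedata module (the example word is an assumption):

import unicodedata

def remove_accents(text):
    # Decompose accented characters into base letter + combining mark,
    # then drop anything that cannot be encoded as ASCII
    nfkd = unicodedata.normalize("NFKD", text)
    return nfkd.encode("ascii", "ignore").decode("ascii")

print(remove_accents("résumé"))  # resume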
Is punctuation important?
• Example 1: “He likes the course.”
• Example 2: “He likes the course?”
• The two sentences have different meanings

• Higher-order algorithms (deep learning based: BERT, GPT) can make use of punctuation as a meaningful signal


Word based tokenization
• Each word is made into a separate token
• Commonly used scheme
• Provides rich information as every word has meaning
• Disadvantage: there are way too many words, thus requiring a large vocabulary. This also means a large feature matrix, increasing dimensionality
• For e.g.: String = "I like this item"
  when tokenized = ['I', 'like', 'this', 'item']
Character-Based Tokenization
• Every character becomes a token
• Requires a much smaller vocabulary (26 letters, plus digits and punctuation)
• Individual characters do not carry much information
• Many words start with the same character, hence model learning could be complex
• For e.g.: String = "I like this item"
  when tokenized = ['I', 'l', 'i', 'k', 'e', 't', 'h', 'i', 's', 'i', 't', 'e', 'm']
Subword-based tokenization
• This sits in between the word-based and character-based methods
• The word "Learning" could be tokenized into "Learn" + "ing"
• Advantage: this technique makes model learning simpler
• Without it, model complexity increases; for example, the model has to learn the similarities between the words "learn", "learning", "learned", "learns", which could be avoided
• Used in deep learning; see the sketch below
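A short sketch of subword tokenization using a pre-trained BERT WordPiece tokenizer from the Hugging Face transformers library (an assumption on top of these slides, which only mention BERT; the exact splits depend on the learned vocabulary):

from transformers import BertTokenizer

# Load the WordPiece vocabulary learned for the bert-base-uncased model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Continuation pieces are prefixed with "##"
print(tokenizer.tokenize("unbelievable"))
# Produces subword pieces, e.g., something like ['un', '##belie', '##vable'],
# depending on the vocabulary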
Tokenization using NLTK (Natural Language Toolkit)
• NLTK is open source
• Integrates well with other libraries (scikit-learn, TensorFlow)
• Works well for models being built from scratch
• Capabilities:
  • Tokenization
  • Text preprocessing
  • Contains inbuilt models
  • Part-of-Speech (POS) tagging
  • Parsing
  • Named Entity Recognition (NER)
  • Text classification
  • Sentiment analysis
  • Language translation
HANDS ON!!!!!!!
• Hands-on approach for Tokenization
  • Word based
  • Character based
  • Subword based

• Hands-on approach using NLTK (Python), sketched below
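A minimal sketch of what this hands-on could look like with NLTK (assumes nltk is installed; the sample text is an illustration):

import nltk
nltk.download("punkt")  # one-time download of the tokenizer models
from nltk.tokenize import sent_tokenize, word_tokenize

text = "He likes the course. Does she like it too?"
print(sent_tokenize(text))  # ['He likes the course.', 'Does she like it too?']
print(word_tokenize(text))  # ['He', 'likes', 'the', 'course', '.', 'Does', ...]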


TEXT PROCESSING
1. Text Processing - Stemming
• Stemming means getting the stem of a word. E.g., "learn" is the stem word for "learning", "learned", "learns"
• Stemming is a rule-based technique. The result may not be a real word
• A crude method: it chops letters off the end of a word
1. Text Processing - Stemming
• Rule-based process that removes suffixes to reduce a word to its root form
• No guarantee stemmed words are valid words
• Simplistic approach
• Fast but less accurate
• No context consideration

Original Text → Stemmed Text
Running → Run
Happier → Happi
Excited → Excite
Studies → Studi
Stemming Algorithms
• Porter stemmer
• Developed By: Martin Porter (1980)
• Relatively conservative, meaning it doesn’t over-stem aggressively.
• Works through a series of rule-based steps (e.g., removing suffixes like "ing", "ed"); designed essentially for English
• Snowball stemmer
• Developed By: Martin Porter (improvement over Porter Stemmer).
• Supports multiple languages (not just English), making it more versatile.
• Follows a refined set of rules and handles edge cases better.
• Lancaster stemmer
• Developed By: Paice and Husk, Lancaster University (1990).
• Most aggressive of the three stemmers.
• Reduces words more heavily, often leading to over-stemming (e.g., "maximum" → "max").
2. Text Processing - Lemmatization
• Lemmatization is the process of reducing a word to its base or
root form (called the lemma) while considering its context and
grammatical role. Unlike stemming, lemmatization ensures that
the root form of the word is a valid word in the language.
• Context awareness: considers the part of speech
• Linguistic rules: uses WordNet analysis
• Semantic validity

• Process: Input word → Part-of-speech tagging → Database lookup → Output lemma
Lemmatization
• Looks beyond chopping words and considers the language vocabulary
• More robust; not purely rule based
• End words are always real words
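A minimal NLTK sketch of lemmatization (assumes the WordNet data has been downloaded); note the lemmatizer treats words as nouns by default, so the part-of-speech argument matters:

import nltk
nltk.download("wordnet")  # one-time download of the WordNet database
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("running", pos="v"))  # run (as a verb)
print(lemmatizer.lemmatize("better", pos="a"))   # good (as an adjective)
print(lemmatizer.lemmatize("studies", pos="n"))  # study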
3. Text Processing - Case sensitivity
• How does case sensitivity affect outcomes?
• For example, the words "apple" and "Apple" have the same meaning
• Models shouldn't mistake them for two separate tokens
• This can be overcome by using the string method s.lower(), as sketched below
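A tiny sketch of case folding before tokenization:

tokens = ["Apple", "apple", "APPLE"]
print({t.lower() for t in tokens})  # {'apple'}: a single token after case folding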
Hands on using NLTK – Stemming & Lemmatization
• Stemmers
  • Porter
  • Snowball

• Lemmatization using NLTK
When to use Stemming vs Lemmatization
• Definition: stemming reduces words to their root form using rules (heuristics); lemmatization reduces words to their base form (lemma) based on context.
• Output: stemming may produce non-dictionary words (e.g., "happier" → "happi"); lemmatization produces valid dictionary words (e.g., "running" → "run").
• Speed: stemming is faster and simpler; lemmatization is slower due to its reliance on linguistic analysis.
• Accuracy: stemming is lower and may produce incorrect stems; lemmatization is higher and produces linguistically meaningful results.
• Resource requirement: stemming is minimal, with no external dictionaries needed; lemmatization requires lexical resources like WordNet, which contains relationships between words and their lemmas.
4. Text Processing - Stop Words
• Common words such as "and", "the", "at", "an", "because", etc. occur so frequently that they add no meaning or variance and can safely be removed.
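A minimal NLTK sketch of stop-word removal (the sentence is an illustration):

import nltk
nltk.download("punkt")      # tokenizer models
nltk.download("stopwords")  # stop-word lists
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

stop_words = set(stopwords.words("english"))
tokens = word_tokenize("He is a good boy and she is a good girl")
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)  # ['good', 'boy', 'good', 'girl']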
5a. Text Processing – Parts of speech (POS)
Parts of speech are grammatical categories that define the roles and
relationships of words in a sentence. Common parts of speech include:

1. Nouns: Names of people, places, things (e.g., dog, city).
2. Verbs: Actions or states (e.g., run, is).
3. Adjectives: Descriptors for nouns (e.g., beautiful, fast).
4. Adverbs: Describe verbs, adjectives, or other adverbs (e.g., quickly, very).
5. Pronouns: Replace nouns (e.g., he, they).
6. Prepositions: Indicate relationships between nouns (e.g., on, under).
7. Conjunctions: Link words, phrases, or clauses (e.g., and, but).
8. Interjections: Express emotions (e.g., wow!, ouch!).
5a. Text Processing – Parts of speech (POS)
• Assigns grammatical roles to each word in text
• Helps in understanding sentence structure and formation, as it operates at a word-wise level only
• Improves NLP translation
• Aids in lemmatization
  • E.g.: "He is running": here "running" is a verb
  • E.g.: "The running bulls are thrilling": here "running" modifies the noun
• The focus is on syntax and grammatical roles


Real use case for POS tagging
• Consider building a chatbot. If a user says, "I want to book a flight
to Paris," POS tagging can:
• Identify "book" as a verb (action to be performed).
• Recognize "flight" as a noun (object of action).
• Mark "Paris" as a proper noun (destination).
• This information can then guide the chatbot's intent recognition and
response generation.
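A minimal NLTK sketch of POS tagging on that sentence (tag names follow the Penn Treebank convention):

import nltk
nltk.download("punkt")                       # tokenizer models
nltk.download("averaged_perceptron_tagger")  # POS tagger model
from nltk import pos_tag, word_tokenize

tokens = word_tokenize("I want to book a flight to Paris")
print(pos_tag(tokens))
# e.g., [('I', 'PRP'), ('want', 'VBP'), ('to', 'TO'), ('book', 'VB'),
#        ('a', 'DT'), ('flight', 'NN'), ('to', 'TO'), ('Paris', 'NNP')]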
5b. Text Processing – Named entity recognition (NER)
• Identifies and categorizes entities in text
• Operates at a word level, phrase level, or any level
• Relies on context to detect information
• While POS focuses on grammatical roles, NER focuses on semantics, extracting real-world information entities
Example to show the difference between POS & NER

Sentence: "Barack Obama was born in Hawaii in 1961"

Word → POS Tag
Barack → NNP (Proper Noun)
Obama → NNP (Proper Noun)
was → VBD (Verb, past tense)
born → VBN (Verb, past participle)
in → IN (Preposition)
Hawaii → NNP (Proper Noun)
in → IN (Preposition)
1961 → CD (Cardinal Digit)

• Focus: syntax and grammatical structure of the sentence.
• Purpose: to identify the syntactic roles of words, which is useful for parsing and understanding sentence structure.
Example to explain NER

Entity → NER Tag
Barack Obama → PERSON
Hawaii → LOCATION
1961 → DATE
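A minimal NLTK sketch of NER on the same sentence (NLTK's chunker has its own label set, e.g., GPE for geographic locations rather than LOCATION):

import nltk
for pkg in ["punkt", "averaged_perceptron_tagger", "maxent_ne_chunker", "words"]:
    nltk.download(pkg)  # one-time downloads for the NER pipeline

tokens = nltk.word_tokenize("Barack Obama was born in Hawaii in 1961")
tagged = nltk.pos_tag(tokens)  # POS tags feed the named-entity chunker
tree = nltk.ne_chunk(tagged)   # groups tokens into named-entity subtrees
print(tree)
# Entities appear as subtrees, e.g., (PERSON Barack/NNP Obama/NNP)
# and (GPE Hawaii/NNP)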
Usage of POS Vs NER
• POS Applications
  • Grammar analysis
  • Machine translation
  • Text-to-speech conversion

• NER Applications
  • Information extraction
  • Knowledge graph construction
  • Chatbot development
  • Text-to-speech conversion
TEXT VECTORIZATION
Convert text to vectors
Text Vectorization
• Introduction
• One Hot Encoding
• Word to index mapping
• Python hands on
• Bag of Words (BOW)
• Count vectorizer
• Hands on
• ML application using Count vectorizer
• TF-IDF
• Hands on
• Word2vec
• Shallow Neural network approach
• GloVe (Global Vectors for word representation)
ONE HOT ENCODING
Text Vectorization – One Hot encoding
• E.g.: "The food is good. The food is bad. Tiramisu is Amazing."
• Vocabulary = {The, food, is, good, bad, Tiramisu, Amazing}
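A minimal pure-Python sketch of one-hot encoding over this vocabulary:

sentences = ["The food is good", "The food is bad", "Tiramisu is Amazing"]
# Build the vocabulary from all unique words
vocab = sorted({w for s in sentences for w in s.split()})

def one_hot(word):
    # A vector of zeros with a single 1 at the word's vocabulary index
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(vocab)
print(one_hot("food"))  # 1 only at the position of "food"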
Advantages and disadvantages of One Hot

Advantages:
• Easy to implement

Disadvantages:
• Sparse matrix → overfitting
• Will not have a fixed matrix size across different sentences, which ML models need
• No semantic meaning is captured
WORD TO INDEX MAPPING
Word to index mapping
• Start by performing tokenization
• Assign an index to each unique word (token) in a dictionary
• Create a vector using these index values, as sketched below
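A minimal pure-Python sketch of word-to-index mapping (the sentence is an illustration):

sentence = "good boy and good girl"
tokens = sentence.split()

# Assign each unique token the next available index
word_to_index = {}
for t in tokens:
    if t not in word_to_index:
        word_to_index[t] = len(word_to_index)

print(word_to_index)                       # {'good': 0, 'boy': 1, 'and': 2, 'girl': 3}
print([word_to_index[t] for t in tokens])  # [0, 1, 2, 0, 3]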


Advantages & Disadvantages of Word to index mapping
• Advantages:
• Simplicity
• Maintains unique identifiers
• Compact
• Handles rare words
• Allows fixed size matrix

• Disadvantages:
• Ignores semantics
• Vocabulary dependent (any new word requires re-indexing)
• Fails to capture context
• Not scalable to large vocabularies
BOW – COUNT VECTORIZER
Text Vectorization - Bag of Words – Count Vectorizer

Text                        Operation              Result
He is a good boy            Lowercase conversion   S1: good boy
She is a good girl          Remove stop words      S2: good girl
Boy and girl are good boy                          S3: boy girl good boy

Count matrix:
      good   boy   girl
S1    1      1     0
S2    1      0     1
S3    1      2     1

• Comes with a binary as well as a non-binary (count) option; see the scikit-learn sketch below
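A minimal scikit-learn sketch reproducing this example (note that CountVectorizer orders its feature columns alphabetically):

from sklearn.feature_extraction.text import CountVectorizer

corpus = ["He is a good boy", "She is a good girl", "Boy and girl are good boy"]
vectorizer = CountVectorizer(stop_words="english", lowercase=True)
X = vectorizer.fit_transform(corpus)  # sparse document-term count matrix

print(vectorizer.get_feature_names_out())  # ['boy' 'girl' 'good']
print(X.toarray())
# [[1 0 1]
#  [0 1 1]
#  [2 1 1]]
# Pass binary=True to CountVectorizer for the binary option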


Pros and Cons of Bag of Words – Count Vectorizer

Advantages:
• Simplicity
• Always has a fixed-size matrix (unlike one-hot encoding)

Disadvantages:
• Sparse matrix here too
• Does not preserve the order of words
• Semantic meaning is still not captured
  • E.g.: "boy" and "good" are given equal importance
• Out of vocabulary (OOV) is an issue (in case a word is missing and happens to be important)
Major disadvantage of this technique
• Consider an example with 2 vectors that mean the exact opposite of one another (e.g., "The food is good" vs. "The food is bad"). The distance between these 2 vectors isn't large, as just one word differs between them.
TF-IDF
(Term Frequency – Inverse document frequency)
TF IDF – (Enhanced BOW method)
• S1 = good boy
• S2 = good girl
• S3 = boy girl good

• TF = (# of repetitions of a word in the sentence) / (# of words in the sentence)
• IDF = log_e(# of sentences / # of sentences containing the word)

Term Frequency (TF):
       S1     S2     S3
good   1/2    1/2    1/3
boy    1/2    0/2    1/3
girl   0/2    1/2    1/3
TF IDF – Term Frequency – Inverse Document Frequency (Enhanced BOW method)

Inverse Document Frequency (IDF):
good   log_e(3/3) = 0
boy    log_e(3/2)
girl   log_e(3/2)
TF IDF
TF-IDF = TF × IDF

       S1                 S2                 S3
good   1/2 × 0            1/2 × 0            1/3 × 0
boy    1/2 × log_e(3/2)   0 × log_e(3/2)     1/3 × log_e(3/2)
girl   0 × log_e(3/2)     1/2 × log_e(3/2)   1/3 × log_e(3/2)
Pros and cons of TF IDF

Final TF-IDF matrix:
       S1                 S2                 S3
good   0                  0                  0
boy    1/2 × log_e(3/2)   0                  1/3 × log_e(3/2)
girl   0                  1/2 × log_e(3/2)   1/3 × log_e(3/2)

Advantages:
• Intuitive
• Fixed matrix size
• Word importance is captured (if a word is present in every sentence, it is scored zero)

Disadvantages:
• Sparse matrix
• Out of vocabulary (OOV): any word that is not in the training data but appears in the test data gets ignored
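A small pure-Python sketch computing TF-IDF exactly as defined on these slides (scikit-learn's TfidfVectorizer uses a smoothed IDF formula, so its numbers would differ):

import math

sentences = [["good", "boy"], ["good", "girl"], ["boy", "girl", "good"]]
vocab = ["good", "boy", "girl"]

def idf(word):
    # log_e(# of sentences / # of sentences containing the word)
    containing = sum(1 for s in sentences if word in s)
    return math.log(len(sentences) / containing)

for word in vocab:
    tfidf = [(s.count(word) / len(s)) * idf(word) for s in sentences]
    print(word, [round(x, 3) for x in tfidf])
# good [0.0, 0.0, 0.0]
# boy  [0.203, 0.0, 0.135]
# girl [0.0, 0.203, 0.135]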
Word2vec
• Word2Vec turns words into dense vectors (a series of numbers) so that computers can understand and process text.
• It uses a shallow neural network trained on a large text dataset.
• Captures meaning ("King" and "Queen" will be close together, having similar meanings).
• Semantic relationships are maintained.
• Analogy: "King – Man + Woman = Queen"
• Similarity between words: King ≈ Monarch

Think of Word2Vec as creating a "map" where every word is a point. Words with similar meanings or usage patterns are placed closer together on this map.
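A minimal sketch using the gensim library (an assumption, since these slides do not name a library; the toy corpus is far too small for meaningful vectors, and real training needs a large dataset):

from gensim.models import Word2Vec

# Toy corpus: a list of tokenized sentences
corpus = [
    ["king", "queen", "royal", "palace"],
    ["man", "woman", "boy", "girl"],
    ["king", "man", "queen", "woman"],
]
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, epochs=100)

print(model.wv["king"][:5])                  # first 5 dimensions of the dense vector
print(model.wv.similarity("king", "queen"))  # cosine similarity of two words
# With a large corpus, analogies can be probed via:
# model.wv.most_similar(positive=["king", "woman"], negative=["man"])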
