Natural Language Processing Notes Class 10 AI
Aiforkids.in
NATURAL LANGUAGE PROCESSING
The ability of a computer to understand text and spoken words.
[Mind map: overview of the chapter]
Chatbots: Ex. Mitsuku Bot, Clever Bot, Jabberwacky, and Haptik. Types: Script Bot, Smart Bot.
Applications of NLP: Automatic Summarization, Sentiment Analysis, Text classification, Virtual Assistants.
Problems in understanding human languages by computers (Human Language vs Computer Language): arrangement of words and meanings, Syntax (structure), Semantics (meaning), multiple meanings of a word, perfect syntax with no meaning.
Data Processing: the process to simplify human language to make it understandable.
Text Normalisation: Sentence Segmentation, Tokenisation, Removal of Stop words, Converting into same case, Stemming and Lemmatization.
Bag of Words Algorithm
TFIDF: Term Frequency, Inverse Document Frequency, Applications of TFIDF.
Links: Youtube.com/aiforkids, Aiforkids.in/class-10/nlp
WHAT IS NLP?
Automatic Summarization
Summarizing the meaning of documents and information
Extract the key emotional information from the text to understand the
reactions (Social Media)
Sentiment Analysis
Identify sentiments and emotions from one or more posts.
Companies use it to identify opinions and sentiments to get feedback.
Can be Positive, Negative or Neutral.
Text Classification
Assign predefined categories to a document and organize it to help you
find the information you need or simplify some activities.
Eg: Spam filtering in email.
Virtual Assistants
By accessing our data, they can help us in keeping notes of our tasks,
making calls for us, sending messages, and a lot more.
With speech recognition, these assistants can not only detect our
speech but can also make sense of it.
A lot more advancements are expected in this field in the near future.
Eg: Google Assistant, Cortana, Siri, Alexa, etc.
CHATBOTS
One of the most common applications of Natural Language Processing is a
chatbot.
Types of ChatBots: SCRIPT BOTS, SMART BOTS
1 HUMAN LANGUAGE
Humans communicate through language which we process all the
time.
Our brain keeps on processing the sounds that it hears around
itself and tries to make sense out of them all the time.
Communications made by humans are complex.
2 COMPUTER LANGUAGE
[Figure: Listen, Prioritize, Process]
1 MULTIPLE MEANINGS OF A WORD
To understand this, consider the following sentences:
1. "His face turned red after he found out that he had taken the wrong bag"
Possibilities: He feels ashamed because he took another person's bag instead of his own, OR he is feeling angry because he did not manage to steal the bag that he had been targeting.
2. "The red car zoomed past his nose"
Possibilities: Probably talking about the color of the car that traveled close to him in a flash.
2 PERFECT SYNTAX, NO MEANING
DATA PROCESSING
TEXT NORMALISATION
In Text Normalisation, we go through several steps to reduce the text to a simpler, lower level. We work on text from multiple documents, and the whole textual data from all the documents taken together is known as the "Corpus".
1 SENTENCE SEGMENTATION
Under sentence segmentation, the whole corpus is divided into sentences. Each sentence is taken as a separate piece of data, so the whole corpus gets reduced to sentences.
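A minimal sketch of sentence segmentation, assuming that a sentence ends at ".", "!" or "?" followed by whitespace. The sample corpus and the function name `segment_sentences` are made up for illustration; real text (abbreviations, decimal numbers) needs more careful rules.

```python
import re

def segment_sentences(corpus):
    """Split a corpus into sentences after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", corpus.strip())
    return [part for part in parts if part]

# Hypothetical corpus, used only to illustrate the idea
corpus = "NLP helps computers understand text. It also handles spoken words! Is that not useful?"
sentences = segment_sentences(corpus)

for sentence in sentences:
    print(sentence)
```

Each element of `sentences` can then be processed as its own piece of data in the later steps.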
2 TOKENISATION
4 REMOVAL OF STOPWORDS
Examples: a, an, and, are, as, for, it, is, into, in, if, on, or, such, the, there, to.
In this step, the tokens which are not necessary are removed from the token list. These words are removed to make it easier for the computer to focus on meaningful terms.
Along with these words, our corpus might often contain special characters and/or numbers. Whether to remove them depends on the corpus: if you are working on a document containing email IDs, you might not want to remove the special characters and numbers.
Example: You want to see the dreams with close eyes and achieve them?
The removed words would be: to, the, and, ?
-> You want see dreams with close eyes achieve them
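The example above can be sketched in code. This is only an illustration: the stopword list is the one given in these notes, while the simple regular-expression tokeniser and the names `tokenise`, `remove_stopwords` and `keep_special` are assumptions made for this sketch.

```python
import re

# Stopword list taken from the examples in these notes
STOPWORDS = {"a", "an", "and", "are", "as", "for", "it", "is", "into", "in",
             "if", "on", "or", "such", "the", "there", "to"}

def tokenise(sentence):
    """Split into word tokens and single special characters (assumed, simple rule)."""
    return re.findall(r"\w+|[^\w\s]", sentence)

def remove_stopwords(tokens, keep_special=False):
    """Drop stopwords; also drop special characters unless keep_special is True."""
    kept = []
    for token in tokens:
        if token.lower() in STOPWORDS:
            continue
        if not keep_special and not token.isalnum():
            continue
        kept.append(token)
    return kept

sentence = "You want to see the dreams with close eyes and achieve them?"
tokens = tokenise(sentence)
filtered = remove_stopwords(tokens)
print(" ".join(filtered))
```

Setting `keep_special=True` keeps the special characters, which matters for a corpus such as the email IDs mentioned above.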
6 STEMMING
The stemmed word might not be meaningful.
7 LEMMATIZATION
In lemmatization, the word we get after affix removal (known as the lemma) is a meaningful one. Lemmatization takes longer to execute than stemming.
Stemming vs Lemmatization
Stemming: The stemmed words might not be meaningful. Caring ➔ Car
Lemmatization: The lemma word is a meaningful one. Caring ➔ Care
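The difference can be illustrated with a toy sketch. It is not how real stemmers or lemmatizers are implemented: the suffix list, the tiny `LEMMAS` dictionary and both function names are hypothetical and chosen only to reproduce the Caring example.

```python
# Assumed suffix list for a naive stemmer (illustrative only)
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Chop the first matching suffix; the result may not be a real word."""
    lower = word.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) > len(suffix) + 2:
            return lower[: -len(suffix)]
    return lower

# Hypothetical mini-dictionary mapping word forms to meaningful lemmas
LEMMAS = {"caring": "care", "studies": "study", "healed": "heal"}

def naive_lemmatize(word):
    """Look the word up in the dictionary; fall back to the lowercased word."""
    lower = word.lower()
    return LEMMAS.get(lower, lower)

for word in ["Caring", "Studies", "Healed"]:
    print(word, "-> stem:", naive_stem(word), "| lemma:", naive_lemmatize(word))
```

The stemmer only cuts characters, so it can return "car" or "studi", while the lemmatizer returns a word that exists in its dictionary.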
BAG OF WORDS ALGORITHM
Step 1: Collecting data and pre-processing it.
In this table, the header row contains the vocabulary of the corpus
and three rows correspond to three different documents.
Finally, this gives us the document vector table for our corpus.
But the tokens have still not been converted to numbers. This leads us to the final step of our algorithm: TFIDF.
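A minimal sketch of how a document vector table can be built. The three sample documents are an assumption made for illustration (the original table is not reproduced here), and the pre-processing is reduced to lowercasing and splitting on spaces.

```python
# Assumed sample corpus of three short documents
documents = [
    "Aman and Anil are stressed",
    "Aman went to a therapist",
    "Anil went to download a health chatbot",
]

# Simplified pre-processing: lowercase and split on whitespace
tokenised = [doc.lower().split() for doc in documents]

# Vocabulary: every unique word, in order of first appearance
vocabulary = []
for tokens in tokenised:
    for token in tokens:
        if token not in vocabulary:
            vocabulary.append(token)

# One row per document, one column per vocabulary word
document_vectors = [[tokens.count(word) for word in vocabulary]
                    for tokens in tokenised]

print(vocabulary)
for row in document_vectors:
    print(row)
```

The printed header list plays the role of the vocabulary row, and each following list is the vector of one document.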
TFIDF
TFIDF stands for Term Frequency and Inverse Document Frequency.
1 TERM FREQUENCY
1. Term frequency is the frequency of a word in one document.
2. Term frequency can easily be found in the document vector table.
Example:
Here, we can see that the frequency of each word in each document has been recorded in the table. These numbers are nothing but the Term Frequencies!
2 DOCUMENT FREQUENCY
Document Frequency is the number of documents in which the word
occurs irrespective of how many times it has occurred in those
documents.
3 INVERSE DOCUMENT FREQUENCY
In the case of inverse document frequency, we put the document frequency in the denominator and the total number of documents in the numerator.
Word  aman  and  anil  are  stressed  went  to   a    therapist  download  health  chatbot
IDF   3/2   3/1  3/2   3/1  3/1       3/2   3/2  3/2  3/1        3/1       3/1     3/1
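Document frequency and inverse document frequency can be computed with a short sketch like the one below. The three sample documents are an assumption, chosen because they produce the same vocabulary as the table above; the variable names are illustrative.

```python
# Assumed sample corpus (three documents)
documents = [
    "Aman and Anil are stressed",
    "Aman went to a therapist",
    "Anil went to download a health chatbot",
]
tokenised = [doc.lower().split() for doc in documents]

# Vocabulary in order of first appearance
vocabulary = list(dict.fromkeys(tok for tokens in tokenised for tok in tokens))
total_documents = len(documents)

# Document frequency: number of documents containing the word at least once
document_frequency = {
    word: sum(1 for tokens in tokenised if word in tokens)
    for word in vocabulary
}

# Inverse document frequency: total documents / document frequency
inverse_document_frequency = {
    word: total_documents / document_frequency[word]
    for word in vocabulary
}

for word in vocabulary:
    df = document_frequency[word]
    print(f"{word}: DF = {df}, IDF = {total_documents}/{df}")
```

A word that occurs in fewer documents gets a larger inverse document frequency.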
FORMULA OF TFIDF
[Table: TFIDF value of each word in each document, computed as TF * log(IDF); for example 1*log(3) = 0.477 and 1*log(3/2) = 0.176]
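A minimal sketch of the TFIDF calculation, assuming the value for a word is TF * log(IDF) with a base-10 logarithm (the values 0.477 and 0.176 correspond to log10(3) and log10(3/2)). The three sample documents and the helper name `tfidf` are assumptions for illustration.

```python
import math

# Assumed sample corpus (three documents)
documents = [
    "Aman and Anil are stressed",
    "Aman went to a therapist",
    "Anil went to download a health chatbot",
]
tokenised = [doc.lower().split() for doc in documents]
vocabulary = list(dict.fromkeys(tok for tokens in tokenised for tok in tokens))
total_documents = len(documents)

# Number of documents in which each word occurs
document_frequency = {
    word: sum(1 for tokens in tokenised if word in tokens)
    for word in vocabulary
}

def tfidf(word, tokens):
    """TF * log10(total documents / document frequency) for one word in one document."""
    term_frequency = tokens.count(word)
    return term_frequency * math.log10(total_documents / document_frequency[word])

# One row per document, one column per vocabulary word, rounded to 3 places
tfidf_table = [[round(tfidf(word, tokens), 3) for word in vocabulary]
               for tokens in tokenised]

print(vocabulary)
for row in tfidf_table:
    print(row)
```

Words that appear in only one document get the higher value (0.477), words shared by two documents get the lower value (0.176), and a word absent from a document gets 0.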
APPLICATIONS OF TFIDF