CSC 528 Lecture 3
PART 2
COURSE TITLE:
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
TOPIC: INTRODUCTION TO NATURAL
LANGUAGE PROCESSING
(NLP)
What Is NLP?
Segmentation
Tokenizing:
• You can make the learning process faster by getting rid of non-essential words, which add little meaning to a statement and are only there to make it sound more cohesive. Words such as was, in, is, and, the are called stop words and can be removed (see the sketch below).
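A minimal sketch of tokenizing a sentence and removing stop words, using NLTK (the example sentence is made up for illustration, and the choice of library is an assumption; the slides do not prescribe a specific tool):

```python
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads of the tokenizer model and the stop-word list.
nltk.download("punkt")
nltk.download("stopwords")

sentence = "Overall I liked the movie and the acting was great"  # hypothetical example
stop_words = set(stopwords.words("english"))

# Tokenize, then drop stop words such as "the", "and", "was".
tokens = word_tokenize(sentence.lower())
filtered = [w for w in tokens if w not in stop_words]
print(filtered)
```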
Stemming:
• Note: TF-IDF is a statistical NLP technique for evaluating how important a word is to a particular document within a massive collection. It works by multiplying two distinct values (a short sketch follows the two bullets below):
• Term frequency: the term frequency value gives you the total number of times a word appears in a particular document. Stop words generally get a high term frequency in a document.
• Inverse document frequency: inverse document frequency, on the other hand, highlights terms that are highly specific to a document, i.e., words that occur rarely across the whole corpus of documents.
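A minimal sketch of the TF-IDF weighting described above, using scikit-learn's TfidfVectorizer (the three-document mini-corpus is made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus of three documents.
docs = [
    "the movie was great and the acting was great",
    "the movie was boring",
    "a great documentary about acting",
]

vectorizer = TfidfVectorizer()          # term frequency * inverse document frequency
tfidf = vectorizer.fit_transform(docs)  # rows = documents, columns = vocabulary terms

# Print the weight of each vocabulary term in the first document.
# Terms that appear in many documents receive a lower inverse-document-frequency
# factor than terms that are specific to a single document.
for term, idx in sorted(vectorizer.vocabulary_.items()):
    print(f"{term:12s} {tfidf[0, idx]:.3f}")
```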
• After performing these preprocessing steps, you then feed the resulting features to a machine learning algorithm such as Naive Bayes to create your NLP application, as sketched below.
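A minimal sketch of that full pipeline, feeding TF-IDF features to a Naive Bayes classifier with scikit-learn (the labelled sentences are made up for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical labelled training sentences.
texts = [
    "overall liked the movie",
    "great acting and a great story",
    "the movie was boring",
    "terrible story and bad acting overall",
]
labels = ["positive", "positive", "negative", "negative"]

# Preprocess with TF-IDF (including stop-word removal), then train Naive Bayes.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["really liked the acting"]))
```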
Search and learning
• Many natural language processing problems can be written mathematically in the form of an optimization problem (a sketch of this formulation appears after this list).
• This basic structure can be applied to a huge range of problems.
• For example, the input x might be a social media post, and the
output y might be a labeling of the emotional sentiment
expressed by the author,
• or x could be a sentence in French, and the output y could be a
sentence in Tamil.
• or x might be a sentence in English, and y might be a representation of the syntactic structure of the sentence.
• or x might be a news article and y might be a structured record
of the events that the article describes.
• This formulation reflects an implicit decision that language processing algorithms will have two distinct modules: a search module and a learning module.
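As a sketch of the optimization view described in this list (the scoring function Ψ, parameter vector θ, and candidate-output space 𝒴(x) are conventional textbook notation, not symbols taken from the slides), the predicted output is the highest-scoring candidate:

```latex
\hat{y} = \underset{y \in \mathcal{Y}(x)}{\arg\max}\; \Psi(x, y; \theta)
```

Under this reading, the search module computes the arg max over candidate outputs y, and the learning module estimates the parameters θ from data.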
The search module
• Since for our classifier we only need to find out which tag has the higher probability, we can discard the divisor, which is the same for both tags, and simply compare:
P(overall liked the movie | positive) * P(positive) with
P(overall liked the movie | negative) * P(negative)
• There is a problem, though: "overall liked the movie" doesn't appear in our training dataset, so its estimated probability is zero.
• Here, we assume the 'naive' condition that every word in a sentence is independent of the others. This means that we now look at individual words rather than the whole sentence.
• We can write this as:
P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie)
• The next step is just applying this factorization to the class-conditional probabilities from Bayes' theorem above:
P(overall liked the movie | positive) = P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive), and likewise for the negative tag.
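A minimal sketch of this word-by-word calculation (the toy training set is made up, and the add-one smoothing used to sidestep the zero-probability problem mentioned above is a common remedy, not something the slides prescribe):

```python
from collections import Counter

# Hypothetical toy training data: (sentence, tag) pairs.
train = [
    ("overall a great movie", "positive"),
    ("really liked the acting", "positive"),
    ("the movie was boring", "negative"),
    ("did not like it overall", "negative"),
]

vocab = {w for sent, _ in train for w in sent.split()}

def score(sentence, tag):
    """P(sentence | tag) * P(tag) under the naive independence assumption.

    Add-one smoothing keeps unseen words from zeroing out the product.
    """
    tag_sents = [sent for sent, t in train if t == tag]
    words = [w for sent in tag_sents for w in sent.split()]
    counts, total = Counter(words), len(words)
    prob = len(tag_sents) / len(train)            # the prior P(tag)
    for w in sentence.split():
        prob *= (counts[w] + 1) / (total + len(vocab))
    return prob

sentence = "overall liked the movie"
prediction = max(["positive", "negative"], key=lambda tag: score(sentence, tag))
print(prediction)
```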