
COURSE CODE: CSC 528

PART 2

COURSE TITLE:
INTRODUCTION TO ARTIFICIAL INTELLIGENCE
TOPIC: INTRODUCTION TO NATURAL LANGUAGE PROCESSING (NLP)
What Is NLP?

• Humans communicate with each other using words and text.
• The way that humans convey information to each other is called Natural Language.
• Every day, humans share a large quantity of information with each other in various languages as speech or text.
• However, computers cannot interpret this data, which is in natural language, as they communicate in 1s and 0s.
• The data produced is precious and can offer valuable insights.
• Hence, you need computers to be able to understand, emulate and respond intelligently to human speech.
• Natural Language Processing (NLP) refers to the branch of Artificial Intelligence that gives machines the ability to read, understand and derive meaning from human languages.
• NLP combines the fields of linguistics and computer science to decipher language structure and rules, and to build models that can comprehend, break down and extract significant details from text and speech.
How does natural language processing work?

• NLP enables computers to understand natural language as humans do.
• Whether the language is spoken or written, NLP uses AI to
take real-world input, process it, and make sense of it in a way
a computer can understand.
• Just as humans have different sensors -- such as ears to hear
and eyes to see -- computers have programs to read and
microphones to collect audio.
• And just as humans have a brain to process that input,
computers have a program to process their respective inputs.
• At some point in processing, the input is converted to code
that the computer can understand.
• There are two main phases to NLP:
i. Data preprocessing
ii. Algorithm development.
The steps to perform preprocessing of data
in NLP include:
• Segmentation:
• You first need to break the entire document down into its constituent sentences. You can do this by segmenting the article at punctuation marks like full stops.
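A minimal sketch of sentence segmentation using the NLTK library (assuming the nltk package and its punkt tokenizer data are installed; the sample text is made up for illustration):

import nltk

nltk.download("punkt")  # sentence tokenizer data (one-time download)

document = "NLP is fun. It lets computers read text. It also handles speech."
sentences = nltk.sent_tokenize(document)  # split the document into sentences
print(sentences)
# ['NLP is fun.', 'It lets computers read text.', 'It also handles speech.']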

Tokenizing:

• For the algorithm to understand these sentences, you need to get the words in a sentence and explain them individually to the algorithm. So, you break down each sentence into its constituent words and store them. This is called tokenizing, and each word is called a token.
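A minimal word-tokenization sketch, again assuming NLTK and its punkt tokenizer data:

import nltk

sentence = "NLP lets computers read text."
tokens = nltk.word_tokenize(sentence)  # split the sentence into word tokens
print(tokens)
# ['NLP', 'lets', 'computers', 'read', 'text', '.']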
Removing Stop Words:

• You can make the learning process faster by getting rid of non-essential words, which add little meaning to a statement and are just there to make it sound more cohesive. Words such as "was", "in", "is", "and" and "the" are called stop words and can be removed.
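A minimal stop-word removal sketch using NLTK's English stop-word list (the token list is made up for illustration):

import nltk

nltk.download("stopwords")  # stop-word lists (one-time download)
from nltk.corpus import stopwords

tokens = ["overall", "I", "liked", "the", "movie", "and", "the", "story"]
stop_words = set(stopwords.words("english"))
filtered = [t for t in tokens if t.lower() not in stop_words]
print(filtered)  # ['overall', 'liked', 'movie', 'story']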
Stemming:

• Stemming is the process of obtaining the word stem of a word.
• A word stem gives rise to new words when affixes are added to it; stemming strips those affixes off (for example, "playing", "played" and "plays" all reduce to the stem "play").
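A minimal stemming sketch using NLTK's Porter stemmer:

from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["playing", "played", "plays", "studies"]
print([stemmer.stem(w) for w in words])
# ['play', 'play', 'play', 'studi']  -- note that stems need not be dictionary words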
Lemmatization:

• The process of obtaining the root stem (lemma) of a word.
• The root stem is the base form of a word that is present in the dictionary and from which the word is derived.
• You can also identify the base word for different inflected forms based on tense, mood, gender, etc.
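A minimal lemmatization sketch using NLTK's WordNet lemmatizer (assuming the WordNet data is downloaded; some NLTK versions also need the omw-1.4 resource):

import nltk

nltk.download("wordnet")   # lexical database used by the lemmatizer
nltk.download("omw-1.4")   # extra WordNet data needed by some NLTK versions
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("studies"))          # 'study'
print(lemmatizer.lemmatize("better", pos="a"))  # 'good'
print(lemmatizer.lemmatize("ran", pos="v"))     # 'run'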
Part of Speech Tagging
• Now, you must explain the concept of nouns,
verbs, articles, and other parts of speech to
the machine by adding these tags to our
words.
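A minimal part-of-speech tagging sketch with NLTK (assuming the tagger model has been downloaded; tag names follow the Penn Treebank convention):

import nltk

nltk.download("averaged_perceptron_tagger")  # POS tagging model (one-time download)

tokens = nltk.word_tokenize("Overall I liked the movie")
print(nltk.pos_tag(tokens))
# roughly: [('Overall', 'RB'), ('I', 'PRP'), ('liked', 'VBD'), ('the', 'DT'), ('movie', 'NN')]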
Named Entity Tagging:
• Next, introduce your machine to pop culture references and everyday names by flagging names of movies, important personalities, locations, etc. that may occur in the document.
• You do this by classifying the words into subcategories. This helps you find any keywords in a sentence.
• The subcategories include person, location, monetary value, quantity, organization and movie.
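A minimal named-entity tagging sketch, here using the spaCy library as one possible tool (assuming its small English model en_core_web_sm is installed; the sentence is made up for illustration):

import spacy

nlp = spacy.load("en_core_web_sm")  # small English pipeline that includes an NER component
doc = nlp("Christopher Nolan shot parts of Inception in Paris for Warner Bros.")
for ent in doc.ents:
    print(ent.text, ent.label_)
# roughly: 'Christopher Nolan' PERSON, 'Paris' GPE, 'Warner Bros.' ORG (exact labels depend on the model)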

• Note:
• TF-IDF is a statistical NLP technique for evaluating how important a word is to a particular document within a large collection. It involves the multiplication of two distinct values:
• Term frequency: The term frequency value gives you the number of times a word appears in a particular document. Stop words generally get a high term frequency in a document.
• Inverse document frequency: Inverse document frequency, on the other hand, highlights the terms that are highly specific to a document, i.e. words that occur in few documents across the whole corpus.
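A minimal TF-IDF sketch using scikit-learn's TfidfVectorizer (the three short documents are made up for illustration):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "overall liked the movie",
    "boring movie with a sad ending",
    "nice story and nice songs",
]
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)   # rows = documents, columns = vocabulary terms
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))          # high values = frequent in a document but rare in the corpus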

• After performing the preprocessing steps, you then give your resultant data to a machine
learning algorithm like Naive Bayes, etc., to create your NLP application.
Search and learning
• Many natural language processing problems can be written mathematically in the form of an optimization problem, as sketched below.
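A hedged reconstruction of the optimization formulation the slide refers to (the symbols Ψ, θ and Y(x) follow common textbook notation and are assumptions here):

\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}(x)} \Psi(x, y; \theta)

where x is the input, Y(x) is the set of candidate outputs for x, Ψ is a scoring function, and θ is a vector of parameters.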
• This basic structure can be applied to a huge range of problems.
• For example, the input x might be a social media post, and the
output y might be a labeling of the emotional sentiment
expressed by the author,
• or x could be a sentence in French, and the output y could be a
sentence in Tamil.
• or x might be a sentence in English, and y might be a
representation of the syntactic structure of the sentence .
• or x might be a news article and y might be a structured record
of the events that the article describes.
• This formulation reflects an implicit decision that language processing algorithms will have two distinct modules: search and learning.
The search module

• The search module is responsible for computing the argmax of the scoring function.
• In other words, it finds the output that gets the best score with respect to the input x.
• This is easy when the search space Y(x) is small enough to enumerate, or when the scoring function has a convenient decomposition into parts.
• In many cases, we will want to work with scoring functions that do not have these properties, motivating the use of more sophisticated search algorithms, such as bottom-up dynamic programming and beam search.
• Because the outputs are usually discrete in language processing problems, search often relies on the machinery of combinatorial optimization.
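A minimal sketch of the easy case, where the search space is small enough to enumerate: the search module is just an argmax over candidate outputs (the scoring function here is a made-up toy, not a real model):

def search(x, candidates, score):
    """Brute-force search: return the candidate output with the highest score for input x."""
    return max(candidates, key=lambda y: score(x, y))

# hypothetical usage with a toy scoring function
labels = ["POSITIVE", "NEGATIVE"]
score = lambda x, y: x.count("liked") if y == "POSITIVE" else x.count("boring")
print(search("overall liked the movie", labels, score))  # POSITIVE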
The learning module
• The learning module is responsible for finding the parameters θ of the scoring function.
• This is typically (but not always) done by processing a large dataset of labeled examples {(x^(i), y^(i))}, i = 1, ..., N, where:
• x^(i) is a column vector of feature counts for instance i, often word counts
• y^(i) is a structured label for instance i, such as a tag sequence
• Like search, learning is also approached
through the framework of optimization.
• Because the parameters are usually
continuous, learning algorithms generally rely
on numerical optimization to identify vectors
of real-valued parameters that optimize some
function of the model and the labeled data.
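As a hedged sketch in the same notation, learning can itself be written as an optimization problem (ℓ is an assumed per-example loss measuring how badly the model with parameters θ fits the labeled pair):

\hat{\theta} = \operatorname*{argmin}_{\theta} \sum_{i=1}^{N} \ell\big(x^{(i)}, y^{(i)}; \theta\big)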
Linear text classification
• We begin with the problem of text classification: given a text document, assign it a discrete label y ∈ Y, where Y is the set of possible labels.
• Text classification has many applications, from
spam filtering to the analysis of electronic
health records.
The bag of words
• In the bag-of-words representation, a document is reduced to a column vector of word counts x.
• The feature vector for a (document, label) pair is mostly zeros, with the column vector of word counts x inserted in a location that depends on the specific label y.
• But it is usually not easy to set classification weights
by hand, due to the large number of words and the
difficulty of selecting exact numerical weights.
Instead, we will learn the weights from data.
• Email users manually label messages as SPAM;
newspapers label their own articles as BUSINESS or
STYLE.
• Using such instance labels, we can automatically
acquire weights using supervised machine learning.
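A minimal bag-of-words sketch using scikit-learn's CountVectorizer (the texts and SPAM/HAM labels are made up for illustration); the resulting count vectors are exactly what a supervised learner would consume:

from sklearn.feature_extraction.text import CountVectorizer

texts = ["win money now", "meeting agenda attached", "money money win"]
labels = ["SPAM", "HAM", "SPAM"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)        # each row is a vector of word counts
print(vectorizer.get_feature_names_out())  # ['agenda' 'attached' 'meeting' 'money' 'now' 'win']
print(X.toarray())                         # e.g. first row -> [0 0 0 1 1 1]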
Machine learning approach for
classification
• Naive Bayes:
• The Naive Bayes classifier comes in three different variants: Gaussian Naive Bayes, Multinomial Naive Bayes and Bernoulli Naive Bayes.
• It is quite involved to understand all three algorithms deeply.
• However, you only need to understand how Multinomial Naive Bayes fits text classification.
• Multinomial Naive Bayes is typically used with a multinomial event model such as bag-of-words, which represents a document as a vector of word counts.
Applying Multinomial Naive Bayes to NLP Problems

• Multinomial Naive Bayes (MNB) is a popular machine learning algorithm for text classification problems in Natural Language Processing (NLP). It is particularly useful for problems that involve text data with discrete features such as word frequency counts. MNB works on the principle of Bayes' theorem and assumes that the features are conditionally independent given the class variable.
Steps for applying Multinomial Naive Bayes to NLP problems
• Preprocessing the text data: The text data needs to
be preprocessed before applying the algorithm. This
involves steps such as tokenization, stop-word
removal, stemming, and lemmatization.
• Feature extraction: The text data needs to be
converted into a feature vector format that can be
used as input to the MNB algorithm. The most
common method of feature extraction is to use a
bag-of-words model, where each document is
represented by a vector of word frequency counts.
• Splitting the data: The data needs to be split into
training and testing sets. The training set is used to
train the MNB model, while the testing set is used
to evaluate its performance.
• Training the MNB model: The MNB model is
trained on the training set by estimating the
probabilities of each feature given each class. This
involves calculating the prior probabilities of each
class and the likelihood of each feature given each
class.
Evaluating the performance of the model:

• The performance of the model is evaluated using metrics such as accuracy, precision, recall, and F1-score on the testing set.
• MNB has some limitations, such as the assumption of independence between features, which may not hold true in some cases. Therefore, it is important to carefully evaluate the performance of the model before using it in a real-world application (a short code sketch of the full workflow is given below).
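A minimal end-to-end sketch of the steps above using scikit-learn (the tiny labeled dataset is made up for illustration; a real application would use far more examples):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, classification_report

# made-up labeled reviews
texts = [
    "overall liked the movie", "nice story and nice songs",
    "boring movie with a sad ending", "liked the acting and the story",
    "sad and boring", "good movie overall",
]
labels = ["positive", "positive", "negative", "positive", "negative", "positive"]

# Feature extraction: bag-of-words counts (stop words removed as preprocessing)
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Splitting the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.33, random_state=0)

# Training the MNB model
model = MultinomialNB()
model.fit(X_train, y_train)

# Evaluating the performance on the testing set
y_pred = model.predict(X_test)
print(accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))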
• Bayes theorem calculates probability P(c|x)
where c is the class of the possible outcomes
and x is the given instance which has to be
classified, representing some certain features.
P(c|x) = P(x|c) * P(c) / P(x)
Naive Bayes predicts the tag of a text.
• It calculates the probability of each tag for a given text and then outputs the tag with the highest probability.
How Does the Naive Bayes Algorithm Work?
• Let's consider an example: classifying whether a review is positive or negative.
Training Dataset:
• We classify whether the text "overall liked the movie" has a positive review or a negative review. We have to calculate:
P(positive | overall liked the movie) — the probability that the tag of a sentence is positive given that the sentence is "overall liked the movie".
P(negative | overall liked the movie) — the probability that the tag of a sentence is negative given that the sentence is "overall liked the movie".
Before that, we first apply Removing Stopwords and Stemming to the text.
Removing Stopwords: These are common words that don't really add anything to the classification, such as a, able, either, else, ever and so on.
Stemming: Stemming takes out the root of the word.
Now, after applying these two techniques, our text becomes:
• Feature Engineering:
The important part is to find the features from the data to make machine learning algorithms work.
In this case, we have text. We need to convert this text into numbers that we can do calculations on.
We use word frequencies. That is, treating every document as a set of the words it contains. Our features will be the counts of each of these words.
In our case, we have P(positive | overall liked the movie); by Bayes' theorem:
• P(positive | overall liked the movie) = P(overall liked the movie | positive) * P(positive) / P(overall liked the movie)

• Since for our classifier we have to find out which tag has the bigger probability, we can discard the divisor, which is the same for both tags, and compare
P(overall liked the movie | positive) * P(positive) with P(overall liked the movie | negative) * P(negative).
There's a problem though: "overall liked the movie" doesn't appear in our training dataset, so that probability is zero.
• Here, we assume the ‘naive’ condition that every word in a
sentence is independent of the other ones. This means that now
we look at individual words.
• We can write this as:
P(overall liked the movie) = P(overall) * P(liked) * P(the) * P(movie)
• The next step is just applying Bayes' theorem:

• P(overall liked the movie | positive) = P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive)
• And now, these individual words actually show up several times in our training data, and we can calculate them!
Calculating probabilities:
First, we calculate the a priori probability of each tag: for a given sentence in our training data, the probability that it is positive, P(positive), is 3/5. Then, P(negative) is 2/5.
Then, calculating P(overall | positive) means counting how many times the word "overall" appears in positive texts (1) divided by the total number of words in positive texts (17). Therefore, P(overall | positive) = 1/17, P(liked | positive) = 1/17, P(the | positive) = 2/17, P(movie | positive) = 3/17.
• If a probability comes out to be zero, then we use Laplace smoothing: we add 1 to every count so it is never zero. To balance this, we add the number of possible words to the divisor, so the result will never be greater than 1. In our case, the total count of possible words is 21.
Applying smoothing, the results are:
• Now we just multiply all the probabilities, and see which is bigger:

• P(overall | positive) * P(liked | positive) * P(the | positive) * P(movie | positive) * P(positive) = 1.38 * 10^{-5} = 0.0000138
P(overall | negative) * P(liked | negative) * P(the | negative) * P(movie | negative) * P(negative) = 0.13 * 10^{-5} = 0.0000013
• Our classifier gives "overall liked the movie" the positive tag.
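A small sketch of the same calculation in Python. The per-class word counts below are assumptions: the original training table is not reproduced here, so they are filled in to match the figures quoted above (17 words in the positive texts, 7 in the negative texts, a 21-word vocabulary):

# Assumed per-class statistics, chosen to match the figures in the worked example above.
counts_pos = {"overall": 1, "liked": 1, "the": 2, "movie": 3}   # word counts in positive texts
counts_neg = {"overall": 0, "liked": 0, "the": 0, "movie": 1}   # word counts in negative texts
total_pos, total_neg = 17, 7      # total words in positive / negative texts
vocab_size = 21                   # total number of possible words
prior_pos, prior_neg = 3 / 5, 2 / 5

def score(words, counts, total, prior):
    """Laplace-smoothed Naive Bayes score: prior * product of P(word | class)."""
    p = prior
    for w in words:
        p *= (counts.get(w, 0) + 1) / (total + vocab_size)
    return p

words = ["overall", "liked", "the", "movie"]
print(score(words, counts_pos, total_pos, prior_pos))  # ~1.38e-05 -> positive wins
print(score(words, counts_neg, total_neg, prior_neg))  # ~1.3e-06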
