
N.G. ACHARYA & D.K. MARATHE COLLEGE OF
ARTS, SCIENCE & COMMERCE
(Affiliated to University of Mumbai)

PRACTICAL JOURNAL

PSCSP514
Natural Language Processing
SUBMITTED BY

VINAY VIJAY GUPTA


SEAT NO :

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS


FOR QUALIFYING M.Sc. (CS) PART-I (SEMESTER – II) EXAMINATION.

2023-2024

DEPARTMENT OF COMPUTER SCIENCE


SHREE N.G. ACHARYA MARG, CHEMBUR

MUMBAI-400 071
N.G. ACHARYA & D.K. MARATHE COLLEGE
OF ARTS, SCIENCE & COMMERCE
(Affiliated to University of Mumbai)

CERTIFICATE

This is to certify that Mr. Vinay Vijay Gupta Seat No. studying
in Master of Science in Computer Science Part I Semester II has
satisfactorily completed the Practical of PSCSP514 Natural Language
Processing as prescribed by University of Mumbai, during the academic
year 2023-24.

Signature Signature Signature

Internal Guide External Examiner Head Of Department

College Seal Date


Sr.No  Title                                                               Signature

1  Write a program to implement sentence segmentation and word tokenization
2  WAP to implement stemming and lemmatization
3  Implement a tri-gram model: a. unigram, b. bigram, c. trigram
4  Implement PoS tagging using HMM & neural model
5  Write a program to implement syntactic parsing of a given text
6  Write a program to implement dependency parsing of a given text
7  Write a program to implement Named Entity Recognition (NER)
8  Write a program to implement text summarization for the given sample text
Practical 1

Aim: WAP to implement sentence segmentation and word tokenization.

What is Tokenization?

Tokenization is the process of breaking up a piece of text into sentences or
words. When we break down textual data into sentences or words, the output we
get is known as tokens. There are two strategies for tokenizing a textual
dataset: sentence tokenization and word tokenization.

Program:

a. Tokenization:

import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize

sentence = ("Hi, My name is Aman, I hope you like my work. "
            "You can follow me on Instagram for more resources. "
            "My username is 'the.clever.programmer'.")
print(sent_tokenize(sentence))

from nltk.tokenize import TreebankWordTokenizer
word_token = TreebankWordTokenizer()
print(word_token.tokenize(sentence))

Output:
b. Segmentation:
Sentence segmentation is the process of deciding where sentences actually
start and end, i.e. dividing a paragraph into its sentences.

spaCy pipeline names encode the type of text the pipeline is trained on,
e.g. web or news. For example, en_core_web_sm is a small English pipeline
trained on written web text (blogs, news, comments) that includes
vocabulary, vectors, syntax and entities.

# import the spacy library
import spacy

# load the core English pipeline
nlp = spacy.load("en_core_web_sm")

# take a unicode string (the u prefix stands for unicode)
doc = nlp(u"I Love Coding. Geeks for Geeks helped me in this regard "
          u"very much. I Love Geeks for Geeks.")

# print the sentences
for sent in doc.sents:
    print(sent)

Output:
Practical 2

Aim: WAP to implement stemming and lemmatization.

The languages we speak and write are made up of many words, often derived
from one another. A language whose words change form as their use in speech
changes is called an inflected language.

Porter Stemmer: this is the Porter stemming algorithm. It follows the
algorithm presented in Porter, M., "An algorithm for suffix stripping."

CONNECTIONS ----- > CONNECT
CONNECTED ----- > CONNECT
CONNECTING ----- > CONNECT
CONNECTION ----- > CONNECT

Lancaster Stemmer is the most aggressive stemming algorithm. It has an edge
over other stemming techniques because the NLTK package lets us add our own
custom rules to the algorithm. This can sometimes produce abrupt results.

Program:
from nltk.stem import PorterStemmer
from nltk.stem import LancasterStemmer
porter = PorterStemmer()
lancaster=LancasterStemmer()
# provide a word to be stemmed
print("Porter Stemmer")
print(porter.stem("cats"))
print(porter.stem("trouble"))
print(porter.stem("troubling"))
print(porter.stem("troubled"))
print("Lancaster Stemmer")
print(lancaster.stem("cats"))
print(lancaster.stem("trouble"))
print(lancaster.stem("troubling"))
print(lancaster.stem("troubled"))

Output:
What is Stemming?

Stemming is a method of normalizing words in Natural Language Processing:
the words in a sentence are reduced to their root form (stem), which
shortens their lookup.

from nltk.stem import PorterStemmer

e_words = ["wait", "waiting", "waited", "waits"]
ps = PorterStemmer()
for w in e_words:
    rootWord = ps.stem(w)
    print(rootWord)

Output:

The below program uses the Porter Stemming Algorithm for stemming.

import nltk
from nltk.stem.porter import PorterStemmer
porter_stemmer = PorterStemmer()

word_data = ("It originated from the idea that there are readers who prefer "
             "learning new skills from the comforts of their drawing rooms")
# First, word tokenization
nltk_tokens = nltk.word_tokenize(word_data)
# Next, find the root of each word
for w in nltk_tokens:
    print("Actual: %s  Stem: %s" % (w, porter_stemmer.stem(w)))

Output:
Lemmatization:

Lemmatization is the grouping together of different inflected forms of the
same word. For example, if a paragraph has words like cars, car and car's,
lemmatization will link all of them to the lemma car. The program below uses
the WordNet lexical database for lemmatization.

import nltk
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

a = WordNetLemmatizer()

print("rocks :", a.lemmatize("rocks"))
print("corpora :", a.lemmatize("corpora"))
print("oranges :", a.lemmatize("oranges"))
# pos="a" tells the lemmatizer that the word is an adjective
print("better :", a.lemmatize("better", pos="a"))
Practical 3

Aim: Implement a tri-gram model (N-Gram).

Unigram

from nltk.util import ngrams

n = 1
sentence = ('You will face many defeats in life, but never let yourself be '
            'defeated.')

unigrams = ngrams(sentence.split(), n)
for item in unigrams:
    print(item)

Output:

bi-gram

from nltk.util import ngrams

n = 2
sentence = 'The purpose of our life is to be happy'

bigrams = ngrams(sentence.split(), n)
for item in bigrams:
    print(item)

Output:

tri-gram

from nltk.util import ngrams

n = 3
sentence = 'Whoever is happy will make others happy too'

trigrams = ngrams(sentence.split(), n)
for item in trigrams:
    print(item)

Output:
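The snippets above only enumerate the n-grams of a sentence. A tri-gram model additionally estimates conditional probabilities from counts. Below is a minimal plain-Python sketch (the helper name trigram_model is our own, not an NLTK API) using maximum-likelihood estimates, P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2):

```python
from collections import Counter

def trigram_model(text):
    """Estimate P(w3 | w1, w2) by maximum likelihood from one text."""
    tokens = text.split()
    trigram_counts = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    # P(w3 | w1, w2) = count(w1, w2, w3) / count(w1, w2)
    return {t: c / bigram_counts[t[:2]] for t, c in trigram_counts.items()}

probs = trigram_model('Whoever is happy will make others happy too')
for (w1, w2, w3), p in probs.items():
    print('P(%s | %s %s) = %.2f' % (w3, w1, w2, p))
```

Under MLE, unseen trigrams get zero probability; a real model would add smoothing (e.g. Laplace) or backoff to lower-order n-grams.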
Practical 4

Aim: Implement PoS tagging using HMM & neural model.

Program:

import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize

stop_words = set(stopwords.words('english'))

txt = "Sukanya, Rajib and Naba are my good friends. " \
      "Sukanya is getting married next year. " \
      "Marriage is a big step in one’s life. " \
      "It is both exciting and frightening. " \
      "But friendship is a sacred bond between people. " \
      "It is a special kind of love between us. " \
      "Many of you must have tried searching for a friend " \
      "but never found the right one."

# sent_tokenize is an instance of PunktSentenceTokenizer
# from the nltk.tokenize.punkt module
tokenized = sent_tokenize(txt)

for i in tokenized:
    # word_tokenize finds the words and punctuation in a string
    wordsList = nltk.word_tokenize(i)
    # remove stop words from wordsList
    wordsList = [w for w in wordsList if w not in stop_words]
    # apply a part-of-speech (POS) tagger
    tagged = nltk.pos_tag(wordsList)
    print(tagged)

Output:

or

import spacy

# Load the English tokenizer, tagger, parser, NER and word vectors
nlp = spacy.load("en_core_web_sm")

# Process a whole document
text = ("""My name is Shaurya Uppal.
I enjoy writing articles on GeeksforGeeks checkout
my other article by going to my profile section.""")

doc = nlp(text)

# Token and tag
for token in doc:
    print(token, token.pos_)

# List the verb tokens
print("Verbs:", [token.text for token in doc if token.pos_ == "VERB"])

Output:
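Note that nltk.pos_tag uses an averaged perceptron tagger and spaCy uses a neural pipeline, so neither snippet above is literally an HMM. The HMM part can be sketched by hand: estimate transition and emission probabilities from a tiny hand-made tagged corpus (the toy data below is ours, for illustration only) and decode with the Viterbi algorithm:

```python
from collections import defaultdict

# Toy tagged corpus (hypothetical data, far too small for real use)
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("a", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("runs", "VERB")],
]

# Count initial tags, tag-to-tag transitions and tag-to-word emissions
start = defaultdict(int)
trans = defaultdict(lambda: defaultdict(int))
emit = defaultdict(lambda: defaultdict(int))
for sent in corpus:
    start[sent[0][1]] += 1
    for (_, t1), (_, t2) in zip(sent, sent[1:]):
        trans[t1][t2] += 1
    for w, t in sent:
        emit[t][w] += 1

def prob(counts, key):
    total = sum(counts.values())
    return counts[key] / total if total else 0.0  # no smoothing: unseen = 0

def viterbi(words):
    tags = list(emit)
    # best[t] = (probability, path) of the best tag sequence ending in tag t
    best = {t: (prob(start, t) * prob(emit[t], words[0]), [t]) for t in tags}
    for w in words[1:]:
        best = {
            t: max(
                ((p * prob(trans[prev], t) * prob(emit[t], w), path + [t])
                 for prev, (p, path) in best.items()),
                key=lambda x: x[0],
            )
            for t in tags
        }
    return max(best.values(), key=lambda x: x[0])[1]

print(viterbi(["the", "dog", "sleeps"]))  # → ['DET', 'NOUN', 'VERB']
```

A practical HMM tagger would train on a large tagged corpus (e.g. the Penn Treebank via NLTK) and add smoothing for unseen words.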
Practical 5

Aim: Write a program to implement syntactic parsing of a given text.

(Reference: spaCy Trained Models & Pipelines documentation.)

A compound is made up of various parts of speech, such as a noun, verb,
and adverb. This means that compounds can be a combination of noun plus
noun, verb plus noun, adjective plus noun, etc. The word "bedroom" is made
up of two nouns, bed and room.

spaCy is a library for advanced Natural Language Processing in Python and
Cython. In the pipeline name en_core_web_sm, "en" stands for English and
"sm" for small.

import spacy

# Loading the model
nlp = spacy.load('en_core_web_sm')

text = ("Reliance Retail acquires majority stake in designer brand "
        "Abraham & Thakore.")

# Creating a Doc object
doc = nlp(text)
print(doc)

# Getting dependency tags
for token in doc:
    print(token.text, '=>', token.dep_)

# Importing the visualizer
from spacy import displacy

# Visualizing the dependency tree
displacy.render(doc, jupyter=True)

or

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")
displacy.serve(doc, style="dep")
Practical 6

Aim: Write a program to implement dependency parsing of a given text.

A dependency structure shows which word or phrase depends on which other
words or phrases. We use dependency-based parsing to analyze and infer both
structural and semantic dependencies and relationships between tokens in a
sentence.

import spacy
nlp = spacy.load('en_core_web_sm')

text = 'It took me more than two hours to translate a few pages of English.'

for token in nlp(text):
    print(token.text, '=>', token.pos_, '=>', token.tag_)

doc = nlp(text)
print(doc)

from spacy import displacy
displacy.render(doc, style='dep')

or

import spacy
from spacy import displacy

py_text = "spacy dependency parser in python."
py_nlp = spacy.load("en_core_web_sm")
py_doc = py_nlp(py_text)
displacy.render(py_doc, style='dep')
Practical 7
Write a program to Implement Named Entity Recognition
(NER).

import spacy

nlp = spacy.load('en_core_web_sm')
sentence = "Apple is looking at buying U.K. startup for $1 billion"
doc = nlp(sentence)

for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)


# imports; load the spaCy English language package
import spacy
from spacy import displacy

nlp = spacy.load('en_core_web_sm')

# Load the text and process it
# (the text is copied from the Python wiki)
text = ("Python is an interpreted, high-level and general-purpose "
        "programming language. "
        "Python's design philosophy emphasizes code readability with "
        "its notable use of significant indentation. "
        "Its language constructs and object-oriented approach aim to "
        "help programmers write clear and "
        "logical code for small and large-scale projects.")

# text2 = # copy the paragraphs from https://www.python.org/doc/essays/

doc = nlp(text)
# doc2 = nlp(text2)

sentences = list(doc.sents)
print(sentences)

# tokenization
for token in doc:
    print(token.text)

# print the entities
ents = [(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]
print(ents)

# now use the displacy function on doc
displacy.render(doc, style='ent', jupyter=True)


Practical 8
Write a program to Implement Text Summarization for the given
sample text.

(Reference: "Text Summarization Approaches for NLP - Practical Guide with
Generative Examples", Machine Learning Plus.)

original_text = """Junk foods taste good that’s why it is mostly liked by
everyone of any age group especially kids and school going children. They
generally ask for the junk food daily because they have been trend so by
their parents from the childhood. They never have been discussed by their
parents about the harmful effects of junk foods over health. According to
the research by scientists, it has been found that junk foods have negative
effects on the health in many ways. They are generally fried food found in
the market in the packets. They become high in calories, high in
cholesterol, low in healthy nutrients, high in sodium mineral, high in
sugar, starch, unhealthy fat, lack of protein and lack of dietary fibers.
Processed and junk foods are the means of rapid and unhealthy weight
gain and negatively impact the whole body throughout the life. It makes
able a person to gain excessive weight which is called as obesity. Junk
foods tastes good and looks good however do not fulfil the healthy calorie
requirement of the body. Some of the foods like french fries, fried foods,
pizza, burgers, candy, soft drinks, baked goods, ice cream, cookies, etc are
the example of high-sugar and high-fat containing foods. It is found
according to the Centres for Disease Control and Prevention that Kids and
children eating junk food are more prone to the type-2 diabetes. In type-2
diabetes our body become unable to regulate blood sugar level. Risk of
getting this disease is increasing as one become more obese or overweight.
It increases the risk of kidney failure. Eating junk food daily lead us to the
nutritional deficiencies in the body because it is lack of essential nutrients,
vitamins, iron, minerals and dietary fibers. It increases risk of
cardiovascular diseases because it is rich in saturated fat, sodium and bad
cholesterol. High sodium and bad cholesterol diet increases blood pressure
and overloads the heart functioning. One who like junk food develop more
risk to put on extra weight and become fatter and unhealthier. Junk foods
contain high level carbohydrate which spike blood sugar level and make
person more lethargic, sleepy and less active and alert. Reflexes and senses
of the people eating this food become dull day by day thus they live more
sedentary life. Junk foods are the source of constipation and other disease
like diabetes, heart ailments, clogged arteries, heart attack, strokes, etc
because of being poor in nutrition. Junk food is the easiest way to gain
unhealthy weight. The amount of fats and sugar in the food makes you
gain weight rapidly. However, this is not a healthy weight. It is more of
fats and cholesterol which will have a harmful impact on your health. Junk
food is also one of the main reasons for the increase in obesity
nowadays. This food only looks and tastes good, other than that, it has no
positive points. The amount of calorie your body requires to stay fit is not
fulfilled by this food. For instance, foods like French fries, burgers, candy,
and cookies, all have high amounts of sugar and fats. Therefore, this can
result in long-term illnesses like diabetes and high blood pressure. This
may also result in kidney failure. Above all, you can get various nutritional
deficiencies when you don’t consume the essential nutrients, vitamins,
minerals and more. You become prone to cardiovascular diseases due to
the consumption of bad cholesterol and fat plus sodium. In other words, all
this interferes with the functioning of your heart. Furthermore, junk food
contains a higher level of carbohydrates. It will instantly spike your blood
sugar levels. This will result in lethargy, inactiveness, and sleepiness. A
person reflex becomes dull overtime and they lead an inactive life. To
make things worse, junk food also clogs your arteries and increases the
risk of a heart attack. Therefore, it must be avoided at the first instance to
save your life from becoming ruined. The main problem with junk food is
that people don’t realize its ill effects now. When the time comes, it is too
late. Most importantly, the issue is that it does not impact you instantly. It
works on your overtime; you will face the consequences sooner or later.
Thus, it is better to stop now. You can avoid junk food by encouraging
your children from an early age to eat green vegetables. Their taste buds
must be developed as such that they find healthy food tasty. Moreover, try
to mix things up. Do not serve the same green vegetable daily in the same
style. Incorporate different types of healthy food in their diet following
different recipes. This will help them to try foods at home rather than
being attracted to junk food. In short, do not deprive them completely of it
as that will not help. Children will find one way or the other to have it.
Make sure you give them junk food in limited quantities and at healthy
periods of time."""
!pip3 install gensim==3.6.0
# (the gensim.summarization module was removed in gensim 4.x,
# so the version pin above is required)

import gensim
from gensim.summarization import summarize

short_summary = summarize(original_text)
print(short_summary)

Output:
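gensim's summarize() is an implementation of the TextRank algorithm and was removed in gensim 4.x. As a rough, dependency-free alternative, extractive summarization can be sketched with simple word-frequency scoring (the helper summarize_freq below is our own, not a gensim API):

```python
import re
from collections import Counter

def summarize_freq(text, ratio=0.2):
    """Keep the top `ratio` of sentences, scored by word frequency."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    freq = Counter(re.findall(r'[a-z]+', text.lower()))
    scores = {
        s: sum(freq[w] for w in re.findall(r'[a-z]+', s.lower()))
        for s in sentences
    }
    keep = max(1, int(len(sentences) * ratio))
    top = sorted(sentences, key=scores.get, reverse=True)[:keep]
    # Emit the selected sentences in their original order
    return ' '.join(s for s in sentences if s in top)

print(summarize_freq(
    "Junk food tastes good. Junk food causes obesity. Vegetables are healthy."
))
```

Unlike TextRank, this scores sentences independently, so long sentences full of frequent words win; it is only meant to illustrate the extractive idea, and can be applied to original_text in place of gensim's summarize().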
