N. G. ACHARYA & D. K. MARATHE COLLEGE OF
ARTS, SCIENCE & COMMERCE
(Affiliated to University of Mumbai)
PRACTICAL JOURNAL
PSCSP514
Natural Language Processing
SUBMITTED BY
Vinay Vijay Gupta
2023-2024
MUMBAI-400 071
N. G. ACHARYA & D. K. MARATHE COLLEGE
OF ARTS, SCIENCE & COMMERCE
(Affiliated to University of Mumbai)
CERTIFICATE
This is to certify that Mr. Vinay Vijay Gupta, Seat No. ______, studying in
Master of Science in Computer Science Part I Semester II, has satisfactorily
completed the practicals of PSCSP514 Natural Language Processing as prescribed
by the University of Mumbai during the academic year 2023-24.
Practical 1
Write a program to implement tokenization and sentence segmentation.
What is Tokenization?
Tokenization is the process of breaking a piece of text into smaller units called tokens, such as sentences or words.
Program:
a. Tokenization:
import nltk
nltk.download('punkt')
from nltk.tokenize import sent_tokenize
# Sentence tokenization
sentence = "Hi, My name is Aman, I hope you like my work. You can follow me on Instagram for more resources. My username is 'the.clever.programmer'."
print(sent_tokenize(sentence))
# Word tokenization using the Treebank tokenizer
from nltk.tokenize import TreebankWordTokenizer
word_token = TreebankWordTokenizer()
print(word_token.tokenize(sentence))
Output:
b. Segmentation:
Sentence segmentation is the process of deciding where sentences start and end, i.e. dividing a paragraph into its individual sentences.
The name of a spaCy pipeline indicates the type of text it was trained on, e.g. web or news. For example, en_core_web_sm is a small English pipeline trained on written web text (blogs, news, comments) that includes vocabulary, vectors, syntax and entities.
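A minimal sketch of sentence segmentation with spaCy, assuming the en_core_web_sm pipeline mentioned above is installed (python -m spacy download en_core_web_sm); the sample text simply reuses the sentence from the tokenization program:
import spacy
# Load the small English web pipeline
nlp = spacy.load('en_core_web_sm')
text = "Hi, My name is Aman, I hope you like my work. You can follow me on Instagram for more resources. My username is 'the.clever.programmer'."
doc = nlp(text)
# doc.sents yields one Span per detected sentence
for sent in doc.sents:
    print(sent.text)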
Output:
Practical 2
Porter Stemmer: This is the Porter stemming algorithm. It follows the algorithm presented in Porter, M., "An algorithm for suffix stripping", Program, 1980. For example:
CONNECT     -----> CONNECT
CONNECTED   -----> CONNECT
CONNECTING  -----> CONNECT
CONNECTION  -----> CONNECT
CONNECTIONS -----> CONNECT
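As a quick check, the mapping above can be reproduced with NLTK's PorterStemmer (the word list here is only illustrative):
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()
# Each related form reduces to the same stem, "connect"
for word in ["connect", "connected", "connecting", "connection", "connections"]:
    print(word, "----->", stemmer.stem(word))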
What is Stemming?
Stemming is the process of reducing inflected words to their root form (stem), usually by stripping suffixes, so that related forms such as "connection" and "connected" map to the same stem.
The program below uses the Porter stemming algorithm.
import nltk
nltk.download('punkt')
from nltk.stem.porter import PorterStemmer
porter_stemmer = PorterStemmer()
word_data = "It originated from the idea that there are readers who prefer learning new skills from the comforts of their drawing rooms"
# First, word tokenization
nltk_tokens = nltk.word_tokenize(word_data)
# Next, find the root (stem) of each word
for w in nltk_tokens:
    print("Actual: %s  Stem: %s" % (w, porter_stemmer.stem(w)))
Output:
Lemmatization:
from nltk.stem import WordNetLemmatizer
a = WordNetLemmatizer()
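A minimal usage sketch for the lemmatizer, assuming the WordNet data has been downloaded; the sample words are illustrative only:
import nltk
nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
# Without a POS hint, lemmatize() treats the word as a noun
print(lemmatizer.lemmatize("defeats"))            # defeat
# Passing pos='v' lemmatizes the word as a verb
print(lemmatizer.lemmatize("defeated", pos="v"))  # defeat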
Unigram:
from nltk.util import ngrams
n = 1
sentence = 'You will face many defeats in life, but never let yourself be defeated.'
unigrams = ngrams(sentence.split(), n)
for item in unigrams:
    print(item)
Output:
Bi-gram:
n = 2
bigrams = ngrams(sentence.split(), n)
for item in bigrams:
    print(item)
Output:
Tri-gram:
n = 3
trigrams = ngrams(sentence.split(), n)
for item in trigrams:
    print(item)
Output:
Practical 4
Write a program to remove stop words and perform part-of-speech (POS) tagging.
Program:
import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize, sent_tokenize
stop_words = set(stopwords.words('english'))
# Sample text (assumed here, since the original value of txt was not shown)
txt = "Natural language processing makes it possible for computers to understand human language. It is widely used in search engines and chatbots."
tokenized = sent_tokenize(txt)
for i in tokenized:
    # Word-tokenize each sentence and remove stop words
    words = [w for w in word_tokenize(i) if w.lower() not in stop_words]
    # Tag each remaining word with its part of speech
    tagged = nltk.pos_tag(words)
    print(tagged)
Output:
Or, using spaCy:
import spacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(txt)
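To complete the spaCy version, a small sketch that filters stop words and prints part-of-speech tags (token.is_stop, token.is_punct and token.pos_ are standard spaCy token attributes):
# Print each non-stop-word token together with its POS tag
for token in doc:
    if not token.is_stop and not token.is_punct:
        print(token.text, '=>', token.pos_)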
Output:
Practical 5
import spacy
# Loading the model
nlp = spacy.load('en_core_web_sm')
text = "Reliance Retail acquires majority stake in designer brand Abraham & Thakore."
# Creating Doc object
doc = nlp(text)
print(doc)
# Getting dependency tags
for token in doc:
    print(token.text, '=>', token.dep_)
# Importing visualizer
from spacy import displacy
# Visualizing dependency tree
displacy.render(doc, jupyter=True)
Or:
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')
doc = nlp(u'This is a sentence.')
displacy.serve(doc, style='dep')
Practical 6
import spacy
from spacy import displacy
nlp = spacy.load('en_core_web_sm')
text = 'It took me more than two hours to translate a few pages of English.'
doc = nlp(text)
# Visualizing the dependency tree
displacy.render(doc, style='dep')
or
import spacy
from spacy import displacy
py_nlp = spacy.load("en_core_web_sm")
# py_doc is assumed to be built from the same sample text (its definition was not shown)
py_doc = py_nlp("It took me more than two hours to translate a few pages of English.")
displacy.render(py_doc, style='dep')
Practical 7
Write a program to implement Named Entity Recognition (NER).
import spacy
nlp = spacy.load('en_core_web_sm')
# Sample text (assumed; reused from Practical 5, since the original value was not shown)
text = "Reliance Retail acquires majority stake in designer brand Abraham & Thakore."
doc = nlp(text)
# Sentence segmentation
sentences = list(doc.sents)
print(sentences)
# Tokenization
for token in doc:
    print(token.text)
# Print the named entities with their labels
ents = [(ent.text, ent.label_) for ent in doc.ents]
print(ents)
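The recognized entities can also be highlighted with spaCy's displaCy visualizer; a minimal sketch reusing the doc built above (style='ent' is displaCy's entity-visualization mode):
from spacy import displacy
# Render the named entities inline (inside a Jupyter notebook)
displacy.render(doc, style='ent', jupyter=True)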
Text summarization using gensim:
import gensim
from gensim.summarization import summarize   # the summarization module exists only in gensim versions before 4.0
# original_text should hold the passage to summarize (several sentences long); its value was not shown here
short_summary = summarize(original_text)
print(short_summary)
Output: