UNIT-5 Questions - Answers
What is NLTK?
NLTK, the Natural Language Toolkit, is a Python library for processing text
written in human languages. It provides techniques such as tokenization,
stemming, lemmatization, and parsing, which help in tasks like categorizing
text, analyzing linguistic structure, and analyzing documents.
A few of the classes, corpora, and helpers from the NLTK package that we often use in NLP are:
1. SequentialBackoffTagger
2. DefaultTagger
3. UnigramTagger
4. treebank
5. wordnet
6. FreqDist
7. patterns
8. RegexpTagger
9. backoff_tagger
10. UnigramTagger, BigramTagger, and TrigramTagger
If A and B are regular expressions, then the following are true for them:
Output:
('run', 'cooki', 'fly')
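For two regular expressions A and B, the properties usually stated are that their union (A|B), their concatenation (AB), and the repetition A* are again regular expressions. That reading is an assumption here. A minimal sketch with Python's built-in re module, using two made-up patterns for A and B:

```python
import re

# Hypothetical example patterns; A and B stand for any two regular expressions.
A = r"ab"
B = r"cd"

union = f"(?:{A})|(?:{B})"          # A|B: matches whatever A or B matches
concatenation = f"(?:{A})(?:{B})"   # AB: an A-match followed by a B-match
star = f"(?:{A})*"                  # A*: zero or more repetitions of A

print(bool(re.fullmatch(union, "cd")))             # True
print(bool(re.fullmatch(concatenation, "abcd")))   # True
print(bool(re.fullmatch(star, "ababab")))          # True
print(bool(re.fullmatch(star, "")))                # True (zero repetitions)
```

Each combined pattern is itself a valid regular expression, which is why it can be passed straight to re.fullmatch.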
How to tokenize a sentence using the nltk package?
Output:
['Hi Guys.',
 'Welcome to SVPCET.',
 'This is a blog on the NLP interview questions and answers.']
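The call that produces this list is not shown. In NLTK it would presumably be sent_tokenize(Para) from nltk.tokenize, which relies on the Punkt model data being installed. To illustrate the idea without that dependency, here is a rough stand-in that uses only Python's re module. It is not NLTK's algorithm, and the helper name split_sentences is made up for this sketch.

```python
import re

Para = ("Hi Guys. Welcome to SVPCET. "
        "This is a blog on the NLP interview questions and answers.")

def split_sentences(text):
    """Naive sentence splitter (hypothetical helper, not part of NLTK)."""
    # Split at whitespace that directly follows '.', '!' or '?'.
    # Abbreviations such as "Dr." would be split too, which Punkt tries to avoid.
    return re.split(r"(?<=[.!?])\s+", text.strip())

sentences = split_sentences(Para)
print(sentences)
```

For this paragraph the sketch yields the same three sentences as above. On text with abbreviations or decimal numbers it will split in the wrong places.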
Now, to split the text into words, we import word_tokenize from the nltk package.
from nltk.tokenize import word_tokenize
Para = "Hi Guys. Welcome to SVPCET. This is a blog on the NLP interview questions and answers."
word_tokenize(Para)
Output:
['Hi', 'Guys', '.', 'Welcome', 'to', 'SVPCET', '.', 'This', 'is', 'a', 'blog', 'on', 'the', 'NLP', 'interview', 'questions', 'and', 'answers', '.']
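To see what word tokenization does to this text, the following sketch approximates it with a single regular expression from the standard library. It is a simplified stand-in, not the tokenizer that word_tokenize uses, and split_words is a made-up helper name.

```python
import re

Para = ("Hi Guys. Welcome to SVPCET. "
        "This is a blog on the NLP interview questions and answers.")

def split_words(text):
    """Naive word tokenizer (hypothetical helper, not part of NLTK)."""
    # \w+ grabs runs of letters and digits; [^\w\s] grabs each
    # punctuation mark as a token of its own.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = split_words(Para)
print(tokens[:7])    # ['Hi', 'Guys', '.', 'Welcome', 'to', 'SVPCET', '.']
print(len(tokens))   # 19
```

The two approaches agree on simple text like this. They differ on contractions: word_tokenize splits "don't" into "do" and "n't", while this pattern produces "don", "'", and "t".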
● When the machine parses the text one word at a time, each unit is
a unigram.
● When the text is parsed two words at a time, each unit is a bigram.
● When the machine parses three words at a time, each unit is
a trigram.
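The three cases above can be sketched in plain Python by pairing a word list with shifted copies of itself. The sentence is a made-up example, and the words are obtained with a simple split on whitespace.

```python
# Hypothetical example sentence, split into words on whitespace.
tokens = "the cat sat on the mat".split()

unigrams = [(w,) for w in tokens]                      # one word at a time
bigrams = list(zip(tokens, tokens[1:]))                # two consecutive words
trigrams = list(zip(tokens, tokens[1:], tokens[2:]))   # three consecutive words

print(unigrams[:2])   # [('the',), ('cat',)]
print(bigrams[:2])    # [('the', 'cat'), ('cat', 'sat')]
print(trigrams[:2])   # [('the', 'cat', 'sat'), ('cat', 'sat', 'on')]
```

The window moves forward one word at a time, so consecutive bigrams and trigrams overlap.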
Now, let’s implement parsing with the help of the nltk package.
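import nltk
text = "Top 30 NLP interview questions and answers"
tokens = text.split()  # the n-gram functions expect a sequence of words, not a raw string
Now, we will use the functions for extracting unigrams, bigrams, and trigrams. Unigrams are obtained from nltk.ngrams with n = 1.
list(nltk.ngrams(tokens, 1))
Output:
[('Top',), ('30',), ('NLP',), ('interview',), ('questions',), ('and',), ('answers',)]
list(nltk.bigrams(tokens))
Output:
[('Top', '30'), ('30', 'NLP'), ('NLP', 'interview'), ('interview', 'questions'), ('questions', 'and'), ('and', 'answers')]
list(nltk.trigrams(tokens))
Output:
[('Top', '30', 'NLP'), ('30', 'NLP', 'interview'), ('NLP', 'interview', 'questions'), ('interview', 'questions', 'and'), ('questions', 'and', 'answers')]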
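For extracting n-grams of any size, we can use the function nltk.ngrams and
pass the argument n, the number of words in each group.
n = 4  # example value
list(nltk.ngrams(text.split(), n))
Output:
[('Top', '30', 'NLP', 'interview'), ('30', 'NLP', 'interview', 'questions'), ('NLP', 'interview', 'questions', 'and'), ('interview', 'questions', 'and', 'answers')]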
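To make the role of n concrete, here is a small standard-library sketch of a sliding window over a word list. The helper sliding_ngrams is a made-up name for illustration and is not NLTK's implementation. It is meant to mirror the tuples that an n-gram extractor returns for a list of words.

```python
def sliding_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "Top 30 NLP interview questions and answers".split()

for n in (1, 2, 3, 4):
    grams = sliding_ngrams(tokens, n)
    # A list of m tokens has m - n + 1 n-grams.
    print(n, len(grams), grams[0])

# When n exceeds the number of tokens there is no full window.
print(sliding_ngrams(tokens, 10))   # []
```

With seven words, the loop reports 7 unigrams, 6 bigrams, 5 trigrams, and 4 four-grams.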