Viva Q&A

This document provides an overview of Natural Language Processing (NLP), including its definition, applications, challenges, and key concepts such as tokenization, morphology, and named entity recognition. It also discusses various NLP techniques and models, including word frequency analysis, probabilistic models, POS tagging, and chunking, along with their implementation details and challenges, and introduces popular Python NLP libraries such as NLTK and spaCy.


IMPORTANT Q&A:

 What is Natural Language Processing (NLP)?

NLP is a field of AI that enables computers to understand, interpret, and generate human language.

 Mention any two applications of NLP.

Machine Translation (e.g., Google Translate) and Sentiment Analysis.

 What are the key challenges in NLP?

Ambiguity, context understanding, and language variability.

 Define Tokenization.

Tokenization is the process of breaking text into smaller units, such as words or sentences.

 What is the difference between Syntax and Semantics in NLP?

Syntax deals with the structure of sentences, while semantics focuses on their meaning.

 What is a Corpus in NLP?

A corpus is a large collection of text used for training NLP models.

 Explain the term Morphology in NLP.

Morphology is the study of the structure and formation of words.

 What is Lemmatization?

Lemmatization reduces words to their base or dictionary form (lemma).

 Define Stop Words with an example.

Stop words are common words (e.g., "is," "the") that are often removed in NLP tasks.

 What is Named Entity Recognition (NER)?

NER identifies and classifies entities such as names, locations, and dates in text.

 What are N-grams in NLP?

N-grams are contiguous sequences of n words in a given text.

 Define Part-of-Speech (POS) tagging.

POS tagging assigns grammatical categories (such as noun or verb) to words.

 What is a Language Model?

A language model predicts the probability of word sequences in text.

 What is the purpose of Stemming in NLP?

Stemming reduces words to their root form by removing suffixes.

 Mention any two NLP libraries in Python.

NLTK and spaCy.
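
Both libraries can be exercised in a few lines. A minimal sketch, assuming nltk and spacy are installed and spaCy's en_core_web_sm model has been downloaded:

```python
import nltk
nltk.download("punkt", quiet=True)   # tokenizer models (newer NLTK may need "punkt_tab")
from nltk.tokenize import word_tokenize

print(word_tokenize("NLP lets computers read text."))
# ['NLP', 'lets', 'computers', 'read', 'text', '.']

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Google was founded in California in 1998.")
print([(ent.text, ent.label_) for ent in doc.ents])
# e.g. [('Google', 'ORG'), ('California', 'GPE'), ('1998', 'DATE')]
```
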
QUESTIONS BASED ON PROGRAM:

1. Word Analysis (Frequency and Distribution of Words)

 What is word frequency analysis?

Counting how often each word appears in a text.

 How does your program handle case sensitivity?

By converting all text to lowercase.

 How do you manage punctuation in the text?

By removing or ignoring punctuation marks.

 Which data structure did you use to store word counts? Why?

A dictionary for fast key-based access.
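
A minimal sketch of this approach: lowercase the text, strip punctuation, and count words in a dictionary (here collections.Counter, a dict subclass):

```python
import string
from collections import Counter

def word_frequencies(text):
    text = text.lower()                                               # handle case sensitivity
    text = text.translate(str.maketrans("", "", string.punctuation))  # drop punctuation
    return Counter(text.split())                                      # word -> count

freqs = word_frequencies("The cat sat on the mat. The mat was flat.")
print(freqs.most_common(3))   # [('the', 3), ('mat', 2), ('cat', 1)]
```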

2. Word Generation Using Probabilistic Models

 What is a probabilistic model in the context of word generation?

A model that predicts the next word based on probability.

 How does your program decide which word to generate next?

By selecting words based on their probability distribution.

 What is the difference between bigram and trigram models?

Bigrams consider one previous word, trigrams consider two.

 How does increasing the order of n-grams affect the output?

It improves context but requires more data.
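
A minimal bigram sketch of this idea on a toy corpus; a trigram model would key on the two previous words instead of one:

```python
import random
from collections import defaultdict

def train_bigrams(tokens):
    following = defaultdict(list)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev].append(nxt)   # duplicates encode the probability distribution
    return following

def generate(following, start, length=8):
    word, out = start, [start]
    for _ in range(length):
        if word not in following:     # dead end: no observed successor
            break
        word = random.choice(following[word])   # sample in proportion to counts
        out.append(word)
    return " ".join(out)

model = train_bigrams("the cat sat on the mat and the cat ran".split())
print(generate(model, "the"))   # e.g. "the cat sat on the mat and the cat"
```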

3. Morphology Analysis (Root Words, Prefixes, Suffixes)

 What is morphology in NLP?

The study of word structure, including roots, prefixes, and suffixes.

 How does your program identify root words?

Using stemming or lemmatization techniques.

 What libraries or algorithms did you use for morphological analysis?

NLTK’s stemmer or spaCy’s lemmatizer.

 What is the difference between stemming and lemmatization?

Stemming chops off affixes; lemmatization finds the dictionary form.
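
A minimal sketch contrasting the two with NLTK's PorterStemmer and WordNetLemmatizer (assumes the wordnet data has been downloaded):

```python
import nltk
nltk.download("wordnet", quiet=True)
from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer, lemmatizer = PorterStemmer(), WordNetLemmatizer()
for word in ["studies", "running", "better"]:
    print(word,
          stemmer.stem(word),                    # crude suffix stripping
          lemmatizer.lemmatize(word, pos="v"))   # dictionary form, treated as a verb
# studies studi study
# running run run
# better better better
```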

4. Implementing N-Grams

 What is an N-gram?

A sequence of n words from a text.

 How does your program handle the start and end of a sentence?

By adding start and end tokens.

 What is the effect of changing the value of 'n' in N-grams?

Higher n gives better context but increases sparsity.

 How do you handle unseen word sequences?

Using smoothing techniques.

 Can you explain the real-world applications of N-grams?

Text prediction, autocomplete, and speech recognition.
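
A minimal sketch of n-gram extraction with explicit boundary tokens (<s> and </s> are common conventions for sentence start and end):

```python
def ngrams(tokens, n):
    # pad with n-1 start tokens and one end token so boundary contexts exist
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]

print(ngrams("the cat sat".split(), 2))
# [('<s>', 'the'), ('the', 'cat'), ('cat', 'sat'), ('sat', '</s>')]
```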

5. N-Grams Smoothing

 Why is smoothing important in N-gram models?

To handle zero probabilities for unseen word sequences.

 What type of smoothing technique did you implement?

Laplace (add-one) smoothing.

 How does smoothing improve model performance?

By assigning small probabilities to unseen events.

 What happens if you don’t apply smoothing?

The model will assign zero probability to unseen sequences.

 How can you evaluate the effectiveness of smoothing techniques?

By measuring perplexity or model accuracy.
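
A minimal sketch of Laplace (add-one) smoothing for bigram probabilities: every count is incremented by one, so unseen pairs receive a small nonzero probability instead of zero:

```python
from collections import Counter

def laplace_bigram_prob(bigram, bigram_counts, unigram_counts, vocab_size):
    prev, word = bigram
    # add 1 to the numerator count; add V to the denominator to renormalize
    return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

tokens = "the cat sat on the mat".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
V = len(unigrams)

print(laplace_bigram_prob(("the", "cat"), bigrams, unigrams, V))  # seen:   (1+1)/(2+5)
print(laplace_bigram_prob(("the", "dog"), bigrams, unigrams, V))  # unseen: (0+1)/(2+5)
```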

6. POS Tagging Using Hidden Markov Model (HMM)

 What is a Hidden Markov Model?

A statistical model in which a sequence of hidden states (the POS tags) generates the observed outputs (the words).

 How does HMM help in POS tagging?

By modeling sequences of tags based on observed words.


 What are the hidden states and observations in your model?

Hidden states are POS tags; observations are words.

 How do you estimate transition and emission probabilities?

Using training data with tagged sentences.

 What are the limitations of HMM for POS tagging?

Difficulty handling long-range dependencies and unknown words.
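
A minimal sketch of estimating both tables by relative frequency from a toy tagged corpus:

```python
from collections import Counter

# tiny hand-made training data for illustration
tagged = [[("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
          [("a", "DT"), ("dog", "NN"), ("barks", "VBZ")]]

transitions, emissions, tag_counts = Counter(), Counter(), Counter()
for sent in tagged:
    tags = ["<s>"] + [t for _, t in sent]
    transitions.update(zip(tags, tags[1:]))   # counts for P(tag_i | tag_{i-1})
    for word, tag in sent:
        emissions[(tag, word)] += 1           # counts for P(word | tag)
        tag_counts[tag] += 1
tag_counts["<s>"] = len(tagged)

print(transitions[("DT", "NN")] / tag_counts["DT"])   # P(NN | DT) = 2/2 = 1.0
print(emissions[("NN", "dog")] / tag_counts["NN"])    # P(dog | NN) = 2/2 = 1.0
```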

7. POS Tagging Using Viterbi Decoding

 What is the Viterbi algorithm?

A dynamic programming algorithm for finding the most likely sequence of hidden states.

 Why is Viterbi decoding efficient for POS tagging?

It reduces the computational complexity of sequence prediction.

 How does your program initialize the Viterbi matrix?

By assigning each state its start probability multiplied by the emission probability of the first word.

 What is the role of backtracking in the Viterbi algorithm?

To trace the optimal sequence of POS tags.

 How does Viterbi decoding handle ambiguous words?

By selecting the tag sequence with the highest probability.
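
A minimal sketch of Viterbi decoding, with a dynamic-programming table of best path probabilities plus backpointers; the probability tables here are hand-made for illustration:

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    # V[i][t] = (best probability of a tag path ending in t at word i, backpointer)
    V = [{t: (start_p[t] * emit_p[t].get(words[0], 1e-8), None) for t in tags}]
    for w in words[1:]:
        row = {}
        for t in tags:
            # best previous tag for reaching tag t at this word
            prev = max(tags, key=lambda p: V[-1][p][0] * trans_p[p].get(t, 1e-8))
            row[t] = (V[-1][prev][0] * trans_p[prev].get(t, 1e-8)
                      * emit_p[t].get(w, 1e-8), prev)
        V.append(row)
    best = max(tags, key=lambda t: V[-1][t][0])   # best final tag
    path = [best]
    for row in reversed(V[1:]):                   # backtrack through the pointers
        path.append(row[path[-1]][1])
    return list(reversed(path))

tags = ["DT", "NN", "VBZ"]
start_p = {"DT": 1.0, "NN": 0.0, "VBZ": 0.0}
trans_p = {"DT": {"NN": 1.0}, "NN": {"VBZ": 1.0}, "VBZ": {}}
emit_p = {"DT": {"the": 1.0}, "NN": {"dog": 1.0}, "VBZ": {"runs": 1.0}}
print(viterbi("the dog runs".split(), tags, start_p, trans_p, emit_p))
# ['DT', 'NN', 'VBZ']
```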

8. Building a POS Tagger

 What approach did you use to build the POS tagger?

A statistical model using HMM and Viterbi decoding.

 How does your program deal with unknown words?

By using smoothing or assigning default probabilities.

 What datasets did you use to train your POS tagger?

Tagged corpora like the Penn Treebank.

 How do you evaluate the accuracy of your POS tagger?

By comparing predicted tags with a labeled test set.

 Can your POS tagger be improved with more data? How?

Yes, more data improves probability estimates and accuracy.
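
For scale, a minimal sketch of training and scoring a simple statistical tagger on the Penn Treebank sample that ships with NLTK; a UnigramTagger stands in here for the full HMM-plus-Viterbi tagger, and the treebank corpus is assumed to be downloaded:

```python
import nltk
nltk.download("treebank", quiet=True)
from nltk.corpus import treebank
from nltk.tag import UnigramTagger

sents = treebank.tagged_sents()
train, test = sents[:3000], sents[3000:]
tagger = UnigramTagger(train)    # most frequent tag per word
print(tagger.accuracy(test))     # .evaluate(test) on older NLTK versions
```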


9. Chunking (Grouping Words into Phrases)

 What is chunking in NLP?

Grouping words into meaningful phrases like noun or verb phrases.

 How is chunking different from POS tagging?

POS tagging labels individual words, while chunking groups them.

 What are chunk patterns, and how are they defined?

Regular expressions based on POS tag sequences.

 How does your program recognize noun phrases?

Using POS-tag patterns such as <DT>?<JJ>*<NN>+ (an optional determiner, any number of adjectives, and one or more nouns).

 Can chunking be applied to languages other than English?

Yes, with language-specific POS tagging models.
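
A minimal sketch of that noun-phrase pattern applied with NLTK's RegexpParser to POS-tagged input:

```python
import nltk

grammar = "NP: {<DT>?<JJ>*<NN>+}"   # optional determiner, adjectives, nouns
chunker = nltk.RegexpParser(grammar)

tagged = [("the", "DT"), ("quick", "JJ"), ("brown", "JJ"),
          ("fox", "NN"), ("jumps", "VBZ")]
print(chunker.parse(tagged))
# (S (NP the/DT quick/JJ brown/JJ fox/NN) jumps/VBZ)
```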

10. Building a Chunker

 What algorithm or library did you use to build the chunker?

NLTK’s RegexpParser for pattern-based chunking.

 How do regular expressions help in chunking?

They define rules for grouping words based on POS tags.

 How do you evaluate the performance of your chunker?

Using precision, recall, and F1 score against annotated data.

 What challenges did you face while building the chunker?

Handling complex sentence structures and ambiguous tags.

 How would you extend the chunker to handle more complex syntactic structures?

By adding more detailed patterns and rules.
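
A minimal sketch of scoring such a chunker against annotated data, here the CoNLL-2000 chunking corpus that ships with NLTK (assumed downloaded); evaluate(), as used in the NLTK book, reports IOB accuracy, precision, recall, and F-measure:

```python
import nltk
nltk.download("conll2000", quiet=True)
from nltk.corpus import conll2000

chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>+}")
test_sents = conll2000.chunked_sents("test.txt", chunk_types=["NP"])
print(chunker.evaluate(test_sents))
```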
