Bai601 Simp
3. Discuss the major challenges in NLP, such as ambiguity, idioms, evolving language,
and ellipses, and explain how context helps in resolving these issues.
4. What is the difference between language and grammar? How does Chomsky’s
transformational grammar help in parsing natural language?
5. Explain the differences between Indian languages and English that affect NLP, and
describe how the Paninian framework addresses them.
6. What is Karaka Theory? Illustrate at least four Karaka roles with examples in an
Indian language sentence.
9. What are the applications of NLP in real-world systems? Briefly explain at least
three: Machine Translation, Question Answering, and Text Summarization.
1. Define regular expressions. Explain how they are implemented using Finite-State
Automata (FSA) with examples.
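As a study aid, here is a minimal Python sketch of a model answer: a regular expression and a hand-coded DFA that accepts the same language, the classic "sheeptalk" pattern /baa+!/ (the pattern and state numbering are chosen only for illustration):

```python
import re

# Regex for "sheeptalk": one 'b', at least two 'a's, then '!'
pattern = re.compile(r"baa+!")

def dfa_accepts(s: str) -> bool:
    """Hand-coded DFA equivalent to /baa+!/ (illustrative sketch)."""
    # States: 0 = start, 1 = saw 'b', 2 = saw 'ba', 3 = saw 'baa...', 4 = accept
    state = 0
    for ch in s:
        if state == 0 and ch == "b":
            state = 1
        elif state == 1 and ch == "a":
            state = 2
        elif state in (2, 3) and ch == "a":
            state = 3
        elif state == 3 and ch == "!":
            state = 4
        else:
            return False  # no valid transition: reject immediately
    return state == 4

# The DFA and the regex agree on every test string
for s in ["baa!", "baaaa!", "ba!", "b!", "abaa!"]:
    assert (pattern.fullmatch(s) is not None) == dfa_accepts(s)
```

The equivalence checked in the loop is the point of the question: every regular expression can be compiled into a finite-state automaton that recognizes the same strings.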
7. Explain Hidden Markov Model (HMM) tagging using unigram and bigram
probabilities. Show how Viterbi decoding is approximated.
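A compact worked sketch of bigram HMM tagging with Viterbi decoding; the two-tag tagset and all probabilities below are invented for illustration:

```python
# Toy bigram HMM tagger decoded with Viterbi (all probabilities invented).
tags = ["N", "V"]
start = {"N": 0.7, "V": 0.3}                                # P(tag | <s>)
trans = {("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.8, ("V", "V"): 0.2}                  # P(t_i | t_{i-1})
emit = {("N", "time"): 0.4, ("V", "time"): 0.1,
        ("N", "flies"): 0.2, ("V", "flies"): 0.5}           # P(word | tag)

def viterbi(words):
    # v[t] = probability of the best tag path ending in tag t
    v = {t: start[t] * emit.get((t, words[0]), 1e-8) for t in tags}
    back = []                        # backpointers, one dict per position
    for w in words[1:]:
        nv, bp = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: v[p] * trans[(p, t)])
            nv[t] = v[prev] * trans[(prev, t)] * emit.get((t, w), 1e-8)
            bp[t] = prev
        v = nv
        back.append(bp)
    seq = [max(tags, key=lambda t: v[t])]
    for bp in reversed(back):        # follow backpointers right to left
        seq.append(bp[seq[-1]])
    return seq[::-1]

print(viterbi(["time", "flies"]))    # → ['N', 'V'] with these numbers
```

A unigram tagger would simply pick argmax P(word | tag) per word; the bigram transition table is what lets Viterbi trade off a worse emission against a better tag sequence.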
8. What is Context-Free Grammar (CFG)? Write CFG rules for sentence generation and
parse the sentence: “Hena reads a book.”
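A sketch answer for the CFG question: a tiny grammar covering only this sentence, parsed with a naive top-down parser (grammar and lexicon are made up for this one example; real grammars need much broader coverage):

```python
# Tiny CFG and naive top-down parser for "Hena reads a book" (sketch only).
lexicon = {"Hena": "PropN", "reads": "V", "a": "Det", "book": "N"}
rules = {
    "S":  [["NP", "VP"]],
    "NP": [["PropN"], ["Det", "N"]],
    "VP": [["V", "NP"]],
}

def parse(sym, words, i):
    """Return (tree, next_index) if sym derives words[i:...], else None."""
    if sym not in rules:                      # preterminal: check the lexicon
        if i < len(words) and lexicon.get(words[i]) == sym:
            return (sym, words[i]), i + 1
        return None
    for rhs in rules[sym]:                    # try each expansion in turn
        trees, j, ok = [], i, True
        for r in rhs:
            res = parse(r, words, j)
            if res is None:
                ok = False
                break
            t, j = res
            trees.append(t)
        if ok:
            return (sym, trees), j
    return None

tree, end = parse("S", "Hena reads a book".split(), 0)
assert end == 4                               # all four words consumed
print(tree)
```

The printed tree nests (S (NP (PropN Hena)) (VP (V reads) (NP (Det a) (N book)))), which is exactly the derivation the CFG rules license. Top-down parsing works here because the grammar has no left recursion.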
1. Explain the Naive Bayes Classifier. Derive the final equation for text classification
and explain the bag-of-words and conditional independence assumptions.
2. How is a Naive Bayes classifier trained? Explain how to estimate the prior and
likelihood probabilities using Maximum Likelihood Estimation and Laplace
Smoothing.
3. Perform a step-by-step Naive Bayes classification for a given test document using a
small training set.
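The training and classification questions above can be worked end-to-end in a few lines; the four-document training set below is invented, and add-1 (Laplace) smoothing handles words unseen in a class:

```python
import math
from collections import Counter

# Tiny Naive Bayes with add-1 smoothing on an invented two-class set.
train = [("just plain boring", "neg"),
         ("entirely predictable and lacks energy", "neg"),
         ("very powerful", "pos"),
         ("the most fun film of the summer", "pos")]

docs = {"pos": [], "neg": []}
for text, c in train:
    docs[c].extend(text.split())
vocab = set(w for ws in docs.values() for w in ws)
counts = {c: Counter(ws) for c, ws in docs.items()}        # word counts per class
prior = {c: math.log(sum(1 for _, y in train if y == c) / len(train))
         for c in docs}                                    # log P(c), by MLE

def loglik(w, c):
    # add-1 smoothed log P(w | c); |V| is added to the denominator
    return math.log((counts[c][w] + 1) / (sum(counts[c].values()) + len(vocab)))

def classify(text):
    words = [w for w in text.split() if w in vocab]        # skip unknown words
    return max(docs, key=lambda c: prior[c] + sum(loglik(w, c) for w in words))

print(classify("predictable and boring"))                  # prints "neg"
```

Working in log space avoids underflow from multiplying many small probabilities; summing logs over tokens is exactly the bag-of-words plus conditional-independence assumption of the derivation in Question 1.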
4. What is binary Naive Bayes? Explain how clipping word counts and handling
negation improve sentiment classification.
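Both tricks in this question are simple preprocessing steps; a sketch of each (the negation-word list and "mark until the next punctuation" scope are common heuristics, not the only choices):

```python
import re

def binarize(tokens):
    """Binary NB: clip each word to at most one occurrence per document."""
    seen, out = set(), []
    for w in tokens:
        if w not in seen:
            seen.add(w)
            out.append(w)
    return out

def mark_negation(text):
    """Prefix tokens after a negation word with NOT_ until punctuation."""
    out, negating = [], False
    for tok in re.findall(r"\w+|[.,!?]", text.lower()):
        if tok in {".", ",", "!", "?"}:
            negating = False                 # punctuation ends negation scope
            out.append(tok)
        else:
            out.append("NOT_" + tok if negating else tok)
            if tok in {"not", "no", "never"}:
                negating = True
    return out

print(binarize(["great", "great", "fun"]))
print(mark_negation("it was not fun, it was great"))
# "fun" becomes "NOT_fun"; the comma ends the negation scope
```

Clipping makes repeated sentiment words count once per document, which empirically helps sentiment tasks; NOT_ prefixing turns "not fun" into a feature distinct from "fun", so the classifier can learn opposite weights for the two.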
5. What are the common issues faced in sentiment analysis using Naive Bayes?
Discuss solutions like negation handling, stop-word removal, and use of sentiment
lexicons.
6. Describe the use of Naive Bayes in spam detection and language identification.
Mention examples of features used in these tasks.
7. Explain how Naive Bayes can be viewed as a language model. How does it assign
probabilities to entire sentences?
8. How is text classification performance evaluated? Define precision, recall, and
F1-score, and explain the importance of the confusion matrix in classification.
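The definitions reduce to arithmetic on the confusion-matrix cells; the counts below are invented to make the formulas concrete:

```python
# Precision, recall, and F1 from a 2x2 confusion matrix (invented counts).
tp, fp, fn, tn = 30, 10, 5, 55

precision = tp / (tp + fp)          # of predicted positives, fraction correct: 0.75
recall = tp / (tp + fn)             # of true positives, fraction found: ~0.857
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean: 0.8

print(precision, round(recall, 3), f1)
```

Note that accuracy ((tp + tn) / total) can look high on skewed classes while precision or recall is poor, which is why the confusion matrix, not a single number, is the right starting point.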
9. What is the role of cross-validation and statistical significance testing in evaluating
classifiers? Explain the paired bootstrap test with an example.
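A sketch of the paired bootstrap test in the style of Jurafsky and Martin: given per-example correctness for two classifiers A and B on the same test set (the score vectors below are invented), resample the test set many times and count how often the resampled advantage of A reaches twice the observed one:

```python
import random

def paired_bootstrap(a, b, n_boot=10_000, seed=0):
    """One-sided paired bootstrap: a, b are per-example 0/1 scores."""
    rng = random.Random(seed)
    n = len(a)
    delta = (sum(a) - sum(b)) / n            # observed accuracy difference
    count = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        d = sum(a[i] - b[i] for i in idx) / n
        if d >= 2 * delta:                   # recenter under the null hypothesis
            count += 1
    return delta, count / n_boot             # (difference, p-value estimate)

a = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]           # classifier A: 8/10 correct
b = [1, 0, 1, 0, 1, 0, 0, 1, 1, 0]           # classifier B: 5/10 correct
delta, p = paired_bootstrap(a, b)
print(delta, p)
```

Pairing matters: both classifiers are scored on the same resampled examples, so the test measures the difference directly rather than comparing two noisy accuracies.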
1. Explain the architecture and design features of an Information Retrieval (IR) system.
How do indexing, stop-word removal, and stemming contribute to its efficiency?
2. Compare and contrast the three classical IR models: Boolean, Probabilistic, and
Vector Space. Include examples and evaluation criteria.
3. What is TF-IDF weighting? Derive the formula and explain its significance with an
example.
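A worked sketch of the tf-idf formula on an invented three-document collection (using raw term frequency and idf = log10(N / df); textbooks also use log-scaled tf variants):

```python
import math
from collections import Counter

# TF-IDF on a toy 3-document collection (documents invented).
docs = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
N = len(docs)
df = Counter(w for d in docs for w in set(d))   # document frequency per term

def tfidf(word, doc):
    tf = doc.count(word)                        # raw term frequency
    idf = math.log10(N / df[word])              # rarer terms get higher idf
    return tf * idf

print(tfidf("the", docs[0]))   # 0.0 — "the" occurs in every document
print(tfidf("cat", docs[0]))   # positive — "cat" occurs in 2 of 3 documents
```

The zero weight for "the" is the significance of idf: terms that appear in every document carry no discriminating power, however frequent they are within one document.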
4. Describe the Cluster, Fuzzy, and LSI (Latent Semantic Indexing) models in IR. How
do they address limitations of classical models?
5. Explain Zipf’s Law and how it applies to term selection and index size reduction in IR
systems.
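Zipf's law says the frequency of the r-th most common word is roughly proportional to 1/r, so rank × frequency is roughly constant. A sketch on a deliberately tiny invented text (on a real corpus the product is far more stable):

```python
from collections import Counter

# Rank-frequency sketch for Zipf's law (tiny invented text).
text = ("the cat sat on the mat and the dog sat on the rug "
        "and the cat ran").split()
freqs = sorted(Counter(text).values(), reverse=True)

for rank, f in enumerate(freqs[:5], start=1):
    print(rank, f, rank * f)      # rank * frequency should stay roughly flat
```

For IR, the head of the distribution (very frequent words like "the") is removed as stop words, and the long tail of hapax legomena can often be pruned too, which is how Zipf's law justifies large index-size reductions.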
8. Describe FrameNet and its role in semantic role labeling. Use examples from frames
like ARREST or COMMUNICATION.
9. List different POS taggers (e.g., HMM, Brill, TreeTagger, Stanford Tagger). Compare
their approaches and applications in IR/NLP tasks.
3. Explain, with examples, how languages differ along the following dimensions and
why these divergences matter for Machine Translation:
o Lexical divergences
o Morphological typology
o Referential density
4. What are parallel corpora and how are they used to train MT systems? Discuss the
role of sentence alignment in creating bilingual datasets.
7. What are the two key criteria for evaluating MT systems? How do BLEU and chrF
metrics work? Compare their strengths and limitations.
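A sketch of a chrF-style score to make the metric concrete: an F-score over character n-grams, here simplified to bigrams only with no whitespace handling, whereas real chrF averages over n = 1..6 and weights recall more heavily via beta = 2:

```python
from collections import Counter

def char_ngrams(s, n=2):
    """Multiset of character n-grams of s."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf2(hyp, ref, beta=2.0):
    """Simplified chrF: F-score over character bigrams (sketch only)."""
    h, r = char_ngrams(hyp), char_ngrams(ref)
    overlap = sum((h & r).values())          # clipped n-gram matches
    if not overlap:
        return 0.0
    prec = overlap / sum(h.values())         # matched / hypothesis n-grams
    rec = overlap / sum(r.values())          # matched / reference n-grams
    return (1 + beta**2) * prec * rec / (beta**2 * prec + rec)

print(chrf2("the cat sat", "the cat sat"))   # 1.0 for an exact match
```

Character n-grams are why chrF is more robust than BLEU for morphologically rich languages: a near-miss inflection still shares most of its characters with the reference, while BLEU's word n-grams score it as a complete mismatch.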
8. What are the bias and ethical concerns in Machine Translation? Explain with
examples how gender bias can manifest in MT outputs and how it is evaluated.