
BAI601 TIE SIMP Questions - based on the 22 scheme

Module 1: Introduction & Language Modeling

1. Define Natural Language Processing (NLP). Compare the rationalist and empiricist approaches to modeling human language understanding.

2. Explain the five levels of language processing (lexical, syntactic, semantic, discourse, and pragmatic) with suitable examples.

3. Discuss the major challenges in NLP, such as ambiguity, idioms, evolving language, and ellipsis, and explain how context helps in resolving these issues.

4. What is the difference between language and grammar? How does Chomsky’s
transformational grammar help in parsing natural language?

5. Explain the differences between Indian languages and English that affect NLP, and
describe how the Paninian framework addresses them.

6. What is Karaka Theory? Illustrate at least four Karaka roles with examples in an
Indian language sentence.

7. Compare Grammar-based and Statistical language models. Discuss n-gram models and how sentence probability is estimated using bigrams.
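
As a quick illustration for the bigram part of this question, here is a minimal Python sketch. The corpus and all counts are purely illustrative; sentence probability is the product of MLE bigram probabilities P(w | w_prev) = C(w_prev, w) / C(w_prev).

```python
# Minimal bigram sentence-probability sketch on a toy corpus.
from collections import Counter

corpus = [
    "<s> i love nlp </s>",
    "<s> i love dogs </s>",
    "<s> dogs love nlp </s>",
]
tokens = [t for sent in corpus for t in sent.split()]
unigrams = Counter(tokens)
# Note: this also counts the (</s>, <s>) pair across sentence
# boundaries; a fuller implementation would reset per sentence.
bigrams = Counter(zip(tokens, tokens[1:]))

def bigram_prob(w_prev, w):
    """Unsmoothed MLE: P(w | w_prev) = C(w_prev, w) / C(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev]

sentence = "<s> i love nlp </s>".split()
p = 1.0
for w_prev, w in zip(sentence, sentence[1:]):
    p *= bigram_prob(w_prev, w)
print(p)  # 2/3 * 1 * 2/3 * 1 ≈ 0.444
```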

8. Describe and differentiate between Add-one smoothing and Good-Turing smoothing in statistical language modeling.
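
A minimal sketch with hypothetical toy counts: add-one smoothing adds 1 to every bigram count and the vocabulary size V to the denominator, while Good-Turing instead reallocates mass using counts-of-counts (e.g., N1/N for unseen events). The sketch below shows only the add-one side.

```python
# Add-one (Laplace) smoothing sketch with hypothetical toy counts.
unigrams = {"i": 3}                 # C(w_prev)
bigrams = {("i", "love"): 2}        # C(w_prev, w)
V = 6                               # vocabulary size

def mle(w_prev, w):
    return bigrams.get((w_prev, w), 0) / unigrams[w_prev]

def add_one(w_prev, w):
    # (C(w_prev, w) + 1) / (C(w_prev) + V): every bigram gains one count
    return (bigrams.get((w_prev, w), 0) + 1) / (unigrams[w_prev] + V)

print(mle("i", "love"), add_one("i", "love"))  # 0.667 -> 0.333
print(mle("i", "nlp"), add_one("i", "nlp"))    # 0.0   -> 0.111
```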

9. What are the applications of NLP in real-world systems? Briefly explain at least
three: Machine Translation, Question Answering, and Text Summarization.

Module 2: Word-Level & Syntactic Analysis

1. Define regular expressions. Explain how they are implemented using Finite-State
Automata (FSA) with examples.
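
A small illustration (the example words are our own): the optional-character pattern below corresponds to a simple FSA that accepts both spellings and nothing else.

```python
# Regular expression as a stand-in for a small finite-state automaton.
import re

pattern = re.compile(r"^colou?r$")  # 'u?' = optional transition in the FSA
for word in ["color", "colour", "colouur"]:
    print(word, bool(pattern.match(word)))  # True, True, False
```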

2. What is morphological parsing? Describe the components of a morphological parser and explain with examples (e.g., 'eggs', 'played').

3. Discuss different types of spelling errors (typographical, OCR, phonetic). Explain minimum edit distance with an example (e.g., tutor → tumour).
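
A worked sketch with unit costs for insertion, deletion, and substitution; for tutor → tumour it finds distance 2 (substitute t→m, insert u):

```python
# Minimum edit distance (Levenshtein) via dynamic programming.
def edit_distance(s, t):
    m, n = len(s), len(t)
    # d[i][j] = distance between s[:i] and t[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[m][n]

print(edit_distance("tutor", "tumour"))  # 2
```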

4. Explain the difference between stemmers and morphological analyzers. Compare the Lovins and Porter stemmers.

5. Describe the two-level morphological parsing model using Finite-State Transducers (FSTs) with a relevant example (e.g., "walking" → "walk+V+PP").
6. What is Part-of-Speech tagging? Explain the three major types: Rule-based,
Stochastic (HMM), and Hybrid (Brill's tagger), with examples.

7. Explain Hidden Markov Model (HMM) tagging using unigram and bigram
probabilities. Show how Viterbi decoding is approximated.
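
A minimal Viterbi sketch for a two-tag HMM; every transition and emission probability below is an invented toy value, purely to show how the best tag path is computed:

```python
# Viterbi decoding sketch for a tiny HMM tagger (toy probabilities).
transition = {  # P(tag_i | tag_{i-1}), with <s> as the start state
    ("<s>", "N"): 0.6, ("<s>", "V"): 0.4,
    ("N", "N"): 0.3, ("N", "V"): 0.7,
    ("V", "N"): 0.8, ("V", "V"): 0.2,
}
emission = {  # P(word | tag)
    ("N", "time"): 0.5, ("V", "time"): 0.1,
    ("N", "flies"): 0.2, ("V", "flies"): 0.6,
}
tags = ["N", "V"]

def viterbi(words):
    # best[t] = (probability of the best path ending in tag t, that path)
    best = {t: (transition[("<s>", t)] * emission.get((t, words[0]), 0.0), [t])
            for t in tags}
    for w in words[1:]:
        best = {
            t: max(
                ((p * transition[(prev, t)] * emission.get((t, w), 0.0),
                  path + [t]) for prev, (p, path) in best.items()),
                key=lambda x: x[0],
            )
            for t in tags
        }
    return max(best.values(), key=lambda x: x[0])

print(viterbi(["time", "flies"]))  # (0.126, ['N', 'V'])
```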

8. What is Context-Free Grammar (CFG)? Write CFG rules for sentence generation and
parse the sentence: “Hena reads a book.”
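
One possible rule set for this sentence, shown here with NLTK's grammar tools (assumes nltk is installed; the rules could equally be applied by hand):

```python
# CFG rules and a chart parse for "Hena reads a book".
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> PropN | Det N
VP -> V NP
PropN -> 'Hena'
V -> 'reads'
Det -> 'a'
N -> 'book'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("Hena reads a book".split()):
    print(tree)
# (S (NP (PropN Hena)) (VP (V reads) (NP (Det a) (N book))))
```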

9. Compare top-down and bottom-up parsing techniques. Explain CYK or Earley's algorithm with steps and an example.

Module 3: Naive Bayes, Text Classification & Sentiment Analysis

1. Explain the Naive Bayes Classifier. Derive the final equation for text classification
and explain the bag-of-words and conditional independence assumptions.
2. How is a Naive Bayes classifier trained? Explain how to estimate the prior and
likelihood probabilities using Maximum Likelihood Estimation and Laplace
Smoothing.
3. Perform a step-by-step Naive Bayes classification for a given test document using a
small training set.
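
A self-contained worked example on a hypothetical three-document training set: multinomial Naive Bayes with add-one smoothing, scored in log space.

```python
# Step-by-step multinomial Naive Bayes with Laplace smoothing (toy data).
from collections import Counter, defaultdict
from math import log

train = [
    ("fun fast film", "pos"),
    ("great fun film", "pos"),
    ("boring slow film", "neg"),
]
test = "fun film".split()

docs_per_class = Counter(c for _, c in train)
word_counts = defaultdict(Counter)
for text, c in train:
    word_counts[c].update(text.split())

vocab = {w for text, _ in train for w in text.split()}
V = len(vocab)

scores = {}
for c in docs_per_class:
    # log P(c) + sum over test words of log P(w | c), Laplace-smoothed
    score = log(docs_per_class[c] / len(train))
    total = sum(word_counts[c].values())
    for w in test:
        score += log((word_counts[c][w] + 1) / (total + V))
    scores[c] = score

print(max(scores, key=scores.get), scores)  # 'pos' wins
```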
4. What is binary Naive Bayes? Explain how clipping word counts and handling
negation improve sentiment classification.
5. What are the common issues faced in sentiment analysis using Naive Bayes?
Discuss solutions like negation handling, stop-word removal, and use of sentiment
lexicons.
6. Describe the use of Naive Bayes in spam detection and language identification.
Mention examples of features used in these tasks.
7. Explain how Naive Bayes can be viewed as a language model. How does it assign
probabilities to entire sentences?
8. How is text classification performance evaluated? Define precision, recall, and F1-score, and explain the importance of the confusion matrix in classification.
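
The definitions in one place, computed from a toy 2x2 confusion matrix (counts are hypothetical):

```python
# Precision, recall, and F1 from confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 20, 30  # hypothetical counts

precision = tp / (tp + fp)  # of predicted positives, how many were right
recall = tp / (tp + fn)     # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
print(precision, recall, f1)  # 0.8 0.667 0.727
```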
9. What is the role of cross-validation and statistical significance testing in evaluating
classifiers? Explain the paired bootstrap test with an example.

Module 4: Information Retrieval & Lexical Resources

1. Explain the architecture and design features of an Information Retrieval (IR) system.
How do indexing, stop-word removal, and stemming contribute to its efficiency?

2. Compare and contrast the three classical IR models: Boolean, Probabilistic, and
Vector Space. Include examples and evaluation criteria.

3. What is TF-IDF weighting? Derive the formula and explain its significance with an
example.
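
A worked sketch of one common TF-IDF variant, tf-idf(t, d) = tf(t, d) * log(N / df(t)), on a hypothetical three-document collection:

```python
# TF-IDF sketch: frequent-in-document but rare-in-collection terms score high.
from math import log

docs = [
    "nlp is fun".split(),
    "nlp models language".split(),
    "language is fun".split(),
]
N = len(docs)

def tf_idf(term, doc):
    tf = doc.count(term)                         # term frequency in doc
    df = sum(1 for d in docs if term in d)       # document frequency
    return tf * log(N / df)

print(tf_idf("nlp", docs[0]))     # in 2 of 3 docs -> lower weight (0.405)
print(tf_idf("models", docs[1]))  # in 1 of 3 docs -> higher weight (1.099)
```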
4. Describe the Cluster, Fuzzy, and LSI (Latent Semantic Indexing) models in IR. How
do they address limitations of classical models?

5. Explain Zipf’s Law and how it applies to term selection and index size reduction in IR
systems.
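
A quick numeric illustration of the law (frequency of the rank-r word ≈ C / r), with a hypothetical top frequency:

```python
# Zipf's law sketch: frequencies fall off roughly as C / rank, so a handful
# of top-ranked terms dominates the index while the long tail is very sparse.
top_freq = 10_000  # hypothetical frequency of the most common word
for rank in [1, 2, 10, 100, 1000]:
    print(rank, top_freq // rank)  # 10000, 5000, 1000, 100, 10
```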

6. Discuss major issues in Information Retrieval, including vocabulary mismatch, polysemy, and scalability. Suggest strategies to handle them.

7. What is WordNet? Explain synsets, semantic relations (e.g., hypernym, hyponym, troponym), and key applications like WSD and query expansion.
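
A short WordNet sketch via NLTK (assumes nltk plus its wordnet corpus are installed, e.g. nltk.download('wordnet')):

```python
# Synsets and semantic relations in WordNet through the NLTK interface.
from nltk.corpus import wordnet as wn

for syn in wn.synsets("car")[:2]:
    print(syn.name(), "-", syn.definition())

car = wn.synset("car.n.01")
print([h.name() for h in car.hypernyms()])     # more general concept(s)
print([h.name() for h in car.hyponyms()][:3])  # more specific concepts
```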

8. Describe FrameNet and its role in semantic role labeling. Use examples from frames
like ARREST or COMMUNICATION.

9. List different POS taggers (e.g., HMM, Brill, TreeTagger, Stanford Tagger). Compare
their approaches and applications in IR/NLP tasks.

Module 5: Machine Translation

1. Explain the major types of language divergences encountered in Machine Translation. Illustrate with examples of:

o Word order typology

o Lexical divergences

o Morphological typology

o Referential density

2. What is an Encoder-Decoder architecture in Machine Translation? Describe how the encoder and decoder components work together to generate translations.

3. Explain tokenization in modern MT systems. How do methods like Byte Pair Encoding (BPE) and WordPiece help in subword modeling?
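
A toy BPE training sketch: repeatedly merge the most frequent adjacent symbol pair. The corpus is invented, and production systems work at byte level with frequency-weighted words, but the merge loop below is the core idea.

```python
# Byte Pair Encoding (BPE) merges on a tiny invented corpus.
from collections import Counter

words = [list("low") + ["</w>"], list("lower") + ["</w>"],
         list("lowest") + ["</w>"], list("low") + ["</w>"]]

def most_frequent_pair(words):
    pairs = Counter()
    for w in words:
        pairs.update(zip(w, w[1:]))
    return pairs.most_common(1)[0][0]

def merge(words, pair):
    merged = pair[0] + pair[1]
    out = []
    for w in words:
        new, i = [], 0
        while i < len(w):
            if i + 1 < len(w) and (w[i], w[i + 1]) == pair:
                new.append(merged); i += 2
            else:
                new.append(w[i]); i += 1
        out.append(new)
    return out

for _ in range(3):  # three merge steps
    pair = most_frequent_pair(words)
    words = merge(words, pair)
    print(pair, words[0])  # watch 'low' fuse into a single subword
```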

4. What are parallel corpora and how are they used to train MT systems? Discuss the
role of sentence alignment in creating bilingual datasets.

5. Describe the architecture and working of the Transformer-based Encoder-Decoder model for MT. Include key components like multi-head attention, cross-attention, and positional encoding.

6. What strategies are used to perform machine translation for low-resource languages? Discuss data augmentation (back-translation) and multilingual models with examples.

7. What are the two key criteria for evaluating MT systems? How do BLEU and chrF
metrics work? Compare their strengths and limitations.
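
A hand-rolled sketch of modified n-gram precision, the core quantity inside BLEU (toy sentences; real BLEU combines n = 1..4 geometrically and adds a brevity penalty, while chrF works on character n-grams instead):

```python
# Clipped (modified) n-gram precision, the building block of BLEU.
from collections import Counter

def modified_precision(candidate, reference, n):
    cand = Counter(zip(*[candidate[i:] for i in range(n)]))
    ref = Counter(zip(*[reference[i:] for i in range(n)]))
    # Clip each candidate n-gram count by its count in the reference.
    clipped = sum(min(c, ref[g]) for g, c in cand.items())
    return clipped / max(sum(cand.values()), 1)

cand = "the cat is on the mat".split()
ref = "the cat sat on the mat".split()
print(modified_precision(cand, ref, 1))  # 5/6 unigrams match
print(modified_precision(cand, ref, 2))  # 3/5 bigrams match
```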
8. What are the bias and ethical concerns in Machine Translation? Explain with
examples how gender bias can manifest in MT outputs and how it is evaluated.

9. Discuss the advantages and challenges of automatic MT evaluation over human evaluation. How is statistical significance used to compare MT systems?
