Viva Q&A
IMPORTANT Q&A:
Define Tokenization.
Tokenization is the process of breaking text into smaller units like words or sentences.
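For illustration, a minimal tokenization sketch, assuming NLTK (and its 'punkt' tokenizer data) is the toolkit used in these practicals:

import nltk
nltk.download("punkt", quiet=True)

text = "NLP is fun. It breaks text into tokens."
print(nltk.sent_tokenize(text))  # ['NLP is fun.', 'It breaks text into tokens.']
print(nltk.word_tokenize(text))  # ['NLP', 'is', 'fun', '.', 'It', 'breaks', ...]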
What is Lemmatization?
Lemmatization reduces words to their base or dictionary form (lemma).
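A minimal lemmatization sketch, assuming NLTK's WordNetLemmatizer (the 'wordnet' data must be downloaded first):

import nltk
nltk.download("wordnet", quiet=True)
from nltk.stem import WordNetLemmatizer

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("mice"))          # 'mouse' (noun is the default POS)
print(lemmatizer.lemmatize("running", "v"))  # 'run' ('v' marks the word as a verb)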
Which data structure did you use to store word counts? Why?
A dictionary (hash map), because it maps each word directly to its count with constant average-time lookup and insertion.
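A minimal counting sketch; Python's collections.Counter is a dictionary subclass built for exactly this:

from collections import Counter

tokens = ["the", "cat", "sat", "on", "the", "mat"]
counts = Counter(tokens)
print(counts["the"])          # 2
print(counts.most_common(2))  # [('the', 2), ('cat', 1)]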
4. Implementing N-Grams
What is an N-gram?
An N-gram is a contiguous sequence of N tokens from a text; for example, bigrams (N = 2) like "natural language" and trigrams (N = 3) like "natural language processing".
How does your program handle the start and end of a sentence?
By padding each sentence with special start (<s>) and end (</s>) markers, so that N-grams at sentence boundaries are still counted, as shown in the sketch below.
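A minimal sketch of bigram extraction with boundary padding; the <s> and </s> marker strings are a common convention, assumed here:

def bigrams_with_padding(sentence_tokens):
    # Pad so the first and last words also appear in boundary bigrams.
    padded = ["<s>"] + sentence_tokens + ["</s>"]
    return list(zip(padded, padded[1:]))

print(bigrams_with_padding(["the", "cat", "sat"]))
# [('<s>', 'the'), ('the', 'cat'), ('cat', 'sat'), ('sat', '</s>')]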
5. N-Grams Smoothing
What is smoothing, and why is it needed?
Smoothing reassigns a small amount of probability mass to unseen N-grams so the model never gives zero probability to a new word sequence; add-one (Laplace) smoothing is the simplest method.
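A minimal add-one (Laplace) smoothing sketch for bigram probabilities; the counts and vocabulary below are illustrative assumptions, not trained values:

from collections import Counter

unigram_counts = Counter({"the": 2, "cat": 1, "sat": 1})
bigram_counts = Counter({("the", "cat"): 1, ("cat", "sat"): 1})
V = len(unigram_counts)  # vocabulary size

def laplace_prob(w1, w2):
    # P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + V)
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(laplace_prob("the", "cat"))  # seen bigram
print(laplace_prob("the", "sat"))  # unseen bigram still gets non-zero probability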
What is a Hidden Markov Model (HMM)?
A statistical model where the system has hidden states (like POS tags) that emit the observed outputs (like words).
What is the Viterbi algorithm?
A dynamic programming algorithm for finding the most likely sequence of hidden states given a sequence of observations.
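A minimal Viterbi sketch over a toy two-tag HMM; all probabilities here are illustrative assumptions, not trained values:

def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (probability of the best path ending in state s at step t, that path)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 0.0), [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            # Pick the best previous state to transition from.
            prob, path = max(
                (V[t - 1][ps][0] * trans_p[ps][s] * emit_p[s].get(obs[t], 0.0),
                 V[t - 1][ps][1] + [s])
                for ps in states
            )
            V[t][s] = (prob, path)
    return max(V[-1].values())  # (probability, best tag sequence)

states = ("NOUN", "VERB")
start_p = {"NOUN": 0.6, "VERB": 0.4}
trans_p = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.6, "VERB": 0.4}}
emit_p = {"NOUN": {"dogs": 0.5, "bark": 0.1}, "VERB": {"dogs": 0.1, "bark": 0.6}}
print(viterbi(["dogs", "bark"], states, start_p, trans_p, emit_p))
# (0.126, ['NOUN', 'VERB'])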
How would you extend the chunker to handle more complex syntactic structures?
By adding rules for more phrase types (verb and prepositional phrases), cascading the chunk grammar over several passes, or replacing the regular-expression grammar with a trained classifier-based chunker.
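A minimal cascaded-chunking sketch using NLTK's RegexpParser; the grammar rules and example sentence are illustrative assumptions, not the original lab's grammar:

import nltk

grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>+}   # noun phrase: optional determiner, adjectives, nouns
  PP: {<IN><NP>}            # prepositional phrase: preposition followed by an NP
  VP: {<VB.*><NP|PP>*}      # verb phrase: verb followed by NPs and/or PPs
"""
parser = nltk.RegexpParser(grammar, loop=2)  # loop=2 re-applies the cascade

tagged = [("the", "DT"), ("dog", "NN"), ("sat", "VBD"),
          ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(parser.parse(tagged))  # nested tree with NP, PP, and VP chunks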