NLP Midsem Paper Jan 2024 Regular Exam
Mid-Semester Test
(EC-2 Regular Paper)
B.
i. Given the following toy corpus, calculate all the bigram probabilities. [2 marks]
<s> I like apples </s>
<s> Apple is good for health </s>
<s> Apple is in red colour </s>
ii. For the above training data in (i), calculate the probability of the sentence below using raw bigram
probabilities and using Laplace smoothing: <s> I am eating apples </s> [2 marks]
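A minimal sketch (not part of the question) of how parts (i) and (ii) could be checked programmatically. Lowercasing, counting <s> and </s> in the vocabulary size, and the treatment of unseen tokens such as "am" and "eating" are assumptions of this sketch, not requirements of the question.

from collections import Counter

# Toy corpus from part (i); tokens are lowercased here (an assumption).
corpus = [
    "<s> i like apples </s>",
    "<s> apple is good for health </s>",
    "<s> apple is in red colour </s>",
]

unigram_counts, bigram_counts = Counter(), Counter()
for line in corpus:
    tokens = line.split()
    unigram_counts.update(tokens)
    bigram_counts.update(zip(tokens, tokens[1:]))

V = len(unigram_counts)  # vocabulary size used for add-one smoothing

def bigram_prob(prev, word, laplace=False):
    # P(word | prev): raw MLE, or Laplace (add-one) smoothed.
    if laplace:
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + V)
    if unigram_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / unigram_counts[prev]

# Part (ii): the test sentence contains unseen bigrams, so the raw probability is 0,
# while Laplace smoothing yields a small non-zero probability.
test = "<s> i am eating apples </s>".split()
raw = smoothed = 1.0
for prev, word in zip(test, test[1:]):
    raw *= bigram_prob(prev, word)
    smoothed *= bigram_prob(prev, word, laplace=True)
print("raw:", raw, "laplace:", smoothed)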
If the input layer "X" denotes the one-hot encoding of the vocabulary, "e" is the embedding layer,
"h1" and "h2" are hidden layers, and "Y" is the output layer emitting a continuous-valued output, identify
no more than 3 issues/errors in the architecture and suggest modifications to suit the use case
requirement below. If no corrections are required, then state "No Error" explicitly.
Use Case: Neural network in language modelling for sentence completion
Given a training corpus with the vocabulary below, each word vectorized with four dimensions, and the
following test sentence, the neural network should be able to predict the best next word
to fill in the blank of the test sentence by analyzing a context window of five tokens.
Vocabulary: {he, she, bat, tree, wooden, park, playing, saw, was, a, on, the, with, in, morning, evening}
Test Sentence: "on a morning he was playing in the ________"
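A hedged sketch of one architecture that fits this use case: the output layer is a softmax classifier over the vocabulary rather than a continuous-valued output. The vocabulary, embedding dimension (4) and context window (5) come from the question; the hidden-layer widths and the use of PyTorch are assumptions made for illustration.

import torch
import torch.nn as nn

VOCAB = ["he", "she", "bat", "tree", "wooden", "park", "playing", "saw",
         "was", "a", "on", "the", "with", "in", "morning", "evening"]
V, EMB_DIM, CONTEXT = len(VOCAB), 4, 5

class FFLanguageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(V, EMB_DIM)        # layer "e"
        self.h1 = nn.Linear(CONTEXT * EMB_DIM, 32)   # hidden layer "h1" (width assumed)
        self.h2 = nn.Linear(32, 32)                  # hidden layer "h2" (width assumed)
        self.out = nn.Linear(32, V)                  # layer "Y": scores over the vocabulary

    def forward(self, context_ids):                  # context_ids: (batch, CONTEXT)
        e = self.embed(context_ids).flatten(1)       # concatenate the 5 context embeddings
        h = torch.relu(self.h2(torch.relu(self.h1(e))))
        return torch.log_softmax(self.out(h), dim=-1)  # distribution over the next word

# Usage: score the next word for "on a morning he was playing in the ____",
# using the last five tokens as the context window.
word2id = {w: i for i, w in enumerate(VOCAB)}
context = ["he", "was", "playing", "in", "the"]
ids = torch.tensor([[word2id[w] for w in context]])
log_probs = FFLanguageModel()(ids)
print(VOCAB[log_probs.argmax(dim=-1).item()])  # untrained, so the prediction is arbitrary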
B. The number of times each word appears in different documents is given in the table below.
Calculate the TF-IDF value for each term in D1. [1.5 marks]
Find the word embedding for each term using the TF-IDF values. Find which words are closest using the
TF-IDF word embeddings. [0.5 marks]
Which documents are more similar to each other? [0.5 marks]
What is the disadvantage of using TF-IDF values for word embeddings? [0.5 marks]
Term         D1   D2   D3
NLP          10    0    0
is           50   66   89
extremely    20   22   12
interesting  30   32   11
course       20    0    0
Answers:
Word embedding for each term using the TF-IDF values:
NLP         [0.496, 0, 0]
is          [0, 0, 0]
extremely   [0, 0, 0]
interesting [0, 0, 0]
course      [0.63, 0, 0]
Find which words are closest using the TF-IDF word embeddings. [0.5 marks]
Which documents are more similar to each other? [0.5 marks] D2 and D3
What is the disadvantage of using TF-IDF values for word embeddings? [0.5 marks] Sparsity
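A short check of the embeddings listed above. The weighting assumed here is tf = log10(1 + count) and idf = log10(N / df), which is one common convention and the one that reproduces the values 0.496 and 0.63; other TF-IDF variants give different numbers.

import math

# Counts from the table above; rows are terms, columns are D1, D2, D3.
counts = {
    "NLP":         [10,  0,  0],
    "is":          [50, 66, 89],
    "extremely":   [20, 22, 12],
    "interesting": [30, 32, 11],
    "course":      [20,  0,  0],
}
N = 3  # number of documents

def tfidf(term):
    row = counts[term]
    df = sum(1 for c in row if c > 0)       # document frequency
    idf = math.log10(N / df)                # zero for terms occurring in every document
    return [round(math.log10(1 + c) * idf, 3) for c in row]

for term in counts:
    print(term, tfidf(term))
# NLP    -> [0.496, 0.0, 0.0]
# course -> [0.63, 0.0, 0.0]
# "is", "extremely", "interesting" occur in all documents, so idf = 0 and the vectors are all zeros.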
a) Generate the training dataset for the input target word "played", a context window of 1 next
word, and hyperparameter value k = 2 for the negative sampling task. Use the information
available in the question.
b) Calculate the error for the above dataset for only the first iteration of skip-gram training, with only
one hidden layer.
c) Explain, in no more than 40 words, why skip-gram training was modified from a multiclass
to a binary classification task.
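A hypothetical sketch of the dataset generation in part (a). The actual corpus sentence from the question is not reproduced in this extract, so the sentence below and the uniform negative sampling are placeholders for illustration only.

import random

sentence = ["the", "children", "played", "football", "in", "the", "park"]  # placeholder sentence
target, k = "played", 2            # k negative samples per positive pair
vocab = sorted(set(sentence))

pairs = []
idx = sentence.index(target)
context = sentence[idx + 1]        # context window of 1 next word, as in the question
pairs.append((target, context, 1)) # positive example, label 1

negatives = random.sample([w for w in vocab if w not in (target, context)], k)
pairs.extend((target, w, 0) for w in negatives)  # k negative examples, label 0

for p in pairs:
    print(p)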
Question 4. [5 Marks]
Find the appropriate POS tag for the word "cook" in the sentence "He will cook the food", using a
statistical model with the bigram assumption.
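A sketch of the bigram (HMM) decision for "cook": choose the tag t maximizing P(t | previous tag) x P("cook" | t). The probability values below are placeholders, not the ones given in the exam paper, so only the form of the computation is illustrated.

# "will" is tagged as a modal (MD); compare candidate tags for "cook".
prev_tag = "MD"
transition = {"VB": 0.8, "NN": 0.2}   # P(t | MD), placeholder values
emission   = {"VB": 0.6, "NN": 0.4}   # P("cook" | t), placeholder values

scores = {t: transition[t] * emission[t] for t in transition}
best = max(scores, key=scores.get)
print(best, scores)                   # with these placeholders, VB wins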
Question 5. [5 Marks]
By using the Viterbi algorithm, fill in the Viterbi table for the sentence "He will fight". The tag transition
probabilities and the word likelihood probabilities for this corpus are as follows:
Word likelihood probabilities   he    will   fight
MD                               0    0.8    0
NN                               0    0.2    0.4
VB                               0    0      0.6
PRP                              1    0      0
Viterbi Table   HE    WILL   FIGHT
NN
VB
MD
PRP
Note:
PRP: PERSONAL PRONOUN
MD: MODAL
VB: VERB, BASE FORM
NN: NOUN, SINGULAR OR MASS
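A sketch of how the Viterbi table could be filled for "He will fight". The emission probabilities are the word-likelihood table above; the tag transition probabilities are not reproduced in this extract, so the transition values below (including the sentence-start distribution) are placeholders and the resulting numbers are illustrative only.

tags = ["NN", "VB", "MD", "PRP"]
words = ["he", "will", "fight"]

# Emission probabilities P(word | tag) from the table above.
emission = {
    "MD":  {"he": 0.0, "will": 0.8, "fight": 0.0},
    "NN":  {"he": 0.0, "will": 0.2, "fight": 0.4},
    "VB":  {"he": 0.0, "will": 0.0, "fight": 0.6},
    "PRP": {"he": 1.0, "will": 0.0, "fight": 0.0},
}
# Placeholder transition probabilities P(tag | previous tag); "<s>" marks the sentence start.
transition = {
    "<s>": {"NN": 0.2, "VB": 0.1, "MD": 0.1, "PRP": 0.6},
    "PRP": {"NN": 0.1, "VB": 0.2, "MD": 0.6, "PRP": 0.1},
    "MD":  {"NN": 0.1, "VB": 0.7, "MD": 0.1, "PRP": 0.1},
    "NN":  {"NN": 0.2, "VB": 0.4, "MD": 0.2, "PRP": 0.2},
    "VB":  {"NN": 0.3, "VB": 0.1, "MD": 0.1, "PRP": 0.5},
}

# viterbi[t][i] = best probability of any tag sequence ending in tag t at word i.
viterbi = {t: [0.0] * len(words) for t in tags}
for t in tags:
    viterbi[t][0] = transition["<s>"][t] * emission[t][words[0]]
for i in range(1, len(words)):
    for t in tags:
        best_prev = max(viterbi[p][i - 1] * transition[p][t] for p in tags)
        viterbi[t][i] = best_prev * emission[t][words[i]]

for t in tags:
    print(t, [round(v, 4) for v in viterbi[t]])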