
Master on Data Science Universitat Politècnica de Catalunya

Mining Unstructured Data - MUD

Final Exam, June 17th, 2024

——– PART A ———

Exercise 1. (3 points)
Given the following morphologically analyzed sentence,
I        saw      bats     yesterday
PRP      VBD      NNS      NN
NN       NN       JJ       ADV
         VBZ
and an HMM model partially represented by the following matrices,
A (transition probabilities; columns PRP, JJ, NN, NNS, VBZ, VBD, ADV; the non-empty entries of each
row are listed in column order):

*     0.4   0.1   0.3   0.1
PRP   0.2   0.3   0.1
JJ    0.8   0.2
NN    0.2   0.3   0.4   0.1
NNS   0.1   0.5
VBZ   0.2   0.2   0.3   0.2
VBD   0.1   0.4   0.2   0.2
ADV   0.1   0.1   0.1   0.2   0.3

B (emission probabilities):

        I      saw    bats   yesterday
PRP     1
JJ                    0.2
NN      0.1    0.4           0.1
NNS                   0.5
VBZ            0.3
VBD            0.5
ADV                          1

a) Apply the Viterbi algorithm to get the best POS-tag sequence. Provide the whole dynamic table with
all the information required to obtain the resulting POS-tag sequence.
b) What is the resulting best POS-tag sequence and what is its probability? The answer must be justified
by means of the information in the dynamic table; otherwise, it will be considered wrong.
c) Is the resulting POS-tag sequence correct? Briefly justify your answer.

Solution
a) The table:

      I              saw            bats            yesterday

PRP   0.4*1
      δ=0.4

JJ                                  0.06*0.1*0.2
                                    δ=0.0012
                                    φ=VBD

NN                                                  max(0.0012*0.8*0.1, 0.006*0.1*0.1)
                                                    δ=max(0.000096, 0.00006)=0.000096
                                                    φ=JJ

NNS                                 0.06*0.2*0.5
                                    δ=0.006
                                    φ=VBD

VBZ

VBD                  0.4*0.3*0.5
                     δ=0.06
                     φ=PRP

ADV

b) Result: PRP VBD JJ NN, with probability 0.000096. It is obtained by taking the best δ in the last
column of the table (NN, δ=0.000096) and following the back-pointers φ: NN ← JJ ← VBD ← PRP.
c) No. "Yesterday" is an adverb (ADV) in the context of the sentence. Note that "saw bats yesterday"
refers to a set of animals seen in a period of time.
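As a complement to the hand computation, the following is a minimal Python sketch of the Viterbi recursion for this exercise. It only includes the HMM entries that actually appear in the solution (the remaining entries of A are not needed here and are treated as 0), so it illustrates the computation rather than reproducing the full exam model.

```python
# Minimal Viterbi sketch for Exercise 1. Missing transition entries are treated
# as probability 0; the data structures and names are illustrative.

words = ["I", "saw", "bats", "yesterday"]
tags = ["PRP", "JJ", "NN", "NNS", "VBZ", "VBD", "ADV"]

# Transition probabilities A[prev][next] (only the entries used in the solution).
A = {
    "*":   {"PRP": 0.4},
    "PRP": {"VBD": 0.3},
    "VBD": {"JJ": 0.1, "NNS": 0.2},
    "JJ":  {"NN": 0.8},
    "NNS": {"NN": 0.1},
}

# Emission probabilities B[tag][word], as given in the exam statement.
B = {
    "PRP": {"I": 1.0},
    "JJ":  {"bats": 0.2},
    "NN":  {"I": 0.1, "saw": 0.4, "yesterday": 0.1},
    "NNS": {"bats": 0.5},
    "VBZ": {"saw": 0.3},
    "VBD": {"saw": 0.5},
    "ADV": {"yesterday": 1.0},
}

def viterbi(words, tags, A, B):
    delta = [{} for _ in words]   # delta[t][tag] = best probability of reaching tag at t
    phi = [{} for _ in words]     # phi[t][tag]   = best previous tag (back-pointer)
    for tag in tags:              # initialisation with the start symbol *
        p = A.get("*", {}).get(tag, 0.0) * B.get(tag, {}).get(words[0], 0.0)
        if p > 0:
            delta[0][tag] = p
    for t in range(1, len(words)):    # recursion
        for tag in tags:
            emit = B.get(tag, {}).get(words[t], 0.0)
            if emit == 0.0:
                continue
            best_prev, best_p = None, 0.0
            for prev, p_prev in delta[t - 1].items():
                p = p_prev * A.get(prev, {}).get(tag, 0.0) * emit
                if p > best_p:
                    best_prev, best_p = prev, p
            if best_prev is not None:
                delta[t][tag] = best_p
                phi[t][tag] = best_prev
    last = max(delta[-1], key=delta[-1].get)   # backtrace from the best final state
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(phi[t][path[-1]])
    return list(reversed(path)), delta[-1][last]

print(viterbi(words, tags, A, B))   # (['PRP', 'VBD', 'JJ', 'NN'], ~9.6e-05)
```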

Exercise 2. (2 points)
Given the following sentence with the result of a POS tagger:

John saw my brother playing with his glasses


NNP VBD PRP$ NN VBG IN PRP$ NNS

NNP: proper noun; VBD: verb, past; PRP$: possessive pronoun; NN/NNS: singular/plural noun; VBG: verb, gerund; IN: preposition

a) We want to learn a CRF model able to recognize noun-phrase chunks. Design one correct feature
template useful for recognizing more than one of the noun-phrase chunks occurring in the sentence.
Derive one correct and useful feature function.

b) Draw the parse trees derived by the following PCFG for the sentence. Which is the best parse tree?
What is the result of using the CKY algorithm with this grammar? Briefly justify your answers.

S  → NP VP (1.0)         NNP  → John (1.0)
NP → PRP$ NN (0.5)       NN   → brother (1.0)
NP → PRP$ NNS (0.3)      NNS  → glasses (1.0)
NP → NNP (0.1)           PRP$ → my (0.6)
NP → NP AP (0.1)         PRP$ → his (0.4)
PP → IN NP (1.0)         IN   → with (1.0)
AP → VBG PP (1.0)        VBD  → saw (0.7)
VP → VBD NP (0.4)        VBG  → playing (0.3)
VP → VP AP (0.6)

Solution
a) A template is correct if (i) it is defined only in terms of its parameters and metaparameters, and
(ii) it necessarily involves the current state, x_t. Obviously, the observations must be words of the
modeled language and the states must take values from the BIO labels. A template is considered a
priori useful if it makes sense for the specific task, even if the associated λ-value turns out to be zero
after learning the model.
A possible correct and a priori useful template could be the following:

f_{a,b}(x_{t-1}, x_t, W, t) = 1 if x_t = a and pos(w_t) = b ; 0 otherwise

Note that the current state, x_t, is defined, and the template is correct if the values for meta-
parameter a are BIO labels. Also note that the template is a priori useful for recognizing 2 noun-
phrase chunks (as required) because we can derive feature functions like the following one, with
which we define that a noun-phrase chunk can start with a particular PoS tag (PRP$):

f_{B,PRP$}(x_{t-1}, x_t, W, t) = 1 if x_t = B and pos(w_t) = PRP$ ; 0 otherwise

This feature function is useful to identify "my" and "his" as starting words of the noun-phrase
chunks "my brother" and "his glasses", because both words are labeled with the PoS tag PRP$. With
the Viterbi algorithm, the combination of this feature with others would be optimized to achieve the
recognition of optimal BIO sequences.
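For concreteness, here is a small Python sketch (names and data structures are illustrative, not tied to any particular CRF library) that instantiates the template for a = B and b = PRP$ and shows where it fires on the exam sentence:

```python
# Sketch of the feature template f_{a,b}(x_{t-1}, x_t, W, t) from the solution.

def make_feature(a, b):
    """Return f_{a,b}: 1 if the current label is a and the current PoS tag is b."""
    def f(x_prev, x_t, W, t):
        _, pos = W[t]          # W is the observed sentence as (word, PoS-tag) pairs
        return 1 if (x_t == a and pos == b) else 0
    return f

f_B_PRPS = make_feature("B", "PRP$")

# "John saw my brother playing with his glasses" with the given PoS tags.
W = [("John", "NNP"), ("saw", "VBD"), ("my", "PRP$"), ("brother", "NN"),
     ("playing", "VBG"), ("with", "IN"), ("his", "PRP$"), ("glasses", "NNS")]

# The feature fires at t=2 ("my") and t=6 ("his"), i.e. at the first tokens of
# the two noun-phrase chunks "my brother" and "his glasses".
print([t for t in range(len(W)) if f_B_PRPS(None, "B", W, t) == 1])   # [2, 6]
```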

b) The parse trees that can be derived from the grammar are the following:

The first tree attaches the AP inside the object noun phrase (rule NP → NP AP):

(S (NP(0.1) (NNP John))
   (VP(0.4) (VBD saw)
            (NP(0.1) (NP(0.5) (PRP$ my(0.6)) (NN brother))
                     (AP (VBG playing)
                         (PP (IN with)
                             (NP(0.3) (PRP$ his(0.4)) (NNS glasses)))))))

The second tree attaches the AP to the verb phrase (rule VP → VP AP):

(S (NP(0.1) (NNP John))
   (VP(0.6) (VP(0.4) (VBD saw)
                     (NP(0.5) (PRP$ my(0.6)) (NN brother)))
            (AP (VBG playing)
                (PP (IN with)
                    (NP(0.3) (PRP$ his(0.4)) (NNS glasses))))))
Their probabilities are 1.44e-4 and 8.64e-4, respectively, so the best tree is the second one.
Note that the grammar is not in CNF, so CKY cannot be applied. It is acceptable to answer that
CKY cannot be applied or that CKY returns an ERROR. It is not acceptable to answer that the
grammar can be transformed into CNF and CKY then applied, because the transformed grammar
is a different grammar from the one given.
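The CNF remark can be checked mechanically; below is a minimal Python sketch (rule tuples written ad hoc for this grammar) that tests whether every rule has the form A → B C or A → w. The unit production NP → NNP is the one that already violates CNF.

```python
# Check whether the exam PCFG is in Chomsky Normal Form (CNF).
# Rules are written as (lhs, rhs) tuples; terminals are the words of the sentence.

rules = [
    ("S", ("NP", "VP")), ("NP", ("PRP$", "NN")), ("NP", ("PRP$", "NNS")),
    ("NP", ("NNP",)), ("NP", ("NP", "AP")), ("PP", ("IN", "NP")),
    ("AP", ("VBG", "PP")), ("VP", ("VBD", "NP")), ("VP", ("VP", "AP")),
    ("NNP", ("John",)), ("NN", ("brother",)), ("NNS", ("glasses",)),
    ("PRP$", ("my",)), ("PRP$", ("his",)), ("IN", ("with",)),
    ("VBD", ("saw",)), ("VBG", ("playing",)),
]

nonterminals = {lhs for lhs, _ in rules}

def in_cnf(lhs, rhs):
    # CNF allows A -> B C (two nonterminals) or A -> w (a single terminal).
    if len(rhs) == 2:
        return all(symbol in nonterminals for symbol in rhs)
    return len(rhs) == 1 and rhs[0] not in nonterminals

violations = [(lhs, rhs) for lhs, rhs in rules if not in_cnf(lhs, rhs)]
print(violations)   # [('NP', ('NNP',))] -- the unit production blocks plain CKY
```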

——– PART B ———

Exercise 3. (3 points)
You are evaluating a set of word embedding models for various NLP tasks. You are using extrinsic eval-
uation methods to assess the performance of these models. Your analysis reveals that the performance
of the models varies significantly depending on the training parameters, the specific NLP task, and the
nature of the text data.

Explain the following observations you made while evaluating the different models, providing a
theoretical justification for each. You can provide examples to illustrate your points.
(a) Word embeddings trained with a larger window size tend to perform better in semantic similarity
tasks, while those trained with a smaller window size excel in syntactic analogy tasks.

(b) Word embeddings based on TF-IDF work better than Word2Vec for author classification in poems.
(c) FastText works better for topic extraction in tweets whereas Word2Vec works better for topic extrac-
tion in paper abstracts.
(d) PPMI-based word embeddings outperform Word2Vec for identifying the semantic similarity of rare
words in a corpus.
(e) Sentence embeddings generated by averaging GloVe word embeddings outperform BERT embed-
dings for identifying duplicate questions in a community forum.
(f) Contextual embeddings from the same BERT model but obtained from different layers are suitable
for different tasks: embeddings from earlier layers are better at part-of-speech tagging, while those
from later layers excel at sentence classification.

Solution
(a) Word embeddings trained with a bigger window aggregate more information regarding the semantic
context of each word, but lose positional information within a sentence. Smaller windows will
capture the function of a word within a sentence, making them suitable for syntactic analogy tasks
where the relative position of words is key. For example, a model trained on a large window might
learn that ”king” and ”queen” are semantically similar due to their frequent co-occurrence in royal
contexts. However, a model trained on a smaller window would be better at recognising that ”king”
is to ”rule” as ”chef” is to ”cook”.
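One way to observe this effect in practice is to train two Word2Vec models that differ only in window size. The sketch below assumes gensim 4.x and uses a tiny toy corpus just so it runs end to end; real conclusions would require a large corpus.

```python
# Compare a "semantic" (large-window) and a "syntactic" (small-window) Word2Vec model.
from gensim.models import Word2Vec

# Toy corpus, for illustration only.
sentences = [
    ["the", "king", "ruled", "the", "kingdom", "with", "the", "queen"],
    ["the", "queen", "ruled", "the", "kingdom", "with", "the", "king"],
    ["the", "chef", "cooked", "dinner", "in", "the", "kitchen"],
    ["the", "chef", "cooked", "lunch", "in", "the", "kitchen"],
]

semantic_model  = Word2Vec(sentences, vector_size=50, window=10, min_count=1, sg=1, epochs=50, seed=1)
syntactic_model = Word2Vec(sentences, vector_size=50, window=2,  min_count=1, sg=1, epochs=50, seed=1)

# On a large corpus, the large-window model tends to rank topically related words
# higher, while the small-window model favours words with the same syntactic role.
print(semantic_model.wv.most_similar("king", topn=3))
print(syntactic_model.wv.most_similar("king", topn=3))
```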
(b) TF-IDF based word embeddings capture the importance of words specific to a particular document
or author in a corpus. This makes them suitable for tasks like author classification in poems, where
stylistic choices and unique vocabulary are strong indicators of authorship. Word2Vec, on the other
hand, learns embeddings based on the co-occurrence of words across the corpus and might not
be as effective in capturing individual writing styles present in a limited set of poems.
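A hedged sketch of the TF-IDF route, assuming scikit-learn and a tiny hypothetical set of labeled poem fragments (neither is part of the exam):

```python
# TF-IDF features + a linear classifier for authorship classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data: each poem fragment paired with its author label.
poems = [
    "Shall I compare thee to a summer's day",
    "Because I could not stop for Death",
    "Thou art more lovely and more temperate",
    "He kindly stopped for me",
]
labels = ["shakespeare", "dickinson", "shakespeare", "dickinson"]

# TF-IDF gives high weight to author-specific vocabulary (e.g. "thee", "thou"),
# which is exactly the stylistic signal useful for authorship classification.
classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
classifier.fit(poems, labels)
print(classifier.predict(["Thou art a summer's day"]))
```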
(c) FastText considers character n-grams within words, while Word2Vec works at the level of whole
words. This makes FastText more robust to noisy text such as tweets, which often include
misspellings and informal language, as it can still capture meaning from partial word representations.
This is necessary for topic extraction in tweets. In contrast, Word2Vec’s reliance on full-word con-
texts makes it more suitable for topic extraction in paper abstracts, which are generally written using
formal language and consistent terminology.
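The subword point can be illustrated with a minimal gensim sketch (toy corpus, illustrative only): FastText still produces a vector for a misspelled, out-of-vocabulary word from its character n-grams.

```python
from gensim.models import FastText

# Toy "tweets"; real topic extraction would use a much larger corpus.
tweets = [
    ["loving", "the", "new", "phone", "battery"],
    ["this", "phone", "battery", "is", "amazing"],
    ["terrible", "battery", "life", "on", "my", "phone"],
]

model = FastText(tweets, vector_size=50, window=3, min_count=1, epochs=20)

# "batery" never appears in the corpus, but FastText builds a vector for it
# from the character n-grams it shares with "battery".
print("batery" in model.wv.key_to_index)            # False: out of vocabulary
print(model.wv.similarity("batery", "battery"))     # still computable from shared n-grams
```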

(d) PPMI-based word embeddings address the issue of rare words by normalizing the joint probability of
a word-context pair by the product of the individual probabilities, giving more weight to statistically
significant relationships. This makes
them particularly effective for identifying the semantic similarity of rare words, which might not
co-occur frequently enough in a corpus for Word2Vec to learn accurate representations. For exam-
ple, PPMI would be more likely to identify the similarity between ”serendipitous” and ”fortuitous”
even if they appear infrequently, as their co-occurrence is statistically significant compared to their
individual occurrences.
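The PPMI weighting itself is easy to make concrete. Below is a minimal numpy sketch over a small hypothetical word-context co-occurrence matrix (the counts are invented for illustration):

```python
import numpy as np

# Hypothetical co-occurrence counts (rows: target words, columns: context words).
words    = ["serendipitous", "fortuitous", "cat"]
contexts = ["discovery", "encounter", "meow"]
counts = np.array([
    [2.0, 1.0, 0.0],   # serendipitous
    [1.0, 2.0, 0.0],   # fortuitous
    [0.0, 0.0, 5.0],   # cat
])

total = counts.sum()
p_wc = counts / total                     # joint probabilities P(w, c)
p_w  = p_wc.sum(axis=1, keepdims=True)    # marginals P(w)
p_c  = p_wc.sum(axis=0, keepdims=True)    # marginals P(c)

with np.errstate(divide="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))     # PMI(w, c) = log2 P(w,c) / (P(w) P(c))
ppmi = np.maximum(pmi, 0.0)               # PPMI keeps only the positive part

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Rare words that systematically co-occur with the same contexts end up with
# similar PPMI rows, hence a high cosine similarity, unlike the unrelated "cat".
print(np.round(ppmi, 2))
print(cosine(ppmi[0], ppmi[1]), cosine(ppmi[0], ppmi[2]))
```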
(e) Averaging GloVe word embeddings creates a simple sentence representation that captures the overall
semantic content. This is suitable for identifying duplicate questions in a community forum, where
the focus is on semantic equivalence rather than subtle nuances in meaning or word order. BERT
embeddings, while powerful, can be sensitive to word order and context, potentially overfitting to
small variations in duplicate questions. For example, ”How do I bake a cake?” and ”What is the recipe
for a cake?” are semantically similar but have different structures that BERT might overemphasize.

An alternative and also valid answer to this question comes from the fact that BERT needs to be
fine-tuned for semantic similarity tasks, so a general pre-trained MLM BERT model might not be
suitable for the task, whereas GloVe word embeddings should work out of the box.
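A minimal sketch of the averaging approach, assuming gensim is installed (api.load downloads the pre-trained "glove-wiki-gigaword-50" vectors on first use):

```python
# Sentence embeddings by averaging pre-trained GloVe vectors, with cosine
# similarity as a duplicate-question score.
import numpy as np
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-50")

def sentence_embedding(sentence):
    # Average the vectors of the in-vocabulary tokens (out-of-vocabulary tokens are ignored).
    vectors = [glove[w] for w in sentence.lower().split() if w in glove]
    return np.mean(vectors, axis=0)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

q1 = "how do i bake a cake"
q2 = "what is the recipe for a cake"
q3 = "how do i change a flat tire"

print(cosine(sentence_embedding(q1), sentence_embedding(q2)))   # expected: higher (near-duplicates)
print(cosine(sentence_embedding(q1), sentence_embedding(q3)))   # expected: lower (different topic)
```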
(f) BERT’s layered architecture allows it to capture different levels of linguistic information. Earlier
layers tend to encode more syntactic information, as they are closer to the word-level input. As the
information propagates through the layers, it becomes more abstract and semantically rich. As a
result, later layers are better suited for tasks requiring sentence-level understanding, such as sentiment
analysis or sentence classification. For example, earlier layers might be good at identifying the part
of speech of ”running” in ”I am running late”, while later layers would be better at understanding
the overall meaning of the sentence conveying lateness.
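This can be inspected directly with the transformers library (sketch below; it assumes the library is installed and downloads bert-base-uncased on first use):

```python
# Extract token representations from an early and a late encoder layer of BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("I am running late", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states is a tuple: the embedding layer plus one tensor per encoder
# layer (13 entries for BERT-base), each of shape (batch, tokens, hidden_size).
hidden_states = outputs.hidden_states
early = hidden_states[2]     # lower layer: more surface/syntactic information (e.g. PoS)
late  = hidden_states[12]    # top layer: more abstract, sentence-level information
print(len(hidden_states), early.shape, late.shape)
```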

Exercise 4. (2 points)
You are fine-tuning a large language model (LLM) pre-trained on a large, general-purpose dataset to
develop a specialized home assistant chatbot. During the fine-tuning process, you observe that while the
model’s performance on the home assistant tasks initially improves, it then begins to degrade rapidly.
Furthermore, you notice a significant drop in the model’s performance on the original, general-purpose
tasks it was initially trained on. This phenomenon is known as catastrophic forgetting.
(a) Explain two potential causes for this catastrophic forgetting in your fine-tuned LLM.
(b) Propose two strategies to mitigate or prevent catastrophic forgetting and preserve the LLM’s perfor-
mance on both the original and new tasks.
(c) You decide to improve your home assistant chatbot by incorporating user feedback. How does
Reinforcement Learning from Human Feedback (RLHF) differ from traditional fine-tuning in this
context?

Solution
(a) Potential Causes for Catastrophic Forgetting (only two are requested):
• Overwriting of Shared Representations: Fine-tuning on a specialized dataset can overwrite the
general knowledge representations learned during pre-training, especially if the new data is
significantly different in domain or style. This is because both pre-training and fine-tuning
tasks often share the same underlying model parameters, especially in the lower layers.
• Dataset bias: The home assistant dataset likely has a different data distribution compared to
the general-purpose dataset. This bias can cause the model to overfit to the specific patterns in
the home assistant data, degrading its ability to generalize to the context of the original tasks.
• Insufficient Training Data: If the fine-tuning dataset for the home assistant is relatively small,
the model may not have enough examples to learn the new task effectively without sacrificing
its previously acquired knowledge.
• Aggressive Optimization: Using a high learning rate or training for too many epochs during
fine-tuning can lead to significant changes in the model’s parameters, potentially overwriting
the more subtle representations learned during pre-training.
(b) Strategies to Mitigate Catastrophic Forgetting (only two are requested):

• Parameter Freezing: Instead of fine-tuning all the model parameters, we could freeze the lower
layers responsible for general language understanding and only train the upper layers on the
new dataset. This preserves the pre-trained knowledge while allowing specialization for the
new task. A minimal code sketch of this strategy is given after this list.
• Adaptation techniques: Instead of updating all parameters of a layer, only fine-tune a subset
of them, or add additional weights. This includes techniques such as bias tuning, adapter
modules/matrices and Low-rank adaptation (LoRA).

• Regularization Techniques: Applying regularization methods like L2 regularization or dropout
during fine-tuning can prevent the model from overfitting the new task and hence deviating too
much from the original knowledge. Other more specialized regularization techniques like Elas-
tic Weight Consolidation (EWC) or Synaptic Intelligence (SI) can also be used. These discourage
the model from drastically changing the weights important for the original tasks.
• Multi-Task Learning: Train the LLM on both the original and new tasks simultaneously. This
can be done by interleaving data from both datasets during training.
• Proximal Policy Optimization (PPO): PPO can be applied to mitigate catastrophic forgetting.
PPO limits the update size of the model parameters during fine-tuning, ensuring that the new
knowledge is integrated gradually without drastically deviating from the original distribution.
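As announced above, here is a minimal sketch of the parameter-freezing strategy, using GPT-2 as a stand-in for the pre-trained LLM (the exercise does not name a specific model) and assuming the transformers library is installed:

```python
# Freeze the lower transformer blocks before fine-tuning on the home-assistant data.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")   # 12 transformer blocks

# The lower blocks carry most of the general-purpose language knowledge, so we
# freeze them; only the upper blocks keep being updated during fine-tuning.
for block in model.transformer.h[:6]:
    for param in block.parameters():
        param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")
```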

(c) RLHF vs. Traditional Fine-Tuning:

• Traditional fine-tuning relies on a fixed dataset with predefined labels/sequences to adjust the
model parameters and improve performance on the specific task. In our context (home assis-
tant), this involves training the model on a dataset of user queries paired with expected chatbot
responses.
• RLHF incorporates human feedback directly into the training loop. Instead of relying only
on predefined labels/sequences, RLHF utilizes human evaluation to assess the quality of the
chatbot’s responses. This feedback, usually given as rankings or preferences between different
responses, is used as a reward signal to train a reward model. The reward model then guides the
chatbot’s learning process through reinforcement learning, encouraging it to generate responses
that align better with human preferences.
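As a sketch of the reward-model step described above, the snippet below implements a pairwise preference loss that pushes the reward of the human-preferred response above that of the rejected one. The reward model is abstracted as a toy module over fixed-size encodings; all names are illustrative.

```python
# Toy reward model trained with a pairwise ranking loss: -log sigmoid(r_chosen - r_rejected).
import torch
import torch.nn as nn

class RewardHead(nn.Module):
    """Maps a (prompt, response) encoding to a scalar reward."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, encoding):                  # encoding: (batch, hidden_size)
        return self.score(encoding).squeeze(-1)   # one scalar reward per example

reward_model = RewardHead()

# Placeholder encodings of (prompt, chosen response) and (prompt, rejected response);
# in practice these would come from the chatbot's own encoder.
chosen_enc   = torch.randn(4, 768)
rejected_enc = torch.randn(4, 768)

r_chosen   = reward_model(chosen_enc)
r_rejected = reward_model(rejected_enc)

loss = -nn.functional.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()   # gradients would then be used to update the reward model
print(float(loss))
```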
