2023-24 ML End-Semester Make-Up QP Answer Keys
--------------------------------------------------------------------------------------------------------------------------
Course No. : DSECLZG565/ AIMLCLZ565
Course Title : Machine Learning
Nature of Exam : Open Book
Weightage : 40%
Duration : 2 Hours
Date of Exam :
Note:
1. Please follow all the Instructions to Candidates given on the cover page of the answer book.
2. All parts of a question should be answered consecutively. Each answer should start from a fresh
page.
3. Assumptions made if any, should be stated clearly at the beginning of your answer.
Question 1:
Consider the following dataset for text classification, where three training instances are given with
corresponding classifications into the ‘+’ or ‘-’ category: [5]
Showing all intermediate calculations, find the appropriate classification for the test instance: “Chinese
Kannada Chinese” using the Multinomial NB text classification approach.
Question 2:
You have been tasked with creating a discriminative model using a linear Support Vector Machine
method. Consider the training dataset below for training the model, where X1 and X2 are
independent features and Y is the target variable. [4]
X1 X2 Y
3 2 Positive
5 3 Positive
-2 -2 Negative
4 4 Positive
3 -1 Positive
1 0 Negative
-1 -1 Negative
0 2 Negative
1 4 Negative
-1 2 Negative
4 5 Positive
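The data above is linearly separable, which can be checked directly. The sketch below is only an illustration: the hyperplane X1 = 2 (w = (1, 0), b = -2) is a candidate found by inspection, not a value taken from the answer key, and the +1/-1 encoding of Positive/Negative is an assumption. It verifies the hard-margin constraint y(w.x + b) >= 1 for every training point and lists the points that meet it with equality.

```python
# Training data from the table above; Positive -> +1, Negative -> -1 (assumed encoding).
data = [
    ((3, 2), +1), ((5, 3), +1), ((-2, -2), -1), ((4, 4), +1),
    ((3, -1), +1), ((1, 0), -1), ((-1, -1), -1), ((0, 2), -1),
    ((1, 4), -1), ((-1, 2), -1), ((4, 5), +1),
]

# Candidate separator chosen by inspection (an assumption, not from the answer key).
w = (1.0, 0.0)
b = -2.0

# Functional margin y * (w . x + b) for every training point.
margins = [y * (w[0] * x[0] + w[1] * x[1] + b) for x, y in data]

# Hard-margin constraint: every functional margin must be at least 1.
separates = all(m >= 1 for m in margins)

# Points lying exactly on the margin are the support-vector candidates.
on_margin = [x for (x, _), m in zip(data, margins) if m == 1]

# Width of the margin band is 2 / ||w||.
margin_width = 2 / (w[0] ** 2 + w[1] ** 2) ** 0.5

print(separates, on_margin, margin_width)
```

Under these assumptions, the points on the margin are the ones a hand derivation would treat as support vectors.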
Question 3:
Use case: The Glasgow Coma Scale assesses patients according to three aspects of responsiveness: eye-
opening, motor, and verbal responses. Reporting each of these separately provides a clear,
communicable picture of a patient's state.
[3 + 5 = 8 Marks]
Note: Wherever applicable, use only Manhattan distance; no scaling is required. Round all
calculations to 4 decimal places, if any. Use the average as the aggregation function for the final estimation
wherever required, unless a specific function is recommended. Show all steps. Calculation errors will also
be penalized.
Predict the risk factor of a new patient with the observation below, using both of the following independent
experiments on the above training data.
Query Instance: <Eye Opening = 5, Verbal Responses = 5, Motor Responses = 5>
i) Predict the risk factor using 3-NN model.
ii) If the initial estimate is instead proposed as a locally weighted regression model, use: Patient
Risk = 10 - 0.1 × X_VERBALRESPONSES - 0.1 × X_MOTORRESPONSES, and apply 2-NN with the kernel
K(d(xq, xi)) = (-1) / d(xq, xi). Apply gradient descent for only one iteration with learning
rate = 0.1, and predict the risk factor for the query instance.
-------------------------------
Answer Key:
a) Prediction with average of below 3-NN = 5
Manhattan Distance: 7, 5, 4, 8, 10
b) The values below are calculated only for the top 2 nearest neighbours:
Y-Pred = {9.5, 9.4}
Delta gradient for Wverbalresponse = {3, 0.35}
Delta gradient for Wmotorresponse = {4.5, 1.75}
Delta gradient for Wo = {1.5, 0.35}
New (Wverbalresponse, Wmotorresponse, Wo) = (0.435, 0.725, 10.185)
Prediction with new regression equation = 4.385
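The last two lines of the key can be reproduced from the per-neighbour deltas listed above. The sketch below rests on two assumptions that the key does not state explicitly but that match its numbers: the weights are stored as the magnitudes of the subtracted terms (Risk = Wo - Wverbal × Xverbal - Wmotor × Xmotor, with Wverbal = Wmotor = 0.1 initially), and each weight is updated as w + learning_rate × (sum of its deltas over the two neighbours).

```python
learning_rate = 0.1
query_verbal, query_motor = 5, 5  # query instance from the question

# Initial weights, read as magnitudes of the subtracted terms (assumption).
w0, w_verbal, w_motor = 10.0, 0.1, 0.1

# Per-neighbour delta gradients copied from the answer key (top 2 neighbours).
delta_w_verbal = [3, 0.35]
delta_w_motor = [4.5, 1.75]
delta_w0 = [1.5, 0.35]

# One gradient-descent step: add learning_rate times the summed deltas (assumed form).
new_w_verbal = w_verbal + learning_rate * sum(delta_w_verbal)
new_w_motor = w_motor + learning_rate * sum(delta_w_motor)
new_w0 = w0 + learning_rate * sum(delta_w0)

# Prediction for the query instance with the updated weights.
prediction = new_w0 - new_w_verbal * query_verbal - new_w_motor * query_motor

print(round(new_w_verbal, 4), round(new_w_motor, 4), round(new_w0, 4), round(prediction, 4))
```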
Marking Scheme:
a)
2 mark: Distance calculation between test and all other instances. Order the results of the distance.
1 mark: Results of 3-NN average
b)
1 mark : Y-Pred calculation
0.5 mark : Delta gradient for Wverbalresponse calculation
0.5 mark : Delta gradient for Wmotorresponse calculation
1 mark : Delta gradient for Wo calculation
1 mark : New (Wverbalresponse, Wmotorresponse, Wo)
1 mark : Prediction with new regression equation
----------------------------------------------------------------------
Question 4:
Use case: The Glasgow Coma Scale assesses patients according to three aspects of responsiveness: eye-
opening, motor, and verbal responses. Reporting each of these separately provides a clear,
communicable picture of a patient's state. Quantified values of attributes are discretized in below data.
[4 + 1 + 2 = 7 Marks]
a) Use the following distance measure to cluster the given patients into three clusters using the k-modes
clustering algorithm for only one iteration. Show the step-by-step working of the
Expectation and Maximization steps. The centroids are marked in the given table. Assume all
the features are categorical in nature and use only the following distance metric for your
calculation. Round off all the proximity values to two decimal places.
distance(data1, data2) = 10 * (1 - (Number of matching categorical attributes) / (Total number of categorical attributes))
Maximization:
New Centroid 1 = (Bad, Unclear, Weak)
Centroid 2 : No Change
New Centroid 3 = (Worst, Others, Strong)
b) FALSE. Note that, due to the sampling method chosen, the same data point might be chosen as the arbitrary
centroid of multiple clusters. Hence convergence may need more than one EM iteration in the
worst case.
Marking Scheme:
a)
1 mark: Distance Calculation w.r.t C1
1 mark: Distance Calculation w.r.t C2
0.5 mark: Distance Calculation w.r.t C3
0.5 mark: New Centroid using mode of members’ values for centroid 1
1 mark: New Centroid using given formula for centroid 3
b)
1 mark: No new calculations are required. All the values used in part a) have to be correctly referred
to in order to find the difference in value
c)
1 mark: Answer “False”
1 mark: Right Justification
----------------------------------------------------------------------
Question 5:
In a single iteration of AdaBoost on three sample points, we initiate the process with uniform weights
assigned to the sample points. The ground truth labels and predictions are binary, taking values of either
+1 or −1. The table provided below contains some missing values. [3]
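A single AdaBoost round on three points can be sketched as follows. The labels and predictions here are hypothetical placeholders (the values in the question's table are not reproduced), and the classifier weight uses the common form alpha = 0.5 × ln((1 - error) / error); some texts omit the factor 0.5, so treat this form as an assumption.

```python
import math

# Hypothetical ground truth and weak-learner predictions (NOT the table's values):
# the third point is misclassified.
labels = [+1, +1, -1]
predictions = [+1, +1, +1]

# Uniform initial weights, as stated in the question.
n = len(labels)
weights = [1 / n] * n

# Weighted error: total weight of the misclassified points.
error = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)

# Classifier weight (assumed form with the 0.5 factor).
alpha = 0.5 * math.log((1 - error) / error)

# Up-weight mistakes and down-weight correct points, then normalise to sum to 1.
unnormalised = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, labels, predictions)]
total = sum(unnormalised)
new_weights = [u / total for u in unnormalised]

print(round(error, 4), round(alpha, 4), [round(w, 4) for w in new_weights])
```

With one of three equally weighted points misclassified, the misclassified point ends up carrying half of the total weight after normalisation.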
Question 6:
Illustrate a situation in which a lack of interpretability in a model could result in ethical issues.
Explain how interpretability could mitigate these concerns.
Solution:
A scenario where lack of model interpretability could raise ethical concerns is in automated
hiring systems. If a machine learning model is used to screen job applicants and is not
interpretable, it might unintentionally favor or discriminate against candidates based on
sensitive attributes like gender, ethnicity, or age without providing any rationale.
a) You have trained a ML model and discovered that it yields unacceptably high error on the test
data. You also plotted a learning curve for both test data and training data as shown below.
Comment on the performance of this ML model. Additionally, discuss strategies to address such
cases, including the approaches and measures you would take in such scenarios. [2.5]
High Bias [1 mark if explanation is also provided otherwise 0.5 marks]
https://www.dataquest.io/blog/learning-curves-machine-learning/
[1.5 mark to suggest the strategies to avoid high bias]
b)
Question 7:
a) You are fitting a logistic regression model to predict whether an email is spam (class 1) or not
(class 0) based on the length of the email's subject line (Feature). The model's coefficients are:
Coefficient: 0.03
Intercept: -1.2
Calculate the predicted probability of an email being spam for an instance with a subject line
length of 50 characters and classify the instance, assuming 70% is the threshold. Additionally,
justify the assignment of a specific class to the given instance. [5 marks]
To calculate the predicted probability of an email being spam for an instance with a
subject line length of 50 characters, you can use the logistic regression equation:
logit(p)=β0+β1×Feature
Where:
- logit(p) is the logarithm of the odds of the positive class (spam) probability,
- β0 is the intercept (bias) coefficient,
- β1 is the coefficient for the feature (subject line length).
In this case:
- β0=−1.2 (Intercept)
- β1=0.03 (Coefficient)
- Feature=50 (Subject line length)
Substitute these values into the equation:
logit(p)=−1.2+0.03×50
logit(p)=−1.2+1.5
logit(p)=0.3
Now, to convert the logit back into a probability p, you can use the sigmoid (logistic)
function:
p=1/(1+e^(−logit(p)))
p=1/(1+e^(−0.3))
p=1/(1+0.7408182)
p≈0.57444
So, the predicted probability of an email with a subject line length of 50 characters being
classified as spam is approximately 0.57444 or 57.44%.
Since the predicted probability of 57.44% is less than the 70% threshold, the predicted
class is “Not Spam” (class 0). [0.5 marks for the correct classification, 0.5
marks for the justification]
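The calculation above can be checked with a few lines of Python. This is a minimal sketch using only the values given in the question; the variable names are illustrative.

```python
import math

# Values given in the question.
intercept = -1.2       # β0
coefficient = 0.03     # β1
subject_length = 50    # feature value
threshold = 0.70       # classification threshold

# Linear part: logit(p) = β0 + β1 × Feature
logit = intercept + coefficient * subject_length

# Sigmoid converts the logit into a probability.
probability = 1 / (1 + math.exp(-logit))

# Class 1 (spam) only if the probability reaches the threshold.
predicted_class = 1 if probability >= threshold else 0

print(round(logit, 4), round(probability, 5), predicted_class)
```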
c) Which of the following charts of Residual Sum of Squares (RSS) and model complexity
represent training phase for a fixed dataset?