examBD2223 January Solutions
Answer: small values of K. In the extreme case of K=1, each instance is classified
according to its single closest instance. If that closest instance is noisy, this is
equivalent to memorizing noisy instances, which is one of the signs of overfitting.
Another possible answer is that more complex models are more likely to overfit,
and for small values of K the classification boundaries in feature space are more
complex than for large values of K.
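For illustration, a minimal sketch (not part of the original answer) of this behaviour
using scikit-learn's KNeighborsClassifier on a synthetic noisy dataset; the dataset and
parameter values are assumptions:

    # K=1 memorizes the training set (near-perfect training accuracy) but tends to
    # generalize worse on noisy data than a larger K, i.e. it overfits.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # flip_y adds label noise, so some training instances are noisy
    X, y = make_classification(n_samples=400, n_features=10, flip_y=0.2, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for k in (1, 15):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        print(f"K={k}: train={knn.score(X_train, y_train):.2f}, "
              f"test={knn.score(X_test, y_test):.2f}")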
Answer:
6. Explain one disadvantage of permutation-based feature selection.
Answer:
def objective(trial):
    # the range of b depends on the value sampled for a (conditional search space)
    a = trial.suggest_uniform('a', 1, 10)
    b = trial.suggest_uniform('b', 0, 10*a)
    return a + b   # placeholder return value, added so the snippet runs (not in the original)
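A usage sketch (assumed, not part of the original answer) of how such a conditional
search space would be optimized; it relies on the objective defined above and on
Optuna's standard study API:

    import optuna

    study = optuna.create_study(direction='minimize')
    study.optimize(objective, n_trials=20)   # 'objective' as defined above
    print(study.best_params)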
8. What is the main idea behind the F-value (f_classif) method for feature
selection in classification problems? (Drawings can be used in the explanation).
Answer: the main idea behind the F-value for classification is that a feature is
relevant if it separates well the average values of the classes and the spread of each
class around its average is small. More in detail: the F-value is the ratio of the
between-class variability (how far apart the class means are) to the within-class
variability (how spread out each class is around its mean), so features with a high
F-value are considered more relevant.
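For illustration, a minimal sketch (not part of the original answer) using
scikit-learn's f_classif and SelectKBest on a synthetic dataset; all names and values
below are assumptions:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif

    # 2 informative features, the rest are noise
    X, y = make_classification(n_samples=200, n_features=6, n_informative=2,
                               n_redundant=0, random_state=0)

    # F-value = between-class variability / within-class variability
    F, pvals = f_classif(X, y)
    print(F.round(2))                           # informative features get large F-values

    selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
    print(selector.get_support(indices=True))   # indices of the 2 selected features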
9. Explain in detail how the KNN-based imputation method works.
Answer: Two key ideas: (1) use KNN to predict the missing value of an attribute,
filling it with the average (numerical attributes) or majority value (categorical
attributes) of that attribute among the K nearest neighbours; (2) use a distance
measure that allows the neighbours themselves to have missing values in other
attributes.
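For illustration, a minimal sketch (not part of the original answer) with
scikit-learn's KNNImputer, which implements both ideas: missing entries are filled
with the average of the K nearest neighbours, and distances are computed with a
metric (nan_euclidean) that tolerates missing values in the neighbours' other
attributes. The toy data are an assumption:

    import numpy as np
    from sklearn.impute import KNNImputer

    X = np.array([[1.0, 2.0, np.nan],
                  [3.0, np.nan, 6.0],
                  [4.0, 5.0, 7.0],
                  [2.0, 3.0, 5.0]])

    # fill each missing value with the mean of that attribute in the 2 nearest neighbours
    imputer = KNNImputer(n_neighbors=2)
    print(imputer.fit_transform(X))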
10. Let’s suppose that D is our available data and that we intend to train and
evaluate a model using the most relevant attributes only. We follow this
workflow: 1) a feature selection method is used to select the 5 best attributes;
2) then data is divided into a training partition and a testing partition; 3) Finally,
a model is trained with the training partition and evaluated with the test
partition. Is this workflow correct? Why?
Answer: it is wrong, because if feature selection is done before splitting into
train and test, the most relevant features are selected using information that will
later belong to the test partition. The model trained with those selected features
therefore already contains some information about the test set, and the evaluation
of the model on that test partition will be optimistically biased (data/information
leakage).
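For illustration, a minimal sketch (not part of the original answer) of the correct
order using scikit-learn; the dataset, model and parameter values are assumptions:

    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    # 1) split first, so the test partition is never seen during feature selection
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 2-3) select the 5 best attributes and train the model on the training partition only
    pipe = make_pipeline(SelectKBest(score_func=f_classif, k=5),
                         LogisticRegression(max_iter=1000))
    pipe.fit(X_train, y_train)

    # evaluate on the untouched test partition
    print(pipe.score(X_test, y_test))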
Answer: for problems with many numerical features. The most time-consuming step
when training the trees of the Random Forest ensemble is computing the best
threshold for each split (because almost all possible thresholds of each numerical
attribute are considered and evaluated). Given that ERT thresholds are random
values, much less computation is required for them. Empirically it has been shown
that ERT perform similarly to RF while using much less compute. It is also the case
that RF uses bootstrapping to obtain the training sample for each tree, while ERT
always uses the complete dataset (hence time is also saved by not generating
bootstrap samples). However, this is less important than the issue about the
thresholds discussed above.
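For illustration, a minimal sketch (not part of the original answer) contrasting the
two ensembles in scikit-learn; the dataset size and number of trees are assumptions:

    from time import perf_counter
    from sklearn.datasets import make_classification
    from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

    X, y = make_classification(n_samples=5000, n_features=50, random_state=0)

    # ExtraTrees uses random split thresholds and, by default, no bootstrapping
    # (bootstrap=False), so it typically trains faster than RandomForest here
    for Model in (RandomForestClassifier, ExtraTreesClassifier):
        clf = Model(n_estimators=200, random_state=0)
        start = perf_counter()
        clf.fit(X, y)
        print(Model.__name__, f"fit time: {perf_counter() - start:.2f} s")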
12. For what kind of machine learning problems and machine learning algorithms is
the ‘thresholding’ technique useful? How does it work?