AL3451 - QUESTION BANK
Note:
Part - A
Ten questions should be of lower order (LO) cognitive type i.e. remembrance type questions.
Ten questions should be of intermediate order (IO) cognitive type i.e. understanding type questions.
Part - B
Questions (both subdivisions) should be of lower order (LO) cognitive type
Two or three questions (both subdivisions) should be of intermediate order (IO) cognitive type
Part - C
Compulsory Questions
(Application / Design / Analysis / Evaluation / Creativity / Case study questions) should be of higher order
cognitive type.
UNIT 3- ENSEMBLE TECHNIQUES AND UNSUPERVISED LEARNING
SYLLABUS
Combining multiple learners: Model combination schemes, Voting, Ensemble Learning - bagging,
boosting, stacking, Unsupervised learning: K-means, Instance Based Learning: KNN, Gaussian
Mixture models and Expectation maximization.
PART -A
Q.NO QUESTIONS K LEVEL CO’s
1. What is ensemble learning? K1 CO3
14. Compare Hard Voting vs. Soft Voting in ensemble learning. K2 CO3
15. Explain the working steps of K-Means clustering. K2 CO3
18. How does Expectation Maximization (EM) optimize GMM parameters? K2 CO3
PART B
1. (a) Evaluate the advantages and disadvantages of Bagging and Boosting in K5 CO3
terms of overfitting and bias-variance tradeoff.
(b) Consider a dataset where Decision Trees overfit. Analyze whether
Bagging or Boosting is a better approach and justify your reasoning with
an example.
2. (a) Design a Stacking-based ensemble model using three different K6 CO3
classifiers. Explain how each model contributes to the final prediction.
(b) Given a dataset where Bagging achieves 85% accuracy and Boosting
achieves 90% accuracy, propose an approach using Stacking to further
improve accuracy.
3. (a) A retail company wants to segment customers based on purchasing K6,K5 CO3
behavior. Design a K-Means clustering approach, specifying how to
choose the value of K.
(b) Evaluate how the presence of outliers in the dataset affects K-Means
clustering and suggest techniques to mitigate this issue.
4. (a) Compare the use of K-Nearest Neighbors (KNN) and K-Means K5 CO3
Clustering for a fraud detection problem. Justify which algorithm is more
suitable.
(b) Given a dataset with 1000 samples and 50 features, analyze whether
KNN or K-Means is computationally more efficient.
5. (a) Design how Gaussian Mixture Model (GMM) can be used to segment K6,K5 CO3
images by classifying pixels into different object regions.
(b) Evaluate why GMM performs better than K-Means for segmenting
images with overlapping clusters.
6. (a) Given a dataset with two Gaussian distributions, perform one iteration K6,K5 CO3
of the Expectation Step (E-Step) in the Expectation Maximization (EM)
algorithm with assumed probabilities.
(b) Evaluate the role of Maximum Likelihood Estimation (MLE) in the
Maximization Step (M-Step) of Expectation Maximization.
7. (a) Evaluate how ensemble learning methods help in addressing the K5,K6 CO3
bias-variance tradeoff in machine learning.
(b) Given a dataset with high variance, propose a modified ensemble
approach to reduce variance while maintaining accuracy.
8. (a) A dataset has 90% samples of class A and only 10% samples of class K5,K6 CO3
B. Compare Boosting techniques with oversampling and undersampling in
handling class imbalance.
(b) Design a new hybrid approach combining Boosting and resampling to
improve classification performance for imbalanced datasets.
9. (a) A supermarket wants to group products frequently bought together. K5,K7 CO3
Evaluate how unsupervised learning techniques (K-Means, GMM, or
hierarchical clustering) can be used for this purpose.
(b) Based on the dataset, propose a hybrid approach combining
unsupervised learning and association rule mining (e.g., Apriori) to
improve the results.
10. (a) A bank wants to detect fraudulent transactions. Design a combined K6,K5 CO3
approach using ensemble learning for classification and K-Means for
anomaly detection.
(b) Evaluate the advantages and drawbacks of combining supervised
(ensemble) and unsupervised (clustering) methods in fraud detection.
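Illustrative sketch (not a model answer) of the E-Step computation referred to in Part B: a minimal Python example, assuming one-dimensional data and two Gaussian components. The data points, mixing weights, means, and standard deviations are made-up starting values, and the names gaussian_pdf and e_step are hypothetical helpers chosen for this sketch.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def e_step(points, weights, means, stds):
    """Return, for each point, the responsibility of every component.

    Responsibility of component j for point x is
    weight_j * N(x | mean_j, std_j) divided by the sum over all components.
    """
    responsibilities = []
    for x in points:
        weighted = [w * gaussian_pdf(x, m, s)
                    for w, m, s in zip(weights, means, stds)]
        total = sum(weighted)
        responsibilities.append([v / total for v in weighted])
    return responsibilities


# Assumed (hypothetical) data and initial parameters for two components.
points = [0.0, 2.0, 4.0]
responsibilities = e_step(points,
                          weights=[0.5, 0.5],
                          means=[0.0, 4.0],
                          stds=[1.0, 1.0])
for x, r in zip(points, responsibilities):
    print(x, [round(v, 4) for v in r])
```

Each row of responsibilities sums to 1. With these symmetric assumptions, the midpoint x = 2.0 is shared equally between the two components, which is the soft assignment that distinguishes GMM from K-Means.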
UNIT 4- NEURAL NETWORKS
SYLLABUS
Multilayer perceptron, activation functions, network training – gradient descent optimization –
stochastic gradient descent, error back propagation, from shallow networks to deep networks – Unit
saturation (aka the vanishing gradient problem) – ReLU, hyperparameter tuning, batch
normalization, regularization, dropout.
PART -A
Q.NO QUESTIONS K LEVEL CO’s
1. What is a Multilayer Perceptron (MLP)? K1 CO4
2. List the types of activation functions used in neural networks. K1 CO4
3. Define gradient descent in neural networks. K1 CO4
4. What is stochastic gradient descent (SGD)? K1 CO4
5. What is the vanishing gradient problem? K1 CO4
6. Define ReLU (Rectified Linear Unit) activation function. K1 CO4
7. What is the role of batch normalization in deep networks? K1 CO4
8. What is dropout in neural networks? K1 CO4
9. Define hyperparameter tuning in neural networks. K1 CO4
10. What is regularization in deep learning? K1 CO4
11. Explain the difference between shallow and deep networks. K2 CO4
12. Compare gradient descent and stochastic gradient descent (SGD). K2 CO4
13. Why is the sigmoid activation function not preferred in deep networks? K2 CO4
14. Explain the advantages of using ReLU over sigmoid and tanh. K2 CO4
15. Describe how batch normalization helps in training deep networks. K2 CO4
16. What is the effect of dropout on overfitting? K2 CO4
17. How do L1 and L2 regularization work in neural networks? K2 CO4
18. Why is hyperparameter tuning important in deep learning? K2 CO4
19. Explain how error back propagation helps in training neural networks. K2 CO4
20. What are the common challenges faced in deep network training? K2 CO4
PART B
1. Apply the gradient descent algorithm for a simple neural network K3 CO4
with two weights and compute one iteration with a given learning
rate.
2. Given a dataset, demonstrate how stochastic gradient descent K3 CO4
(SGD) updates weights differently from batch gradient descent.
3. Design a multilayer perceptron (MLP) for classifying handwritten K3 CO4
digits and explain the steps involved.
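Illustrative sketch (not a model answer) of one gradient descent iteration for a network with two weights: a minimal Python example, assuming a single linear neuron without bias and a squared-error loss. The initial weights, input, target, and learning rate are made-up values, and gradient_step is a hypothetical helper name.

```python
def gradient_step(w, x, y, lr):
    """One gradient descent update for y_hat = w[0]*x[0] + w[1]*x[1].

    Loss is 0.5 * (y_hat - y)**2, so dLoss/dw_i = (y_hat - y) * x_i.
    """
    y_hat = w[0] * x[0] + w[1] * x[1]
    error = y_hat - y
    grads = [error * x[0], error * x[1]]
    # Move each weight against its gradient, scaled by the learning rate.
    return [w[0] - lr * grads[0], w[1] - lr * grads[1]]


# Assumed (hypothetical) starting point: prediction is -0.5, target is 1.0.
w_new = gradient_step(w=[0.5, -0.5], x=[1.0, 2.0], y=1.0, lr=0.1)
print(w_new)  # approximately [0.65, -0.2]
```

Under these assumptions the error is -1.5, the gradients are -1.5 and -3.0, and both weights increase so that the next prediction moves toward the target.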
PART-C
1. Evaluate the performance of a deep learning model trained for K5 CO4
image classification. Discuss its strengths, weaknesses, and
possible improvements.
2. Given a dataset, analyze the impact of different activation K5 CO4
functions (Sigmoid, Tanh, ReLU, Leaky ReLU) on training deep
networks. Justify the best choice.
3. Consider two optimization techniques: Adam and Stochastic K5 CO4
Gradient Descent (SGD). Compare their performance on a neural
network with a given dataset and justify the best approach.
4. Evaluate the effectiveness of batch normalization and dropout in K5 CO4
preventing overfitting in deep learning models using a case study.
5. Given a pre-trained model with overfitting issues, propose and K5 CO4
justify modifications using regularization techniques,
hyperparameter tuning, and data augmentation.
6. Design a deep neural network for medical image classification K6 CO4
(e.g., cancer detection). Describe the architecture, activation
functions, loss function, and training process.
7. Given a raw dataset, design and implement a deep learning K6 CO4
pipeline, including preprocessing, feature selection, model
selection, and evaluation.
8. Develop a custom loss function for an imbalanced classification K6 CO4
problem. Explain how it improves model performance over
standard loss functions.
9. Construct a real-world case study where deep learning is applied K6 CO4
(e.g., fraud detection, self-driving cars, speech recognition).
Explain the network architecture and challenges.
10. Create a hybrid model combining neural networks with another K6 CO4
machine learning technique (e.g., CNN + RNN, Neural Networks
+ Decision Trees). Justify the design choices and expected
performance benefits.
UNIT 5- DESIGN AND ANALYSIS OF MACHINE LEARNING EXPERIMENTS
SYLLABUS
Guidelines for machine learning experiments, Cross Validation (CV) and resampling – K-fold CV,
bootstrapping, measuring classifier performance, assessing a single classification algorithm and
comparing two classification algorithms – t test, McNemar’s test, K-fold CV paired t test
PART -A
Q.NO QUESTIONS K LEVEL CO’s
1. Define cross-validation. K1 CO5
PART B
Q.NO QUESTIONS K LEVEL CO’s
1. Apply K-fold cross-validation on a given dataset and compute the K3 CO5
accuracy.
2. Solve a numerical problem using bootstrapping to estimate mean K3 CO5
accuracy.
3. Compute the test statistics for a paired t-test given a dataset of K3 CO5
classifier accuracies.
4. Analyze how bias-variance tradeoff affects cross-validation K4 CO5
results.
5. Explain how different values of K in K-fold CV impact model K3 CO5
performance.
6. Illustrate the steps involved in McNemar’s test with an example K3 CO5
dataset.
7. Compare different resampling methods for assessing classifier K3 CO5
performance.
8. Given a classification dataset, perform bootstrapping and interpret K3 CO5
the results.
9. Explain the advantages and disadvantages of leave-one-out K3 CO5
cross-validation (LOOCV).
10. Demonstrate a case study where K-fold CV improved model K3 CO5
selection.
11. Solve a numerical problem using K-fold CV paired t-test for K4 CO5
classifier comparison.
12. Given two classifiers' results, compute McNemar’s test and K4 CO5
interpret the findings.
13. Differentiate between bootstrap resampling and K-fold CV with K4 CO5
examples.
14. Solve a hypothesis testing problem using t-test for a classification K4 CO5
dataset.
15. Explain how cross-validation helps in reducing overfitting in deep K4 CO5
learning models.
16. Implement bootstrapping on a real dataset and compare the results K3 CO5
with K-fold CV.
17. Compute confidence intervals for a classifier’s accuracy using K3 CO5
bootstrap sampling.
18. Analyze why a model with high variance performs poorly on test K3 CO5
data.
19. Discuss how nested cross-validation is used in hyperparameter K3 CO5
tuning.
20. Explain how statistical significance testing helps in model K3 CO5
selection.
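Illustrative sketch (not a model answer) of the K-fold CV paired t-test statistic: a minimal Python example, assuming two classifiers were evaluated on the same K folds. The per-fold accuracies are made-up values, and paired_t_statistic is a hypothetical helper name.

```python
import math
import statistics


def paired_t_statistic(acc_a, acc_b):
    """t statistic for per-fold accuracy differences (K - 1 degrees of freedom).

    t = mean(d) / (stdev(d) / sqrt(K)), where d_i = acc_a[i] - acc_b[i]
    and stdev is the sample standard deviation (divides by K - 1).
    Assumes the differences are not all identical (stdev > 0).
    """
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    k = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_diff = statistics.stdev(diffs)
    return mean_diff / (std_diff / math.sqrt(k))


# Assumed (hypothetical) accuracies of two classifiers on the same 5 folds.
acc_a = [0.80, 0.82, 0.78, 0.81, 0.79]
acc_b = [0.78, 0.79, 0.77, 0.80, 0.76]
t_value = paired_t_statistic(acc_a, acc_b)
print(round(t_value, 3))
```

The resulting value is compared with the critical value of the t distribution with K - 1 degrees of freedom at the chosen significance level; the pairing matters because both classifiers see exactly the same folds.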
PART-C
Q.NO QUESTIONS K LEVEL CO’s
1. A classifier achieves 85% accuracy on a dataset using 5-fold CV, K5 CO5
while another classifier achieves 87% accuracy using
bootstrapping. Perform a statistical test (paired t-test) to determine
if the difference is significant. Justify your conclusion.
(Problem-based)
2. Two classifiers are tested on a dataset with 200 samples. K5 CO5
Classifier A has 90 correct predictions, while Classifier B has 95
correct predictions. Use McNemar’s test to evaluate whether
Classifier B significantly outperforms Classifier A.
(Problem-based)
3. Given the following classifier performance scores obtained using K5 CO5
10-fold cross-validation, evaluate whether there is a significant
difference using a paired t-test. (Problem-based with dataset
provided)
4. A machine learning researcher is comparing a deep learning K5 CO5
model and a random forest classifier. Discuss the importance of
choosing the right evaluation metrics and justify which approach
is statistically more reliable. (Critical thinking-based)
5. You are given accuracy scores of two different classification K5 CO5
models evaluated using bootstrap resampling. Explain how
confidence intervals are used to validate the significance of
results. (Problem-based with dataset provided)
6. Design an experimental setup to compare three classification K6 CO5
models using K-fold cross-validation and statistical significance
testing. Implement and justify the methodology using an example
dataset. (Design-based Problem)
7. Given a dataset with imbalanced classes, propose an innovative K6 CO5
evaluation strategy that incorporates resampling techniques
(SMOTE, bootstrapping, stratified K-fold CV) to ensure fair
model comparison. Implement and analyze the results.
(Implementation-based Problem)
8. Create a Python program to automate the selection of the best K6 CO5
machine learning model using cross-validation, t-tests, and
McNemar’s test. Validate your approach with a real dataset.
(Programming & Implementation Problem)
9. Develop a hybrid cross-validation approach that combines nested K6 CO5
cross-validation and bootstrapping to evaluate deep learning
models. Implement this approach and justify its advantages over
traditional methods. (Design & Innovation-based)
10. A healthcare company wants to deploy an AI-based disease K6 CO5
prediction model. Design a complete experimental framework to
validate the model using multiple evaluation techniques
(cross-validation, hypothesis testing, statistical tests). Implement your
approach and analyze the results. (Case Study & Real-world
Application Problem)
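Illustrative sketch (not a model answer) of McNemar’s test: a minimal Python example, assuming the continuity-corrected form of the statistic. The test depends only on the cases where the two classifiers disagree, so the counts b and c below are made-up values, and mcnemar_statistic is a hypothetical helper name.

```python
# Chi-square critical value for 1 degree of freedom at the 0.05 level.
CHI2_CRITICAL_1DF_005 = 3.841


def mcnemar_statistic(b, c):
    """Continuity-corrected McNemar statistic.

    b: samples classifier A got right and classifier B got wrong.
    c: samples classifier A got wrong and classifier B got right.
    Statistic = (|b - c| - 1)**2 / (b + c), compared with chi-square (1 dof).
    """
    if b + c == 0:
        raise ValueError("No disagreements between the classifiers.")
    return (abs(b - c) - 1) ** 2 / (b + c)


# Assumed (hypothetical) disagreement counts between two classifiers.
b, c = 5, 10
stat = mcnemar_statistic(b, c)
significant = stat > CHI2_CRITICAL_1DF_005
print(round(stat, 4), significant)
```

With these assumed counts the statistic is 16/15, which is below 3.841, so the difference would not be judged significant at the 0.05 level. Total correct-prediction counts alone are not enough to run the test; the split of disagreements is required.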