AL3451 - QUESTION BANK

The document outlines the syllabus and assessment structure for a Machine Learning course, detailing the types of questions and cognitive levels for each part of the exam. It covers topics such as ensemble techniques, unsupervised learning, neural networks, and various algorithms including K-Means and Gaussian Mixture Models. The assessment is divided into three parts, with specific requirements for lower, intermediate, and higher-order cognitive questions.


DEPARTMENT : AI & DS

SUBJECT CODE : AL3451

SUBJECT NAME : MACHINE LEARNING

FACULTY NAME : RAJALAKSHMI R

YEAR & SEMESTER : II Year & 04

Note:
Part - A
Ten questions should be of lower order (LO) cognitive type i.e. remembrance type questions.
Ten questions should be of intermediate order (IO) cognitive type i.e. understanding type questions.

Part - B
Questions (both subdivisions) should be of lower order (LO) cognitive type
Two or three questions (both subdivisions) should be of intermediate order (IO) cognitive type

Part - C
Compulsory Questions
(Application / Design / Analysis / Evaluation / Creativity / Case study questions) - should be a Higher Order cognitive type question
UNIT 3- ENSEMBLE TECHNIQUES AND UNSUPERVISED LEARNING

SYLLABUS
Combining multiple learners: Model combination schemes, Voting, Ensemble Learning - bagging,
boosting, stacking, Unsupervised learning: K-means, Instance Based Learning: KNN, Gaussian
Mixture models and Expectation maximization.

PART -A

Q.NO QUESTIONS K LEVEL CO’s
1. What is ensemble learning? K1 CO3

2. Define Bagging in ensemble learning. K1 CO3

3. What is the objective of Boosting? K1 CO3

4. Define Stacking in model combination. K1 CO3

5. What is a Voting Classifier? K1 CO3

6. List any two advantages of Ensemble Learning. K1 CO3

7. What is K-Means Clustering? K1 CO3

8. Define Centroid in K-Means clustering. K1 CO3

9. What is the main characteristic of K-Nearest Neighbors (KNN)? K1 CO3

10. Define Gaussian Mixture Model (GMM). K1 CO3

11. How does Bagging help reduce overfitting? K2 CO3

12. Differentiate between Boosting and Bagging. K2 CO3

13. How does Stacking improve model performance? K2 CO3

14. Compare Hard Voting vs. Soft Voting in ensemble learning. K2 CO3

15. Explain the working steps of K-Means clustering. K2 CO3

16. Why is KNN called a lazy learner? K2 CO3

17. Compare KNN with K-Means clustering. K2 CO3

18. How does Expectation Maximization (EM) optimize GMM parameters? K2 CO3

19. Why does Boosting focus more on misclassified instances? K2 CO3


20. Explain the impact of choosing different distance metrics in KNN. K2 CO3
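
Worked sketch for hard versus soft voting (question 14 above), using the figures quoted in Part B, question 4(b). This is a minimal illustration, not a model answer. It assumes that 0.3, 0.8 and 0.9 are each model's estimate of P(class = 1), which the question does not state explicitly, and that soft voting takes an unweighted mean with a 0.5 threshold. The helper names hard_vote and soft_vote are hypothetical.

```python
from collections import Counter


def hard_vote(labels):
    """Return the most frequent predicted label (majority vote)."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probs_class1, threshold=0.5):
    """Average the class-1 probabilities and threshold the mean.

    Returns the predicted label and the mean probability.
    """
    mean_p = sum(probs_class1) / len(probs_class1)
    return (1 if mean_p >= threshold else 0), mean_p


labels = [0, 1, 1]           # predictions of the three models
probs = [0.3, 0.8, 0.9]      # ASSUMPTION: each value is P(class = 1)

hard = hard_vote(labels)
soft, mean_p = soft_vote(probs)
print("hard vote:", hard)
print("soft vote:", soft, "mean P(class 1) =", round(mean_p, 3))
```

Under these assumptions both schemes predict class 1: two of the three models vote 1, and the mean probability is 2/3. The two schemes can disagree when a minority model is very confident, which is the contrast the question is after.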

PART B

1. (a) Explain Bagging with an example. K3 CO3
(b) Compute the expected accuracy of an ensemble using Bagging, given three weak classifiers with accuracies of 60%, 65%, and 70%.
2. (a) Describe the Boosting algorithm and its working. K3 CO3
(b) Given a weak classifier with 55% accuracy, explain how Boosting might improve its performance over multiple iterations.
3. (a) Explain Stacking ensemble learning with an example. K3 CO3
(b) Given three base classifiers with accuracies of 75%, 80%, and 85%, discuss how Stacking can improve overall accuracy.
4. (a) Differentiate between Hard Voting and Soft Voting in ensemble learning. K3 CO3
(b) Apply a Voting Classifier on three models with predictions (0, 1, 1) and compute the final prediction using both Hard Voting and Soft Voting (assuming probabilities: 0.3, 0.8, 0.9).
5. (a) Explain the K-Means clustering algorithm step by step. K3 CO3
(b) Perform one iteration of K-Means clustering for points (2,3), (3,4), (5,6), (8,9) assuming K=2 and initial centroids as (2,3) and (8,9).
6. (a) Describe the importance of centroid initialization in K-Means. K3 CO3
(b) Explain the Elbow Method for selecting the optimal number of clusters.
7. (a) Define the K-Nearest Neighbors (KNN) algorithm and its working. K3 CO3
(b) Given the dataset below and K=3, classify the new point (6,5) using Euclidean Distance.
Data Points: (2,3) → Class A, (5,6) → Class A, (8,9) → Class B, (7,5) → Class B
8. (a) Explain Gaussian Mixture Model (GMM) and its application. K3 CO3
(b) If a dataset is assumed to be a mix of three Gaussian distributions, describe how GMM will assign probabilities to each cluster.
9. (a) Explain Expectation Maximization (EM) in GMM. K3 CO3
(b) Suppose we have data points with hidden variables representing cluster assignments. Explain how E-Step and M-Step work using a small example.
10. (a) Compare Supervised vs. Unsupervised Learning with examples. K3 CO3
(b) Explain how KNN handles multi-class classification with an example.
11. (a) Explain how Bootstrap Sampling is used in Bagging. K3 CO3
(b) Given a dataset of 20 samples, estimate the number of unique samples that appear in each bootstrap sample.
12. (a) Discuss the Bias-Variance Tradeoff in ensemble learning. K3 CO3
(b) If a model has high variance, which ensemble method (Bagging or Boosting) should be used? Justify your answer.
13. (a) Explain why KNN is called a lazy learner. K3 CO3
(b) Compute the Manhattan Distance between points (3,4) and (7,8).
14. (a) Explain why GMM is considered soft clustering while K-Means is hard clustering. K3 CO3
(b) Suppose we fit a GMM with two components. If a point has probabilities 0.6 for cluster 1 and 0.4 for cluster 2, explain how it is assigned to a cluster.
15. (a) Explain how decision trees are used as base models in Bagging and Boosting. K3 CO3
(b) Analyze why Boosting is prone to overfitting in complex datasets.
16. (a) Compare the computational complexity of Bagging, Boosting, and Stacking. K4 CO3
(b) Given a dataset with 1000 features, analyze whether Bagging or Boosting would be computationally more expensive and why.
17. (a) Discuss the impact of outliers in K-Means clustering. K4 CO3
(b) Given a dataset where one cluster contains a high-density region and another contains scattered points, explain why GMM may perform better than K-Means.
18. (a) Compare the performance of KNN and K-Means on high-dimensional data. K4 CO3
(b) Suppose we increase K in KNN from 3 to 15. Analyze the impact on classification accuracy and bias-variance tradeoff.
19. (a) Given a dataset, perform one iteration of Expectation Maximization (EM) in GMM, assuming initial parameters. K4 CO3
(b) Compare how K-Means and GMM handle overlapping clusters.
20. (a) Analyze how Boosting adjusts weights for misclassified samples in an imbalanced dataset. K4 CO3
(b) Given a dataset with imbalanced classes, explain whether Bagging or Boosting is the better choice.
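
Worked sketch for question 5(b), one K-Means iteration on the points (2,3), (3,4), (5,6), (8,9) with initial centroids (2,3) and (8,9). This is a hedged illustration in plain Python. The function names are hypothetical, and one assumption matters for the result: the point (5,6) is exactly equidistant from both centroids (squared distance 18 to each), so the sketch breaks the tie in favour of the lower-numbered centroid. The question does not specify a tie-breaking rule.

```python
def sq_dist(p, q):
    """Squared Euclidean distance (enough for comparing distances)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_step(points, centroids):
    """One assignment step followed by one centroid-update step."""
    clusters = [[] for _ in centroids]
    for p in points:
        dists = [sq_dist(p, c) for c in centroids]
        # ASSUMPTION: ties go to the lowest centroid index.
        clusters[dists.index(min(dists))].append(p)

    new_centroids = []
    for members, old in zip(clusters, centroids):
        if members:
            dim = len(old)
            new_centroids.append(
                tuple(sum(m[i] for m in members) / len(members) for i in range(dim))
            )
        else:
            new_centroids.append(old)  # keep an empty cluster's centroid
    return clusters, new_centroids


points = [(2, 3), (3, 4), (5, 6), (8, 9)]
centroids = [(2, 3), (8, 9)]

clusters, new_centroids = kmeans_step(points, centroids)
print("clusters:", clusters)
print("new centroids:", [tuple(round(v, 3) for v in c) for c in new_centroids])
```

With this tie-break the first cluster holds (2,3), (3,4), (5,6) and its centroid moves to about (3.333, 4.333), while the second keeps (8,9). If the tie were broken the other way, the clusters would be {(2,3), (3,4)} and {(5,6), (8,9)} with centroids (2.5, 3.5) and (6.5, 7.5), so an answer should state which rule it uses.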
PART-C

1. (a) Evaluate the advantages and disadvantages of Bagging and Boosting in terms of overfitting and bias-variance tradeoff. K5 CO3
(b) Consider a dataset where Decision Trees overfit. Analyze whether Bagging or Boosting is a better approach and justify your reasoning with an example.
2. (a) Design a Stacking-based ensemble model using three different classifiers. Explain how each model contributes to the final prediction. K6 CO3
(b) Given a dataset where Bagging achieves 85% accuracy and Boosting achieves 90% accuracy, propose an approach using Stacking to further improve accuracy.
3. (a) A retail company wants to segment customers based on purchasing behavior. Design a K-Means clustering approach, specifying how to choose the value of K. K6,K5 CO3
(b) Evaluate how the presence of outliers in the dataset affects K-Means clustering and suggest techniques to mitigate this issue.
4. (a) Compare the use of K-Nearest Neighbors (KNN) and K-Means Clustering for a fraud detection problem. Justify which algorithm is more suitable. K5 CO3
(b) Given a dataset with 1000 samples and 50 features, analyze whether KNN or K-Means is computationally more efficient.
5. (a) Design how Gaussian Mixture Model (GMM) can be used to segment images by classifying pixels into different object regions. K6,K5 CO3
(b) Evaluate why GMM performs better than K-Means for segmenting images with overlapping clusters.
6. (a) Given a dataset with two Gaussian distributions, perform one iteration of the Expectation Step (E-Step) in the Expectation Maximization (EM) algorithm with assumed probabilities. K6,K5 CO3
(b) Evaluate the role of Maximum Likelihood Estimation (MLE) in the Maximization Step (M-Step) of Expectation Maximization.
7. (a) Evaluate how ensemble learning methods help in addressing the bias-variance tradeoff in machine learning. K5,K6 CO3
(b) Given a dataset with high variance, propose a modified ensemble approach to reduce variance while maintaining accuracy.
8. (a) A dataset has 90% samples of class A and only 10% samples of class B. Compare Boosting techniques with oversampling and undersampling in handling class imbalance. K5,K6 CO3
(b) Design a new hybrid approach combining Boosting and resampling to improve classification performance for imbalanced datasets.
9. (a) A supermarket wants to group products frequently bought together. Evaluate how unsupervised learning techniques (K-Means, GMM, or hierarchical clustering) can be used for this purpose. K5,K7 CO3
(b) Based on the dataset, propose a hybrid approach combining unsupervised learning and association rule mining (e.g., Apriori) to improve the results.
10. (a) A bank wants to detect fraudulent transactions. Design a combined approach using ensemble learning for classification and K-Means for anomaly detection. K6,K5 CO3
(b) Evaluate the advantages and drawbacks of combining supervised (ensemble) and unsupervised (clustering) methods in fraud detection.
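
Worked sketch for the two bagging calculations asked in Part B, questions 1(b) and 11(b). Both are illustrations under stated assumptions rather than model answers. For 1(b), the sketch assumes the three classifiers err independently and are combined by majority vote. Real bagged models are usually correlated, so this figure is optimistic. For 11(b), it assumes a bootstrap sample of size n drawn with replacement from n items. The function names are hypothetical.

```python
from itertools import product


def majority_vote_accuracy(accuracies):
    """P(majority is correct) for independent classifiers (odd count).

    ASSUMPTION: errors are independent across classifiers.
    """
    total = 0.0
    for outcome in product([True, False], repeat=len(accuracies)):
        if sum(outcome) > len(accuracies) / 2:  # majority correct
            prob = 1.0
            for correct, acc in zip(outcome, accuracies):
                prob *= acc if correct else 1.0 - acc
            total += prob
    return total


def expected_unique(n):
    """Expected number of distinct items in a size-n bootstrap sample.

    Each item is missed with probability (1 - 1/n)^n.
    """
    return n * (1.0 - (1.0 - 1.0 / n) ** n)


ensemble_acc = majority_vote_accuracy([0.60, 0.65, 0.70])
unique_20 = expected_unique(20)
print("majority-vote accuracy:", round(ensemble_acc, 3))
print("expected unique samples (n=20):", round(unique_20, 2))
```

Under the independence assumption the ensemble accuracy comes to 0.719, slightly above the best single classifier. For n = 20, roughly 12.83 of the 20 samples (about 64%) are expected to appear at least once in a bootstrap sample.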
UNIT 4- NEURAL NETWORKS

SYLLABUS
Multilayer perceptron, activation functions, network training – gradient descent optimization –
stochastic gradient descent, error back propagation, from shallow networks to deep networks – Unit
saturation (aka the vanishing gradient problem) – ReLU, hyperparameter tuning, batch
normalization, regularization, dropout.
PART -A

Q.NO QUESTIONS K LEVEL CO’s
1. What is a Multilayer Perceptron (MLP)? K1 CO4
2. List the types of activation functions used in neural networks. K1 CO4
3. Define gradient descent in neural networks. K1 CO4
4. What is stochastic gradient descent (SGD)? K1 CO4
5. What is the vanishing gradient problem? K1 CO4
6. Define ReLU (Rectified Linear Unit) activation function. K1 CO4
7. What is the role of batch normalization in deep networks? K1 CO4
8. What is dropout in neural networks? K1 CO4
9. Define hyperparameter tuning in neural networks. K1 CO4
10. What is regularization in deep learning? K1 CO4
11. Explain the difference between shallow and deep networks. K2 CO4
12. Compare gradient descent and stochastic gradient descent (SGD). K2 CO4
13. Why is the sigmoid activation function not preferred in deep networks? K2 CO4
14. Explain the advantages of using ReLU over sigmoid and tanh. K2 CO4
15. Describe how batch normalization helps in training deep networks. K2 CO4
16. What is the effect of dropout on overfitting? K2 CO4
17. How do L1 and L2 regularization work in neural networks? K2 CO4
18. Why is hyperparameter tuning important in deep learning? K2 CO4
19. Explain how error back propagation helps in training neural networks. K2 CO4
20. What are the common challenges faced in deep network training? K2 CO4
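
Illustrative sketch for questions 5, 13 and 14 above (vanishing gradients, sigmoid versus ReLU). It is a deliberately simplified picture, not a full backpropagation: it assumes every layer contributes the same local activation derivative and ignores the weight terms. The sigmoid derivative is evaluated at z = 0, where it reaches its maximum of 0.25, so this is the best case for sigmoid. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_grad(z):
    """Derivative of the sigmoid: s(z) * (1 - s(z)), at most 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if z > 0 else 0.0


def chained_grad(local_grad, z, depth):
    """Product of `depth` identical local derivatives.

    ASSUMPTION: same pre-activation z in every layer, weights ignored.
    """
    return local_grad(z) ** depth


for depth in (1, 5, 10):
    print(
        depth,
        chained_grad(sigmoid_grad, 0.0, depth),
        chained_grad(relu_grad, 1.0, depth),
    )
```

Even in the sigmoid's best case the chained factor after 10 layers is 0.25 to the power 10, below one millionth, whereas the ReLU factor stays at 1 for active units. For saturated sigmoid units (large |z|) the shrinkage is faster still.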

PART B

1. Apply the gradient descent algorithm for a simple neural network with two weights and compute one iteration with a given learning rate. K3 CO4
2. Given a dataset, demonstrate how stochastic gradient descent (SGD) updates weights differently from batch gradient descent. K3 CO4
3. Design a multilayer perceptron (MLP) for classifying handwritten digits and explain the steps involved. K3 CO4
4. Solve an example problem where you calculate weight updates using the backpropagation algorithm for a simple 2-layer network. K3 CO4
5. Implement ReLU activation function on a given set of input values and analyze its effect on output. K3 CO4
6. Consider a dataset where neural networks overfit. Apply regularization techniques (L1, L2, dropout) to improve generalization. K3 CO4
7. Apply batch normalization to a given dataset and analyze its impact on training speed and performance. K3 CO4
8. Given a trained deep network, modify its hyperparameters (learning rate, batch size, dropout rate) and evaluate its new performance. K3 CO4
9. Solve a problem that demonstrates how vanishing gradients affect weight updates in deep networks using the sigmoid activation function. K3 CO4
10. Analyze why ReLU is preferred over sigmoid/tanh for deep networks using mathematical reasoning. K4 CO4
11. Compare shallow networks and deep networks in terms of representation power, training difficulty, and generalization. K4 CO4
12. Analyze the impact of different learning rates on gradient descent convergence using a given dataset. K4 CO4
13. Given a problem where a network underfits, determine how hyperparameter tuning can improve accuracy. K4 CO4
14. Compare and contrast batch normalization and dropout in terms of their effect on training deep networks. K4 CO4
15. Consider a real-world application (e.g., self-driving cars). Analyze how deep neural networks process sensor data to make decisions. K4 CO4
16. Given a dataset, analyze how loss function selection (cross-entropy vs. MSE) affects deep learning model performance. K4 CO4
17. Compare stochastic gradient descent (SGD), mini-batch gradient descent, and batch gradient descent with an example. K4 CO4
18. Analyze how unit saturation (vanishing gradient problem) can be mitigated using alternative activation functions like ReLU. K4 CO4
19. Given a pre-trained neural network, analyze how transfer learning can be used to solve a different but related problem. K4 CO4
20. Given a deep learning model architecture, propose modifications to improve its performance based on hyperparameter tuning and regularization techniques. K4 CO4
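
Worked sketch for question 1, one gradient descent iteration with two weights. The question leaves the data and learning rate open, so every number here is a hypothetical choice: a single linear unit with no bias or activation, squared-error loss L = 0.5 * (prediction - target)^2, one training example, and learning rate 0.1. It shows the mechanics of a single update only.

```python
def gd_step(weights, x, y, lr):
    """One gradient descent update for a linear unit with squared error.

    Returns the updated weights and the loss measured before the update.
    """
    pred = sum(w * xi for w, xi in zip(weights, x))
    error = pred - y
    loss = 0.5 * error ** 2
    grads = [error * xi for xi in x]  # dL/dw_i = (pred - y) * x_i
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, loss


# ASSUMPTION: illustrative values, not taken from the question bank.
weights = [0.5, -0.5]
x = [1.0, 2.0]
y = 1.0
lr = 0.1

new_weights, loss_before = gd_step(weights, x, y, lr)
_, loss_after = gd_step(new_weights, x, y, lr)
print("loss before:", loss_before)
print("updated weights:", [round(w, 4) for w in new_weights])
print("loss after:", round(loss_after, 4))
```

With these values the prediction is -0.5, the error is -1.5, the gradients are (-1.5, -3.0), and the weights move to approximately (0.65, -0.2). The loss drops from 1.125 to about 0.281, which is the check a written answer should include.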

PART-C
1. Evaluate the performance of a deep learning model trained for image classification. Discuss its strengths, weaknesses, and possible improvements. K5 CO4
2. Given a dataset, analyze the impact of different activation functions (Sigmoid, Tanh, ReLU, Leaky ReLU) on training deep networks. Justify the best choice. K5 CO4
3. Consider two optimization techniques: Adam and Stochastic Gradient Descent (SGD). Compare their performance on a neural network with a given dataset and justify the best approach. K5 CO4
4. Evaluate the effectiveness of batch normalization and dropout in preventing overfitting in deep learning models using a case study. K5 CO4
5. Given a pre-trained model with overfitting issues, propose and justify modifications using regularization techniques, hyperparameter tuning, and data augmentation. K5 CO4
6. Design a deep neural network for medical image classification (e.g., cancer detection). Describe the architecture, activation functions, loss function, and training process. K6 CO4
7. Given a raw dataset, design and implement a deep learning pipeline, including preprocessing, feature selection, model selection, and evaluation. K6 CO4
8. Develop a custom loss function for an imbalanced classification problem. Explain how it improves model performance over standard loss functions. K6 CO4
9. Construct a real-world case study where deep learning is applied (e.g., fraud detection, self-driving cars, speech recognition). Explain the network architecture and challenges. K6 CO4
10. Create a hybrid model combining neural networks with another machine learning technique (e.g., CNN + RNN, Neural Networks + Decision Trees). Justify the design choices and expected performance benefits. K6 CO4
UNIT 5- DESIGN AND ANALYSIS OF MACHINE LEARNING EXPERIMENTS

SYLLABUS

Guidelines for machine learning experiments, Cross Validation (CV) and resampling – K-fold CV,
bootstrapping, measuring classifier performance, assessing a single classification algorithm and
comparing two classification algorithms – t test, McNemar’s test, K-fold CV paired t test
PART -A

Q.NO QUESTIONS K LEVEL CO’s
1. Define cross-validation. K1 CO5

2. What is bootstrapping in machine learning experiments? K1 CO5

3. List different types of cross-validation techniques. K1 CO5

4. Define K-fold cross-validation. K1 CO5

5. What is a resampling technique? K1 CO5

6. Mention two advantages of bootstrapping. K1 CO5

7. What is the purpose of the t-test in machine learning experiments? K1 CO5

8. Define McNemar’s test. K1 CO5

9. What are Type I and Type II errors in hypothesis testing? K1 CO5

10. What is the role of statistical significance in comparing classifiers? K1 CO5

11. Explain the importance of measuring classifier performance. K2 CO5

12. Describe the steps involved in conducting a K-fold CV experiment. K2 CO5

13. Differentiate between K-fold CV and Leave-One-Out CV. K2 CO5

14. Explain the concept of variance in machine learning performance evaluation. K2 CO5

15. Why is bootstrapping useful in small datasets? K2 CO5

16. Explain how paired t-tests are used in classifier comparison. K2 CO5

17. Differentiate between parametric and non-parametric tests. K2 CO5

18. What is the significance of McNemar’s test in binary classification? K2 CO5

19. Discuss the challenges of overfitting in cross-validation. K2 CO5

20. Explain why cross-validation is preferred over a simple train-test split. K2 CO5
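
Illustrative sketch for question 12, the steps of a K-fold CV experiment. It shows only the index bookkeeping: split the sample indices into K folds, then use each fold once as the test set and the rest as the training set. It assumes contiguous folds with no shuffling or stratification, which a real experiment would normally add. The function name k_fold_indices is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds.

    The first (n_samples % k) folds get one extra sample.
    ASSUMPTION: no shuffling; shuffle the data first if order matters.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = [i for i in range(n_samples) if i < start or i >= stop]
        yield train_idx, test_idx
        start = stop


folds = list(k_fold_indices(10, 3))
for fold_no, (train_idx, test_idx) in enumerate(folds, start=1):
    print("fold", fold_no, "test:", test_idx, "train size:", len(train_idx))
```

In a full experiment a model would be trained on each train_idx, scored on the matching test_idx, and the K scores averaged. Every sample is used for testing exactly once, which is the property that distinguishes K-fold CV from a single train-test split.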

PART B

Q.NO QUESTIONS K LEVEL CO’s
1. Apply K-fold cross-validation on a given dataset and compute the accuracy. K3 CO5
2. Solve a numerical problem using bootstrapping to estimate mean accuracy. K3 CO5
3. Compute the test statistic for a paired t-test given a dataset of classifier accuracies. K3 CO5
4. Analyze how bias-variance tradeoff affects cross-validation results. K4 CO5
5. Explain how different values of K in K-fold CV impact model performance. K3 CO5
6. Illustrate the steps involved in McNemar’s test with an example dataset. K3 CO5
7. Compare different resampling methods for assessing classifier performance. K3 CO5
8. Given a classification dataset, perform bootstrapping and interpret the results. K3 CO5
9. Explain the advantages and disadvantages of leave-one-out cross-validation (LOOCV). K3 CO5
10. Demonstrate a case study where K-fold CV improved model selection. K3 CO5
11. Solve a numerical problem using K-fold CV paired t-test for classifier comparison. K4 CO5
12. Given two classifiers' results, compute McNemar’s test and interpret the findings. K4 CO5
13. Differentiate between bootstrap resampling and K-fold CV with examples. K4 CO5
14. Solve a hypothesis testing problem using t-test for a classification dataset. K4 CO5
15. Explain how cross-validation helps in reducing overfitting in deep learning models. K4 CO5
16. Implement bootstrapping on a real dataset and compare the results with K-fold CV. K3 CO5
17. Compute confidence intervals for a classifier’s accuracy using bootstrap sampling. K3 CO5
18. Analyze why a model with high variance performs poorly on test data. K3 CO5
19. Discuss how nested cross-validation is used in hyperparameter tuning. K3 CO5
20. Explain how statistical significance testing helps in model selection. K3 CO5
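
Worked sketch for questions 3 and 11, the K-fold CV paired t-test statistic. The per-fold accuracies below are hypothetical, since the questions leave the data to the examiner. The sketch assumes both classifiers were scored on the same K folds and uses the standard paired form t = mean(d) / (s_d / sqrt(K)) with K - 1 degrees of freedom, where d holds the per-fold differences. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired per-fold scores."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    k = len(diffs)
    std_err = stdev(diffs) / math.sqrt(k)  # stdev uses the (k - 1) divisor
    return mean(diffs) / std_err, k - 1


# ASSUMPTION: made-up accuracies for a 5-fold run of two classifiers.
acc_a = [0.80, 0.82, 0.79, 0.85, 0.81]
acc_b = [0.78, 0.80, 0.80, 0.82, 0.79]

t_stat, dof = paired_t_statistic(acc_a, acc_b)
print("t =", round(t_stat, 3), "with", dof, "degrees of freedom")
```

For these numbers t is about 2.36 with 4 degrees of freedom. The two-sided 5% critical value for 4 degrees of freedom is 2.776, so the difference would not be declared significant at that level. Fold scores share training data and are not fully independent, which is a known limitation of this test worth mentioning in an answer.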

PART-C

Q.NO QUESTIONS K LEVEL CO’s
1. A classifier achieves 85% accuracy on a dataset using 5-fold CV, while another classifier achieves 87% accuracy using bootstrapping. Perform a statistical test (paired t-test) to determine if the difference is significant. Justify your conclusion. (Problem-based) K5 CO5
2. Two classifiers are tested on a dataset with 200 samples. Classifier A has 90 correct predictions, while Classifier B has 95 correct predictions. Use McNemar’s test to evaluate whether Classifier B significantly outperforms Classifier A. (Problem-based) K5 CO5
3. Given the following classifier performance scores obtained using 10-fold cross-validation, evaluate whether there is a significant difference using a paired t-test. (Problem-based with dataset provided) K5 CO5
4. A machine learning researcher is comparing a deep learning model and a random forest classifier. Discuss the importance of choosing the right evaluation metrics and justify which approach is statistically more reliable. (Critical thinking-based) K5 CO5
5. You are given accuracy scores of two different classification models evaluated using bootstrap resampling. Explain how confidence intervals are used to validate the significance of results. (Problem-based with dataset provided) K5 CO5
6. Design an experimental setup to compare three classification models using K-fold cross-validation and statistical significance testing. Implement and justify the methodology using an example dataset. (Design-based Problem) K6 CO5
7. Given a dataset with imbalanced classes, propose an innovative evaluation strategy that incorporates resampling techniques (SMOTE, bootstrapping, stratified K-fold CV) to ensure fair model comparison. Implement and analyze the results. (Implementation-based Problem) K6 CO5
8. Create a Python program to automate the selection of the best machine learning model using cross-validation, t-tests, and McNemar’s test. Validate your approach with a real dataset. (Programming & Implementation Problem) K6 CO5
9. Develop a hybrid cross-validation approach that combines nested cross-validation and bootstrapping to evaluate deep learning models. Implement this approach and justify its advantages over traditional methods. (Design & Innovation-based) K6 CO5
10. A healthcare company wants to deploy an AI-based disease prediction model. Design a complete experimental framework to validate the model using multiple evaluation techniques (cross-validation, hypothesis testing, statistical tests). Implement your approach and analyze the results. (Case Study & Real-world Application Problem) K6 CO5
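
Hedged sketch of the McNemar statistic relevant to question 2 above. The question gives only each classifier's total of correct predictions (90 and 95 out of 200), but McNemar's test needs the discordant counts: b, where A is right and B is wrong, and c, where A is wrong and B is right. The values b = 10 and c = 15 below are assumptions chosen to be consistent with those totals (80 samples both correct, 95 both wrong). Other splits are equally possible and would give a different statistic. The function name is hypothetical.

```python
def mcnemar_statistic(b, c, continuity=True):
    """Chi-square statistic for McNemar's test (1 degree of freedom).

    b: samples classifier A got right and B got wrong
    c: samples classifier A got wrong and B got right
    With continuity=True the usual (|b - c| - 1)^2 / (b + c) form is used.
    """
    if b + c == 0:
        return 0.0  # no discordant pairs, nothing to test
    numerator = (abs(b - c) - 1) ** 2 if continuity else (b - c) ** 2
    return numerator / (b + c)


# ASSUMPTION: discordant counts are not given in the question.
b, c = 10, 15

stat = mcnemar_statistic(b, c)
print("McNemar statistic:", stat)
```

With these assumed counts the statistic is 16 / 25 = 0.64, well below 3.841, the 5% critical value of the chi-square distribution with one degree of freedom, so the difference would not be significant. An answer to the question should state its own discordant counts explicitly, since the conclusion depends on them.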
