Cheating Chits
The Machine Learning (ML) life cycle encompasses several steps from problem identification to the deployment of a model. The first step is Defining the Objective: understanding the problem and the business goal is critical for choosing the right approach. Next is Data Collection, where relevant data is gathered from sources like databases, web scraping, or APIs. In the Data Preprocessing stage, the collected data is cleaned to handle missing values, outliers, or irrelevant information, and it is often transformed into a suitable format (e.g., encoding categorical variables). Once the data is prepared, it is split into training and testing datasets during the Data Splitting phase. In Model Training, different algorithms are applied to the training dataset to learn patterns. The model's parameters are fine-tuned during Hyperparameter Tuning to improve performance. Model Evaluation follows, assessing how well the model performs using metrics like accuracy, precision, recall, or F1-score. In the Deployment stage, the model is integrated into a production environment. Finally, Model Monitoring is crucial to track the model's performance over time, ensuring it remains accurate and triggering updates when necessary. This process is iterative; feedback may lead back to earlier steps if improvements are needed.

Q.1 What is a Confusion Matrix, and how is it used to evaluate classification models?
A Confusion Matrix is a tool used to evaluate the performance of classification models by showing the actual vs. predicted classifications. It provides a detailed breakdown of classification results through four key counts:
1. True Positives (TP): Instances correctly classified as positive.
2. True Negatives (TN): Instances correctly classified as negative.
3. False Positives (FP): Instances incorrectly classified as positive.
4. False Negatives (FN): Instances incorrectly classified as negative.
From these values, we can derive several important evaluation metrics:
• Accuracy: Measures the overall success rate, (TP + TN) / (TP + TN + FP + FN).
• Precision: Measures the accuracy of positive predictions, TP / (TP + FP).
• Recall (Sensitivity): Measures the ability to correctly identify positive instances, TP / (TP + FN).
• F1-Score: Harmonic mean of Precision and Recall, useful when there is class imbalance.
The Confusion Matrix helps determine whether the model is biased toward a particular class, whether it is overfitting, or whether it is classifying instances incorrectly. It is especially valuable when dealing with imbalanced datasets, as it gives a more granular view of performance than accuracy alone.
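A minimal scikit-learn sketch of these ideas, assuming a binary 0/1 labelling; the label arrays below are made-up illustrative values, not data from the text:

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

# confusion_matrix returns [[TN, FP], [FN, TP]] for labels ordered 0, 1
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)

# Metrics derived from the four counts
print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / total
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1-score :", f1_score(y_true, y_pred))         # harmonic mean of P and R
```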
Q1. IoT Security Life Cycle
• Design & Development: Secure hardware, software, and firmware design.
• Deployment: Secure installation and network configurations.
• Operation & Maintenance: Monitoring, updates, and vulnerability management.
• Decommissioning: Secure removal, data wipe, and preventing unauthorized access.
Q.2 Discuss the basic ensemble techniques: Max Voting, Averaging, and Weighted Average.
1. Max Voting: In Max Voting, each model in the ensemble casts a vote for a class, and the class that receives the majority of votes is chosen as the final prediction. This technique is often used in classification tasks. For example, if three models predict class A and two predict class B, class A will be chosen as the final output. This method helps improve accuracy by leveraging the diversity of models.
2. Averaging: In Averaging, the predictions of multiple models are averaged to produce the final output. This technique is commonly used in regression tasks, where the model's prediction is the average of all individual model predictions. Averaging helps reduce the impact of individual model errors, especially when there is noise in the data. For example, if one model predicts 5 and another predicts 7, the final prediction will be (5 + 7) / 2 = 6.
3. Weighted Average: In Weighted Average, instead of treating each model equally, each model's prediction is given a weight based on its performance (e.g., accuracy or reliability). The final prediction is the weighted average of all model predictions. This method ensures that better-performing models contribute more to the final output. For example, if one model performs better than the others, its prediction will have a larger influence on the final result.
These ensemble techniques are fundamental to improving the robustness and accuracy of predictive models by combining the strengths of multiple algorithms.
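A small NumPy sketch of the three combination rules, assuming the outputs of a few hypothetical models are already available (class labels for voting, real values for averaging); the numbers and weights are illustrative:

```python
import numpy as np

# --- Max Voting (classification): majority class across model predictions ---
votes = np.array(["A", "A", "A", "B", "B"])        # five models' predicted classes
classes, counts = np.unique(votes, return_counts=True)
print("Max voting result:", classes[np.argmax(counts)])   # "A"

# --- Averaging (regression): plain mean of the model outputs ---
preds = np.array([5.0, 7.0])
print("Average:", preds.mean())                    # (5 + 7) / 2 = 6.0

# --- Weighted Average: weights reflect each model's assumed performance ---
weights = np.array([0.7, 0.3])                     # better model gets more weight
print("Weighted average:", np.average(preds, weights=weights))
```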
Q.3 How does Boosting (e.g., AdaBoost, Gradient Boosting) differ from Bagging?
Boosting and Bagging are both ensemble learning techniques designed to improve the performance of models by combining multiple weak learners, but they differ in how they build and combine the models.
1. Bagging (Bootstrap Aggregating): Bagging involves training multiple models independently on different subsets of the training data (using bootstrapping, i.e., random sampling with replacement). Each model has equal weight, and the final prediction is typically made by averaging the predictions in regression or by majority voting in classification. Bagging helps reduce variance and prevents overfitting, especially with high-variance models like decision trees. Random Forest is a popular example of Bagging.
Key Characteristics:
• Reduces variance.
• Models are trained independently and in parallel.
• Example: Random Forest.
2. Boosting: Boosting, on the other hand, builds models sequentially, with each new model focusing on correcting the errors made by previous models. The predictions from all models are combined by weighted voting or averaging. Boosting reduces both bias and variance, as each successive model tries to improve upon the weaknesses of the previous ones. AdaBoost and Gradient Boosting are popular boosting algorithms.
Key Characteristics:
• Reduces both bias and variance.
• Models are trained sequentially, and each model focuses on correcting errors.
• Examples: AdaBoost, Gradient Boosting.
Differences:
• Bagging reduces variance by training models independently and averaging their predictions, while Boosting reduces both bias and variance by training models sequentially, with each focusing on the mistakes of previous ones.
• Bagging works well for high-variance models, while Boosting is better for improving the performance of weak models.
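A hedged scikit-learn sketch contrasting the two families on a synthetic dataset; the dataset, estimator counts, and 5-fold evaluation are arbitrary illustrative choices, not a prescribed benchmark:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

models = {
    "Bagging (Random Forest)": RandomForestClassifier(n_estimators=100, random_state=42),
    "Boosting (AdaBoost)": AdaBoostClassifier(n_estimators=100, random_state=42),
    "Boosting (Gradient Boosting)": GradientBoostingClassifier(n_estimators=100, random_state=42),
}

for name, model in models.items():
    # Bagging fits its trees independently on bootstrap samples;
    # boosting fits them sequentially, reweighting or refitting on errors.
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean CV accuracy = {scores.mean():.3f}")
```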
2. Discuss the role of data visualization in machine learning?
Data Visualization plays a pivotal role in Machine Learning by enabling data scientists to understand patterns, trends, and anomalies within datasets visually. It provides insights during the exploratory data analysis phase, helping in selecting relevant features. Graphs like scatter plots, histograms, heatmaps, and box plots are essential tools for depicting the distribution, relationships, and correlations between variables. This can reveal important insights such as class imbalance, data skewness, and potential outliers that might impact model performance.
Visualization aids in feature engineering by highlighting which features are the most influential. For instance, a correlation heatmap helps identify multicollinearity between features, guiding the decision to remove or combine features. Post-model training, visualizations are used for model evaluation: ROC curves, confusion matrices, and precision-recall graphs give a visual sense of a model's effectiveness. Visualizing predictions and comparing them with actual data can expose the strengths and weaknesses of a model, making it easier to diagnose issues.
In communication, visualizations make it simpler to explain complex models to stakeholders, bridging the gap between technical and non-technical audiences. Thus, visualization is integral from data exploration to decision-making, improving the accuracy and effectiveness of machine learning models.
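A brief matplotlib/pandas sketch of the exploratory plots mentioned above, using a made-up DataFrame; the column names and random data are placeholders for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up dataset with two numeric features and a binary target
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "feature_a": rng.normal(size=200),
    "feature_b": rng.normal(size=200),
    "target": rng.integers(0, 2, size=200),
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(df["feature_a"], bins=20)                              # distribution / skewness
axes[0].set_title("Histogram")
axes[1].scatter(df["feature_a"], df["feature_b"], c=df["target"])   # relationships by class
axes[1].set_title("Scatter plot")
im = axes[2].imshow(df.corr().to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
axes[2].set_title("Correlation heatmap")                            # multicollinearity check
fig.colorbar(im, ax=axes[2])
plt.tight_layout()
plt.show()
```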
3. What is a hypothesis function in machine learning and how do you test it?
In Machine Learning, a hypothesis function is a mathematical model that maps input data (features) to the predicted output (labels). It is a function used to make predictions based on the learned parameters. For instance, in linear regression, the hypothesis function might be represented as h(x) = θ₀ + θ₁x, where θ₀ and θ₁ are the parameters learned during training. The goal is to minimize the difference between the predicted and actual output values.
Testing the hypothesis involves validating how well the hypothesis function generalizes to unseen data. This process starts with dividing the dataset into training and testing sets. The model is trained on the training data, learning the parameters that best fit the dataset. Post-training, the model is evaluated using the testing set. Metrics such as Mean Squared Error (MSE), R-squared, accuracy, precision, recall, and F1-score measure the hypothesis's effectiveness. Cross-validation is another technique to test the hypothesis, ensuring that the model performs well across different subsets of data. Residual analysis (checking the difference between predicted and actual values) is also crucial: if the residuals exhibit patterns, the hypothesis might be incorrect, suggesting the need for a more complex model. This iterative process ensures that the final hypothesis provides a robust solution for the given problem.
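A NumPy sketch of a linear hypothesis h(x) = θ₀ + θ₁x fitted by least squares and checked on held-out data; the synthetic data and the 80/20 split are assumptions made only for illustration:

```python
import numpy as np

# Synthetic data roughly following y = 2 + 3x + noise
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=100)

# Simple 80/20 train/test split
x_train, x_test = x[:80], x[80:]
y_train, y_test = y[:80], y[80:]

# Learn theta0, theta1 by ordinary least squares on the training set
X_train = np.column_stack([np.ones_like(x_train), x_train])
theta, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
print("theta0, theta1:", theta.round(3))

# Hypothesis function, test-set error, and residuals for inspection
h = lambda x: theta[0] + theta[1] * x
residuals = y_test - h(x_test)
print("Test MSE:", np.mean(residuals ** 2))
```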
Q2. Cryptographic Fundamentals for IoT Security
• Symmetric Encryption: Single key for encryption/decryption (e.g., AES).
• Asymmetric Encryption: Public and private keys (e.g., RSA, ECC).
• Hash Functions: Ensure data integrity (e.g., SHA-256); see the hashing sketch after these notes.
• Digital Signatures: Verify authenticity and integrity.
• Key Management: Secure handling and storage of cryptographic keys.

Q3. Top 10 Trending Security Concerns in IoT
1. Device Authentication
2. Data Privacy
3. Unauthorized Access
4. Malware and Botnets
5. Vulnerabilities in IoT Protocols
6. Lack of Regular Updates
7. Network Security
8. Data Integrity
9. Resource Constraints
10. Legal and Regulatory Compliance

Q1. Authentication/Authorization in IoT
• Authentication: Verifying the identity of devices/users (e.g., passwords, biometrics).
• Authorization: Defining permissions for users and devices (e.g., Role-Based Access Control).

Q2. Attacks Specific to IoT
1. DDoS Attacks
2. Man-in-the-Middle (MitM)
3. Eavesdropping
4. Physical Tampering
5. Replay Attacks

Q3. Threats to Access Control and Privacy
• Unauthorized access to sensitive data.
• Insecure communication channels.
• Weak authentication mechanisms.
• Data leakage from device to cloud.
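A minimal Python sketch of the hash-function bullet from the cryptographic fundamentals note above, using only the standard library (hashlib for a SHA-256 integrity check, hmac for a keyed tag); the firmware payload and shared key are made-up placeholders:

```python
import hashlib
import hmac

# Hypothetical firmware blob an IoT device downloads
firmware = b"firmware-image-v1.2.3"

# SHA-256 digest: any change to the payload changes the digest,
# so comparing digests detects tampering in transit
digest = hashlib.sha256(firmware).hexdigest()
print("SHA-256:", digest)

# HMAC ties the digest to a shared secret, so only parties holding the key
# can produce a valid tag (integrity plus authenticity)
shared_key = b"example-shared-secret"      # placeholder key for illustration
tag = hmac.new(shared_key, firmware, hashlib.sha256).hexdigest()
print("HMAC-SHA256:", tag)

# Constant-time comparison when verifying a received tag
expected = hmac.new(shared_key, firmware, hashlib.sha256).hexdigest()
print("Tag valid:", hmac.compare_digest(tag, expected))
```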
Q.1 What is the “Curse of Dimensionality,” and how does it affect machine learning models?
The Curse of Dimensionality refers to the challenges and issues that arise when analyzing and organizing data in high-dimensional spaces, particularly when the number of features (dimensions) increases. As the number of features grows, the volume of the space increases exponentially, leading to several problems:
1. Sparse Data: In high-dimensional spaces, data points become sparse, which makes it difficult to find patterns or meaningful relationships.
2. Overfitting: As the dimensionality increases, models can become too complex, leading to overfitting. The model learns noise or irrelevant patterns in the data, which leads to poor generalization.
3. Increased Computation: The computational complexity of algorithms increases as the number of dimensions grows, making them slower and more resource-intensive.
For instance, in classification problems, if the number of features increases, the distance between data points grows, making it harder for distance-based models like K-Nearest Neighbors (KNN) to identify relationships. To mitigate the curse of dimensionality, dimensionality reduction techniques like PCA (Principal Component Analysis) are often used to reduce the number of features while retaining the most important information.
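A small NumPy experiment illustrating how pairwise distances concentrate as the dimension grows, which is what hurts distance-based models such as KNN; the point count and dimensions are arbitrary choices:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    # 200 random points in the d-dimensional unit hypercube
    X = rng.random((200, d))
    dists = pdist(X)                      # all pairwise Euclidean distances
    # As d grows, the gap between the farthest and nearest pair shrinks
    # relative to the distances themselves, so "near" and "far" neighbours
    # become hard to tell apart
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d = {d:4d}   relative contrast = {contrast:.2f}")
```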
1. Explain the types of clustering methods?
Clustering is a Machine Learning technique for grouping similar data points. There are several types of clustering methods:
Partitioning Clustering: This method divides data into distinct, non-overlapping groups. The most popular technique is K-means, where data points are grouped based on their proximity to the cluster's centroid. K-medoids is another method, similar to K-means but more robust to outliers because it uses actual data points as centers.
Hierarchical Clustering: Builds a hierarchy of clusters, using either agglomerative (bottom-up) or divisive (top-down) approaches. In agglomerative clustering, individual data points start as separate clusters, merging as they move up the hierarchy. Divisive starts with one large cluster, dividing it into smaller clusters. A dendrogram helps visualize the clustering process.
Density-Based Clustering: Groups data based on regions of high density separated by regions of low density. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular algorithm that can find arbitrarily shaped clusters and identify noise or outliers.
Model-Based Clustering: Uses probabilistic models to represent clusters, such as Gaussian Mixture Models (GMM). It assumes data points are generated from a mixture of several Gaussian distributions.
Fuzzy Clustering: Assigns data points to multiple clusters with a degree of belonging rather than a hard assignment. Fuzzy C-means is the most commonly used algorithm in this category.
Each method has its strengths; for example, K-means is fast but sensitive to initial conditions, while DBSCAN handles noise well and finds clusters of arbitrary shapes.
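A scikit-learn sketch comparing three of these families on the same synthetic blob data; the dataset and the parameter values (k = 3, eps = 0.5) are illustrative assumptions rather than recommendations:

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three Gaussian blobs (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=7)

# Partitioning: K-means assigns each point to the nearest of k centroids
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=7).fit_predict(X)

# Hierarchical: agglomerative clustering merges points bottom-up
agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Density-based: DBSCAN finds dense regions; the label -1 marks noise points
dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("K-means clusters:      ", sorted(set(kmeans_labels)))
print("Agglomerative clusters:", sorted(set(agglo_labels)))
print("DBSCAN clusters:       ", sorted(set(dbscan_labels)))  # may include -1 (noise)
```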
Q.2 What is the Explained Variance Ratio, and how do you choose the right number of dimensions?
The Explained Variance Ratio is the proportion of the dataset's total variance that is explained by each principal component in PCA (Principal Component Analysis). It quantifies how much information (variance) each component captures from the original data.
To choose the right number of dimensions (or components), we look at the cumulative explained variance. For example:
• If the first two principal components explain 90% of the variance, we may choose to retain just these two components, reducing the dimensionality while preserving most of the original data's structure.
In practice, a common approach is to select enough components to explain at least 95% of the variance, balancing dimensionality reduction against information retention. Plotting the cumulative explained variance and observing the “elbow point” (where adding more components contributes little additional variance) can also guide this decision.
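A short scikit-learn sketch of reading the explained variance ratios and picking the smallest number of components that reaches a 95% threshold; the digits dataset is just a convenient 64-dimensional stand-in:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)        # 64-dimensional example data

pca = PCA().fit(X)                          # keep all components to inspect the ratios
ratios = pca.explained_variance_ratio_      # variance share of each component
cumulative = np.cumsum(ratios)

# Smallest number of components whose cumulative share reaches 95%
n_components = int(np.argmax(cumulative >= 0.95)) + 1
print(f"First component explains {ratios[0]:.1%} of the variance")
print("Components needed for 95%:", n_components)
```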
2. Explain the concept of Gaussian Mixture Models and how Expectation-Maximization (EM) works.
Gaussian Mixture Models (GMM) are a probabilistic approach to clustering that assumes data is generated from a mixture of several Gaussian distributions. Each Gaussian in the mixture has its own mean and variance, and data points are assumed to belong to one of these distributions with a certain probability. Unlike K-means, which assigns each point to a single cluster, GMM assigns a probability for each point's membership across multiple clusters.
To estimate the parameters of the Gaussian distributions, the Expectation-Maximization (EM) algorithm is used. The EM algorithm is an iterative method comprising two main steps: the Expectation (E) step and the Maximization (M) step. In the E-step, the algorithm calculates the probability that each data point belongs to each Gaussian distribution based on the current parameters. In the M-step, it updates the parameters (means, variances, and mixing coefficients) to maximize the likelihood of the observed data given the current assignment probabilities. This process repeats until convergence, meaning the parameter values stabilize and the change between iterations falls below a set threshold. EM is effective for finding hidden patterns in data with overlapping clusters, where traditional clustering techniques like K-means may not perform well.
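A scikit-learn sketch of fitting a GMM (which runs EM internally) and reading the soft, probabilistic assignments; the blob data and the choice of three components are assumptions for illustration:

```python
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data drawn from three overlapping blobs
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=1.5, random_state=0)

# GaussianMixture alternates the E-step (compute membership probabilities)
# and the M-step (re-estimate means, covariances, mixing weights) until convergence
gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)

print("Converged:", gmm.converged_, "after", gmm.n_iter_, "EM iterations")
print("Mixing weights:", gmm.weights_.round(3))
print("Component means:\n", gmm.means_.round(2))

# Soft assignment: probability of the first point under each component
print("P(component | x_0):", gmm.predict_proba(X[:1]).round(3))
```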
Q.3 What is the difference between PCA and Kernel PCA? How do you select a kernel for Kernel PCA?
1. Principal Component Analysis (PCA): PCA is a linear technique used to reduce dimensionality by finding new, uncorrelated variables (principal components) that maximize variance. It is effective when the data lies on a linear subspace, as it projects the data along axes that capture the most variance.
2. Kernel PCA: Kernel PCA extends PCA to nonlinear data by using kernel methods. Instead of using a linear transformation, Kernel PCA applies a kernel function (e.g., polynomial, Gaussian RBF) to map the data into a higher-dimensional feature space where linear separation becomes possible. It is especially useful when the data is not linearly separable, as it can capture complex structures.
Difference:
• PCA is suitable for linear datasets, while Kernel PCA is designed for nonlinear datasets.
• Kernel PCA can handle more complex patterns by applying kernel functions to transform the data before applying PCA.
Selecting a Kernel: The choice of kernel depends on the data characteristics:
• Linear Kernel: When data is linearly separable.
• Polynomial Kernel: When the data has polynomial relationships.
• Gaussian RBF Kernel: For capturing complex, highly nonlinear relationships.
Cross-validation or grid search can help in selecting the best kernel and its parameters (e.g., the kernel width for RBF).
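A scikit-learn sketch contrasting linear PCA with RBF-kernel PCA on the two-moons dataset, whose structure is nonlinear; the gamma value is an arbitrary illustrative choice rather than a tuned one:

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA, KernelPCA

# Two interleaving half-moons: not linearly separable in the original space
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

# Linear PCA only rotates/projects the data
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF kernel works in a nonlinear feature space
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15).fit_transform(X)

print("Linear PCA output shape:", X_pca.shape)
print("Kernel PCA output shape:", X_kpca.shape)
# Plotting X_kpca coloured by y typically shows the two moons pulled apart
# along the leading component, whereas X_pca leaves them interleaved.
```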
Q1. Privacy Preservation for IoT in Smart Buildings
• Data Anonymization: Protects privacy by removing identifiable information.
• Access Control: Restricts data access based on roles.
• Encryption: Secures communication between devices and the cloud.

Q2. Mobility Social Features for Location Privacy in IoV
1. Geofencing
2. Location Obfuscation
3. Privacy-Preserving Protocols
4. Anonymous Location Sharing

Q3. Mobile WBSN and Participatory Sensing
• Mobile WBSN: Wireless Body Sensor Networks for health monitoring.
• Participatory Sensing: Crowdsourced data collection for environmental monitoring.

Q1. Preventing Unauthorized Access to Sensor Data
1. Strong Authentication Mechanisms
2. Encryption of Data in Transit
3. Role-Based Access Control (RBAC)
4. Regular Firmware Updates

Q2. Secure Path Generation Scheme for Real-Time Green IoT
• Ensures secure, energy-efficient communication paths for IoT devices.
• Minimizes data exposure during transmission.

Q3. Security Protocols for IoT Access Networks
1. IPSec for secure communications.
2. TLS/SSL for data encryption.
3. MQTT with TLS for IoT messaging security.

Q1. Governance Framework for Privacy and Trust in IoT
• Establishes policies for data management.
• Focuses on user consent, data access control, and auditing.

Q2. Policy-Based Approach for Informed Consent in IoT
• Defines user consent mechanisms.
• Ensures transparency in data collection and usage.

Q3. Security for IoT-Based Healthcare (Smart City)
• Data Encryption: Protects patient data in transit and at rest.
• Access Control: Ensures only authorized personnel can access sensitive health data.
• Compliance: Adheres to regulations like HIPAA for patient data security.
Q2) Parameter estimation methods?
In machine learning, parameter estimation methods are used to determine the values of parameters (such as coefficients in a linear model or weights in a neural network) that best fit the model to the data. Two commonly used methods for parameter estimation are Maximum Likelihood Estimation (MLE) and Maximum A Posteriori Estimation (MAP).

1. Maximum Likelihood Estimation (MLE)
Concept: MLE finds the parameter values that maximize the likelihood of the observed data given the model. In other words, it finds parameters that make the observed data most probable under the assumed model.
Application: Widely used in probabilistic models, such as linear regression, logistic regression, and Gaussian models.
How MLE works:
Define the likelihood function: Suppose we have a dataset X = {x₁, x₂, ..., xₙ} and a probabilistic model with parameters θ. The likelihood function L(θ) is defined as the probability of the observed data given the parameters:
L(θ) = P(X | θ)
Maximize the likelihood: Since the product of probabilities over many observations is typically very small, it is common to work with the log-likelihood:
log L(θ) = Σᵢ log P(xᵢ | θ), summing over the n data points.
MLE finds the parameter θ that maximizes the log-likelihood. This can be done by differentiating the log-likelihood with respect to θ and setting the derivative to zero.
Example: Suppose we have data points from a Gaussian distribution with mean μ and variance σ². The likelihood function is:
L(μ, σ) = ∏ᵢ (1 / √(2πσ²)) · exp(−(xᵢ − μ)² / (2σ²))
The log-likelihood becomes:
log L(μ, σ) = −(n/2) log(2πσ²) − (1 / (2σ²)) Σᵢ (xᵢ − μ)²
Taking partial derivatives with respect to μ and σ and setting them to zero gives the values that maximize the likelihood (the sample mean and the sample variance).
Pros and cons of MLE:
Pros: MLE is straightforward and often yields a unique solution, especially for large datasets.
Cons: Sensitive to noise and outliers, especially if the dataset is small or the assumed model is incorrect.

2. Maximum A Posteriori Estimation (MAP)
Concept: MAP estimation is similar to MLE but incorporates prior beliefs about the parameters. It combines the likelihood of the data with a prior probability distribution over the parameters, resulting in a posterior distribution.
Application: Used in Bayesian models, especially when we have prior knowledge or assumptions about parameter values.
How MAP works:
Define the posterior distribution: According to Bayes' theorem, the posterior distribution of the parameters θ given data X is:
P(θ | X) = P(X | θ) P(θ) / P(X)
Here, P(θ | X) is the posterior distribution, P(X | θ) is the likelihood, P(θ) is the prior distribution, and P(X) is the evidence (a constant with respect to θ).
Maximize the posterior: MAP estimation finds the parameter θ that maximizes the posterior:
θ_MAP = argmax_θ P(θ | X) = argmax_θ P(X | θ) P(θ)
Taking the log of the posterior (similar to MLE) and dropping the constant log P(X):
log P(θ | X) = log P(X | θ) + log P(θ) + constant
The term log P(θ) represents the prior distribution, incorporating any prior knowledge.
Pros and cons of MAP:
Pros: Incorporates prior information, making it more robust with small datasets or in the presence of outliers.
Cons: Requires a prior distribution, which may be subjective and hard to define when there is no prior knowledge.
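A NumPy sketch contrasting MLE and MAP for the mean of a Gaussian with known variance, using a Gaussian prior on μ; the data, prior mean, and prior spread are made-up assumptions, and the MAP line uses the standard closed-form posterior mode for this conjugate setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up data: 20 samples from a Gaussian with true mean 5.0 and known sigma
true_mu, sigma = 5.0, 2.0
x = rng.normal(true_mu, sigma, size=20)
n = len(x)

# --- MLE: maximizing the log-likelihood gives the sample mean and variance ---
mu_mle = x.mean()
var_mle = np.mean((x - mu_mle) ** 2)       # note: divides by n, not n - 1

# --- MAP: Gaussian prior mu ~ N(mu0, tau0^2) pulls the estimate toward mu0 ---
mu0, tau0 = 0.0, 1.0                        # assumed prior belief about the mean
mu_map = (tau0**2 * x.sum() + sigma**2 * mu0) / (n * tau0**2 + sigma**2)

print(f"MLE mean estimate: {mu_mle:.3f}   (variance estimate {var_mle:.3f})")
print(f"MAP mean estimate: {mu_map:.3f}   (shrunk toward the prior mean {mu0})")
```

With a stronger prior (smaller tau0) the MAP estimate moves further toward mu0; as more data arrives, the two estimates converge, which reflects the trade-off described above.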