ML Supervised Full Notes
Instructions:
• Kindly go through the lectures/videos on our website www.piyushwairale.com
• Read this study material carefully and make your own handwritten short notes. (Short notes must not be more than 5-6 pages.)
• Attempt the questions available on the portal.
• Revise this material at least 5 times, and once you have prepared your short notes, revise your short notes twice a week.
• If you are not able to understand any topic or require a detailed explanation, please mention it in our discussion forum on the website.
• Let me know if there are any typos or mistakes in the study material. Mail me at piyushwairale100@gmail.com

Introduction
• Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed. ML is one of the most exciting technologies one would have ever come across. As is evident from the name, it gives the computer the ability that makes it more similar to humans: the ability to learn. Machine learning is actively being used today, perhaps in many more places than one would expect.
• Machine learning is programming computers to optimize a performance criterion using example data or past experience. We have a model defined up to some parameters, and learning is the execution of a computer program to optimize the parameters of the model using the training data or past experience. The model may be predictive to make predictions in the future, or descriptive to gain knowledge from data, or both.
• The field of study known as machine learning is concerned with the question of how to construct computer programs that automatically improve with experience.
Definition of Learning
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks T, as measured by P, improves with experience E.
A computer program which learns from experience is called a machine learning program or simply a learning program. Such a program is sometimes also referred to as a learner.
• The quality and quantity of data available for training and testing play a significant
role in determining the performance of a machine-learning model.
• Data can be in various forms such as numerical, categorical, or time-series data, and
can come from various sources such as databases, spreadsheets, or APIs.
• Machine learning algorithms use data to learn patterns and relationships between input
variables and target outputs, which can then be used for prediction or classification
tasks.
Understanding data
Since an important component of the machine learning process is data storage, we briefly
consider in this section the different types and forms of data that are encountered in the
machine learning process.
Unit of observation
By a unit of observation we mean the smallest entity with measured properties of interest
for a study.
Examples
• A person, an object or a thing
• A time point
• A geographic region
• A measurement
Sometimes, units of observation are combined to form units such as person-years.
Examples and features
Datasets that store the units of observation and their properties can be imagined as collections of data consisting of the following:
Examples
An “example” is an instance of the unit of observation for which properties have been
recorded.
An “example” is also referred to as an “instance”, or “case” or “record.” (It may be noted
that the word “example” has been used here in a technical sense.)
Features
A “feature” is a recorded property or a characteristic of examples. It is also referred to as an “attribute” or a “variable.”
Examples for “examples” and “features”
1. Cancer detection
Consider the problem of developing an algorithm for detecting cancer. In this study we note the following.
(a) The units of observation are the patients.
(b) The examples are members of a sample of cancer patients.
(c) The following attributes of the patients may be chosen as the features:
• gender
• age
• blood pressure
• the findings of the pathology report after a biopsy
2. Pet selection
Suppose we want to predict the type of pet a person will choose.
(a) The units are the persons.
(b) The examples are members of a sample of persons who own pets.
(c) The features might include age, home region, family income, etc. of persons who own pets.
Figure 1: Example for “examples” and “features” collected in a matrix format (data relates to automobiles and their features)

Labeled data includes a label or target variable that the model is trying to predict, whereas unlabeled data does not include a label or target variable. The data used in machine learning is typically numerical or categorical. Numerical data includes values that can be ordered and measured, such as age or income. Categorical data includes values that represent categories, such as gender or type of fruit.
• Data can be divided into training and testing sets.
• The training set is used to train the model, and the testing set is used to evaluate the performance of the model.
• It is important to ensure that the data is split in a random and representative way.
• Data preprocessing is an important step in the machine learning pipeline. This step can include cleaning and normalizing the data, handling missing values, and feature selection or engineering.
How do we split data in Machine Learning?
1. Training Data: The part of the data we use to train our model. This is the data that your model actually sees (both input and output) and learns from.
2. Validation Data: The part of the data that is used for frequent evaluation of the model fit on the training dataset, and for tuning the hyperparameters involved (parameters set before the model begins learning). This data plays its part while the model is actually training.
3. Testing Data: Once our model is completely trained, testing data provides an unbiased evaluation. When we feed in the inputs of the testing data, our model predicts some values (without seeing the actual outputs). After prediction, we evaluate our model by comparing its predictions with the actual outputs present in the testing data. This is how we evaluate how much our model has learned from the experience fed in as training data at the time of training.
Different forms of data
3. Ordinal data: This denotes a nominal variable with categories falling in an ordered list. Examples include clothing sizes such as small, medium, and large, or a measurement of customer satisfaction on a scale from “not at all happy” to “very happy.”
Examples: In the data given in Fig. 1, the features “year”, “price” and “mileage” are numeric and the features “model”, “color” and “transmission” are categorical.
Properties of Data
• Volume: Scale of data. With the growing world population and exposure to technology, huge amounts of data are being generated every millisecond.
• Variety: Different forms of data – healthcare records, images, videos, audio clips.
• Velocity: Rate of data streaming and generation.
• Value: Meaningfulness of data in terms of the information that researchers can infer from it.
• Veracity: Certainty and correctness of the data we are working on.
• Viability: The ability of data to be used and integrated into different systems and processes.
• Security: The measures taken to protect data from unauthorized access or manipulation.
• Accessibility: The ease of obtaining and utilizing data for decision-making purposes.
• Integrity: The accuracy and completeness of data over its entire lifecycle.
1.2 Different types of learning
In general, machine learning algorithms can be classified into three types.
1. Supervised learning
• Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.
• In supervised learning, each example in the training set is a pair consisting of an input object (typically a vector) and an output value.
• A supervised learning algorithm analyzes the training data and produces a function, which can be used for mapping new examples.
• In the optimal case, the function will correctly determine the class labels for unseen instances.
• Both classification and regression problems are supervised learning problems.
• A wide range of supervised learning algorithms are available, each with its strengths and weaknesses. There is no single learning algorithm that works best on all supervised learning problems.
• Important point: “Supervised learning” is so called because the process of an algorithm learning from the training dataset can be thought of as a teacher supervising the learning process. We know the correct answers (that is, the correct outputs); the algorithm iteratively makes predictions on the training data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable level of performance.
Example:
Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients, and each patient is labeled as “healthy” or “sick”.
gender age label
M 48 sick
M 67 sick
F 53 healthy
M 49 healthy
F 34 sick
M 21 healthy
Based on this data, when a new patient enters the clinic, how can one predict whether he/she is healthy or sick?
2. Unsupervised learning
• Unsupervised learning is a type of machine learning algorithm used to draw inferences from datasets consisting of input data without labeled responses.
• In unsupervised learning algorithms, a classification or categorization is not included in the observations.
• There are no output values and so there is no estimation of functions. Since the examples given to the learner are unlabeled, the accuracy of the structure that is output by the algorithm cannot be evaluated.
• The most common unsupervised learning method is cluster analysis, which is used for exploratory data analysis to find hidden patterns or groupings in data.
Example: Consider the following data regarding patients entering a clinic. The data consists of the gender and age of the patients.
gender age
M 48
M 67
F 53
M 49
F 34
M 21
Based on this data, can we infer anything regarding the patients entering the clinic?
3. Reinforcement learning
• A learner (the program) is not told what actions to take as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them.
• In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situations and, through that, all subsequent rewards.
• For example, consider teaching a dog a new trick: we cannot tell it what to do, but we can reward/punish it if it does the right/wrong thing. It has to find out what it did that made it get the reward/punishment. We can use a similar method to train computers to do many tasks, such as playing backgammon or chess, scheduling jobs, and controlling robot limbs.
• Reinforcement learning is different from supervised learning. Supervised learning is learning from examples provided by a knowledgeable expert.
• Supervised learning is a machine learning paradigm where algorithms aim to optimize parameters to minimize the difference between target and computed outputs, commonly used in tasks like classification and regression.
• In supervised learning, training examples are associated with target outputs (initially labeled) and computed outputs (generated by the learning algorithm), and the goal is to minimize misclassification or error.
Regression
• Regression algorithms are used if there is a relationship between the input variable and the output variable.
• Regression is used for the prediction of continuous variables, such as weather forecasting, market trends, etc.
General Approach
Let x denote the set of input variables and y the output variable. In machine learning, the general approach to regression is to assume a model, that is, some mathematical relation between x and y, involving some parameters, say θ, in the following form:
y = f(x, θ)
The function f(x, θ) is called the regression function. The machine learning algorithm optimizes the parameters in the set θ such that the approximation error is minimized; that is, the estimates of the values of the dependent variable y are as close as possible to the correct values given in the training set.
Example
For example, if the input variables are “Age”, “Distance” and “Weight” and the output variable is “Price”, the model may be of the form Price = f(Age, Distance, Weight, θ).
Some common types of regression models are the following:
1. Simple linear regression: There is only one continuous independent variable x and the assumed relation between the independent variable and the dependent variable y is
y = a + bx
2. Multivariate linear regression: There is more than one independent variable, say x1, ..., xn, and the assumed relation between the independent variables and the dependent variable is y = a0 + a1x1 + ... + anxn
3. Polynomial regression: There is only one continuous independent variable x and the assumed model is y = a0 + a1x + ... + anx^n. It is a variant of the multiple linear regression model, except that the best-fit line is curved rather than straight.
4. Ridge regression: Ridge regression is one of the types of linear regression in which a small amount of bias is introduced so that we can get better long-term predictions. Ridge regression is a regularization technique, which is used to reduce the complexity of the model. It is also called L2 regularization.
5. Logistic regression: The dependent variable is binary, that is, a variable which takes only the values 0 and 1. The assumed model involves certain probability distributions.
3.0.1 Linear Regression
Simple linear regression is a basic machine learning technique used for modeling the relationship between a single independent variable (often denoted as “x”) and a dependent variable (often denoted as “y”). It assumes a linear relationship between the variables and aims to find the best-fitting line (typically represented by the equation y = mx + b) that minimizes the sum of squared differences between the observed data points and the values predicted by the model.
For the model y = a + bx fitted to n observations (x1, y1), ..., (xn, yn), denote the means of x and y by x̄ and ȳ, and note also that the variance of x is given by Var(x) = (1/n) Σ (xi − x̄)². It can be shown that the values of a and b can be computed using the following formulas:
b = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²
a = ȳ − b x̄
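As a quick illustration of these formulas, here is a minimal Python sketch that computes the least-squares estimates of a and b directly; the small dataset is made up purely for illustration:

```python
import numpy as np

# Hypothetical data for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_mean, y_mean = x.mean(), y.mean()

# b = sum((xi - x_mean)(yi - y_mean)) / sum((xi - x_mean)^2)
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# a = y_mean - b * x_mean
a = y_mean - b * x_mean

print(f"fitted line: y = {a:.3f} + {b:.3f} x")
```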
Example
Obtain a linear regression for the data in the table below, assuming that y is the independent variable.
Multiple Linear Regression
Let there also be N observed values of these variables. As in simple linear regression, here also we use the ordinary least squares method to obtain the optimal estimates of β0, β1, ..., βn. The method yields a procedure for the computation of these optimal estimates.
• The output of the multivariate regression model is difficult to analyse.
• Multivariate regression yields better results when used with larger datasets rather than small ones.
Example:
Fit a multiple linear regression model to the following data:
In this problem, there are two independent variables and four sets of values of the variables. Thus, in the notations used above, we have n = 2 and N = 4. The multiple linear regression model for this problem has the form y = β0 + β1x1 + β2x2
The required model is y = 2.0625 − 2.3750x1 + 3.2500x2
Ridge Regression
• The ridge regression technique is applied when the data exhibits multicollinearity, that is, when the independent variables are highly correlated. While least squares estimates are unbiased under multicollinearity, their variances are large enough to cause the observed value to diverge from the actual value. Ridge regression reduces standard errors by biasing the regression estimates.
• The lambda (λ) variable in the ridge regression equation resolves the multicollinearity problem.
• Lambda (λ) is the penalty term. So, by changing the value of λ, we control the penalty term. The higher the value of λ, the bigger the penalty and therefore the more the magnitudes of the coefficients are reduced.
• In this technique, the cost function is altered by adding the penalty term to it. The amount of bias added to the model is called the ridge regression penalty. We can calculate it by multiplying lambda by the squared weight of each individual feature, so the cost function takes the form
Cost = Σ (yi − ŷi)² + λ Σ wj²
• In the above equation, the penalty term regularizes the coefficients of the model, and hence ridge regression reduces the magnitudes of the coefficients, which decreases the complexity of the model.
• As we can see from the above equation, if the value of λ tends to zero, the equation becomes the cost function of the linear regression model. Hence, for the minimum value of λ, the model will resemble the linear regression model.
• A general linear or polynomial regression will fail if there is high collinearity between the independent variables, so to solve such problems, ridge regression can be used.
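A minimal sketch of ridge regression in Python, using the closed-form solution w = (XᵀX + λI)⁻¹Xᵀy; the data and the λ value below are made up for illustration:

```python
import numpy as np

# Hypothetical data: 6 samples, 2 features (for illustration only)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0], [6.0, 4.0]])
y = np.array([3.1, 2.9, 7.2, 6.8, 10.1, 9.7])

# Add a column of ones for the intercept term
X1 = np.hstack([np.ones((X.shape[0], 1)), X])

lam = 1.0                      # regularization strength (lambda)
I = np.eye(X1.shape[1])
I[0, 0] = 0.0                  # conventionally, do not penalize the intercept

# Closed-form ridge solution: w = (X^T X + lambda * I)^(-1) X^T y
w = np.linalg.solve(X1.T @ X1 + lam * I, X1.T @ y)
print("intercept and coefficients:", w)
```

With λ = 0 this reduces to ordinary least squares, which matches the point made above about small λ resembling plain linear regression.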
Bias and variance trade-off
The bias and variance trade-off is generally complicated when it comes to building ridge regression models on an actual dataset. However, the general trend which one needs to remember is:
• The bias increases as λ increases.
Assumptions of Ridge Regression:
The assumptions of ridge regression are the same as those of linear regression: linearity, constant variance, and independence. However, as ridge regression does not provide confidence limits, the distribution of errors need not be assumed to be normal.
• Linear Relationship: Ridge Regression assumes that there is a linear relationship between the independent variables and the dependent variable.
• Homoscedasticity: Ridge Regression assumes that the variance of the errors is constant across all levels of the independent variables.
• Independence of errors: Ridge Regression assumes that the errors are independent of each other, i.e., the errors are not correlated.
• Normality of errors: Ridge Regression assumes that the errors follow a normal distribution.
Key points about Ridge Regression in machine learning:
• Regularization: Ridge regression adds a penalty term that discourages the magnitude of the coefficients from becoming too large. This helps prevent overfitting.
• Multicollinearity Mitigation: It is particularly effective when you have highly correlated independent variables (multicollinearity), shrinking the coefficients and making the model more stable.
• Lambda Parameter: The choice of the lambda parameter (λ) is essential. A small λ is close to standard linear regression, while a large λ results in stronger regularization.
• Balancing Act: Ridge regression performs a balancing act between fitting the data well and preventing overfitting. It maintains all predictors but assigns smaller coefficients to less important ones.
• Model Stability: It makes the model more stable, especially when you have a high-dimensional dataset with many predictors. This can lead to better generalization to new, unseen data.
• Interpretability: Ridge regression may make the model less interpretable because it shrinks coefficients toward zero. It can be challenging to discern the individual importance of predictors.
• Tuning Lambda: Cross-validation is often used to tune the lambda parameter and find the optimal trade-off between fitting the data and regularization.
• Sensitivity to Lambda: Proper selection of the regularization parameter (λ) is crucial; an incorrect choice can result in underfitting or ineffective regularization.
• Loss of Interpretability: Ridge regression can make the model less interpretable since it shrinks coefficients towards zero, potentially making it harder to discern each individual predictor’s importance.
• Ineffective for Feature Selection: Ridge regression does not perform feature selection. It retains all predictors in the model but assigns smaller coefficients to less important ones.
• Less Effective for Sparse Data: In cases where many predictors are irrelevant or unimportant, ridge may not eliminate them from the model effectively.
Extra info to understand more about Ridge Regression
What is Regularization?
• Regularization is one of the most important concepts of machine learning. It is a technique to prevent the model from overfitting by adding extra information to it.
• Sometimes the machine learning model performs well with the training data but does not perform well with the test data. This means the model is not able to predict the output when dealing with unseen data, because noise has been introduced into the output, and hence the model is called overfitted. This problem can be dealt with with the help of a regularization technique.
• This technique can be used in such a way that it allows all variables or features to be maintained in the model while reducing their magnitudes. Hence, it maintains accuracy as well as the generalization of the model.
• It mainly regularizes or reduces the coefficients of features toward zero. In simple words, “In the regularization technique, we reduce the magnitude of the features while keeping the same number of features.”
• Multicollinearity may be introduced during the data collection process if the data were gathered using an inappropriate sampling method. It can also occur if the sample size is smaller than expected.
• Multicollinearity will also appear if the model is overspecified, that is, if there are more variables than data points.
4.1 Logistic Function (Sigmoid Function)
The logistic (sigmoid) function maps any real-valued input z to a value between 0 and 1:
σ(z) = 1 / (1 + e^(−z))
Key points about Logistic Regression include:
• Binary Classification: It is primarily used for two-class classification problems, where the output is either 0 or 1, indicating the absence or presence of an event.
• Logistic Function: It utilizes the logistic (sigmoid) function to convert a linear combination of input features into a probability value between 0 and 1.
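A minimal sketch of this idea in Python; the weights, bias, and input values below are made up purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) function: maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical learned parameters and a single input example
w = np.array([0.8, -0.4])      # weights for two features
b = -0.1                       # bias (intercept)
x = np.array([1.5, 2.0])       # input features

# Linear combination of inputs, squashed into a probability
p = sigmoid(np.dot(w, x) + b)
print(f"P(class = 1 | x) = {p:.3f}")   # predict class 1 if p >= 0.5
```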
5 K-Nearest Neighbors
• K-Nearest Neighbors (KNN) is a simple and intuitive machine-learning algorithm used
for both classification and regression tasks.
• The K-NN algorithm assumes similarity between the new case/data and the available cases and puts the new case into the category that is most similar to the available categories.
• The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.
• K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.
• It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and, at the time of classification, performs an action on the dataset.
• KNN tries to predict the correct class for the test data by calculating the distance between the test data and all the training points, and then selecting the K points that are closest to the test data. The KNN algorithm calculates the probability of the test data belonging to the classes of the ‘K’ training points, and the class that holds the highest probability is selected. In the case of regression, the predicted value is the mean of the ‘K’ selected training points.
Choosing the Value of K
• A larger k may lead to better performance. But if we set k too large, we may end up looking at samples that are not neighbors (are far away from the query).
• We can use cross-validation to find k.
• A rule of thumb is k < sqrt(n), where n is the number of training examples.
• A larger k produces a smoother boundary effect.
• When k = N, we always predict the majority class.
5.1 Working
The working of K-NN can be explained on the basis of the algorithm below:
1. Select the number K of neighbors.
2. Calculate the Euclidean distance from the new point to each training point.
3. Take the K nearest neighbors as per the calculated Euclidean distances.
4. Among these K neighbors, count the number of data points in each category.
5. Assign the new data point to the category for which the number of neighbors is maximum.
6. Our model is ready.
5.2 K-Nearest Neighbors (KNN) Classification Example
Suppose we have a dataset with the following points:
Data Point Feature 1 (X1) Feature 2 (X2) Class
A 1 2 Blue
B 2 3 Blue
C 2 1 Red
D 3 3 Red
E 4 2 Blue
Now, let’s say we want to classify a new data point with features X1 = 2.5 and X2 = 2.5 using a KNN algorithm with k = 3 (i.e., considering the three nearest neighbors).
1. Calculate Euclidean Distances:
Distance to A: √((2.5 − 1)² + (2.5 − 2)²) = √2.5 ≈ 1.58
Distance to B: √((2.5 − 2)² + (2.5 − 3)²) = √0.5 ≈ 0.71
Distance to C: √((2.5 − 2)² + (2.5 − 1)²) = √2.5 ≈ 1.58
Distance to D: √((2.5 − 3)² + (2.5 − 3)²) = √0.5 ≈ 0.71
Distance to E: √((2.5 − 4)² + (2.5 − 2)²) = √2.5 ≈ 1.58
2. Find K Nearest Neighbors: Identify the three nearest neighbors based on the calculated distances. B and D are the closest (≈ 0.71); A, C and E are tied at ≈ 1.58, so one of them (say A) is taken as the third neighbor.
3. Majority Voting: Determine the majority class among the three nearest neighbors. Since B and A are Blue and D is Red, the majority class is Blue.
4. Prediction: Predict that the new point X1 = 2.5, X2 = 2.5 belongs to the majority class, which is Blue.
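The same example can be reproduced with a short Python sketch of the K-NN procedure, written from scratch here rather than with any particular library:

```python
import numpy as np
from collections import Counter

# Dataset from the example above: (features, class label)
points = {"A": ([1, 2], "Blue"), "B": ([2, 3], "Blue"), "C": ([2, 1], "Red"),
          "D": ([3, 3], "Red"), "E": ([4, 2], "Blue")}
query = np.array([2.5, 2.5])
k = 3

# 1. Compute the Euclidean distance from the query to every training point
distances = [(np.linalg.norm(np.array(xy) - query), label) for xy, label in points.values()]

# 2. Sort by distance and keep the k nearest neighbors
neighbors = sorted(distances, key=lambda d: d[0])[:k]

# 3. Majority vote among the k neighbors (for regression, take the mean of their targets)
prediction = Counter(label for _, label in neighbors).most_common(1)[0][0]
print("k nearest:", neighbors)
print("predicted class:", prediction)
```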
• Adaptability: KNN can be used for both classification and regression tasks, and it can handle multi-class problems without modification.
• Interpretability: The algorithm provides human-interpretable results, as predictions are based on the majority class or the average of the nearest neighbors.
6.1 Bayes’ Theorem
Bayes’ theorem is also known as Bayes’ Rule or Bayes’ law, and is used to determine the probability of a hypothesis with prior knowledge. It depends on the conditional probability. The formula for Bayes’ theorem is given as:
P(A|B) = P(B|A) P(A) / P(B)
where P(A|B) is the posterior probability of hypothesis A given the observed evidence B, P(B|A) is the likelihood of the evidence given the hypothesis, P(A) is the prior probability of the hypothesis, and P(B) is the marginal probability of the evidence.
• Bayes’ Theorem: The classifier is based on Bayes’ theorem, which calculates the probability of a hypothesis (in this case, a class label) given the evidence (features or attributes). Mathematically, it is expressed as P(class|evidence) = [P(evidence|class) * P(class)] / P(evidence).
1. Multinomial Naive Bayes: Typically used for text classification where features
represent word counts.
2. Gaussian Naive Bayes: Suitable for continuous data and assumes a Gaussian
distribution of features.
3. Bernoulli Naive Bayes: Applicable when features are binary, such as presence or
absence.
• Classification: To classify a new data point, the classifier calculates the posterior
probabilities for each class and selects the class with the highest probability.
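As an illustration of the Gaussian variant and of the classification step just described, here is a minimal from-scratch Python sketch; the tiny single-feature dataset is invented purely for illustration, and the class-conditional feature distribution is assumed to be Gaussian:

```python
import numpy as np

# Hypothetical training data: one feature (e.g., age), two classes
X = {"healthy": np.array([21.0, 34.0, 49.0, 53.0]),
     "sick":    np.array([48.0, 62.0, 67.0, 71.0])}

def gaussian_pdf(x, mean, var):
    # Likelihood of x under a normal distribution with the given mean and variance
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x_new = 45.0
total = sum(len(v) for v in X.values())
posteriors = {}
for label, values in X.items():
    prior = len(values) / total                                    # P(class)
    likelihood = gaussian_pdf(x_new, values.mean(), values.var())  # P(evidence | class)
    posteriors[label] = prior * likelihood                         # proportional to P(class | evidence)

# Select the class with the highest (unnormalized) posterior probability
print(max(posteriors, key=posteriors.get))
```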
• Works Well with Small Datasets: It can perform reasonably well even with limited
training data.
• Interpretable: The results are easy to interpret, as it provides the probability of belonging to each class.
• Sensitivity to Feature Distribution: It may not perform well when features have com-
plex, non-Gaussian distributions.
• Requires Sufficient Data: For some cases, Naive Bayes might not perform well when
there is a scarcity of data.
• Zero Probability Problem: If a feature-class combination does not exist in the training
data, the probability will be zero, causing issues. Smoothing techniques are often used
to address this.
7 Decision Trees
• A decision tree is a simple model for supervised classification. It is used for classifying
a single discrete target feature.
• Each internal node performs a Boolean test on an input feature (in general, a test may
have more than two options, but these can be converted to a series of Boolean tests).
The edges are labeled with the values of that input feature.
• Classifying an example using a decision tree is very intuitive. We traverse down the
tree, evaluating each test and following the corresponding edge. When a leaf is reached,
we return the classification on that leaf.
• Decision Tree is a Supervised learning technique that can be used for both classification
and Regression problems, but mostly it is preferred for solving Classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• In a decision tree, there are two kinds of nodes: decision nodes and leaf nodes. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
• The decisions or tests are performed on the basis of the features of the given dataset. It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
• It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further branches and constructs a tree-like structure.
• In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree algorithm.
• A decision tree simply asks a question, and based on the answer (Yes/No), it further splits the tree into subtrees.
7.1 Terminologies
• Root Node: A decision tree’s root node, which represents the original choice or feature from which the tree branches, is the highest node.
• Internal Nodes (Decision Nodes): Nodes in the tree whose choices are determined by the values of particular attributes. These nodes have branches that go to other nodes.
• Leaf Nodes (Terminal Nodes): The branches’ termini, where choices or forecasts are decided upon. There are no further branches on leaf nodes.
• Branches (Edges): Links between nodes that show how decisions are made in response to particular circumstances.
• Splitting: The process of dividing a node into two or more sub-nodes based on a decision criterion. It involves selecting a feature and a threshold to create subsets of data.
• Parent Node: A node that is split into child nodes; the original node from which a split originates.
• Decision Criterion: The rule or condition used to determine how the data should be split at a decision node. It involves comparing feature values against a threshold.
• Pruning: The process of removing branches or nodes from a decision tree to improve its generalization and prevent overfitting.
7.2 Measures of impurity
While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure (ASM). Using this measurement, we can easily select the best attribute for the nodes of the tree. There are two popular techniques for ASM:
1. Information Gain
• Information gain is the measurement of the change in entropy after the segmentation of a dataset based on an attribute.
• It calculates how much information a feature provides us about a class.
• According to the value of information gain, we split the node and build the decision tree.
• A decision tree algorithm always tries to maximize the value of information gain, and a node/attribute having the highest information gain is split first.
It can be calculated using the formula below:
Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in data. Entropy can be calculated as:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
2. Gini Index
• The Gini index is a measure of impurity or purity used while creating a decision tree in the CART (Classification and Regression Tree) algorithm.
• An attribute with a low Gini index should be preferred to one with a high Gini index.
• It only creates binary splits, and the CART algorithm uses the Gini index to create binary splits.
The Gini index can be calculated using the formula below:
Gini Index = 1 − Σj (Pj)²
7.3 Advantages of Decision Trees:
• Feature Selection: They can automatically select the most important features, reducing the need for feature engineering.
• Versatility: Decision Trees can handle both categorical and numerical data.
• Efficiency: They are relatively efficient during prediction, with time complexity logarithmic in the number of data points.
7.4 Disadvantages of Decision Trees:
• Overfitting: Decision Trees can be prone to overfitting, creating complex models that don’t generalize well to new data. Pruning and setting appropriate parameters can help mitigate this.
• Bias Toward Dominant Classes: In classification tasks, Decision Trees can be biased toward dominant classes, leading to imbalanced predictions.
• Instability: Small variations in the data can lead to different tree structures, making them unstable models.
• Greedy Algorithm: Decision Trees use a greedy algorithm, making locally optimal decisions at each node, which may not lead to the globally optimal tree structure.
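To make the two impurity measures concrete, here is a small Python sketch that computes entropy and the Gini index for a set of class labels; the example labels are invented for illustration:

```python
import numpy as np
from collections import Counter

def entropy(labels):
    # Entropy(S) = -sum(p_i * log2(p_i)) over the classes present in S
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def gini(labels):
    # Gini Index = 1 - sum(p_i^2)
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(1.0 - np.sum(p ** 2))

labels = ["yes", "yes", "yes", "no", "no"]    # hypothetical node with 3 "yes" and 2 "no"
print("entropy:", round(entropy(labels), 3))  # 0.971
print("gini:", round(gini(labels), 3))        # 0.48
```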
8 Support Vector Machine
• ‘Support Vector Machine is a system for efficiently training linear learning machines in kernel-induced feature spaces, while respecting the insights of generalisation theory and exploiting optimisation theory.’
• The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional space into classes so that we can easily put a new data point into the correct category in the future. This best decision boundary is called a hyperplane.
• SVMs pick the best separating hyperplane according to some criterion, e.g. maximum margin.
• The training process is an optimisation.
• The training set is effectively reduced to a relatively small number of support vectors.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called support vectors, and hence the algorithm is termed the Support Vector Machine.
Consider the diagram below, in which there are two different categories that are classified using a decision boundary or hyperplane.
Here are the key concepts and characteristics of Support Vector Machines:
• Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space, but we need to find out the best decision boundary that helps to classify the data points. This best boundary is known as the hyperplane of the SVM.
• In a binary classification problem, an SVM finds a hyperplane that best separates the data points of the different classes. This hyperplane is the decision boundary.
• The dimensions of the hyperplane depend on the features present in the dataset: if there are 2 features (as shown in the image), then the hyperplane will be a straight line, and if there are 3 features, then the hyperplane will be a 2-dimensional plane. We always create a hyperplane that has a maximum margin, which means the maximum distance between the data points.
• Support Vectors: The data points or vectors that are closest to the hyperplane and which affect the position of the hyperplane are termed support vectors. Since these vectors support the hyperplane, they are called support vectors. They are critical for defining the margin and determining the location of the hyperplane.
• Margin: The margin is the distance between the support vectors and the decision boundary. SVM aims to maximize this margin because a larger margin often leads to better generalization.
• C Parameter: The regularization parameter “C” controls the trade-off between maximizing the margin and minimizing the classification error. A smaller “C” value results in a larger margin but may allow some misclassifications, while a larger “C” value allows for fewer misclassifications but a smaller margin.
• Multi-Class Classification: SVMs are inherently binary classifiers, but they can be extended to handle multi-class classification using techniques like one-vs-one (OvO) or one-vs-all (OvA) classification.
• The Scalar Product: The scalar or dot product is, in some sense, a measure of similarity: a · b = |a| |b| cos(θ)
8.1 Kernels
We may use kernel functions to implicitly map the data to a new feature space.
• Kernel function: K(x1, x2) ∈ R
• A kernel must be equivalent to an inner product in some feature space.
8.2 Classification Margin
• Kernel Trick: SVM can handle non-linearly separable data by using a kernel function to map the data into a higher-dimensional space where it becomes linearly separable. Common kernel functions include linear, polynomial, radial basis function (RBF), and sigmoid kernels.
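As a small illustration of a kernel acting as a similarity measure, here is a sketch of the RBF (Gaussian) kernel in Python; the points and the gamma value are arbitrary examples:

```python
import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    # K(x1, x2) = exp(-gamma * ||x1 - x2||^2): close points give values near 1,
    # distant points give values near 0, so the kernel acts as a similarity score.
    return float(np.exp(-gamma * np.sum((np.asarray(x1) - np.asarray(x2)) ** 2)))

a = [1.0, 2.0]
b = [1.2, 1.9]   # close to a
c = [5.0, 7.0]   # far from a

print(rbf_kernel(a, b))  # close to 1
print(rbf_kernel(a, c))  # close to 0
```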
The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that can classify the pair (x1, x2) of coordinates as either green or blue. As it is a 2-D space, by just using a straight line we can easily separate these two classes, but there can be multiple lines that separate these classes.
Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is called a hyperplane. The SVM algorithm finds the closest points of the lines from both classes. These points are called support vectors. The distance between the vectors and the hyperplane is called the margin, and the goal of SVM is to maximize this margin. The hyperplane with the maximum margin is called the optimal hyperplane.
2. Non-linear SVM: Non-linear SVM is used for non-linearly separated data, which means that if a dataset cannot be classified by using a straight line, then such data is termed non-linear data and the classifier used is called a non-linear SVM classifier.
8.4 Advantages of Support Vector Machines:
• Effective in High-Dimensional Spaces: SVMs perform well even in high-dimensional feature spaces.
• Robust to Overfitting: SVMs are less prone to overfitting, especially when the margin is maximized.
• Accurate for Non-Linear Data: The kernel trick allows SVMs to work effectively on non-linear data by transforming it into higher dimensions.
• Wide Applicability: SVMs can be applied to various tasks, including classification, regression, and outlier detection.
Disadvantages of Support Vector Machines:
• Sensitivity to Kernel Choice: The choice of the kernel function and kernel parameters can significantly impact the SVM’s performance.
• Challenging for Large Datasets: SVMs may not be suitable for very large datasets because of their computational complexity.
9 Bias-Variance Trade-Off
• The goal of supervised machine learning is to learn or derive a target function that can best determine the target variable from the set of input variables.
• A key consideration in learning the target function from the training data is the extent of generalization. This is because the input data is just a limited, specific view, and the new, unknown data in the test data set may differ quite a bit from the training data.
• The fitness of a target function approximated by a learning algorithm determines how correctly it is able to classify a set of data it has never seen.
9.1 Underfitting
• If the target function is kept too simple, it may not be able to capture the essential nuances and represent the underlying data well.
• A typical case of underfitting may occur when trying to represent non-linear data with a linear model, as demonstrated by both cases of underfitting shown in Figure 1.1.
• Many times underfitting happens due to the unavailability of sufficient training data.
• Underfitting results in both poor performance with the training data as well as poor generalization to the test data. Underfitting can be avoided by
1. using more training data
2. reducing features by effective feature selection
9.2 Overfitting
• Overfitting refers to a situation where the model has been designed in such a way that it emulates the training data too closely. In such a case, any specific deviation in the training data, like noise or outliers, gets embedded in the model. It adversely impacts the performance of the model on the test data.
• Overfitting, in many cases, occurs as a result of trying to fit an excessively complex model to closely match the training data. This is represented with a sample data set in Figure 1.1. The target function, in these cases, tries to make sure all training data points are correctly partitioned by the decision boundary. However, more often than not, this exact nature is not replicated in the unknown test data set. Hence, the target function results in wrong classifications in the test data set.
• Overfitting results in good performance with the training data set, but poor generalization and hence poor performance with the test data set. Overfitting can be avoided by
1. using re-sampling techniques like k-fold cross-validation
2. holding back a validation data set
3. removing the nodes which have little or no predictive power for the given machine learning problem.
• Both underfitting and overfitting result in poor classification quality, which is reflected by low classification accuracy.
Errors due to ‘Bias’:
• Errors due to bias arise from simplifying assumptions made by the model to make the target function less complex or easier to learn. In short, they are due to underfitting of the model.
• Parametric models generally have high bias, making them easier to understand/interpret and faster to learn.
• These algorithms have poor performance on data sets which are complex in nature and do not align with the simplifying assumptions made by the algorithm.
• Underfitting results in high bias.
Errors due to ‘Variance’:
• Errors due to variance occur from differences in the training data sets used to train the model.
• Different training data sets (randomly sampled from the input data set) are used to train the model. Ideally the difference between the data sets should not be significant, and the models trained using different training data sets should not be too different.
• However, in the case of overfitting, since the model closely matches the training data, even a small difference in the training data gets magnified in the model.
So, the problems in training a model can happen because either
(a) the model is too simple and hence fails to interpret the data grossly, or
(b) the model is extremely complex and magnifies even small differences in the training data.
• Complex Models vs. Simple Models: Complex models (e.g., deep neural networks) tend to have low bias but high variance, whereas simple models (e.g., linear regression) tend to have high bias but low variance.
• Balancing Act: Machine learning practitioners aim to strike a balance between bias and variance to achieve a model with good generalization, one that performs well on both the training data and new, unseen data.
• Underfitting and Overfitting: The trade-off helps address the problems of underfitting (high bias) and overfitting (high variance). Underfit models don’t capture enough of the data’s complexity, while overfit models fit noise in the data.
• Model Complexity: Adjusting model complexity, such as the number of features, the choice of hyperparameters, and regularization techniques, is a way to manage the bias-variance trade-off.
Important Note
Increasing the bias will decrease the variance, and increasing the variance will decrease the bias. On one hand, parametric algorithms are generally seen to demonstrate high bias but low variance. On the other hand, non-parametric algorithms demonstrate low bias and high variance.
Figure 1.3.1
As can be observed in Figure 1.3.1, the best solution is to have a model with low bias as well as low variance. However, that may not be possible in reality. Hence, the goal of supervised machine learning is to achieve a balance between bias and variance. The learning algorithm chosen and the user parameters which can be configured help in striking a trade-off between bias and variance.
For example, in the popular supervised algorithm k-Nearest Neighbors (kNN), the user-configurable parameter ‘k’ can be used to trade off between bias and variance. On one hand, when the value of ‘k’ is increased, the model becomes simpler and the bias increases. On the other hand, when the value of ‘k’ is decreased, the variance increases.
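A rough way to see this trade-off empirically is to compare the training and test accuracy of a kNN classifier for several values of k. The sketch below assumes scikit-learn is available and uses a synthetic dataset purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class dataset (for illustration only)
X, y = make_classification(n_samples=400, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for k in (1, 5, 25, 101):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # Small k: low bias / high variance (train accuracy very high, test accuracy may drop)
    # Large k: high bias / low variance (a smoother, simpler decision boundary)
    print(f"k={k:3d}  train acc={model.score(X_tr, y_tr):.2f}  test acc={model.score(X_te, y_te):.2f}")
```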
10 Cross-validation methods
• When the dataset is small, the simple hold-out method is prone to high variance. Due to the random partition, the results can be entirely different for different test sets. To deal with this issue, we use cross-validation to evaluate the performance of a machine-learning model.
• In cross-validation, we don’t divide the dataset into training and test sets only once. Instead, we repeatedly partition the dataset into smaller groups and then average the performance over the groups. That way, we reduce the impact of partition randomness on the results.
• Many cross-validation techniques define different ways to divide the dataset at hand. We’ll focus on the two most frequently used: the k-fold and the leave-one-out methods.
10.1 K-Fold Cross-Validation
K-Fold Cross-Validation is a widely used technique in machine learning for assessing the performance and generalization ability of a model. It involves dividing the dataset into ‘k’ subsets of approximately equal size, where one of these subsets is used as the test set and the remaining ‘k−1’ subsets are used as the training set. This process is repeated ‘k’ times, each time using a different subset as the test set.
Key points about K-Fold Cross-Validation:
• Data Splitting: The dataset is divided into ‘k’ subsets or folds, where each fold is used as the test set exactly once, and the rest are used for training.
• Bias-Variance Trade-Off: It helps in managing the bias-variance trade-off. The model’s performance is assessed under different training and test subsets, helping you detect issues like overfitting or underfitting.
• Hyperparameter Tuning: K-Fold Cross-Validation is often used for hyperparameter tuning. By trying different hyperparameters on different folds, you can choose the set of hyperparameters that yields the best average performance.
• K-Fold Variations: Variations include stratified K-Fold, which ensures that each fold has a similar class distribution, and repeated K-Fold, where the process is repeated multiple times with different random splits.
• Performance: The final model performance is typically determined by averaging the results of all ‘k’ iterations, such as mean accuracy or root mean squared error.
• Trade-Off: There is a trade-off between computational cost and model assessment quality. Larger ‘k’ values lead to a more accurate assessment but require more computation.
• Usage: K-Fold Cross-Validation is widely used in various machine learning tasks, including model selection, hyperparameter tuning, and performance estimation.
• Validation Set: In practice, a separate validation set might be used to validate the final model after hyperparameter tuning, while K-Fold Cross-Validation helps in assessing the overall performance of the model.
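A minimal sketch of k-fold evaluation, assuming scikit-learn is available; the dataset and the choice of model are arbitrary examples:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold serves as the test set exactly once
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv)

print("fold accuracies:", scores.round(3))
print("mean accuracy:", scores.mean().round(3))   # final performance = average over folds
```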
Comparison
An important factor when choosing between the k-fold and the LOO cross-validation methods
is the size of the dataset.
When the size is small, LOO is more appropriate since it will use more training samples in
each iteration. That will enable our model to learn better representations.
Conversely, we use k-fold cross-validation to train a model on a large dataset since LOO trains
n models, one per sample in the data. When our dataset contains a lot of samples, training
so many models will take too long. So, the k-fold cross-validation is more appropriate.
Also, in a large dataset, it is sufficient to use less than n folds since the test folds are large
enough for the estimates to be sufficiently precise.
Leave-One-Out Cross-Validation
• Bias and Variance: It tends to produce a more reliable, less biased estimate of a model’s performance than other cross-validation methods like k-fold cross-validation. However, LOO estimates can have high variance, and because it requires training one model per sample it is computationally expensive.
• Model Evaluation: LOO cross-validation allows you to assess how well the model
generalizes to unseen data and identify potential issues like overfitting or data leakage.
11 Feedforward Neural Networks
• Input Layer: This layer consists of neurons that receive inputs and pass them on to the next layer. The number of neurons in the input layer is determined by the dimensions of the input data.
• Hidden Layers: These layers are not exposed to the input or output and can be considered the computational engine of the neural network. Each hidden layer’s neurons take the weighted sum of the outputs from the previous layer, apply an activation function, and pass the result to the next layer. The network can have zero or more hidden layers.
• Output Layer: The final layer that produces the output for the given inputs. The number of neurons in the output layer depends on the number of possible outputs the network is designed to produce.
• Each neuron in one layer is connected to every neuron in the next layer, making this a fully connected network. The strength of the connection between neurons is represented by weights, and learning in a neural network involves updating these weights based on the error of the output.
The input and hidden layers use sigmoid and linear activation functions, whereas the output layer uses a Heaviside step activation function at its nodes, because this two-valued step function helps in predicting results as per requirements. All units, also known as neurons, have weights, and the calculation at a hidden layer is the summation of the dot product of all weights and their signals, followed by the sigmoid function of the calculated sum. Multiple hidden and output layers increase the accuracy of the output.
11.2 Neurons, Activation Functions, Weights and Biases
• Neurons
Nodes in the network that receive inputs, perform a weighted sum, and pass the result through an activation function.
11.3 Feedforward Process
1. The input data is fed into the input layer.
2. Each neuron in the hidden layers processes the input using weights, biases, and activation functions.
3. The output from each hidden layer is passed to the next layer.
4. This process continues until the output layer produces the final prediction.
11.4 How Feedforward Neural Networks Work
The working of a feedforward neural network involves two phases: the feedforward phase and the backpropagation phase.
• Feedforward Phase: In this phase, the input data is fed into the network, and it propagates forward through the network. At each hidden layer, the weighted sum of the inputs is calculated and passed through an activation function, which introduces non-linearity into the model. This process continues until the output layer is reached, and a prediction is made.
• Backpropagation Phase: Once a prediction is made, the error (difference between the predicted output and the actual output) is calculated. This error is then propagated back through the network, and the weights are adjusted to minimize this error. The process of adjusting weights is typically done using a gradient descent optimization algorithm.
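A minimal sketch of a single feedforward pass in Python with NumPy; the layer sizes, weights, and input are arbitrary values chosen only to illustrate steps 1-4 above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.normal(size=3)                           # input layer: 3 features
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 1 neuron

# Hidden layer: weighted sum of inputs, then activation function
h = sigmoid(W1 @ x + b1)
# Output layer: weighted sum of hidden activations, then activation
y_hat = sigmoid(W2 @ h + b2)

print("prediction:", y_hat)
```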
Tanh (Hyperbolic Tangent)
Similar to the sigmoid but outputs values between -1 and 1; it is often used in hidden layers.
Overfitting
When a model performs well on the training data but poorly on new, unseen data.
Regularization Techniques
Methods like dropout and L2 regularization are employed to prevent overfitting by penalizing overly complex models.
Multi-Layer Perceptron
• Hidden Layers: Between the input and output layers, there can be one or more hidden layers. These layers contain neurons, also known as units or nodes, which are responsible for learning complex patterns and relationships in the data. Hidden layers add the capacity to model non-linear functions. An MLP can have a varying number of hidden layers and units, depending on the problem’s complexity.
• Output Layer: The output layer is responsible for producing the final results or predictions. The number of output neurons depends on the nature of the task. For instance, in binary classification, there might be a single output neuron that outputs the probability of belonging to one class, while in multi-class classification, there could be multiple output neurons, each corresponding to a class.
12.2 Backpropagation
• Backpropagation is a technique used to optimize the weights of an MLP using the outputs as inputs.
• In a conventional MLP, random weights are assigned to all the connections. These random weights propagate values through the network to produce the actual output. Naturally, this output would differ from the expected output. The difference between the two values is called the error.
• Backpropagation refers to the process of sending this error back through the network, readjusting the weights automatically so that eventually the error between the actual and expected outputs is minimized.
• In this way, the output of the current iteration becomes the input for the next iteration and affects the next output. This is repeated until the correct output is produced. The weights at the end of the process would be the ones on which the neural network works correctly.
References
• Taeho Jo, Machine Learning Foundations: Supervised, Unsupervised, and Advanced Learning, Springer.
• IIT Madras BS Degree Lectures and Notes
• NPTEL Lectures and Slides
• www.medium.com
• geeksforgeeks.org
• javatpoint.com