Deep Learning Notes

Deep learning is an AI method that mimics human brain processes to recognize patterns in data, with applications in fields such as healthcare and self-driving cars. However, it faces challenges such as large data requirements, lack of interpretability, and bias. Strategies to overcome these challenges include ensuring data quality and prioritizing interpretable models where necessary.

Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions. Applications include:

1. Virtual Assistants
2. Chatbots
3. Healthcare
4. Entertainment
5. News Aggregation and Fake News Detection
6. Composing Music
7. Image Coloring
8. Robotics
9. Image Captioning
10. Advertising
11. Self Driving Cars
12. Natural Language Processing
13. Visual Recognition
14. Fraud Detection
15. Personalisations
16. Detecting Developmental Delay in Children
17. Colourisation of Black and White images
18. Adding Sounds to Silent Movies
19. Automatic Machine Translation
20. Automatic Handwriting Generation
21. Automatic Game Playing
22. Language Translations
23. Pixel Restoration
24. Demographic and Election Predictions
25. Deep Dreaming
### Challenges in Deep Learning:

1. **Large Data Requirements**: Deep learning algorithms require vast amounts of data for training to deliver accurate results. More data means more parameters to tune, increasing complexity and computational demands[1][2][3].
2. **Lack of Interpretability**: Neural networks are often considered black boxes, making it challenging for researchers to understand how they arrive at specific conclusions. This lack of interpretability can hinder adoption in domains requiring transparency[1][2].
3. **Limited Flexibility and Multitasking**: Deep learning models are highly specialized for specific tasks and struggle with multitasking. Retraining is often necessary even for similar problems, limiting their adaptability[1][2].
4. **Biases and Data Quality**: Deep learning models trained on biased data can perpetuate those biases in their predictions. Ensuring data quality, addressing biases, and implementing debiasing techniques are crucial for reliable outcomes[2][3].
5. **Robustness to Adversarial Attacks**: Deep learning models are vulnerable to adversarial attacks, where small perturbations in input data can lead to misclassification or erroneous outputs. Developing robust models that withstand such attacks is essential[3].
6. **Catastrophic Forgetting**: In online deep learning, models may forget previously learned information when exposed to new data, leading to performance degradation. Addressing this phenomenon is crucial for continual learning without forgetting past knowledge[3].

To overcome these challenges, strategies such as ensuring high-quality data, addressing biases, optimizing computing costs, implementing privacy-protecting techniques, and prioritizing interpretable models over deep learning in certain cases can be employed[4]. These approaches aim to enhance the effectiveness, reliability, and applicability of deep learning systems in various domains.
| Artificial Intelligence | Machine Learning | Deep Learning |
| --- | --- | --- |
| AI stands for Artificial Intelligence, and is basically the study/process which enables machines to mimic human behaviour through a particular algorithm. | ML stands for Machine Learning, and is the study that uses statistical methods enabling machines to improve with experience. | DL stands for Deep Learning, and is the study that makes use of Neural Networks (similar to neurons present in the human brain) to imitate functionality just like a human brain. |
| AI is the broader family consisting of ML and DL as its components. | ML is the subset of AI. | DL is the subset of ML. |
| AI is a computer algorithm which exhibits intelligence through decision making. | ML is an AI algorithm which allows systems to learn from data. | DL is an ML algorithm that uses deep (more than one layer) neural networks to analyze data and provide output accordingly. |
| Search Trees and much complex math is involved in AI. | If you have a clear idea about the logic (math) involved behind it and you can visualize the complex functionalities like K-Means, Support Vector Machines, etc., then it defines the ML aspect. | If you are clear about the math involved in it but don't have an idea about the features, so you break the complex functionalities into linear/lower-dimension features by adding more layers, then it defines the DL aspect. |
| The aim is to basically increase chances of success and not accuracy. | The aim is to increase accuracy, not caring much about the success ratio. | It attains the highest rank in terms of accuracy when it is trained with a large amount of data. |
| Three broad categories/types of AI are: Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI). | Three broad categories/types of ML are: Supervised Learning, Unsupervised Learning and Reinforcement Learning. | DL can be considered as neural networks with a large number of parameters and layers, lying in one of the four fundamental network architectures: Unsupervised Pre-trained Networks, Convolutional Neural Networks, Recurrent Neural Networks and Recursive Neural Networks. |
| The efficiency of AI is basically the efficiency provided by ML and DL respectively. | Less efficient than DL as it can't work for longer dimensions or a higher amount of data. | More powerful than ML as it can easily work for larger sets of data. |
| Examples of AI applications include: Google's AI-Powered Predictions, ridesharing apps like Uber and Lyft, commercial flights' use of an AI autopilot, etc. | Examples of ML applications include: virtual personal assistants (Siri, Alexa, Google, etc.), email spam and malware filtering. | Examples of DL applications include: sentiment-based news aggregation, image analysis and caption generation, etc. |
| AI refers to the broad field of computer science that focuses on creating intelligent machines that can perform tasks that would normally require human intelligence, such as reasoning, perception, and decision-making. | ML is a subset of AI that focuses on developing algorithms that can learn from data and improve their performance over time without being explicitly programmed. | DL is a subset of ML that focuses on developing deep neural networks that can automatically learn and extract features from data. |
| AI can be further broken down into various subfields such as robotics, natural language processing, computer vision, expert systems, and more. | ML algorithms can be categorized as supervised, unsupervised, or reinforcement learning. In supervised learning, the algorithm is trained on labeled data, where the desired output is known. In unsupervised learning, the algorithm is trained on unlabeled data, where the desired output is unknown. | DL algorithms are inspired by the structure and function of the human brain, and they are particularly well-suited to tasks such as image and speech recognition. |
| AI systems can be rule-based, knowledge-based, or data-driven. | In reinforcement learning, the algorithm learns by trial and error, receiving feedback in the form of rewards or punishments. | DL networks consist of multiple layers of interconnected neurons that process data in a hierarchical manner, allowing them to learn increasingly complex representations of the data. |

Artificial Neural Networks (ANNs) are computational models inspired by the human brain's neural structure. They consist of interconnected artificial neurons organized in layers, including an input layer, hidden layers, and an output layer[3]. Each neuron processes inputs, applies weights, computes a sum, and passes the result through an activation function to produce an output[1][2]. The connections between neurons have weights that determine their influence on each other, allowing the network to learn from data and generate outputs[2].

In training ANNs, a training set is used to adjust the network's weights based on
the error between predicted and actual outcomes. This process involves forward
propagation of data through the network, comparing predictions with actual
results, computing errors, and then backpropagating these errors to update the
weights iteratively until the network can make accurate predictions[2][5]. The
learning process involves adjusting thresholds and weights based on a cost
function to minimize errors and improve accuracy over time[5].
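
As a rough illustration of this loop, here is a minimal sketch for a single sigmoid neuron; the toy data, learning rate, and iteration count are assumptions for the demo, not part of the notes above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: OR-like targets for four 2-feature inputs (assumed for the demo).
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
y = np.array([1.0, 1.0, 1.0, 0.0])
w, b, lr = np.zeros(2), 0.0, 0.5

for _ in range(1000):
    out = sigmoid(X @ w + b)          # forward propagation
    err = out - y                      # compare predictions with actual results
    grad = err * out * (1.0 - out)     # error scaled by the sigmoid derivative
    w -= lr * X.T @ grad               # backpropagate the error to the weights
    b -= lr * grad.sum()               # ... and to the bias

print(np.round(sigmoid(X @ w + b)))    # approximately [1, 1, 1, 0]
```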

Different types of ANNs exist, such as feedforward neural networks, which process data in a single direction through layers of neurons. These networks are fundamental in various applications like computer vision and natural language processing[3][5]. ANNs have become powerful tools in artificial intelligence and machine learning, enabling tasks like image recognition, speech recognition, and classification with high efficiency[3].

Overall, ANNs serve as versatile tools capable of learning complex patterns and
making decisions similar to the human brain's cognitive processes. They have
diverse applications across industries like social media, healthcare, finance, and
more due to their ability to learn from data and generalize patterns for various
tasks[2][5].
Artificial Neural Networks (ANNs) come in various types, each designed for
specific tasks and applications. Some fundamental types include:

1. **Feedforward Neural Network**: The simplest type, where information flows in one direction from the input layer to the output layer through hidden layers. It is used in applications like speech recognition and computer vision[4][5].

2. **Radial Basis Function Neural Network**: Utilizes radial basis functions to categorize data based on trends and distances from a center. It is commonly used to model underlying trends or functions[4].

3. **Recurrent Neural Network (RNN)**: Feeds the output of a layer back to the input, allowing it to retain information over time steps. RNNs are suitable for tasks involving time-dependent behavior and sequential data processing[4].

4. **Modular Neural Networks**: Comprise independent networks working collectively to solve tasks, breaking down complex problems into smaller sub-tasks for efficient computation[3].

5. **Associative Neural Network**: Features memory that can adapt instantly to new data without retraining, improving predictive ability and data approximation[2].

These types represent a subset of the diverse range of ANNs available, each tailored to specific requirements within machine learning and artificial intelligence applications[1][2][3].

The Perceptron is a Machine Learning algorithm for supervised learning of binary classification tasks. The Perceptron is also understood as an artificial neuron, or neural network unit, that helps to detect certain input data computations in business intelligence.

The Perceptron model is also treated as one of the simplest and best types of Artificial Neural Networks. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.

Frank Rosenblatt invented the perceptron model as a binary classifier, which contains three main components. These are as follows:

o Input Nodes or Input Layer:

This is the primary component of Perceptron which accepts the initial data into
the system for further processing. Each input node contains a real numerical
value.
o Weight and Bias:

The weight parameter represents the strength of the connection between units, and is one of the most important parameters of the Perceptron's components. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, bias can be considered as the intercept term in a linear equation.

o Activation Function:

This is the final and most important component, which helps to determine whether the neuron will fire or not. The activation function can be considered primarily as a step function.

Types of Activation functions:

o Sign function
o Step function, and
o Sigmoid function

Perceptron model works in two important steps as follows:

Step-1

In the first step, multiply all input values by their corresponding weight values and then add them up to determine the weighted sum. Mathematically, we can calculate the weighted sum as follows:

∑wi*xi = w1*x1 + w2*x2 + … + wn*xn

Add a special term called bias 'b' to this weighted sum to improve the model's
performance.

∑wi*xi + b
Step-2

In the second step, an activation function is applied to the above weighted sum, which gives us an output either in binary form or as a continuous value, as follows:

Y = f(∑wi*xi + b)
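
To make the two steps concrete, here is a minimal NumPy sketch; the AND-gate weights below are hand-picked for illustration, not learned:

```python
import numpy as np

def step(z):
    """Step activation: fire (1) if the weighted sum exceeds 0, else 0."""
    return 1 if z > 0 else 0

def perceptron(x, w, b):
    """Step 1: weighted sum plus bias; Step 2: activation function."""
    z = np.dot(w, x) + b      # sum(wi * xi) + b
    return step(z)            # Y = f(sum(wi * xi) + b)

# Example: hand-picked weights that make the perceptron act as an AND gate.
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, perceptron(np.array(x), w, b))   # 0, 0, 0, 1
```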

Types of Perceptron Models

Based on the layers, Perceptron models are divided into two types. These are as
follows:

1. Single-layer Perceptron Model


2. Multi-layer Perceptron model

Single Layer Perceptron Model:

This is one of the simplest types of Artificial Neural Network (ANN). A single-layered perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to analyze linearly separable objects with binary outcomes.

In a single-layer perceptron model, the algorithm does not start from recorded data, so it begins with randomly allocated values for the weight parameters. It then sums up all the weighted inputs. After adding all inputs, if the total sum is more than a pre-determined value, the model is activated and shows the output value as +1.

If the outcome matches the pre-determined threshold value, the performance of the model is considered satisfactory, and the weights do not change. However, this model produces discrepancies when multiple weighted input values are fed into it. Hence, to find the desired output and minimize errors, some changes to the input weights may be necessary.

"Single-layer perceptron can learn only linearly separable patterns."

Multi-Layered Perceptron Model:

Like a single-layer perceptron model, a multi-layer perceptron model has the same model structure but with a greater number of hidden layers.

The multi-layer perceptron model is also known as the backpropagation algorithm, which executes in two stages as follows:

o Forward Stage: Activation functions start from the input layer in the forward stage and terminate on the output layer.
o Backward Stage: In the backward stage, weight and bias values are modified as per the model's requirement. The error between the actual and demanded output is propagated backward, starting at the output layer and ending at the input layer.

Hence, a multi-layered perceptron model can be considered as multiple artificial neural networks having various layers in which the activation function does not remain linear, unlike in a single-layer perceptron model. Instead of linear, the activation function can be sigmoid, TanH, ReLU, etc., for deployment.

A multi-layer perceptron model has greater processing power and can process
linear and non-linear patterns. Further, it can also implement logic gates such as
AND, OR, XOR, NAND, NOT, XNOR, NOR.

Advantages of Multi-Layer Perceptron:

o A multi-layered perceptron model can be used to solve complex non-linear problems.
o It works well with both small and large input data.
o It helps us to obtain quick predictions after the training.
o It helps to obtain the same accuracy ratio with large as well as small data.

Disadvantages of Multi-Layer Perceptron:

o In a multi-layer perceptron, computations are difficult and time-consuming.
o In a multi-layer perceptron, it is difficult to predict how much each independent variable affects the dependent variable.
o The model functioning depends on the quality of the training.

Perceptron Function

Perceptron function ''f(x)'' can be achieved as output by multiplying the input 'x'
with the learned weight coefficient 'w'.

Mathematically, we can express it as follows:

f(x)=1; if w.x+b>0

otherwise, f(x)=0

o 'w' represents the real-valued weights vector
o 'b' represents the bias
o 'x' represents the vector of input values

Characteristics of Perceptron

The perceptron model has the following characteristics.

1. Perceptron is a machine learning algorithm for supervised learning of binary classifiers.
2. In Perceptron, the weight coefficient is automatically learned.
3. Initially, weights are multiplied with input features, and the decision is
made whether the neuron is fired or not.
4. The activation function applies a step rule to check whether the weight
function is greater than zero.
5. The linear decision boundary is drawn, enabling the distinction between
the two linearly separable classes +1 and -1.
6. If the added sum of all input values is more than the threshold value, it
must have an output signal; otherwise, no output will be shown.


Limitations of Perceptron Model

A perceptron model has limitations as follows:

o The output of a perceptron can only be a binary number (0 or 1) due to the hard limit transfer function.
o Perceptron can only be used to classify the linearly separable sets of input
vectors. If input vectors are non-linear, it is not easy to classify them
properly.

XOR Gate

An XOR gate (sometimes EOR or EXOR, pronounced Exclusive OR) is a digital logic gate that gives a true (1 or HIGH) output when the number of true inputs is odd. An XOR gate implements an exclusive or from mathematical logic; that is, a true output results if one, and only one, of the inputs to the gate is true. If both inputs are false (0/LOW) or both are true, a false output results. XOR represents the inequality function, i.e., the output is true if the inputs are not alike, otherwise the output is false. A way to remember XOR is "must have one or the other but not both".
An XOR gate may serve as a "programmable inverter", in which one input determines whether to invert the other input or simply pass it along with no change. Hence it functions as an inverter (a NOT gate) which may be activated or deactivated by a switch.[1][2]
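
XOR is the classic example of a pattern a single perceptron cannot learn, since it is not linearly separable. A sketch with two stacked layers of perceptron-style units shows how combining OR and NAND recovers XOR; the weights are hand-picked for illustration, not learned:

```python
import numpy as np

def step(z):
    return (z > 0).astype(int)

def xor(x1, x2):
    x = np.array([x1, x2])
    h_or   = step(np.dot([ 1.0,  1.0], x) - 0.5)   # hidden unit 1: OR(x1, x2)
    h_nand = step(np.dot([-1.0, -1.0], x) + 1.5)   # hidden unit 2: NAND(x1, x2)
    return step(1.0 * h_or + 1.0 * h_nand - 1.5)   # output: AND(OR, NAND) = XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor(a, b))   # prints 0, 1, 1, 0
```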

An activation function is needed in the hidden layer as well as at the output layer of the network. The activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
Hidden layer, i.e. layer 1:
z(1) = W(1)X + b(1)
a(1) = z(1)
 z(1) is the vectorized output of layer 1
 W(1) is the vectorized weights assigned to the neurons of the hidden layer, i.e. w1, w2, w3 and w4
 X is the vectorized input features, i.e. i1 and i2
 b(1) is the vectorized bias assigned to the neurons in the hidden layer, i.e. b1 and b2
 a(1) is the vectorized form of the (linear) activation function

Layer 2, i.e. the output layer:
Note: the input for layer 2 is the output from layer 1.
z(2) = W(2)a(1) + b(2)
a(2) = z(2)
Calculation at Output layer
z(2) = (W(2) * [W(1)X + b(1)]) + b(2)
z(2) = [W(2) * W(1)] * X + [W(2)*b(1) + b(2)]
Let,
[W(2) * W(1)] = W
[W(2)*b(1) + b(2)] = b
Final output : z(2) = W*X + b
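
A quick NumPy check of this collapse, with arbitrary layer sizes assumed for the demo; it confirms that stacking purely linear layers adds no expressive power, which is why a non-linear activation is needed:

```python
import numpy as np

rng = np.random.default_rng(0)
X  = rng.normal(size=(2, 1))                                 # inputs i1, i2
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=(4, 1))    # hidden layer
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))    # output layer

z2_layered = W2 @ (W1 @ X + b1) + b2       # two "linear" layers stacked
W, b = W2 @ W1, W2 @ b1 + b2               # collapsed form: W*X + b
z2_collapsed = W @ X + b

print(np.allclose(z2_layered, z2_collapsed))   # True
```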
Variants of Activation Function

Linear Function

 Equation: A linear function has an equation similar to that of a straight line, i.e. y = x.
 No matter how many layers we have, if all of them are linear in nature, the final activation function of the last layer is nothing but a linear function of the input of the first layer.
 Range: -inf to +inf
 Uses: The linear activation function is used in just one place, i.e. the output layer.
 Issues: The derivative of a linear function is a constant, so the gradient no longer depends on the input x; it won't introduce any ground-breaking non-linear behavior into our algorithm.
For example: calculating the price of a house is a regression problem. A house price may have any big or small value, so we can apply linear activation at the output layer. Even in this case, the neural net must have a non-linear function at the hidden layers.

Sigmoid Function

It is a function which is plotted as ‘S’ shaped graph.

 Equation: A = 1/(1 + e^(-x))
 Nature: Non-linear. Notice that for X values between -2 and 2, the Y values are very steep. This means small changes in x bring about large changes in the value of Y.
 Value Range: 0 to 1
 Uses: Usually used in the output layer of a binary classification, where the result is either 0 or 1. Since the value of the sigmoid function lies between 0 and 1 only, the result can easily be predicted to be 1 if the value is greater than 0.5, and 0 otherwise.

Tanh Function
The activation that works almost always better than sigmoid function
is Tanh function also known as Tangent Hyperbolic function. It’s
actually mathematically shifted version of the sigmoid function. Both
are similar and can be derived from each other.

 Equation :- A = (e^x - e^(-x)) / (e^x + e^(-x))

 Value Range :- -1 to +1
 Nature :- non-linear
 Uses :- Usually used in the hidden layers of a neural network, as its values lie between -1 and 1; hence the mean for the hidden layer comes out to be 0 or very close to it. This helps in centering the data by bringing the mean close to 0, which makes learning for the next layer much easier.

RELU Function

 It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of a neural network.
 Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0
otherwise.
 Value Range :- [0, inf)

 Nature :- non-linear, which means we can easily backpropagate the errors and have multiple layers of neurons being activated by the ReLU function.
 Uses :- ReLU is less computationally expensive than tanh and sigmoid because it involves simpler mathematical operations. At a time, only a few neurons are activated, making the network sparse and therefore efficient and easy to compute.
In simple words, RELU learns much faster than sigmoid and Tanh function.

Softmax Function

The softmax function is also a type of sigmoid function, but it is handy when we are trying to handle multi-class classification problems.
 Nature :- non-linear
 Uses :- Usually used when trying to handle multiple classes. The softmax function is commonly found in the output layer of image classification problems. The softmax function squeezes the output for each class between 0 and 1 and also divides by the sum of the outputs.

 Output :- The softmax function is ideally used in the output layer of the classifier, where we are actually trying to attain the probabilities that define the class of each input.
 The basic rule of thumb is: if you really don't know which activation function to use, simply use ReLU, as it is a general activation function for hidden layers and is used in most cases these days.
 If your output is for binary classification, then the sigmoid function is a very natural choice for the output layer.
 If your output is for multi-class classification, then softmax is very useful for predicting the probabilities of each class.
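
For reference, here are minimal NumPy versions of the activation functions discussed above:

```python
import numpy as np

def linear(x):
    return x                          # y = x, range (-inf, +inf)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # range (0, 1)

def tanh(x):
    return np.tanh(x)                 # range (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)         # max(0, x), range [0, inf)

def softmax(x):
    e = np.exp(x - np.max(x))         # subtract the max for numerical stability
    return e / e.sum()                # probabilities that sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```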
Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between a dependent variable and one or more independent features. When the number of independent features is 1, it is known as univariate linear regression; in the case of more than one feature, it is known as multivariate linear regression.
Types of Linear Regression
There are two main types of linear regression:

Simple Linear Regression

This is the simplest form of linear regression, and it involves only one independent variable and one dependent variable. The equation for simple linear regression is:

Y = β0 + β1X

where:
 Y is the dependent variable
 X is the independent variable
 β0 is the intercept
 β1 is the slope

Multiple Linear Regression

This involves more than one independent variable and one dependent variable. The equation for multiple linear regression is:

Y = β0 + β1X1 + β2X2 + … + βnXn

where:
 Y is the dependent variable
 X1, X2, …, Xn are the independent variables
 β0 is the intercept
 β1, β2, …, βn are the slopes
The goal of the algorithm is to find the best Fit Line equation that can
predict the values based on the independent variables.
In regression, a set of records is present with X and Y values, and these values are used to learn a function so that, if you want to predict Y from an unknown X, this learned function can be used. In regression we have to find the value of Y, so a function is required that predicts a continuous Y, given X as independent features.
The best Fit Line equation provides a straight line that represents the
relationship between the dependent and independent variables. The slope of
the line indicates how much the dependent variable changes for a unit change
in the independent variable(s).

Gradient Descent for Linear Regression

A linear regression model can be trained using the optimization algorithm gradient descent, by iteratively modifying the model's parameters to reduce the mean squared error (MSE) of the model on a training dataset. To update the θ1 and θ2 values in order to reduce the cost function (minimizing the RMSE value) and achieve the best-fit line, the model uses gradient descent. The idea is to start with random θ1 and θ2 values and then iteratively update the values, reaching minimum cost.
A gradient is nothing but a derivative that defines the effect on the output of the function of a little bit of variation in the input.

Let's differentiate the cost function J with respect to θ1:

∂J/∂θ1 = (2/n) Σ (ŷi - yi)

Let's differentiate the cost function J with respect to θ2:

∂J/∂θ2 = (2/n) Σ (ŷi - yi) · xi

Finding the coefficients of a linear equation that best fits the training data is the objective of linear regression. By moving in the direction of the negative gradient of the Mean Squared Error with respect to the coefficients, the coefficients can be changed. The respective intercept and coefficient of X will then be

θ1 = θ1 - α (∂J/∂θ1)
θ2 = θ2 - α (∂J/∂θ2)

where α is the learning rate.
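
A minimal NumPy sketch of this update loop for simple linear regression on toy data; the learning rate and epoch count are assumptions, not tuned values:

```python
import numpy as np

def fit(x, y, lr=0.01, epochs=5000):
    theta1, theta2 = 0.0, 0.0            # intercept and slope, random/zero start
    n = len(x)
    for _ in range(epochs):
        y_pred = theta1 + theta2 * x
        error = y_pred - y
        theta1 -= lr * (2.0 / n) * error.sum()          # dJ/dtheta1
        theta2 -= lr * (2.0 / n) * (error * x).sum()    # dJ/dtheta2
    return theta1, theta2

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0                        # toy data from a known line
print(fit(x, y))                         # approaches (1.0, 2.0)
```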


Assumptions of Simple Linear Regression
Linear regression is a powerful tool for understanding and predicting the behavior of a variable; however, it needs to meet a few conditions in order to provide accurate and dependable solutions.
1. Linearity: The independent and dependent variables have a linear
relationship with one another. This implies that changes in the dependent
variable follow those in the independent variable(s) in a linear fashion. This
means that there should be a straight line that can be drawn through the
data points. If the relationship is not linear, the linear regression will not be
an accurate model.
2. Independence: The observations in the dataset are independent of each
other. This means that the value of the dependent variable for one
observation does not depend on the value of the dependent variable for
another observation. If the observations are not independent, then linear
regression will not be an accurate model.
3. Homoscedasticity: Across all levels of the independent variable(s), the
variance of the errors is constant. This indicates that the amount of the
independent variable(s) has no impact on the variance of the errors. If the
variance of the residuals is not constant, then linear regression will not be
an accurate model.

4. Normality: The residuals should be normally distributed. This means that the residuals should follow a bell-shaped curve. If the residuals are not normally distributed, then linear regression will not be an accurate model.
Assumptions of Multiple Linear Regression
For Multiple Linear Regression, all four of the assumptions from Simple Linear Regression apply. In addition to these, below are a few more:
1. No multicollinearity: There is no high correlation between the
independent variables. This indicates that there is little or no correlation
between the independent variables. Multicollinearity occurs when two or
more independent variables are highly correlated with each other, which
can make it difficult to determine the individual effect of each variable on
the dependent variable. If there is multicollinearity, then multiple linear
regression will not be an accurate model.
2. Additivity: The model assumes that the effect of changes in a predictor
variable on the response variable is consistent regardless of the values of
the other variables. This assumption implies that there is no interaction
between variables in their effects on the dependent variable.
3. Feature Selection: In multiple linear regression, it is essential to carefully
select the independent variables that will be included in the model.
Including irrelevant or redundant variables may lead to overfitting and
complicate the interpretation of the model.
4. Overfitting: Overfitting occurs when the model fits the training data too
closely, capturing noise or random fluctuations that do not represent the
true underlying relationship between variables. This can lead to poor
generalization performance on new, unseen data.

Multicollinearity

Multicollinearity is a statistical phenomenon that occurs when two or more independent variables in a multiple regression model are highly correlated, making it difficult to assess the individual effects of each variable on the dependent variable.
Detecting Multicollinearity includes two techniques:
 Correlation Matrix: Examining the correlation matrix among the
independent variables is a common way to detect multicollinearity. High
correlations (close to 1 or -1) indicate potential multicollinearity.
 VIF (Variance Inflation Factor): VIF is a measure that quantifies how much the variance of an estimated regression coefficient increases if your predictors are correlated. A high VIF (typically above 10) suggests multicollinearity.
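
As a quick illustration, VIF can be computed with statsmodels; the small DataFrame below is made-up example data:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 11],   # nearly a multiple of x1 -> expect a high VIF
    "x3": [5, 3, 6, 2, 4],
})
X = sm.add_constant(X)         # VIF is conventionally computed with an intercept

for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))  # ignore the 'const' row
```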
Evaluation Metrics for Linear Regression

A variety of evaluation measures can be used to determine the strength of any
linear regression model. These assessment metrics often give an indication of
how well the model is producing the observed outputs.
The most common measurements are:

Mean Square Error (MSE)

Mean Squared Error (MSE) is an evaluation metric that calculates the average of the squared differences between the actual and predicted values for all the data points. The difference is squared to ensure that negative and positive differences don't cancel each other out.

MSE = (1/n) Σ (yi - ŷi)²

Here,
 n is the number of data points.
 yi is the actual or observed value for the ith data point.
 ŷi is the predicted value for the ith data point.

MSE is a way to quantify the accuracy of a model's predictions. MSE is sensitive to outliers, as large errors contribute significantly to the overall score.

Mean Absolute Error (MAE)

Mean Absolute Error is an evaluation metric used to calculate the accuracy of a regression model. MAE measures the average absolute difference between the predicted values and the actual values.

Mathematically, MAE is expressed as:

MAE = (1/n) Σ |Yi - Ŷi|

Here,
 n is the number of observations
 Yi represents the actual values
 Ŷi represents the predicted values

A lower MAE value indicates better model performance. It is not sensitive to outliers, as we consider absolute differences.

Root Mean Squared Error (RMSE)

The square root of the residuals' variance is the Root Mean Squared Error. It describes how well the observed data points match the expected values, or the model's absolute fit to the data.

In mathematical notation, it can be expressed as:

RMSE = sqrt( (1/n) Σ (yi - ŷi)² )

Rather than dividing by the entire number of data points, one must divide the sum of the squared residuals by the residual degrees of freedom to obtain an unbiased estimate. This figure is then referred to as the Residual Standard Error (RSE).

In mathematical notation, it can be expressed as (for simple linear regression, with n - 2 degrees of freedom):

RSE = sqrt( RSS / (n - 2) )

RMSE is not as good a metric as R-squared. Root Mean Squared Error can fluctuate when the units of the variables vary, since its value is dependent on the variables' units (it is not a normalized measure).

Coefficient of Determination (R-squared)

R-Squared is a statistic that indicates how much variation the developed model can explain or capture. It is always in the range of 0 to 1. In general, the better the model matches the data, the greater the R-squared number.

In mathematical notation, it can be expressed as:

R² = 1 - (RSS / TSS)

 Residual Sum of Squares (RSS): The sum of squares of the residual for each data point in the plot or data is known as the residual sum of squares, or RSS. It is a measurement of the difference between the output that was observed and what was anticipated: RSS = Σ (yi - ŷi)².

 Total Sum of Squares (TSS): The sum of the squared deviations of the data points from the response variable's mean is known as the total sum of squares, or TSS: TSS = Σ (yi - ȳ)².

The R-squared metric is a measure of the proportion of variance in the dependent variable that is explained by the independent variables in the model.

Adjusted R-Squared Error

Adjusted R² measures the proportion of variance in the dependent variable that is explained by the independent variables in a regression model. Adjusted R-squared accounts for the number of predictors in the model and penalizes the model for including irrelevant predictors that don't contribute significantly to explaining the variance in the dependent variable.

Mathematically, adjusted R² is expressed as:

Adjusted R² = 1 - [ (1 - R²)(n - 1) / (n - k - 1) ]

Here,
 n is the number of observations
 k is the number of predictors in the model
 R² is the coefficient of determination
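
A sketch of computing these metrics with scikit-learn on made-up predictions; adjusted R² is derived by hand, since scikit-learn does not expose it directly:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])   # made-up observed values
y_pred = np.array([2.8, 5.3, 6.9, 9.4])   # made-up model predictions

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)

n, k = len(y_true), 1                      # observations, predictors (assumed)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(mse, mae, rmse, r2, adj_r2)
```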
Regularization Techniques for Linear Models

Lasso Regression (L1 Regularization)

Lasso Regression is a technique used for regularizing a linear regression model; it adds a penalty term to the linear regression objective function to prevent overfitting.

The objective function after applying lasso regression is:

J(θ) = (1/2m) Σ (ŷi - yi)² + λ Σ |θj|

 The first term is the least squares loss, representing the squared difference between predicted and actual values.
 The second term is the L1 regularization term; it penalizes the sum of absolute values of the regression coefficients θj.

Ridge Regression (L2 Regularization)

Ridge regression is a linear regression technique that adds a regularization term to the standard linear objective. Again, the goal is to prevent overfitting by penalizing large coefficients in the linear regression equation. It is useful when the dataset has multicollinearity, where predictor variables are highly correlated.

The objective function after applying ridge regression is:

J(θ) = (1/2m) Σ (ŷi - yi)² + λ Σ θj²

 The first term is the least squares loss, representing the squared difference between predicted and actual values.
 The second term is the L2 regularization term; it penalizes the sum of squares of the regression coefficients θj.

Elastic Net Regression

Elastic Net Regression is a hybrid regularization technique that combines the power of both L1 and L2 regularization in the linear regression objective:

J(θ) = (1/2m) Σ (ŷi - yi)² + λ [ α Σ |θj| + (1 - α) Σ θj² ]

 The first term is the least squares loss.
 The second term is the L1 regularization term, and the third is the ridge (L2) regularization term.
 λ is the overall regularization strength.
 α controls the mix between L1 and L2 regularization.
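
A minimal scikit-learn sketch of all three models on synthetic data; the alpha and l1_ratio values are illustrative assumptions, not tuned:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge, ElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
# True coefficients include zeros, so L1 should zero some weights out.
y = X @ np.array([3.0, 0.0, -2.0, 0.0, 1.0]) + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1).fit(X, y)                       # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)                       # L2 penalty
enet  = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)    # mix of L1 and L2

print(lasso.coef_)   # some coefficients driven exactly to 0
print(ridge.coef_)   # all coefficients shrunk, none exactly 0
print(enet.coef_)
```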
Applications of Linear Regression
Linear regression is used in many different fields, including finance,
economics, and psychology, to understand and predict the behavior of a
particular variable. For example, in finance, linear regression might be used to
understand the relationship between a company’s stock price and its earnings
or to predict the future value of a currency based on its past performance.
Advantages & Disadvantages of Linear Regression

Advantages of Linear Regression

 Linear regression is a relatively simple algorithm, making it easy to


understand and implement. The coefficients of the linear regression model
can be interpreted as the change in the dependent variable for a one-unit
change in the independent variable, providing insights into the relationships
between variables.
 Linear regression is computationally efficient and can handle large datasets
effectively. It can be trained quickly on large datasets, making it suitable for
real-time applications.
 Linear regression is relatively robust to outliers compared to other machine
learning algorithms. Outliers may have a smaller impact on the overall
model performance.
 Linear regression often serves as a good baseline model for comparison
with more complex machine learning algorithms.
 Linear regression is a well-established algorithm with a rich history and is
widely available in various machine learning libraries and software
packages.

Disadvantages of Linear Regression

 Linear regression assumes a linear relationship between the dependent and


independent variables. If the relationship is not linear, the model may not
perform well.
 Linear regression is sensitive to multicollinearity, which occurs when there
is a high correlation between independent variables. Multicollinearity can
inflate the variance of the coefficients and lead to unstable model
predictions.
 Linear regression assumes that the features are already in a suitable form
for the model. Feature engineering may be required to transform features
into a format that can be effectively used by the model.
 Linear regression is susceptible to both overfitting and underfitting.
Overfitting occurs when the model learns the training data too well and fails
to generalize to unseen data. Underfitting occurs when the model is too
simple to capture the underlying relationships in the data.
 Linear regression provides limited explanatory power for complex
relationships between variables. More advanced machine learning
techniques may be necessary for deeper insights.

In softmax regression, the key idea is to compute the probabilities of an input belonging to each class and then predict the class with the highest probability. The output of softmax regression is a probability distribution over all possible classes, and the class with the highest probability is chosen as the predicted class.

Here’s a basic overview of how softmax regression works:

1. Linear Transformation: Compute a linear combination of the input features using class-specific weights for each class. This can be represented as:

z_i = W_i · x + b_i

where:
 z_i is the linear combination for class i.
 W_i is the weight vector for class i (a row of the weight matrix).
 x is the input feature vector.
 b_i is the bias term for class i.

2. Softmax Function: Apply the softmax function to the computed linear combinations to convert them into probabilities. The softmax function takes the exponential of each linear combination and then normalizes them to sum up to 1. For class i, the probability can be computed as:
P(y=i|x) = exp(z_i) / sum(exp(z_j)) for all j

where:

 P(y=i|x) is the probability that the input x belongs to class i.


 z_i is the linear combination for class i.
 The sum in the denominator is taken over all classes j.

3. Prediction: The class with the highest probability is the output class. In
mathematical terms, the predicted class y_pred can be determined as:
y_pred = argmax(P(y=i|x)) for all i
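
A NumPy sketch of the three steps on a made-up example; the weights are random placeholders rather than trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, num_features = 3, 4
W = rng.normal(size=(num_classes, num_features))  # one weight row W_i per class
b = rng.normal(size=num_classes)                  # one bias b_i per class
x = rng.normal(size=num_features)                 # input feature vector

z = W @ x + b                         # step 1: z_i = W_i . x + b_i
e = np.exp(z - z.max())               # exponentials (max subtracted for stability)
p = e / e.sum()                       # step 2: P(y=i|x) = exp(z_i) / sum(exp(z_j))
y_pred = np.argmax(p)                 # step 3: class with the highest probability

print(p, p.sum(), y_pred)             # probabilities sum to 1
```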

Softmax regression is often used in scenarios with more than two classes, and
the classes are mutually exclusive (i.e., each input belongs to only one class).
It’s commonly used in multiclass classification problems, such as image and text
categorization.

Training softmax regression involves minimizing a loss function that captures the difference between the predicted probabilities and the actual class labels. Cross-entropy loss is typically used as the loss function for softmax regression.

It’s important to note that softmax regression assumes that the classes are
mutually exclusive, meaning that each input can belong to only one class. If the
problem involves cases where input can belong to multiple classes (multi-label
classification), softmax regression would not be suitable, and other approaches
like sigmoid-based models or more complex architectures would be more
appropriate.
Applications of softmax regression
Softmax regression, also known as multinomial logistic regression, has applications in many fields due to its effectiveness in solving multiclass classification problems. Here are some typical applications of softmax regression:

Image Classification: One of the most well-known applications is image classification, where softmax regression is used to classify images into multiple categories. Examples include classifying objects in a scene or identifying handwritten digits.

1. Natural Language Processing (NLP):

 Text Categorization: Softmax regression can categorize text documents into predefined classes, such as spam detection, sentiment analysis, or topic classification.
 Part-of-Speech Tagging: In NLP tasks, softmax regression can be employed
for part-of-speech tagging, where each word in a sentence is assigned a specific
part of speech (e.g., noun, verb, adjective).

2. Medical Diagnosis: Softmax regression can assist in diagnosing medical conditions by classifying patient data into different disease categories based on various features, such as symptoms, lab results, and medical history.
3. Handwriting Recognition: Softmax regression can be applied to recognize
handwritten characters or words, which finds use in applications like optical
character recognition (OCR).
4. Speech Recognition: In speech recognition systems, softmax regression can
help classify phonemes or words, contributing to the accurate transcription of
spoken language.
5. Face Recognition: It can identify individuals from a database of known faces
for facial recognition tasks.
6. Ecology and Biology: Softmax regression can help classify species based on observed features or environmental conditions in ecological studies. For instance, it could be used to predict the species of a bird based on its characteristics.
7. Quality Control: In manufacturing and quality control, it can classify products
into different quality levels based on various attributes.
8. Financial Fraud Detection: Softmax regression can assist in identifying
fraudulent transactions in financial systems by classifying transactions as either
legitimate or suspicious based on patterns.
9. Customer Segmentation: In marketing, it can segment customers into
different groups based on their purchasing behaviour or demographic
information.
10. Multiclass Segmentation in Computer Vision: In computer vision
tasks, such as semantic segmentation or instance segmentation, softmax
regression can assign each pixel or object to a specific class.

There are many image classification datasets available for machine learning
applications. Some of the most popular ones include:

- **ImageNet**: This dataset contains over 14 million annotated images and is used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark in image classification and object detection[1].

- **CIFAR-10**: This dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. It is commonly used as a benchmark for image classification models[1].

- **JFT-300M**: This is an internal Google dataset used for training image classification models. It contains a large number of images labeled using a complex algorithm[1].

- **RESISC45**: This dataset contains 31,500 RGB images of size 256x256 divided into 45 scene classes, each class containing 700 images. It is used for remote sensing image scene classification[1].

- **MVTec AD**: This dataset contains over 5,000 high-resolution images divided into fifteen different object and texture categories. It is used for benchmarking anomaly detection methods with a focus on industrial inspection[1].

- **Fruits 360**: This dataset features 90,483 images of different fruits and vegetables. The training set features 67,692 images (one fruit or vegetable per image)[3].

- **TensorFlow Sun397**: This dataset features 108,000 images across 397 categories, with each category featuring a minimum of 100 images of different scenes, objects, and other image categories[3].

- **Intel Image Classification**: This dataset contains 25,000 images divided into categories including forest, mountain, sea, glacier, buildings, and street. It is commonly used for image classification models[3].

These datasets can be used to train and test image classification models.
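
As a brief example, CIFAR-10 can be loaded directly through the Keras datasets API (assuming TensorFlow is installed; the download happens on the first call):

```python
from tensorflow.keras.datasets import cifar10

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
print(x_train.shape)   # (50000, 32, 32, 3) training images
print(x_test.shape)    # (10000, 32, 32, 3) test images
```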
