Deep Learning Notes
Deep learning is a branch of machine learning that teaches computers to process data in a way that is inspired by the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions. Applications of deep learning include:
1. Virtual Assistants
2. Chatbots
3. Healthcare
4. Entertainment
5. News Aggregation and Fake News Detection
6. Composing Music
7. Image Coloring
8. Robotics
9. Image Captioning
10. Advertising
11. Self Driving Cars
12. Natural Language Processing
13. Visual Recognition
14. Fraud Detection
15. Personalisation
16. Detecting Developmental Delay in Children
17. Colourisation of Black and White images
18. Adding Sounds to Silent Movies
19. Automatic Machine Translation
20. Automatic Handwriting Generation
21. Automatic Game Playing
22. Language Translations
23. Pixel Restoration
24. Demographic and Election Predictions
25. Deep Dreaming
### AI vs. Machine Learning vs. Deep Learning:
| Artificial Intelligence (AI) | Machine Learning (ML) | Deep Learning (DL) |
| --- | --- | --- |
| AI is the broader family consisting of ML and DL as its components. | ML is the subset of AI. | DL is the subset of ML. |
| Three broad categories/types of AI are: Artificial Narrow Intelligence (ANI), Artificial General Intelligence (AGI) and Artificial Super Intelligence (ASI). | Three broad categories/types of ML are: Supervised Learning, Unsupervised Learning and Reinforcement Learning. | DL can be considered as neural networks with a large number of parameters and layers, lying in one of four fundamental network architectures: Unsupervised Pre-trained Networks, Convolutional Neural Networks, Recurrent Neural Networks and Recursive Neural Networks. |
| Examples of AI applications include: Google's AI-Powered Predictions, ridesharing apps like Uber and Lyft, commercial flights using an AI autopilot, etc. | Examples of ML applications include: virtual personal assistants (Siri, Alexa, Google, etc.), email spam and malware filtering. | Examples of DL applications include: sentiment-based news aggregation, image analysis and caption generation, etc. |
| AI can be further broken down into various subfields such as robotics, natural language processing, computer vision, expert systems, and more. | ML algorithms can be categorized as supervised, unsupervised, or reinforcement learning. In supervised learning, the algorithm is trained on labeled data, where the desired output is known. In unsupervised learning, the algorithm is trained on unlabeled data, where the desired output is unknown. | DL algorithms are inspired by the structure and function of the human brain, and they are particularly well-suited to tasks such as image and speech recognition; they learn multiple levels of representations of the data. |
In training ANNs, a training set is used to adjust the network's weights based on
the error between predicted and actual outcomes. This process involves forward
propagation of data through the network, comparing predictions with actual
results, computing errors, and then backpropagating these errors to update the
weights iteratively until the network can make accurate predictions[2][5]. The
learning process involves adjusting thresholds and weights based on a cost
function to minimize errors and improve accuracy over time[5].
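As a minimal sketch of this training loop (an illustrative toy, not the exact procedure the cited sources describe), the NumPy example below forward-propagates data through a tiny 2-4-1 network, compares predictions with actual results, backpropagates the errors, and updates the weights iteratively; the layer sizes, learning rate, and XOR-style toy data are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: two inputs, one binary target (XOR, purely illustrative).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights and biases for a 2-4-1 network.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5  # learning rate (arbitrary choice)

for epoch in range(5000):
    # Forward propagation of the data through the network.
    a1 = sigmoid(X @ W1 + b1)       # hidden-layer activations
    y_hat = sigmoid(a1 @ W2 + b2)   # predictions

    # Compare predictions with actual results and compute the error.
    error = y_hat - y

    # Backpropagate the error to get gradients for each weight.
    d2 = error * y_hat * (1 - y_hat)   # output-layer delta
    d1 = (d2 @ W2.T) * a1 * (1 - a1)   # hidden-layer delta

    # Update the weights iteratively (gradient descent).
    W2 -= lr * (a1.T @ d2); b2 -= lr * d2.sum(axis=0)
    W1 -= lr * (X.T @ d1);  b1 -= lr * d1.sum(axis=0)

print(np.round(y_hat, 2))  # predictions move toward [0, 1, 1, 0]
```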
Overall, ANNs serve as versatile tools capable of learning complex patterns and
making decisions similar to the human brain's cognitive processes. They have
diverse applications across industries like social media, healthcare, finance, and
more due to their ability to learn from data and generalize patterns for various
tasks[2][5].
Artificial Neural Networks (ANNs) come in various types, each designed for specific tasks and applications. Fundamental types include the perceptron, feed-forward networks, Convolutional Neural Networks, Recurrent Neural Networks and Recursive Neural Networks.
These types represent a subset of the diverse range of ANNs available, each
tailored to specific requirements within machine learning and artificial
intelligence applications[1][2][3].
The perceptron model is one of the simplest types of artificial neural networks. It is a supervised learning algorithm for binary classification, and it can be viewed as a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
Frank Rosenblatt invented the perceptron model as a binary classifier. It contains three main components:
o Input Nodes or Input Layer:
This is the primary component of the perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units and is another of the most important perceptron components. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, bias can be considered as the intercept in a linear equation.
o Activation Function:
This is the final and important component, which helps determine whether the neuron will fire or not. The activation function of a perceptron can be considered primarily a step function. Types of activation functions include:
o Sign function
o Step function, and
o Sigmoid function
Step-1
First, multiply all input values by their corresponding weight values and add them together to determine the weighted sum. Then add a special term called the bias 'b' to this weighted sum to improve the model's performance:
∑wi*xi + b
Step-2
Next, apply an activation function f to the weighted sum, which produces the output:
Y = f(∑wi*xi + b)
Based on the layers, Perceptron models are divided into two types. These are as
follows:
o Single-Layer Perceptron Model:
This is one of the simplest types of artificial neural network (ANN). A single-layer perceptron model consists of a feed-forward network and also includes a threshold transfer function inside the model. The main objective of the single-layer perceptron model is to classify linearly separable objects with binary outcomes.
In a single-layer perceptron model, the algorithm has no prior recorded data, so it begins with randomly allocated weight parameters. It then sums up all the weighted inputs; if the total sum exceeds a pre-determined threshold, the model is activated and shows the output value as +1.
o Multi-Layer Perceptron Model:
A multi-layer perceptron model has greater processing power and can process both linear and non-linear patterns. Further, it can implement logic gates such as AND, OR, XOR, NAND, NOT, XNOR and NOR, as the XOR sketch below illustrates.
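As a sketch of the multi-layer claim above, the example below hand-wires a tiny two-layer perceptron network that computes XOR, something a single-layer perceptron cannot represent; the weights and thresholds are one hand-chosen set among many that work.

```python
def step(z):
    # Threshold activation: 1 if the input is positive, else 0.
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit acting as OR
    h2 = step(x1 + x2 - 1.5)    # hidden unit acting as AND
    return step(h1 - h2 - 0.5)  # output: OR and not AND, i.e. XOR

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_mlp(a, b))   # prints 0, 1, 1, 0
```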
Perceptron Function
The perceptron function f(x) is obtained by multiplying the input 'x' by the learned weight coefficient 'w', adding the bias 'b', and applying a threshold:
f(x)=1; if w.x+b>0
otherwise, f(x)=0
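A minimal sketch of this function in Python (assuming NumPy); the weights and bias below are hand-picked for illustration so that the unit behaves like an AND gate.

```python
import numpy as np

def perceptron(x, w, b):
    # f(x) = 1 if w.x + b > 0, otherwise 0
    weighted_sum = np.dot(w, x) + b        # Step-1: weighted sum plus bias
    return 1 if weighted_sum > 0 else 0    # Step-2: threshold activation

w, b = np.array([1.0, 1.0]), -1.5   # hand-picked AND-gate parameters
for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, "->", perceptron(np.array(x), w, b))
```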
Activation Functions
An activation function is needed in the hidden layers as well as at the output layer of the network. The activation function decides whether a neuron should be activated or not by calculating the weighted sum and further adding bias to it. The purpose of the activation function is to introduce non-linearity into the output of a neuron.
Hidden layer i.e. layer 1:
z(1) = W(1)X + b(1)
a(1) = z(1)
Here,
z(1) is the vectorized output of layer 1
W(1) is the vectorized weights assigned to the neurons of the hidden layer, i.e. w1, w2, w3 and w4
X is the vectorized input features, i.e. i1 and i2
b(1) is the vectorized bias assigned to the neurons in the hidden layer, i.e. b1 and b2
a(1) is the vectorized form of any linear function
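A small NumPy sketch of this vectorized computation; all the numbers below (inputs i1, i2, weights w1–w4, biases b1, b2) are made up purely for illustration.

```python
import numpy as np

X = np.array([[0.5], [0.8]])      # input features i1, i2
W1 = np.array([[0.1, 0.3],        # weights w1, w2
               [0.2, 0.4]])       # weights w3, w4
b1 = np.array([[0.01], [0.02]])   # biases b1, b2

z1 = W1 @ X + b1   # z(1) = W(1)X + b(1)
a1 = z1            # a(1): linear (identity) activation
print(a1)
```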
Linear Function
Sigmoid Function
Tanh Function
The activation that almost always works better than the sigmoid function is the Tanh function, also known as the Tangent Hyperbolic function. It is actually a mathematically shifted version of the sigmoid function; both are similar and can be derived from each other.
Equation :- f(x) = tanh(x) = 2/(1 + e^(-2x)) − 1
Value Range :- -1 to +1
Nature :- non-linear
Uses :- Usually used in hidden layers of a neural network, as its values lie between -1 and 1; the mean of the hidden-layer activations therefore comes out to be 0 or very close to it. This helps in centering the data by bringing the mean close to 0, which makes learning for the next layer much easier.
RELU Function
It stands for Rectified Linear Unit. It is the most widely used activation function, chiefly implemented in the hidden layers of neural networks.
Equation :- A(x) = max(0,x). It gives an output x if x is positive and 0
otherwise.
Value Range :- [0, inf)
Softmax Function
The softmax function is also a type of sigmoid function, but it is handy when we are trying to handle multi-class classification problems.
Nature :- non-linear
Uses :- Usually used when handling multiple classes. The softmax function is commonly used in the output layer of image classification problems. It squeezes the outputs for each class to between 0 and 1 and also divides by the sum of the outputs.
Output:- The softmax function is ideally used in the output layer of the
classifier where we are actually trying to attain the probabilities to define
the class of each input.
The basic rule of thumb is: if you really don't know which activation function to use, simply use RELU, as it is a general-purpose activation function for hidden layers and is used in most cases these days.
If your output is for binary classification, the sigmoid function is a very natural choice for the output layer.
If your output is for multi-class classification, Softmax is very useful for predicting the probabilities of each class.
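The sketch below transcribes the four activation functions discussed above into plain NumPy (with the standard max-subtraction trick in softmax for numerical stability):

```python
import numpy as np

def sigmoid(z):
    # Squashes inputs to (0, 1); natural for binary classification outputs.
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Shifted/scaled sigmoid with range (-1, 1); zero-centred, good for hidden layers.
    return np.tanh(z)

def relu(z):
    # A(x) = max(0, x); the usual default for hidden layers.
    return np.maximum(0.0, z)

def softmax(z):
    # Squeezes class scores into probabilities that sum to 1.
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), softmax(z), sep="\n")
```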
Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between a dependent variable and one or more independent features. When the number of independent features is 1, it is known as univariate (simple) linear regression; with more than one feature, it is known as multivariate linear regression.
Types of Linear Regression
There are two main types of linear regression:
Simple Linear Regression
This is the simplest form of linear regression; it involves only one independent variable and one dependent variable. The equation for simple linear regression is:
Y = β0 + β1X
where:
Y is the dependent variable
X is the independent variable
β0 is the intercept
β1 is the slope
Multiple Linear Regression
This involves more than one independent variable and one dependent variable. The equation for multiple linear regression is:
Y = β0 + β1X1 + β2X2 + … + βnXn
where:
Y is the dependent variable
X1, X2, …, Xn are the independent variables
β0 is the intercept
β1, β2, …, βn are the slopes
The goal of the algorithm is to find the best Fit Line equation that can
predict the values based on the independent variables.
In regression, a set of records with X and Y values is available, and these values are used to learn a function; if you want to predict Y from an unknown X, this learned function can be used. In regression we have to find a continuous value of Y, so a function is required that predicts a continuous Y given X as the independent features.
The best Fit Line equation provides a straight line that represents the
relationship between the dependent and independent variables. The slope of
the line indicates how much the dependent variable changes for a unit change
in the independent variable(s).
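As a minimal sketch of finding the best fit line, the example below computes the slope β1 and intercept β0 with the closed-form ordinary-least-squares formulas; the data points are invented for illustration.

```python
import numpy as np

# Toy data scattered around a line (illustrative values only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

# OLS for simple linear regression:
# beta1 = cov(X, Y) / var(X),  beta0 = mean(Y) - beta1 * mean(X)
beta1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
beta0 = Y.mean() - beta1 * X.mean()

print(f"best fit line: Y = {beta0:.3f} + {beta1:.3f} * X")
print("prediction at X = 6:", beta0 + beta1 * 6)
```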
Multicollinearity
Multicollinearity occurs when two or more independent variables in a regression model are highly correlated with one another, which makes it difficult to estimate the individual effect of each variable on the dependent variable.
Mean Squared Error (MSE) is an evaluation metric that calculates the average of the squared differences between the actual and predicted values for all the data points. The difference is squared to ensure that negative and positive differences don't cancel each other out. In mathematical notation, it can be expressed as:
MSE = (1/n) * ∑(yi − ŷi)²
Here,
n is the number of data points.
yi is the actual or observed value for the ith data point.
ŷi is the predicted value for the ith data point.
MSE is a way to quantify the accuracy of a model’s predictions. MSE is
sensitive to outliers as large errors contribute significantly to the overall score.
The square root of the residuals' variance is the Root Mean Squared Error (RMSE). It describes how well the observed data points match the expected values, i.e., the model's absolute fit to the data. In mathematical notation, it can be expressed as:
RMSE = √(∑(yi − ŷi)² / n)
Here,
n is the number of observations
yi represents the actual values
ŷi represents the predicted values
Rather than dividing the sum of the squared residuals by the total number of data points, one divides it by the number of degrees of freedom to obtain an unbiased estimate. This figure is then referred to as the Residual Standard Error (RSE). In mathematical notation, it can be expressed as (for simple linear regression, with n − 2 degrees of freedom):
RSE = √(∑(yi − ŷi)² / (n − 2))
RMSE is not as good a metric as R-squared. Root Mean Squared Error can fluctuate when the units of the variables vary, since its value is dependent on the variables' units (it is not a normalized measure).
R-Squared is a statistic that indicates how much variation the developed model can explain or capture. It is always in the range of 0 to 1. In general, the better the model matches the data, the greater the R-squared number. In mathematical notation, it can be expressed as:
R² = 1 − (RSS / TSS)
Residual Sum of Squares (RSS): The sum of the squares of the residuals for each data point in the plot or data is known as the residual sum of squares, or RSS. It is a measurement of the difference between the output that was observed and what was predicted: RSS = ∑(yi − ŷi)².
Total Sum of Squares (TSS): The sum of the squared deviations of the data points from the mean of the response variable is known as the total sum of squares, or TSS: TSS = ∑(yi − ȳ)².
Adjusted R-Squared additionally accounts for the number of predictors in the model:
Adjusted R² = 1 − [(1 − R²)(n − 1) / (n − k − 1)]
Here,
n is the number of observations
k is the number of predictors in the model
R² is the coefficient of determination
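A small sketch computing these metrics directly from their formulas; the helper name regression_metrics and the sample values are illustrative, not from any particular library.

```python
import numpy as np

def regression_metrics(y_true, y_pred, k):
    # MSE, RMSE, R² and adjusted R² for a model with k predictors.
    n = len(y_true)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)                  # mean squared error
    rmse = np.sqrt(mse)                            # root mean squared error
    rss = np.sum(residuals ** 2)                   # residual sum of squares
    tss = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    r2 = 1 - rss / tss                             # coefficient of determination
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra predictors
    return mse, rmse, r2, adj_r2

y_true = np.array([2.1, 4.2, 5.9, 8.1, 9.8])
y_pred = np.array([2.0, 4.1, 6.0, 8.0, 10.0])
print(regression_metrics(y_true, y_pred, k=1))
```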
Regularization Techniques for Linear Models
Lasso Regression (L1 regularization) adds a penalty on the absolute values of the coefficients to the least squares objective:
J(θ) = (1/2m) * ∑(ŷi − yi)² + λ * ∑|θj|
the first term is the least squares loss, representing the squared difference between predicted and actual values.
the second term is the L1 regularization term; it penalizes the sum of the absolute values of the regression coefficients θj.
Ridge Regression (L2 regularization) instead penalizes the squares of the coefficients:
J(θ) = (1/2m) * ∑(ŷi − yi)² + λ * ∑θj²
the first term is the least squares loss, representing the squared difference between predicted and actual values.
the second term is the L2 regularization term; it penalizes the sum of the squares of the regression coefficients θj.
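For a quick practical sketch, scikit-learn's Lasso and Ridge estimators implement these two penalties; the toy data and the alpha value (scikit-learn's name for the penalty strength λ) are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # toy feature column
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])            # toy targets

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: λ * ∑|θj|
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty: λ * ∑θj²

print("lasso coef:", lasso.coef_, "ridge coef:", ridge.coef_)
```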
In softmax regression, the probability that input x belongs to class i is computed with the softmax function: P(y=i|x) = exp(zi) / ∑j exp(zj), where zi is the linear score for class i.
3. Prediction: The class with the highest probability is the output class. In
mathematical terms, the predicted class y_pred can be determined as:
y_pred = argmax(P(y=i|x)) for all i
Softmax regression is often used in scenarios with more than two classes, and
the classes are mutually exclusive (i.e., each input belongs to only one class).
It’s commonly used in multiclass classification problems, such as image and text
categorization.
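A minimal NumPy sketch of this prediction step; the class scores below are hypothetical values standing in for the linear scores zi.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([1.2, 0.3, 2.5])   # hypothetical scores for three classes
probs = softmax(scores)              # P(y = i | x) for each class i
y_pred = np.argmax(probs)            # class with the highest probability

print(probs, "-> predicted class:", y_pred)
```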
It’s important to note that softmax regression assumes that the classes are
mutually exclusive, meaning that each input can belong to only one class. If the
problem involves cases where input can belong to multiple classes (multi-label
classification), softmax regression would not be suitable, and other approaches
like sigmoid-based models or more complex architectures would be more
appropriate.
Applications of softmax regression
Softmax regression, also known as multinomial logistic regression, has applications in many fields due to its effectiveness in solving multiclass classification problems, such as image classification and text categorization.
There are many image classification datasets available for machine learning
applications. Some of the most popular ones include:
**ImageNet**: This dataset contains over 14 million annotated images and is used
in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a benchmark
in image classification and object detection[1].
**TensorFlow Sun397**: This dataset features 108,000 images, with each category containing a minimum of 100 images of different scenes, objects, and other image categories[3].
These datasets can be used to train and test image classification models.
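As one illustrative way to get started (assuming the TensorFlow Datasets package, where this dataset is listed under the name "sun397"; the full download is large), the snippet below loads the training split and inspects one example.

```python
import tensorflow_datasets as tfds

# Load the SUN397 scene-classification dataset from the TFDS catalog.
ds, info = tfds.load("sun397", split="train", with_info=True)

print(info.features["label"].num_classes, "classes")
for example in ds.take(1):
    print(example["image"].shape, example["label"])
```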