A perceptron, also called an artificial neuron, is a neural network unit that performs certain
computations to detect features.
It is a single-layer neural network used as a linear classifier while working with a set of
input data. Since a perceptron uses data points that are already labeled, it is
a supervised learning algorithm. The algorithm enables neurons to learn and
process the elements of the training set one at a time.
There are two types of perceptrons:
1. Single-Layer Perceptrons
2. Multilayer Perceptrons
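To make this concrete, here is a minimal single-layer perceptron sketch in Python; the AND dataset, the learning rate, and the step activation are illustrative assumptions rather than part of the original text.

```python
import numpy as np

# Minimal single-layer perceptron sketch (illustrative: 2 inputs, step activation).
def step(z):
    return 1 if z >= 0 else 0

def train_perceptron(X, y, epochs=10, lr=0.1):
    w = np.zeros(X.shape[1])   # weights
    b = 0.0                    # bias
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = step(np.dot(w, xi) + b)   # weighted sum + bias -> step activation
            error = target - pred
            w += lr * error * xi             # perceptron learning rule
            b += lr * error
    return w, b

# Linearly separable toy data: logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([step(np.dot(w, xi) + b) for xi in X])   # expected: [0, 0, 0, 1]
```

Because AND is linearly separable, the perceptron learning rule converges to a separating line after a few epochs.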
The loss function is used as a measure of how accurately the neural network has learned
the patterns in the training data.
It is computed by comparing the network's predictions with the corresponding target values.
Therefore, the loss function is considered a primary measure of the performance of
the neural network. In Deep Learning, a good-performing neural network will have a low
value of the loss function throughout training.
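As a small illustration, assuming mean squared error as the loss, the comparison between predictions and targets looks like this (the numbers are made up):

```python
import numpy as np

# Loss function sketch (assumes mean squared error):
# the loss compares the network's predictions with the true targets.
y_true = np.array([1.0, 0.0, 1.0, 1.0])   # labels from the training data
y_pred = np.array([0.9, 0.2, 0.8, 0.4])   # outputs produced by the network

mse = np.mean((y_true - y_pred) ** 2)
print(mse)  # a lower value indicates the network fits the training data better
```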
The reasons for using activation functions in Neural Networks are as follows:
1. The idea behind the activation function is to introduce non-linearity into the neural
network so that it can learn more complex functions.
2. Without an activation function, the neural network behaves as a linear classifier,
learning a function that is a linear combination of its input data.
3. To make its decision, a neuron first calculates the weighted sum of its inputs and
then adds the bias to it; the activation function is applied to this result.
4. So, the basic purpose of the activation function is to introduce non-linearity into the
output of a neuron.
5. List down the names of some popular Activation Functions used in Neural
Networks.
Some of the popular activation functions that are used while building deep learning
models are as follows:
• Sigmoid function
• Tanh function
• ReLU (Rectified Linear Unit) function
• Softmax function
• Maxout function
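For reference, a rough NumPy sketch of several of these functions is shown below (the Maxout variant is sketched with two linear pieces; all values are illustrative):

```python
import numpy as np

# Sketches of common activation functions (NumPy versions, for illustration only).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

def maxout(z, w1, b1, w2, b2):
    # Maxout takes the element-wise maximum of two (or more) linear pieces.
    return np.maximum(z @ w1 + b1, z @ w2 + b2)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), softmax(z))
```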
While building deep learning models, our whole objective is to minimize the cost
function.
A cost function explains how well the neural network is performing for its given training
data and the expected output.
It depends on the neural network parameters, such as the weights and biases. Taken as a
whole, it quantifies the performance of the neural network.
Backpropagation works through the following steps:
• It forward-propagates the training data through the network to generate the
output.
• It uses the target value and the output value to compute the error derivatives with
respect to the output activations.
• It then uses the previously computed derivatives for the output and all hidden layers to
calculate the error derivatives with respect to the weights.
• It updates the weights and repeats the process until the cost function is minimized.
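A minimal sketch of these steps for a one-hidden-layer network, assuming sigmoid activations and a mean-squared-error cost, might look as follows (shapes and values are illustrative):

```python
import numpy as np

# Backpropagation sketch for a tiny 1-hidden-layer network
# (illustrative assumptions: sigmoid activations, mean squared error).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
x = np.array([[0.5, -0.2]])          # one training example (1 x 2)
t = np.array([[1.0]])                # target value
W1 = np.random.randn(2, 3) * 0.1     # input -> hidden weights
W2 = np.random.randn(3, 1) * 0.1     # hidden -> output weights

# Forward pass.
h = sigmoid(x @ W1)                  # hidden activations (1 x 3)
y = sigmoid(h @ W2)                  # network output (1 x 1)

# Error derivative with respect to the output activations (for 0.5*(y - t)^2).
dE_dy = y - t
# Propagate the derivatives back through the output layer, then the hidden layer.
delta_out = dE_dy * y * (1 - y)            # dE/d(pre-activation of output)
dE_dW2 = h.T @ delta_out                   # error derivative w.r.t. W2
delta_hidden = (delta_out @ W2.T) * h * (1 - h)
dE_dW1 = x.T @ delta_hidden                # error derivative w.r.t. W1

# Gradient-descent weight update.
lr = 0.5
W2 -= lr * dE_dW2
W1 -= lr * dE_dW1
```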
Neural network initialization means initializing the values of the parameters, i.e., the weights
and biases. Biases can be initialized to zero, but we can't initialize the weights to zero.
Weight initialization is one of the crucial factors in neural networks since bad weight
initialization can prevent a neural network from learning the patterns.
On the contrary, a good weight initialization helps in achieving quicker convergence to the
global minimum. As a rule of thumb, the weights should be initialized close to
zero without being too small.
If we initialize the set of weights in the neural network as zero, then all the neurons at
each layer will start producing the same output and the same gradients during
backpropagation.
As a result, the neural network cannot learn anything at all because there is no source
of asymmetry between different neurons. Therefore, we add randomness while
initializing the weights in neural networks.
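A quick sketch of the contrast, with He initialization shown as one common "small random values" recipe (the layer sizes are made up):

```python
import numpy as np

# Illustration of the initialization rule of thumb: small random weights, zero biases.
fan_in, fan_out = 256, 128

W_zero = np.zeros((fan_in, fan_out))              # bad: every neuron stays identical
W_rand = np.random.randn(fan_in, fan_out) * 0.01  # small random values break symmetry
W_he   = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)  # He initialization
b      = np.zeros(fan_out)                        # biases can safely start at zero
```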
Gradient Descent is an optimization algorithm that aims to minimize the cost function
or to minimize an error. Its main objective is to find the local or global minima of a
function based on its convexity. This determines in which direction the model should go
to reduce the error.
The five main steps that are used to initialize and use the gradient descent algorithm are
as follows:
• Initialize the weights and biases of the network (typically with small random values).
• Pass the input data through the network, i.e., the input layer.
• Compute the difference, or the error, between the expected and the predicted
values.
• Adjust the values, i.e., update the weights of the neurons, to minimize the loss function.
• Repeat the same steps over multiple iterations to determine the best weights
for efficient working.
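Putting the steps together, a minimal gradient descent loop on a toy linear model could look like this (the data, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

# Gradient descent sketch on a simple linear model (illustrative toy data).
np.random.seed(0)
X = np.random.rand(100, 1)
y = 3.0 * X[:, 0] + 2.0 + 0.1 * np.random.randn(100)   # true w = 3, b = 2

w, b = np.random.randn(), np.random.randn()  # 1. initialize parameters randomly
lr = 0.5
for _ in range(500):                         # 5. repeat for several iterations
    y_pred = w * X[:, 0] + b                 # 2. pass the input through the model
    error = y_pred - y                       # 3. compute the prediction error
    grad_w = np.mean(error * X[:, 0])        # gradient direction of the squared error
    grad_b = np.mean(error)
    w -= lr * grad_w                         # 4. adjust the weights to reduce the loss
    b -= lr * grad_b

print(round(w, 2), round(b, 2))              # should approach 3.0 and 2.0
```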
In general, data normalization transforms each data point by subtracting the
mean and dividing by the standard deviation. This technique improves the performance
and stability of neural networks since the inputs to every layer stay on a similar scale.
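A small sketch of this z-score normalization (the feature values are made up):

```python
import numpy as np

# Z-score normalization sketch: subtract the mean, divide by the standard deviation.
data = np.array([[50., 2000.], [60., 3000.], [55., 2500.]])  # made-up feature columns
normalized = (data - data.mean(axis=0)) / data.std(axis=0)
print(normalized.mean(axis=0), normalized.std(axis=0))  # ~0 mean, unit std per feature
```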
Forward propagation: The input is fed into the network. In each layer, there is a specific
activation function and between layers, there are weights that represent the connection
strength of the neurons. The input runs through the individual layers of the network,
which ultimately generates an output.
Backward propagation: An error function measures how accurate the output of the
network is. To improve the output, the weights have to be optimized. The
backpropagation algorithm is used to determine how the individual weights have to be
adjusted. The weights are then adjusted using the gradient descent method.
Mini-batch Gradient Descent: In Mini-batch Gradient Descent, the batch size must be
between 1 and the size of the training dataset. As a result, we get k batches. Therefore,
the weights of the neural networks are updated after each mini-batch iteration.
Batch Gradient Descent: In Batch Gradient Descent, the batch size is equal to the size
of the training dataset. Therefore, the weights of the neural network are updated after
each epoch.
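A sketch of how mini-batches partition an epoch (the dataset size and batch size are made-up values; the gradient computation itself is left as a placeholder):

```python
import numpy as np

# Sketch of splitting an epoch into mini-batches.
n_samples, batch_size = 1000, 100
X = np.random.rand(n_samples, 4)

for epoch in range(3):
    for start in range(0, n_samples, batch_size):
        mini_batch = X[start:start + batch_size]
        # ...compute gradients on `mini_batch` and update the weights here...

# Mini-batch GD: n_samples / batch_size = 10 weight updates per epoch.
# Batch GD would instead use all n_samples at once: 1 update per epoch.
```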
One of the most basic Deep Learning models is a Boltzmann Machine, which resembles
a simplified version of the Multi-Layer Perceptron.
This model features a visible input layer and a hidden layer — just a two-layer neural
network that makes stochastic decisions as to whether a neuron should be activated or
not.
In the Boltzmann Machine, nodes are connected across the layers, but no two nodes of
the same layer are connected.
16. How does the learning rate affect the training of the Neural Network?
While selecting the learning rate to train the neural network, we have to choose the
value very carefully due to the following reasons:
If the learning rate is set too low, training of the model will continue very slowly as we
are making very small changes to the weights since our step size that is governed by the
equation of gradient descent is small. It will take many iterations before reaching the
point of minimum loss.
If the learning rate is set too high, the large changes in the weights caused by the larger
step size lead to undesirable divergent behavior of the loss function. The model may
fail to converge (it never settles on weights that produce a good output) or even diverge
(the loss keeps growing from step to step).
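A tiny illustration of this effect, using gradient descent on f(x) = x² with made-up learning rates:

```python
# Gradient descent on f(x) = x**2 with different learning rates (illustrative only).
def run(lr, steps=20):
    x = 5.0
    for _ in range(steps):
        x -= lr * 2 * x       # the gradient of x**2 is 2x
    return x

print(run(0.01))   # too low: still far from the minimum at 0 after 20 steps
print(run(0.1))    # reasonable: close to 0
print(run(1.1))    # too high: the updates overshoot and diverge
```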
Once the data is formatted correctly, we usually work with hyperparameters in
neural networks. A hyperparameter is a kind of parameter whose value is fixed before
the learning process begins.
It decides how a neural network is trained (for example, the learning rate, batch size, and
number of epochs) and also the structure of the network (for example, the number of
hidden layers and the number of units in each layer).
ReLU (Rectified Linear Unit) is the most commonly used activation function in neural
networks due to the following reasons:
1. Computational simplicity: ReLU only has to compare its input with zero, so it is very
cheap to compute.
2. Faster training: Networks with ReLU tend to show better convergence performance.
Therefore, we have a much lower run time.
3. Sparsity: For all negative inputs, ReLU generates an output of 0. This means that
fewer neurons of the network are firing, so we have sparse and efficient activations in
the neural network.
On the contrary, if the derivatives are small (e.g., if we use a sigmoid activation
function), then the gradient will decrease exponentially as we propagate backward through the
model until it eventually vanishes; this is the vanishing gradient problem.
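A small sketch of why this happens with sigmoid: its derivative is at most 0.25, so multiplying one such factor per layer shrinks the gradient exponentially (the 10-layer depth and zero pre-activations are illustrative assumptions):

```python
import numpy as np

# Vanishing gradient sketch: the sigmoid derivative is at most 0.25, so multiplying
# one such derivative per layer shrinks the backpropagated gradient exponentially.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

grad = 1.0
for layer in range(10):                          # 10 layers, pre-activations assumed 0
    grad *= sigmoid(0.0) * (1 - sigmoid(0.0))    # derivative = 0.25 at z = 0
print(grad)   # 0.25 ** 10 ~ 9.5e-7 -- the gradient has effectively vanished
```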
Optimizers are algorithms or methods that are used to adjust the parameters of the
neural network, such as the weights, biases, and learning rate, in order to minimize the loss
function. They are used to solve optimization problems by minimizing the function.
• Gradient Descent
• AdaDelta
• RMSprop
• Adam
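As an illustration of what an optimizer actually does, here is a sketch of a single Adam update step using the commonly quoted default hyperparameters (the parameter and gradient values are made up):

```python
import numpy as np

# Sketch of one Adam parameter update (hyperparameters are the usual defaults,
# shown here purely for illustration).
beta1, beta2, lr, eps = 0.9, 0.999, 0.001, 1e-8

w = np.array([0.5, -0.3])     # parameters
g = np.array([0.1, -0.2])     # gradient of the loss w.r.t. the parameters
m = np.zeros_like(w)          # first-moment (mean) estimate
v = np.zeros_like(w)          # second-moment (uncentered variance) estimate
t = 1                         # time step

m = beta1 * m + (1 - beta1) * g
v = beta2 * v + (1 - beta2) * g ** 2
m_hat = m / (1 - beta1 ** t)  # bias correction
v_hat = v / (1 - beta2 ** t)
w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
print(w)
```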
21. Why are Deep Neural Networks preferred over Shallow Neural Networks?
Neural networks contain hidden layers apart from input and output layers. There is only
a single hidden layer between the input and output layers for shallow neural networks
whereas, for Deep neural networks, there are multiple layers used.
To approximate any function, both shallow and deep networks are good enough and
capable but when a shallow neural network fits into any function, it requires a lot of
parameters to learn. On the contrary, deep networks can fit functions even better with a
limited number of parameters since they contain several hidden layers.
So, for the same level of accuracy, deeper networks can be much more powerful and
efficient in terms of both computation and the number of parameters to learn.
One other important thing about deeper networks is that they can create deep
representations and at every layer, the network learns a new, more abstract
representation of the input.
Therefore, deep neural networks have become the preferred choice nowadays owing to their
ability to model many kinds of data effectively.
23. What is the difference between Epoch, Batch, and Iteration in Neural Networks?
Epoch, iteration, and batch are different terms used to describe how a dataset is processed
during gradient descent. All three, i.e., epoch, iteration, and batch size, are essentially ways
of organizing the work of gradient descent depending on the size of the dataset.
Epoch: It represents one iteration over the entire training dataset (everything put into
the training model).
Batch: This refers to when we are not able to pass the entire dataset into the neural
network at once due to the problem of high computations, so we divide the dataset into
several batches.
Iteration: Suppose we have 10,000 images as our training dataset and we choose a batch size
of 200. Then an epoch runs 10,000 / 200 = 50 iterations.
1. What is a neural network?
A neural network is a type of machine learning algorithm modeled after the structure
and function of the human brain. It consists of interconnected nodes, called neurons,
that process information and make predictions based on that information. Neural
networks can be used to perform a wide variety of tasks, including image and speech
recognition, natural language processing, and regression and classification tasks.
• Convolutional Neural Networks (CNNs): These networks are commonly used for
image classification and are designed to preserve spatial relationships between
pixels in an image.
• Recurrent Neural Networks (RNNs): These networks are used for tasks where the
order of the inputs is important, such as in natural language processing.
Batch normalization is a technique used to improve the stability and speed of training
deep neural networks. It normalizes the activations of each layer in the network for each
mini-batch, which can help to reduce the internal covariate shift that occurs during
training. This, in turn, can lead to faster convergence and improved accuracy.
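A rough sketch of the per-mini-batch computation (gamma and beta are the learnable scale and shift parameters; the batch shape is an illustrative assumption):

```python
import numpy as np

# Batch normalization sketch for one mini-batch of activations.
x = np.random.randn(32, 64)               # mini-batch of 32 samples, 64 activations
gamma, beta, eps = np.ones(64), np.zeros(64), 1e-5

mean = x.mean(axis=0)                      # per-feature mean over the mini-batch
var = x.var(axis=0)                        # per-feature variance over the mini-batch
x_hat = (x - mean) / np.sqrt(var + eps)    # normalize
out = gamma * x_hat + beta                 # learnable scale and shift
```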
3. Dropout — During training, dropout randomly drops out (i.e. sets to zero) a
certain percentage of neurons in the network, forcing the remaining neurons to
share the burden of representing the data.
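A sketch of (inverted) dropout on a batch of activations; the 0.5 drop rate and array shape are illustrative choices:

```python
import numpy as np

# Inverted-dropout sketch: randomly zero out a fraction of activations during training
# and rescale the rest so their expected value stays the same.
drop_rate = 0.5
activations = np.random.randn(4, 8)                        # a small batch of activations
mask = (np.random.rand(*activations.shape) > drop_rate)    # keep ~50% of neurons
dropped = activations * mask / (1.0 - drop_rate)           # rescale the survivors
# At test time dropout is disabled and the full activations are used.
```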
The idea behind gradient descent is to follow the direction of the steepest slope (i.e., the
gradient) of the cost function in order to find the minimum. At each iteration, the model
parameters are updated in the direction of the negative gradient, which points towards
the minimum. The size of the update is determined by a learning rate, which controls
the step size towards the minimum.
Gradient descent is important because it allows us to train neural networks and other
machine learning models. By minimizing the cost function, gradient descent helps us
find the optimal parameters that give us the best predictions.
17. What is transfer learning and how can it be applied to neural networks?
Transfer learning is a machine learning technique where a model trained on one task is
re-purposed on a related task. This is done by using the learned features from the
original task as a starting point, rather than training a model from scratch.
• Fine-tuning: The pre-trained weights are updated for the new task using a smaller
dataset. This is done by unfreezing the last few layers of the pre-trained model
and training the new layers on the smaller dataset.
• Feature extraction: The pre-trained model is used as a fixed feature extractor. The
output of the pre-trained model is used as the input to a new, task-specific
classifier. The pre-trained weights are not updated.
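A sketch of both strategies, assuming PyTorch/torchvision with an ImageNet-pretrained ResNet-18 as the example backbone and a hypothetical 10-class target task:

```python
import torch.nn as nn
from torchvision import models

# Transfer-learning sketch (assumes torchvision; ResNet-18 and the 10-class head
# are illustrative choices, not prescribed by the text above).

# Feature extraction: freeze the pre-trained weights...
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# ...and replace the final layer with a new task-specific classifier.
model.fc = nn.Linear(model.fc.in_features, 10)

# Fine-tuning variant: additionally unfreeze the last block so its weights
# are updated on the new (smaller) dataset.
for param in model.layer4.parameters():
    param.requires_grad = True
```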
The convolutional layer performs a convolution operation on the input data, where a set
of filters are used to extract features from the input data. The activation layer applies a
non-linear activation function to the output of the convolutional layer. The pooling layer
reduces the spatial size of the data, allowing the network to handle inputs of varying
size while maintaining the important features. Finally, the fully connected layer uses the
output of the pooling layer to make a prediction.
The key difference between a traditional neural network and a CNN is that a CNN uses
convolution and pooling layers to process the input data, whereas a traditional neural
network processes the input data using only fully connected layers. This allows a CNN
to take advantage of the spatial structure of image data and reduces the number of
parameters in the network, making it more computationally efficient.
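A minimal sketch of such a CNN in PyTorch, assuming a 28×28 grayscale input and 10 output classes purely for illustration:

```python
import torch
import torch.nn as nn

# A minimal CNN sketch with the four layer types described above.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolutional layer: extract features
    nn.ReLU(),                                  # activation layer: add non-linearity
    nn.MaxPool2d(2),                            # pooling layer: downsample 28x28 -> 14x14
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # fully connected layer: make the prediction
)

x = torch.randn(1, 1, 28, 28)                   # one fake grayscale image
print(model(x).shape)                           # torch.Size([1, 10])
```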
3. What Is a Multi-layer Perceptron(MLP)?
As in other neural networks, an MLP has an input layer, one or more hidden layers, and an
output layer. It has the same structure as a single-layer perceptron but with one or more
hidden layers. A single-layer perceptron can classify only linearly separable classes with
binary output (0, 1), but an MLP can classify nonlinear classes.
Except for the input layer, each node in the other layers uses a nonlinear activation
function. The input layer simply passes the data in; every other node applies its activation
function to the weighted sum of its inputs plus a bias, producing its output. MLP
uses a supervised learning method called "backpropagation." In backpropagation, the
neural network calculates the error with the help of a cost function. It propagates this
error backward from where it came, adjusting the weights to train the model more
accurately.
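To illustrate the nonlinear-classification point, here is a hand-wired MLP with one hidden layer that computes XOR, a function no single-layer perceptron can represent (the weights are chosen by hand purely for illustration):

```python
import numpy as np

# Hand-wired one-hidden-layer MLP computing XOR.
def step(z):
    return (z >= 0).astype(int)

W1 = np.array([[1.0, 1.0],       # hidden unit 1 ~ OR(x1, x2)
               [1.0, 1.0]])      # hidden unit 2 ~ AND(x1, x2)
b1 = np.array([-0.5, -1.5])
W2 = np.array([1.0, -1.0])       # output ~ OR AND NOT(AND) = XOR
b2 = -0.5

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = step(np.array(x) @ W1 + b1)      # hidden layer with nonlinear activation
    y = step(h @ W2 + b2)                # output layer
    print(x, int(y))                     # prints the XOR truth table
```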
The process of standardizing and reforming data is called “Data Normalization.” It’s a
pre-processing step to eliminate data redundancy. Often, data comes in, and you get
the same information in different formats. In these cases, you should rescale values to
fit into a particular range, achieving better convergence.
At the most basic level, an activation function decides whether a neuron should be fired
or not. It accepts the weighted sum of the inputs and bias as input to any activation
function. Step function, Sigmoid, ReLU, Tanh, and Softmax are examples of activation
functions.
Also referred to as “loss” or “error,” cost function is a measure to evaluate how good
your model’s performance is. It’s used to compute the error of the output layer during
backpropagation. We push that error backward through the neural network and use that
during the different training functions.
Overfitting occurs when the model learns the details and noise in the training data to
the degree that it adversely impacts the execution of the model on new information. It is
more likely to occur with nonlinear models that have more flexibility when learning a
target function. An example would be if a model is looking at cars and trucks, but only
recognizes trucks that have a specific box shape. It might not be able to notice a flatbed
truck because there's only a particular kind of truck it saw in training. The model
performs well on training data, but not in the real world.
Underfitting refers to a model that is neither well-trained on the data nor able to generalize to
new information. This usually happens when there is too little, or poor-quality, data to train the
model. An underfitted model shows both poor performance and low accuracy.
To combat overfitting and underfitting, you can resample the data to estimate the model
accuracy (k-fold cross-validation) and use a validation dataset to evaluate the
model.
There are two methods here: we can either initialize the weights to zero or assign them
randomly.
Initializing all weights to 0: This makes your model similar to a linear model. All the
neurons and every layer perform the same operation, giving the same output and
making the deep net useless.
Initializing all weights randomly: Here, the weights are assigned randomly by initializing
them very close to 0. It gives better accuracy to the model since every neuron performs
different computations. This is the most commonly used method.
1. Convolutional Layer - this layer performs a convolution operation, applying a set of
filters to the input to create a feature map.
2. ReLU Layer - it brings non-linearity to the network and converts all the negative
pixels to zero. The output is a rectified feature map.
3. Pooling Layer - pooling is a down-sampling operation that reduces the
dimensionality of the feature map.
4. Fully Connected Layer - this layer recognizes and classifies the objects in the
image.
• Step 3: The network decides what part of the current state makes it to the output.