1. What do you mean by Perceptron?

A perceptron, also called an artificial neuron, is a neural network unit that performs certain computations to detect features in the input.

It is a single-layer neural network used as a linear classifier on a set of input data. Since the perceptron learns from data points that are already labeled, it is a supervised learning algorithm. The algorithm processes the elements of the training set one at a time, adjusting its weights as it learns.
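As a minimal sketch (not from the original text; names and hyperparameters are illustrative), a single perceptron can be trained with the classic mistake-driven update rule:

```python
import numpy as np

def perceptron_train(X, y, lr=0.1, epochs=10):
    """Train a single perceptron on labeled data with y in {0, 1}."""
    w = np.zeros(X.shape[1])  # a zero start is fine for a single neuron
    b = 0.0                   # (multi-layer networks need random weights; see question 9)
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = 1 if xi @ w + b > 0 else 0  # step activation
            err = target - pred
            w += lr * err * xi                 # update only on misclassified samples
            b += lr * err
    return w, b

# Usage: learn the linearly separable AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
```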

2. What are the different types of Perceptrons?

There are two types of perceptrons:

1. Single-Layer Perceptrons

Single-layer perceptrons can learn only linearly separable patterns.

2. Multilayer Perceptrons

Multilayer perceptrons, also known as feedforward neural networks, have two or more layers and hence higher processing power.

3. What is the use of Loss functions?

The loss function is used as a measure of accuracy: it tells us whether the neural network has learned the patterns in the training data by comparing the network's predictions with the target values.

The loss function is therefore a primary measure of the performance of the neural network. In deep learning, a well-performing neural network will have a low loss value throughout training.
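As a hedged illustration (the function names are my own), two common loss functions written out in NumPy:

```python
import numpy as np

def mse_loss(y_true, y_pred):
    """Mean squared error, typical for regression."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy, typical for two-class classification."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1 - y_true) * np.log(1 - y_pred))
```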

4. What is the role of the Activation functions in Neural Networks?

The reasons for using activation functions in neural networks are as follows:

1. The idea behind the activation function is to introduce nonlinearity into the neural network so that it can learn more complex functions.

2. Without an activation function, the neural network would behave as a linear classifier, learning only functions that are linear combinations of its input data.

3. The activation function converts the inputs into outputs.

4. The activation function is responsible for deciding whether a neuron should be activated, i.e., fired or not.

5. To make that decision, it first calculates the weighted sum of the inputs and then adds the bias to it.

6. So, the basic purpose of the activation function is to introduce non-linearity into the output of a neuron.
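Points 4-6 amount to a two-step computation per neuron. A minimal sketch (names illustrative):

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """One neuron: weighted sum of inputs plus bias, passed through an activation."""
    z = np.dot(w, x) + b   # step 5: weighted sum, then add the bias
    return activation(z)   # steps 4 and 6: the nonlinearity decides the output
```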

5. List down the names of some popular Activation Functions used in Neural
Networks.

Some of the popular activation functions used when building deep learning models are as follows:

• Sigmoid function

• Hyperbolic tangent function

• Rectified linear unit (ReLU) function

• Leaky ReLU function

• Maxout function

• Exponential Linear unit (ELU) function
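For reference, these functions (except maxout, which operates over several linear pieces rather than a single pre-activation) can be sketched in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))        # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                      # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)              # zero for negatives, identity otherwise

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope instead of a hard zero

def elu(z, alpha=1.0):
    return np.where(z > 0, z, alpha * (np.exp(z) - 1.0))
```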

6. What do you mean by Cost Function?

While building deep learning models, our whole objective is to minimize the cost
function.

A cost function measures how well the neural network performs on its given training data relative to the expected output.

It depends on the neural network's parameters, such as the weights and biases, and as a whole it summarizes the performance of the neural network in a single value.

7. What do you mean by Backpropagation?

The backpropagation algorithm is used to train multilayer perceptrons. It propagates the error information from the end of the network back to all the weights inside the network, allowing efficient computation of the gradients (derivatives).

Backpropagation can be divided into the following steps:

• Forward-propagate the training data through the network to generate the output.

• Use the target value and the output value to compute the error derivatives with respect to the output activations.

• Backpropagate to compute the derivatives of the error with respect to the activations of the previous layer, and continue in this way for all the hidden layers.

• Use the previously computed derivatives for the output and all hidden layers to calculate the error derivatives with respect to the weights.

• Update the weights, and repeat until the cost function is minimized (see the worked sketch below).
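A worked toy example of these steps, as a hedged sketch (one hidden layer, sigmoid activations, squared-error loss; convergence depends on the seed and learning rate):

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR data, which a single-layer perceptron cannot learn
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(0, 0.5, (2, 4)), np.zeros(4)   # small random weights
W2, b2 = rng.normal(0, 0.5, (4, 1)), np.zeros(1)
lr = 0.5

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                # forward pass: hidden layer
    out = sigmoid(h @ W2 + b2)              # forward pass: output
    d_out = (out - y) * out * (1 - out)     # error derivative at the output
    d_h = (d_out @ W2.T) * h * (1 - h)      # backpropagated to the hidden layer
    W2 -= lr * h.T @ d_out                  # weight updates from the derivatives
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)
```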

8. How to initialize Weights and Biases in Neural Networks?

Neural network initialization means setting the initial values of the parameters, i.e., the weights and biases. Biases can be initialized to zero, but the weights cannot all be initialized to zero.

Weight initialization is one of the crucial factors in neural networks, since a bad weight initialization can prevent the network from learning the patterns in the data.

On the contrary, a good weight initialization helps the network converge more quickly to a good minimum. As a rule of thumb, the weights should be initialized close to zero without being too small.

9. Why is zero initialization of weights not a good initialization technique?

If we initialize the set of weights in the neural network as zero, then all the neurons at
each layer will start producing the same output and the same gradients during
backpropagation.

As a result, the neural network cannot learn anything at all, because there is no source of asymmetry between different neurons. Therefore, we add randomness when initializing the weights in neural networks.
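A sketch of one widely used scheme, He initialization, which scales random weights by the layer's fan-in (treat the exact factor as illustrative; other schemes such as Xavier use different factors):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Random weights with standard deviation sqrt(2 / n_in); zero biases."""
    W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
    b = np.zeros(n_out)   # zero biases are fine; all-zero weights are not
    return W, b
```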

10. Explain Gradient Descent and its types.

Gradient Descent is an optimization algorithm that aims to minimize the cost function, i.e., to minimize the error. Its main objective is to find the local or global minimum of a function, depending on the function's convexity, and in doing so it determines the direction in which the model should move to reduce the error.

There are three types of gradient descent:

• Mini-Batch Gradient Descent

• Stochastic Gradient Descent

• Batch Gradient Descent

11. Explain the different steps used in Gradient Descent Algorithm.

The five main steps that are used to initialize and use the gradient descent algorithm are
as follows:

• Initialize the weights and biases of the neural network.

• Pass the input data through the network, i.e., into the input layer.

• Compute the difference (error) between the expected and the predicted values.

• Adjust the values in the neurons, i.e., update the weights, to minimize the loss function.

• Repeat the same steps for multiple iterations to determine the best weights (see the sketch below).
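A minimal sketch of these five steps for a linear model (illustrative names; real networks replace the forward pass and gradient with backpropagation):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, epochs=100):
    """Fit y ~ X @ w + b by batch gradient descent on mean squared error."""
    w = np.zeros(X.shape[1])                 # step 1: initialize parameters
    b = 0.0
    n = len(y)
    for _ in range(epochs):                  # step 5: repeat
        pred = X @ w + b                     # step 2: pass the input through
        err = pred - y                       # step 3: error vs. expected values
        w -= lr * (2 / n) * (X.T @ err)      # step 4: update the weights...
        b -= lr * (2 / n) * err.sum()        # ...and the bias down the gradient
    return w, b
```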

12. Explain the term “Data Normalization”.

Data normalization is an essential preprocessing step, used to rescale the input values to a specific range. It ensures better convergence during backpropagation.

In general, data normalization amounts to subtracting the mean from each data point and dividing by the standard deviation. This technique improves the performance and stability of neural networks because all inputs are brought to a comparable scale.
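The subtract-mean, divide-by-standard-deviation step looks like this (a sketch; the epsilon is my addition to avoid dividing by zero):

```python
import numpy as np

def standardize(X):
    """Zero-mean, unit-variance normalization, computed per feature (column)."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-8   # guard against constant features
    return (X - mu) / sigma
```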

13. What is the difference between Forward Propagation and Backward Propagation in Neural Networks?

Forward propagation: The input is fed into the network. In each layer, there is a specific
activation function and between layers, there are weights that represent the connection
strength of the neurons. The input runs through the individual layers of the network,
which ultimately generates an output.

Backward propagation: An error function measures how accurate the output of the network is. To improve the output, the weights have to be optimized. The backpropagation algorithm determines how the individual weights have to be adjusted, and the weights are then updated using the gradient descent method.

14. Explain the different types of Gradient Descent in detail.

Stochastic Gradient Descent: In stochastic gradient descent, a batch size of 1 is used. For n training samples we therefore get n batches, and the weights of the neural network are updated after each training sample.

Mini-batch Gradient Descent: In mini-batch gradient descent, the batch size is set between 1 and the size of the training dataset, giving k batches per epoch. The weights of the neural network are therefore updated after each mini-batch.

Batch Gradient Descent: In Batch Gradient Descent, the batch size is equal to the size
of the training dataset. Therefore, the weights of the neural network are updated after
each epoch.
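Since the three variants differ only in the batch size, a single batching helper illustrates all of them (a hedged sketch; names are my own):

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, seed=0):
    """Yield shuffled batches: batch_size=1 gives SGD, len(X) gives batch GD."""
    idx = np.random.default_rng(seed).permutation(len(X))
    for start in range(0, len(X), batch_size):
        sel = idx[start:start + batch_size]
        yield X[sel], y[sel]   # one weight update would follow each batch
```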

15. What do you mean by Boltzmann Machine?

One of the most basic Deep Learning models is a Boltzmann Machine, which resembles
a simplified version of the Multi-Layer Perceptron.
This model features a visible input layer and a hidden layer — just a two-layer neural
network that makes stochastic decisions as to whether a neuron should be activated or
not.

In the Boltzmann Machine, nodes are connected across the layers, but no two nodes of
the same layer are connected.

16. How does the learning rate affect the training of the Neural Network?

While selecting the learning rate to train the neural network, we have to choose the
value very carefully due to the following reasons:

If the learning rate is set too low, training proceeds very slowly: we make only tiny changes to the weights, because the step size governed by the gradient-descent update equation is small. Many iterations are needed before reaching the point of minimum loss.

If the learning rate is set too high, the large weight changes caused by the large step size lead to undesirable, divergent behavior of the loss function. The model may fail to converge (the loss oscillates around the minimum) or even diverge (the weights grow without bound).
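The step size mentioned above is the learning rate $\alpha$ in the standard gradient-descent update rule, $w \leftarrow w - \alpha\,\partial L/\partial w$: the smaller $\alpha$ is, the smaller each weight change, and the larger it is, the larger each change.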

17. What do you mean by Hyperparameters?

Once the data is formatted correctly, we usually work with hyperparameters in neural networks. A hyperparameter is a parameter whose value is fixed before the learning process begins.

It decides how a neural network is trained and also the structure of the network which
includes:

• The number of hidden units

• The learning rate

• The number of epochs, etc.

18. Why is ReLU the most commonly used Activation Function?

ReLU (Rectified Linear Unit) is the most commonly used activation function in neural
networks due to the following reasons:

1. No vanishing gradient: The derivative of the ReLU activation function is either 0 or 1, never a small fraction between 0 and 1. The product of many such derivatives is therefore also either 0 or 1, so the gradient does not shrink exponentially and the vanishing gradient problem does not occur during backpropagation.

2. Faster training: Networks with ReLU tend to show better convergence performance, so the training run time is much lower.
3. Sparsity: For all negative inputs, ReLU generates an output of 0, which means that fewer neurons in the network are firing. We therefore get sparse and efficient activations in the neural network.

19. Explain the vanishing and exploding gradient problems.

These are the major problems in training deep neural networks.

During backpropagation in a network with n hidden layers, n derivatives are multiplied together. If those derivatives are large (greater than 1), the value of the gradient increases exponentially as we propagate backward through the model until it eventually explodes; this is what we call the exploding gradient problem.

On the contrary, if the derivatives are small (less than 1), e.g., with a sigmoid activation function, the gradient decreases exponentially as we propagate through the model until it eventually vanishes; this is the vanishing gradient problem.

20. What do you mean by Optimizers?

Optimizers are algorithms or methods used to adjust the parameters of the neural network, such as the weights and biases, as well as settings like the learning rate, in order to minimize the loss function. They solve the optimization problem of minimizing that function.

The most commonly used optimizers in deep learning are as follows:

• Gradient Descent

• Stochastic Gradient Descent (SGD)

• Mini Batch Stochastic Gradient Descent (MB-SGD)

• SGD with momentum

• Nesterov Accelerated Gradient (NAG)

• Adaptive Gradient (AdaGrad)

• AdaDelta

• RMSprop

• Adam
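As a hedged sketch of the general shape of an optimizer, here is SGD with momentum, which keeps a per-parameter velocity (an illustrative class, not a library API):

```python
class SGDMomentum:
    """v <- momentum * v - lr * grad;  param <- param + v."""
    def __init__(self, lr=0.01, momentum=0.9):
        self.lr, self.momentum = lr, momentum
        self.v = {}   # one velocity per named parameter

    def step(self, params, grads):
        for name, g in grads.items():
            v = self.momentum * self.v.get(name, 0.0) - self.lr * g
            self.v[name] = v
            params[name] = params[name] + v   # move along the smoothed gradient
        return params
```

The momentum term damps oscillations and speeds movement along consistent gradient directions; the more elaborate optimizers in the list (AdaGrad, RMSprop, Adam) additionally adapt the step size per parameter.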

21. Why are Deep Neural Networks preferred over Shallow Neural Networks?

Neural networks contain hidden layers apart from the input and output layers. Shallow neural networks have only a single hidden layer between the input and output layers, whereas deep neural networks use multiple hidden layers.

To approximate any function, both shallow and deep networks are in principle capable, but when a shallow neural network is fitted to an arbitrary function it requires a very large number of parameters. On the contrary, deep networks can fit such functions with a limited number of parameters because they contain several hidden layers.

So, for the same level of accuracy, deeper networks can be much more powerful and
efficient in terms of both computation and the number of parameters to learn.

One other important thing about deeper networks is that they can create deep
representations and at every layer, the network learns a new, more abstract
representation of the input.

Therefore, deep neural networks have become the preferred choice in modern practice, owing to their ability to model many kinds of data.

23. What is the difference between Epoch, Batch, and Iteration in Neural Networks?

Epoch, iteration, and batch are related terms used when processing datasets with gradient descent. All three are essentially ways of organizing the gradient-descent updates depending on the size of the dataset.

Epoch: It represents one iteration over the entire training dataset (everything put into
the training model).

Batch: When we cannot pass the entire dataset through the neural network at once because of the computational cost, we divide the dataset into several batches.

Iteration: Suppose we have 10,000 images as our training dataset and we choose a batch size of 200. Then one epoch runs 10,000 / 200 = 50 iterations.
1. What is a neural network?

A neural network is a type of machine learning algorithm modeled after the structure
and function of the human brain. It consists of interconnected nodes, called neurons,
that process information and make predictions based on that information. Neural
networks can be used to perform a wide variety of tasks, including image and speech
recognition, natural language processing, and regression and classification tasks.

2. What are the different types of neural networks?

There are several types of neural networks, including:

• Feedforward Neural Networks: These networks move information from the input layer to the output layer without looping back.

• Convolutional Neural Networks (CNNs): These networks are commonly used for
image classification and are designed to preserve spatial relationships between
pixels in an image.

• Recurrent Neural Networks (RNNs): These networks are used for tasks where the
order of the inputs is important, such as in natural language processing.

• Generative Adversarial Networks (GANs): These networks are used to generate new data that is similar to a training set.

3. How does backpropagation work?

Backpropagation is the process by which a neural network adjusts its weights to minimize the error between its predictions and the actual target values. The error is propagated backwards through the network, starting from the output layer, and the weights are updated using an optimization algorithm such as gradient descent. This process is repeated multiple times until the error is reduced to a satisfactory level.

4. What is overfitting in a neural network?


Overfitting occurs when a neural network performs well on the
training data, but poorly on new, unseen data. This means that the
network has learned to fit the training data too closely and has not
generalized well to new data. Overfitting can be caused by having too
many parameters in the network, training for too many epochs, or
not having enough training data.
5. What is an activation function in a neural network?

An activation function is a mathematical function applied to the output of each neuron in a neural network. Its purpose is to introduce non-linearity into the output of the neuron, allowing the network to learn and model complex relationships between the inputs and outputs. Some common activation functions include the sigmoid function, the hyperbolic tangent function, and the rectified linear unit (ReLU) function.

6. What is forward propagation in a neural network?

Forward propagation is the process of passing inputs through a neural network to compute the outputs. During forward propagation, the inputs are multiplied by the weights of the network and then passed through an activation function to produce an output. This process is repeated for each layer in the network until the final output is obtained.

7. What is backpropagation in a neural network?

Backpropagation is the process of updating the weights of a neural network based on the error between the predicted output and the actual output. During backpropagation, the error is propagated backwards through the network, and the weights are updated so as to reduce the error. The backpropagation algorithm is based on the gradient descent optimization method.

8. What is batch normalization in a neural network?

Batch normalization is a technique used to improve the stability and speed of training
deep neural networks. It normalizes the activations of each layer in the network for each
mini-batch, which can help to reduce the internal covariate shift that occurs during
training. This, in turn, can lead to faster convergence and improved accuracy.
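A hedged sketch of the core training-time computation for one mini-batch (a full implementation also learns gamma and beta and tracks running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a mini-batch per feature, then apply a learned scale and shift."""
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero mean, unit variance
    return gamma * x_hat + beta              # gamma, beta are learned parameters
```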

9. What is regularization and how does it prevent overfitting in neural networks?

Regularization is a technique used to prevent overfitting in neural networks. Overfitting occurs when a model is too complex for the amount of data it is trained on, resulting in poor generalization to unseen data. Regularization works by adding a penalty term to the loss function, which discourages the model from assigning too much importance to any one feature.

There are several popular forms of regularization in neural networks, including:

1. L1 regularization (Lasso regularization) — Adds the sum of the absolute values of the coefficients as a penalty term to the loss function.

2. L2 regularization (Ridge regularization) — Adds the sum of the squared values of the coefficients as a penalty term to the loss function.

3. Dropout — During training, dropout randomly drops out (i.e. sets to zero) a
certain percentage of neurons in the network, forcing the remaining neurons to
share the burden of representing the data.
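A sketch of how the L1 and L2 penalty terms above attach to the loss (the lambda values are illustrative hyperparameters):

```python
import numpy as np

def regularized_loss(data_loss, weights, l1=0.0, l2=0.0):
    """Total loss = data loss + l1 * sum(|w|) + l2 * sum(w^2)."""
    penalty = l1 * np.abs(weights).sum() + l2 * np.square(weights).sum()
    return data_loss + penalty
```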

10. What is dropout in a neural network?

Dropout is a regularization technique used to prevent overfitting in neural networks. It randomly drops out (i.e., sets to zero) a certain percentage of neurons during training, which helps to reduce the network's dependence on any single neuron. At test time, all neurons are used, and the activations are scaled to compensate for the neurons that were dropped during training.
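A sketch of the common "inverted dropout" variant, which rescales at training time so that the test-time path needs no change (names illustrative):

```python
import numpy as np

_rng = np.random.default_rng(0)

def dropout(x, p_drop=0.5, training=True):
    """Zero out a random fraction of activations at train time; rescale the rest."""
    if not training:
        return x                           # test time: all neurons, unchanged
    mask = _rng.random(x.shape) >= p_drop  # keep each unit with prob 1 - p_drop
    return x * mask / (1.0 - p_drop)       # rescale to preserve the expectation
```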
11. What is early stopping in a neural network?

Early stopping is a technique used to prevent overfitting in neural networks by stopping the training process when the performance on a validation set has not improved for a certain number of epochs. This helps to avoid wasting computational resources on training a network that is unlikely to improve any further.
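The bookkeeping reduces to a "patience" counter, sketched here with illustrative names:

```python
def should_stop(val_losses, patience=5):
    """Stop when the best validation loss is more than `patience` epochs old."""
    best_epoch = val_losses.index(min(val_losses))
    return len(val_losses) - 1 - best_epoch >= patience

# e.g. should_stop([0.9, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76]) -> True
```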

14. What is gradient descent and why is it important?

Gradient descent is an optimization algorithm used in machine learning to find the minimum of a cost function. It is an iterative method that adjusts the model parameters to minimize the difference between the predicted and actual values.

The idea behind gradient descent is to follow the direction of the steepest slope (i.e., the
gradient) of the cost function in order to find the minimum. At each iteration, the model
parameters are updated in the direction of the negative gradient, which points towards
the minimum. The size of the update is determined by a learning rate, which controls
the step size towards the minimum.

Gradient descent is important because it allows us to train neural networks and other
machine learning models. By minimizing the cost function, gradient descent helps us
find the optimal parameters that give us the best predictions.

17. What is transfer learning and how can it be applied to neural networks?

Transfer learning is a machine learning technique where a model trained on one task is re-purposed on a related task. This is done by using the learned features from the original task as a starting point, rather than training a model from scratch.

In the context of neural networks, transfer learning is commonly applied in computer vision tasks, such as image classification or object detection. For example, a model trained on a large dataset for image classification, such as ImageNet, can be re-purposed for a different classification task by fine-tuning it on a smaller dataset for the new task. The model is initialized with the pre-trained weights from the original task, and the weights are then updated on the smaller dataset for the new task.

This technique can be applied in several ways:

• Fine-tuning: The pre-trained weights are updated for the new task using a smaller
dataset. This is done by unfreezing the last few layers of the pre-trained model
and training the new layers on the smaller dataset.

• Feature extraction: The pre-trained model is used as a fixed feature extractor. The output of the pre-trained model is used as the input to a new, task-specific classifier. The pre-trained weights are not updated.
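A hedged sketch of both strategies with PyTorch/torchvision (assuming torchvision >= 0.13 for the `weights` argument; the 10-class head is illustrative):

```python
import torch.nn as nn
import torchvision.models as models

# Load a ResNet-18 pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Feature extraction: freeze all pre-trained weights
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with a new head for the target task
model.fc = nn.Linear(model.fc.in_features, 10)  # only this layer will train

# For fine-tuning instead, selectively re-enable requires_grad on the last
# few layers and train the whole model with a small learning rate.
```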

18. Can you explain Generative Adversarial Networks (GANs)?
Generative Adversarial Networks (GANs) are a type of deep learning
model that are used for unsupervised learning. They are made up of
two parts: a generator and a discriminator. The generator creates
new data instances, while the discriminator evaluates the
authenticity of the generated instances. The generator and the
discriminator are trained together in a competition, with the
generator trying to produce instances that the discriminator cannot
distinguish from real data, and the discriminator trying to correctly
identify generated instances.

The generator is trained to produce data instances that are similar to a training dataset, while the discriminator is trained to distinguish between real and generated data instances. During training, the generator is provided with random noise as input and uses this noise to generate new data instances. The discriminator then evaluates the authenticity of the generated instances and provides feedback to the generator on how to improve. This process is repeated until the generator can produce data instances that are indistinguishable from real data.

GANs are used in various applications, including image generation, video generation, and style transfer. They can also be used for data augmentation, generating additional training instances to improve the performance of supervised learning models.
16. Can you explain Convolutional Neural Networks (CNNs) and how they are
different from traditional neural networks?

Convolutional Neural Networks (CNNs) are a type of neural network designed to process data with a grid-like topology, such as an image. They are specifically designed to handle image data and are used in a variety of tasks, such as object recognition, image classification, and semantic segmentation.

A traditional neural network, also known as a fully connected network, consists of a series of fully connected layers, where each neuron in a layer is connected to every neuron in the previous layer. In a CNN, the layers are designed differently to take advantage of the spatial structure in image data. A CNN consists of several convolutional layers, activation layers, pooling layers, and fully connected layers.

The convolutional layer performs a convolution operation on the input data, where a set
of filters are used to extract features from the input data. The activation layer applies a
non-linear activation function to the output of the convolutional layer. The pooling layer
reduces the spatial size of the data, allowing the network to handle inputs of varying
size while maintaining the important features. Finally, the fully connected layer uses the
output of the pooling layer to make a prediction.

The key difference between a traditional neural network and a CNN is that a CNN uses
convolution and pooling layers to process the input data, whereas a traditional neural
network processes the input data using only fully connected layers. This allows a CNN
to take advantage of the spatial structure of image data and reduces the number of
parameters in the network, making it more computationally efficient.
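A hedged sketch of that layer sequence in PyTorch, assuming 28x28 single-channel inputs (the channel counts and the 10-class output are illustrative):

```python
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer: 16 filters
    nn.ReLU(),                                   # activation layer
    nn.MaxPool2d(2),                             # pooling layer: 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # fully connected layer: prediction
)
```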
3. What Is a Multi-layer Perceptron (MLP)?

As in other neural networks, an MLP has an input layer, one or more hidden layers, and an output layer. It has the same basic structure as a single-layer perceptron, with one or more hidden layers added. A single-layer perceptron can classify only linearly separable classes with binary output (0, 1), but an MLP can classify nonlinear classes.

Except for the input layer, each node in the other layers uses a nonlinear activation function: the node takes the weighted sum of the outputs of the previous layer, adds a bias, and passes the result through its activation function to produce its output. An MLP uses a supervised learning method called “backpropagation.” In backpropagation, the neural network calculates the error with the help of a cost function and propagates this error backward from where it came, adjusting the weights to train the model more accurately.

4. What Is Data Normalization, and Why Do We Need It?

The process of standardizing and reforming data is called “Data Normalization.” It’s a
pre-processing step to eliminate data redundancy. Often, data comes in, and you get
the same information in different formats. In these cases, you should rescale values to
fit into a particular range, achieving better convergence.

5. What is the Boltzmann Machine?

One of the most basic Deep Learning models is a Boltzmann Machine, resembling a
simplified version of the Multi-Layer Perceptron. This model features a visible input
layer and a hidden layer -- just a two-layer neural net that makes stochastic decisions
as to whether a neuron should be on or off. Nodes are connected across layers, but no
two nodes of the same layer are connected.

6. What Is the Role of Activation Functions in a Neural Network?

At the most basic level, an activation function decides whether a neuron should be fired
or not. It accepts the weighted sum of the inputs and bias as input to any activation
function. Step function, Sigmoid, ReLU, Tanh, and Softmax are examples of activation
functions.

7. What Is the Cost Function?

Also referred to as “loss” or “error,” the cost function is a measure of how well your model performs. It is used to compute the error of the output layer during backpropagation: we push that error backward through the neural network and use it during the different training iterations.

13. What Are Hyperparameters?


This is another frequently asked deep learning interview question. With neural
networks, you’re usually working with hyperparameters once the data is formatted
correctly. A hyperparameter is a parameter whose value is set before the learning
process begins. It determines how a network is trained and the structure of the network
(such as the number of hidden units, the learning rate, epochs, etc.).

17. What is Overfitting and Underfitting, and How to Combat Them?

Overfitting occurs when the model learns the details and noise in the training data to
the degree that it adversely impacts the execution of the model on new information. It is
more likely to occur with nonlinear models that have more flexibility when learning a
target function. An example would be if a model is looking at cars and trucks, but only
recognizes trucks that have a specific box shape. It might not be able to notice a flatbed
truck because there's only a particular kind of truck it saw in training. The model
performs well on training data, but not in the real world.

Underfitting refers to a model that neither fits the training data well nor generalizes to new information. It usually happens when there is too little, or insufficiently representative, data to train the model. An underfit model shows both poor performance and poor accuracy.

To combat overfitting and underfitting, you can resample the data to estimate model accuracy (k-fold cross-validation) and hold out a validation dataset to evaluate the model.

18. How Are Weights Initialized in a Network?

There are two methods here: we can either initialize the weights to zero or assign them
randomly.

Initializing all weights to 0: This makes your model similar to a linear model. All the
neurons and every layer perform the same operation, giving the same output and
making the deep net useless.

Initializing all weights randomly: Here, the weights are assigned randomly by initializing
them very close to 0. It gives better accuracy to the model since every neuron performs
different computations. This is the most commonly used method.

19. What Are the Different Layers on CNN?

There are four types of layers in a CNN:

1. Convolutional Layer - the layer that performs a convolutional operation, creating several smaller picture windows to go over the data.

2. ReLU Layer - it brings non-linearity to the network and converts all the negative
pixels to zero. The output is a rectified feature map.
3. Pooling Layer - pooling is a down-sampling operation that reduces the
dimensionality of the feature map.

4. Fully Connected Layer - this layer recognizes and classifies the objects in the
image.

21. How Does an LSTM Network Work?

Long Short-Term Memory (LSTM) is a special kind of recurrent neural network capable of learning long-term dependencies, remembering information for long periods as its default behavior. There are three steps in an LSTM network:

• Step 1: The network decides what to forget and what to remember.

• Step 2: It selectively updates cell state values.

• Step 3: The network decides what part of the current state makes it to the output.
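A hedged sketch with PyTorch's built-in LSTM (sizes illustrative); internally, the forget, input, and output gates implement the three steps above:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

x = torch.randn(4, 10, 8)      # 4 sequences, 10 time steps, 8 features each
out, (h_n, c_n) = lstm(x)      # out: output at every step, shape (4, 10, 16)
# c_n is the final cell state (what the network chose to remember);
# h_n is the final hidden state (what it exposes to the output).
```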
