Model Questions DWT COMPLETE SOLUTIONS
A type of neural network consisting of multiple layers: an input layer, one or more
hidden layers, and an output layer.
Each layer consists of neurons that apply activation functions to the weighted sum of
inputs.
Architecture: MLP has multiple layers, while a single layer perceptron has only one layer
of output nodes.
Complexity: MLP can model complex relationships due to its depth, whereas a single layer
perceptron can only solve linearly separable problems.
Purpose: MLPs are used for complex tasks such as image or speech recognition, while a
single layer perceptron is limited to simple, linearly separable classification.
3. Explain the concept of a neural network and the role of neurons, weights, and biases?
Neural Network:
A computational model made up of interconnected layers of neurons that learns a mapping
from inputs to outputs.
Components:
Neurons: Basic units that receive inputs, process them, and produce an output.
Weights: Parameters that adjust the strength of the input signals to the neurons.
Biases: Additional parameters that allow the model to fit the data better by shifting the
activation function.
4. What is the cost function? State different cost functions used in Regression and classification.
A measure of how well the model's predictions match the actual data.
Regression: Mean Squared Error (MSE), Mean Absolute Error (MAE).
Classification: Binary Cross-Entropy, Categorical Cross-Entropy.
Gradient Descent:
An optimization algorithm used to minimize the cost function by iteratively adjusting the
weights.
Steps:
1. Compute the gradient of the cost function with respect to the weights.
2. Update the weights in the direction opposite to the gradient.
3. Repeat until convergence.
Types: Batch Gradient Descent, Stochastic Gradient Descent (SGD), Mini-batch Gradient
Descent. A minimal numerical sketch is given below.
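As a minimal sketch of these steps, the loop below applies plain gradient descent to an illustrative quadratic cost J(w) = (w - 3)^2; the cost function and learning rate are assumptions chosen only to make the update rule concrete.

```python
# Hedged sketch: gradient descent on the illustrative cost J(w) = (w - 3)^2.
def gradient(w):
    return 2.0 * (w - 3.0)     # dJ/dw for J(w) = (w - 3)^2

w, lr = 0.0, 0.1
for _ in range(100):           # repeat until (approximate) convergence
    w = w - lr * gradient(w)   # step opposite to the gradient
print(round(w, 4))             # ~3.0, the minimizer of J
```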
MLPs can model complex, non-linear relationships that single-layer perceptrons cannot.
Example:
Classifying images of handwritten digits requires recognizing patterns that are not
linearly separable, which MLP can achieve through multiple layers.
8. Consider a neural network with one input layer, one hidden layer with 2 neurons and one output
layer with one neuron. Assume the neurons have a sigmoid activation function, actual output=1,
learning rate=0.9. The network parameters for the neural network are as follows: inputs x1=0.35,
x2=0.9. Weights and bias: input to hidden layer: w11=0.1, w12=0.3, w21=0.3, w22=0.4. Hidden to
output layer: wh1=0.45, wh2=0.65.
(i) Draw the architecture of the neural network with the given data.
Architecture:
Input Layer: x1, x2
Hidden Layer: h1, h2 (sigmoid)
Output Layer: y (sigmoid)
(ii) Calculate the output of the network in the forward propagation.
Hidden layer: each hidden neuron applies the sigmoid to its weighted sum of x1 and x2.
Output layer: y_pred = sigmoid(wh1*h1 + wh2*h2).
(iii) Calculate the error at the output layer for the actual output Y=0.5.
Error: E = (1/2)(Y - y_pred)^2.
(iv) Calculate the gradients of the weights for the hidden to output layer in the backward
propagation.
Compute the gradient for each weight (wh1) and (wh2) using the chain rule:
dE/dwh_i = (y_pred - Y) * y_pred * (1 - y_pred) * h_i.
(v) Calculate the gradients of the weights for the input to hidden layer in the backward
propagation.
Compute the gradients for weights (w11, w12, w21, w22) using the chain rule and the error
propagated back from the output layer. A worked numerical sketch is given below.
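A minimal numerical sketch for parts (ii)-(v), assuming w11/w21 feed h1, w12/w22 feed h2, no bias terms, and the squared-error loss with target Y = 0.5 (the weight indexing is not stated unambiguously in the question):

```python
# Hedged sketch for Q8: forward pass, output error, and gradients for the 2-2-1 network.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x1, x2 = 0.35, 0.9
w11, w12, w21, w22 = 0.1, 0.3, 0.3, 0.4      # input -> hidden (assumed indexing)
wh1, wh2 = 0.45, 0.65                        # hidden -> output
Y = 0.5                                      # target used in part (iii)
lr = 0.9

# (ii) Forward propagation
h1 = sigmoid(w11 * x1 + w21 * x2)            # ~0.576
h2 = sigmoid(w12 * x1 + w22 * x2)            # ~0.614
y_pred = sigmoid(wh1 * h1 + wh2 * h2)        # ~0.659

# (iii) Error at the output layer
E = 0.5 * (Y - y_pred) ** 2

# (iv) Gradients for hidden -> output weights (chain rule)
delta_out = (y_pred - Y) * y_pred * (1 - y_pred)
dE_dwh1 = delta_out * h1
dE_dwh2 = delta_out * h2

# (v) Gradients for input -> hidden weights
delta_h1 = delta_out * wh1 * h1 * (1 - h1)
delta_h2 = delta_out * wh2 * h2 * (1 - h2)
dE_dw11, dE_dw21 = delta_h1 * x1, delta_h1 * x2
dE_dw12, dE_dw22 = delta_h2 * x1, delta_h2 * x2

# Weight update rule: w_new = w_old - lr * gradient
print(y_pred, E, dE_dwh1, dE_dwh2, dE_dw11, dE_dw12, dE_dw21, dE_dw22)
```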
Machine Learning:
Involves algorithms that learn from data and make predictions or decisions based on
that data.
Deep Learning:
A subset of machine learning that uses neural networks with many layers (deep
architectures).
Automatically learns features from raw data, reducing the need for manual feature
extraction.
10. Write the significance of validation set in training a deep neural network.
Provides an unbiased evaluation of the model during training, helping to ensure that the
model generalizes well to unseen data.
11. Discuss the methods to avoid overfitting in deep neural network?
Regularization: Techniques like L1 and L2 regularization add a penalty for large weights.
Early Stopping: Monitors validation loss and stops training when it starts to increase.
Data Augmentation: Increases the diversity of the training set by applying
transformations.
Dropout: Randomly deactivates a fraction of neurons during training (see the sketch below).
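A minimal Keras sketch combining these ideas (L2 regularization, dropout, and early stopping); the layer sizes, penalty strength, dropout rate, and random data are illustrative assumptions:

```python
# Hedged sketch: L2 regularization + dropout + early stopping in tf.keras.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, regularizers, callbacks

model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),   # L2 penalty on weights
    layers.Dropout(0.5),                                      # randomly drop half the units
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Early stopping watches the validation loss and halts when it stops improving.
stopper = callbacks.EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True)

x = np.random.rand(1000, 20).astype("float32")      # dummy data for illustration
y = np.random.randint(0, 2, size=(1000, 1))
model.fit(x, y, validation_split=0.2, epochs=50, callbacks=[stopper], verbose=0)
```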
Proof:
Each neuron applies a non-linear activation function to the weighted sum of its inputs.
Advantages of MLP:
Can model non-linear relationships due to multiple layers and non-linear activation
functions.
Capable of solving complex problems that are not linearly separable, unlike single
perceptrons.
Advantages:
Computationally efficient and helps mitigate the vanishing gradient problem.
Occurs when neurons output zero for all inputs, effectively becoming inactive.
Example: If a neuron has a weight that causes it to always output negative values, it will
never activate.
16. State how Leaky ReLU overcomes the dying ReLU problem.
Leaky ReLU:
Allows a small, non-zero gradient when the unit is not active, preventing neurons from
dying.
Convolution Layers:
Extract features from the input image by applying filters (kernels) that slide over the
input.
Pooling Layers:
Reduce the spatial dimensions of the feature maps, lowering computation and making the
learned features more robust to small translations.
18. Discuss the significance of using padding technique in convolutional layer with suitable example.
Significance of Padding:
Preserves the spatial size of the input, lets the filter process pixels at the borders, and
prevents feature maps from shrinking too quickly in deep networks.
Example: with a 3x3 filter on a 5x5 input, adding 1 pixel of zero padding keeps the output 5x5.
19. Discuss types of padding techniques used in CNN with suitable example.
Types of Padding:
Valid Padding: No padding is applied, resulting in a smaller output size. Example: For a
5x5 filter on a 7x7 input, the output will be 3x3.
Same Padding: Padding is added to ensure the output size matches the input size.
Example: For a 5x5 filter on a 7x7 input, 2 pixels of padding are added, resulting in a 7x7
output.
Full Padding: Adds enough padding to ensure that the filter can slide over every pixel of
the input. Example: For a 5x5 filter on a 7x7 input, 4 pixels of padding are added,
resulting in an 11x11 output.
20. Write the difference between valid padding, same padding and full padding.
Valid Padding:
No padding is applied.
Same Padding:
Padding is added to maintain the same output size as the input size.
Full Padding:
Output size is larger than input size, allowing the filter to cover all input pixels.
21. State and discuss types of pooling in CNN. Which pooling technique is widely used?
Types of Pooling:
Max Pooling: Takes the maximum value from a patch of the feature map. Widely used
for its ability to retain important features.
Average Pooling: Takes the average value from a patch of the feature map. Less common
as it may lose important features.
Global Average Pooling: Averages the entire feature map, often used before the final
classification layer.
Widely Used Technique: Max pooling is the most commonly used pooling technique due to its
effectiveness in retaining dominant features.
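A small NumPy sketch contrasting 2x2 max pooling and average pooling on an illustrative 4x4 feature map:

```python
# Hedged sketch: 2x2 max pooling vs. average pooling (stride 2) on a toy feature map.
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Non-overlapping pooling over a 2-D feature map (H and W divisible by size)."""
    h, w = x.shape
    x = x.reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3)) if mode == "max" else x.mean(axis=(1, 3))

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 8, 0],
                 [1, 4, 3, 9]], dtype=float)

print(pool2d(fmap, mode="max"))   # [[6. 4.] [7. 9.]]
print(pool2d(fmap, mode="avg"))   # [[3.75 2.25] [3.5  5.  ]]
```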
Early Stopping:
Training is halted when the validation loss begins to increase, indicating that the model
is starting to overfit the training data.
23. What is Recurrent Neural Network (RNN)? What is the use of it?
A type of neural network designed for sequential data, where connections between
nodes can create cycles.
It maintains a hidden state that captures information about previous inputs, making it
suitable for tasks involving time series or sequences.
Uses:
Natural language processing, speech recognition, and time series prediction.
24. State the limitations of RNN model. How LSTM overcomes the limitations of RNN?
Limitations of RNN: vanishing and exploding gradients during backpropagation through time,
which make it difficult to learn long-range dependencies.
LSTM:
A specialized type of RNN that includes memory cells and gates to control the flow of
information.
Capable of retaining information over long periods, effectively addressing the limitations
of standard RNNs.
25. Differentiate between feed forward neural network and Recurrent Neural Network?
Feed Forward Neural Network: Information flows in one direction, from input to output,
with no cycles and no memory of previous inputs.
Recurrent Neural Network: Connections form cycles, so the hidden state feeds back into the
network and carries information across time steps.
LSTM Network:
It consists of memory cells, input gates, output gates, and forget gates.
How it Works:
Input Gate: Decides which information to keep from the current input.
Forget Gate: Decides which information to discard from the cell state.
Output Gate: Determines what the next hidden state should be based on the cell state.
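A minimal NumPy sketch of a single LSTM cell step, showing how the three gates above combine the input, the previous hidden state, and the cell state; the layer sizes and random weights are illustrative assumptions:

```python
# Hedged sketch: one LSTM cell time step in NumPy (illustrative shapes and weights).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One time step. W: input weights (4h x d), U: recurrent weights (4h x h), b: (4h,)."""
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0*hidden:1*hidden])   # input gate: what new info to write
    f = sigmoid(z[1*hidden:2*hidden])   # forget gate: what to erase from the cell state
    o = sigmoid(z[2*hidden:3*hidden])   # output gate: what to expose as hidden state
    g = np.tanh(z[3*hidden:4*hidden])   # candidate cell content
    c_t = f * c_prev + i * g            # updated cell state
    h_t = o * np.tanh(c_t)              # new hidden state
    return h_t, c_t

d, h = 3, 4                              # illustrative input and hidden sizes
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4*h, d)), rng.normal(size=(4*h, h)), np.zeros(4*h)
h_t, c_t = lstm_step(rng.normal(size=d), np.zeros(h), np.zeros(h), W, U, b)
print(h_t.shape, c_t.shape)              # (4,) (4,)
```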
Sigmoid: Range: (0, 1)
Tanh: Range: (-1, 1)
ReLU: Range: [0, ∞)
Leaky ReLU: Allows a small gradient when inactive, addressing the dying ReLU problem.
Softmax: Converts logits into probabilities for multi-class classification.
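A short NumPy sketch of the activation functions listed above, with their output ranges noted in the comments:

```python
# Hedged sketch: NumPy versions of the common activation functions.
import numpy as np

def sigmoid(x):                  # range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                     # range (-1, 1)
    return np.tanh(x)

def relu(x):                     # range [0, inf)
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):   # small negative slope avoids "dying" units
    return np.where(x > 0, x, alpha * x)

def softmax(z):                  # logits -> probabilities that sum to 1
    e = np.exp(z - np.max(z))    # subtract max for numerical stability
    return e / e.sum()

x = np.array([-2.0, 0.0, 3.0])
print(sigmoid(x), tanh(x), relu(x), leaky_relu(x), softmax(x))
```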
30. Find the optimal weights of the perceptron which act as an OR gate for the given data keeping bias
(b=0) as fixed. w1=0.6, w2=0.6 and Learning rate(η)=0.5. Draw the resultant perceptron which acts as
an OR gate with the optimal weights calculated.
Resultant Perceptron:
Bias: b=0
31. Find the optimal weights of the perceptron which act as an AND gate for the given data keeping
bias (b=0) as fixed. w1=1.2, w2=0.6 and Learning rate(η)=0.5. Draw the resultant perceptron which
acts as an AND gate with the optimal weights calculated.
Bias: b=0
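A hedged training sketch for Questions 30 and 31 using the perceptron learning rule with the given initial weights, learning rate 0.5, and bias fixed at 0; the activation threshold (output 1 if the weighted sum is at least 1) is an assumption, since the questions do not state it:

```python
# Hedged sketch: perceptron learning rule for the OR and AND gates, bias fixed at b=0.
import numpy as np

def train_perceptron(X, t, w, lr=0.5, threshold=1.0, epochs=20):
    """Update w with the perceptron rule until all patterns are classified correctly."""
    for _ in range(epochs):
        errors = 0
        for x, target in zip(X, t):
            y = 1 if np.dot(w, x) >= threshold else 0   # assumed step activation
            if y != target:
                w = w + lr * (target - y) * x           # perceptron update (bias kept at 0)
                errors += 1
        if errors == 0:
            break
    return w

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
or_targets  = np.array([0, 1, 1, 1])
and_targets = np.array([0, 0, 0, 1])

print("OR  weights:", train_perceptron(X, or_targets,  np.array([0.6, 0.6])))
print("AND weights:", train_perceptron(X, and_targets, np.array([1.2, 0.6])))
```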
Types of RNN:
Vanilla RNN: Basic form of RNN with simple recurrent connections. Example: Simple
sequence prediction tasks.
GRU (Gated Recurrent Unit): A simplified version of LSTM with fewer parameters.
Example: Time series forecasting.
Advantages of LSTM:
Long-Term Memory: LSTMs can remember information for long periods, effectively addressing
the vanishing gradient problem that affects standard RNNs.
Gating Mechanisms: LSTMs use input, output, and forget gates to control the flow of
information, allowing them to learn which information to keep or discard.
Better Performance: LSTMs generally outperform traditional RNNs on tasks involving long
sequences, such as language translation and speech recognition.
Architecture of an Autoencoder:
Encoder: Compresses the input into a lower-dimensional latent representation.
Decoder: Reconstructs the original input from the latent representation.
Loss Function: Measures the difference between the input and the reconstructed
output, typically using Mean Squared Error.
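A minimal Keras sketch of this encoder/decoder/loss structure; the 784-dimensional input (e.g., flattened 28x28 images), the 32-dimensional latent size, and the dummy data are illustrative assumptions:

```python
# Hedged sketch: a minimal dense autoencoder in tf.keras.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(784,))
latent = layers.Dense(32, activation="relu")(inputs)        # encoder -> latent representation
outputs = layers.Dense(784, activation="sigmoid")(latent)   # decoder -> reconstruction

autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")           # reconstruction loss (MSE)

# Train to reproduce the input itself (targets == inputs); dummy data for illustration.
x = np.random.rand(256, 784).astype("float32")
autoencoder.fit(x, x, epochs=2, batch_size=32, verbose=0)
```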
35. What are the key differences between Convolutional Neural Networks (CNNs) and Recurrent
Neural Networks (RNNs)?
Data Type:
CNNs work on grid-like data such as images; RNNs work on sequential data such as text or
time series.
Architecture:
CNNs use convolutional layers to extract features from spatial hierarchies.
RNNs use recurrent connections that maintain a hidden state across time steps.
Processing:
CNNs process inputs in parallel, making them faster for image data.
RNNs process inputs sequentially, which can lead to longer training times.
Dropout:
A regularization technique that randomly sets a fraction of the neurons to zero during
training.
Prevents co-adaptation of neurons, forcing the network to learn more robust features.
Reduces overfitting by ensuring that the model does not rely on any specific set of
neurons.
Regularization:
Techniques that add a penalty to the loss function to discourage complex models.
L1 Regularization: Adds the absolute value of weights to the loss function, promoting
sparsity.
L2 Regularization: Adds the squared value of weights to the loss function, discouraging
large weights.
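Written out, with ( L_0 ) denoting the unregularized loss and ( \lambda ) the regularization strength (symbols introduced here for clarity):
[ L_{L1} = L_0 + \lambda \sum_i |w_i| ]
[ L_{L2} = L_0 + \lambda \sum_i w_i^2 ]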
38. State the difference between validation set and test set. Discuss how validation sets are used in
early stopping the ANN model to combat overfitting.
Validation Set:
A subset of the training data used to tune hyperparameters and monitor model
performance during training.
Test Set:
A separate dataset used to evaluate the final model's performance after training and
validation.
Early Stopping:
Training is halted when the validation loss starts to increase, indicating potential
overfitting.
This ensures that the model retains its ability to generalize to unseen data.
Variants of ReLU:
Leaky ReLU: ( f(x) = x ) if ( x > 0 ) else ( \alpha x ) (where ( \alpha ) is a small constant,
e.g., 0.01)
Occurs when gradients become very small during backpropagation, especially in deep
networks.
Leads to slow or stalled learning, as weights are not updated effectively.
Makes it difficult for the network to learn long-range dependencies, particularly in RNNs.
A situation where neurons output zero for all inputs, effectively becoming inactive.
Example: If a neuron has a weight that causes it to always output negative values, it will
never activate, leading to a loss of information and reduced model capacity.
42. State the mathematical formulas for both tanh and sigmoid functions and describe their range of
outputs.
Tanh: ( \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} ); Range: (-1, 1)
Sigmoid: ( \sigma(x) = \frac{1}{1 + e^{-x}} ); Range: (0, 1)
43. Given a CNN output of Z = [2.1, 5.5, -4.3], calculate the Softmax probabilities for each class.
Let ( S = e^{2.1} + e^{5.5} + e^{-4.3} ). Then:
( P(y_1) = \frac{e^{2.1}}{S} \approx 0.0323 )
( P(y_2) = \frac{e^{5.5}}{S} \approx 0.9676 )
( P(y_3) = \frac{e^{-4.3}}{S} \approx 5.4 \times 10^{-5} )
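The same computation as a short NumPy sketch:

```python
# Hedged sketch: softmax probabilities for Z = [2.1, 5.5, -4.3].
import numpy as np

z = np.array([2.1, 5.5, -4.3])
e = np.exp(z - z.max())        # subtracting the max improves numerical stability
p = e / e.sum()
print(p)                       # ~[0.032, 0.968, 0.00005]
```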
45. Design a CNN for image classification task with 10 classes. The CNN is having CONV1 layer with 8
filters, filter size is 5x5, stride=1, padding=0. CONV1 is followed by a maxpooling layer with filter 2x2.
Conv2 layer is having 16 filters followed by a maxpooling layer.
Input Layer -> CONV1 (8 filters, 5x5, stride=1, padding=0) -> Max Pooling (2x2) ->
CONV2 (16 filters, 5x5, stride=1, padding=0) -> Max Pooling (2x2) ->
Flatten -> Fully Connected Layer -> Output (10 classes, Softmax)
CONV1:
Parameters = (5 * 5 * 3 + 1) * 8 = 608 (assuming a 3-channel input)
CONV2:
Parameters = (5 * 5 * 8 + 1) * 16 = 3216
Fully Connected Layer: Depends on the output size from the last pooling layer.
c) Find the total number of learnable parameters in the above CNN:
Total = 608 + 3216 + (N * 10) (where ( N ) is the number of outputs from the
last pooling layer).
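A short Python sketch of these calculations, assuming a 32x32x3 input image (the input size is not given in the question):

```python
# Hedged sketch for Q45: output shapes and parameter counts under an assumed 32x32x3 input.
def conv_out(size, f=5, stride=1, pad=0):
    return (size - f + 2 * pad) // stride + 1

h = conv_out(32)            # CONV1: 28x28x8
h = h // 2                  # 2x2 max pooling: 14x14x8
h = conv_out(h)             # CONV2: 10x10x16
h = h // 2                  # 2x2 max pooling: 5x5x16

conv1_params = (5 * 5 * 3 + 1) * 8       # 608
conv2_params = (5 * 5 * 8 + 1) * 16      # 3216
n_flat = h * h * 16                      # N = 400 under the assumed input size
fc_params = n_flat * 10                  # 4000 (add 10 more if the FC layer uses biases)

print(conv1_params + conv2_params + fc_params)   # 7824 under these assumptions
```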
Perceptron:
A type of linear classifier that makes its predictions based on a linear predictor function
combining a set of weights with the feature vector.
Introduction of Nonlinearity:
Common activation functions include ReLU, Sigmoid, and Tanh, which allow the network
to learn complex patterns and relationships in the data by transforming the linear
combinations of inputs into non-linear outputs. This enables the CNN to capture
intricate features in the input images, enhancing its ability to perform tasks such as
image classification and object detection.
48. How weights are initialized in neural networks?
Random Initialization: Weights are initialized randomly, often using a uniform or normal
distribution. This helps break symmetry.
Xavier Initialization: Designed for layers with sigmoid or tanh activation functions, it sets
weights to values drawn from a distribution with a mean of 0 and a variance of
( \frac{2}{n_{in} + n_{out}} ), where ( n_{in} ) and ( n_{out} ) are the number of input and
output units, respectively.
He Initialization: Suitable for layers with ReLU activation functions, it initializes weights
from a distribution with a mean of 0 and a variance of ( \frac{2}{n_{in}} ).
Zero Initialization: All weights are initialized to zero, but this is generally avoided as it
leads to symmetry and prevents learning.
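A small NumPy sketch of the Xavier and He schemes following the variance formulas above; the layer sizes are illustrative:

```python
# Hedged sketch: Xavier (Glorot) and He weight initialization in NumPy.
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(n_in, n_out):
    # variance 2 / (n_in + n_out), suited to sigmoid/tanh layers
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_in, n_out))

def he_init(n_in, n_out):
    # variance 2 / n_in, suited to ReLU layers
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))

W1 = xavier_init(784, 256)
W2 = he_init(256, 10)
print(W1.std(), W2.std())   # close to the target standard deviations
```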
49. Write the formula for finding the output shape of the convolutional layer with given input size,
filter size, stride, and padding in CNN model.
For a convolutional layer, the output shape can be calculated using the formula:
[ \text{Output Height} = \left\lfloor \frac{\text{Input Height} - \text{Filter Height} + 2 \times \text{Padding}}{\text{Stride}} \right\rfloor + 1 ]
[ \text{Output Width} = \left\lfloor \frac{\text{Input Width} - \text{Filter Width} + 2 \times \text{Padding}}{\text{Stride}} \right\rfloor + 1 ]
The output depth is equal to the number of filters used in the convolutional layer.
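The same formula as a small helper function, checked against the valid-padding example from Question 19:

```python
# Hedged sketch: convolutional output shape from the formula above.
def conv_output_shape(in_h, in_w, f_h, f_w, stride=1, padding=0, n_filters=1):
    out_h = (in_h - f_h + 2 * padding) // stride + 1
    out_w = (in_w - f_w + 2 * padding) // stride + 1
    return out_h, out_w, n_filters     # depth equals the number of filters

print(conv_output_shape(7, 7, 5, 5, stride=1, padding=0, n_filters=8))   # (3, 3, 8)
```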
Batch Normalization:
A technique to improve the training of deep neural networks by normalizing the inputs
to each layer. It standardizes the inputs to have a mean of zero and a variance of one,
which helps in stabilizing and accelerating the training process. Batch normalization can
also act as a form of regularization.
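A minimal NumPy sketch of the training-time batch-normalization computation described above; the scale (gamma) and shift (beta) parameters are shown with illustrative defaults rather than learned values:

```python
# Hedged sketch: core batch-normalization step for one mini-batch (training-time statistics).
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """x: (batch, features). Normalize each feature to zero mean, unit variance."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta        # learned scale and shift in a real layer

x = np.random.default_rng(0).normal(5.0, 3.0, size=(64, 10))
y = batch_norm(x)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # ~0 and ~1 per feature
```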