DL Module 2

Deep feedforward networks, or multilayer perceptrons, are neural networks that approximate functions by processing inputs through layers without feedback loops. They consist of an input layer, hidden layers for intermediate computations, and an output layer, utilizing activation functions to learn complex patterns. Regularization techniques, such as L1 and L2, help reduce overfitting by penalizing model complexity, while gradient descent optimizes model parameters to minimize loss through iterative updates.

Uploaded by

spranshu311

MQP ANSWERS

Explain the working of deep feedforward networks.

Deep feedforward networks, often called feedforward neural networks (FNNs) or multilayer
perceptrons (MLPs), are a fundamental class of neural networks in deep learning. Let’s break
down the key aspects of these networks, using the XOR example provided, to illustrate their
significance and functionality.

1. Overview of Deep Feedforward Networks

●​ Objective: The primary goal of a feedforward network is to approximate a function f∗. For
instance, in classification, the network maps an input x to an output y=f(x;θ), where θ
represents the network's parameters.
● Flow of Information: Information flows in one direction, from the input x, through
intermediate layers, to the output y. There are no feedback loops in FNNs, distinguishing
them from recurrent neural networks (RNNs).
●​ Layers:
○​ Input Layer: Receives the input features.
○​ Hidden Layers: Layers where intermediate computations are performed.
○​ Output Layer: Produces the final output.

2. Structure of a Feedforward Network

● Composition of Functions: FNNs are built by chaining multiple functions together. For
example, a 3-layer network can be expressed as:

f(x) = f^(3)(f^(2)(f^(1)(x)))

Here f^(1) is the first layer, f^(2) the second, and f^(3) the output layer. Each layer
transforms its input, progressively learning higher-level features.

● Depth and Width:
○ Depth: Refers to the number of layers in the network.
○ Width: Refers to the number of neurons in each layer.
●​ Activation Functions: Introduce non-linearity, enabling the network to learn complex
functions. Common activations include:
○​ Rectified Linear Unit (ReLU): g(z)=max⁡(0,z)
○​ Sigmoid and Tanh (less commonly used today).
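
The chained structure described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the notes: the layer sizes, the random weights, the choice of a linear output layer, and the helper names `relu` and `forward` are all hypothetical.

```python
import numpy as np

def relu(z):
    # Rectified Linear Unit: g(z) = max(0, z), applied element-wise
    return np.maximum(0.0, z)

def forward(x, layers):
    """Apply each (W, b) layer in turn, i.e. f(x) = f3(f2(f1(x)))."""
    h = x
    for i, (W, b) in enumerate(layers):
        z = W @ h + b
        # Hidden layers use ReLU; the last layer is left linear (an assumption)
        h = relu(z) if i < len(layers) - 1 else z
    return h

rng = np.random.default_rng(0)
sizes = [4, 5, 3, 1]  # input width 4, hidden widths 5 and 3, output width 1
layers = [
    (rng.normal(size=(n_out, n_in)), np.zeros(n_out))
    for n_in, n_out in zip(sizes[:-1], sizes[1:])
]

x = rng.normal(size=4)
y = forward(x, layers)  # depth = len(layers) = 3
```

Here the depth is the number of `(W, b)` pairs and the width of each layer is the number of rows in its `W`.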

3. XOR Problem: A Classic Example

3.1 Problem Statement

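The XOR (exclusive or) function operates on two binary inputs x1 and x2, producing the output:

y = 1 if exactly one of x1 and x2 is 1; y = 0 otherwise.

That is, (0, 0) → 0, (0, 1) → 1, (1, 0) → 1, and (1, 1) → 0.

The XOR function is not linearly separable: no single straight line in the (x1, x2) plane
separates the inputs with output 1 from those with output 0, so a linear model cannot represent it.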

3.2 Learning XOR

A simple feedforward neural network is proposed, consisting of two layers: a hidden layer and an
output layer. This allows the network to represent the XOR function by transforming the input space
in a way that makes it linearly separable.

●​ Input layer: Takes two binary values (like [0, 1], [1, 0], etc.).
● Hidden layer: Contains two units (neurons) that process the input using a non-linear
activation function.
●​ Output layer: The result from the hidden layer is passed to a final output layer, which
predicts the value of the XOR function for that specific input.
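
As a concrete illustration of the network described above, the sketch below uses one well-known set of hand-picked weights (the textbook solution with a ReLU hidden layer and a linear output). The specific values of `W`, `c`, `w`, and `b` are not given in these notes and are assumed here; a trained network would generally find different values.

```python
import numpy as np

# All four possible binary inputs, one per row
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked parameters (assumed for illustration, not learned)
W = np.array([[1, 1], [1, 1]])  # input -> hidden weights
c = np.array([0, -1])           # hidden biases
w = np.array([1, -2])           # hidden -> output weights
b = 0                           # output bias

def xor_net(X):
    # Hidden layer: ReLU makes the representation linearly separable
    H = np.maximum(0, X @ W + c)
    # Output layer: a plain linear combination of the hidden units
    return H @ w + b

outputs = xor_net(X)
print(outputs)  # the four outputs are 0, 1, 1, 0
```

The hidden layer maps both (0, 1) and (1, 0) to the same point (1, 0), which is what allows a linear output layer to separate the classes.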

What is regularization? How does regularization help in reducing overfitting?

Regularization is a technique in machine learning used to prevent overfitting by adding a penalty on
the model's complexity. Overfitting occurs when a model learns the noise and unnecessary details
in the training data, making it perform poorly on unseen data. Regularization helps the model
generalize better by discouraging it from being overly complex, so that it captures only the
significant patterns.

How Regularization Helps Reduce Overfitting

When training a machine learning model, we aim to minimize a loss function that measures how
well the model fits the data. Regularization modifies this loss function by adding a penalty term.
This penalty increases as the model's parameters (weights) grow larger or become more complex.
By penalizing large or unnecessary parameters, regularization forces the model to simplify, thereby
reducing overfitting.

The modified loss function looks like this:

Loss = Original Loss + Regularization Term

● Original Loss: Measures the model's error on the training data.
● Regularization Term: Adds a penalty for large weights, discouraging over-complex
solutions.

Types of Regularization

1. L1 Regularization (Lasso)
○ Adds the sum of the absolute values of the weights to the loss function:
Loss = Original Loss + λ * Σ|w|
○ Encourages sparsity by driving some weights to exactly 0, effectively
eliminating irrelevant features.
○​ Useful for feature selection, as it selects only the most important features for the
model.
2. L2 Regularization (Ridge)
○ Adds the sum of the squared values of the weights to the loss function:
Loss = Original Loss + λ * Σ(w²)
○ Shrinks all weights toward zero without making them exactly zero. This ensures that
the model does not rely too heavily on any single feature.
○​ Results in a smoother and more stable model, especially when all features
contribute some useful information.
3.​ Elastic Net
○​ Combines L1 and L2 regularization to get the benefits of both.
○​ Adds both absolute and squared weight penalties to the loss function:​
Loss = Original Loss + λ1 * Σ|w| + λ2 * Σ(w²)
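
The three penalty formulas above translate directly into code. The following is a minimal sketch; the function names, the example weight vector, the `original_loss` value, and the λ values are all made up for illustration.

```python
import numpy as np

def l1_penalty(w, lam):
    # λ * Σ|w|
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    # λ * Σ(w²)
    return lam * np.sum(w ** 2)

def elastic_net_penalty(w, lam1, lam2):
    # λ1 * Σ|w| + λ2 * Σ(w²)
    return l1_penalty(w, lam1) + l2_penalty(w, lam2)

def regularized_loss(original_loss, w, lam1=0.0, lam2=0.0):
    # Loss = Original Loss + Regularization Term
    return original_loss + elastic_net_penalty(w, lam1, lam2)

w = np.array([0.5, -2.0, 0.0, 1.5])  # hypothetical weights
original_loss = 1.0                  # hypothetical data-fit loss

loss_l1 = regularized_loss(original_loss, w, lam1=0.1)
loss_l2 = regularized_loss(original_loss, w, lam2=0.1)
loss_en = regularized_loss(original_loss, w, lam1=0.1, lam2=0.1)
```

Note that the zero weight contributes nothing to either penalty, while the largest weight (-2.0) dominates the L2 term because it is squared.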

Key Benefits of Regularization

1. Prevents Overfitting
○ Regularization simplifies the model, ensuring it does not memorize noise in the
training data.
○ The model focuses on general patterns rather than overfitting to specific training
examples.
2.​ Improves Generalization
○​ By controlling complexity, regularization enables the model to perform better on
unseen data.
3.​ Handles Multicollinearity
○​ L2 regularization is particularly effective at handling correlated features, as it
distributes weights more evenly.
4.​ Automatic Feature Selection
○ L1 regularization (Lasso) automatically removes irrelevant features by setting their
weights to 0.

Briefly explain the gradient descent algorithm.

Gradient Descent is an optimization algorithm used in machine learning to minimize the loss
function of a model by iteratively adjusting its parameters (weights and biases). The goal is to find
the values of the parameters that result in the smallest possible loss.

How Gradient Descent Works

1.​ Initialization:
○​ Start with random values for the model's parameters.
2.​ Compute the Loss:
○​ Use the current parameters to calculate the loss (how far the model's predictions are
from the actual values).
3.​ Calculate the Gradient:
○​ Compute the gradient of the loss function with respect to the parameters.
○​ The gradient indicates the direction and magnitude of the steepest increase in loss.
4.​ Update the Parameters:
○​ Move the parameters in the opposite direction of the gradient to reduce the loss.
○​ The update rule:​
θ = θ - α * (∂Loss/∂θ)
■​ θ: Model parameters
■​ α: Learning rate (controls the step size of the update)
5.​ Repeat:
○​ Iterate through steps 2–4 until the loss converges to a minimum or a stopping
condition is met.
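
The update rule θ = θ - α * (∂Loss/∂θ) can be tried on a toy problem. The sketch below assumes a one-parameter loss, Loss(θ) = (θ - 3)², chosen only because its minimum (θ = 3) is known in advance; the function names, starting point, learning rate, and step count are illustrative, and a fixed starting value replaces random initialization for reproducibility.

```python
def loss(theta):
    # Toy loss with a known minimum at theta = 3 (an assumption for illustration)
    return (theta - 3.0) ** 2

def grad(theta):
    # Derivative of the toy loss with respect to theta
    return 2.0 * (theta - 3.0)

def gradient_descent(grad, theta0, alpha=0.1, n_steps=100):
    theta = theta0
    for _ in range(n_steps):
        # Step against the gradient: theta = theta - alpha * dLoss/dtheta
        theta = theta - alpha * grad(theta)
    return theta

theta_min = gradient_descent(grad, theta0=0.0)
```

With α = 0.1 each step shrinks the distance to the minimum by a constant factor; a much larger α would overshoot, and a much smaller one would need many more iterations.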

Types of Gradient Descent

1. Batch Gradient Descent
○ How it works:
In batch gradient descent, the algorithm computes the gradient of the loss function
using the entire training dataset in every iteration.
○​ Advantages:
■​ Provides a smooth and stable convergence because the gradient is averaged
over all data points.
○​ Disadvantages:
■​ Computationally expensive for large datasets because it requires calculating
the gradient for the whole dataset before updating parameters.
○​ Use case:​
Preferred when the dataset is small and can fit into memory.
2.​ Stochastic Gradient Descent (SGD)
○​ How it works:​
Instead of using the entire dataset, SGD computes the gradient and updates the
model parameters using a single data point at each iteration.
○​ Advantages:
■ Each update is much cheaper because it uses only one sample, making it suitable
for large datasets.
○​ Disadvantages:
■​ Updates can be noisy, leading to fluctuations in the loss function.
○​ Use case:​
Often used for very large datasets or real-time learning applications where data
arrives sequentially.

3. Mini-Batch Gradient Descent
○ How it works:
Mini-batch gradient descent splits the dataset into small random batches of fixed
size. It computes the gradient for each batch and updates the parameters.
○​ Advantages:
■ Strikes a balance between the stable convergence of batch gradient
descent and the cheap, frequent updates of SGD.
○​ Disadvantages:
■​ Requires tuning the batch size. Too small → noisy updates; too large → slow
computations.
○​ Use case:​
Widely used in practice, especially for training deep learning models where the
dataset is large.
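
The three variants differ only in how many samples feed each update, so a single routine with a `batch_size` argument can cover all of them. The sketch below assumes a linear model with mean squared error and synthetic noise-free data (y = 2x + 1); the function name `train_linear`, the hyperparameters, and the data are hypothetical, and the parameters start at zero rather than random values for reproducibility.

```python
import numpy as np

def train_linear(X, y, batch_size, alpha=0.1, epochs=300, seed=0):
    """Fit y ≈ X @ w + b with (mini-batch) gradient descent on mean squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the data every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = X[idx] @ w + b - y[idx]
            # Gradients of the batch-averaged squared error
            grad_w = 2.0 * X[idx].T @ err / len(idx)
            grad_b = 2.0 * err.mean()
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b

# Synthetic data (an assumption): y = 2x + 1 with no noise
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(64, 1))
y = 2.0 * X[:, 0] + 1.0

w_batch, b_batch = train_linear(X, y, batch_size=len(X))  # batch gradient descent
w_sgd, b_sgd = train_linear(X, y, batch_size=1)           # stochastic gradient descent
w_mini, b_mini = train_linear(X, y, batch_size=16)        # mini-batch gradient descent
```

For the same number of epochs, batch gradient descent makes one update per epoch, SGD makes 64, and the mini-batch run makes 4, which is the trade-off between update cost and update frequency described above.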

Discuss the working of Backpropagation.

Backpropagation (short for backward propagation of errors) is a widely used algorithm for training
neural networks. Its goal is to adjust the weights of the network to minimize the error in predictions.
It achieves this by propagating the error backward through the network, layer by layer, to compute
the gradient of the loss with respect to every weight; an optimizer such as gradient descent then
uses those gradients to update the weights.

Steps in Backpropagation

1.​ Initialization:
○​ Initialize all the weights and biases in the network randomly or using a specific
method (e.g., Xavier initialization).

2. Forward Pass:
○​ Input data is passed through the network layer by layer to calculate the predicted
output.
○​ Each layer applies its weights, biases, and activation function to compute its output.
○​ The final layer produces the network's prediction.

3. Compute the Loss:
○​ Compare the predicted output with the actual target value using a loss function
(e.g., Mean Squared Error, Cross-Entropy).
○​ The loss function quantifies the error in the network's prediction.

4. Backward Pass (Propagating Errors Backward):
This is the key step of backpropagation and is divided into two main parts:
○ Step 4.1: Compute Gradients of the Output Layer:
■ The algorithm calculates how the error changes with respect to the weights
and biases of the output layer.
■ It uses the derivative of the activation function and the chain rule of calculus to
propagate the error backward.
■ For example, the gradient of the loss with respect to the weights in the output
layer is computed as:
∂L/∂w = δ * z
Here:
■ δ: Error term of the output layer (depends on the loss function and activation).
■ z: Input to the output layer from the previous layer.
○​ Step 4.2: Backpropagate to Hidden Layers:
■​ For each hidden layer, the error is propagated backward using the chain rule.
■​ The gradient of the loss with respect to the weights and biases of each hidden
layer is calculated in a similar way as the output layer, considering the gradient
flow from the next layer.
■​ This process is repeated until the gradients for all layers are computed.

5. Update Weights and Biases:
○ Once the gradients are computed, the weights and biases are updated using an
optimization algorithm like Gradient Descent:
w = w - η * (∂L/∂w)
Here:
■ η: Learning rate (controls the step size of updates).
■ ∂L/∂w: Gradient of the loss with respect to the weight.
○ This step reduces the error in the subsequent forward passes.

6. Repeat for Multiple Epochs:
○ The above steps (forward pass → compute loss → backward pass → weight update)
are repeated for multiple epochs (complete passes through the dataset).
○​ Over time, the network learns to minimize the loss and make better predictions.
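
The steps above can be sketched for a tiny network with one hidden layer. Everything specific here is an assumption made for illustration: the tanh hidden activation, the linear output, the squared-error loss L = ½(ŷ - y)², the layer sizes, the single training example, and the function names.

```python
import numpy as np

def forward(params, x):
    W1, b1, W2, b2 = params
    h = np.tanh(W1 @ x + b1)  # hidden layer output
    y_hat = W2 @ h + b2       # linear output layer
    return h, y_hat

def loss_fn(params, x, y):
    _, y_hat = forward(params, x)
    return 0.5 * np.sum((y_hat - y) ** 2)

def backward(params, x, y):
    W1, b1, W2, b2 = params
    h, y_hat = forward(params, x)
    # Step 4.1: output-layer error term and gradients (dL/dW2 = delta * layer input)
    delta2 = y_hat - y
    dW2 = np.outer(delta2, h)
    db2 = delta2
    # Step 4.2: propagate the error back through W2 and the tanh derivative (1 - h^2)
    delta1 = (W2.T @ delta2) * (1.0 - h ** 2)
    dW1 = np.outer(delta1, x)
    db1 = delta1
    return [dW1, db1, dW2, db2]

rng = np.random.default_rng(0)
params = [rng.normal(size=(3, 2)), np.zeros(3), rng.normal(size=(1, 3)), np.zeros(1)]
x, y = np.array([0.5, -1.0]), np.array([0.25])  # one hypothetical training example

eta = 0.01  # learning rate
loss_before = loss_fn(params, x, y)
grads = backward(params, x, y)
# Step 5: w = w - eta * dL/dw for every parameter
new_params = [p - eta * g for p, g in zip(params, grads)]
loss_after = loss_fn(new_params, x, y)
```

In practice the forward pass, backward pass, and update are repeated over many examples and epochs; frameworks with automatic differentiation compute the same gradients without hand-written derivative code.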
