Module 1
4. Activation Function: After the neuron performs its calculation, an activation function
decides whether the signal should pass on to the next layer. This helps the network
make complex decisions, like distinguishing between objects or making predictions.
5. Output Layer: The final layer produces the result, like identifying a number in an
image or classifying an email as spam or not.
6. Learning: During training, the network adjusts its weights and biases to minimize
errors, using a process called backpropagation. It compares its predictions to the
actual outcomes, calculates the error, and updates the weights to improve.
Forward Pass
The input data passes through the network to generate predictions.
Loss Function
Measures the performance of a classification model by comparing
predicted probabilities to actual class labels. It penalizes incorrect
predictions more as they diverge from the true label.
Backpropagation
The model adjusts its weights by minimizing the loss through
gradient descent.
Optimizer:
An adaptive learning rate optimizer that updates the model weights based on
the computed gradients during backpropagation. It is widely used due to its
efficiency and convergence properties.
Accuracy:
The percentage of correct predictions out of total predictions. It is calculated by
comparing the predicted class labels to the true labels.
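A minimal sketch of how these pieces fit together in one training step (PyTorch and the layer sizes here are illustrative assumptions, not prescribed by the notes):

import torch
import torch.nn as nn

# Hypothetical model: 784 inputs -> 10 classes
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()                   # loss function
optimizer = torch.optim.Adam(model.parameters())  # adaptive-learning-rate optimizer

x = torch.randn(32, 784)                # a dummy batch of 32 inputs
y = torch.randint(0, 10, (32,))         # dummy true class labels

logits = model(x)                       # forward pass: inputs -> predictions
loss = loss_fn(logits, y)               # loss: compare predictions to true labels

optimizer.zero_grad()
loss.backward()                         # backpropagation: compute gradients
optimizer.step()                        # optimizer: update weights and biases

accuracy = (logits.argmax(dim=1) == y).float().mean()  # correct / total predictions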
Bias:
Bias shifts the activation function, allowing the model to fit data even if the
optimal solution doesn't pass through the origin.
Imagine a simple linear model:
Without bias (b=0): The line must pass through the origin.
With bias (b≠0): The line can shift up or down to better fit the data.
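A tiny numeric sketch of this idea (the values are hypothetical, using NumPy):

import numpy as np

x = np.array([0.0, 1.0, 2.0])
w = 2.0

# Without bias (b = 0): the line is forced through the origin
y_no_bias = w * x              # [0., 2., 4.]

# With bias (b = 3): the same line shifted up, so it can fit data offset from the origin
b = 3.0
y_with_bias = w * x + b        # [3., 5., 7.]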
FORWARD PASS:
Inputs and weights
Weighted sums
Activation
Prediction
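A single-neuron sketch of these four steps (the numbers and the sigmoid activation are arbitrary choices for illustration):

import numpy as np

inputs = np.array([0.5, -1.2, 3.0])     # inputs
weights = np.array([0.4, 0.7, -0.2])    # weights
bias = 0.1

z = np.dot(inputs, weights) + bias      # weighted sum
prediction = 1 / (1 + np.exp(-z))       # activation (sigmoid) -> prediction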
LOSS FUNCTION:
A loss function is a mathematical function that measures how well a neural
network's predictions match the true target values. It calculates the error
during training and guides the model to improve by adjusting its weights.
For binary classification, the typical choice is Binary Cross-Entropy:
L = −[y·log(ŷ) + (1 − y)·log(1 − ŷ)]
For multi-class classification, the raw scores are first turned into probabilities with the softmax function:
ŷj = e^(zj) / ∑k e^(zk)
where zj is the raw score (logit) from the final layer for class j, and C is the total number of classes (the sum runs over k = 1, ..., C).
Loss Function:
The loss function typically used for multi-class classification is Categorical Cross-Entropy:
L = −∑i ∑j yij · log(ŷij)
where:
yij is the true label for class j for sample i (usually 1 for the correct class and 0 for others).
ŷij is the predicted probability for class j for sample i.
C is the number of classes (the inner sum runs over j = 1, ..., C).
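A small numeric sketch of this computation (the probabilities below are made-up values, assuming one-hot labels):

import numpy as np

# One-hot true labels for 2 samples and 3 classes (sample 0 -> class 0, sample 1 -> class 2)
y_true = np.array([[1, 0, 0],
                   [0, 0, 1]])
# Predicted probabilities from a softmax output layer (illustrative values)
y_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.3, 0.6]])

# Categorical cross-entropy, averaged over the samples
loss = -np.mean(np.sum(y_true * np.log(y_pred), axis=1))
print(loss)   # about 0.434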
Example of a Simple Multi-Class Neural Architecture:
1. Input Layer:
o Size: 784 (e.g., for a 28×28 pixel image).
2. Hidden Layer 1:
o Size: 128 neurons with ReLU activation.
3. Hidden Layer 2:
o Size: 64 neurons with ReLU activation.
4. Output Layer:
o Size: 10 neurons (for a 10-class problem, like digit classification).
o Activation: Softmax.
Z1, Z2: the weighted sums at each layer before applying the activation function.
A1, A2: the outputs after applying the activation function to Z1 and Z2.
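A sketch of this architecture in PyTorch (the framework is an assumption; in practice the softmax is often folded into the loss function, but it is written out here to mirror the description):

import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),          # hidden layer 1: Z1 -> A1
    nn.Linear(128, 64),  nn.ReLU(),          # hidden layer 2: Z2 -> A2
    nn.Linear(64, 10),   nn.Softmax(dim=1),  # output layer: 10 class probabilities
)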
How does it work?
Forward Pass: Data is passed through the model to compute the prediction.
Backpropagation: Instead of updating weights (as in training), the gradient
is computed with respect to the input features.
1. Forward Pass
The input data is passed through the neural network, producing an output
(prediction).
Example: In image classification, the input is an image, and the output is
a class score.
2. Gradient Computation
Gradients of the target output are calculated with respect to the input
features.
This shows how much each feature influences the output.
3. Saliency Map
The absolute values of the gradients are used to create a saliency map.
High values indicate important features.
Mathematical Representation:
Let f(x) be the model's output for an input x, and xi be the i-th feature.
The gradient ∂f(x)/∂xi represents the importance of feature xi. Features with
higher absolute gradients are more significant.
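A minimal sketch of this gradient-with-respect-to-the-input computation in PyTorch (the model and input shape are illustrative assumptions):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
model.eval()

x = torch.rand(1, 784, requires_grad=True)   # track gradients on the INPUT features

scores = model(x)                            # forward pass
target = scores[0, scores.argmax()]          # score of the predicted class

target.backward()                            # gradients w.r.t. the input, not the weights
saliency = x.grad.abs()                      # |∂f(x)/∂xi|: high values = important features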
Methods:
Applications:
1. Image Classification
3. Tabular Data
Autoencoders:
Autoencoders are a special type of unsupervised neural network (no
labels needed).
The main applications of autoencoders are to capture the key aspects of the
provided data in a compressed version of the input (the network learns to
copy its input to its output), to generate realistic synthetic data, and to
flag anomalies.
They are widely used for unsupervised learning tasks such as
dimensionality reduction, data denoising, and anomaly detection.
Architecture:
Encoder:
compresses the input data to remove any form of noise and generates a
latent space/bottleneck.
Therefore, the dimensionality of the encoder's output is smaller than that of
the input, and it can be adjusted as a hyperparameter to decide how lossy the
compression should be.
The encoder function can be represented as z = f_encoder(x).
Decoder:
Making use of only the compressed data representation from the latent space,
it tries to reconstruct the original input data with as much fidelity as
possible (the architecture of this neural network is therefore generally a
mirror image of the encoder).
The “goodness” of the prediction can then be measured by calculating the
reconstruction error between the input and output data using a loss
function.
The reconstruction loss can be written as L(x, x̂) = ∥x − x̂∥², where:
x: Original input.
x̂ = g_decoder(f_encoder(x)): Reconstructed input, which is the output of the
decoder applied to the encoder's representation of x.
∥⋅∥²: Squared Euclidean distance (also called Mean Squared Error).
HYPERPARAMETER:
A hyperparameter is a variable set before the training process begins that
determines the overall structure and learning behavior of the model.
Hyperparameters are not learned from the data but are manually chosen or tuned
through experimentation (e.g., the learning rate, the number of layers, or the
size of the latent space).
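As a concrete (hypothetical) illustration, such choices might be collected in one place before training starts:

# Hypothetical hyperparameter choices, fixed before training begins
hyperparams = {
    "learning_rate": 1e-3,   # step size used by the optimizer
    "batch_size": 32,        # samples per gradient update
    "epochs": 20,            # passes over the training set
    "latent_dim": 16,        # size of the autoencoder bottleneck
}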
Working principle:
1. Encoder:
o Compresses input data into a lower-dimensional representation.
o Example: A 28×28 pixel image is reduced to a vector of 16 values.
2. Latent Space:
o The compressed representation containing essential features of the
input data.
3. Decoder:
o Reconstructs the input data from the compressed representation.
o Example: Converts the 16-dimensional vector back into a 28×28
pixel image.
4. Loss Function:
o Measures the difference between the input and the reconstructed
output, guiding the network to minimize reconstruction error.
1. Input Image
o Example input: a 28×28 grayscale image, flattened into a 784-dimensional vector.
2. Encoder Compression
The bottleneck layer (16 neurons) compresses the input into a smaller vector of
16 values, which captures the most important features of the image.
3. Latent Vector
After the input passes through the encoder, the output at the bottleneck
layer is a vector of size 16 (a vector of 16 values representing a compressed
version of the original 784-pixel input).
This vector represents the compressed version of the original image,
retaining the most critical information.
4. Decoder Reconstruction
The decoder is another neural network that takes the latent vector and
reconstructs it back to the original image size (28×28 pixels).
The decoder gradually expands the dimensionality:
o Latent layer (input): 16 neurons.
o Hidden layer 1: 64 neurons.
o Hidden layer 2: 128 neurons.
o Output layer: 784 neurons (reshaped to 28×28).
5. Reconstructed Image
o The decoder's output: a 28×28 image that approximates the original input (a code sketch of this pipeline follows below).
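A sketch of this 784 -> 16 -> 784 autoencoder in PyTorch (the framework and the MSE training objective are assumptions consistent with the description above):

import torch
import torch.nn as nn

encoder = nn.Sequential(                 # 784 -> 128 -> 64 -> 16 (bottleneck)
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 64),  nn.ReLU(),
    nn.Linear(64, 16),
)
decoder = nn.Sequential(                 # 16 -> 64 -> 128 -> 784 (mirror of the encoder)
    nn.Linear(16, 64),   nn.ReLU(),
    nn.Linear(64, 128),  nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),   # pixel values back in [0, 1]
)

x = torch.rand(8, 784)                   # a dummy batch of flattened 28x28 images
z = encoder(x)                           # latent vector of 16 values per image
x_hat = decoder(z)                       # reconstructed 784-pixel image

loss = nn.functional.mse_loss(x_hat, x)  # reconstruction error ||x - x_hat||^2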
Types of Autoencoders:
1. Vanilla Autoencoders:
o Basic architecture with fully connected layers for both encoder and
decoder.
2. Sparse Autoencoders:
o Introduces a sparsity constraint on the latent representation z to
learn meaningful features.
3. Denoising Autoencoders:
o Trained to reconstruct the original input from a corrupted version,
making them robust to noise.
4. Variational Autoencoders (VAEs):
o A probabilistic variant of autoencoders where the latent space is
modeled as a distribution rather than a fixed point.
5. Convolutional Autoencoders:
o Use convolutional layers for encoding and decoding, making them
suitable for image data.
6. Contractive Autoencoders:
o Add a penalty term to the loss function to make the latent space
robust to small changes in input.
Applications
1. Dimensionality Reduction
2. Data Denoising
3. Anomaly Detection
4. Image Processing
5. Recommendation Systems
Sigmoid:
Formula: f(x) = 1 / (1 + e^(−x))
Range: (0, 1)
Use: Often used in binary classification tasks (e.g., output layer for
classification). It squashes the input to a range between 0 and 1, which can
be interpreted as probabilities.
Drawback: Sigmoid suffers from vanishing gradients for very large or
very small inputs, making it harder for the model to train effectively.
ReLU:
Formula: f(x) = max(0, x)
Range: [0, ∞)
Use: ReLU is the most widely used activation function in modern neural
networks, especially for hidden layers. It’s computationally efficient and
helps mitigate the vanishing gradient problem.
Drawback: ReLU can suffer from the "dying ReLU" problem, where
neurons stop updating during training if their output is always zero.
Leaky ReLU:
Formula: f(x) = x if x > 0, otherwise αx (with a small constant α, e.g., 0.01)
Range: (−∞, ∞)
Use: A variant of ReLU that allows a small, non-zero gradient for negative
inputs, helping to avoid the dying ReLU problem.
Softmax:
Formula: f(xi) = e^(xi) / ∑j e^(xj)
Range: (0, 1) and outputs a probability distribution (sums to 1).
Use: Softmax is typically used in the output layer of a network for multi-
class classification, where it turns raw logits into probabilities.
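A small NumPy sketch of these activation functions (the input values are arbitrary examples):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))           # squashes inputs to (0, 1)

def relu(x):
    return np.maximum(0, x)               # zero for negatives, identity otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope for negative inputs

def softmax(x):
    e = np.exp(x - np.max(x))             # shift for numerical stability
    return e / e.sum()                    # probabilities that sum to 1

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z), relu(z), leaky_relu(z), softmax(z))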
Deep encoders:
These refer to the encoder components of deep neural networks that consist of
multiple layers (typically more than one).
These layers are responsible for transforming the input data into a compressed
or abstract representation, known as a latent space or bottleneck.
1. Input Layer: Takes the raw input data (e.g., an image or text).
2. Hidden Layers: A series of hidden layers (fully connected, convolutional,
etc.) with non-linear activation functions (e.g., ReLU) that progressively
extract and compress features.
3. Latent Space (Bottleneck): A small, compressed representation of the
input data, typically a vector of lower dimensionality.
4. Output: The encoder's output is typically used as the input to a decoder (in
the case of autoencoders) or passed on to subsequent layers in other types
of networks.
Applications:
1. Autoencoders
2. Convolutional Neural Networks (CNNs)
3. Natural Language Processing (NLP)
4. Representation Learning
Outlier Detection with Deep Encoders:
1. Data Preparation:
o Collect and preprocess the dataset. Make sure the data only contains
normal (non-anomalous) instances during training.
2. Train the Autoencoder:
o Train the autoencoder on the normal data. Use a deep encoder
architecture that learns meaningful, compact representations of the
input data. The goal is for the autoencoder to reconstruct normal
instances with minimal error.
3. Evaluate the Reconstruction Error:
o For each data point in a test set (which may include both normal and
anomalous data), compute the reconstruction error, i.e., the
difference between the original input and the reconstructed output.
4. Set a Threshold:
o Determine a threshold for the reconstruction error. Data points with
errors above this threshold are flagged as outliers.
5. Outlier Detection:
o Use the trained model to reconstruct new data points. If their
reconstruction errors are large, they are identified as potential
outliers (a code sketch of steps 3-5 follows this list).
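A sketch of steps 3-5 in PyTorch/NumPy (the toy model, the random data, and the 99th-percentile threshold are illustrative assumptions):

import numpy as np
import torch
import torch.nn as nn

# Stand-in for an autoencoder that has already been trained on normal data
model = nn.Sequential(nn.Linear(784, 16), nn.ReLU(), nn.Linear(16, 784))

def reconstruction_errors(model, data):
    # Mean squared reconstruction error per sample
    with torch.no_grad():
        x = torch.as_tensor(data, dtype=torch.float32)
        x_hat = model(x)
        return ((x - x_hat) ** 2).mean(dim=1).numpy()

normal_data = np.random.rand(100, 784)   # stand-in for held-out normal samples
new_data = np.random.rand(10, 784)       # stand-in for incoming data to score

# 3-4. Compute errors on normal data and set a threshold (e.g., the 99th percentile)
threshold = np.percentile(reconstruction_errors(model, normal_data), 99)

# 5. Flag new points whose reconstruction error exceeds the threshold as potential outliers
is_outlier = reconstruction_errors(model, new_data) > threshold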
Applications:
1. Fraud Detection
2. Anomaly Detection in Network Traffic
3. Industrial Equipment Monitoring
4. Medical Anomaly Detection
5. Image and Video Anomaly Detection