
Module 1

Deep learning is a subset of machine learning utilizing artificial neural networks (ANNs) to process complex data and achieve high accuracy in tasks like image recognition and natural language processing. Neural networks consist of interconnected layers that transform input data through weighted connections and activation functions, enabling automatic feature extraction and scalability with data. Applications include chatbots, self-driving cars, and autoencoders for unsupervised learning tasks such as dimensionality reduction and anomaly detection.

Uploaded by

jaibalaya524

DEEP LEARNING

What is deep learning?


 Deep learning is a subset of machine learning that is based on
artificial neural network architectures.
 An artificial neural network or ANN uses layers of interconnected nodes
called neurons that work together to process and learn from the input
data.
Why do we use deep learning?
We use Deep Learning (DL) because it can solve complex problems that
traditional programming and basic machine learning struggle with.
Simple Reasons for Using DL:
1. Handles Complex Data
o DL can process images, audio, video, and text better than
traditional methods.
o Example: Recognizing faces, translating languages, detecting
objects in images.
2. Automatic Feature Extraction
o DL automatically finds patterns in data without needing humans
to select features.
o Example: In image recognition, DL learns to detect edges, shapes,
and objects by itself.
3. High Accuracy
o DL models often achieve higher accuracy than traditional methods
on large datasets and complex problems.
o Example: Self-driving cars use DL for accurate obstacle detection.
4. Scalable with Data
o DL performance generally keeps improving as more training data is
provided, making it well suited to big data problems.
5. Real-World Applications
o DL powers technologies like chatbots, virtual assistants (like Siri),
and recommendation systems (Netflix, YouTube).
In short, DL helps machines "think" and "see" more like humans, making it
powerful for tasks that require intelligence.
What is a neural network?
 A neural network is a type of algorithm inspired by the way the human
brain works. It is a key part of deep learning, used to recognize patterns,
classify data, and make predictions.
Comparison of HNN and ANN:
HNN:
A human neural network (HNN) refers to the network of neurons in the brain and
nervous system that allows humans to think, feel, and control the body.
ANN:
Artificial Neural Networks (ANNs) are algorithms inspired by the structure and
functioning of the human brain. They are a core part of deep learning and are
used to recognize patterns, classify data, and make predictions.
[Figure: human neural network]

[Figure: artificial neural network]

[Figure: artificial neuron network]

How does a neural network work?


A neural network is like a system of "neurons" (units) that work together to solve problems.
It mimics how the human brain processes information.
Here’s a simple breakdown:
1. Input Layer: This is where data enters the network. For example, if you're recognizing
numbers, the image of a number would be broken into pixels, and each pixel would
be an input.
2. Hidden Layers: These layers process the data. Each neuron in these layers performs a
mathematical operation on the input it receives, which is then passed to the next
layer. The more layers there are, the deeper the network, which is why it’s called
“deep learning.”
3. Weights and Biases: Each connection between neurons has a weight, which controls
how much influence a neuron has on the next one. A bias is a constant added to each
neuron's weighted sum, letting the neuron shift its output independently of its inputs.

4. Activation Function: After the neuron performs its calculation, an activation function
decides whether, and how strongly, the signal is passed on to the next layer. Because it is
non-linear, it lets the network make complex decisions, like distinguishing between objects
or making predictions.

5. Output Layer: The final layer produces the result, like identifying a number in an
image or classifying an email as spam or not.

6. Learning: During training, the network compares its predictions to the actual
outcomes and calculates the error. Backpropagation then computes how much each weight
and bias contributed to that error, and the network updates them to reduce it.
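The breakdown above can be sketched as a forward pass through a very small network. This is a minimal illustration, not a definitive implementation: the layer sizes, weights, and biases are made-up values, and `dense` is a hypothetical helper name.

```python
import math

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    """Pass positive values through, clamp negatives to zero."""
    return max(0.0, z)

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(sum_i w_i * x_i + b) per neuron."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

# Hypothetical network: 3 inputs -> 2 hidden neurons (ReLU) -> 1 output (sigmoid).
hidden_weights = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
hidden_biases = [0.0, 0.1]
output_weights = [[1.0, -1.0]]
output_biases = [0.2]

x = [1.0, 2.0, 3.0]                                   # input layer
hidden = dense(x, hidden_weights, hidden_biases, relu)  # hidden layer
prediction = dense(hidden, output_weights, output_biases, sigmoid)[0]  # output layer
print(hidden, prediction)
```

Each call to `dense` combines steps 2 to 4 of the list: a weighted sum, a bias, and an activation function.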

NEURAL NETWORK FOR BINARY CLASSIFICATION:


 A neural network for binary classification is designed to distinguish
between two classes, such as spam vs. not spam, cat vs. dog, or diseased
vs. healthy. This type of model outputs a probability (between 0 and 1)
indicating that an input belongs to the positive class.
 Binary classification is the task of predicting one of two possible classes.
The neural network's job is to map input data to either:
 Class 0 – Negative class (e.g., Not Spam).
 Class 1 – Positive class (e.g., Spam).

Structure of neural network:


Input Layer:
 Each neuron represents one feature (e.g., pixel intensity, temperature).
Hidden Layers:
One or more layers with neurons that apply non-linear transformations using
activation functions (e.g., ReLU).
Output Layer:
 One neuron with a sigmoid activation.
 Outputs a probability between 0 and 1.
Diagram:

 Forward Pass
The input data passes through the network to generate predictions.
 Loss Function
Measures the performance of a classification model by comparing
predicted probabilities to actual class labels. It penalizes incorrect
predictions more as they diverge from the true label.
 Backpropagation
The model adjusts its weights by minimizing the loss through
gradient descent.
Optimizer:
An adaptive learning rate optimizer (such as Adam) updates the model weights based on
the computed gradients during backpropagation. Such optimizers are widely used due to
their efficiency and convergence properties.
Accuracy:
The percentage of correct predictions out of total predictions. It is calculated by
comparing the predicted class labels to the true labels.
Bias:
Bias shifts the activation function, allowing the model to fit data even if the
optimal solution doesn't pass through the origin.
Imagine a simple linear model:

$y = wx + b$
 Without bias (b=0): The line must pass through the origin.
 With bias (b≠0): The line can shift up or down to better fit the data.

FORWARD PASS:
 Inputs and weights
 Weighted sums
 Activation
 Prediction
LOSS FUNCTION:
A loss function is a mathematical function that measures how well a neural
network's predictions match the true target values. It calculates the error
during training and guides the model to improve by adjusting its weights.
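For binary classification, binary cross-entropy is typically used:

$L = -\dfrac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$

where $y_i$ is the true label (0 or 1), $\hat{y}_i$ is the predicted probability, and N is
the number of samples.
 Measures the difference between two probability distributions.
 Best for tasks with two classes (e.g., spam vs. not spam).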
BACKPROPAGATION:
Adjusts weights by minimizing the loss using gradient descent.
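As a rough sketch of how the forward pass, loss function, and backpropagation fit together, the example below trains a single sigmoid neuron with plain gradient descent on binary cross-entropy. The toy data, learning rate, and number of steps are assumptions chosen only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy over a list of examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Made-up data: one feature, label 1 when the feature is large.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]

w, b = 0.0, 0.0          # weight and bias, both start at zero
learning_rate = 0.5      # assumed value

def predict_all(w, b):
    """Forward pass for every example."""
    return [sigmoid(w * x + b) for x in xs]

initial_loss = bce(ys, predict_all(w, b))
for _ in range(2000):
    preds = predict_all(w, b)
    # Backward pass: for sigmoid + cross-entropy, dLoss/dz = (p - y),
    # so the chain rule gives the two gradients below.
    grad_w = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    grad_b = sum(p - y for p, y in zip(preds, ys)) / len(xs)
    # Gradient descent update.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = bce(ys, predict_all(w, b))
labels = [1 if p >= 0.5 else 0 for p in predict_all(w, b)]
print(initial_loss, final_loss, labels)
```

Thresholding the predicted probability at 0.5 turns the output into a class label, which is what the accuracy metric compares against the true labels.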

Neural architecture for multiclass models:


 In deep learning, a neural architecture for a multi-class classification
model typically consists of multiple layers of neurons, where each layer
transforms its inputs and passes them through an activation function.
 The goal is to output the class probabilities for each possible class,
usually through a softmax activation in the final layer.
Architecture for Multi-Class Classification:
Input Layer:
 The input layer consists of neurons equal to the number of features in
your dataset. For example, if you're working with images, the input size
would be the number of pixels
(e.g., for 28x28 images, the input size is 784).
Hidden Layers:
 These layers consist of neurons that perform computations and extract
features from the input data. They usually have activation functions like ReLU
(Rectified Linear Unit) or Sigmoid.
 Feature extraction identifies the important features of the input data, which
allows for a more efficient model or classifier when it comes time to train and
accurately predict from the available data.
 The number of hidden layers and the number of neurons in each layer are
hyperparameters you can adjust. More hidden layers allow the model to
learn more complex patterns.
 Example:
o Hidden Layer 1: 128 neurons with ReLU activation.
o Hidden Layer 2: 64 neurons with ReLU activation.
Output Layer:
 The output layer has as many neurons as there are classes in the
classification problem.
o For a 3-class problem, this layer will have 3 neurons.
 Activation Function: The Softmax activation function is used in the output
layer. Softmax converts the raw output into probabilities, where each
output neuron represents the probability of each class, and the sum of all
probabilities is 1.
Formula for softmax for a given class j:

$\text{softmax}(Z_j) = \dfrac{e^{Z_j}}{\sum_{k=1}^{C} e^{Z_k}}$

where $Z_j$ is the raw score (logit) from the final layer for class j, and C is the
total number of classes.
Loss Function:
 The loss function typically used for multi-class classification is Categorical
Cross-Entropy:

$L = -\dfrac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} Y_{ij} \log(\hat{Y}_{ij})$

where:
 $Y_{ij}$ is the true label for class j for sample i (usually 1 for the correct class
and 0 for others).
 $\hat{Y}_{ij}$ is the predicted probability for class j for sample i.
 C is the number of classes.
 N is the number of samples.
Example of a Simple Multi-Class Neural Architecture:
1. Input Layer:
o Size: 784 (e.g., for a 28x28 pixel image).
2. Hidden Layer 1:
o Size: 128 neurons with ReLU activation.
3. Hidden Layer 2:
o Size: 64 neurons with ReLU activation.
4. Output Layer:
o Size: 10 neurons (for a 10-class problem, like digit classification).
o Activation: Softmax.
 Z1, Z2: Represent the weighted sums at each layer before applying the
activation function.
 A1, A2: Are the outputs after applying the activation function to Z.
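To make the output layer and loss concrete, here is a minimal sketch of softmax and categorical cross-entropy for a single sample. The logits and the one-hot label are made-up values for a 3-class problem, and subtracting the maximum logit is a common numerical-stability trick rather than something the notes prescribe.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_cross_entropy(one_hot, probs, eps=1e-12):
    """Loss for one sample: -sum_j y_j * log(p_j)."""
    return -sum(y * math.log(max(p, eps)) for y, p in zip(one_hot, probs))

logits = [2.0, 1.0, 0.1]   # hypothetical raw scores from the final layer
y_true = [1, 0, 0]         # one-hot label: the correct class is class 0

probs = softmax(logits)
loss = categorical_cross_entropy(y_true, probs)
predicted_class = probs.index(max(probs))
print(probs, loss, predicted_class)
```

Because the label is one-hot, only the predicted probability of the correct class contributes to the loss.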

Backpropagated Saliency for Feature Selection:


 Backpropagated saliency is a technique used to determine the importance
of input features in neural networks.
 This technique is valuable for interpretability (the ability to understand
and explain how a ML or DL model makes its predictions or decisions),
feature selection, and reducing model complexity.
 Saliency measures how much each part of an input (such as a pixel, word, or
feature) influences the model's final output.
 The process uses backpropagation to compute the gradient of the output
(e.g., class score) with respect to the input features.
 This gradient highlights the most influential features – i.e., small changes
in these features lead to significant changes in the model's prediction.
 Feature Selection is the process of choosing the most relevant input
features to improve model performance, reduce overfitting, and make
the model more interpretable.

How does it work?

Forward Pass: Data is passed through the model to compute the prediction.
Backpropagation: Instead of updating weights (as in training), the gradient
is computed with respect to the input features.

Saliency Map: The magnitude of these gradients forms a map showing
which features are most responsible for the model’s decision.

A gradient is a mathematical concept that represents the rate of change of a
function with respect to its inputs. In the context of machine learning and
neural networks, gradients play a crucial role in optimizing models and
understanding how changes in inputs or parameters affect the output.

1. Forward Pass

 The input data is passed through the neural network, producing an output
(prediction).
 Example: In image classification, the input is an image, and the output is
a class score.

Input ---> [Hidden Layers] ---> Output

2. Backward Pass (Gradient Computation)

 Gradients of the target output are calculated with respect to the input
features.
 This shows how much each feature influences the output.

Gradients (Saliency Map)

Input ---> [Hidden Layers] ---> Output

3. Saliency Map (Feature Importance)

 The absolute values of the gradients are used to create a saliency map.
 High values indicate important features.

Original Image (or Input) Saliency Map

[ Image ] [ Highlighted Features ]

Mathematical Representation:
 Let f(x) be the model's output for an input x, and $x_i$ be the i-th feature.
 The gradient $\partial f(x) / \partial x_i$ represents the importance of feature $x_i$.
Features with higher absolute gradients are more significant.
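A small sketch of the vanilla-gradient idea, assuming a hypothetical already-trained model f(x) = sigmoid(w · x + b) with made-up weights. For this particular model the gradient has the closed form f(x)(1 - f(x)) w_i, and a finite-difference check is included to confirm it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical trained model: f(x) = sigmoid(w . x + b)
weights = [2.0, -0.5, 0.0]
bias = 0.1

def model(x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def saliency(x):
    """|df/dx_i| per feature; for this model df/dx_i = f(x) * (1 - f(x)) * w_i."""
    f = model(x)
    return [abs(f * (1.0 - f) * w) for w in weights]

def numeric_saliency(x, h=1e-6):
    """Central finite-difference estimate of the same gradient magnitudes."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        grads.append(abs((model(up) - model(down)) / (2 * h)))
    return grads

x = [0.5, 1.0, -2.0]
scores = saliency(x)
# Feature indices ordered from most to least influential.
ranking = sorted(range(len(x)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

In a deep network the same gradient would come from backpropagation to the input rather than from a closed-form expression; a feature with a near-zero score, like the third one here, is a candidate for removal during feature selection.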

Methods:

 Vanilla Gradients: Basic backpropagation of gradients.


 SmoothGrad: Averages gradients over multiple noisy inputs to reduce
noise in the saliency map.
 Grad-CAM (for convolutional layers): Highlights regions of interest in
image classification tasks.
 Integrated Gradients: Accumulates gradients along the path from a
baseline input to the actual input.

Applications:

1. Image Classification

 Pixels contributing most to the classification are highlighted.


 Helps identify regions of interest.

2. Natural Language Processing (NLP)

 Important words or phrases influencing predictions are detected.

3. Tabular Data

 Feature importance can guide feature selection, improving model
performance and interpretability.

Autoencoders:
 Autoencoders are a special type of unsupervised neural network (no
labels needed).
 An autoencoder learns to copy its input to its output through a
compressed representation, which forces it to capture the key aspects of
the data. That representation can be used to compress the input,
generate realistic synthetic data, or flag anomalies.
 They are widely used for unsupervised learning tasks such as
dimensionality reduction, data denoising, and anomaly detection.

Architecture:

An autoencoder consists of two main components:

Encoder:

 Compresses the input data into a latent space (bottleneck), discarding noise
and redundant detail in the process.
 The encoder's output therefore has fewer dimensions than the input. Its size
can be adjusted as a hyperparameter to decide how lossy the compression
should be.
 The encoder function can be represented as $z = f_{\text{encoder}}(x)$.

Decoder:

 Using only the compressed data representation from the latent space, the
decoder tries to reconstruct the original input data with as much fidelity as
possible (the architecture of this neural network is, therefore, generally
a mirror image of the encoder).
 The “goodness” of the reconstruction can then be measured by calculating the
reconstruction error between the input and output data using a loss
function.

The goal of an autoencoder is to minimize the reconstruction error between the
original input $x$ and the reconstructed output $\hat{x}$. The loss function used is
typically:

$L(x, \hat{x}) = \lVert x - \hat{x} \rVert^2$

 $x$: Original input.
 $\hat{x} = g_{\text{decoder}}(f_{\text{encoder}}(x))$: Reconstructed input, which is the
output of the decoder applied to the encoder's representation of x.
 $\lVert \cdot \rVert^2$: Squared Euclidean distance (averaging it over the input
elements gives the Mean Squared Error).

In simpler terms, the autoencoder tries to minimize the squared difference
between the input $x$ and its reconstruction $\hat{x}$.
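The reconstruction loss can be illustrated with a deliberately tiny, hand-written encoder and decoder. Both functions below are hypothetical stand-ins with fixed behavior (a real autoencoder would learn its weights during training); the point is only to show how the squared difference between x and its reconstruction is computed.

```python
def encoder(x):
    """Hypothetical fixed encoder: 4 values -> 2 values (average of each pair)."""
    return [(x[0] + x[1]) / 2.0, (x[2] + x[3]) / 2.0]

def decoder(z):
    """Hypothetical fixed decoder: 2 values -> 4 values (repeat each value)."""
    return [z[0], z[0], z[1], z[1]]

def reconstruction_loss(x):
    """Squared Euclidean distance between x and decoder(encoder(x))."""
    x_hat = decoder(encoder(x))
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

# An input that fits the structure the autoencoder can represent: no error.
print(reconstruction_loss([1.0, 1.0, 3.0, 3.0]))  # 0.0
# An input that does not fit: information is lost in the bottleneck.
print(reconstruction_loss([1.0, 3.0, 2.0, 6.0]))  # 10.0
```

The second input is reconstructed as [2, 2, 4, 4], so the squared differences 1 + 1 + 4 + 4 add up to 10.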

HYPERPARAMETER:
A hyperparameter is a variable set before the training process begins that
determines the overall structure and learning behavior of the model.
Hyperparameters are not learned from the data but are manually chosen or tuned
through experimentation.

Key Characteristics of Hyperparameters

1. Set Before Training:


o Unlike model parameters (like weights and biases), hyperparameters
are specified before the training process and remain fixed during
training.
2. Model Performance:
o Hyperparameters significantly impact model accuracy, training
speed, and generalization.
3. Categories of Hyperparameters: Hyperparameters can be broadly
divided into two categories:
o Model Hyperparameters: Control the architecture and capacity of
the neural network.
 Number of layers.
 Number of neurons per layer.
 Activation functions.
o Training Hyperparameters: Govern how the model learns during
training.
 Learning rate (determines the step size during optimization).
 Batch size (number of samples processed before the model
updates its parameters).
 Number of epochs (number of complete passes through the
entire training dataset).
 Optimizer type (e.g., SGD, Adam), i.e., the algorithm used to
minimize the loss function.
 Dropout rate (fraction of neurons randomly disabled during
training to prevent overfitting).

Working principle:

1. Encoder:
o Compresses input data into a lower-dimensional representation.
o Example: A 28×28 pixel image is reduced to a vector of 16 values.
2. Latent Space:
o The compressed representation containing essential features of the
input data.
3. Decoder:
o Reconstructs the input data from the compressed representation.
o Example: Converts the 16-dimensional vector back into a 28×28
pixel image.
4. Loss Function:
o Measures the difference between the input and the reconstructed
output, guiding the network to minimize reconstruction error.

Step by step procedure:

1. Input Image

 Consider a grayscale image of size 28×28 pixels.


 Each pixel value represents the intensity of the pixel, ranging from 0
(black) to 255 (white).
 The input to the autoencoder is a 2D array of shape (28, 28) or flattened
into a vector of size 784 (28×28).

Example Input:

Original Image: 28×28 pixels (784 features)

2. Encoder Compression

 The encoder is a neural network that gradually reduces the dimensionality
of the input.
 This is achieved through layers with a gradually decreasing number of
neurons. For example:
o Input layer: 784 neurons (one for each pixel).
o Hidden layer 1: 128 neurons.
o Hidden layer 2: 64 neurons.
o Latent layer (bottleneck): 16 neurons.

The bottleneck layer (16 neurons) compresses the input into a smaller vector of
16 values, which captures the most important features of the image.

3. Latent Representation (Compressed Vector)

 After the input passes through the encoder, the output at the bottleneck
layer is a vector of 16 values.
 This vector is a compressed version of the original 784-pixel image,
retaining the most critical information.

4. Decoder Reconstruction

 The decoder is another neural network that takes the latent vector and
reconstructs it back to the original image size (28×28 pixels).
 The decoder gradually expands the dimensionality:
o Latent layer (input): 16 neurons.
o Hidden layer 1: 64 neurons.
o Hidden layer 2: 128 neurons.
o Output layer: 784 neurons (reshaped to 28×28).

5. Reconstructed Image

 The output of the decoder is a reconstructed image that closely resembles
the original input but may have some loss of detail due to compression.
 The quality of the reconstruction depends on the model's training and the
size of the latent vector.
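The layer sizes used in the procedure above (784, 128, 64, 16, 64, 128, 784) are enough to work out the compression ratio and the number of trainable parameters. The short calculation below assumes fully connected layers with one bias per neuron; the helper name is hypothetical.

```python
# Layer sizes from the step-by-step example: encoder, bottleneck, decoder.
layer_sizes = [784, 128, 64, 16, 64, 128, 784]

def dense_parameter_count(sizes):
    """Weights (n_in * n_out) plus biases (n_out) for each consecutive pair of layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

bottleneck = min(layer_sizes)
compression_ratio = layer_sizes[0] / bottleneck          # 784 / 16
encoder_params = dense_parameter_count(layer_sizes[:4])  # 784 -> 128 -> 64 -> 16
decoder_params = dense_parameter_count(layer_sizes[3:])  # 16 -> 64 -> 128 -> 784
total_params = dense_parameter_count(layer_sizes)

print(compression_ratio)  # 49.0
print(encoder_params)     # 109776
print(decoder_params)     # 110544
print(total_params)       # 220320
```

A smaller bottleneck raises the compression ratio but leaves less room to preserve detail, which is why the latent size affects reconstruction quality.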

Types of Autoencoders:

1. Vanilla Autoencoders:
o Basic architecture with fully connected layers for both encoder and
decoder.
2. Sparse Autoencoders:
o Introduces a sparsity constraint on the latent representation z to
learn meaningful features.
3. Denoising Autoencoders:
o Trained to reconstruct the original input from a corrupted version,
making them robust to noise.
4. Variational Autoencoders (VAEs):
o A probabilistic variant of autoencoders where the latent space is
modeled as a distribution rather than a fixed point.
5. Convolutional Autoencoders:
o Use convolutional layers for encoding and decoding, making them
suitable for image data.
6. Contractive Autoencoders:
o Add a penalty term to the loss function to make the latent space
robust to small changes in input.

Applications

1. Dimensionality Reduction
2. Data Denoising
3. Anomaly Detection
4. Image Processing
5. Recommendation Systems

Non-linear activation function:


 A Non-linear activation function is a mathematical function applied to the
output of a neuron (or a node) in a neural network to introduce non-linearity
into the model.
 This is crucial because neural networks need to model complex, non-linear
relationships, and without non-linear activation functions, the network would
essentially behave like a linear model, no matter how many layers it has.
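The claim that stacked linear layers behave like a single linear model can be checked directly. In the sketch below, the matrices W1 and W2 and the input x are made-up values, and biases are omitted for brevity; two linear layers give the same result as one layer with weights W2·W1, while inserting a ReLU between them changes the result.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def relu_vec(v):
    return [max(0.0, t) for t in v]

W1 = [[1.0, 2.0], [0.0, -1.0]]  # first layer weights (hypothetical)
W2 = [[3.0, 1.0]]               # second layer weights (hypothetical)
x = [1.0, 1.0]

two_linear_layers = matvec(W2, matvec(W1, x))    # layer 2 applied to layer 1
single_layer = matvec(matmul(W2, W1), x)         # one layer with merged weights
with_relu = matvec(W2, relu_vec(matvec(W1, x)))  # non-linearity in between

print(two_linear_layers)  # [8.0]
print(single_layer)       # [8.0]
print(with_relu)          # [9.0]
```

The first two results match because W2(W1 x) = (W2 W1) x, so the extra layer adds no expressive power. The ReLU breaks this equivalence by zeroing the negative hidden value.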

Common Non-Linear Activation Functions:

Sigmoid (Logistic Function):

 Formula: $f(x) = \dfrac{1}{1 + e^{-x}}$
 Range: (0, 1)
 Use: Often used in binary classification tasks (e.g., output layer for
classification). It squashes the input to a range between 0 and 1, which can
be interpreted as probabilities.
 Drawback: Sigmoid suffers from vanishing gradients for large positive or
large negative inputs, where the curve flattens, making it harder for the
model to train effectively.

Tanh (Hyperbolic Tangent):

 Formula: $f(x) = \dfrac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
 Range: (-1, 1)
 Use: Similar to sigmoid but outputs zero-centered values between -1 and 1,
which often makes it a better choice for hidden layers.
 Drawback: Like sigmoid, tanh suffers from vanishing gradients for large
positive or large negative inputs.

ReLU (Rectified Linear Unit):

 Formula: f(x)=max(0,x)
 Range: [0, ∞)
 Use: ReLU is the most widely used activation function in modern neural
networks, especially for hidden layers. It’s computationally efficient and
helps mitigate the vanishing gradient problem.
 Drawback: ReLU can suffer from the "dying ReLU" problem, where
neurons stop updating during training if their output is always zero.

Leaky ReLU:

 Formula: f(x)=max(αx, x), where α is a small constant (typically 0.01)


 Range: (-∞, ∞)
 Use: A variant of ReLU that allows a small, non-zero gradient when x<0,
which helps address the "dying ReLU" problem.
 Drawback: The choice of α can affect the model’s performance.

Softmax:

 Formula: $f(x_i) = \dfrac{e^{x_i}}{\sum_j e^{x_j}}$
 Range: (0, 1) and outputs a probability distribution (sums to 1).
 Use: Softmax is typically used in the output layer of a network for multi-
class classification, where it turns raw logits into probabilities.
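The scalar activation functions above, together with the vanishing-gradient drawback, can be sketched in a few lines. The derivative formulas used here (s(1 - s) for sigmoid and 1 - tanh² for tanh) are standard results, and the sample inputs are arbitrary.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of sigmoid: s(x) * (1 - s(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope alpha."""
    return max(alpha * x, x)

for x in (-10.0, 0.0, 10.0):
    print(x, sigmoid(x), math.tanh(x), relu(x), leaky_relu(x),
          sigmoid_grad(x), tanh_grad(x))
```

At x = 0 the sigmoid gradient is 0.25, its maximum, while at x = ±10 both the sigmoid and tanh gradients are close to zero, which is the vanishing-gradient behavior described above. ReLU outputs exactly zero for negative inputs, whereas Leaky ReLU keeps a small negative output.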

Deep encoders:
 Deep encoders are the encoder components of deep neural networks. They
consist of multiple stacked layers rather than a single layer.
 These layers are responsible for transforming the input data into a compressed
or abstract representation, known as a latent space or bottleneck.

Structure of a Deep Encoder:

A deep encoder might look like this in an autoencoder:

1. Input Layer: Takes the raw input data (e.g., an image or text).
2. Hidden Layers: A series of hidden layers (fully connected, convolutional,
etc.) with non-linear activation functions (e.g., ReLU) that progressively
extract and compress features.
3. Latent Space (Bottleneck): A small, compressed representation of the
input data, typically a vector of lower dimensionality.
4. Output: The encoder's output is typically used as the input to a decoder (in
the case of autoencoders) or passed on to subsequent layers in other types
of networks.

Applications of Deep Encoders:

1. Autoencoders
2. Convolutional Neural Networks (CNNs)
3. Natural Language Processing (NLP)
4. Representation Learning

Outlier detection (also known as anomaly detection) refers to
identifying data points that deviate significantly from the majority of the dataset.
These outliers may represent unusual but important events, errors, or rare
occurrences that might not fit into the standard model of normal behavior.

How Deep Encoders (Autoencoders) Work for Outlier Detection:

o Training the Autoencoder:


 In outlier detection, an autoencoder is trained on normal data
only (i.e., data that represents the typical or expected
behavior).
 The encoder learns to compress the input data into a latent
space, and the decoder reconstructs the data from that
compressed representation. The goal is for the autoencoder to
learn how to efficiently represent normal data and reconstruct
it accurately.
o Reconstruction Error:
 Once the autoencoder is trained on normal data, it can be used
to reconstruct new input data (including test data or new
samples). However, the autoencoder may struggle to
reconstruct data points that are very different from the normal
data it was trained on, resulting in a high reconstruction
error.
 Outliers, being different from the normal patterns, tend to
have large reconstruction errors because they do not fit well
into the latent space the model has learned.
o Thresholding:
 A threshold is set for the reconstruction error. Data points with
a reconstruction error exceeding this threshold are classified
as outliers.

Steps for Outlier Detection Using Deep Encoders (Autoencoders):

1. Data Preparation:
o Collect and preprocess the dataset. Make sure the data only contains
normal (non-anomalous) instances during training.
2. Train the Autoencoder:
o Train the autoencoder on the normal data. Use a deep encoder
architecture that learns meaningful, compact representations of the
input data. The goal is for the autoencoder to reconstruct normal
instances with minimal error.
3. Evaluate the Reconstruction Error:
o For each data point in a test set (which may include both normal and
anomalous data), compute the reconstruction error, i.e., the
difference between the original input and the reconstructed output.
4. Set a Threshold:
o Determine a threshold for the reconstruction error. Data points with
errors above this threshold are flagged as outliers.
5. Outlier Detection:
o Use the trained model to score new data points. If their
reconstruction errors exceed the threshold, they are identified as potential
outliers.
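Steps 3 to 5 can be sketched without a trained model by starting from reconstruction errors that are assumed to have been computed already. The error values below are made up, and the threshold rule (mean plus three standard deviations of the errors on normal data) is one common choice, not something the notes specify.

```python
import statistics

def fit_threshold(normal_errors, k=3.0):
    """Assumed rule: mean + k * standard deviation of errors on normal data."""
    return statistics.mean(normal_errors) + k * statistics.pstdev(normal_errors)

def flag_outliers(errors, threshold):
    """True for every data point whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]

# Made-up reconstruction errors measured on held-out normal data.
normal_errors = [0.010, 0.012, 0.009, 0.011, 0.013, 0.010]
threshold = fit_threshold(normal_errors)

# Made-up reconstruction errors for new data points.
new_errors = [0.011, 0.250, 0.009]
flags = flag_outliers(new_errors, threshold)
print(threshold, flags)
```

The second new point has an error far above anything seen on normal data, so it is the only one flagged. Other rules, such as a fixed percentile of the normal errors, fit the same pattern.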

Example Applications of Deep Encoders for Outlier Detection:

1. Fraud Detection
2. Anomaly Detection in Network Traffic
3. Industrial Equipment Monitoring
4. Medical Anomaly Detection
5. Image and Video Anomaly Detection
