
Unit-3

Introduction to Deep Learning:


In the fast-evolving era of artificial intelligence, Deep Learning stands as a cornerstone technology,
revolutionizing how machines understand, learn, and interact with complex data. This
transformative field has propelled breakthroughs across various domains, from computer vision and
natural language processing to healthcare diagnostics and autonomous driving. As we dive into this
introductory exploration of Deep Learning, we uncover its foundational principles, applications,
and the underlying mechanisms that empower machines to achieve human-like cognitive abilities.

What is Deep Learning?

Deep learning is a branch of machine learning based on artificial neural network architectures. An artificial neural network (ANN) uses layers of interconnected nodes, called neurons, that work together to process and learn from the input data.
In a fully connected deep neural network, there is an input layer followed by one or more hidden layers connected one after the other. Each neuron receives input from the neurons of the previous layer or from the input layer. The output of one neuron becomes the input to neurons in the next layer of the network, and this process continues until the final layer produces the output of the network. The layers of the neural network transform the input data through a series of nonlinear transformations, allowing the network to learn complex representations of the input data.

Scope of Deep Learning


Supervised Machine Learning: Supervised machine learning is the machine learning technique in which the neural network learns to make predictions or classify data based on labeled datasets. Here we provide both the input features and the target variables. The neural network learns to make predictions based on the cost or error that comes from the difference between the predicted and the actual target; propagating this error backwards to adjust the weights is known as backpropagation.

Unsupervised Machine Learning: Unsupervised machine learning is the machine learning technique in which the neural network learns to discover patterns or to cluster the dataset based on unlabeled data. Here there are no target variables; the machine has to determine the hidden patterns or relationships within the dataset on its own. Deep learning algorithms such as autoencoders and generative models are used for unsupervised tasks like clustering, dimensionality reduction, and anomaly detection.

Reinforcement Machine Learning: Reinforcement machine learning is the machine learning technique in which an agent learns to make decisions in an environment so as to maximize a reward signal. The agent interacts with the environment by taking actions and observing the resulting rewards. Deep learning can be used to learn policies, i.e. sets of actions, that maximize the cumulative reward over time. Deep reinforcement learning algorithms such as Deep Q-Networks (DQN) and Deep Deterministic Policy Gradient (DDPG) are used for tasks like robotics and game playing.

Artificial neural networks


Artificial neural networks are built on the principles of the structure and operation of human neurons. They are also known as neural networks or neural nets. An artificial neural network's input layer, which is the first layer, receives input from external sources and passes it on to the hidden layer, which is the second layer. Each neuron in the hidden layer gets information from the neurons in the previous layer, computes the weighted sum, and then transfers it to the neurons in the next layer. These connections are weighted, meaning that the influence of each input from the preceding layer is scaled up or down by assigning it a distinct weight. These weights are then adjusted during the training process to enhance the performance of the model.

Fully Connected Artificial Neural Network
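As a minimal sketch of the computation each layer performs (the layer sizes and the sigmoid activation below are illustrative assumptions, not values from the text), a single fully connected layer is just a weighted sum plus a bias followed by an activation:

import torch

# illustrative sizes: 3 input features, 4 hidden neurons
torch.manual_seed(0)
x = torch.randn(3)       # input vector coming from the input layer
W = torch.randn(4, 3)    # one distinct weight per connection (hidden x input)
b = torch.randn(4)       # one bias per hidden neuron

# each hidden neuron computes a weighted sum of its inputs plus its bias,
# then passes the result through a nonlinearity before sending it onward
hidden = torch.sigmoid(W @ x + b)
print(hidden)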

Historical Trends in Deep Learning


Origins of Neural Networks:

The evolution of deep learning started in the 1940s, when Warren McCulloch and Walter Pitts proposed the concept of artificial neurons. They developed a mathematical model based on the working of a basic biological neuron. That artificial neuron is called the McCulloch-Pitts (MCP) neuron, and it laid the foundation for deep learning.


What is a Perceptron?

In 1957, Frank Rosenblatt introduced the perceptron, an algorithm based on artificial neurons. The perceptron was capable of learning from experience and adjusting its weights to make accurate predictions.

However, perceptrons were limited to solving linearly separable problems, and this limitation led to a decline in interest in neural networks.

What is Backpropagation?

After the limitations of the perceptron became apparent, a great deal of research followed, and in 1986 Geoffrey Hinton, along with David Rumelhart and Ronald Williams, made a breakthrough by introducing the backpropagation algorithm. Backpropagation enabled training of multi-layer neural networks by efficiently computing the gradients of the network's weights. This advancement rekindled interest in neural networks and paved the way for the development of deep learning.

Geoffrey Hinton is often called the father of deep learning.


Convolutional Neural Networks (CNNs):

CNNs emerged in the late 1990s, primarily due to the groundbreaking work of Yann LeCun. LeCun developed the LeNet-5 architecture, which utilized convolutional layers, pooling layers, and fully connected layers to achieve remarkable performance in handwritten digit recognition tasks.

CNNs revolutionized computer vision by enabling automated feature extraction from images, leading to breakthroughs in image classification, object detection, and more.

In 1999, Nvidia developed the world's first GPU. Researchers started implementing deep learning models on GPUs from around 2006.

What is a Deep Feed-Forward Network?

Basically, Deep Feed-Forward Networks (I will use the abbreviation DFN for the rest of the article) are neural networks in which the input is fed forward through a function, let's say f*, in the forward direction only. There is no feedback mechanism in a DFN. Networks that do have a feedback mechanism from the output are called Recurrent Neural Networks.

What are its applications?

DFNs are useful in many areas. One of the best-known applications of AI is object recognition, which uses convolutional neural networks, a special kind of feed-forward network.

Applications of a DFN

What is happening in the kitchen of this business anyways?

Let's get back to our function f* and dive deeper. As I previously mentioned, we have a function y = f*(x) that we would like to approximate with a function f(x) by doing some calculations. Clearly our input is x; we feed it through our function f* and we get our result y. This is simple, but how does this help us implement such functionality in the field? Imagine y as the output we want when classifying inputs x. A DFN defines a mapping y = f(x; θ), learns the values of θ, and maps the input x to the categories of y.

Fig 1.1: schema of a DFN

As you can observe from the above picture, a DFN consists of many layers. You can think of those layers as composite functions:

f(x) = f1(f2(f3(x)))

The above function has a depth of 3, and the name "deep learning" is derived from this terminology. The output layer is the outermost function, which is f1 in this case. These chain structures are the most commonly used structures. The more composite the function gets, the more layers it has. Some of those layers are called hidden layers.
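A minimal sketch of this composition in PyTorch (the layer sizes and ReLU activations are illustrative assumptions, not values from the text):

import torch
import torch.nn as nn

# a depth-3 chain f(x) = f1(f2(f3(x))): two hidden layers plus an output layer
f3 = nn.Sequential(nn.Linear(4, 8), nn.ReLU())   # innermost (first hidden) layer
f2 = nn.Sequential(nn.Linear(8, 8), nn.ReLU())   # second hidden layer
f1 = nn.Linear(8, 3)                             # output layer, the outermost function

x = torch.randn(1, 4)     # a single example with 4 features
y = f1(f2(f3(x)))         # evaluate the composition layer by layer
print(y.shape)            # torch.Size([1, 3])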

Units of a Model

Each element of the vector is viewed as a neuron. It is easier to think of them as units working in parallel, each receiving inputs from many other units and computing its own activation value, rather than as a single vector-to-vector function.

(Figure retrieved from https://towardsdatascience.com/an-introduction-to-deep-feedforward-neural-networks-1af281e306cd)

What we are actually doing here is converting the biological picture into mathematical expressions. Every neuron has its own input vector (e.g. x = [x1, x2, x3, x4, ..., xn]) coming from other neurons, multiplies those inputs by distinct weights (w = [w1, w2, w3, ..., wn]), and adds a bias, noted as b, which is a constant value used to adjust the net output. The net output is then fed through a function g(z), called the activation function, and the result of g(z) is sent to the other neurons.
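In code, a single unit is just this dot product plus a bias, followed by the activation; the numbers below are made up purely for illustration:

import math

x = [0.5, -1.0, 2.0]     # inputs arriving from other units
w = [0.1, 0.4, -0.2]     # one distinct weight per input
b = 0.3                  # bias, a constant that shifts the net output

z = sum(wi * xi for wi, xi in zip(w, x)) + b   # net output of the unit
g = lambda z: 1 / (1 + math.exp(-z))           # activation function (sigmoid here)
print(g(z))                                    # activation value sent on to other neurons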
What is Gradient Descent?

Gradient Descent (GD) is a widely used optimization algorithm in machine learning and deep learning that minimises the cost function of a neural network model during training. It works by iteratively adjusting the weights or parameters of the model in the direction of the negative gradient of the cost function until the minimum of the cost function is reached.
The learning happens during backpropagation while training the neural network-based model. Gradient Descent is used to optimize the weights and biases based on the cost function, which evaluates the difference between the actual and predicted outputs.
Gradient Descent is a fundamental optimization algorithm in machine learning used to minimize the cost or loss function during model training.
It iteratively adjusts model parameters by moving in the direction of the steepest decrease in the cost function.
The algorithm calculates gradients, representing the partial derivatives of the cost function with respect to each parameter.
These gradients guide the updates, ensuring convergence towards the optimal parameter values that yield the lowest possible cost.
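The update that implements this idea can be written in its standard form (with learning rate $\alpha$ and cost function $J(\theta)$, a general formulation rather than one specific to this text):

$\theta \leftarrow \theta - \alpha \, \nabla_{\theta} J(\theta)$

The code below first builds a small synthetic linear-regression dataset on which this update can be demonstrated.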

import torch
import matplotlib.pyplot as plt

# set random seed for reproducibility
torch.manual_seed(42)

# set number of samples
num_samples = 1000

# create random features with 2 dimensions
x = torch.randn(num_samples, 2)

# true weights and bias of the underlying linear model
true_weights = torch.tensor([1.3, -1.0])
true_bias = torch.tensor([-3.5])

# target variable: a linear combination of the features
y = x @ true_weights + true_bias

# plot the target against each of the two features
fig, ax = plt.subplots(1, 2, sharey=True)
ax[0].scatter(x[:, 0], y)
ax[1].scatter(x[:, 1], y)
ax[0].set_xlabel('X1')
ax[0].set_ylabel('Y')
ax[1].set_xlabel('X2')
ax[1].set_ylabel('Y')
plt.show()

OUTPUT: two scatter plots showing the target Y against the features X1 and X2.
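Continuing the sketch above, a minimal gradient-descent loop that recovers the weights and bias from this data might look as follows (the learning rate and iteration count are illustrative choices, not values from the text):

# initialize learnable parameters
weights = torch.randn(2, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)

learning_rate = 0.1
num_iterations = 200

for i in range(num_iterations):
    # forward pass: predictions and mean squared error cost
    y_pred = x @ weights + bias
    cost = torch.mean((y_pred - y) ** 2)

    # backward pass: gradients of the cost with respect to the parameters
    cost.backward()

    # gradient-descent update: step in the direction of the negative gradient
    with torch.no_grad():
        weights -= learning_rate * weights.grad
        bias -= learning_rate * bias.grad
        weights.grad.zero_()
        bias.grad.zero_()

print(weights, bias)   # should approach true_weights and true_bias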
Hidden Unit Definition:
In the realm of artificial neural networks, a hidden unit refers to an individual neuron or node that
resides within one or more hidden layers of the network. These units play a crucial role in the
network's ability to learn complex representations by transforming inputs received from previous
layers into a form that can be used by subsequent layers or the output layer.

Unlike input and output units, hidden units are not exposed to the external environment, meaning they
do not directly interact with the input data or produce the final output. Instead, they contribute to the
internal processing and feature extraction capabilities of the network, allowing it to capture and model
intricate patterns, relationships, and dependencies in the data.

The functioning of hidden units, characterized by their weighted connections, activation functions,
and biases, is fundamental to the depth and expressiveness of neural networks, especially in deep
learning architectures.

Examples/Use Cases:
Consider a deep learning model designed for image recognition tasks, such as identifying objects
within photographs. In this model, hidden units in the initial layers might learn to recognize basic
visual patterns like edges and corners. As information progresses through subsequent hidden layers,
units in these layers combine and recombine these basic patterns to recognize more complex
structures like textures and shapes. In even deeper layers, hidden units might represent parts of
objects, such as wheels or windows in the context of vehicle recognition.

The hierarchical structuring of hidden units allows the network to learn a multi-layered representation
of the data, facilitating the identification of a wide range of objects in images with high accuracy. This
example illustrates how hidden units enable neural networks to perform sophisticated tasks that
require the abstraction and interpretation of high-dimensional data.

Architecture of Neural Networks

We can build a non-linear model by combining two linear models with a set of weights, a bias, and the sigmoid function. Let us illustrate this and understand the architecture of a Neural Network and a Deep Neural Network.

Let us see an example for better understanding and illustration.

Suppose there is a linear model whose line is represented as -4x1 - x2 + 12. We can represent it with the following perceptron.

The weights in the input layer are -4 and -1, and 12 is the bias; together they represent the equation of the linear model, into which the input is passed to obtain its probability of being in the positive region. Take one more model whose line is represented as -x1 - x2 + 3. The perceptron through which we can represent it is as follows:

Now, what we have to do is combine these two perceptrons to obtain a non-linear perceptron (model) by multiplying the outputs of the two models by some set of weights and adding a bias. After that, we apply the sigmoid to obtain the curve as follows:

In our previous example, suppose we had two inputs x1 and x2. These inputs represent a single point at coordinates (2, 2), and we want to obtain the probability of the point being in the positive region under the non-linear model. The coordinates (2, 2) are passed into the first input layer, which consists of the two linear models.
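As a minimal numeric sketch of this combination (the weights 1.5 and 1.0 and the bias -0.5 used to combine the two models are illustrative assumptions, not values from the text):

import math

sigmoid = lambda z: 1 / (1 + math.exp(-z))

x1, x2 = 2, 2                            # the point (2, 2)

# each linear model followed by a sigmoid gives a probability
p1 = sigmoid(-4 * x1 - 1 * x2 + 12)      # model 1: -4*x1 - x2 + 12
p2 = sigmoid(-1 * x1 - 1 * x2 + 3)       # model 2: -x1 - x2 + 3

# combine the two probabilities with weights and a bias, then apply
# the sigmoid again to obtain the non-linear model's output
combined = sigmoid(1.5 * p1 + 1.0 * p2 - 0.5)
print(p1, p2, combined)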

Back-Propagation and Other Differentiation Algorithms

In machine learning, backpropagation is an effective algorithm used to train artificial neural networks, especially feed-forward neural networks.
Backpropagation is an iterative algorithm that helps to minimize the cost function by determining which weights and biases should be adjusted. During every epoch, the model learns by adapting the weights and biases to minimize the loss by moving down the gradient of the error. It is typically paired with one of the two most popular optimization algorithms, gradient descent or stochastic gradient descent.
Computing the gradient in the backpropagation algorithm helps to minimize the cost function, and it is implemented by using the mathematical rule called the chain rule from calculus to navigate through the complex layers of the neural network.
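Concretely, for a single weight w that feeds a unit with pre-activation a, output y = g(a), and loss L, the chain rule gives the gradient as a product of local derivatives (a standard formulation, stated here in general rather than derived from this text's example):

$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial y} \cdot \frac{\partial y}{\partial a} \cdot \frac{\partial a}{\partial w}$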

fig(a) A simple illustration of how the backpropagation works by adjustments of weights

Advantages of Using the Backpropagation Algorithm in Neural Networks


Backpropagation, a fundamental algorithm in training neural networks, offers several advantages
that make it a preferred choice for many machine learning tasks. Here, we discuss some key
advantages of using the backpropagation algorithm:
1. Ease of Implementation: Backpropagation does not require extensive prior knowledge of neural networks, making it accessible to beginners. Its straightforward nature simplifies the programming process, as it primarily involves adjusting weights based on error derivatives.
2. Simplicity and Flexibility: The algorithm’s simplicity allows it to be applied to a wide
range of problems and network architectures. Its flexibility makes it suitable for various
scenarios, from simple feedforward networks to complex recurrent or convolutional
neural networks.
3. Efficiency: Backpropagation accelerates the learning process by directly updating
weights based on the calculated error derivatives. This efficiency is particularly
advantageous in training deep neural networks, where learning features of a function
can be time-consuming.
4. Generalization: Backpropagation enables neural networks to generalize well to unseen
data by iteratively adjusting weights during training. This generalization ability is
crucial for developing models that can make accurate predictions on new, unseen
examples.
5. Scalability: Backpropagation scales well with the size of the dataset and the
complexity of the network. This scalability makes it suitable for large-scale machine
learning tasks, where training data and network size are significant factors.

Working of Backpropagation Algorithm


The backpropagation algorithm works in two different passes:
 Forward pass
 Backward pass
How does the forward pass work?
 In the forward pass, the input is initially fed into the input layer. Since the inputs are raw data, they can be used for training our neural network.
 The inputs and their corresponding weights are passed to the hidden layer. The hidden layer performs the computation on the data it receives. If there are two hidden layers in the neural network, for instance, as in the illustration fig(a), h1 and h2 are the two hidden layers, and the output of h1 can be used as an input of h2. Before applying the activation function, the bias is added.
 In the hidden layer, the activation function is applied to the weighted sum of inputs at each of its neurons. One commonly used activation function is ReLU, which returns the input if it is positive and zero otherwise. This introduces non-linearity into our model, which enables the network to learn complex relationships in the data. Finally, the weighted outputs from the last hidden layer are fed into the output layer to compute the final prediction; this layer can also use an activation function, such as the softmax function, which converts the weighted outputs into probabilities for each class (see the sketch after this list).
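A minimal sketch of such a forward pass in PyTorch (the layer sizes and the use of two hidden layers h1 and h2 are illustrative, loosely following fig(a) rather than reproducing it exactly):

import torch
import torch.nn as nn

torch.manual_seed(0)

x = torch.randn(1, 4)                 # raw inputs fed into the input layer

h1 = nn.Linear(4, 5)                  # first hidden layer: weighted sum plus bias
h2 = nn.Linear(5, 5)                  # second hidden layer takes h1's output as its input
out = nn.Linear(5, 3)                 # output layer

a1 = torch.relu(h1(x))                # ReLU keeps positive values and zeroes out the rest
a2 = torch.relu(h2(a1))
logits = out(a2)
probs = torch.softmax(logits, dim=1)  # softmax converts the outputs into class probabilities
print(probs)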

The forward pass using weights and biases


Example of Backpropagation in Machine Learning
Let us now take an example to explain backpropagation in machine learning.
Assume that the neurons have the sigmoid activation function to perform the forward and backward pass on the network. Also assume that the actual output y is 0.5 and the learning rate is 1. Now perform backpropagation using the backpropagation algorithm.
Example (1) of backpropagation sum

Implementing forward propagation:


Step 1: Before proceeding to calculate forward propagation, we need to know two formulae:

$a_j = \sum_i (w_{i,j} \cdot x_i)$

Where,
 $a_j$ is the weighted sum of all the inputs and weights at node j,
 $w_{i,j}$ represents the weight associated with the connection from the ith input to the jth neuron,
 $x_i$ represents the value of the ith input.

$y_j = F(a_j) = \frac{1}{1 + e^{-a_j}}$

where $y_j$ is the output value and F denotes the activation function (the sigmoid activation function is used here), which transforms the weighted sum into the output value.
Step 2: To compute the forward pass, we need to compute the outputs for y3, y4, and y5.

To find the outputs of y3, y4 and y5

We start by calculating the weighted sums using the formula:

$a_j = \sum_i (w_{i,j} \cdot x_i)$

To find y3, we need to consider its incoming edges along with their weights and inputs. Here the incoming edges are from X1 and X2.

At node h1:

$a_1 = (w_{1,1} \cdot x_1) + (w_{2,1} \cdot x_2) = (0.2 \cdot 0.35) + (0.2 \cdot 0.7) = 0.21$

Once we have calculated the value of $a_1$, we can proceed to find y3:

$y_3 = F(0.21) = \frac{1}{1 + e^{-0.21}} \approx 0.56$

Similarly, find the values of y4 at h2 and y5 at O3:

$a_2 = (w_{1,2} \cdot x_1) + (w_{2,2} \cdot x_2) = (0.3 \cdot 0.35) + (0.3 \cdot 0.7) = 0.315$

$y_4 = F(0.315) = \frac{1}{1 + e^{-0.315}} \approx 0.58$

$a_3 = (w_{1,3} \cdot y_3) + (w_{2,3} \cdot y_4) = (0.3 \cdot 0.56) + (0.9 \cdot 0.58) \approx 0.69$

$y_5 = F(0.69) = \frac{1}{1 + e^{-0.69}} \approx 0.67$

Values of y3, y4 and y5

Note that our actual output is 0.5, but we obtained 0.67. To calculate the error, we can use the formula below:

$\text{Error} = y_{\text{target}} - y_5$

Error = 0.5 - 0.67 = -0.17

Using this error value, we will backpropagate.
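A short sketch that reproduces this forward pass and error numerically (the weights and inputs are taken from the worked example above; the variable names are just local labels for the nodes, and the printed values match the hand-worked ones up to rounding):

import math

sigmoid = lambda a: 1 / (1 + math.exp(-a))

# inputs, target, and weights from the worked example
x1, x2 = 0.35, 0.7
w11, w21 = 0.2, 0.2      # weights into hidden node h1
w12, w22 = 0.3, 0.3      # weights into hidden node h2
w13, w23 = 0.3, 0.9      # weights from h1 and h2 into the output node O3
y_target = 0.5

# forward pass
a1 = w11 * x1 + w21 * x2          # 0.21
y3 = sigmoid(a1)                  # about 0.552
a2 = w12 * x1 + w22 * x2          # 0.315
y4 = sigmoid(a2)                  # about 0.578
a3 = w13 * y3 + w23 * y4          # about 0.686
y5 = sigmoid(a3)                  # about 0.665

error = y_target - y5             # about -0.17, as in the worked example
print(y3, y4, y5, error)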
