Unit-3 D.L
Deep learning is the branch of machine learning based on artificial neural network architectures. An artificial neural network (ANN) uses layers of interconnected nodes, called neurons, that work together to process and learn from the input data.
In a fully connected deep neural network, there is an input layer and one or more hidden layers connected one after the other. Each neuron receives input from the neurons in the previous layer or from the input layer. The output of one neuron becomes the input to neurons in the next layer of the network, and this process continues until the final layer produces the output of the network. The layers of the neural network transform the input data through a series of nonlinear transformations, allowing the network to learn complex representations of the input data.
The evolution of deep learning started in the 1940s, when Warren McCulloch and Walter Pitts proposed the concept of artificial neurons. They developed a mathematical model based on the working of a basic biological neuron; that artificial neuron is called the McCulloch-Pitts (MCP) neuron. This laid the foundation for neural network research.
In 1957, Frank Rosenblatt introduced the perceptron, an algorithm based on artificial neurons. The perceptron was capable of learning from experience and adjusting its weights to make accurate predictions.
However, perceptrons were limited to solving linearly separable problems, and their limitations led to a decline in interest in neural networks.
What is Backpropagation?
After the failure of the perceptron, a great deal of research took place, and finally in 1986 Geoffrey Hinton, along with David Rumelhart and Ronald Williams, made a breakthrough by introducing the backpropagation algorithm, a method for efficiently computing the gradients of the network's weights. This advancement rekindled interest in neural networks and paved the way for the development of deep learning.
Convolutional neural networks (CNNs) came to prominence in the late 1990s, primarily due to the groundbreaking work of Yann LeCun. LeCun developed the LeNet-5 architecture, which utilized convolutional layers, pooling layers, and fully connected layers to recognize handwritten digits. CNNs revolutionized computer vision by enabling automated feature extraction from images, leading to major advances in tasks such as image classification and object detection.
Basically, deep feed-forward networks (I will use the abbreviation DFN for the rest of the article) are neural networks that only feed the input forward through a function, let's say f*, and only in the forward direction. There is no feedback mechanism in a DFN. There are indeed cases where we have a feedback mechanism from the output; those are called recurrent neural networks.
DFNs are useful in many areas. One of the best-known applications of AI is object recognition, which uses convolutional neural networks, a special kind of feed-forward network.
[Figure: applications of a DFN]
Let's get back to our function f* and dive deeper. As I previously mentioned, we have a function y = f*(x) that we would like to approximate with f(x) by doing some calculations. Clearly our input is x, we feed it through our function f*, and we get our result y. This is simple, but how does it help us implement such functionality in the field? Imagine y as outputs into which we want to classify inputs x. A DFN defines a mapping y = f(x; θ), learns the values of θ, and maps the input x to the categories of y.
As you can observe from the above picture, a DFN consists of many layers. You can think of those layers as functions composed in a chain; for example, f(x) = f1(f2(f3(x))) has a depth of 3. The name deep learning is derived from this terminology. The output layer is the outermost function, which is f1 in this case. These chain structures are the most commonly used structures. The more composite the function gets, the more layers it has.
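As a minimal sketch in PyTorch (the layer sizes here are illustrative, not from the original), a depth-3 chain f1(f2(f3(x))) could be written as:
import torch
import torch.nn as nn

# a depth-3 feed-forward chain: f1(f2(f3(x)))
model = nn.Sequential(
    nn.Linear(4, 8),   # f3: innermost function, applied to the input x
    nn.ReLU(),
    nn.Linear(8, 8),   # f2: middle function
    nn.ReLU(),
    nn.Linear(8, 3),   # f1: outermost function, the output layer
)

x = torch.randn(4)     # a single input vector
y = model(x)           # information flows forward only; no feedback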
Units of a Model
Each element of the vector is viewed as a neuron. It is easier to think of them as units working in parallel, each receiving inputs from many other units and computing its own activation value, rather than as a single vector-to-vector function.
What we actually do here is convert biological behaviour into mathematical expressions. Every neuron has its own input (vector shaped, e.g. x = [x1, x2, x3, x4, ..., xn]) from other neurons, multiplies those inputs by distinct weights (w = [w1, w2, w3, ..., wn]), and adds a bias denoted b, a constant value used to adjust the net output. The net output z = w · x + b is then fed through a function g(z), called the activation function, and the result of g(z) is sent to the other neurons.
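As a minimal sketch of a single unit's computation (the input, weight, and bias values here are illustrative):
import torch

x = torch.tensor([0.5, -1.0, 2.0])   # inputs from other neurons
w = torch.tensor([0.8, 0.2, -0.5])   # distinct weight for each input
b = 0.1                              # bias, a constant that adjusts the net output

z = w @ x + b                        # net output: weighted sum plus bias
a = torch.sigmoid(z)                 # activation g(z), sent on to other neurons
print(a)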
What is Gradient Descent?
Gradient Descent (GD) is a widely used optimization algorithm in machine learning and deep learning that minimises the cost function of a neural network model during training. It works by iteratively adjusting the weights or parameters of the model in the direction of the negative gradient of the cost function until the minimum of the cost function is reached.
The learning happens during backpropagation while training the neural network-based model. Gradient Descent is used to optimize the weights and biases based on the cost function. The cost function evaluates the difference between the actual and predicted outputs.
Gradient Descent is a fundamental optimization algorithm in machine learning used to minimize the
cost or loss function during model training.
It iteratively adjusts model parameters by moving in the direction of the steepest decrease in the
cost function.
The algorithm calculates gradients, representing the partial derivatives of the cost function with respect to each parameter.
These gradients guide the updates, ensuring convergence towards the optimal parameter values that
yield the lowest possible cost.
import torch
import matplotlib.pyplot as plt

# input features: 100 samples with 2 features (the original snippet omitted x)
x = torch.randn(100, 2)
# true weights and bias for the linear regression model
true_weights = torch.tensor([1.3, -1.0])
true_bias = torch.tensor([-3.5])
# Target variable
y = x @ true_weights + true_bias

# plot the target against each input feature
fig, ax = plt.subplots(1, 2)
ax[0].scatter(x[:, 0], y)
ax[0].set_xlabel('X1')
ax[0].set_ylabel('Y')
ax[1].scatter(x[:, 1], y)
ax[1].set_xlabel('X2')
ax[1].set_ylabel('Y')
plt.show()
OUTPUT: [scatter plots of Y against X1 and Y against X2]
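Building on the data generated above, here is a minimal gradient descent sketch, assuming a mean squared error cost; the learning rate and number of steps are illustrative choices, not from the original.
# gradient descent on the linear model y_hat = x @ w + b
w = torch.zeros(2, requires_grad=True)   # parameters to learn
b = torch.zeros(1, requires_grad=True)
lr = 0.1                                 # learning rate (assumed value)

for step in range(200):
    y_hat = x @ w + b
    loss = ((y_hat - y) ** 2).mean()     # mean squared error cost
    loss.backward()                      # backpropagation computes the gradients
    with torch.no_grad():
        w -= lr * w.grad                 # step along the negative gradient
        b -= lr * b.grad
        w.grad.zero_()
        b.grad.zero_()

print(w, b)  # should approach true_weights and true_bias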
Hidden Unit Definition:
In the realm of artificial neural networks, a hidden unit refers to an individual neuron or node that
resides within one or more hidden layers of the network. These units play a crucial role in the
network's ability to learn complex representations by transforming inputs received from previous
layers into a form that can be used by subsequent layers or the output layer.
Unlike input and output units, hidden units are not exposed to the external environment, meaning they
do not directly interact with the input data or produce the final output. Instead, they contribute to the
internal processing and feature extraction capabilities of the network, allowing it to capture and model
intricate patterns, relationships, and dependencies in the data.
The functioning of hidden units, characterized by their weighted connections, activation functions,
and biases, is fundamental to the depth and expressiveness of neural networks, especially in deep
learning architectures.
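As a minimal sketch (the layer sizes are illustrative), a layer of hidden units can be written as a weighted transformation plus bias followed by an activation:
import torch
import torch.nn as nn

hidden = nn.Linear(4, 3)     # 3 hidden units, each with 4 weighted inputs and a bias
x = torch.randn(4)           # activations arriving from the previous layer
h = torch.relu(hidden(x))    # hidden activations, passed on to the next layer
print(h)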
Examples/Use Cases:
Consider a deep learning model designed for image recognition tasks, such as identifying objects
within photographs. In this model, hidden units in the initial layers might learn to recognize basic
visual patterns like edges and corners. As information progresses through subsequent hidden layers,
units in these layers combine and recombine these basic patterns to recognize more complex
structures like textures and shapes. In even deeper layers, hidden units might represent parts of
objects, such as wheels or windows in the context of vehicle recognition.
The hierarchical structuring of hidden units allows the network to learn a multi-layered representation
of the data, facilitating the identification of a wide range of objects in images with high accuracy. This
example illustrates how hidden units enable neural networks to perform sophisticated tasks that
require the abstraction and interpretation of high-dimensional data.
We found a non-linear model by combining two linear models with a set of weights, a bias, and a sigmoid function. Let's start with a better illustration and understand the architecture of a neural network and a deep neural network.
Suppose there is a linear model whose line is represented as -4x1 - x2 + 12. We can represent it with the following perceptron.
The weights in the input layer, -4 and -1, together with the bias 12 represent the equation of the linear model into which the input is passed to obtain its probability of being in the positive region. Take one more model whose line is represented as x1 - x2 + 3. The perceptron through which we can represent it is as follows:
Now we will combine these two perceptrons to obtain a non-linear perceptron, or model, by multiplying the two models by a set of weights and adding a bias. After that, we apply a sigmoid to obtain the curve as follows:
In our previous example, suppose we had two inputs x1 and x2. These inputs represent a single point at coordinates (2, 2), and we want to obtain the probability of the point being in the positive region under the non-linear model. The coordinates (2, 2) are passed into the first input layer, which consists of the two linear models.
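As a minimal sketch of this combination for the point (2, 2): the combination weights and bias below are illustrative, so the result differs slightly from the 0.67 obtained with the original example's specific values.
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

x1, x2 = 2, 2
h1 = sigmoid(-4 * x1 - x2 + 12)   # first linear model: -4x1 - x2 + 12
h2 = sigmoid(x1 - x2 + 3)         # second linear model: x1 - x2 + 3
# combine the two outputs with illustrative weights and a bias, then squash again
y = sigmoid(1.0 * h1 + 1.0 * h2 - 1.0)
print(h1, h2, y)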
Note that our target output is 0.5, but we obtained 0.67. To calculate the error, we can use the formula below:
Error = y_target − y_predicted
Error = 0.5 – 0.67
= -0.17
Using this error value, we backpropagate through the network to update the weights.
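As a minimal sketch of how this error can drive a weight update (a simplified delta-rule form; the exact update used in the original example is not shown here):
learning_rate = 0.1                # assumed value
error = 0.5 - 0.67                 # target minus prediction = -0.17

def update_weight(w_i, h_i, lr=learning_rate, err=error):
    # nudge the weight in proportion to the error and the unit's input
    return w_i + lr * err * h_i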