DL_UNIT_3_NOTES

The document provides an overview of deep learning, emphasizing its historical development, key concepts, and architecture design. It explains the structure and function of deep feedforward networks, gradient-based learning, and the importance of hidden units and activation functions. The text also highlights the evolution of deep learning, the significance of increasing dataset and model sizes, and the role of cost functions in training neural networks.

Uploaded by

21ag1a6652

lOMoARcPSD|45190830

Neural Networks AND DEEP Learning Notes-1

Computer science Engineering (Sri Shakthi Institute of Engineering and Technology)

Downloaded by Sai Patibandla (ping2saas145@gmail.com)
UNIT – III
Introduction to Deep Learning, Historical Trends in Deep Learning, Deep Feedforward
Networks, Gradient-Based Learning, Hidden Units, Architecture Design, Back-Propagation
and Other Differentiation Algorithms

Introduction to Deep Learning

Deep learning is a subfield of machine learning concerned with algorithms called artificial
neural networks, which are inspired by the structure and function of the brain. In other words,
it loosely mirrors how our brains function. Like a nervous system, these networks are built from
many simple units (neurons) that are connected to one another and pass information between them.

Example of different representations: suppose we want to separate two categories of data
by drawing a line between them in a scatterplot.

Deep learning allows the computer to build complex concepts out of simpler
concepts.
The figure below shows how a deep learning system can represent the concept of an image of
a person by combining simpler concepts, such as corners and contours, which are in turn defined
in terms of edges. The quintessential example of a deep learning model is the feedforward deep
network or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function
mapping some set of input values to output values.

Figure 1.2: Illustration of a deep learning model.

Figure 1.3: Illustration of computational graphs mapping an input to an output where
each node performs an operation.

There are two main ways of measuring the depth of a model. The first view is based
on the number of sequential instructions that must be executed to evaluate the architecture. The
figure above illustrates how the choice of language, that is, which operations count as a single
computational step, can give two different depth measurements for the same architecture. Another
approach, used by deep probabilistic models, regards the depth of a model as being not the depth
of the computational graph but the depth of the graph describing how concepts are related to each
other.

Historical Trends in Deep Learning

It is easiest to understand deep learning with some historical context. Rather than
providing a detailed history of deep learning, we identify a few key trends:

• Deep learning has had a long and rich history, but has gone by many names reflecting
different philosophical viewpoints, and has waxed and waned in popularity.

• Deep learning has become more useful as the amount of available training data has
increased.
• Deep learning models have grown in size over time as computer infrastructure (both
hardware and software) for deep learning has improved.
• Deep learning has solved increasingly complicated applications with increasing accuracy
over time.

The Many Names and Changing Fortunes of Neural Networks

Broadly speaking, there have been three waves of development of deep learning:
deep learning known as cybernetics in the 1940s–1960s, deep learning known as connectionism
in the 1980s–1990s, and the current resurgence under the name deep learning beginning in 2006.

Some of the earliest learning algorithms we recognize today were intended to be
computational models of biological learning, i.e. models of how learning happens or could happen
in the brain. As a result, one of the names that deep learning has gone by is artificial neural
networks (ANNs).

Fig: This figure shows two of the three historical waves of artificial neural nets research,
as measured by the frequency of the phrases “cybernetics” and “connectionism” or “neural
networks” according to Google Books.

Increasing Dataset Sizes

One may wonder why deep learning has only recently become recognized as a
crucial technology even though the first experiments with artificial neural networks were
conducted in the 1950s. One reason is the growth in available data: as more of our activities take
place on computers, more of what we do is recorded. As our computers are increasingly networked
together, it becomes easier to centralize these records and curate them into a dataset appropriate
for machine learning applications. As of 2016, a rough rule of thumb is that a supervised deep
learning algorithm will generally achieve acceptable performance with around 5,000 labeled
examples per category, and will match or exceed human performance when trained with a dataset
containing at least 10 million labeled
examples. Working successfully with datasets smaller than this is an important research area,
focusing in particular on how we can take advantage of large quantities of unlabeled examples,
with unsupervised or semi-supervised learning.

Increasing Model Sizes

Another key reason that neural networks are wildly successful today after enjoying
comparatively little success since the 1980s is that we have the computational resources to run
much larger models today. The increase in model size over time, due to the availability of faster
CPUs, the advent of general purpose GPUs, faster network connectivity and better software
infrastructure for distributed computing, is one of the most important trends in the history of deep
learning. This trend is generally expected to continue well into the future.

Deep Feedforward Networks

Deep feedforward networks, also often called feedforward neural networks or
multilayer perceptrons (MLPs), are the quintessential deep learning models. The goal of a
feedforward network is to approximate some function $f^*$. For example, for a classifier, $y = f^*(x)$
maps an input x to a category y. A feedforward network defines a mapping y = f(x; θ) and learns
the value of the parameters θ that result in the best function approximation.

These models are called feedforward because information flows through the
function being evaluated from x, through the intermediate computations used to define f, and
finally to the output y. There are no feedback connections in which outputs of the model are fed
back into itself.

Feedforward neural networks are called networks because they are typically
represented by composing together many different functions. The model is associated
with a directed acyclic graph describing how the functions are composed together. For
example, we might have three functions $f^{(1)}$, $f^{(2)}$, and $f^{(3)}$ connected in a chain, to form
$f(x) = f^{(3)}(f^{(2)}(f^{(1)}(x)))$. These chain structures are the most commonly used structures of
neural networks. In this case, $f^{(1)}$ is called the first layer of the network, $f^{(2)}$ is called the
second layer, and so on. The overall length of the chain gives the depth of the model. It is from
this terminology that the name “deep learning” arises. The final layer of a feedforward network is
called the output layer. The training data specify what the output layer must produce, but not
what the other layers should do, so the learning algorithm must decide how to use those layers to
best implement an approximation of $f^*$. Because the training data does not show the desired
output for each of these layers, they are called hidden layers.

Finally, these networks are called neural because they are loosely inspired by
neuroscience. Each hidden layer of the network is typically vector-valued. The dimensionality of
these hidden layers determines the width of the model.
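The chain structure and the notion of depth can be made concrete with a small sketch. The three scalar functions below are hypothetical placeholders chosen only for illustration; a real network would use vector-valued layers with learned parameters.

```python
from functools import reduce

def compose(*layers):
    """Return f(x) = layers[-1](...layers[1](layers[0](x))...)."""
    return lambda x: reduce(lambda value, layer: layer(value), layers, x)

# Three hypothetical scalar "layers", chosen only for illustration.
f1 = lambda x: 2.0 * x        # first layer
f2 = lambda h: h + 1.0        # second layer
f3 = lambda h: h * h          # output layer

chain = [f1, f2, f3]
f = compose(*chain)
depth = len(chain)            # length of the chain = depth of the model

print(f(3.0), depth)  # f3(f2(f1(3))) = (2*3 + 1)^2 = 49.0, depth 3
```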

Feedforward networks have introduced the concept of a hidden layer, and this
requires us to choose the activation functions that will be used to compute the hidden layer values.
We must also design the architecture of the network, including how many layers the network
should contain, how these layers should be connected to each other, and how many units should
be in each layer. Learning in deep neural networks requires computing the gradients of complicated
functions. We present the back-propagation algorithm and its modern generalizations, which can
be used to efficiently compute these gradients.

Figure: An example of a feedforward network, drawn in two different styles. Specifically,
this is the feedforward network we use to solve the XOR example.

It has a single hidden layer containing two units. (Left) In this style, we draw every
unit as a node in the graph. This style is very explicit and unambiguous but for networks larger
than this example it can consume too much space. (Right) In this style, we draw a node in the graph
for each entire vector representing a layer’s activations. This style is much more compact.
Sometimes we annotate the edges in this graph with the name of the parameters that describe the
relationship between two layers. Here, we indicate that a matrix W describes the mapping from x
to h, and a vector w describes the mapping from h to y.
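As a rough illustration of such a network, the sketch below evaluates a two-unit hidden layer with a rectifier, followed by a linear output. The parameter values W, c, w and b are hand-picked for this example (they are not stated in these notes) and happen to reproduce XOR exactly on the four binary inputs.

```python
def relu(z):
    return [max(0.0, v) for v in z]

# Hand-picked parameters (an assumption of this sketch, not given in the notes).
W = [[1.0, 1.0],
     [1.0, 1.0]]      # maps x (2 inputs) to h (2 hidden units)
c = [0.0, -1.0]       # hidden-layer bias
w = [1.0, -2.0]       # maps h to the scalar output y
b = 0.0               # output bias

def xor_net(x):
    # h = relu(W^T x + c)
    z = [sum(W[i][j] * x[i] for i in range(2)) + c[j] for j in range(2)]
    h = relu(z)
    # y = w^T h + b
    return sum(wj * hj for wj, hj in zip(w, h)) + b

inputs = [[0, 0], [0, 1], [1, 0], [1, 1]]
outputs = [xor_net(x) for x in inputs]
print(outputs)  # [0.0, 1.0, 1.0, 0.0]
```

Without the rectifier the model would be linear in x and could not represent XOR, which is why the hidden nonlinearity matters here.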

Gradient-Based Learning
Designing and training a neural network is not much different from training any
other machine learning model with gradient descent. Computing the gradient is slightly more
complicated for a neural network, but can still be done efficiently and exactly.
As with other machine learning models, to apply gradient-based learning we must
choose a cost function, and we must choose how to represent the output of the model.
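A minimal sketch of gradient-based learning on the simplest possible model may help fix ideas. It assumes a one-parameter-pair linear model, a mean squared error cost, toy data, and a hypothetical step size; the same loop structure carries over to neural networks, where only the gradient computation becomes more involved.

```python
# Toy data generated from y = 2x + 1 (an assumption made for this sketch).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def cost(w, b):
    """Mean squared error of the model y_hat = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, b):
    """Exact gradient of the cost with respect to (w, b)."""
    n = len(xs)
    dw = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db

w, b = 0.0, 0.0
learning_rate = 0.05   # hypothetical step size
for _ in range(2000):
    dw, db = gradient(w, b)
    w -= learning_rate * dw   # step against the gradient
    b -= learning_rate * db

print(round(w, 3), round(b, 3), cost(w, b))
```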

Cost Functions
An important aspect of the design of a deep neural network is the choice of the cost
function. Fortunately, the cost functions for neural networks are more or less the same as those for
other parametric models, such as linear models.

In most cases, our parametric model defines a distribution p(y | x; θ) and we simply
use the principle of maximum likelihood. This means we use the cross-entropy between the
training data and the model’s predictions as the cost function.
The total cost function used to train a neural network will often combine one of the
primary cost functions described here with a regularization term.
• Learning Conditional Distributions with Maximum Likelihood
Most modern neural networks are trained using maximum likelihood. This means
that the cost function is simply the negative log-likelihood, equivalently described as the
cross-entropy between the training data and the model distribution. This cost function is given by
$J(\theta) = -\mathbb{E}_{x,y \sim \hat{p}_{\text{data}}} \log p_{\text{model}}(y \mid x)$
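The expectation over the empirical data distribution is just an average over the training examples. The sketch below evaluates this cost for a classifier, assuming (for illustration only) that the model distribution is a softmax over raw scores; the logits and labels are made-up values.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def negative_log_likelihood(batch_logits, labels):
    """Average of -log p_model(y | x) over a batch of examples."""
    losses = []
    for logits, y in zip(batch_logits, labels):
        p = softmax(logits)
        losses.append(-math.log(p[y]))
    return sum(losses) / len(losses)

# Hypothetical model outputs (logits) for two examples and three classes.
batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
J = negative_log_likelihood(batch_logits, labels)
print(J)
```

For the second example the model is uniform over three classes, so its contribution to the cost is log 3, whichever label is correct.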

Output Units

The choice of cost function is tightly coupled with the choice of output unit. Most
of the time, we simply use the cross-entropy between the data distribution and the model
distribution. The choice of how to represent the output then determines the form of the cross-
entropy function.
Any kind of neural network unit that may be used as an output can also be used as
a hidden unit. Here we suppose that the feedforward network provides a set of hidden features
defined by h = f(x; θ). The role of the output layer is then to provide some additional
transformation from the features to complete the task that the network must perform.
• Linear Units for Gaussian Output Distributions

One simple kind of output unit is an output unit based on an affine transformation
with no nonlinearity. These are often just called linear units.
Given features h, a layer of linear output units produces a vector $\hat{y} = W^\top h + b$.
Linear output layers are often used to produce the mean of a conditional
Gaussian distribution:
$p(y \mid x) = \mathcal{N}(y; \hat{y}, I)$.
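One consequence worth spelling out: with an identity covariance, the negative log-likelihood of this Gaussian is half the squared error plus a constant, so maximizing likelihood amounts to minimizing squared error. The sketch below checks this on hypothetical features, weights and targets.

```python
import math

def linear_output(W, h, b):
    """y_hat = W^T h + b, with W stored as rows indexed by hidden unit."""
    n_out = len(b)
    return [sum(W[i][j] * h[i] for i in range(len(h))) + b[j] for j in range(n_out)]

def gaussian_nll(y, y_hat):
    """-log N(y; y_hat, I) for an identity covariance matrix."""
    n = len(y)
    sq_err = sum((a - m) ** 2 for a, m in zip(y, y_hat))
    return 0.5 * sq_err + 0.5 * n * math.log(2.0 * math.pi)

# Hypothetical features, weights, bias and target.
h = [1.0, 2.0]
W = [[0.5, -1.0],
     [0.25, 0.0]]
b = [0.0, 1.0]
y = [1.5, 0.5]

y_hat = linear_output(W, h, b)
print(y_hat, gaussian_nll(y, y_hat))
```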
Hidden Units

The design of hidden units is an extremely active area of research and does not yet have
many definitive guiding theoretical principles. Rectified linear units are an excellent default choice
of hidden unit. The design process consists of trial and error, intuiting that a kind of hidden unit
may work well, and then training a network with that kind of hidden unit and evaluating its
performance on a validation set.
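Some hidden units are not actually differentiable at all input points. For example,
the rectified linear function g(z) = max{0, z} is not differentiable at z = 0. This may seem like
it invalidates g for use with a gradient-based learning algorithm. In practice, gradient descent
still performs well enough for these models to be used, because training rarely lands exactly on
such a point and software implementations simply return one of the one-sided derivatives there.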

Unless indicated otherwise, most hidden units can be described as accepting a
vector of inputs x, computing an affine transformation $z = W^\top x + b$, and then applying an
element-wise nonlinear function g(z).
Most hidden units are distinguished from each other only by the choice of the form
of the activation function g(z).

Rectified Linear Units and Their Generalizations

Rectified linear units use the activation function g(z) = max{0, z}.
Rectified linear units are easy to optimize because they are so similar to linear units. The
only difference between a linear unit and a rectified linear unit is that a rectified linear unit outputs
zero across half its domain.
Rectified linear units are typically used on top of an affine transformation:
$h = g(W^\top x + b)$
One drawback to rectified linear units is that they cannot learn via gradient-based methods
on examples for which their activation is zero.
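The drawback can be seen directly in a small sketch of a single rectified linear unit. The parameters and inputs are hypothetical, and returning 0 for the derivative at z = 0 is a convention chosen for this example.

```python
def relu(z):
    return max(0.0, z)

def relu_grad(z):
    """Derivative of max{0, z}; at z = 0 this sketch returns 0 (a convention)."""
    return 1.0 if z > 0.0 else 0.0

def hidden_unit(w, x, b):
    """Single unit: z = w^T x + b followed by the rectifier."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return z, relu(z)

w, b = [1.0, -1.0], 0.0       # hypothetical parameters
for x in ([2.0, 1.0], [1.0, 2.0]):
    z, h = hidden_unit(w, x, b)
    # Gradient of h with respect to each weight: relu'(z) * x_i
    grad_w = [relu_grad(z) * xi for xi in x]
    print(x, z, h, grad_w)
```

For the second input the unit is inactive (z < 0), so every weight gradient is zero and that example contributes nothing to learning for this unit.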
Logistic Sigmoid and Hyperbolic Tangent
Prior to the introduction of rectified linear units, most neural networks used the
logistic sigmoid activation function
$g(z) = \sigma(z)$
or the hyperbolic tangent activation function
$g(z) = \tanh(z)$
These activation functions are closely related because $\tanh(z) = 2\sigma(2z) - 1$.
Sigmoidal activation functions are more common in settings other than feedforward
networks. Recurrent networks, many probabilistic models, and some autoencoders have additional
requirements that rule out the use of piecewise linear activation functions and make sigmoidal
units more appealing despite the drawbacks of saturation.
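The relationship between the two sigmoidal functions is easy to confirm numerically. The short check below evaluates both sides of tanh(z) = 2σ(2z) − 1 at a few arbitrarily chosen points.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    lhs = math.tanh(z)
    rhs = 2.0 * sigmoid(2.0 * z) - 1.0
    assert abs(lhs - rhs) < 1e-12
    print(f"z={z:+.1f}  tanh(z)={lhs:+.6f}  2*sigmoid(2z)-1={rhs:+.6f}")
```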
Other Hidden Units
Many other types of hidden units are possible, but are used less frequently. In general, a
wide variety of differentiable functions perform perfectly well. Many unpublished activation
functions perform just as well as the popular ones.

One possibility is to not have an activation g(z) at all. One can also think of this as using
the identity function as the activation function. We have already seen that a linear unit can be
useful as the output of a neural network. It may also be used as a hidden unit.
Softmax units are another kind of unit that is usually used as an output but may sometimes
be used as a hidden unit. Softmax units naturally represent a probability distribution over a discrete
variable with k possible values, so they may be used as a kind of switch.
A few other reasonably common hidden unit types include:
• Radial basis function or RBF unit: $h_i = \exp\left(-\frac{1}{\sigma_i^2} \lVert W_{:,i} - x \rVert^2\right)$. This function
becomes more active as x approaches a template $W_{:,i}$. Because it saturates to 0 for most x, it
can be difficult to optimize.
• Softplus: $g(a) = \zeta(a) = \log(1 + e^a)$. This is a smooth version of the rectifier, introduced for
function approximation and for the conditional distributions of undirected probabilistic models.
• Hard tanh: this is shaped similarly to the tanh and the rectifier but, unlike the latter, it is
bounded: $g(a) = \max(-1, \min(1, a))$.
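The three unit types in the list above can be sketched in a few lines. The function names are chosen for this example, and the softplus is written in an algebraically equivalent form to avoid overflow for large inputs.

```python
import math

def rbf_unit(x, template, sigma):
    """h = exp(-||template - x||^2 / sigma^2); peaks at 1 when x == template."""
    sq_dist = sum((t - xi) ** 2 for t, xi in zip(template, x))
    return math.exp(-sq_dist / sigma ** 2)

def softplus(a):
    """log(1 + e^a), a smooth version of the rectifier max{0, a}."""
    # Rewritten as max(a, 0) + log1p(exp(-|a|)) to avoid overflow for large a.
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))

def hard_tanh(a):
    """max(-1, min(1, a)): linear on [-1, 1], clipped outside."""
    return max(-1.0, min(1.0, a))

print(rbf_unit([1.0, 2.0], [1.0, 2.0], sigma=1.0))   # 1.0 at the template
print(softplus(0.0))                                  # log(2)
print([hard_tanh(a) for a in (-3.0, 0.25, 3.0)])      # [-1.0, 0.25, 1.0]
```

Far from the template the RBF output is close to 0, which is the saturation that makes these units hard to optimize.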

Architecture Design
The word architecture refers to the overall structure of the network: how many units
it should have and how these units should be connected to each other.
Most neural networks are organized into groups of units called layers. Most neural network
architectures arrange these layers in a chain structure, with each layer being a function of the layer
that preceded it. In this structure, the first layer is given by
$h^{(1)} = g^{(1)}(W^{(1)\top} x + b^{(1)})$
the second layer is given by
$h^{(2)} = g^{(2)}(W^{(2)\top} h^{(1)} + b^{(2)})$
and so on.
In these chain-based architectures, the main architectural considerations are to
choose the depth of the network and the width of each layer. The ideal network architecture for a
task must be found via experimentation guided by monitoring the validation set error.
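To show how depth and width enter as design choices, the sketch below builds a chain-structured network from a list of layer sizes. The sizes, the random initialization range, and the choice of rectifier hidden units with a linear output are all assumptions made for this example.

```python
import random

def init_layer(n_in, n_out, rng):
    """Random weights W (n_in x n_out) and zero biases b (n_out)."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return W, [0.0] * n_out

def layer_forward(W, b, h_prev, g):
    """h = g(W^T h_prev + b), applied element-wise."""
    return [g(sum(W[i][j] * h_prev[i] for i in range(len(h_prev))) + b[j])
            for j in range(len(b))]

relu = lambda z: max(0.0, z)
identity = lambda z: z

# Hypothetical architecture: 3 inputs, two hidden layers of width 4, 1 output.
sizes = [3, 4, 4, 1]
rng = random.Random(0)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

def forward(x):
    h = x
    for k, (W, b) in enumerate(layers):
        g = identity if k == len(layers) - 1 else relu  # linear output layer
        h = layer_forward(W, b, h, g)
    return h

depth = len(layers)
n_params = sum(len(W) * len(W[0]) + len(b) for W, b in layers)
print(depth, n_params, forward([1.0, 0.0, -1.0]))
```

Changing the entries of `sizes` changes the width of each layer, and adding entries increases the depth; the parameter count follows directly from both.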
Universal Approximation Properties and Depth
A linear model, mapping from features to outputs via matrix multiplication, can by
definition represent only linear functions. It has the advantage of being easy to train because many
loss functions result in convex optimization problems when applied to linear models.

The universal approximation theorem states that a feedforward network with a
linear output layer and at least one hidden layer with any “squashing” activation function (such as
the logistic sigmoid activation function) can approximate any Borel measurable function from one
finite-dimensional space to another with any desired non-zero amount of error, provided that the
network is given enough hidden units.
The universal approximation theorem means that regardless of what function we
are trying to learn, we know that a large MLP will be able to represent this function.
In summary, a feedforward network with a single hidden layer is sufficient to represent any
function, but the layer may be infeasibly large and may fail to learn and generalize correctly. In
many circumstances, using deeper models can reduce the number of units required to represent the
desired function and can reduce the amount of generalization error.

Figure: An intuitive, geometric explanation of the exponential advantage of deeper rectifier
networks
More precisely, the main theorem in Montufar et al. states that the number of linear
regions carved out by a deep rectifier network with d inputs, depth l, and n units per hidden layer is
$O\left(\binom{n}{d}^{d(l-1)} n^d\right)$,
i.e., exponential in the depth l. In the case of maxout networks with k filters per unit, the
number of linear regions is
$O\left(k^{(l-1)+d}\right)$.

Other Architectural Considerations

Many neural network architectures have been developed for specific tasks.
Specialized architectures for computer vision are called convolutional networks. Feedforward
networks may also be generalized to recurrent neural networks for sequence processing.
Many architectures build a main chain but then add extra architectural features to
it, such as skip connections going from layer i to layer i+2 or higher. These skip connections make
it easier for the gradient to flow from output layers to layers nearer the input.
Another key consideration of architecture design is exactly how to connect a pair
of layers to each other. In the default neural network layer described by a linear transformation via
a matrix W, every input unit is connected to every output unit.

Figure: Empirical results showing that deeper networks generalize better when used
to transcribe multi-digit numbers from photographs of addresses.

Back-Propagation and Other Differentiation Algorithms

When we use a feedforward neural network to accept an input x and produce an
output $\hat{y}$, information flows forward through the network. The inputs x provide the initial
information that then propagates up to the hidden units at each layer and finally produces $\hat{y}$. This
is called forward propagation. During training, forward propagation can continue onward until
it produces a scalar cost J(θ). The back-propagation algorithm (Rumelhart et al., 1986a), often
simply called backprop, allows the information from the cost to then flow backwards through
the network, in order to compute the gradient.
The term back-propagation is often misunderstood as meaning the whole learning
algorithm for multi-layer neural networks. Actually, back-propagation refers only to the method
for computing the gradient, while another algorithm, such as stochastic gradient descent, is used
to perform learning using this gradient.
Computational Graphs
To describe the back-propagation algorithm more precisely, it is helpful to have a
more precise computational graph language. Many ways of formalizing computation as graphs are
possible. Here, we use each node in the graph to indicate a variable. The variable may be a scalar,
vector, matrix, tensor, or even a variable of another type. To formalize our graphs, we also need
to introduce the idea of an operation. An operation is a simple function of one or more variables.

Figure: Examples of computational graphs

Chain Rule of Calculus
Back-propagation is an algorithm that computes the chain rule, with a specific order
of operations that is highly efficient. Let x be a real number, and let f and g both be functions
mapping from a real number to a real number. Suppose that y = g(x) and z = f(g(x)) = f(y). Then
the chain rule states that
$\frac{dz}{dx} = \frac{dz}{dy} \frac{dy}{dx}$.
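A quick numerical check makes the rule concrete. The choices g(x) = x² and f(y) = sin(y) are arbitrary examples, and the finite-difference estimate is used only as a sanity check on the analytic product.

```python
import math

g = lambda x: x ** 2          # y = g(x)
f = lambda y: math.sin(y)     # z = f(y)

def dz_dx(x):
    """Chain rule: dz/dx = (dz/dy) * (dy/dx) = cos(x^2) * 2x."""
    y = g(x)
    dz_dy = math.cos(y)
    dy_dx = 2.0 * x
    return dz_dy * dy_dx

def numeric_derivative(fn, x, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)

x0 = 0.7
analytic = dz_dx(x0)
numeric = numeric_derivative(lambda x: f(g(x)), x0)
print(analytic, numeric)
```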
Recursively Applying the Chain Rule to Obtain Backprop
Using the chain rule, it is straightforward to write down an algebraic expression for
the gradient of a scalar with respect to any node in the computational graph that produced that
scalar.
Actually evaluating that expression on a computer raises some extra considerations.
In particular, many subexpressions may be repeated several times within the overall
expression for the gradient. Any procedure that computes the gradient will need to choose whether
to store these subexpressions or to recompute them several times. An example of how these
repeated subexpressions arise is given in figure 6.9.

Figure 6.9: A computational graph that results in repeated subexpressions when computing
the gradient.

The back-propagation algorithm is designed to reduce the number of common
subexpressions without regard to memory.
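The trade-off between storing and recomputing can be illustrated with a chain x = f(w), y = f(x), z = f(y), which is assumed here to be the kind of graph the figure depicts. The choice f = sin and the call counter are illustrative only.

```python
import math

calls = {"f": 0}

def f(v):
    calls["f"] += 1           # count how often f is evaluated
    return math.sin(v)

def f_prime(v):
    return math.cos(v)

def grad_recompute(w):
    """dz/dw = f'(f(f(w))) * f'(f(w)) * f'(w), recomputing f(w) each time."""
    return f_prime(f(f(w))) * f_prime(f(w)) * f_prime(w)

def grad_stored(w):
    """Same product, but x = f(w) and y = f(x) are computed once and reused."""
    x = f(w)
    y = f(x)
    return f_prime(y) * f_prime(x) * f_prime(w)

calls["f"] = 0
a = grad_recompute(0.1)
recompute_calls = calls["f"]

calls["f"] = 0
b = grad_stored(0.1)
stored_calls = calls["f"]

print(a, b, recompute_calls, stored_calls)
```

Storing x costs memory but saves one evaluation of f even in this tiny chain; in deeper graphs the number of repeated evaluations grows quickly.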

Symbol-to-Symbol Derivatives

Algebraic expressions and computational graphs both operate on symbols, or
variables that do not have specific values. These algebraic and graph-based representations are
called symbolic representations. When we actually use or train a neural network, we must assign
specific values to these symbols. We replace a symbolic input to the network x with a specific
numeric value, such as $[1.2, 3.765, -1.8]^\top$.

Figure: An example of the symbol-to-symbol approach to computing derivatives. In
this approach, the back-propagation algorithm does not need to ever access any actual
specific numeric values. Instead, it adds nodes to a computational graph describing how
to compute these derivatives.

Some approaches to back-propagation take a computational graph and a set of
numerical values for the inputs to the graph, then return a set of numerical
values describing the gradient at those input values. We call this approach
“symbol-to-number” differentiation.

Another approach is to take a computational graph and add additional nodes to the
graph that provide a symbolic description of the desired derivatives.

General Back-Propagation

The back-propagation algorithm is very simple. To compute the gradient of some
scalar z with respect to one of its ancestors x in the graph, we begin by observing that the gradient
with respect to z is given by dz/dz = 1. We can then compute the gradient with respect to each
parent of z in the graph by multiplying the current gradient by the Jacobian of the operation that
produced z. We continue multiplying by Jacobians traveling backwards through the graph in this
way until we reach x. For any node that may be reached by going backwards from z through two
or more paths, we simply sum the gradients arriving from different paths at that node.
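The procedure just described can be sketched for the scalar case, where each Jacobian reduces to a single local derivative. The `Node` class and the `add` and `mul` helpers are hypothetical names invented for this example, not part of any particular library.

```python
class Node:
    """A scalar variable in a computational graph."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs (parent_node, local_derivative)
        self.grad = 0.0

def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))

def backprop(z):
    """Accumulate dz/dv into v.grad for every ancestor v of z."""
    # Topological order so a node is processed only after all its consumers.
    order, seen = [], set()
    def visit(n):
        if id(n) not in seen:
            seen.add(id(n))
            for p, _ in n.parents:
                visit(p)
            order.append(n)
    visit(z)
    z.grad = 1.0                       # dz/dz = 1
    for n in reversed(order):
        for p, local in n.parents:
            p.grad += n.grad * local   # sum over all paths reaching p

x = Node(3.0)
y = Node(2.0)
z = add(mul(x, y), mul(x, x))          # z = x*y + x^2, so x is reached by several paths
backprop(z)
print(x.grad, y.grad)  # dz/dx = y + 2x = 8.0, dz/dy = x = 3.0
```

Here x is reached along more than one path from z, and the `+=` in the inner loop is what sums the gradients arriving from those paths.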

More formally, each node in the graph G corresponds to a variable. To achieve
maximum generality, we describe this variable as being a tensor V. Tensors can in general have
any number of dimensions. They subsume scalars, vectors, and matrices.

We assume that each variable V is associated with the following subroutines:

• get_operation(V): This returns the operation that computes V, represented
by the edges coming into V in the computational graph. For example, there may be a Python
or C++ class representing the matrix multiplication operation. Suppose we have a variable
that is created by matrix multiplication, C = AB. Then get_operation(C) returns a pointer to
an instance of that class.
• get_consumers(V, G): This returns the list of variables that are children of
V in the computational graph G.
• get_inputs(V, G): This returns the list of variables that are parents of V
in the computational graph G.

The back-propagation algorithm itself does not need to know any differentiation
rules. It only needs to call each operation’s bprop rules with the right arguments. Formally,
op.bprop(inputs, X, G) must return
$\sum_i (\nabla_X \, \text{op.f}(\text{inputs})_i) \, G_i$

Here, inputs is a list of inputs that are supplied to the operation, op.f is the
mathematical function that the operation implements, X is the input whose gradient we
wish to compute, and G is the gradient on the output of the operation.
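For the matrix multiplication C = AB, this contract works out to G Bᵀ for the gradient with respect to A and Aᵀ G for the gradient with respect to B. The sketch below implements a hypothetical `MatMul` operation with that interface; the class, its helper functions, and the test matrices are assumptions of this example rather than any library's API.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

class MatMul:
    """Hypothetical operation object for C = AB."""
    def f(self, inputs):
        A, B = inputs
        return matmul(A, B)

    def bprop(self, inputs, X, G):
        """Gradient of the downstream scalar w.r.t. input X, given G = dz/dC."""
        A, B = inputs
        if X is A:
            return matmul(G, transpose(B))   # dz/dA = G B^T
        if X is B:
            return matmul(transpose(A), G)   # dz/dB = A^T G
        raise ValueError("X is not an input of this operation")

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.5, -1.0], [2.0, 0.0]]
op = MatMul()
G = [[1.0, 1.0], [1.0, 1.0]]   # gradient of z = sum of all entries of C
grad_A = op.bprop([A, B], A, G)
grad_B = op.bprop([A, B], B, G)
print(grad_A, grad_B)
```

With z defined as the sum of the entries of C, each entry of `grad_A` is a row sum of B and each entry of `grad_B` is a column sum of A, which gives an easy hand check.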

Software implementations of back-propagation usually provide both the operations
and their bprop methods, so that users of deep learning software libraries are able to
back-propagate through graphs built using common operations like matrix multiplication, exponents,
logarithms, and so on. Software engineers who build a new implementation of back-propagation
or advanced users who need to add their own operation to an existing library must usually derive
the op.bprop method for any new operations manually.

Complications

Most software implementations need to support operations that can return more
than one tensor. For example, if we wish to compute both the maximum value in a tensor and the
index of that value, it is best to compute both in a single pass through memory, so it is most efficient
to implement this procedure as a single operation with two outputs.

We have not described how to control the memory consumption of back-propagation.
Back-propagation often involves summation of many tensors together. In the
naive approach, each of these tensors would be computed separately, then all of them would be
added in a second step. The naive approach has an overly high memory bottleneck that can be
avoided by maintaining a single buffer and adding each value to that buffer as it is computed.
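The difference between the two strategies can be sketched as follows. The generator of gradient terms is a made-up stand-in for contributions arriving from different paths, and the count of simultaneously live terms is a simplified proxy for memory use.

```python
def sum_naive(make_terms):
    """Materialize every gradient term first, then add them in a second step."""
    terms = list(make_terms())              # all terms alive at once
    peak_live = len(terms)
    total = [sum(col) for col in zip(*terms)]
    return total, peak_live

def sum_buffered(make_terms, size):
    """Keep one buffer and add each term to it as soon as it is computed."""
    buffer = [0.0] * size
    for term in make_terms():               # one term alive at a time
        for i, v in enumerate(term):
            buffer[i] += v
    return buffer, 1

def make_terms():
    # Hypothetical gradient contributions arriving from 4 different paths.
    for k in range(4):
        yield [float(k), float(2 * k), 1.0]

print(sum_naive(make_terms))
print(sum_buffered(make_terms, size=3))
```

Both functions return the same sum; only the number of terms held in memory at once differs.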

Real-world implementations of back-propagation also need to handle various data
types, such as 32-bit floating point, 64-bit floating point, and integer values. The policy for
handling each of these types requires special care to design.

Some operations have undefined gradients, and it is important to track these
cases and determine whether the gradient requested by the user is undefined.

Various other technicalities make real-world differentiation more
complicated. These technicalities are not insurmountable, and this chapter has described the key
intellectual tools needed to compute derivatives, but it is important to be aware that many more
subtleties exist.

Differentiation outside the Deep Learning Community

The deep learning community has been somewhat isolated from the broader
computer science community and has largely developed its own cultural attitudes concerning how
to perform differentiation. More generally, the field of automatic differentiation is concerned
with how to compute derivatives algorithmically.
The back-propagation algorithm described here is only one approach to automatic
differentiation. It is a special case of a broader class of techniques called reverse mode
accumulation. Other approaches evaluate the subexpressions of the chain rule in different orders.
In general, determining the order of evaluation that results in the lowest computational cost is a
difficult problem. Finding the optimal sequence of operations to compute the gradient is NP-
complete (Naumann, 2008), in the sense that it may require simplifying algebraic expressions into
their least expensive form.
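One example of evaluating the chain rule in a different order is forward mode accumulation, which carries derivatives along with values from inputs to outputs. The sketch below uses a small dual-number class written for this example; the `Dual` class and the test function are illustrative assumptions.

```python
import math

class Dual:
    """Value together with its derivative with respect to one chosen input."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # Product rule, applied as the value is computed.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def sin(d):
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

def f(x, y):
    return sin(x * y) + x * x

# Seed dx/dx = 1 to obtain df/dx in a single forward pass.
out = f(Dual(3.0, 1.0), Dual(2.0, 0.0))
print(out.value, out.deriv)   # derivative is y*cos(x*y) + 2x
```

Forward mode needs one pass per input variable, whereas the reverse mode used by back-propagation needs one pass per output, which is why reverse mode suits a scalar cost with many parameters.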

10 pages
Dynamic and Autonomous Control of Mini Aerial Vehicle Using Model Predictive Control
No ratings yet
Dynamic and Autonomous Control of Mini Aerial Vehicle Using Model Predictive Control
21 pages
Using Pom in Climate and Ocean Analysis Laboratory (Coal) : Prepared by
No ratings yet
Using Pom in Climate and Ocean Analysis Laboratory (Coal) : Prepared by
12 pages
Deep Learning: A Visual Introduction
No ratings yet
Deep Learning: A Visual Introduction
53 pages
Neural Networks and Deep Learning: Deeplearning - Ai-Summary
No ratings yet
Neural Networks and Deep Learning: Deeplearning - Ai-Summary
24 pages
Unit 03 - Neural Networks - MD
No ratings yet
Unit 03 - Neural Networks - MD
24 pages
The Deep Learning Revolution: Introductory Overview Lecture
No ratings yet
The Deep Learning Revolution: Introductory Overview Lecture
35 pages
PWP Practical 13 PDF
No ratings yet
PWP Practical 13 PDF
3 pages
Deep Neural Network
No ratings yet
Deep Neural Network
12 pages
A Guide To Deep Learning and Neural Networks
No ratings yet
A Guide To Deep Learning and Neural Networks
15 pages
SSC CGL Tier - I (General Intelligence and Reasoning) Memory Based
No ratings yet
SSC CGL Tier - I (General Intelligence and Reasoning) Memory Based
4 pages
Tetris Tutorial in C++ Platform Independent Focused in Game Logic For Beginners - Javilop
100% (2)
Tetris Tutorial in C++ Platform Independent Focused in Game Logic For Beginners - Javilop
44 pages
MGMT 221, Ch. II
67% (6)
MGMT 221, Ch. II
33 pages
Finite Element Method
No ratings yet
Finite Element Method
6 pages
Inverted Pendulum State-Space Methods For Controller Design
No ratings yet
Inverted Pendulum State-Space Methods For Controller Design
16 pages
K Mean Clustering
No ratings yet
K Mean Clustering
36 pages
Model Paper Mathematics
100% (1)
Model Paper Mathematics
5 pages
Unit 4
100% (1)
Unit 4
57 pages
Math for Deep Learning: What You Need to Know to Understand Neural Networks
From Everand
Math for Deep Learning: What You Need to Know to Understand Neural Networks
Ronald T. Kneusel
No ratings yet
50 Breakthrough AI Concepts in 500 Words Each: In 500 words, #17
From Everand
50 Breakthrough AI Concepts in 500 Words Each: In 500 words, #17
Nietsnie Trebla
No ratings yet
Deep Learning With Python Illustrated Guide For Beginners & Intermediates: The Future Is Here!: The Future Is Here!, #2
From Everand
Deep Learning With Python Illustrated Guide For Beginners & Intermediates: The Future Is Here!: The Future Is Here!, #2
William Sullivan
1/5 (1)
Attractor Networks: Fundamentals and Applications in Computational Neuroscience
From Everand
Attractor Networks: Fundamentals and Applications in Computational Neuroscience
Fouad Sabry
No ratings yet
Deep Learning: Fundamentals and Applications
From Everand
Deep Learning: Fundamentals and Applications
Fouad Sabry
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Neural Networks AND DEEP Learning Notes-1

Computer science Engineering (Sri Shakthi Institute of Engineering and Technology)

Introduction to Deep Learning

Example of different representations: suppose we want to separate two categories of data

Downloaded by Sai Patibandla (ping2saas145@gmail.com)

Figure 1.2: Illustration of a deep learning model.

Figure 1.3: Illustration of computational graphs mapping an input to an output where

Historical Trends in Deep Learning

The Many Names and Changing Fortunes of Neural Networks

Some of the earliest learning algorithms we recognize today were intended to be

Increasing Dataset Sizes

Increasing Model Sizes

Deep Feedforward Networks

Deep feedforward networks, also often called feedforward neural networks, or

Figure: An example of a feedforward network, drawn in two different styles. Specifically,

Rectified Linear Units and Their Generalizations

Figure: An intuitive, geometric explanation of the exponential advantage of deeper rectifier

Other Architectural Considerations

Back-Propagation and Other Differentiation Algorithms

Figure: Examples of computational graphs

The back-propagation algorithm is designed to reduce the number of common

Figure: An example of the symbol-to-symbol approach to computing derivatives. In

Some approaches to back-propagation take a computational graph and a set of

The back-propagation algorithm is very simple. To compute the gradient of some

We assume that each variable V is associated with the following subroutines:

Software implementations of back-propagation usually provide both the operations

We have not described how to control the memory consumption of back

Real-world implementations of back-propagation also need to handle various data

Some operations have undefined gradients, and it is important to track these

Various other technicalities make real-world differentiation more

Differentiation outside the Deep Learning Community
