Unit 3
Convolutional Neural Networks (CNNs) are a type of deep learning model primarily used for
processing structured, grid-like data such as images. They are particularly powerful for tasks that
involve visual perception. Here's a basic overview of how CNNs work and their key
components:
Convolution Layers:
This layer applies a number of filters to the input. These filters help the network in identifying
various features in the data, such as edges, textures, or specific objects in case of image data.
Each filter convolves across the input data, computing dot products between the entries of the
filter and the input, producing a feature map.
After each convolution operation, a nonlinear layer (such as the rectified linear unit, ReLU) is
applied to introduce nonlinearity into the model, enabling it to learn more complex patterns.
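To make this concrete, here is a minimal sketch in PyTorch (the library choice and the layer sizes are illustrative assumptions, not from the text) of a convolution layer followed by ReLU:

import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)          # one 32x32 RGB image (batch, channels, H, W)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)  # 8 learnable 3x3 filters
feature_maps = torch.relu(conv(x))      # convolve, then apply the ReLU nonlinearity
print(feature_maps.shape)               # torch.Size([1, 8, 30, 30]): one map per filter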
Pooling Layers:
Pooling (also known as subsampling or downsampling) reduces the dimensionality of each
feature map but retains the most important information.
Fully Connected Layers:
After several convolution and pooling layers, the high-level reasoning in the neural network is
done through fully connected layers. Neurons in a fully connected layer have connections to all
activations in the previous layer, and their activations can thus depend on the entire input.
Output Layer:
The final layer, typically a softmax layer, provides the 'output' of the network, which could be a
class label in classification tasks or a set of values in regression.
CNNs learn through backpropagation and an optimization algorithm such as Stochastic Gradient
Descent (SGD) or Adam. During training, the network adjusts its weights to minimize the error
in its predictions compared to the actual data.
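As a rough illustration of that training process, the following PyTorch sketch runs a single SGD step on a toy model with made-up data (the model, sizes, and batch are all hypothetical):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 30 * 30, 10))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 32, 32)      # a stand-in mini-batch of four images
labels = torch.randint(0, 10, (4,))     # stand-in class labels

optimizer.zero_grad()
loss = loss_fn(model(images), labels)   # compare predictions with the actual targets
loss.backward()                         # backpropagation computes the gradients
optimizer.step()                        # SGD adjusts the weights to reduce the error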
Applications of CNNs:
Face Recognition: Identifying and verifying individuals from their facial features.
Image Generation: Creating new images (e.g., deepfakes, artistic generation).
CNNs have revolutionized the field of computer vision due to their efficiency and accuracy in
processing image data. Their ability to learn hierarchical feature representations makes them
particularly suited for complex visual tasks.
A convolutional neural network (CNN) is a type of artificial neural network used primarily for
image recognition and processing, due to its ability to recognize patterns in images. A CNN is a
powerful tool but typically requires large amounts of labeled data for training.
The architecture of a Convolutional Neural Network (CNN) typically consists of a series of layers
designed to process and extract features from input data, such as images. Here's a basic overview
of the typical architecture of a CNN:
Input Layer:
The input layer represents the raw input data, which is usually an image in the case of computer
vision tasks.
The dimensions of the input layer correspond to the dimensions of the input data (e.g., width,
height, and depth for images).
Convolution Layers:
Convolution layers are responsible for learning features from the input data. Each convolution
layer applies a set of filters (also known as kernels) to the input data, creating feature maps that
highlight important patterns in the data. These filters are small spatially (along width and height),
but extend through the full depth of the input volume.
An activation function is applied element-wise to the output of each convolution layer. Common
choices for activation functions include Rectified Linear Unit (ReLU), which introduces non-
linearity into the model.
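For instance, in PyTorch (an illustrative choice), a layer with 16 filters of size 5x5 over an RGB input stores weights whose depth matches the 3 input channels:

import torch.nn as nn

# Each of the 16 filters is 5x5 spatially but spans all 3 input channels,
# so the weight tensor has shape (out_channels, in_channels, kH, kW).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5)
print(conv.weight.shape)   # torch.Size([16, 3, 5, 5])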
Pooling Layers:
Pooling layers downsample the feature maps generated by the convolution layers. Common
pooling operations include max pooling and average pooling, which reduce the spatial
dimensions of the input, helping to reduce computation and overfitting.
Fully Connected Layers:
After several convolution and pooling layers, the high-level reasoning in the neural network is
done through fully connected layers. Neurons in a fully connected layer have connections to all
activations in the previous layer, and their activations can thus depend on the entire input.
Output Layer:
The output layer produces the final output of the network, which could be a class label in
classification tasks or a set of values in regression tasks. The number of neurons in the output
layer depends on the specific task (e.g., the number of classes in classification tasks).
In classification tasks, the softmax activation function is often used in the output layer to convert
raw scores into class probabilities.
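A quick sketch of what softmax does to raw scores (the logit values here are made up):

import torch

logits = torch.tensor([2.0, 1.0, 0.1])     # raw scores from the last layer
probs = torch.softmax(logits, dim=0)       # converted to class probabilities
print(probs, probs.sum())                  # values in (0, 1) that sum to 1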
Loss Function:
A loss function is used to measure the difference between the network's predictions and the
actual target values. Common loss functions include categorical cross-entropy for classification
and mean squared error for regression.
Optimization Algorithm:
An optimization algorithm (e.g., stochastic gradient descent) is used to minimize the loss
function by adjusting the weights of the network during training.
This basic architecture can be customized and extended for specific applications and datasets.
For example, deeper networks with more layers can capture more complex features, but they
also require more computational resources and are prone to overfitting if not trained carefully.
There are also various architectural innovations, like skip connections (e.g., in ResNet) and
attention mechanisms, that have been introduced to improve the performance of CNNs in
different tasks.
What is a CNN?
Convolutional Neural Networks (CNN or ConvNet) are a type of multi-layer neural network that
is meant to discern visual patterns from pixel images. In a CNN, 'convolution' refers to a
mathematical operation: a type of linear operation in which two functions are combined to
produce a third function that expresses how the shape of one is modified by the other. In
simple terms, a small filter matrix is slid over the image matrix, multiplying and summing the
overlapping entries to produce an output that is used to extract information from the image.
A CNN is similar to other neural networks, but because it uses a sequence of convolution
layers, it adds a layer of complexity to the equation. A CNN cannot function without
convolution layers.
The ConvNet's job is to compress the images into a format that is easier to process while
preserving elements that are important for obtaining a decent prediction. This is critical for
designing an architecture that is capable of learning features while also being scalable to large
datasets. A convolutional neural network (ConvNet for short) has three layers which are its
building blocks; let's have a look:
Convolution Layer (CONV): These layers are the foundation of a CNN, and they are in charge of
executing convolution operations. The kernel/filter (a matrix) is the component in this layer that
performs the convolution operation. The kernel slides horizontally and vertically across the
image, moving by the stride, until the complete image is scanned. The kernel is smaller than the
image spatially, but it extends through its full depth. This means that if the image has three
(RGB) channels, the kernel's height and width will be modest spatially, but its depth will span all
three channels. Other than convolution, there is another important part of convolution layers,
known as the nonlinear activation function. The outputs of linear operations like convolution are
passed through a nonlinear activation function. Smooth nonlinear functions such as the sigmoid
or hyperbolic tangent (tanh) were formerly used because they are mathematical representations
of the behavior of biological neurons, but the rectified linear unit (ReLU) is now the most
commonly used nonlinear activation function: f(x) = max(0, x)
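In code, ReLU is a one-liner; a NumPy sketch:

import numpy as np

def relu(x):
    # Rectified linear unit: f(x) = max(0, x), applied element-wise.
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))   # [0. 0. 3.]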
Pooling Layer (POOL): This layer is in charge of reducing dimensionality. It aids in reducing
the amount of computing power required to process the data. Pooling can be divided into two
types: maximum pooling and average pooling. The maximum value from the area covered by the
kernel on the image is returned by max pooling. The average of all the values in the part of the
image covered by the kernel is returned by average pooling.
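The difference between the two pooling types can be seen on a small example (a PyTorch sketch with an illustrative 4x4 input):

import torch
import torch.nn as nn

x = torch.tensor([[[[1., 2., 5., 6.],
                    [3., 4., 7., 8.],
                    [0., 1., 2., 3.],
                    [1., 0., 4., 5.]]]])   # shape (1, 1, 4, 4)

print(nn.MaxPool2d(2)(x))   # max of each 2x2 region -> [[4., 8.], [1., 5.]]
print(nn.AvgPool2d(2)(x))   # mean of each 2x2 region -> [[2.5, 6.5], [0.5, 3.5]]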
Fully Connected Layer (FC): The fully connected layer (FC) works with a flattened input,
which means that each input value is connected to every neuron. The flattened vector is then
passed through a few additional FC layers, where the usual mathematical operations are
performed; the classification procedure takes place at this point. FC layers are frequently found
near the end of CNN architectures, if they are present.
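A minimal sketch of flattening feature maps into a fully connected layer (the sizes are illustrative assumptions):

import torch
import torch.nn as nn

feature_maps = torch.randn(1, 8, 4, 4)    # output of earlier conv/pool layers (assumed sizes)
flat = feature_maps.flatten(start_dim=1)  # shape (1, 128): every value feeds every neuron
fc = nn.Linear(128, 10)                   # fully connected layer, e.g. for 10 classes
scores = fc(flat)
print(scores.shape)                       # torch.Size([1, 10])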
Along with the above layers, there are some additional terms that are part of CNN
architecture.
Activation Function: The last fully connected layer's activation function is frequently distinct
from the others; each task requires the selection of an appropriate activation function. An
activation function used in multiclass classification problems is the softmax function, which
normalizes the real-valued outputs of the last fully connected layer into target class probabilities,
where each value ranges between 0 and 1 and all values sum to 1.
Dropout Layers: The dropout layer is a mask that nullifies some neurons' contributions to the
following layer while leaving all others unchanged. A dropout layer can be applied to the input
vector, nullifying some of its properties; however, it can also be applied to a hidden layer,
nullifying some hidden neurons. Dropout layers are critical in CNN training because they
prevent overfitting on the training data. Without them, the first batch of training data has a
disproportionately large impact on learning, and the learning of features that occur only in
later samples or batches would be suppressed.
Now that you have a good understanding of the building blocks of a CNN, let's have a look at
some of the popular CNN architectures.
LeNet Architecture
The LeNet architecture is simple and modest, making it ideal for teaching the fundamentals of
CNNs. It can even run on a CPU (if your system lacks a decent GPU), making it an excellent
"first CNN." It's one of the first and most extensively used CNN designs, and it has been used to
successfully recognize handwritten digits. The LeNet-5 CNN architecture has seven layers:
three convolution layers, two subsampling layers, and two fully connected layers make up the
layer composition.
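The following PyTorch sketch approximates that layer composition (details such as the original activations and subsampling operations are simplified here, so treat it as an illustration rather than a faithful reproduction):

import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, 5), nn.Tanh(),      # C1: 6 feature maps, 28x28 from a 32x32 input
    nn.AvgPool2d(2),                    # S2: subsampling -> 14x14
    nn.Conv2d(6, 16, 5), nn.Tanh(),     # C3: 16 feature maps, 10x10
    nn.AvgPool2d(2),                    # S4: subsampling -> 5x5
    nn.Conv2d(16, 120, 5), nn.Tanh(),   # C5: third convolution layer, 1x1
    nn.Flatten(),
    nn.Linear(120, 84), nn.Tanh(),      # F6: first fully connected layer
    nn.Linear(84, 10),                  # output: 10 digit classes
)
print(lenet5(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 10])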
AlexNet Architecture
AlexNet's architecture was extremely similar to LeNet's. It was the first convolutional network to
employ the graphics processing unit (GPU) to improve performance. Convolution filters and a
nonlinear activation function termed ReLU (Rectified Linear Unit) are used in each convolution
layer. Max pooling is done using the pooling layers. Due to the presence of fully connected
layers, the input size is fixed. The AlexNet architecture was created with large-scale image
datasets in mind, and it produced state-of-the-art results when it was first released. It has 60
million parameters in all.
VGG Architecture
While prior AlexNet derivatives focused on smaller window sizes and strides in the first
convolution layer, VGG takes a different approach to CNN design. It takes as input a 224×224-
pixel RGB image. To keep the input image size consistent for the ImageNet competition, the
authors cropped out the central 224×224 patch of each image. The receptive field of the
convolution layers in VGG is quite small (3×3). The convolution stride is set at 1 pixel in order
to preserve spatial resolution after convolution. VGG contains three fully connected layers, the
first two of which each have 4096 channels and the third of which has 1000 channels, one for
each class. Due to its adaptability for a variety of tasks, including object detection, the VGG
CNN model serves as a good baseline for many applications in computer vision, although it is
computationally expensive compared with later architectures.
Following are some of the advantages of a Convolutional Neural Network:
It performs parameter sharing and uses special convolution and pooling operations.
CNN models can now run on a wide range of devices, making them broadly appealing.
It finds the relevant features without the need for human intervention.
It can be utilized in a variety of industries to execute key tasks such as facial recognition,
document analysis, climate comprehension, image recognition, and item identification, among
others.
You can extract valuable features from an already-trained CNN, with its learned weights, by
feeding in your own data and tuning the CNN slightly for a specific purpose.
MOTIVATION LAYER
It seems there might be a misunderstanding with the term "motivation layer." In the context of
neural networks, including Convolutional Neural Networks (CNNs), there is no standard layer
referred to as a "motivation layer."
Input Layer: Receives input data, such as images in the case of CNNs.
Convolution Layers: Apply learned filters to the input, producing feature maps.
Pooling Layers: Reduce the spatial dimensions of the input, aiding in feature selection.
Fully Connected Layers: Neurons in these layers have connections to all activations in the
previous layer, performing high-level reasoning.
Output Layer: Produces the final output of the network, which could be a class label in
classification tasks or a set of values in regression tasks.
Figure 61 shows the metamodel of motivational concepts. It includes the actual motivations or
intentions – i.e., goals, principles, requirements, and constraints – and the sources of these
intentions, i.e., stakeholders, drivers, and assessments.
Motivational elements are related to the core elements via the requirement or constraint concept.
It is essential to understand the factors, often referred to as drivers, which influence the
motivational elements. They can originate from either inside or outside the enterprise. Internal
drivers, also called concerns, are associated with stakeholders, which can be some individual
human being or some group of human beings, such as a project team, enterprise, or society.
Examples of such internal drivers are customer satisfaction, compliance with legislation, or
profitability. It is common for enterprises to undertake an assessment of these drivers, e.g., using
a SWOT analysis, in order to respond in the best way.
The actual motivations are represented by
goals, principles, requirements, and constraints. Goals represent some desired result – or end –
that a stakeholder wants to achieve; e.g., increasing customer satisfaction by 10%. Principles and
requirements represent desired properties of solutions – or means – to realize the
goals. Principles are normative guidelines that guide the design of all possible solutions in a
given context. For example, the principle “Data should be stored only once” represents a means
to achieve the goal of “Data consistency” and applies to all possible designs of the organization’s
architecture. Requirements represent formal statements of need, expressed by stakeholders,
which must be met by the architecture or solutions. For example, the requirement “Use a single
CRM system” conforms to the aforementioned principle by applying it to the current
organization’s architecture in the context of the management of customer data.
FILTERS
In the context of Convolutional Neural Networks (CNNs), filters, also known as kernels, play a
crucial role. They are fundamental components used in the convolutional layers to extract
features from the input data, such as images. Here's a detailed look at how filters work in CNNs:
Filters in CNNs are small matrices of weights. These weights are learned during the training
process. Each filter is designed to detect specific features in the input data, such as edges, colors,
textures, or more complex patterns in deeper layers.
The size of a filter is typically much smaller than the size of the input data. Common dimensions
are 3x3, 5x5, or 7x7, but this can vary. Filters have a depth that matches the depth of the input
data. For example, for an RGB image (which has a depth of 3), each filter also has a depth of 3.
During the forward pass, each filter is convolved across the width and height of the input
volume, computing the dot product between the entries of the filter and the input at any position.
As a filter slides over the input data, it produces a 2-dimensional activation map (or feature map)
that gives the responses of that filter at every spatial position.
Feature Maps:
The feature map obtained by convolving a filter represents the presence of the features detected
by that filter across the input. Different filters detect different features, resulting in different
feature maps for the same input.
Learning Process:
Through the process of backpropagation, the CNN adjusts the values of these filters to minimize
the loss function. This learning process enables the filters to become feature detectors, adapting
to extract relevant features for the task at hand.
The stride controls how much the filter moves across the input. A stride of 1 moves the filter one
pixel at a time, while a stride of 2 moves it two pixels, and so on. Padding can be added to the
input volume to control the spatial size of the output volumes, allowing deeper layers to retain a
larger spatial footprint of the input.
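A naive NumPy sketch of this sliding-window computation, with stride and zero padding as parameters (the filter values below are illustrative; deep-learning libraries implement the same operation far more efficiently):

import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Naive 2D cross-correlation (the "convolution" used in CNNs) with stride
    # and zero padding, for illustration only.
    image = np.pad(image, padding)
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)   # dot product of filter and input patch
    return out

edge_filter = np.array([[1., 0., -1.]] * 3)      # a simple vertical-edge detector
image = np.random.rand(6, 6)
print(conv2d(image, edge_filter, stride=2, padding=1).shape)   # (3, 3)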
In deeper layers of the network, filters can detect more complex features, as they receive input
from feature maps created by earlier layers that represent more basic features. The stacking of
convolution layers allows CNNs to learn a hierarchy of features, from simple edges and textures
in early layers to more complex, abstract concepts in deeper layers. In summary, filters are a vital
part of CNN architecture, enabling these networks to automatically and adaptively learn spatial
hierarchies of features from input data, which is a cornerstone of their success in tasks like image
and video recognition, image segmentation, and other computer vision tasks.
What Is a Filter?
A filter is a circuit capable of passing (or amplifying) certain frequencies while attenuating other
frequencies. Thus, a filter can extract important frequencies from signals that also contain
undesirable or irrelevant frequencies. In the field of electronics, there are many practical
applications for filters. Examples include:
Radio communications: Filters enable radio receivers to only "see" the desired signal while
rejecting all other signals (assuming that the other signals have different frequency content).
DC power supplies: Filters are used to eliminate undesired high frequencies (i.e., noise) that are
present on AC input lines. Additionally, filters are used on a power supply's output to reduce
ripple.
Audio electronics: A crossover network is a network of filters used to channel low-frequency
audio to woofers, mid-range frequencies to midrange speakers, and high-frequency sounds to
tweeters.
Analog-to-digital conversion: Filters are placed in front of an ADC input to minimize aliasing.
Four Major Types of Filters
The four primary types of filters include the low-pass filter, the high-pass filter, the band-pass
filter, and the notch filter (or the band-reject or band-stop filter). Take note, however, that the
terms "low" and "high" do not refer to any absolute values of frequency, but rather, they are
relative values with respect to the cutoff frequency.
Figure 1 gives a general idea of how each of these four filters works.
Note: A notch filter is a band-stop filter with a narrow bandwidth. Notch filters are used to
attenuate a narrow range of frequencies. Below are some technical terms that are commonly used
when describing filter response curves:
-3 dB frequency (f3dB). This term, pronounced "minus 3 dB frequency," corresponds to the
input frequency that causes the output signal to drop by -3 dB relative to the input signal. The
-3 dB frequency is also referred to as the cutoff frequency. It is the frequency at which the output
power is reduced by one-half (which is why this frequency is also called the "half-power
frequency"), or at which the output voltage is the input voltage multiplied by 1/√2. For low-pass
and high-pass filters, there is only one -3 dB frequency. However, there are two -3 dB
frequencies for band-pass and notch filters, normally referred to as f1 and f2.
Center frequency (f0). The center frequency, a term used for band-pass and notch filters, is a
central frequency between the upper and lower cutoff frequencies. The center frequency is
commonly defined as the arithmetic mean, f0 = (f1 + f2)/2, or the geometric mean,
f0 = √(f1 × f2), of the lower and upper cutoff frequencies.
Bandwidth (β or B.W.). The bandwidth is the width of the pass band, and the pass band is the
band of frequencies that do not experience significant attenuation when moving from the input of
the filter to the output of the filter.
Stop band frequency (fs). This is a particular frequency at which the attenuation reaches a
specified value.
For low-pass and high-pass filters, frequencies beyond the stop band frequency are referred to as
the stop band. For band-pass and notch filters, two stop band frequencies exist. The frequencies
between these two stop band frequencies are referred to as the stop band.
Quality factor (Q): The quality factor of a filter conveys its damping characteristics. In the time
domain, damping corresponds to the amount of oscillation in the system’s step response. In the
frequency domain, higher Q corresponds to more (positive or negative) peaking in the system’s
magnitude response. For a band-pass or notch filter, Q represents the ratio between the center
frequency and the -3 dB bandwidth (i.e., the distance between f1 and f2).
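As a quick worked example (the cutoff frequencies are made up), these quantities can be computed directly:

import math

# Descriptive quantities for a hypothetical band-pass filter
# with cutoff frequencies f1 = 900 Hz and f2 = 1100 Hz.
f1, f2 = 900.0, 1100.0
bandwidth = f2 - f1                    # pass-band width (Hz)
f0_arith = (f1 + f2) / 2               # center frequency, arithmetic mean
f0_geom = math.sqrt(f1 * f2)           # center frequency, geometric mean
Q = f0_arith / bandwidth               # quality factor: center frequency over -3 dB bandwidth
print(bandwidth, f0_arith, round(f0_geom, 1), Q)   # 200.0 1000.0 995.0 5.0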
PARAMETER SHARING
Basic Concept:
In a CNN, instead of having unique weights for every pixel in the input data, a convolution layer
uses the same filter (set of weights) across the entire input. This is what is meant by parameter
sharing. This filter is convolved across the width and height of the input image, or feature map,
applying the same weights at each position.
Efficiency in Learning:
Parameter sharing dramatically reduces the number of free parameters compared to a fully
connected layer, where each input pixel would be connected to each neuron with a unique
weight. This efficiency makes CNNs particularly suitable for high-dimensional inputs like
images.
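A concrete comparison in PyTorch (the sizes are illustrative assumptions): a small convolution layer versus a fully connected layer mapping the same input to an output of similar spatial extent:

import torch.nn as nn

# Parameter sharing: a conv layer reuses the same small filters everywhere,
# while a fully connected layer needs a unique weight per input-output pair.
conv = nn.Conv2d(3, 16, kernel_size=3)       # 16*(3*3*3) + 16 biases = 448 parameters
fc = nn.Linear(3 * 32 * 32, 16 * 30 * 30)    # a 32x32 RGB input to 16 30x30 maps

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(conv), count(fc))   # 448 vs 44,251,200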
Detection of Features Regardless of Position:
Since the same filter is applied across the entire input, the network can detect a feature regardless
of its position in the input image. For example, if a filter learns to recognize an edge in one part
of the image, it can recognize the same edge in a different part of the image. This property is
known as translation invariance.
By having fewer parameters, the risk of overfitting (where the model learns the noise in the
training data instead of the actual pattern) is reduced. This makes CNNs more generalizable to
new, unseen data.
Despite using shared parameters, CNNs can learn hierarchies of increasingly complex features.
Lower layers might learn simple features like edges and textures, while higher layers learn more
complex features like patterns or object parts.
During backpropagation, the gradients from all positions where a filter was applied are summed
up, and this cumulative gradient is used to update the filter weights. This process takes into
account how the filter performed across the entire input.
Parameter sharing is one reason why CNNs can afford to be deep (have many layers); the
number of parameters does not explode with the addition of more layers. In summary, parameter
sharing in CNNs is an efficient way to learn features from images and other high-dimensional
data. It allows the network to be both deep and computationally efficient while also being robust
to overfitting, making it ideal for tasks in computer vision and related areas.
REGULARIZATION
Regularization in machine learning and deep learning is a technique used to prevent overfitting,
where a model performs well on training data but poorly on unseen data. Overfitting often occurs
in complex models with a large number of parameters, such as deep neural networks.
Regularization techniques aim to simplify the model to make it more generalizable. Here are
some common regularization methods:
L1 and L2 Regularization:
These are the most common forms of regularization. They work by adding a penalty term to the
loss function.
L1 Regularization (Lasso): Adds the absolute value of the magnitude of the coefficients as the
penalty term to the loss function. It can lead to feature selection as some weights can become
zero.
L2 Regularization (Ridge): Adds the squared magnitude of the coefficients as the penalty term.
It generally leads to smaller and distributed weight values but doesn’t set them to zero.
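A sketch of adding these penalties to a loss in PyTorch (the model, the placeholder task loss, and the regularization strength are all illustrative; PyTorch's SGD also offers a weight_decay option that implements the L2 penalty):

import torch
import torch.nn as nn

model = nn.Linear(10, 1)                 # any model; a linear layer keeps the sketch short
data_loss = torch.tensor(0.0)            # placeholder for the usual task loss
lam = 1e-4                               # regularization strength (a tunable hyperparameter)

l1 = sum(p.abs().sum() for p in model.parameters())   # L1: sum of |w|, encourages sparsity
l2 = sum(p.pow(2).sum() for p in model.parameters())  # L2: sum of w^2, shrinks weights
loss = data_loss + lam * l1 + lam * l2   # penalty terms added to the loss function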
Dropout:
Dropout is a widely used regularization technique in neural networks, especially deep neural
networks. During training, dropout randomly 'drops' (sets to zero) a proportion of the neurons in
a layer, forcing the network to learn redundant representations and preventing reliance on any
one feature. At test time, dropout is not applied; instead, the outputs (or weights) are scaled by
the keep probability so that the expected activations match those seen during training.
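A small PyTorch sketch of this behavior; note that modern frameworks typically use "inverted dropout," which scales the surviving activations up during training instead of scaling down at test time:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)    # each neuron is zeroed with probability 0.5 during training
x = torch.ones(1, 8)

drop.train()
print(drop(x))    # roughly half the entries are 0; survivors are scaled by 1/(1-p)
drop.eval()
print(drop(x))    # at test time dropout is disabled; the input passes through unchanged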
Early Stopping:
Early stopping involves halting the training process before the model begins to overfit. This is
typically done by monitoring the model's performance on a validation set and stopping the
training when the performance on the validation set starts to degrade.
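A minimal sketch of the early-stopping logic over a pre-recorded validation-loss curve (the numbers are made up; in practice each value would come from evaluating the model after an epoch):

val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.58, 0.60, 0.59]
best_val, patience, bad_epochs = float("inf"), 2, 0
for epoch, val_loss in enumerate(val_losses):
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0   # improvement: reset the counter
    else:
        bad_epochs += 1                      # validation performance degraded
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}, best val loss {best_val}")
            break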
Data Augmentation:
Data augmentation expands the training set with label-preserving transformations of the existing
data (e.g., flips, crops, rotations, small color shifts). Exposing the model to these variations acts
as a regularizer, since the network cannot simply memorize individual training images.
Batch Normalization:
While primarily used to help in faster convergence of the training process, batch normalization
can also have a regularizing effect. It normalizes the output of a previous activation layer by
subtracting the batch mean and dividing by the batch standard deviation.
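The normalization step itself is a couple of lines (a PyTorch sketch; a full BatchNorm layer additionally learns a per-feature scale and shift):

import torch

x = torch.randn(32, 8)                            # 32 samples, 8 activations each
mean, var = x.mean(dim=0), x.var(dim=0, unbiased=False)
x_hat = (x - mean) / torch.sqrt(var + 1e-5)       # subtract batch mean, divide by batch std
print(x_hat.mean(dim=0).abs().max())              # ~0: each feature is now zero-mean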
Noise Injection:
Adding noise to inputs or hidden layers during training can also act as a form of regularization,
forcing the network to learn more robust features.
Ensemble Methods:
Techniques like bagging and boosting, where multiple models are trained and their predictions
are combined, can also be seen as forms of regularization as they generally lead to more robust
and generalized models. The choice of regularization technique(s) can depend on the specific
problem, the type of model being used, and the nature of the dataset. It's often beneficial to
experiment with different methods and combinations to find what works best for a particular
scenario.
Several Convolutional Neural Network (CNN) architectures have gained popularity, especially in
the field of computer vision, due to their outstanding performance in tasks like image
classification, object detection, and more. Here's a brief overview of some of the most influential
and widely used CNN architectures:
LeNet-5:
Developed by Yann LeCun in the late 1990s, LeNet-5 is one of the earliest CNN architectures.
Primarily used for handwritten digit recognition (e.g., the MNIST dataset), it consists of
convolution layers followed by subsampling (pooling) layers, and fully connected layers.
AlexNet:
Designed by Alex Krizhevsky and published in 2012, AlexNet significantly advanced the field
of deep learning, particularly in image classification. It features deeper layers compared to LeNet
and introduced key concepts such as ReLU activations and dropout for regularization.
VGG:
Developed by the Visual Geometry Group at Oxford (hence VGG), this model was a runner-up in
the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) 2014. VGG is known for its
simplicity, using only 3x3 convolution layers stacked on top of each other in increasing depth,
and was one of the first to show that depth is a critical component of a good model.
GoogLeNet (Inception v1):
Introduced in 2014, GoogLeNet (or Inception v1) won the ILSVRC 2014. It introduced the
concept of the "Inception module," which dramatically reduced the number of parameters in the
network (compared to AlexNet and VGG).
ResNet:
Developed by Microsoft Research, ResNet won the ILSVRC 2015. It introduced residual blocks,
allowing the training of extremely deep networks (up to 152 layers) by using skip connections to
mitigate the vanishing gradient problem.
Xception:
An extension of the Inception architecture, it replaces Inception modules with depthwise
separable convolutions. Xception stands for "Extreme Inception" and was shown to outperform
Inception modules on multiple benchmarks.
Inception-v3 and v4:
These are further improvements on the Inception model, introducing more efficient and
sophisticated Inception modules.
DenseNet:
Similar to ResNet, DenseNet also makes use of skip connections. However, instead of summing
outputs from previous layers, DenseNet concatenates them, leading to a more densely connected
network.
MobileNets:
Designed for mobile and embedded vision applications, MobileNets use depthwise separable
convolutions to build lightweight deep neural networks.
EfficientNet:
EfficientNet, a more recent architecture, scales up CNNs in a more structured manner, using a
compound coefficient to scale depth, width, and resolution uniformly.
These architectures have been influential in pushing the boundaries of what's possible in
computer vision and have also inspired many variations and improvements. They serve as both
practical solutions for real-world applications and as foundational models for further research in
the field.
RESNET
ResNet, short for Residual Network, is a type of deep neural network architecture that is
designed to address the problem of vanishing gradients in very deep networks. It was introduced
by Kaiming He et al. in their paper "Deep Residual Learning for Image Recognition" in 2015. The
key innovation of ResNet is the use of residual connections, which allow the network to learn
residual functions with respect to the input instead of learning the desired underlying mapping
directly. This is achieved by adding shortcut connections that skip one or more layers, allowing
the network to bypass the usual forward propagation path and directly propagate the input to
deeper layers. This helps in mitigating the vanishing gradient problem and enables the training of
very deep networks (hundreds of layers) effectively. ResNet has been widely adopted in various
computer vision tasks, especially for image classification, where it has achieved state-of-the-art
performance on benchmark datasets like ImageNet. It has also been used as a backbone
architecture for other tasks such as object detection, semantic segmentation, and more, due to its
effectiveness in learning hierarchical features from visual data.
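A minimal sketch of a residual block in PyTorch (batch normalization, which real ResNet blocks include, is omitted here for brevity):

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # A basic residual block sketch: output = F(x) + x (the shortcut connection).
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        return torch.relu(out + x)   # skip connection: the input bypasses both conv layers

block = ResidualBlock(16)
print(block(torch.randn(1, 16, 8, 8)).shape)   # torch.Size([1, 16, 8, 8])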
ALEXNET APPLICATIONS
AlexNet is a deep convolutional neural network architecture that gained significant attention
after winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. Here
are some applications and uses of AlexNet:
Image Classification: AlexNet was originally designed for image classification tasks, where it
achieved state-of-the-art performance at the time of its introduction. It can be used to classify
images into various categories, such as identifying objects in photographs.
Object Detection: The architecture of AlexNet can also be adapted for object detection tasks,
where the goal is not only to classify the objects in an image but also to locate and outline them.
This is commonly used in applications like self-driving cars, surveillance, and augmented reality.
Feature Extraction: The convolution layers of AlexNet can be used as a feature extractor. By
removing the fully connected layers and using the output of the last convolution layer, known as
the "bottleneck features," AlexNet can be used to extract features from images. These features
can then be used as inputs to other machine learning models for various tasks.
Medical Image Analysis: AlexNet and similar convolutional neural network architectures have
been applied to medical image analysis tasks, such as identifying diseases from medical images
like X-rays, MRIs, and CT scans. The ability of deep learning models to learn complex patterns
in images makes them well-suited for such tasks.
Natural Language Processing (NLP): While AlexNet is primarily designed for image-related
tasks, its underlying principles, especially the use of convolution layers, have inspired
architectures in NLP tasks such as text classification and sentiment analysis. The idea of using
deep learning for feature extraction and hierarchical representation learning has been influential
across various domains.
Overall, Alex Net's impact extends beyond image classification, influencing the development of
deep learning architectures and their applications in a wide range of fields.
Alex Net. The architecture consists of eight layers: five convolution layers and three fully-
connected layers. But this isn’t what makes Alex Net special; these are some of the features used
that are new approaches to convolution neural networks:
ReLU Nonlinearity. AlexNet uses Rectified Linear Units (ReLU) instead of the tanh function,
which was standard at the time. ReLU's advantage is in training time; a CNN using ReLU was
able to reach a 25% error rate on the CIFAR-10 dataset six times faster than a CNN using tanh.
Multiple GPUs. Back in the day, GPUs were still rolling around with 3 gigabytes of memory
(nowadays that would be a rookie number). This was especially bad because the training set had
1.2 million images. AlexNet allows for multi-GPU training by putting half of the model's
neurons on one GPU and the other half on another GPU. Not only does this mean that a bigger
model can be trained, but it also cuts down on the training time.
Overlapping Pooling. CNNs traditionally “pool” outputs of neighboring groups of neurons with
no overlapping. However, when the authors introduced overlap, they saw a reduction in error by
about 0.5% and found that models with overlapping pooling generally find it harder to overfit.
The Overfitting Problem. AlexNet had 60 million parameters, a major issue in terms of
overfitting. Two methods were employed to reduce overfitting:
Data Augmentation. The authors used label-preserving transformations to make their data more
varied. Specifically, they generated image translations and horizontal reflections, which
increased the size of the training set by a factor of 2048. They also performed Principal
Component Analysis (PCA) on the RGB pixel values to change the intensities of the RGB
channels, which reduced the top-1 error rate by more than 1%.
Dropout. This technique consists of “turning off” neurons with a predetermined probability (e.g.
50%). This means that every iteration uses a different sample of the model’s parameters, which
forces each neuron to have more robust features that can be used with other random neurons.
However, dropout also increases the training time needed for the model’s convergence.
The Results. On the 2010 version of the ImageNet competition, the best model achieved 47.1%
top-1 error and 28.2% top-5 error. AlexNet vastly outpaced this with a 37.5% top-1 error and a
17.0% top-5 error. AlexNet is able to recognize off-center objects, and most of its top five
classes for each image are reasonable. AlexNet won the 2012 ImageNet competition with a
top-5 error rate of 15.3%, compared to the second-place top-5 error rate of 26.2%.
[Figure: AlexNet's most probable labels on eight ImageNet images, with the correct label
written under each image and the assigned probabilities shown as bars. Image credit:
Krizhevsky et al., the original authors of the AlexNet paper.]
What Now? AlexNet is an incredibly powerful model capable of achieving high accuracies on
very challenging datasets. However, removing any of the convolution layers will drastically
degrade AlexNet's performance. AlexNet was a leading architecture for object-recognition tasks
and has had huge applications in the computer vision sector of artificial intelligence. As a
milestone in making deep learning more widely applicable, AlexNet can also be credited with
bringing deep learning to adjacent fields such as natural language processing and medical image
analysis.