
Convolutional Neural Networks

A Convolutional Neural Network (CNN) is a type of Deep Learning neural network architecture commonly used in Computer Vision. Computer vision is a field of Artificial Intelligence that enables a computer to understand and interpret images and other visual data.
When it comes to Machine Learning, Artificial Neural Networks perform really well, and they are applied to many kinds of data, such as images, audio, and text. Different types of Neural Networks are used for different purposes: for predicting a sequence of words we use Recurrent Neural Networks, more precisely an LSTM, while for image classification we use Convolutional Neural Networks. In this blog, we are going to walk through the basic building blocks of a CNN.
In a regular Neural Network, there are three types of layers:
• Input layer: the layer through which we give input to our model. The number of neurons in this layer equals the number of features in our data (the number of pixels, in the case of an image).
• Hidden layers: the input from the input layer is fed into the hidden layers. There can be many hidden layers, depending on our model and the size of the data, and each hidden layer can have a different number of neurons, generally greater than the number of features. The output of each layer is computed by multiplying the output of the previous layer by that layer's learnable weights, adding learnable biases, and then applying an activation function, which makes the network nonlinear (see the sketch after this list).
• Output layer: the output of the last hidden layer is fed into a logistic function such as sigmoid or softmax, which converts the raw output for each class into a probability score.
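
To make the hidden-layer computation above concrete, here is a minimal NumPy sketch of one hidden layer followed by a softmax output layer. The layer sizes, the ReLU choice, and the random weights are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(784)                            # input features, e.g. a flattened 28x28 image

# Hidden layer: multiply by learnable weights, add learnable biases, apply an activation
W1 = rng.standard_normal((128, 784)) * 0.01    # learnable weights
b1 = np.zeros(128)                             # learnable biases
h = np.maximum(0.0, W1 @ x + b1)               # ReLU activation makes the network nonlinear

# Output layer: logits passed through softmax to get a probability per class
W2 = rng.standard_normal((10, 128)) * 0.01
b2 = np.zeros(10)
logits = W2 @ h + b2
probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # probability score for each of the 10 classes
```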

The architecture of a ConvNet is analogous to the connectivity pattern of neurons in the human brain and was inspired by the organization of the visual cortex. Individual neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. A collection of such fields overlaps to cover the entire visual area.
A ConvNet is able to successfully capture the spatial dependencies in an image through the application of relevant filters. The architecture fits an image dataset better because of the reduced number of parameters involved and the reusability of weights: the same small filter is applied across the whole image. In other words, the network can be trained to understand the sophistication of the image better.
The objective of the convolution operation is to extract features such as edges from the input image. ConvNets need not be limited to only one convolutional layer. Conventionally, the first conv layer is responsible for capturing low-level features such as edges, colors, and gradient orientations. With added layers, the architecture adapts to high-level features as well, giving us a network with a holistic understanding of the images in the dataset, similar to our own.
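
To see why parameter reduction and weight reuse matter, here is a rough back-of-the-envelope comparison. The image size, hidden-unit count, and filter shape are assumptions chosen just for illustration:

```python
# Fully connected: every pixel of a 224x224 RGB image connects to every hidden unit.
height, width, channels = 224, 224, 3
hidden_units = 1000
fc_params = height * width * channels * hidden_units                        # ~150 million weights

# Convolutional: one small 3x3 filter bank is reused at every spatial position.
kernel, in_channels, out_channels = 3, 3, 64
conv_params = kernel * kernel * in_channels * out_channels + out_channels   # ~1.8 thousand weights

print(fc_params, conv_params)   # 150528000 vs. 1792
```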

How do convolutional neural networks work?

Convolutional neural networks are distinguished from other neural networks by their superior
performance with image, speech, or audio signal inputs. They have three main types of layers,
which are:

 Convolutional layer
 Pooling layer
 Fully-connected (FC) layer

The convolutional layer is the first layer of a convolutional network. While convolutional layers can be followed by additional convolutional layers or pooling layers, the fully-connected layer is the final layer. With each layer, the CNN increases in complexity, identifying greater portions of the image. Earlier layers focus on simple features, such as colors and edges. As the image data progresses through the layers of the CNN, the network starts to recognize larger elements or shapes of the object until it finally identifies the intended object.
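
As a small illustration of how the representation shrinks while later layers see larger portions of the image, the sketch below simply tracks the spatial size of the feature maps through a few convolution-plus-pooling blocks. The 64x64 input and the 3x3 "same" convolutions are assumptions made for the example:

```python
def conv_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a convolution layer."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Spatial output size of a pooling layer."""
    return (size - window) // stride + 1

size = 64                       # start from a 64x64 input image
for block in range(3):          # three conv + pool blocks
    size = conv_out(size)       # a 'same' convolution keeps the spatial size
    size = pool_out(size)       # 2x2 pooling halves it
    print(f"after block {block + 1}: {size}x{size}")
# after block 1: 32x32, after block 2: 16x16, after block 3: 8x8
```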

1. Convolutional layer

The convolutional layer is the core building block of a CNN, and it is where the majority of
computation occurs. It requires a few components, which are input data, a filter, and a feature
map. Let’s assume that the input will be a color image, which is made up of a matrix of pixels in
3D. This means that the input will have three dimensions—a height, width, and depth—which
correspond to RGB in an image. We also have a feature detector, also known as a kernel or a
filter, which will move across the receptive fields of the image, checking if the feature is
present. This process is known as a convolution.

A second convolutional layer can follow the initial convolutional layer, so that later filters operate on the feature maps produced by earlier ones. Convolution is the first step in extracting features from an input image: it preserves the relationship between pixels by learning image features from small squares of input data. Mathematically, it is an operation that takes two inputs, the image matrix and a filter or kernel, and produces a feature map.
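
A minimal sketch of that operation, sliding one filter over a single-channel image with stride 1 and no padding; the toy image and the vertical-edge filter are made up for the example:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` and compute the feature map (valid convolution, stride 1)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise product of the receptive field and the filter, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
edge_filter = np.array([[1., 0., -1.],
                        [1., 0., -1.],
                        [1., 0., -1.]])            # a simple vertical-edge detector
feature_map = conv2d(image, edge_filter)           # shape (3, 3)
```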
2. Pooling layer

Like the convolutional layer, the pooling layer also sweeps a filter across its input, but unlike the convolutional filter this one has no weights: it simply aggregates the values inside each window. The pooling layer therefore reduces the size of the representation and also results in some information loss. On the positive side, it reduces complexity and improves the efficiency of the CNN.

The pooling layer is responsible for reducing the spatial size of the convolved feature, which decreases the computational power required to process the data. There are two common types of pooling: average pooling (take the mean of each window) and max pooling (take the maximum). I've only worked with max pooling so far, and I haven't run into any difficulties with it.
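
A minimal NumPy sketch of max pooling with a 2x2 window and stride 2, which are the usual defaults rather than values given in the text:

```python
import numpy as np

def max_pool2d(feature_map, size=2, stride=2):
    """Take the maximum of each size x size window, shrinking the spatial dimensions."""
    h, w = feature_map.shape
    out_h, out_w = (h - size) // stride + 1, (w - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size,
                                 j * stride:j * stride + size]
            out[i, j] = window.max()   # keep only the strongest activation in the window
    return out

fm = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool2d(fm)     # shape (2, 2): [[ 5.,  7.], [13., 15.]]
```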
3. Fully connected layer

In the layer we call the FC layer, we flatten our feature-map matrix into a vector and feed it into a fully connected layer, just like in a regular neural network.

The feature map matrix is converted into a vector (x1, x2, x3, …). With the fully connected layers, we combine these features together to create a model. Finally, we apply an activation function such as softmax or sigmoid to classify the outputs as cat, dog, car, truck, etc.
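
In the same NumPy style as the earlier sketches, flattening a pooled feature map and passing it through a fully connected layer with softmax might look like this. The feature-map shape, the weight values, and the four-class setup (cat, dog, car, truck) are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

pooled = rng.random((2, 2, 8))               # a toy pooled feature map (height x width x channels)
x = pooled.flatten()                         # the vector (x1, x2, x3, ...), length 32

W = rng.standard_normal((4, x.size)) * 0.1   # 4 output classes, e.g. cat/dog/car/truck
b = np.zeros(4)
logits = W @ x + b

probs = np.exp(logits - logits.max())
probs /= probs.sum()                         # softmax: one probability per class
predicted_class = int(np.argmax(probs))
```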
Summary

• Provide the input image to the convolutional layer.
• Choose the parameters and apply filters with strides and padding if required. Perform convolution on the image and apply the ReLU activation to the resulting matrix.
• Perform pooling to reduce the dimensionality.
• Add as many convolutional layers as needed.
• Flatten the output and feed it into a fully connected (FC) layer.
• Output the class using an activation function such as softmax (effectively logistic regression with a suitable cost function) to classify the image.
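
Putting the summary together, a minimal Keras version of this pipeline could look like the sketch below, assuming TensorFlow is available; the input shape, filter counts, and number of classes are illustrative, not taken from the text:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 3)),                         # input image (height, width, RGB)
    layers.Conv2D(32, 3, padding="same", activation="relu"),   # convolution + ReLU
    layers.MaxPooling2D(2),                                    # pooling to reduce dimensionality
    layers.Conv2D(64, 3, padding="same", activation="relu"),   # add more conv layers as needed
    layers.MaxPooling2D(2),
    layers.Flatten(),                                          # flatten the feature maps
    layers.Dense(128, activation="relu"),                      # fully connected layer
    layers.Dense(4, activation="softmax"),                     # class probabilities (e.g. cat/dog/car/truck)
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()
```

Calling model.summary() prints the layer output shapes and parameter counts, which is a quick way to check the conv-pool-flatten-dense structure described above.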
