
UNIT-III

APPLICATIONS OF DEEP LEARNING TO COMPUTER VISION
Computer Vision
• Computer vision is a field of AI that enables computers to interpret
and analyze the visual world, simulating the way humans see and
understand their environment.
Deep Learning has been used in the
following computer vision problems:
1. Image Classification
2. Image Classification With Localization
3. Object Detection
4. Object Segmentation
5. Image Style Transfer
6. Image Colorization
7. Image Reconstruction
8. Image Super-Resolution
9. Image Synthesis
10. Other Problems
Image segmentation
• One of the most important operations in Computer
Vision is Segmentation.
• Image segmentation is the process of dividing an
image into multiple parts or regions whose pixels belong to the
same class. This grouping is based on specific
criteria, for example, color or texture.
• This process is also called pixel-level classification. In
other words, it involves partitioning images (or video
frames) into multiple segments or objects.
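To make "pixel-level classification" concrete, here is a tiny Python/NumPy sketch (the labels and values are made up) showing that a segmentation result is simply one class label per pixel:

```python
# Pixel-level classification in miniature: the segmentation of a 4x4 "image"
# is just an integer class label for every pixel (labels here are made up).
import numpy as np

mask = np.array([[0, 0, 1, 1],      # 0 = background
                 [0, 1, 1, 1],      # 1 = object A
                 [2, 2, 1, 1],      # 2 = object B
                 [2, 2, 0, 0]])
print((mask == 1).sum(), "pixels belong to object A")   # 7 pixels
```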
The Deep Learning Approach to Image
Segmentation

• In the last 40 years, various segmentation methods have been
proposed, ranging from MATLAB image segmentation and traditional
computer vision methods to state-of-the-art deep learning methods.
Especially with the emergence of Deep Neural Networks (DNNs), image
segmentation applications have made tremendous progress.
• Deep learning is a powerful technique for image segmentation. Deep
learning algorithms automatically extract features from the data, which
can then be used to segment it. Deep learning models can learn complex
characteristics that are difficult to specify manually.
• Convolutional neural networks (CNNs), fully convolutional networks (FCNs),
and recurrent neural networks (RNNs) are among the deep learning architectures
that may be utilized for image segmentation. Each architecture
has its own set of benefits and drawbacks.
Figure: semantic image segmentation for driving cars – Source: sample from the Mapillary Vistas Dataset
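As a rough illustration of the CNN-based approach, the following is a minimal sketch in PyTorch (the layer sizes are illustrative assumptions, not any particular published architecture): strided convolutions downsample the image, upsampling restores the input resolution, and a final 1x1 convolution predicts a class score per pixel.

```python
# A minimal fully convolutional sketch for pixel-level classification:
# downsample with strided convolutions, upsample back to input resolution,
# and predict one score map per class. All sizes are illustrative.
import torch
import torch.nn as nn

num_classes = 5
net = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),   # 1/2 resolution
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 1/4 resolution
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
    nn.Conv2d(64, num_classes, 1),                         # per-class scores
)

image = torch.randn(1, 3, 64, 64)                          # dummy RGB image
scores = net(image)                                        # (1, 5, 64, 64)
mask = scores.argmax(dim=1)                                # class per pixel
print(mask.shape)                                          # torch.Size([1, 64, 64])
```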
Image Segmentation Techniques

• There are various image segmentation techniques available, and each
technique has its own advantages and disadvantages.
• Thresholding: Thresholding is one of the simplest image segmentation
techniques: a threshold value is set, and all pixels with intensity
values above or below the threshold are assigned to separate regions
(a minimal sketch appears after this list).
• Region growing: In region growing, the image is divided into several
regions based on similarity criteria. This segmentation technique starts
from a seed point and grows the region by adding neighboring pixels with
similar characteristics.
• Edge-based segmentation: Edge-based segmentation techniques are
based on detecting edges in the image. These edges represent boundaries
between different regions and are detected using edge detection
algorithms.
• Clustering: Clustering techniques group pixels into clusters based on
similarity criteria. These criteria can be color, intensity, texture, or any
other feature.
• Watershed segmentation: Watershed segmentation is based on the
idea of flooding an image from its minima. In this technique, the
image is treated as a topographic relief, where the intensity values
represent the height of the terrain.
• Active contours: Active contours, also known as snakes, are curves
that deform to find the boundary of an object in an image. These
curves are controlled by an energy function that minimizes the
distance between the curve and the object boundary.
• Deep learning-based segmentation: Deep learning techniques, such
as Convolutional Neural Networks (CNNs), have revolutionized image
segmentation by providing highly accurate and efficient solutions.
• Graph-based segmentation: This technique represents an image as a
graph and partitions the image based on graph theory principles.
• Superpixel-based segmentation: This technique groups sets of
similar image pixels together to form larger, more meaningful regions
called superpixels.
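As noted in the thresholding item above, here is a minimal sketch of the simplest technique in this list, using plain NumPy (the image and the threshold value are made up):

```python
# Threshold-based segmentation: pixels above a fixed intensity threshold go
# to one region (foreground), the rest to another (background).
import numpy as np

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(8, 8))   # stand-in grayscale image

threshold = 128
mask = gray > threshold                     # True = foreground region
print(mask.astype(np.uint8))                # 1 = above, 0 = below threshold
```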
Applications of Image Segmentation

• Image segmentation problems play a central role in a broad range of real-world
computer vision applications, including road sign detection, biology,
the evaluation of construction materials, and video security and surveillance.
• Also, autonomous vehicles and Advanced Driver Assistance Systems (ADAS)
need to detect navigable surfaces or apply pedestrian detection.
• Furthermore, image segmentation is widely applied in medical imaging
applications, such as tumor boundary extraction or measurement of tissue
volumes. Here, an opportunity is to design standardized image databases
that can be used to evaluate fast-spreading new diseases and pandemics
(for example, for AI vision applications of coronavirus control).
• Deep Learning based Image Segmentation has been successfully applied to
segment satellite images in the field of remote sensing, including
techniques for urban planning or precision agriculture. Also, images
collected by drones (UAVs) have been segmented using Deep Learning based
techniques, offering the opportunity to address important environmental
problems related to climate change.
Object detection
• Object detection in computer vision refers to the process of locating
and classifying objects within images or video frames. It involves
identifying and delineating the boundaries of objects in a given scene
and associating them with specific object classes or labels. Object
detection goes beyond simple image classification by providing
information about the spatial location of each detected object.
• Key components of object detection include:
1. Localization: Determining the precise location (bounding box) of
each object in the image or frame.
2. Classification: Assigning a label or category to each detected object,
indicating the type or class of the object.
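Localization quality is commonly scored with intersection-over-union (IoU) between a predicted and a ground-truth bounding box. IoU is not introduced above, so treat the following plain-Python sketch (with made-up box coordinates) as a supplementary illustration:

```python
# Intersection-over-union (IoU) of two axis-aligned boxes (x1, y1, x2, y2):
# the area where they overlap divided by the area they jointly cover.

def iou(a, b):
    # Overlap rectangle: clip each side to the tighter of the two boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((10, 10, 50, 50), (30, 30, 70, 70)))   # 0.142857... (modest overlap)
```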
• Object detection is widely used in various applications, such as
autonomous vehicles, surveillance, medical imaging, robotics, and
more.
Object Detection Applications
1. Autonomous Vehicles
2. Surveillance and Security
3. Medical Imaging
4. Retail (Inventory Management, Checkout)
5. Industrial Automation (Quality Control)
6. Augmented Reality
7. Robotics
8. Sports Analytics
9. Environmental Monitoring
10. Retail Analytics
11. Augmented Traffic Management
12. Human-Computer Interaction
Automatic image captioning
• Image caption generation (photo description) is one of the applications
of deep learning: we pass an image to the model, which processes it and
generates a caption or description according to its training. These
predictions are sometimes inaccurate and can produce meaningless
sentences; very high computational power and a very large dataset are
needed for better results.
• Automatic image captioning is a critical research problem with
numerous complexities, attracting a significant amount of work, with
extensive applications across various domains such as human-computer
interaction, medical image captioning and prescription, traffic data
analysis, quality control in industry, and especially assistive
technologies for visually impaired individuals.
• Given an input image I, the goal is to generate a caption C describing
the visual contents of the image, with C being a set of sentences
C = {c1, c2, ..., cn}, where each ci is a sentence of the generated caption.
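A minimal sketch of this image-to-caption mapping, assuming a generic CNN encoder feeding an LSTM decoder (PyTorch; the class name, layer sizes, and vocabulary size are all illustrative assumptions, not taken from any specific captioning paper):

```python
# CNN encoder -> LSTM decoder: the image feature acts as the first "token"
# the LSTM sees; every later step predicts scores for the next word.
import torch
import torch.nn as nn

class CaptionModel(nn.Module):                  # hypothetical class name
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Encoder: a small CNN mapping an image to one feature vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Decoder: an LSTM that generates the caption word by word.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images).unsqueeze(1)   # (B, 1, embed_dim)
        words = self.embed(captions)                # (B, T, embed_dim)
        seq = torch.cat([feats, words], dim=1)      # image first, then words
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                     # word scores per step

model = CaptionModel()
images = torch.randn(2, 3, 64, 64)                  # dummy image batch
captions = torch.randint(0, 10000, (2, 12))         # dummy token ids
print(model(images, captions).shape)                # torch.Size([2, 13, 10000])
```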
Image generation with
Generative adversarial
networks
• A generative adversarial network (GAN) is a class of
machine learning frameworks in which, given a
training set, the model learns to generate new data
with the same statistics as the training set. It does so
with an algorithmic architecture that pits two neural
networks against each other to produce synthetic instances of data
that closely resemble the real data.
• GANs are usually trained to generate images from random noise.
A GAN usually has two parts: the Generator,
which generates new sample images, and the
Discriminator, which classifies images as real or fake.
• Generator: A generator is a model used to generate
new, plausible data examples from the problem domain.
• Discriminator: A discriminator is a model that
classifies the given examples as real (from the domain) or
fake (generated).
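The following minimal PyTorch sketch shows the two-player training loop implied by these definitions; the layer sizes, learning rates, and the random "real" batch are all illustrative placeholders:

```python
# One GAN training step: the discriminator learns to separate real from fake,
# then the generator learns to produce fakes the discriminator scores as real.
import torch
import torch.nn as nn

noise_dim, img_dim = 64, 28 * 28
G = nn.Sequential(nn.Linear(noise_dim, 128), nn.ReLU(),
                  nn.Linear(128, img_dim), nn.Tanh())        # generator
D = nn.Sequential(nn.Linear(img_dim, 128), nn.LeakyReLU(0.2),
                  nn.Linear(128, 1))                          # discriminator

loss_fn = nn.BCEWithLogitsLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(16, img_dim)      # stand-in for a batch of real images

# Discriminator step: push real scores toward 1, fake scores toward 0.
fake = G(torch.randn(16, noise_dim)).detach()
d_loss = (loss_fn(D(real), torch.ones(16, 1)) +
          loss_fn(D(fake), torch.zeros(16, 1)))
opt_D.zero_grad(); d_loss.backward(); opt_D.step()

# Generator step: make the discriminator score fresh fakes as real.
fake = G(torch.randn(16, noise_dim))
g_loss = loss_fn(D(fake), torch.ones(16, 1))
opt_G.zero_grad(); g_loss.backward(); opt_G.step()
```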
Applications
1. Image-to-Image Translation: Generating images that transform from one
domain to another, such as turning satellite images into maps or
black-and-white photos into color.
2. Style Transfer: Creating images in the style of a particular artist or
applying the visual style of one image to another.
3. Face Aging and De-aging: Simulating the aging or de-aging of faces in
photographs.
4. Super-Resolution: Enhancing the resolution and quality of images, making
them sharper and more detailed.
5. Data Augmentation: Generating additional training data.
6. Virtual Try-On: Allowing users to virtually try on clothes, accessories,
or other items before making a purchase.
7. Deepfake Generation: Creating realistic-looking fake videos or images by
replacing faces in existing content.
8. Image Inpainting: Filling in missing or damaged parts of an image with
realistic content.
9. Drug Discovery and Molecular Design: Generating molecular structures for
new drug candidates or designing novel molecules.
10. Image Synthesis for Anomaly Detection: Generating normal images to train
models for detecting anomalies or outliers in datasets.
Video to text with LSTM models
• LSTM stands for Long Short-Term Memory. An LSTM is a type of
recurrent neural network, but it is better than traditional
recurrent neural networks in terms of memory.
• Because they are good at memorizing certain patterns, LSTMs
perform considerably better. As with every other neural network, an LSTM
can have multiple hidden layers, and as information passes through every
layer, the relevant information is kept and all the irrelevant
information is discarded in every single cell.
• The LSTM model is trained on video-sentence pairs and learns to
associate a sequence of video frames with a sequence of words
in order to generate a description of the event in the video clip.
• A stacked LSTM first encodes the frames one by one, taking as input the
output of a Convolutional Neural Network (CNN) applied to each input
frame's intensity values.
• Once all frames are read, the model generates a sentence word by word.
• The encoding and decoding of the frame and word representations are
learned jointly from a parallel corpus.
• To model the temporal aspects of activities typically shown in videos, we
also compute the optical flow between pairs of consecutive frames. The flow
images are also passed through a CNN and provided as input to the LSTM.
(A minimal sketch of the encode-then-decode idea follows the applications
list below.)
Applications
1. Automatic Video Captioning
2. Video Summarization
3. Content Indexing and Retrieval
4. Surveillance and Security
5. Educational Videos
6. Media Production
7. Human-Computer Interaction
8. Video Search Engines
9. Assistive Technologies
10. Event Recognition
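As referenced above, here is a rough sketch of the encode-then-decode pipeline in PyTorch (module names and sizes are illustrative assumptions, not the original model's code): a CNN embeds each frame, an LSTM reads the frame sequence, and its final state seeds word-by-word decoding.

```python
# Encode all frames first, then decode a sentence word by word.
import torch
import torch.nn as nn

feat_dim, hidden_dim, vocab_size = 128, 256, 5000
frame_cnn = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(16, feat_dim))           # per-frame feature
encoder = nn.LSTM(feat_dim, hidden_dim, batch_first=True)    # reads frames
embed = nn.Embedding(vocab_size, hidden_dim)
decoder = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)  # emits words
to_vocab = nn.Linear(hidden_dim, vocab_size)

video = torch.randn(1, 10, 3, 32, 32)                 # 10 dummy frames
feats = frame_cnn(video.flatten(0, 1)).view(1, 10, feat_dim)
_, state = encoder(feats)                             # read all frames first

words = torch.randint(0, vocab_size, (1, 8))          # dummy caption tokens
out, _ = decoder(embed(words), state)                 # decode word by word
print(to_vocab(out).shape)                            # torch.Size([1, 8, 5000])
```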
Attention Models for computer
vision tasks
• Attention mechanisms enhance deep learning models by
selectively focusing on important input elements, improving
prediction accuracy and computational efficiency. They prioritize
and emphasize relevant information, acting as a spotlight to
enhance overall model performance.
• In psychology, attention is the cognitive process of selectively
concentrating on one or a few things while ignoring others.
• The attention mechanism emerged as an improvement over the
encoder-decoder-based neural machine translation
system in natural language processing (NLP). Later, this
mechanism, or its variants, was used in other applications,
including computer vision and speech processing.
What Is An Attention Model?

• An attention model, also known as an attention mechanism, is an
input processing technique of neural networks. This mechanism
helps neural networks solve complicated tasks by dividing them
into smaller areas of attention and processing them sequentially.
• Just as the human brain solves a complex task by dividing it into
simpler tasks and focusing on them one by one, the attention
mechanism makes it possible for neural networks to handle
intuitive and challenging tasks like translation and generating
subtitles.
• The neural network focuses on specific aspects of a complex
input until it categorizes the entire dataset.
Types of Attention Model
• There are several types of attention mechanisms, each with its own
characteristics and applications:
• Global (Soft) Attention: The model considers all parts of the input data
when computing the attention weights, leading to a fully differentiable
mechanism.
• Local (Hard) Attention: The model focuses on a subset of the input data,
which is often determined by a learned alignment model. This approach is
less computationally expensive but introduces non-differentiable operations.
• Self-Attention: Also known as intra-attention, this mechanism allows
different positions of a single sequence to attend to each other (see the
sketch after this list). It is a key component of transformer models.
• Multi-Head Attention: This approach extends self-attention by allowing
the model to focus on different parts of the input data from different
representation subspaces, providing a richer understanding of the data.
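Before the example on the next slide, here is a minimal NumPy sketch of self-attention in its common scaled dot-product form, where every position of the sequence attends to every other (the weight matrices are random placeholders):

```python
# Scaled dot-product self-attention: each position's output is a softmax-
# weighted mix of all positions' values, weighted by query-key similarity.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over positions
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                         # 5 tokens, 16-dim each
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 16)
```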
Example of the Self-Attention Mechanism
• The red words are being read or processed at the current instant, and the
blue words are the memories. The different shades represent the degree of
memory activation.
• As we read or process the sentence word by word, previously seen words
are also emphasized, as the shades indicate; this is exactly what
self-attention in a machine reader does.
