The document discusses the applications of deep learning in computer vision, highlighting key areas such as image classification, object detection, and image segmentation. It details various techniques for image segmentation, including thresholding, region growing, and deep learning-based methods, and outlines their real-world applications in fields like autonomous vehicles and medical imaging. Additionally, it covers advanced topics like automatic image captioning, generative adversarial networks, and attention models, emphasizing their significance in enhancing computer vision tasks.
DL U-III Computer Vision
UNIT-III
APPLICATIONS OF DEEP LEARNING
TO COMPUTER VISION

Computer Vision
• Computer vision is a branch of AI that enables computers to interpret and analyze the visual world, simulating the way humans see and understand their environment.
• Deep learning has been applied to the following computer vision problems:
1. Image Classification
2. Image Classification with Localization
3. Object Detection
4. Object Segmentation
5. Image Style Transfer
6. Image Colorization
7. Image Reconstruction
8. Image Super-Resolution
9. Image Synthesis
10. Other Problems

Image Segmentation
• Segmentation is one of the most important operations in computer vision.
• Image segmentation is the process of dividing an image into multiple parts or regions whose pixels belong to the same class. This grouping is based on specific criteria, for example color or texture.
• The process is also called pixel-level classification: it partitions images (or video frames) into multiple segments or objects.

The Deep Learning Approach to Image Segmentation
• In the last 40 years, various segmentation methods have been proposed, ranging from MATLAB image segmentation and traditional computer vision methods to state-of-the-art deep learning methods. Especially with the emergence of Deep Neural Networks (DNNs), image segmentation applications have made tremendous progress.
• Deep learning is well suited to image segmentation: deep learning algorithms automatically extract features from the data, which can then be used to segment it, and deep models can learn complex characteristics that are difficult to specify manually.
• Convolutional neural networks (CNNs), fully convolutional networks (FCNs), and recurrent neural networks (RNNs) are among the deep learning architectures that can be used for image segmentation. Each architecture has its own set of benefits and drawbacks.

[Figure: semantic image segmentation for driving scenes – Source: sample from the Mapillary Vistas Dataset]

Image Segmentation Techniques
• There are various image segmentation techniques available, and each technique has its own advantages and disadvantages.
• Thresholding: one of the simplest image segmentation techniques. A threshold value is set, and all pixels with intensity values above or below the threshold are assigned to separate regions.
• Region growing: the image is divided into several regions based on similarity criteria. This technique starts from a seed point and grows the region by adding neighboring pixels with similar characteristics.
• Edge-based segmentation: based on detecting edges in the image. These edges represent boundaries between different regions and are detected using edge detection algorithms.
• Clustering: clustering techniques group pixels into clusters based on similarity criteria. These criteria can be color, intensity, texture, or any other feature.
• Watershed segmentation: based on the idea of flooding an image from its minima. The image is treated as a topographic relief, where the intensity values represent the height of the terrain.
• Active contours: also known as snakes, these are curves that deform to find the boundary of an object in an image. The curves are controlled by an energy function that minimizes the distance between the curve and the object boundary.
• Deep learning-based segmentation: deep learning techniques, such as Convolutional Neural Networks (CNNs), have revolutionized image segmentation by providing highly accurate and efficient solutions.
• Graph-based segmentation: this technique represents an image as a graph and partitions the image based on graph theory principles.
• Superpixel-based segmentation: this technique groups sets of similar image pixels together to form larger, more meaningful regions, called superpixels.

Applications of Image Segmentation
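Before turning to application domains, the first two techniques above (thresholding and region growing) are simple enough to sketch directly. A toy Python sketch; the image values, seed point, and tolerance are made up for illustration:

```python
from collections import deque

def threshold_segment(image, t):
    """Thresholding: pixels at or above t form one region (label 1),
    the rest form another (label 0)."""
    return [[1 if px >= t else 0 for px in row] for row in image]

def region_grow(image, seed, tol):
    """Region growing: start at a seed pixel and absorb 4-connected
    neighbours whose intensity is within `tol` of the seed's intensity."""
    h, w = len(image), len(image[0])
    sr, sc = seed
    base = image[sr][sc]
    region = {(sr, sc)}
    queue = deque([(sr, sc)])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in region
                    and abs(image[nr][nc] - base) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region

# A tiny 3x4 grayscale "image" with a dark region (left) and bright region.
image = [
    [10, 12, 90, 95],
    [11, 13, 92, 94],
    [10, 11, 12, 93],
]
print(threshold_segment(image, 50))
print(sorted(region_grow(image, (0, 0), tol=5)))
```

Both functions return a pixel-level result, which is exactly what "pixel-level classification" means: every pixel gets a region assignment.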
• Image segmentation problems play a central role in a broad range of real-world computer vision applications, including road sign detection, biology, the evaluation of construction materials, and video security and surveillance.
• Autonomous vehicles and Advanced Driver Assistance Systems (ADAS) need to detect navigable surfaces and perform pedestrian detection.
• Image segmentation is also widely applied in medical imaging, for tasks such as tumor boundary extraction or measurement of tissue volumes. An opportunity here is to design standardized image databases that can be used to evaluate fast-spreading new diseases and pandemics (for example, AI vision applications for coronavirus control).
• Deep learning-based image segmentation has been successfully applied to satellite images in remote sensing, including techniques for urban planning and precision agriculture. Images collected by drones (UAVs) have also been segmented using deep learning-based techniques, offering an opportunity to address important environmental problems related to climate change.

Object Detection
• Object detection in computer vision refers to the process of locating and classifying objects within images or video frames. It involves identifying and delineating the boundaries of objects in a given scene and associating them with specific object classes or labels. Object detection goes beyond simple image classification by providing information about the spatial location of each detected object.
• Key components of object detection:
1. Localization: determining the precise location (bounding box) of each object in the image or frame.
2. Classification: assigning a label or category to each detected object, indicating the type or class of the object.
• Object detection is widely used in various applications, such as autonomous vehicles, surveillance, medical imaging, robotics, and more.
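Localization quality is commonly measured with Intersection over Union (IoU), the overlap between a predicted bounding box and a ground-truth box. A minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format; the example boxes are made up:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # partial overlap
print(iou((0, 0, 10, 10), (0, 0, 10, 10)))   # identical boxes
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold (0.5 is a common choice).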
Object Detection Applications
1. Autonomous Vehicles
2. Surveillance and Security
3. Medical Imaging
4. Retail (Inventory Management, Checkout)
5. Industrial Automation (Quality Control)
6. Augmented Reality
7. Robotics
8. Sports Analytics
9. Environmental Monitoring
10. Retail Analytics
11. Augmented Traffic Management
12. Human-Computer Interaction

Automatic Image Captioning
• Image caption generation (photo description) is one of the applications of deep learning: an image is passed to a model, which processes it and generates a caption or description according to its training. The predictions are sometimes inaccurate and can produce meaningless sentences; good results require very high computational power and a very large dataset.
• Automatic image captioning is a critical research problem with numerous complexities, attracting a significant amount of work with extensive applications across various domains such as human-computer interaction, medical image captioning and prescription, traffic data analysis, quality control in industry, and especially assistive technologies for visually impaired individuals.
• Given an input image I, the goal is to generate a caption C describing the visual contents of the image, with C being a set of sentences C = {c1, c2, ..., cn}, where each ci is a sentence of the generated caption C.

Image Generation with Generative Adversarial Networks
• A generative adversarial network (GAN) is a class of machine learning frameworks in which, given a training set, the technique learns to generate new data with the same statistics as the training set. Its architecture uses two neural networks that compete to produce new, synthetic instances of data that closely resemble real data.
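The "two competing networks" idea can be made concrete with the standard GAN losses. A minimal numeric sketch, assuming a hypothetical 1-D setup — a generator that merely shifts its noise input by an offset, and a logistic discriminator with made-up weights; this is an illustration of the objective, not a trained model:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical 1-D stand-ins (made up for illustration):
def generator(z, offset):
    """Maps a noise value z to a 'fake' sample."""
    return z + offset

def discriminator(x, w, b):
    """Logistic score: probability that sample x is real."""
    return sigmoid(w * x + b)

def d_loss(real, fake, w, b):
    # The discriminator wants D(real) -> 1 and D(fake) -> 0.
    return -(math.log(discriminator(real, w, b))
             + math.log(1.0 - discriminator(fake, w, b)))

def g_loss(fake, w, b):
    # The generator wants the discriminator to call its sample real.
    return -math.log(discriminator(fake, w, b))

real_sample = 2.0                          # a "real" data point
fake_sample = generator(0.5, offset=0.2)   # a generated sample
w, b = 1.0, -1.0                           # made-up discriminator weights
print(d_loss(real_sample, fake_sample, w, b))
print(g_loss(fake_sample, w, b))
```

In training, these two losses are minimized alternately by gradient descent; as the game progresses, the generator's samples are pushed toward the statistics of the real data.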
• GANs are usually trained to generate images from random noise. A GAN has two parts: the Generator, which generates new image samples, and the Discriminator, which classifies images as real or fake.
• Generator: a model used to generate new plausible data examples for the problem domain.
• Discriminator: a model that classifies the given examples as real (from the domain) or fake (generated).

Applications
1. Image-to-Image Translation: generating images that transform from one domain to another, such as turning satellite images into maps or black-and-white photos into color.
2. Style Transfer: creating images in the style of a particular artist, or applying the visual style of one image to another.
3. Face Aging and De-aging: simulating the aging or de-aging of faces in photographs.
4. Super-Resolution: enhancing the resolution and quality of images, making them sharper and more detailed.
5. Data Augmentation: generating additional training data.
6. Virtual Try-On: allowing users to virtually try on clothes, accessories, or other items before making a purchase.
7. Deepfake Generation: creating realistic-looking fake videos or images by replacing faces in existing content.
8. Image Inpainting: filling in missing or damaged parts of an image with realistic content.
9. Drug Discovery and Molecular Design: generating molecular structures for new drug candidates or designing novel molecules.
10. Image Synthesis for Anomaly Detection: generating normal images to train models for detecting anomalies or outliers in datasets.

Video to Text with LSTM Models
• LSTM stands for Long Short-Term Memory. An LSTM is a type of recurrent neural network, but with better memory than a traditional recurrent neural network.
• Because they are good at memorizing certain patterns, LSTMs perform considerably better on sequence tasks.
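The memory behaviour comes from the LSTM's gated cell, which at each step decides what to keep and what to discard. A minimal NumPy sketch of a single LSTM step, used here to encode a short sequence of stand-in frame features; all dimensions, weights, and the random "CNN features" are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM step. The four gates are slices of one stacked pre-activation.
    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias."""
    H = h.shape[0]
    z = W @ x + U @ h + b
    i = 1.0 / (1.0 + np.exp(-z[0:H]))       # input gate: what to write
    f = 1.0 / (1.0 + np.exp(-z[H:2*H]))     # forget gate: what to discard
    o = 1.0 / (1.0 + np.exp(-z[2*H:3*H]))   # output gate: what to expose
    g = np.tanh(z[3*H:4*H])                 # candidate cell content
    c_new = f * c + i * g                   # keep relevant, drop irrelevant
    h_new = o * np.tanh(c_new)
    return h_new, c_new

D, H, T = 8, 4, 5                           # feature dim, hidden dim, #frames
W = rng.normal(0, 0.1, (4 * H, D))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)

# Encode T frame features one by one (stand-ins for per-frame CNN outputs).
h = np.zeros(H)
c = np.zeros(H)
for t in range(T):
    frame_feat = rng.normal(size=D)         # hypothetical CNN feature vector
    h, c = lstm_step(frame_feat, h, c, W, U, b)

print(h)   # final hidden state: a fixed-size summary of the clip
```

In a video-to-text model, a decoder LSTM would then emit the caption word by word conditioned on this encoding.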
• As with every other neural network, an LSTM can have multiple hidden layers; as the data passes through every layer, each cell keeps the relevant information and discards the irrelevant information.
• The LSTM model is trained on video-sentence pairs and learns to associate a sequence of video frames with a sequence of words in order to generate a description of the event in the video clip.
• A stacked LSTM first encodes the frames one by one, taking as input the output of a Convolutional Neural Network (CNN) applied to each input frame's intensity values.
• Once all frames are read, the model generates a sentence word by word.
• The encoding and decoding of the frame and word representations are learned jointly from a parallel corpus.
• To model the temporal aspects of activities typically shown in videos, the optical flow between pairs of consecutive frames is also computed. The flow images are likewise passed through a CNN and provided as input to the LSTM.

Applications
1. Automatic Video Captioning
2. Video Summarization
3. Content Indexing and Retrieval
4. Surveillance and Security
5. Educational Videos
6. Media Production
7. Human-Computer Interaction
8. Video Search Engines
9. Assistive Technologies
10. Event Recognition

Attention Models for Computer Vision Tasks
• Attention mechanisms enhance deep learning models by selectively focusing on important input elements, improving prediction accuracy and computational efficiency. They prioritize and emphasize relevant information, acting as a spotlight that enhances overall model performance.
• In psychology, attention is the cognitive process of selectively concentrating on one or a few things while ignoring others.
• The attention mechanism emerged as an improvement over the encoder-decoder-based neural machine translation system in natural language processing (NLP). Later, this mechanism, or its variants, was used in other applications, including computer vision and speech processing.

What Is an Attention Model?
• An attention model, also known as an attention mechanism, is an input processing technique for neural networks. This mechanism helps neural networks solve complicated tasks by dividing them into smaller areas of attention and processing them sequentially.
• Just as the human brain solves a complex task by dividing it into simpler tasks and focusing on them one by one, the attention mechanism makes it possible for neural networks to handle intuitive and challenging tasks like translation and subtitle generation.
• The neural network focuses on specific aspects of a complex input until it has categorized the entire dataset.

Types of Attention Model
There are several types of attention mechanisms, each with its own characteristics and applications:
• Global (Soft) Attention: the model considers all parts of the input data when computing the attention weights, leading to a fully differentiable mechanism.
• Local (Hard) Attention: the model focuses on a subset of the input data, often determined by a learned alignment model. This approach is less computationally expensive but introduces non-differentiable operations.
• Self-Attention: also known as intra-attention, this mechanism allows different positions of a single sequence to attend to each other. It is a key component of transformer models.
• Multi-Head Attention: extends self-attention by allowing the model to focus on different parts of the input data from different representation subspaces, providing a richer understanding of the data.

Example of the Self-Attention Mechanism
• In the accompanying figure, the red words are the words being read or processed at the current instant, and the blue words are the memories; the different shades represent the degree of memory activation.
• As the sentence is read or processed word by word, previously seen words are also emphasized, as the shades indicate — and this is exactly what self-attention in a machine reader does.
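The self-attention just described — every position attending to every other position of the same sequence — can be sketched with scaled dot-product attention, the form used in transformers. A minimal NumPy sketch; the sequence length, model dimension, and random projection weights are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along an axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over one sequence X of shape (T, D).
    This is global (soft) attention: the weights are a differentiable
    softmax over all T positions."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (T, T) alignment scores
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights               # weighted mix of values

rng = np.random.default_rng(1)
T, D = 4, 8                                   # sequence length, model dim
X = rng.normal(size=(T, D))                   # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

out, weights = self_attention(X, Wq, Wk, Wv)
print(out.shape)                              # one updated vector per position
print(weights.sum(axis=-1))                   # attention rows sum to 1
```

Row t of `weights` plays the role of the "shades" in the example above: it says how strongly position t attends to every other position (its memories). Multi-head attention repeats this with several independent Wq/Wk/Wv triples and concatenates the results.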