Week1_Lecture2

Lecture notes of CV801

CV801: Advanced Computer Vision

Week 1 Lecture 2
Class Participation and Peer Review (10% Weight)
Class participation: 5%
• In-person attendance: 3%
• Full marks: in-person attendance at 18 out of 30 lectures AND 7 out of 15 labs

• Reading research papers in advance and giving correct answers in the in-class quizzes: 2%

Peer review: 5%
• Participating in the discussions on other students' project presentations and paper presentations: 1%
• 1-page review reports on other groups' projects (each person writes two peer-review reports): 4%

Introduction and Overview of Computer Vision
What is Computer Vision?

• The ability of computers to understand visual data, for example, images and videos
• Automating tasks that the human visual system can perform
What is Computer Vision?
• Extracting “meaning” from pixels: bridging the gap between image pixels and their semantic “meaning”

What we see vs. what the computer sees
What do we have here?

Seems easy…
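To a computer, the image on this slide is only a grid of numbers. The minimal NumPy sketch below makes that concrete with a hypothetical random 4×4 RGB image; the image size and the BT.601 luminance weights used for the grayscale conversion are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# A hypothetical 4x4 RGB "image": height x width x channels, 8-bit intensities.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)

print(image.shape)   # (4, 4, 3)
print(image.dtype)   # uint8
print(image[0, 0])   # the three channel values of the top-left pixel

# Grayscale as a luminance-weighted sum of R, G, B (ITU-R BT.601 weights).
gray = image.astype(np.float64) @ np.array([0.299, 0.587, 0.114])
print(gray.shape)    # (4, 4)
```

Everything a vision model "knows" about the scene has to be inferred from arrays like these, which is the pixel-to-meaning gap described above.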


Wrong! Vision is Hard
• Vision is an amazing feature of natural intelligence
• Around 50% of the neural tissue in the human brain is directly or indirectly
related to vision, which assists in visual learning.

Hardware perspective: massive digital data collections
Is that a queen or a bishop?
Why Study Computer Vision?
• Engineering point of view: computer vision helps solve many practical problems and has significant business potential
• Scientific point of view: a human-like visual system is one of the grand challenges of artificial intelligence (AI), and AI itself is a grand challenge of computing
• Massive amounts of visual data on the internet

More than 70 million photos are shared on Instagram every day (more than 50 billion photos in total)

300 million images a day (more than 350 billion photos in total)

More than 500 hours of video uploaded every minute


Why Study Computer Vision?
• Computer vision used to be done mostly in academia.
• Recent advancements: substantial business potential and commercial interest

List of CVPR 2024 sponsors:
• Google
• Meta AI/Facebook
• Apple
• Amazon
• Microsoft
• OpenAI
• G42
• TII
•…
Why Study Computer Vision?
• Numerous real-world practical applications: autonomous driving, security, health, biometric access, comfort (robots), and fun (virtual avatars)

Computer vision technology can improve our lives

Why Study Computer Vision?

• CVPR conference ranking (Engineering) as of 2024

Why Study Computer Vision?
• CVPR papers, 2023 vs. 2024
Why Study Computer Vision?
Substantial Commercial Interest

List of CVPR 2022 sponsors


CV801 Topics vs. Major Topics in CVPR 2023

• CV801 covers 8 out of the top 12 CVPR 2023 topics

• CV801 covers ~12 topics in total

Acceptance Rate for Each Topic: CVPR 2024

Common Computer Vision Tasks
Image Categorization/Recognition:

CAT
Common Computer Vision Tasks

Scene Recognition:
Is this an outdoor image?
Activity Recognition

Activity:
What is this person doing in this image?
Common Computer Vision Tasks: Detection

Detection:
Where is a car in this image?
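Unlike classification, detection answers "where" with bounding boxes. A common way to score a predicted box against a ground-truth box, not covered on this slide, is intersection over union (IoU). A minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates; the two "car" boxes are hypothetical:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (clamped to 0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted "car" box vs. ground-truth box.
print(iou((10, 10, 50, 40), (20, 15, 60, 45)))  # ≈ 0.4545 (750 / 1650)
```

Detectors are often judged by whether IoU exceeds a threshold such as 0.5; the exact threshold depends on the benchmark.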
Semantic Segmentation

GRASS, CAT, TREE, SKY
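Semantic segmentation labels every pixel with a class. If a network outputs one score per class per pixel, the label map is a per-pixel argmax. A NumPy sketch, assuming random scores stand in for a hypothetical network's output on a tiny 3×4 image, with the class list taken from the slide:

```python
import numpy as np

CLASSES = ["GRASS", "CAT", "TREE", "SKY"]    # label set from the slide

# Hypothetical per-pixel class scores, shape (num_classes, height, width).
rng = np.random.default_rng(0)
scores = rng.normal(size=(len(CLASSES), 3, 4))

# Each pixel gets the index of its highest-scoring class.
label_map = scores.argmax(axis=0)            # shape (3, 4), integer class ids

for row in label_map:
    print(" ".join(f"{CLASSES[i]:>5}" for i in row))
```

Note that the output says nothing about how many cats there are, only which pixels are "CAT".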

Instance Segmentation

DOG, DOG, CAT
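The difference from semantic segmentation is that each object keeps its own identity. The pure-Python sketch below uses hypothetical masks for the slide's "DOG, DOG, CAT" example on a 3×6 image: an instance map gives the two dogs different ids, while collapsing the same masks to class names merges them into one "DOG" region.

```python
# Hypothetical instance-segmentation output: (class label, binary mask) per object.
instances = [
    ("DOG", [[1, 1, 0, 0, 0, 0],
             [1, 1, 0, 0, 0, 0],
             [0, 0, 0, 0, 0, 0]]),
    ("DOG", [[0, 0, 1, 1, 0, 0],
             [0, 0, 1, 1, 0, 0],
             [0, 0, 0, 0, 0, 0]]),
    ("CAT", [[0, 0, 0, 0, 0, 1],
             [0, 0, 0, 0, 1, 1],
             [0, 0, 0, 0, 0, 0]]),
]

def to_instance_ids(instances, height, width):
    """Instance map: each object gets its own id (1, 2, ...); 0 is background."""
    ids = [[0] * width for _ in range(height)]
    for k, (_, mask) in enumerate(instances, start=1):
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    ids[y][x] = k  # later instances overwrite earlier ones on overlap
    return ids

def to_semantic(instances, height, width):
    """Semantic map: only the class name survives, so both dogs become 'DOG'."""
    sem = [["BG"] * width for _ in range(height)]
    for label, mask in instances:
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    sem[y][x] = label
    return sem

ids = to_instance_ids(instances, 3, 6)
sem = to_semantic(instances, 3, 6)
print(ids[0])  # [1, 1, 2, 2, 0, 3]
print(sem[0])  # ['DOG', 'DOG', 'DOG', 'DOG', 'BG', 'CAT']
```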

Common Computer Vision Tasks: Segmentation

• Classification: CAT (no spatial extent)
• Semantic segmentation: GRASS, CAT, TREE, SKY (no objects, just pixels)
• Object detection: DOG, DOG, CAT (multiple objects)
• Instance segmentation: DOG, DOG, CAT (multiple objects)


Video Instance Segmentation

Research Paper Presentations (10% Weight)
Objective
• Learn to systematically introduce a research topic
• Improve teaching and presentation skills
• Engage in critical discussions about research papers
How to Select a Topic?
• Suggested topics:
• Specialized Applications of Segmentation: e.g., medical image segmentation (~3 presentations)
• Vision Foundation Models: Segment Anything Model (SAM) (~2 presentations)
• Efficient Architectures for Computer Vision Applications: State-Space Models and Mamba (~4 presentations)
• Conversational LLMs and Vision-Language Models (~2 presentations)
• Image Generation Using Diffusion Models (~5 presentations)
• Remote Sensing and Change Detection (~2 presentations)
• Human-Centric Vision (~2 presentations)
• All presenters on the same topic should work together to systematically introduce the concepts.

Specialized Applications of Segmentation: 3D Medical Image Segmentation

UNETR: Transformers for 3D Medical Image Segmentation, WACV 2022

Remote Sensing Change Detection

Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review.
https://arxiv.org/pdf/2305.05813.pdf

Foundation Models in Vision

Foundational Models Defining a New Era in Vision: A Survey and Outlook


https://github.com/awaisrauf/Awesome-CV-Foundational-Models
Generalizable Localization Models
Segment Anything Model (SAM, https://arxiv.org/abs/2304.02643)
SAM for synthetic embryo detection, counting, and segmentation
(without training the model on the target dataset or target category)

Panels: input image; embryo detection and counting (count = 307); segmentation
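The count on this slide comes from SAM's masks. As a much simpler stand-in (not the SAM pipeline shown here), objects in a binary foreground mask can be counted as connected regions. A pure-Python sketch using breadth-first flood fill with 4-connectivity; the mask is hypothetical:

```python
from collections import deque

def count_components(mask):
    """Count 4-connected foreground regions in a binary mask (list of lists of 0/1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx]:
                count += 1                      # found a new, unvisited object
                seen[sy][sx] = True
                queue = deque([(sy, sx)])
                while queue:                    # flood-fill the rest of this object
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

# Hypothetical binary mask containing three separate blobs.
mask = [
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]
print(count_components(mask))  # 3
```

This breaks down when objects touch (two embryos merge into one region), which is one reason per-object masks from a model like SAM give more reliable counts.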
Large Language Models

Multimodal LLMs
mbzuai.ac.ae
Image Generation Using Diffusion Models
Diffusion Models in Vision: A Survey https://arxiv.org/pdf/2209.04747.pdf

“A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over several steps by adding Gaussian noise. In the reverse stage, a model is tasked at recovering the original input data by learning to gradually reverse the diffusion process, step by step.”

Figure: forward diffusion (data → noise) and reverse diffusion (noise → data)
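The forward stage in the quote has a convenient closed form in the widely used DDPM formulation, assumed here because the slide does not fix a specific model: with a noise schedule β_t and ᾱ_t = ∏_{s≤t}(1 − β_s), a noisy sample at any step t can be drawn directly as x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, with ε ~ N(0, I). A NumPy sketch with a linear β schedule; the schedule endpoints, the 1000 steps, and the random 8×8 "image" are illustrative assumptions:

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_t, increasing linearly."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in one shot.
    t is a 0-based step index into alpha_bar."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

T = 1000
betas = linear_beta_schedule(T)
alpha_bar = np.cumprod(1.0 - betas)        # abar_t = product of (1 - beta_s) for s <= t

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))   # hypothetical image scaled to [-1, 1]

for t in (0, 250, 999):
    xt, _ = forward_diffuse(x0, t, alpha_bar, rng)
    # Correlation with x0 shrinks as t grows: the signal is gradually drowned in noise.
    corr = np.corrcoef(x0.ravel(), xt.ravel())[0, 1]
    print(t, round(float(alpha_bar[t]), 4), round(float(corr), 3))
```

In the reverse stage, DDPM-style models train a network to predict `noise` from `xt` and `t` and use it to step back toward `x0`; that network is omitted from this sketch.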
Image Generation (i)

1. Diffusion Models
2. Multimodal LLMs Meet Diffusion Models

E.g., person image synthesis (CVPR 2023)
Image Generation (ii)

3. 3D-aware Image Generation (ICCV 2023)
4. Image Generation for Healthcare Applications (MICCAI 2023)
Human-centric Scene Understanding

Examples: pedestrian detection, multi-camera person search, crowd counting, human pose estimation, and activity recognition
ARCHITECTURE DESIGN CHOICES FOR REAL-WORLD VISION APPLICATIONS
• Development of efficient network architectures for image classification, object detection, segmentation, and human pose estimation in images and videos
• Vision Mamba
• Mamba for medical image segmentation
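Mamba builds on state-space models (SSMs). The NumPy sketch below shows only the underlying discrete, linear, time-invariant recurrence h_t = A h_{t-1} + B x_t, y_t = C h_t on a scalar input channel, and checks that it equals a causal convolution with kernel K_k = C A^k B. Mamba's input-dependent ("selective") parameters, its discretization step, and its hardware-aware parallel scan are not shown; the diagonal A, state size, and random inputs are assumptions for illustration.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Recurrent form: h_t = A h_{t-1} + B x_t, y_t = C h_t, for a 1-D input sequence x."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # cost grows linearly with sequence length
        h = A @ h + B * x_t
        ys.append(C @ h)
    return np.array(ys)

def ssm_conv(A, B, C, x):
    """Convolutional form: y_t = sum_k K_k x_{t-k} with kernel K_k = C A^k B."""
    L = len(x)
    K = np.array([C @ np.linalg.matrix_power(A, k) @ B for k in range(L)])
    return np.array([sum(K[k] * x[t - k] for k in range(t + 1)) for t in range(L)])

rng = np.random.default_rng(0)
N = 4                                    # hypothetical state size
A = np.diag(rng.uniform(0.5, 0.95, N))   # stable diagonal state matrix (assumption)
B = rng.standard_normal(N)
C = rng.standard_normal(N)
x = rng.standard_normal(16)              # e.g. one feature channel over 16 image patches

print(np.allclose(ssm_scan(A, B, C, x), ssm_conv(A, B, C, x)))  # True
```

The recurrent form is what makes SSMs attractive for efficient architectures: memory stays constant in sequence length, unlike full self-attention.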
Questions?
Survey Outcome
Expected deep learning and CNN background:

• Perceptron
• Multi-layer perceptron
• Backpropagation
• Stochastic gradient descent
• Cross-entropy loss
• CNN layer
• Regularization
• Dropout
• Data augmentation
• Batch normalization
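Several of these prerequisites fit in one small example. The NumPy sketch below trains a two-layer perceptron with softmax cross-entropy, hand-written backpropagation, mini-batch stochastic gradient descent, and L2 regularization (as weight decay in the gradient). The toy "inside the unit circle" dataset, layer sizes, learning rate, and epoch count are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 2-D points, label 1 if inside the unit circle.
X = rng.uniform(-1.5, 1.5, size=(200, 2))
y = (np.sum(X**2, axis=1) < 1.0).astype(int)

# Multi-layer perceptron: 2 -> 16 (ReLU) -> 2 (softmax).
W1 = rng.normal(0, 0.5, size=(2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 2)); b2 = np.zeros(2)
lr, weight_decay, batch = 0.1, 1e-4, 32

def forward(X):
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)                      # ReLU
    logits = a1 @ W2 + b2
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
    return z1, a1, p

def cross_entropy(p, y):
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

initial_loss = cross_entropy(forward(X)[2], y)
for epoch in range(200):
    order = rng.permutation(len(X))
    for start in range(0, len(X), batch):         # stochastic mini-batches
        idx = order[start:start + batch]
        xb, yb = X[idx], y[idx]
        z1, a1, p = forward(xb)
        # Backpropagation: d(mean cross-entropy)/d(logits) = (p - onehot) / n.
        g = p.copy()
        g[np.arange(len(yb)), yb] -= 1.0
        g /= len(yb)
        dW2 = a1.T @ g + weight_decay * W2        # L2 regularization term
        db2 = g.sum(axis=0)
        dz1 = (g @ W2.T) * (z1 > 0)               # chain rule through ReLU
        dW1 = xb.T @ dz1 + weight_decay * W1
        db1 = dz1.sum(axis=0)
        # SGD update, in place so forward() sees the new weights.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

_, _, p = forward(X)
final_loss = cross_entropy(p, y)
acc = float((p.argmax(axis=1) == y).mean())
print(f"loss {initial_loss:.3f} -> {final_loss:.3f}, accuracy {acc:.2f}")
```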
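The CNN-specific items (CNN layer, batch normalization, dropout, data augmentation) can be sketched as forward-pass functions. Minimal NumPy versions, assuming a channels-first (C, H, W) layout, training-mode batch statistics, and inverted dropout; real frameworks also handle learnable parameters, running averages for inference, and backward passes, all omitted here. The image batch and filter weights are hypothetical random values.

```python
import numpy as np

def conv2d(x, w, b):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN libraries).
    x: (C_in, H, W); w: (C_out, C_in, k, k); b: (C_out,) -> (C_out, H-k+1, W-k+1)."""
    c_out, _, k, _ = w.shape
    out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
    out = np.empty((c_out, out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = x[:, i:i + k, j:j + k]                  # (C_in, k, k) receptive field
            out[:, i, j] = np.tensordot(w, patch, axes=3) + b
    return out

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm over (N, C, H, W): normalize each channel, then scale and shift."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    if not training or p == 0.0:
        return x
    keep = rng.random(x.shape) >= p
    return x * keep / (1.0 - p)

def random_hflip(img, rng):
    """Data augmentation: mirror a (C, H, W) image left-right with probability 0.5."""
    return img[:, :, ::-1] if rng.random() < 0.5 else img

rng = np.random.default_rng(0)
images = rng.standard_normal((2, 3, 8, 8))               # hypothetical batch of two RGB 8x8 images
images = np.stack([random_hflip(im, rng) for im in images])
w = rng.standard_normal((4, 3, 3, 3)) * 0.1               # 4 filters of size 3x3
b = np.zeros(4)

feats = np.stack([conv2d(im, w, b) for im in images])     # (2, 4, 6, 6)
feats = batch_norm(feats, gamma=np.ones(4), beta=np.zeros(4))
feats = np.maximum(feats, 0.0)                            # ReLU
feats = dropout(feats, p=0.5, rng=rng)
print(feats.shape)  # (2, 4, 6, 6)
```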
Summary
• Course Overview
• Introduction and Overview of Computer Vision
• Common Computer Vision tasks
