Week1_Lecture2
Week 1 Lecture 2
Class Participation and Peer-Review (10% Weightage)
Class participation: 5%
• In-person attendance: 3%
• Full mark: in-person attendance in 18 out of 30 lectures AND 7 out of 15 labs
• Reading research papers in advance and answering the in-class quizzes correctly: 2%
Peer review: 5%
• Participating in discussions of other students' project presentations and paper presentations: 1%
• One-page review reports on other groups' projects (each person writes two peer-review reports): 4%
Introduction and Overview of Computer Vision
What is Computer Vision?
• Ability of computers to understand visual data
• For example, images, videos…
• Automating tasks that the human visual system can perform
What is Computer Vision?
• To extract “meaning” from pixels, that is, to bridge the gap between image pixels and “meaning” (semantics)
What we see!
What the computer sees!
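The contrast between the two views can be made concrete with a minimal sketch. The 5x5 grid below is made-up data, and the helper names `describe` and `render` are hypothetical; the point is only that the computer starts from a grid of numbers, while a person sees a shape.

```python
# A tiny 5x5 grayscale "image" (made-up values for illustration).
# Each entry is one pixel intensity: 0 is black, 255 is white.
image = [
    [0,   0,   255, 0,   0],
    [0,   0,   255, 0,   0],
    [255, 255, 255, 255, 255],
    [0,   0,   255, 0,   0],
    [0,   0,   255, 0,   0],
]


def describe(img):
    """What the computer sees: sizes and raw intensity statistics."""
    flat = [pixel for row in img for pixel in row]
    return {
        "height": len(img),
        "width": len(img[0]),
        "min": min(flat),
        "max": max(flat),
        "mean": sum(flat) / len(flat),
    }


def render(img, threshold=128):
    """What we see: bright pixels drawn as '#', dark pixels as '.'."""
    return "\n".join(
        "".join("#" if pixel >= threshold else "." for pixel in row)
        for row in img
    )


print(describe(image))
print(render(image))
```

The statistics say nothing about a plus sign; recovering that kind of "meaning" from the numbers is the job of a vision system.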
What do we have here?
Hardware perspective:
Massive digital data collections
Is that a queen or a bishop?
Why Study Computer Vision?
• Engineering point of view: computer vision helps solve many practical problems and has business potential
• Scientific point of view: building a human-like visual system is one of the grand challenges of Artificial Intelligence (AI)
AI itself is a grand challenge of computing
• Massive visual data on the internet
More than 70 million photos are shared on Instagram every day (more than 50 billion photos in total)
300 million images a day (more than 350 billion photos in total)
Why Study Computer Vision?
• CVPR papers: 2023 and 2024
Why Study Computer Vision?
Substantial Commercial Interest
Acceptance Rate for Each Topic: CVPR 2024
Common Computer Vision Tasks
Image Categorization/Recognition:
CAT
Common Computer Vision Tasks
Scene Recognition:
Is this an outdoor image?
Activity Recognition
Activity:
What is this person doing in this image?
Common Computer Vision Tasks: Detection
Detection:
Where is a car in this image?
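A detector typically answers the "where" question with bounding boxes. The sketch below shows one possible way to represent and query such output; the `Detection` class, the `find_boxes` helper, and all numbers are hypothetical and do not come from any particular detector.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # predicted class name
    score: float  # detector confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


# Hypothetical detector output for a single image.
detections = [
    Detection("car", 0.92, (40, 60, 200, 150)),
    Detection("person", 0.81, (220, 50, 260, 160)),
    Detection("car", 0.35, (300, 80, 380, 130)),
]


def find_boxes(dets, label, min_score=0.5):
    """Return the boxes of all sufficiently confident detections of `label`."""
    return [d.box for d in dets if d.label == label and d.score >= min_score]


print(find_boxes(detections, "car"))
```

Unlike categorization, which returns one label for the whole image, the answer here is a list of locations, and the confidence threshold decides which candidates count.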
Semantic Segmentation
Instance Segmentation
Common Computer Vision Tasks: Segmentation
CAT
GRASS, CAT, TREE, SKY
DOG, DOG, CAT
DOG, DOG, CAT
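The difference between semantic and instance segmentation can be sketched on a toy mask. A semantic mask stores one class label per pixel, so two dogs share the same label; an instance mask also tells the dogs apart. The example below derives instance ids by grouping connected pixels of one class. This is a simplifying assumption for illustration only (it fails when two objects touch) and is not how learned instance-segmentation models work. The mask values and the function name are hypothetical.

```python
from collections import deque

GRASS, DOG = 0, 1

# Toy semantic mask: one class label per pixel (made-up data).
semantic_mask = [
    [GRASS, DOG,   DOG,   GRASS, GRASS, GRASS],
    [GRASS, DOG,   DOG,   GRASS, DOG,   DOG],
    [GRASS, GRASS, GRASS, GRASS, DOG,   DOG],
]


def instances_from_semantic(mask, target_class):
    """Give each 4-connected region of `target_class` its own id (1, 2, ...)."""
    height, width = len(mask), len(mask[0])
    instance_ids = [[0] * width for _ in range(height)]  # 0 means "no instance"
    count = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] != target_class or instance_ids[r][c] != 0:
                continue
            # Found an unlabeled pixel of the class: flood-fill its region.
            count += 1
            instance_ids[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] == target_class
                            and instance_ids[ny][nx] == 0):
                        instance_ids[ny][nx] = count
                        queue.append((ny, nx))
    return instance_ids, count


instance_mask, num_dogs = instances_from_semantic(semantic_mask, DOG)
print(num_dogs)
for row in instance_mask:
    print(row)
```

In the semantic mask both dogs are just "DOG"; in the instance mask the left dog carries id 1 and the right dog id 2.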
Research Paper Presentations (10% Weightage)
Objective
• Learn to systematically introduce a research topic
• Improve teaching and presentation skills
• Engage in critical discussions about research papers
How to Select a Topic?
• Suggested topics:
• Specialized Applications of Segmentation: e.g., medical image segmentation (~3 presentations)
• Vision Foundation Models: Segment Anything Model (SAM) (~2 presentations)
• Efficient Architectures for Computer Vision Applications: State-space Models and Mamba (~4 presentations)
• Conversational LLMs and Vision-Language Models (~2 presentations)
• Image Generation using Diffusion Models (~5 presentations)
• Remote sensing, change detection (~2 presentations)
• Human-centric Vision (~2 presentations)
• All presenters on the same topic should work together to systematically introduce the concepts.
Specialized Applications of Segmentation: 3D Medical Image segmentation
Remote Sensing Change Detection
Change Detection Methods for Remote Sensing in the Last Decade: A Comprehensive Review.
https://arxiv.org/pdf/2305.05813.pdf
Foundation Models in Vision
Large Language Models
Multi-Modal LLMs
mbzuai.ac.ae
Image Generation Using Diffusion Models
Diffusion Models in Vision: A Survey https://arxiv.org/pdf/2209.04747.pdf
“A diffusion model is a deep generative model that is based on two stages, a forward diffusion stage and
a reverse diffusion stage. In the forward diffusion stage, the input data is gradually perturbed over
several steps by adding Gaussian noise. In the reverse stage, a model is tasked at recovering the original
input data by learning to gradually reverse the diffusion process, step by step.”
Forward
Reverse
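The forward stage described in the quote can be sketched in a few lines. The update rule below, x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise, is the common DDPM-style formulation and is an assumption here, as are the linear noise schedule, the step count, and the toy "image"; the survey covers several variants. The reverse stage requires a trained model and is not shown.

```python
import math
import random


def forward_diffusion(x0, betas, rng):
    """Return [x_0, x_1, ..., x_T], adding Gaussian noise at every step.

    Each step shrinks the current sample slightly and adds noise with
    variance beta_t (assumed DDPM-style update).
    """
    trajectory = [list(x0)]
    x = list(x0)
    for beta in betas:
        x = [math.sqrt(1.0 - beta) * value + math.sqrt(beta) * rng.gauss(0.0, 1.0)
             for value in x]
        trajectory.append(x)
    return trajectory


def mean(values):
    return sum(values) / len(values)


T = 200  # number of diffusion steps (assumed)
# Assumed linear noise schedule from 1e-4 to 0.05.
betas = [1e-4 + (0.05 - 1e-4) * t / (T - 1) for t in range(T)]

x0 = [1.0] * 500  # toy "image": 500 pixels, all with value 1.0
trajectory = forward_diffusion(x0, betas, random.Random(0))

print("mean of x_0:", mean(trajectory[0]))
print("mean of x_T:", mean(trajectory[-1]))
```

After enough steps the original signal is almost entirely replaced by noise: the pixel mean drifts from 1.0 toward 0 and the pixel variance approaches 1. A reverse model is trained to undo these steps one at a time.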
Image Generation (i)
Image Generation (ii)
Human-centric Scene Understanding
Examples: pedestrian detection, multi-camera person search, crowd counting, pose estimation, activity recognition
ARCHITECTURE DESIGN CHOICES FOR REAL-WORLD VISION APPLICATIONS
• Development of efficient network architectures for image classification, object detection, segmentation, and human pose estimation in images and videos
Vision Mamba
• Perceptron
• Regularization
• CNN layer
Summary
• Course Overview
• Introduction and Overview of Computer Vision
• Common Computer Vision tasks