Computer Vision AIML Handout v1.0

The document outlines the course structure for 'Computer Vision' at the Birla Institute of Technology & Science, Pilani, detailing objectives, content, textbooks, and evaluation methods. It covers various topics including low-level and mid-level vision, object segmentation, image classification, and deep learning applications. The evaluation scheme includes quizzes, assignments, a mid-semester test, and a comprehensive exam, with specific guidelines for each component.


BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

WORK INTEGRATED LEARNING PROGRAMMES


Digital
Part A: Content Design
Course Title Computer Vision
Course No(s) AIML* ZG525 Computer Vision
Credit Units 4
Content Authors Ms. Seetha Parameswaran
Version 1.0
Date June 26th 2023

Course Objectives
No Course Objective
CO1 Students should understand the fundamentals of a camera producing an image,
including camera calibration, optical distortions, perspective corrections etc.
CO2 Students should be familiar with various building block algorithms in Computer
Vision, including Image processing and Deep Learning with emphasis on the
algorithm building blocks.
CO3 Students should create at least one end-user application.

Text Book(s)
T1 Szeliski, R., 2022. Computer vision: algorithms and applications. Springer Nature.
T2 Sonka, M., Hlavac, V., & Boyle, R. Image processing, analysis, and machine vision.
Fourth edition. Cengage Learning.

Reference Book(s) & other resources


R1 Forsyth, D. A., & Ponce, J. (2012). Computer vision: a modern approach. Second
Edition. Prentice Hall.
R2 Practical Machine Learning for Computer Vision: End-to-End Machine Learning for
Images, O’Reilly, 2021
Content Structure

1 Computer Vision ( 4 hrs)


1.1 What is Computer Vision? (T1 Ch 1.1)
1.2 Why is Computer Vision hard? (T2 Ch 1.2)
1.3 Applications of Computer Vision (T1 Ch 1.1)
1.4 Image representation and image analysis tasks (T2 Ch 1.3)
1.5 Image digitization - Sampling and resolution (T2 Ch 2.2)
1.6 Digital Images (T2 Ch 2.3)
1.7 Digital Image types - Binary, Gray-scale and Color (Class Notes)
1.8 Color Images (T2 Ch 2.4)
1.9 Color spaces: RGB and HSV (T2 Ch 2.4)
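
As a small illustration of topic 1.9, the sketch below converts 8-bit RGB triples to HSV using Python's standard `colorsys` module. The helper name `rgb8_to_hsv` and the choice to report hue in degrees are assumptions made for this example, not part of the course material.

```python
import colorsys


def rgb8_to_hsv(r, g, b):
    """Convert 8-bit RGB values to (hue in degrees, saturation, value).

    Saturation and value are returned in the range [0, 1].
    """
    # colorsys expects channel values scaled to [0, 1].
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v


# Pure red sits at hue 0; a mid gray has zero saturation.
pure_red = rgb8_to_hsv(255, 0, 0)
mid_gray = rgb8_to_hsv(128, 128, 128)
print(pure_red, mid_gray)
```

HSV separates color (hue) from brightness (value), which is why it is often preferred over RGB for simple color-based thresholding.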

2 Low-level Vision ( 3 hrs)


2.1 Histogram and Histogram equalization (T1 Ch 3.1.4)
2.2 Gray-scale transformation (T2 Ch 5.1.2)
2.3 Image Smoothing (T2 Ch 5.3.1)
2.4 Connected components in images (T1 Ch 3.3.4)
2.5 Use case: Sharpening, blur, and noise removal using Filtering (T1 Ch 3.4.4)
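
Histogram equalization (topic 2.1) can be sketched in a few lines of plain Python. This is a minimal version of the common CDF-based mapping, assuming a non-empty gray-scale image stored as a list of rows of integers; the function name `equalize_histogram` is hypothetical, and library implementations may differ in rounding details.

```python
def equalize_histogram(image, levels=256):
    """Equalize a gray-scale image given as rows of ints in [0, levels)."""
    pixels = [p for row in image for p in row]

    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    n = len(pixels)
    cdf_min = min(c for c in cdf if c > 0)
    if n == cdf_min:
        # Constant image: nothing to stretch.
        return [list(row) for row in image]

    # Map each level so that the output CDF is approximately linear.
    lut = [round(max(c - cdf_min, 0) * (levels - 1) / (n - cdf_min)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


low_contrast = [[52, 55], [61, 59]]
print(equalize_histogram(low_contrast))
```

The narrow input range 52 to 61 is stretched over the full 0 to 255 range while the ordering of the pixel intensities is preserved.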

3 Mid-level Vision ( 4 hrs)


3.1 Edge Detection using Gradients, Sobel, Canny (T1 Ch 4.2, T2 Ch 5.3.2, 5.3.5)
3.2 Line detection using Hough transforms (T1 Ch 4.3, T2 Ch 5.3.10)
3.3 Semantic information using RANSAC (T1 Ch 4.3, T2 Ch 10.3)
3.4 Image region descriptor using SIFT (T2 Ch 10.2)
3.5 Use case: Pedestrian detection using HoG and SIFT descriptors and SVM
(T1 Ch 14.2)
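
For topic 3.1, the following sketch computes the Sobel gradient magnitude of a small gray-scale image in plain Python. The function name `sobel_magnitude` and the decision to leave border pixels at zero are assumptions for illustration; libraries such as OpenCV handle borders differently.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(image):
    """Return the gradient magnitude; border pixels are left at 0.0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    pix = image[y + dy][x + dx]
                    gx += SOBEL_X[dy + 1][dx + 1] * pix
                    gy += SOBEL_Y[dy + 1][dx + 1] * pix
            out[y][x] = math.hypot(gx, gy)
    return out


# A vertical step edge between columns 1 and 2.
step_edge = [[0, 0, 10, 10] for _ in range(4)]
edges = sobel_magnitude(step_edge)
print(edges[1])
```

The response is strongest at the interior pixels next to the step, which is the behaviour an edge detector such as Canny then thins and thresholds.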

4 Object Segmentation ( 4 hrs)


4.1 Types of Segmentation: Semantic vs Instance (Class Notes)
4.2 Segmentation using Agglomerative clustering, K-means (R1 Ch 9.3)
4.3 Mean-shift clustering (T2 Ch 7.1)
4.4 Vision Transformer (Class Notes)
4.5 Popular DNN Architectures for Segmentation - Detectron family, SOLO,
CondInst, Segment Anything Model (SAM), InternImage (Class Notes)
4.6 Metrics for Object Segmentation (R1 Ch 9.5)
4.6.1 mean IoU
4.6.2 Pixel Accuracy
4.6.3 Boundary Error: ABPE, BDE
4.7 Use cases for Object Segmentation - Crop classification from satellite
imagery
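
The segmentation metrics in 4.6.1 and 4.6.2 can be sketched as follows. This is a minimal version assuming label maps stored as equally sized lists of rows of integer class ids; the function names are hypothetical, and classes absent from both maps are skipped when averaging, which is one common convention among several.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label matches the ground truth."""
    flat_p = [p for row in pred for p in row]
    flat_t = [t for row in truth for t in row]
    correct = sum(p == t for p, t in zip(flat_p, flat_t))
    return correct / len(flat_t)


def mean_iou(pred, truth, num_classes):
    """Mean intersection-over-union over classes present in pred or truth."""
    flat_p = [p for row in pred for p in row]
    flat_t = [t for row in truth for t in row]
    ious = []
    for c in range(num_classes):
        inter = sum(p == c and t == c for p, t in zip(flat_p, flat_t))
        union = sum(p == c or t == c for p, t in zip(flat_p, flat_t))
        if union > 0:  # skip classes that appear in neither map
            ious.append(inter / union)
    return sum(ious) / len(ious)


truth = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
print(pixel_accuracy(pred, truth), mean_iou(pred, truth, num_classes=2))
```

In this toy case pixel accuracy is 0.75 while mean IoU is lower, because IoU penalizes the mislabelled pixel in both affected classes.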

5 Image Classification using Deep Learning ( 3 hrs)


5.1 Pattern recognition methods in image understanding (T2 Ch 10.6, R1 Ch
15.3)
5.2 Popular DNN Architectures: MobileNet, XceptionNet (Class Notes)
5.3 Metrics for Image Classification (R1 Ch 15.1)
5.3.1 Model Accuracy Metrics
5.3.1.1 Accuracy, Confusion Matrix, TPR, FPR, FNR, Top-K accuracy
5.3.1.2 Precision, Recall, F1 Score
5.3.1.3 AUC-ROC, AUC-PR
5.3.1.4 Intersection-Over-Union (IoU)
5.3.2 Model Performance Metrics
5.3.2.1 FLOPs
5.3.2.2 Memory Footprint @ specific precision
5.3.2.3 Inference Time on a specific hardware
5.3.3 Metrics for Image Classification
5.3.3.1 Cross Entropy (Log Loss), Brier Score
5.3.3.2 Macro-Precision, Macro-Recall, Macro-F1
5.4 Example Use cases of Image Classification (R1 – Ch 15, 16)
5.4.1 Automated sorting of fruits based on size, shape, color
5.4.2 Apparel type classification from image
5.5 Classifying Images of Single Objects (R1 Ch 16.2)
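
The classification metrics in 5.3.1.2 and 5.3.3.2 can be illustrated with a short sketch. The function names and the fruit labels are assumptions chosen to echo use case 5.4.1; a metric is set to 0.0 when its denominator is zero, which is a convention rather than a universal rule.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_average(y_true, y_pred):
    """Unweighted mean of per-class precision, recall and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = [precision_recall_f1(y_true, y_pred, c) for c in classes]
    n = len(classes)
    return tuple(sum(m[i] for m in per_class) / n for i in range(3))


y_true = ["apple", "apple", "banana", "banana", "cherry"]
y_pred = ["apple", "banana", "banana", "banana", "cherry"]
print(precision_recall_f1(y_true, y_pred, "banana"))
print(macro_average(y_true, y_pred))
```

Macro averaging weights every class equally, so a rare class that is classified poorly lowers the score as much as a frequent one.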

6 Object detection and Recognition ( 3 hrs)


6.1 Object detection (T2 Ch 9.2, R1 Ch 17.1)
6.2 Mean-shift clustering (T2 Ch 9.2)
6.3 Using YOLO (Class Notes)
6.4 Metrics (Class Notes)
6.4.1 Average-Precision (AP)
6.4.2 Mean-Average-Precision (mAP)
6.5 Multi label object detection and recognition (Class Notes)
6.5.1 Object Localization → Multilabel Classification
6.5.2 Difference between Multiclass vs Multilabel Classification
6.5.3 Popular Models: YOLO, SSD, Faster-RCNN
6.6 Use case: Skin detection (T1 Ch 14.1, R1 Ch 17, 18)
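
The detection metrics in 6.4 build on box overlap. The sketch below shows box IoU and one common way to compute Average Precision (area under the precision-recall curve with a monotone precision envelope). The function names, the `(x1, y1, x2, y2)` box format, and the assumption that each detection has already been marked as a true or false positive are choices made for this example; benchmark suites differ in their exact interpolation rules.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(scores, is_tp, num_gt):
    """AP for one class from detection scores and true-positive flags."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision non-increasing as recall grows (precision envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))
```

Mean Average Precision (mAP) is then the mean of the per-class AP values, often additionally averaged over several IoU thresholds.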

7 Object tracking ( 4 hrs)


7.1 Motion detection (R1 Ch 11)
7.2 Tracking by Detection (R1 Ch 11.1)
7.3 Tracking with the Mean Shift Algorithm (R1 Ch 11.2)
7.4 Kalman Filters (R1 Ch 11.3)
7.5 DNN architectures: DeepSORT, SiamFC, GSDT, SMILEtrack, SPARSEtrack
7.6 Use case: Pedestrian tracking (add ref)
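
As a companion to topic 7.4, here is a minimal scalar Kalman filter. It assumes a constant-position model with a single measured quantity; the function name `kalman_1d` and all parameter names are hypothetical. Practical trackers usually use a multi-dimensional state (for example position and velocity), which this sketch does not cover.

```python
def kalman_1d(measurements, process_var, meas_var, x0=0.0, p0=1.0):
    """Filter a sequence of noisy scalar measurements.

    x0 and p0 are the initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is assumed constant, uncertainty grows.
        p = p + process_var
        # Update: blend prediction and measurement using the Kalman gain.
        gain = p / (p + meas_var)
        x = x + gain * (z - x)
        p = (1.0 - gain) * p
        estimates.append(x)
    return estimates


noisy_positions = [1.0, 1.0, 1.0]
print(kalman_1d(noisy_positions, process_var=0.0, meas_var=1.0))
```

With each consistent measurement the gain shrinks, so the estimate moves toward the measured value while becoming less sensitive to any single noisy reading.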

8 Visual Bag of Words and Semantic Hierarchy ( 4 hrs)


8.1 Knowledge representation (T2 Ch 9.1)
8.2 Syntactic pattern recognition (T2 Ch 9.4)
8.3 Scene labeling (T2 Ch 10.9)
8.4 Semantic image segmentation and understanding (T2 Ch 10.10)
8.5 Summarizing Images with Visual Words (R1 Ch 16.1.3)
8.6 Application: Patch classification in images for breast tumor detection

9 Edge devices for computer vision ( 2 hrs)


9.1 ESP32 Cam module, Raspberry Pi, Banana Pi etc.
9.2 Intel
9.2.1 Core and Atom Processors
9.2.2 NUC
9.2.3 Movidius VPUs
9.2.4 oneAPI and OpenVINO Libraries
9.3 Nvidia
9.3.1 Jetson Platform - Nano, TX2, Orin
9.3.2 DeepStream Library, CUDA, cuDNN
9.4 Others
9.4.1 Google Coral

Optional Modules to be taken in Experiential Learning / Webinars / Tutorials / Assignments

1 Face detection and Recognition


1.1 Boosting - Viola Jones algorithm (T2 Ch 10.7)
1.2 DNN architecture: MTCNN, FastFace, RetinaFace
1.3 Active appearance models (T2 Ch 10.5)
1.4 Metrics (Class Notes)
1.4.1 IoU based metrics for Face Detection
1.4.2 True Acceptance Rate (TAR), False Acceptance Rate (FAR), False Rejection
Rate (FRR), TAR @ specific FAR, Top-K Identification Rate
1.5 Use case: Attendance system on face image (add ref)

2 Optical Character Recognition


2.1 Main challenges in OCR
2.2 Popular Approaches for OCR:
2.2.1 Edge and Contours based layout detection
2.2.2 LayoutLM, DiT
2.2.3 LSTM, YOLO
2.2.4 Tesseract, EasyOCR
2.3 Metrics for OCR
2.3.1 Accuracy: character, word, sentence level
2.3.2 String edit distance
2.3.3 mAP for text localization
2.4 Example Use cases of OCR
2.4.1 Vehicle Number Plate recognition
2.4.2 Invoice Parsing
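
The OCR metrics in 2.3.1 and 2.3.2 rest on string edit distance. The sketch below implements the standard Levenshtein distance with dynamic programming and derives a character error rate from it. The function names and the number-plate strings are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the reference length."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical plate where the OCR engine confused 'B' with '8'.
print(edit_distance("kitten", "sitting"))
print(character_error_rate("KA01AB1234", "KA01A81234"))
```

Word-level error rate follows the same pattern by passing lists of words instead of strings to `edit_distance`.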

Detailed Plan for Lab work

Lab No. | Lab Objective | Module Reference
1 | Reading images; displaying images; color space conversion | 1
2 | Histogram equalization; gray-scale transformation; filtering applications like sharpening, blur, noise removal, smoothing | 2
3 | Edge detection using Sobel and Canny; line detection using Hough Transform; RANSAC for semantic information; SIFT image descriptor; pedestrian detection using HoG and SIFT | 3
4 | Image segmentation using K-means; mean-shift clustering for segmentation; Vision Transformer for segmentation; crop classification using satellite images | 4
5 | Fruit sorting using transfer learning; apparel type classification using transfer learning; comparison on metrics for evaluation (demo) | 5
6 | Mean shift clustering for object detection; object detection using YOLO and Faster RCNN; skin detection | 6
7 | Mean shift algorithm for object tracking; Kalman filtering for object tracking; pedestrian tracking | 7
8 | Patch classification in images | 8


Evaluation Scheme:
Legend: EC = Evaluation Component; AN = After Noon Session; FN = Fore Noon Session

No | Name | Type | Duration | Weight | Day, Date, Session, Time
EC-1(a) | Quizzes | Online | | 10% |
EC-1(b) | Assignments | Take Home | | 20% |
EC-2 | Mid-Semester Test | Closed Book | | 30% |
EC-3 | Comprehensive Exam | Open Book | | 40% |

Note:
Syllabus for Mid-Semester Test (Closed Book): Topics in Session Nos. 1 to 8
Syllabus for Comprehensive Exam (Open Book): All topics (Session Nos. 1 to 16)

Important links and information:

Elearn portal: https://elearn.bits-pilani.ac.in or Canvas


Students are expected to visit the Elearn portal on a regular basis and stay up to date with
the latest announcements and deadlines.
Contact sessions: Students should attend the online lectures as per the schedule
provided on the Elearn portal.

Evaluation Guidelines:
1 EC-1(a) consists of two Quizzes. Students will attempt them through the course pages
on the Elearn portal. Announcements will be made on the portal in a timely
manner.
2 EC-1(b) consists of either one or two Assignments. Students will attempt them
through the course pages on the Elearn portal. Announcements will be made on the
portal in a timely manner.
3 For Closed Book tests: No books or reference material of any kind will be
permitted.
4 For Open Book exams: Use of books and any printed / written reference material
(filed or bound) is permitted. However, loose sheets of paper will not be allowed.
Use of calculators is permitted in all exams. Laptops/Mobiles of any kind are not
allowed. Exchange of any material is not allowed.
5 If a student is unable to appear for the Regular Test/Exam due to genuine
exigencies, the student should follow the procedure to apply for the Make-Up
Test/Exam which will be made available on the Elearn portal. The Make-Up
Test/Exam will be conducted only at selected exam centres on the dates to be
announced later.

It shall be the responsibility of the individual student to be regular in maintaining the self-
study schedule as given in the course hand-out, attend the online lectures, and take all the
prescribed evaluation components such as Assignment/Quiz, Mid-Semester Test and
Comprehensive Exam according to the evaluation scheme provided in the hand-out.
