CV4 F

The document discusses various techniques for feature detection, description, and matching in images, emphasizing the importance of keypoints for tasks such as camera pose estimation and image alignment. It outlines popular keypoint detectors, feature descriptors, and matching strategies, including SIFT and ORB, while also addressing the challenges of invariance to transformations. Additionally, it touches on machine learning approaches for feature detection and the significance of large-scale matching and segmentation methods.

What kinds of feature might one use to establish a set of correspondences between these images?

Points and patches

• Required as a precursor to:
– computing camera pose
– computing sparse point clouds from multiple views (SfM, Visual SLAM)
– aligning different images
– object instance and category recognition
– stitching image mosaics
– performing video stabilization

Keypoint detection, description and matching

1. Detecting keypoints in each image: $k_1, k_2, \cdots, k_n$
2. Describing the feature of each keypoint: $d_{k_1}, d_{k_2}, \cdots, d_{k_n}$ (and $d_{k'_1}, d_{k'_2}, \cdots, d_{k'_m}$ in another view)
3. Matching the features of keypoints detected in different views: if $\lVert d_{k_i} - d_{k'_j} \rVert < th$, the pair is matched.
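A minimal sketch of this threshold-based matching rule in Python with NumPy (the descriptor arrays and the threshold value are illustrative assumptions, not values from the slides):

```python
import numpy as np

def match_by_threshold(desc_a, desc_b, th):
    """Return (i, j) index pairs with ||d_ki - d_k'j|| < th,
    taking each keypoint's nearest neighbor in the other view."""
    # Pairwise Euclidean distances, shape (n, m).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)  # best candidate for each row
    return [(i, int(j)) for i, j in enumerate(nearest) if dists[i, j] < th]
```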

Feature detectors (keypoint detection)

Notice how some patches can be localized or matched with higher accuracy than others.

Keypoints need to be easy to match

• The simplest possible matching criterion for comparing two image patches is their summed square difference (SSD):

$$E_{SSD}(\boldsymbol{u}) = \sum_i \left[ I_1(\boldsymbol{x}_i + \boldsymbol{u}) - I_0(\boldsymbol{x}_i) \right]^2$$

• A stability metric with respect to small variations in position $\Delta\boldsymbol{u}$ is an auto-correlation (AC) function or surface:

$$E_{AC}(\Delta\boldsymbol{u}) = \sum_i w(\boldsymbol{x}_i) \left[ I_0(\boldsymbol{x}_i + \Delta\boldsymbol{u}) - I_0(\boldsymbol{x}_i) \right]^2$$


Examples of three auto-correlation surfaces $E_{AC}(\Delta\boldsymbol{u})$: a surface that rises steeply away from its minimum in all directions $(\Delta u_x, \Delta u_y)$ indicates a good feature to track.

Approximation of $E_{AC}(\Delta\boldsymbol{u})$

$$E_{AC}(\Delta\boldsymbol{u}) = \sum_i w(\boldsymbol{x}_i) \left[ I_0(\boldsymbol{x}_i + \Delta\boldsymbol{u}) - I_0(\boldsymbol{x}_i) \right]^2 \approx \sum_i w(\boldsymbol{x}_i) \left[ I_0(\boldsymbol{x}_i) + \left( \frac{\partial I_0(\boldsymbol{x}_i)}{\partial x}, \frac{\partial I_0(\boldsymbol{x}_i)}{\partial y} \right) \cdot \Delta\boldsymbol{u} - I_0(\boldsymbol{x}_i) \right]^2$$

$$= \sum_i w(\boldsymbol{x}_i) \left( I_x \Delta u_x + I_y \Delta u_y \right)^2 = \Delta u_x^2 \sum_i w\, I_x^2 + 2\, \Delta u_x \Delta u_y \sum_i w\, I_x I_y + \Delta u_y^2 \sum_i w\, I_y^2$$

$$= \begin{pmatrix} \Delta u_x & \Delta u_y \end{pmatrix} \begin{pmatrix} \sum w\, I_x^2 & \sum w\, I_x I_y \\ \sum w\, I_x I_y & \sum w\, I_y^2 \end{pmatrix} \begin{pmatrix} \Delta u_x \\ \Delta u_y \end{pmatrix} = \Delta\boldsymbol{u}^T \boldsymbol{A}\, \Delta\boldsymbol{u}$$

where $I_x = \partial I_0(\boldsymbol{x})/\partial x$, $I_y = \partial I_0(\boldsymbol{x})/\partial y$, and $\boldsymbol{A}$ is the auto-correlation matrix.
Uncertainty ellipse corresponding to an eigenvalue analysis of the auto-correlation matrix $\boldsymbol{A}$.

Good features to track [Shi and Tomasi, CVPR 1994]

Key points (feature points): a patch defined by a window (e.g., 25×25 pixels) is accepted as a candidate feature if, at the center of the window, both eigenvalues of $\boldsymbol{A}$, $\lambda_0$ and $\lambda_1$, exceed a predefined threshold.
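OpenCV exposes this minimum-eigenvalue criterion as cv2.goodFeaturesToTrack; a minimal usage sketch (the file name and parameter values are illustrative, not from the slides):

```python
import cv2

img = cv2.imread("view0.png", cv2.IMREAD_GRAYSCALE)
# Up to 500 corners whose min-eigenvalue response is at least 1% of the
# strongest response, separated by at least 10 pixels.
corners = cv2.goodFeaturesToTrack(img, maxCorners=500, qualityLevel=0.01,
                                  minDistance=10)
corners = corners.reshape(-1, 2)  # (N, 2) array of (x, y) locations
```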

Outline of a basic feature detection algorithm

1. Compute the horizontal and vertical derivatives $I_x(\boldsymbol{x})$ and $I_y(\boldsymbol{x})$ by convolving the input image $I(\boldsymbol{x})$ with derivatives of Gaussians, a Sobel filter, etc.
2. Compute the three images $I_x^2(\boldsymbol{x})$, $I_x(\boldsymbol{x}) I_y(\boldsymbol{x})$, and $I_y^2(\boldsymbol{x})$.
3. Convolve each of these three images with a larger Gaussian.
4. Form the auto-correlation matrix $\boldsymbol{A}$ for each pixel from these three images:

$$\boldsymbol{A} = \begin{pmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{pmatrix}$$

5. Compute the eigenvalues $\lambda_0$ and $\lambda_1$ ($\lambda_0 < \lambda_1$) as a scalar interest measure $\lambda_0(\boldsymbol{x})$.
6. Find local maxima above a certain threshold and report them as detected feature point locations: keypoints $k_1, k_2, \cdots, k_n$.
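A from-scratch sketch of these six steps in Python with SciPy (the derivative and integration scales, window size, and threshold are illustrative assumptions):

```python
import numpy as np
from scipy import ndimage

def min_eig_keypoints(img, sigma_d=1.0, sigma_i=2.0, thresh=0.01):
    img = img.astype(np.float64)
    # 1. Horizontal and vertical derivatives via derivative-of-Gaussian filters.
    Ix = ndimage.gaussian_filter(img, sigma_d, order=(0, 1))
    Iy = ndimage.gaussian_filter(img, sigma_d, order=(1, 0))
    # 2-3. The three product images, each smoothed with a larger Gaussian.
    Axx = ndimage.gaussian_filter(Ix * Ix, sigma_i)
    Axy = ndimage.gaussian_filter(Ix * Iy, sigma_i)
    Ayy = ndimage.gaussian_filter(Iy * Iy, sigma_i)
    # 4-5. Smaller eigenvalue of A = [[Axx, Axy], [Axy, Ayy]] at every
    # pixel, using the closed form for a symmetric 2x2 matrix.
    lam0 = 0.5 * (Axx + Ayy - np.sqrt((Axx - Ayy) ** 2 + 4 * Axy ** 2))
    # 6. Local maxima of the interest measure above a threshold.
    is_max = lam0 == ndimage.maximum_filter(lam0, size=7)
    ys, xs = np.nonzero(is_max & (lam0 > thresh * lam0.max()))
    return np.stack([xs, ys], axis=1)
```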

Popular keypoint detectors

• Shi and Tomasi (1994)
– minimum eigenvalue $\lambda_0$
• Harris and Stephens (1988)
– $\det(\boldsymbol{A}) - \alpha \operatorname{tr}(\boldsymbol{A})^2 = \lambda_0 \lambda_1 - \alpha (\lambda_0 + \lambda_1)^2$
• Triggs (2004)
– $\lambda_0 - \alpha \lambda_1$
• Brown, Szeliski, and Winder (2005)
– harmonic mean $\det(\boldsymbol{A}) / \operatorname{tr}(\boldsymbol{A}) = \lambda_0 \lambda_1 / (\lambda_0 + \lambda_1)$

Isocontours of popular keypoint detection functions

Adaptive Non-Maximal Suppression (ANMS)

Thresholding the interest measure alone tends to give an uneven distribution of feature points across the image. ANMS instead only detects features that are both local maxima and whose response value is significantly (10%) greater than that of all of their neighbors within a radius r; a sketch of this rule follows below. (Figure: the top 250 and top 500 features selected this way.)
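A minimal NumPy sketch of that suppression rule (brute-force, quadratic in the number of candidates; the robustness factor 0.9 encodes the 10% margin):

```python
import numpy as np

def anms(xy, response, radius, c_robust=0.9):
    """Keep keypoints with no significantly stronger neighbor within `radius`."""
    keep = []
    for i in range(len(xy)):
        d2 = np.sum((xy - xy[i]) ** 2, axis=1)
        nearby = (d2 > 0) & (d2 < radius ** 2)
        # Suppressed if some neighbor is stronger even after the 10% discount.
        if not np.any(c_robust * response[nearby] > response[i]):
            keep.append(i)
    return np.array(keep, dtype=int)
```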
Feature descriptors

We know how to detect good points. Next question: how to match them?

Lots of possibilities (this is a popular research area):
– Simple option: match square windows around the point
– SIFT, ORB, AKAZE, etc.

Feature descriptors

• After detecting features (keypoints), we must match them, i.e., determine which features come from corresponding locations in different images.
• Simple error metrics: the sum of squared differences (SSD) or normalized cross-correlation (NCC)
– but these only work well when the patches are related by a near-pure translation at the same scale.
• The local appearance of features will change in:
– orientation
– scale
– affine / perspective deformations

Invariance

Suppose we are comparing two images I1 and I2 (I2 may be a transformed version of I1).

Detecting the same features regardless of the transformation is called transformational invariance:
– translation, 2D rotation, scale
– limited 3D rotations (SIFT works up to about 60 degrees)
– limited affine transformations (some detectors are fully affine invariant)
– limited illumination/contrast changes

Scale invariant feature transform (SIFT)

A 16×16 pixel patch around the keypoint is aligned to the dominant orientation.

(a) image gradients; (b) keypoint descriptor

Image gradients $(I_x, I_y)$, orientations $\tan^{-1}(I_y / I_x)$, and magnitudes $\sqrt{I_x^2 + I_y^2}$ are computed at each pixel and weighted by a Gaussian fall-off function (blue circle). A weighted gradient orientation histogram is then computed in each subregion, using trilinear interpolation: 8 bins × 16 subregions = a 128-dimensional descriptor.
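In OpenCV (included in the main opencv-python package since 4.4), the corresponding calls are straightforward; the file name below is illustrative:

```python
import cv2

img = cv2.imread("view0.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
# keypoints: list of cv2.KeyPoint (location, scale, orientation);
# descriptors: (N, 128) float32 array, one row per keypoint.
keypoints, descriptors = sift.detectAndCompute(img, None)
```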
Scale-space feature detection using a sub-octave Difference of Gaussian pyramid
(Keypoint detector proposed in SIFT)

Feature matching

Establish some preliminary feature matches between two (or more) images.

ORB (Oriented FAST and Rotated BRIEF): an efficient alternative to SIFT or SURF [Ethan Rublee et al., ICCV 2011]
https://docs.opencv.org/3.4/d1/d89/tutorial_py_orb.html

OpenCV feature point detection and description:
https://docs.opencv.org/3.4/db/d27/tutorial_py_table_of_contents_feature2d.html
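A minimal detect-describe-match sketch with ORB, following the linked tutorials (file names are illustrative); Hamming distance is the appropriate norm for ORB's binary descriptors:

```python
import cv2

img1 = cv2.imread("view0.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matcher; crossCheck keeps only mutual best matches.
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)
vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)
```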

Matching strategy and error rates

The assumption for the feature descriptors: Euclidean distances in feature space can be used for ranking potential matches.

The simplest matching strategy: set a threshold and return all matches from other images within this threshold.

Nearest neighbor distance ratio:

$$\mathrm{NNDR} = \frac{d_1}{d_2} = \frac{\lVert \boldsymbol{D}_A - \boldsymbol{D}_B \rVert}{\lVert \boldsymbol{D}_A - \boldsymbol{D}_C \rVert}$$

where $d_1$ and $d_2$ are the nearest and second nearest neighbor distances, $\boldsymbol{D}_A$ is the target descriptor, and $\boldsymbol{D}_B$ and $\boldsymbol{D}_C$ are its closest two neighbors.


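The NNDR criterion is what OpenCV tutorials implement as the ratio test; a sketch reusing des1/des2 from the ORB example above (the 0.75 threshold is a common illustrative choice):

```python
import cv2

bf = cv2.BFMatcher(cv2.NORM_HAMMING)
pairs = bf.knnMatch(des1, des2, k=2)  # two nearest neighbors per descriptor
# Accept only matches whose nearest neighbor clearly beats the second nearest.
good = [m for m, n in pairs if m.distance < 0.75 * n.distance]
```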
AKAZE (Accelerated KAZE) [Alcantarilla et al., BMVC 2013]
http://www.robesafe.com/personal/pablo.alcantarilla/kaze.html

Machine learning for feature detectors and descriptors

• FAST and FASTER (Rosten, Porter, and Drummond 2010), one of the first learned detectors
• Learning covariant feature detectors (Lenc and Vedaldi 2016)
• Learning to assign orientations to feature points (Yi, Verdie et al. 2016)

Jointly optimizing the detectors and descriptors in a single (multi-head) pipeline:

• LIFT, learned invariant feature transforms (Yi, Trulls et al. 2016)
• SuperPoint, self-supervised interest point detection and description (DeTone, Malisiewicz, and Rabinovich 2018) https://github.com/magicleap/SuperPointPretrainedNetwork
• LF-Net, learning local features from images (Ono, Trulls et al. 2018)
• Key.Net (Barroso-Laguna, Riba et al. 2019), which combines handcrafted and learned CNN features to achieve state-of-the-art results on the HPatches benchmark (Balntas, Lenc et al. 2017)
• D2D, Describe-to-Detect (Tian, Balntas et al. 2020), which extracts dense local feature descriptors and then keeps the ones that have high saliency

Large-scale matching and retrieval

• A large (huge) number of objects in the database vs. a query image
– techniques are needed to quickly narrow down the search to a few likely images
• Content-based image retrieval (CBIR) or instance retrieval
– the problem of finding a particular object in a large collection
• Similar to document retrieval
– the frequency of occurrence of particular words in a document enables a quick match against a query
• Visual words
– the high-dimensional feature descriptors that occur in each image must first be mapped into discrete visual words

Typical pipeline for feature-based instance retrieval.

Lines

Hough Transform

• Each detected edge point votes for the possible line parameters consistent with it.

(Figure: the original and oriented variants of the transform.)
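A from-scratch sketch of the voting step for the standard $(\rho, \theta)$ line parameterization (bin counts are illustrative; edges is assumed to be a binary edge map):

```python
import numpy as np

def hough_lines(edges, n_theta=180, n_rho=400):
    """Accumulate votes in (rho, theta) space for every edge pixel."""
    h, w = edges.shape
    rho_max = np.hypot(h, w)
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=np.int64)
    ys, xs = np.nonzero(edges)
    for theta_idx, theta in enumerate(thetas):
        # rho = x cos(theta) + y sin(theta), quantized into n_rho bins.
        rho = xs * np.cos(theta) + ys * np.sin(theta)
        rho_idx = np.round((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        np.add.at(acc, (rho_idx, theta_idx), 1)
    return acc, thetas  # peaks in acc correspond to detected lines
```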
LSD: a Line Segment Detector
https://doi.org/10.5201/ipol.2012.gjmr-lsd

Contour Tracking

Locating boundary curves corresponding to object boundaries.

1. Snakes [Kass, Witkin, and Terzopoulos 1988]
– an energy-minimizing, two-dimensional spline curve that evolves (moves) towards image features such as strong edges.

2. Intelligent Scissors [Mortensen and Barrett 1995]
– allows the user to sketch in real time a curve that clings to object boundaries.

Snakes (Kass, Witkin, and Terzopoulos 1988)

A snake is a parametric curve $\boldsymbol{f}(s) = (x(s), y(s))$, discretized in practice as $\boldsymbol{f}(i) = (x(i), y(i))$.
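For reference, the energy such a spline minimizes can be written as follows (standard form following Kass et al., with constant factors and the external constraint term omitted; the first-derivative term penalizes stretching, the second-derivative term bending, and $E_{image}$ attracts the curve to features such as strong edges):

$$E = \int \alpha(s)\, \lVert \boldsymbol{f}_s(s) \rVert^2 + \beta(s)\, \lVert \boldsymbol{f}_{ss}(s) \rVert^2 + E_{image}(\boldsymbol{f}(s)) \; ds$$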

Intelligent scissors

The system optimizes the contour in real time as the user is drawing.

Segmentation

Image segmentation is one of the oldest and most widely studied problems in computer vision.

Experimental comparisons on human-labeled databases:

• Berkeley Segmentation Dataset and Benchmark [Martin, Fowlkes, Tal et al. 2001]
http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/segbench/
Many of the more recent image segmentation algorithms report comparative results on this database.

• A database of foreground and background segmentations [Alpert, Galun, Basri et al. 2007]
http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/index.html

Mean shift and mode finding

• Model the feature vectors associated with each pixel (e.g., color and position) as samples from an unknown probability density function
• Try to find clusters (modes) in this distribution

How can we find these clusters?

• The k-means and mixtures of Gaussians techniques
– use a parametric model of the density function
– the density is the superposition of a small number of simpler distributions
– the locations (centers) and shapes (covariances) of these distributions can be estimated
• Mean shift
– implicitly models the density function using a smooth, continuous, non-parametric model
– smoothes the distribution and finds its peaks
– then determines the regions of feature space that correspond to each peak

http://comaniciu.net/Papers/MsAnalysis.pdf

K-means

• implicitly models the probability density as a superposition of spherically symmetric distributions
• is given the number of clusters k it is supposed to find
• iteratively updates the cluster center location based on the samples that are closest to each center
• is initialized by randomly sampling k centers from the input feature vectors
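A compact NumPy sketch of this loop (Lloyd's algorithm; the iteration cap and seed are illustrative, and empty clusters are not handled):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize by randomly sampling k centers from the input feature vectors.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its closest center.
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        # Move each center to the mean of its assigned samples.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```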

Mixture of Gaussians

• each cluster center is augmented by a covariance matrix whose values are re-estimated from the corresponding samples
• a Mahalanobis distance is used to associate input samples with cluster centers

$\boldsymbol{x}_i$: input samples, $\boldsymbol{\mu}_k$: cluster centers, $\boldsymbol{\Sigma}_k$: covariance estimates
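Written out with that notation, the Mahalanobis distance between a sample and a cluster center is:

$$d(\boldsymbol{x}_i, \boldsymbol{\mu}_k; \boldsymbol{\Sigma}_k) = \lVert \boldsymbol{x}_i - \boldsymbol{\mu}_k \rVert_{\boldsymbol{\Sigma}_k^{-1}} = \sqrt{(\boldsymbol{x}_i - \boldsymbol{\mu}_k)^T\, \boldsymbol{\Sigma}_k^{-1}\, (\boldsymbol{x}_i - \boldsymbol{\mu}_k)}$$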

Mean shift
https://docs.opencv.org/4.x/d7/d00/tutorial_meanshift.html

The key to mean shift: a technique for efficiently finding peaks in this high-dimensional data distribution without ever computing the complete function explicitly.

How do we estimate the density function given a sparse set of samples?
• One of the simplest approaches is to just smooth the data.
• In higher dimensions, however, it is computationally prohibitive to evaluate f(x) over the whole space.
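The smoothed estimate referred to here can be written as a kernel density estimate (this form follows standard mean-shift expositions; $k$ is the kernel profile and $h$ its width):

$$f(\boldsymbol{x}) = \sum_i K(\boldsymbol{x} - \boldsymbol{x}_i) = \sum_i k\!\left( \frac{\lVert \boldsymbol{x} - \boldsymbol{x}_i \rVert^2}{h^2} \right)$$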

Multiple restart gradient descent

• Start at some guess for a local maximum, $\boldsymbol{y}_k$ (a randomly chosen input sample $\boldsymbol{x}_i$)
• Compute the gradient of the density estimate $f'(\boldsymbol{x})$ at $\boldsymbol{y}_k$
• Take an uphill step in that direction
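Carried to convergence, this uphill stepping becomes the classic mean-shift update, where each new estimate is the kernel-weighted mean of the nearby samples. A hedged NumPy sketch with a flat window (corresponding to an Epanechnikov kernel profile; the bandwidth and tolerance are illustrative):

```python
import numpy as np

def mean_shift_mode(X, y0, h=1.0, tol=1e-5, max_iter=200):
    """Climb from sample y0 to a density mode by repeatedly moving to the
    mean of the samples inside the bandwidth-h window around y."""
    y = y0.astype(np.float64)
    for _ in range(max_iter):
        inside = np.linalg.norm(X - y, axis=1) < h
        y_new = X[inside].mean(axis=0)
        if np.linalg.norm(y_new - y) < tol:
            break
        y = y_new
    return y
```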
Clustering in the joint domain of color and location

• Spatial domain: the spatial coordinates of the image, $\boldsymbol{x}_s = (x, y)$
• Range domain: the color values, $\boldsymbol{x}_r = (r, g, b)$ or $(L^*, a^*, b^*)$

Together these form a five-dimensional feature space $\boldsymbol{x}_j$.

(Figure: segmentation with $(h_s, h_r, M) = (16, 19, 40)$, where spatial regions containing fewer than M pixels are eliminated.)

Superpixel segmentation

SLIC (Simple Linear Iterative Clustering)
http://www.sanko-shoko.net/note.php?id=mpfg
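SLIC is available in scikit-image; a minimal usage sketch (the file name, n_segments, and compactness are illustrative choices):

```python
from skimage import io, segmentation, color

img = io.imread("scene.png")
# k-means-style clustering in the joint (L*, a*, b*, x, y) space;
# compactness trades color similarity against spatial proximity.
labels = segmentation.slic(img, n_segments=300, compactness=10, start_label=1)
mean_color = color.label2rgb(labels, img, kind="avg")  # superpixel visualization
```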

Visual comparison of superpixels produced by various methods. The average superpixel size in the upper left of each image is 100 pixels, and 300 in the lower right. Alternating rows show each segmented image followed by a detail of the center of each image.

R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Süsstrunk, "SLIC superpixels compared to state-of-the-art superpixel methods," IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2011.

Summary

• Keypoint detection, description, and matching
– Derivatives of image intensity are used for detecting features
– Feature descriptors should be invariant to camera pose, illumination, etc.
– Keypoint matching is one of the basic technologies for 3D reconstruction, image retrieval, AR/MR, etc.
• Line features are also useful, especially for artificial targets.
• Learning-based approaches are also becoming more popular.
• Grouping pixels with bottom-up segmentation techniques
