Unit 4 CV
Spatial Dimensions
3. Homogeneous Coordinates and 3D Projective Geometry
3D projective geometry and homogeneous coordinates offer a structure for representing and
handling 3D points, lines, and planes.
Homogeneous coordinates represent points in space using an additional coordinate w, so a 3D point (x, y, z) is written as (x, y, z, w). This allows geometric transformations like rotation, translation, and scaling to be expressed uniformly as matrix operations.
On the other hand, 3D projective geometry deals with the mathematical representation and
manipulation of 3D objects along with their projections onto 2D image planes.
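As a minimal illustration (assuming NumPy; the specific point and transform are made up for this sketch), here is how a rotation and a translation combine into a single 4x4 matrix acting on a homogeneous point:

```python
# A minimal sketch of homogeneous coordinates: a 3D point (x, y, z) becomes
# (x, y, z, 1), so rotation AND translation combine into one 4x4 matrix.
import numpy as np

point = np.array([1.0, 2.0, 3.0, 1.0])        # homogeneous 3D point

theta = np.pi / 2                              # 90-degree rotation about z
T = np.array([
    [np.cos(theta), -np.sin(theta), 0.0, 5.0],  # last column: translation
    [np.sin(theta),  np.cos(theta), 0.0, 0.0],
    [0.0,            0.0,           1.0, 0.0],
    [0.0,            0.0,           0.0, 1.0],
])

transformed = T @ point                        # rotate, then translate, in one step
cartesian = transformed[:3] / transformed[3]   # divide by w to return to 3D
print(cartesian)                               # -> [3. 1. 3.]
```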
4. Camera Models and Calibration Techniques for 3D Models
The appropriate selection of camera models and their calibration techniques plays a crucial role in 3D CV for precisely reconstructing 3D models from 2D images.
A camera model describes the geometric relationship between 3D points in the real world and their corresponding 2D projections on the image plane.
Meanwhile, accurate camera calibration helps estimate the camera’s intrinsic parameters, such
as focal length and principal point, as well as extrinsic parameters, including position and
orientation.
These parameters are crucial for correcting distortions, aligning images, and triangulating 3D
points from multiple views to ensure accurate reconstruction of 3D models.
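As a hedged sketch of how calibration is typically run in practice with OpenCV's chessboard routines (the 9x6 pattern size and file names are assumptions for illustration):

```python
# Intrinsic calibration from chessboard views, using OpenCV.
import cv2
import numpy as np

pattern = (9, 6)                                  # inner corners per row/column (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in ["calib_01.png", "calib_02.png"]:    # hypothetical filenames
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Returns the intrinsic matrix K (focal length, principal point), the
# distortion coefficients, and per-view extrinsics (rvecs, tvecs).
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
```

The recovered matrix K holds the intrinsic parameters, while rvecs and tvecs are the per-view extrinsic parameters (orientation and position).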
5. Stereo Vision
Stereo vision is a method in 3D CV that uses two or more cameras to capture images of the same scene from slightly different viewpoints.
This technique works by finding matching points in both images and then calculating their 3D
locations using the known camera geometry.
Stereo vision algorithms analyze the disparity, i.e. the difference in the positions of corresponding points, to estimate the depth of points in the scene. This depth data allows accurate reconstruction of 3D models, which is useful for tasks like robotic navigation, augmented reality, and 3D mapping.
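A minimal sketch of this pipeline with OpenCV's block matcher follows; the file names, focal length, and baseline are placeholder assumptions:

```python
# Disparity-then-depth with OpenCV's StereoBM, assuming a rectified pair.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # assumed inputs
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

f, B = 700.0, 0.12               # focal length (px) and baseline (m), assumed
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]   # classic Z = f*B/d triangulation
```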
3D Image Reconstruction Using the Shape from Texture Technique
Depth from Defocus
Depth from defocus is a process that calculates the depth or three-dimensional structure of a
scene by examining the degree of blur or defocus present in areas of an image.
It works on the principle that objects situated at different distances from the camera lens exhibit varying levels of defocus blur. By comparing these blur levels throughout the image, DfD can generate depth maps or three-dimensional models representing the scene.
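As a toy illustration of the blur-comparison idea (not a full DfD algorithm): assuming two registered images of the same scene taken at different focus settings, local sharpness can be scored with windowed Laplacian energy, and the ratio between the two images gives a coarse relative-depth cue:

```python
# A toy depth-from-defocus cue: compare local sharpness between a
# near-focused and a far-focused image of the same scene. This yields a
# relative-depth ordering, not metric depth.
import cv2
import numpy as np

near_focus = cv2.imread("focus_near.png", cv2.IMREAD_GRAYSCALE)  # assumed inputs
far_focus = cv2.imread("focus_far.png", cv2.IMREAD_GRAYSCALE)

def local_sharpness(img, ksize=15):
    lap = cv2.Laplacian(img.astype(np.float32), cv2.CV_32F)
    return cv2.blur(lap * lap, (ksize, ksize))   # windowed Laplacian energy

s_near = local_sharpness(near_focus)
s_far = local_sharpness(far_focus)
relative_depth = s_far / (s_near + s_far + 1e-6)  # ~0 near the camera, ~1 far
```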
Time of Flight (ToF) Sensor Technique
LiDAR
LiDAR (Light Detection and Ranging) is a remote sensing 3D vision technique that uses laser
light to measure object distances.
It emits laser pulses towards objects and measures the time it takes for the reflected light to
return.
From these time-of-flight measurements, the system computes distances and builds precise 3D representations of the surroundings. LiDAR systems create high-resolution 3D maps that are useful for applications like autonomous vehicles, surveying, archaeology, and atmospheric studies.
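The underlying range equation is simple enough to state directly; the round-trip time below is an assumed example value:

```python
# The basic time-of-flight range equation behind LiDAR: the pulse travels
# to the target and back, so distance = c * t / 2.
c = 299_792_458.0            # speed of light, m/s
t = 66.7e-9                  # assumed round-trip time of an echo, seconds
distance = c * t / 2.0       # -> about 10 m
print(f"{distance:.2f} m")
```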
2. Integrability:
The integrability constraint arises naturally, since for a valid depth map z(x, y) with gradients (p, q) = (z_x, z_y), the mixed partial derivatives must agree: p_y = z_xy = z_yx = q_x.
3. Triangulation:
The problem of determining a point’s 3D position from a set of corresponding image locations and known
camera positions is known as triangulation.
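A brief sketch with OpenCV's linear triangulation routine; the projection matrices and matched points below are placeholder values chosen so the recovered point lies at a depth of about 2 m:

```python
# Triangulate one 3D point from two views with known projection matrices.
import cv2
import numpy as np

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])               # camera 1 at origin
P2 = np.hstack([np.eye(3), np.array([[-0.1], [0], [0]])])   # camera 2 shifted 10 cm

pts1 = np.array([[0.0], [0.0]])       # 2xN matched points in normalized coords
pts2 = np.array([[-0.05], [0.0]])     # consistent with a point at depth 2 m

X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)  # 4xN homogeneous result
X = (X_h[:3] / X_h[3]).ravel()                   # dehomogenize -> approx [0, 0, 2]
```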
4. Bundle adjustment
Bundle adjustment is a crucial optimization technique in computer vision and photogrammetry. It refines the 3D coordinates of the scene geometry, the camera positions, and the camera parameters to minimize errors in the 3D reconstruction.
The name “bundle adjustment” originates from the concept of adjusting the “bundles” of light rays
that travel from 3D points in the scene to the camera’s optical center.
In essence, these bundles represent the paths of light rays captured by the camera from various points
in the scene.
During the bundle adjustment process, the goal is to optimize these light ray bundles to minimize the
reprojection error, which is the difference between the observed image points and the projected points
from the 3D model. By refining the 3D coordinates of the scene, camera positions, and other
parameters, the process ensures a more accurate and reliable 3D reconstruction.
2. The Role in Computer Vision and Photogrammetry
Bundle adjustment (BA) enhances the accuracy and reliability of 3D scene reconstructions from multiple
images and camera views:
We use it to correct errors from the initial 3D reconstruction process, including inaccuracies in
camera pose, scene structure, or feature tracking.
To do so, this technique optimizes the parameters of 3D reconstructions and refines the camera
calibration parameters.
3. The Bundle Adjustment Workflow
The workflow of Bundle Adjustment (BA) consists of a series of steps designed to refine the
parameters of a 3D reconstruction model from several 2D images:
3.1. Data Collection and Initial Reconstruction
We start with a collection of 2D images capturing the scene or object of interest from different
viewpoints, ideally with overlapping features or key points recognizable in multiple images.
To initialize the 3D scene structure and camera parameters, we perform an initial structure-from-motion (SfM) estimation. SfM algorithms analyze the image correspondences and camera positions to provide an initial guess of the scene's 3D structure and camera poses.
However, this initial reconstruction often has errors. This is where bundle adjustment comes
into play.
3.2. Reprojection Errors and Objective Function Formulation
First, we compute the reprojection errors for each image, measuring the disparity between the observed 2D features and the positions projected from the current 3D structure.
That way, we assess our 3D model's accuracy.
We also define an objective function to optimize in further steps. Usually, it’s the sum of
squared (reprojection) errors across all images. It quantifies the overall error in the current
reconstruction.
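Concretely, if x_ij denotes the observed position of point i in image j and π(C_j, X_i) is the projection of 3D point X_i through camera parameters C_j (notation assumed here for exposition), the objective function is:

```latex
E(C, X) = \sum_{i}\sum_{j} \left\| x_{ij} - \pi(C_j, X_i) \right\|^2
```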
3.3. Nonlinear Optimization and Simultaneous Parameter Updates
Nonlinear optimization techniques like Levenberg-Marquardt or Gauss-Newton are used
iteratively to minimize the objective function, refining the 3D scene structure and camera
parameters.
We apply simultaneous updates to the 3D structure and camera parameters throughout each
optimization iteration, ensuring that changes in one reconstruction aspect don’t adversely
affect others, thus promoting global consistency.
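A minimal sketch of this optimization, assuming a pinhole camera with a known focal length f and SciPy's least-squares solver (the parameter packing and helper names are illustrative, not a production pipeline):

```python
# Minimal bundle adjustment sketch: jointly refine camera poses and 3D
# points by minimizing the reprojection error with least squares.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(pts3d, rvec, tvec, f):
    """Rotate, translate, and perspective-divide: simple pinhole projection."""
    cam = Rotation.from_rotvec(rvec).apply(pts3d) + tvec
    return f * cam[:, :2] / cam[:, 2:3]

def residuals(params, n_cams, n_pts, cam_idx, pt_idx, observed, f):
    # The parameter vector packs 6 pose values per camera, then 3 per 3D
    # point; both are updated simultaneously in every solver iteration.
    poses = params[:6 * n_cams].reshape(n_cams, 6)
    pts3d = params[6 * n_cams:].reshape(n_pts, 3)
    proj = np.vstack([project(pts3d[j:j + 1], poses[i, :3], poses[i, 3:], f)
                      for i, j in zip(cam_idx, pt_idx)])
    return (proj - observed).ravel()   # reprojection error per observation

# x0 stacks the initial SfM estimate of all poses and points:
# result = least_squares(residuals, x0,
#                        args=(n_cams, n_pts, cam_idx, pt_idx, observed, f))
```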
3.4. Refined Reconstruction and Post-processing
Once the optimization converges, the refined scene structure and camera parameters constitute the final reconstruction.
4. Tools and Libraries
There are many tools and software libraries for bundle adjustment:
OpenCV provides various functions and classes for camera calibration and bundle adjustment, encompassing both sparse and dense bundle adjustment.
COLMAP, an open-source software designed for 3D reconstruction from images, incorporates
bundle adjustment as an integral part of its pipeline, supporting both sparse and dense reconstruction.
Bundler, a user-friendly tool for structure-from-motion (SfM), estimates camera parameters and 3D
structure from 2D images, making it a suitable choice for basic bundle adjustment tasks.
g2o, an open-source C++ library specializing in optimizing graph-based nonlinear least-squares
problems, enjoys widespread adoption in computer vision and robotics for bundle adjustment
purposes.
Shape from texture:
This approach applies when, for example, a plane is textured with circles: circles at various depths would give a square law for the projected ellipse areas, although the progressive eccentricity also reduces the area linearly in proportion to the depth.
This information is accumulated in a separate image space and a line is then fitted to these data: false alarms are eliminated automatically by this Hough-based procedure.
At this stage the original data (the ellipse areas) provide direct information on depth, although some averaging is required to obtain accurate results.
Although this type of method has been demonstrated in certain instances, it is in practice highly
restricted unless very considerable amounts of computation are performed.
Hence it is doubtful whether it can be of general practical use in machine vision applications.
6. Translational alignment
Translational alignment in computer vision involves aligning images by shifting them horizontally
and vertically without any rotation or scaling. This process is crucial for several applications,
including:
1. Image Stitching: Combining multiple images to create a seamless panoramic view.
2. Video Stabilization: Reducing the effects of camera shake in video sequences.
3. Motion Tracking: Following the movement of objects across frames in a video.
A commonly used method for achieving translational alignment is the Lucas-Kanade
algorithm.
This algorithm works by estimating the displacement of image patches and minimizing the difference between these patches in consecutive frames.
It’s widely used in motion-compensated video compression schemes like MPEG and H.263.
The simplest way to establish an alignment between two images or image patches is to shift one
image relative to the other.
Given a template image I0(x) sampled at discrete pixel locations {x_i = (x_i, y_i)}, we wish to find where it is located in image I1(x).
A least squares solution to this problem is to find the minimum of the sum of squared differences (SSD) function
E_SSD(u) = Σ_i [I1(x_i + u) − I0(x_i)]^2 = Σ_i e_i^2,
where u = (u, v) is the displacement and e_i = I1(x_i + u) − I0(x_i) is called the residual error.
The assumption that corresponding pixel values remain the same in the two images is often called the
brightness constancy constraint.
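A brute-force sketch of this least-squares search in NumPy (the +/-8 px search window is an assumption, and wrap-around at image borders is ignored for brevity):

```python
# Brute-force translational alignment: slide I1 over a small window of
# displacements and keep the u = (u, v) with the smallest SSD.
import numpy as np

def ssd_align(I0, I1, search=8):
    best, best_uv = np.inf, (0, 0)
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            shifted = np.roll(np.roll(I1, -v, axis=0), -u, axis=1)
            e = shifted.astype(np.float64) - I0   # residual e_i = I1(x_i+u) - I0(x_i)
            score = np.sum(e * e)                 # SSD objective
            if score < best:
                best, best_uv = score, (u, v)
    return best_uv
```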
Hierarchical motion estimation:
Hierarchical motion estimation is a technique used in computer vision and video processing to
estimate motion between frames of a video sequence.
Pyramid Structure: The process involves creating a pyramid of images at multiple resolutions. The
motion is first estimated at the coarsest level (lowest resolution) and then refined at progressively
finer levels (higher resolutions).
The motion estimate from one level of the pyramid is then used to initialize a smaller local search at
the next finer level.
Coarse-to-Fine Strategy: By starting with a low-resolution image, large motions can be captured more easily. As the resolution increases, finer details and smaller motions are refined, leading to a more accurate overall motion estimation.
Applications: This technique is widely used in applications such as video compression, object tracking, and 3D reconstruction. It helps reduce the computational load while maintaining high accuracy.
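A coarse-to-fine sketch, reusing the ssd_align routine from the sketch above and assuming OpenCV's pyrDown for the pyramid; the search ranges are illustrative choices:

```python
# Hierarchical motion estimation: estimate translation at the coarsest
# pyramid level, then double and refine it at each finer level.
import cv2
import numpy as np

def pyramid_align(I0, I1, levels=3):
    pyr0, pyr1 = [I0], [I1]
    for _ in range(levels - 1):
        pyr0.append(cv2.pyrDown(pyr0[-1]))
        pyr1.append(cv2.pyrDown(pyr1[-1]))
    u = v = 0
    for lvl in reversed(range(levels)):            # coarsest level first
        u, v = 2 * u, 2 * v                        # upscale the previous estimate
        # Undo the current estimate, then search a small neighborhood around it.
        shifted = np.roll(np.roll(pyr1[lvl], -v, axis=0), -u, axis=1)
        search = 8 if lvl == levels - 1 else 2     # wide search only at the top
        du, dv = ssd_align(pyr0[lvl], shifted, search=search)
        u, v = u + du, v + dv
    return u, v
```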
Fourier-based alignment:
Fourier-based alignment is a technique used in image processing to align images by leveraging the
properties of the Fourier transform.
The Fourier-based alignment algorithm consists of the following steps:
1. For two color channels C1 and C2, compute corresponding Fourier transforms FT1 and FT2.
2. Compute the conjugate of FT2 (denoted as FT2*), and compute the product of FT1 and FT2*.
3. Take the inverse Fourier transform of this product and find the location of the maximum value in the
output image. Use the displacement of the maximum value to obtain the offset of C2 from C1.
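These three steps translate almost directly into NumPy (the wrap-around handling at the end converts peak indices past the midpoint into negative offsets):

```python
# Fourier-based alignment: multiply FT1 by the conjugate of FT2, invert,
# and read the offset off the correlation peak.
import numpy as np

def fourier_align(C1, C2):
    FT1, FT2 = np.fft.fft2(C1), np.fft.fft2(C2)
    corr = np.fft.ifft2(FT1 * np.conj(FT2)).real   # cross-correlation surface
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peak indices past the midpoint correspond to negative (wrapped) offsets.
    dy = peak[0] if peak[0] <= C1.shape[0] // 2 else peak[0] - C1.shape[0]
    dx = peak[1] if peak[1] <= C1.shape[1] // 2 else peak[1] - C1.shape[1]
    return dy, dx
```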
Incremental refinement:
Incremental refinement in translational alignment is a technique used to improve the accuracy of aligning
images or sequences by iteratively adjusting the alignment parameters. This method is particularly useful
in fields like computer vision and bioinformatics.
In Computer Vision
In computer vision, incremental refinement is often applied in tasks such as image registration and video
stabilization. The process involves:
1. Initial Alignment: Starting with a rough alignment of the images.
2. Error Metric Calculation: Using metrics like the Sum of Squared Differences (SSD) to measure the alignment error.
3. Iterative Adjustment: Gradually refining the alignment by minimizing the error metric through iterative updates.
The Sum of Squared Differences (SSD) is a common metric used to measure the similarity between two images or signals. It is particularly useful in image processing tasks like image registration, template matching, and motion estimation.
How SSD Works
1. Pixel-by-Pixel Comparison: SSD involves comparing corresponding pixels of two images.
2. Difference Calculation: For each pixel, the difference between the pixel values of the two images is calculated.
3. Squaring the Differences: Each difference is squared to ensure all values are positive and to emphasize larger differences.
4. Summing Up: The squared differences are summed up to produce a single value, which represents the SSD.
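The four steps above condense into a few lines of NumPy; casting to float avoids overflow when subtracting unsigned 8-bit pixel values:

```python
# SSD metric: subtract corresponding pixels, square, and sum to one scalar.
import numpy as np

def ssd(image_a, image_b):
    diff = image_a.astype(np.float64) - image_b.astype(np.float64)
    return np.sum(diff * diff)
```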