Module-2_Computer Vision Complete
●
In linear filtering each output pixel is estimated as a weighted sum of neighborhood input pixels.
●
Linear filters are easier to compose and are amenable to frequency response analysis.
●
In many cases, however, better performance can be obtained by using a non-linear
combination of neighboring pixels.
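As a concrete illustration of the weighted-sum formulation, the minimal sketch below applies a small averaging kernel with scipy.ndimage.convolve; the kernel and image are placeholder values chosen for illustration only.

```python
import numpy as np
from scipy import ndimage

# 3x3 box (averaging) kernel: each output pixel is the mean of its 3x3 neighborhood
kernel = np.ones((3, 3)) / 9.0

image = np.random.rand(64, 64)          # placeholder grayscale image
smoothed = ndimage.convolve(image, kernel, mode='reflect')

# Each output pixel is a weighted sum of neighboring input pixels,
# so the whole operation is linear in the input image.
```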
Median Filter: A median filter is a technique that removes noise from images by replacing each pixel
with the median value of its neighboring pixels.
●
Median values can be computed in expected linear time using a randomized select algorithm.
●
The median filter is best suited for removing shot noise (salt-and-pepper noise) from the input image.
●
Since the shot noise value usually lies well outside the true values in the neighborhood, the median
filter is able to filter away such bad pixels.
●
This turns out to be equivalent to minimizing the weighted objective function
Σ_{k,l} w(k, l) |f(i + k, j + l) − g(i, j)|^p,
●
where g(i, j) is the desired output value and p = 1 for the weighted median.
●
Useful for edge-preserving image smoothing.
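A minimal sketch of median filtering for salt-and-pepper noise, using scipy.ndimage.median_filter; the noise fraction and window size are arbitrary illustrative choices.

```python
import numpy as np
from scipy import ndimage

image = np.random.rand(128, 128)

# Add shot (salt-and-pepper) noise to roughly 5% of the pixels
noisy = image.copy()
mask = np.random.rand(*image.shape) < 0.05
noisy[mask] = np.random.choice([0.0, 1.0], size=mask.sum())

# Replace each pixel with the median of its 3x3 neighborhood; shot-noise values
# lie well outside the true neighborhood values, so they are discarded.
denoised = ndimage.median_filter(noisy, size=3)
```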
3. Bilateral filtering
●
A bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images.
●
It replaces the intensity of each pixel with a weighted average of intensity values from nearby pixels.
●
Mathematically, the output is given by
g(i, j) = Σ_{k,l} f(k, l) w(i, j, k, l) / Σ_{k,l} w(i, j, k, l),
where each weight combines a spatial (domain) Gaussian and an intensity (range) Gaussian:
w(i, j, k, l) = exp( −((i − k)² + (j − l)²) / 2σ_d² − ‖f(i, j) − f(k, l)‖² / 2σ_r² ).
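A brute-force NumPy sketch of the formula above; sigma_d and sigma_r are the domain (spatial) and range (intensity) standard deviations, and the input is assumed to be a float grayscale image in [0, 1] (all values are placeholders).

```python
import numpy as np

def bilateral_filter(img, radius=3, sigma_d=2.0, sigma_r=0.1):
    """Brute-force bilateral filter for a float grayscale image in [0, 1]."""
    h, w = img.shape
    pad = np.pad(img, radius, mode='reflect')
    num = np.zeros_like(img)
    den = np.zeros_like(img)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = pad[radius + dy: radius + dy + h, radius + dx: radius + dx + w]
            w_spatial = np.exp(-(dx * dx + dy * dy) / (2 * sigma_d ** 2))      # domain weight
            w_range = np.exp(-((shifted - img) ** 2) / (2 * sigma_r ** 2))     # range weight
            weight = w_spatial * w_range
            num += weight * shifted
            den += weight
    return num / den

out = bilateral_filter(np.random.rand(64, 64))
```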
4. Guided image filtering
●
Guided image filtering uses context from another image, known as a guidance image, to
influence the output of image filtering.
●
Like other filtering operations, guided image filtering is a neighborhood operation.
●
However, guided image filtering takes into account the statistics of a region in the
corresponding spatial neighborhood in the guidance image when calculating the value of the output
pixel.
●
The guidance image can be the image itself, a different version of the image, or a completely
different image.
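A compact sketch of the guided filter of He, Sun, and Tang, using box filters from scipy; I and p are the guidance and input images (float arrays of the same shape), and the radius r and regularization eps are placeholder parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r=8, eps=1e-3):
    """Guided filtering of p using guidance image I (local linear model q = a*I + b)."""
    def box(x):
        return uniform_filter(x, size=2 * r + 1, mode='reflect')

    mean_I, mean_p = box(I), box(p)
    var_I = box(I * I) - mean_I ** 2          # local variance of the guidance image
    cov_Ip = box(I * p) - mean_I * mean_p     # local covariance between I and p

    a = cov_Ip / (var_I + eps)                # per-window linear coefficients
    b = mean_p - a * mean_I

    # Average the coefficients over all windows and apply them to the guidance image
    return box(a) * I + box(b)

I = np.random.rand(128, 128)   # guidance image (may equal the input)
p = np.random.rand(128, 128)   # image to be filtered
q = guided_filter(I, p)
```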
Binary Image Processing
● Non-linear filters are often used to enhance grayscale and color images, and they are also used
extensively to process binary images.
●
Such images often occur after a thresholding operation, e.g., converting a scanned grayscale document into a binary image for further processing such as optical character recognition (OCR) or biometric applications.
Morphological Operations
●
Morphological operations are image processing techniques that change the shape
and structure of objects in an image.
●
They are based on mathematical morphology, which studies the properties of shapes and patterns.
●
To perform such an operation, we first apply convolution on the binary image with a binary
structuring element.
●
Then select a binary output value depending on the thresholded result of the convolution.
●
The structuring element can be any shape, from a simple 3 × 3 box filter to more complicated disc structures.
●
Let S be the size of the structuring element (number of pixels), let c = f ⊗ s be the integer-valued count of 1s inside the structuring element as it is scanned over the binary image f, and let θ(c, t) = 1 if c ≥ t and 0 otherwise.
●
The standard operations used in binary morphology include:
1) Dilation: dilate(f, s) = θ(c, 1);
2) Erosion: erode(f, s) = θ(c, S);
3) Majority: maj(f, s) = θ(c, S/2);
4) Opening: open(f, s) = dilate(erode(f, s), s);
5) Closing: close(f, s) = erode(dilate(f, s), s).
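A minimal NumPy/SciPy sketch of the count-and-threshold definitions above: c = f ⊗ s counts the 1s under the structuring element, and dilation, erosion, and majority simply threshold c at 1, S, and S/2 (the image and element are placeholders).

```python
import numpy as np
from scipy import ndimage

f = (np.random.rand(64, 64) > 0.5).astype(np.uint8)   # placeholder binary image
s = np.ones((3, 3), dtype=np.uint8)                    # 3x3 structuring element
S = s.sum()                                            # size of the structuring element

c = ndimage.convolve(f, s, mode='constant', cval=0)    # count of 1s under the element

dilated = (c >= 1).astype(np.uint8)        # dilate(f, s) = theta(c, 1)
eroded = (c >= S).astype(np.uint8)         # erode(f, s)  = theta(c, S)
majority = (c >= S / 2).astype(np.uint8)   # maj(f, s)    = theta(c, S/2)
```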
Erosion: erode(f, s) = θ(c, S)
●
The basic idea of erosion is just like soil erosion: it erodes away the boundaries of the foreground object.
●
The kernel slides through the image (as in 2D convolution).
● A pixel in the original image (either 1 or 0) is kept as 1 only if all the pixels under the kernel are 1; otherwise it is eroded (made zero).
Dilation: dilate(f, s) = θ(c, 1)
●
It is just the opposite of erosion: it grows the boundaries of the foreground object.
●
Here, a pixel element is '1' if at least one pixel under the kernel is '1'.
●
So it increases the white region in the image, i.e., the size of the foreground object increases.
●
Normally, in cases like noise removal, erosion is followed by dilation.
●
This is because erosion removes white noise, but it also shrinks our object.
●
So we dilate it afterwards: since the noise is gone, it won't come back, and the object area is restored.
●
Opening is just another name for erosion followed by dilation.
●
It is useful for removing noise from images.
●
Closing is the reverse of opening: dilation followed by erosion.
●
It is useful for closing small holes inside the foreground objects, or small black points on the object.
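A short OpenCV sketch of opening and closing on a binary image; the kernel size and input image are placeholders.

```python
import cv2
import numpy as np

binary = (np.random.rand(128, 128) > 0.5).astype(np.uint8) * 255
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

# Opening = erosion followed by dilation: removes small white noise specks
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# Closing = dilation followed by erosion: fills small holes inside foreground objects
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
```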
Distance transforms
●
The distance transform provides a metric or measure of the separation of points in the image.
● The distance transform is useful in quickly computing the distance between a point and a set of
points or a curve using a two-pass raster algorithm.
● It has many applications, including level sets, binary image alignment, feathering in image stitching
and blending, and nearest point alignment.
●
The distance transform D(i, j) of a binary image b(i, j) is defined as follows.
●
Let d(k, l) be some distance metric between pixel offsets.
●
Two commonly used metrics include the city block or Manhattan distance, d₁(k, l) = |k| + |l|,
●
and the Euclidean distance, d₂(k, l) = √(k² + l²).
●
The distance transform is then defined as D(i, j) = min_{k,l : b(k,l) = 0} d(i − k, j − l),
●
i.e., it is the distance to the nearest background pixel whose value is 0.
City block distance
●
City block distance is the distance between two points when you can only move along grid lines, like in
a city.
●
It's also known as Manhattan distance, boxcar distance, or absolute value distance.
Euclidean Distance: Euclidean distance between two points in Euclidean space is the length
of the line segment between them.
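A sketch of the two-pass raster algorithm for the city-block distance transform mentioned above; foreground pixels are 1 and background pixels are 0, and the image is a placeholder.

```python
import numpy as np

def city_block_distance_transform(b):
    """Two-pass raster-scan distance transform of binary image b (1 = foreground)."""
    INF = 10 ** 9
    D = np.where(b > 0, INF, 0).astype(np.int64)
    h, w = D.shape
    # Forward pass: propagate distances from the top-left neighbors
    for i in range(h):
        for j in range(w):
            if i > 0:
                D[i, j] = min(D[i, j], D[i - 1, j] + 1)
            if j > 0:
                D[i, j] = min(D[i, j], D[i, j - 1] + 1)
    # Backward pass: propagate distances from the bottom-right neighbors
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                D[i, j] = min(D[i, j], D[i + 1, j] + 1)
            if j < w - 1:
                D[i, j] = min(D[i, j], D[i, j + 1] + 1)
    return D

b = (np.random.rand(64, 64) > 0.2).astype(np.uint8)
D = city_block_distance_transform(b)   # distance to the nearest 0 pixel
```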
Distance transforms
Applications:
• Object Segmentation and Feature Extraction: Helps delineate object
boundaries and extract features.
• Path Planning and Robotics: Used for finding shortest paths and obstacle
avoidance.
• Image Analysis and Pattern Recognition: Aids in analyzing and understanding
image features.
Connected components
● Connected components are defined as regions of adjacent pixels that have the same input value or label.
● Pixels are said to be N4 adjacent if they are immediately horizontally or vertically adjacent, and N8 if they
can also be diagonally adjacent.
● Both variants of connected components are widely used in a variety of applications, such as finding individual letters
in a scanned document or finding objects (say, cells) in a thresholded image and computing their area
statistics.
● Once a binary or multi-valued image has been segmented into its connected components, it is often useful to
compute the area statistics for each individual region R.
Such statistics include:
1) The area (number of pixels);
2) The perimeter (number of boundary pixels);
3) The centroid (average x and y values);
4) The second moments,
specifically μ₂₀ = Σ(x − x̄)², μ₀₂ = Σ(y − ȳ)², and μ₁₁ = Σ(x − x̄)(y − ȳ), from which the major and minor axis orientation and lengths can be computed using eigenvalue analysis.
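A sketch using OpenCV's connected-components routine, which returns the area and centroid statistics listed above; the input is assumed to be an 8-bit binary image (placeholder data below).

```python
import cv2
import numpy as np

binary = (np.random.rand(128, 128) > 0.9).astype(np.uint8) * 255

# connectivity=4 gives N4 adjacency, connectivity=8 gives N8 adjacency
num_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(binary, connectivity=8)

for label in range(1, num_labels):              # label 0 is the background
    area = stats[label, cv2.CC_STAT_AREA]       # number of pixels in the region
    cx, cy = centroids[label]                   # average x and y values (centroid)
    print(f"region {label}: area={area}, centroid=({cx:.1f}, {cy:.1f})")
```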
Fourier transforms
●
Fourier analysis can be used to analyze the frequency characteristics of various signals.
● The Fourier Transform is an important image processing tool which is used to decompose an image into its
sine and cosine components.
● The output of the transformation represents the image in the Fourier or frequency domain, while the input image is
the spatial domain equivalent.
●
In the Fourier domain image, each point represents a particular frequency contained in the spatial domain
image.
●
The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction, and image compression.
●
Consider a sinusoid s(x) = sin(2πfx + φᵢ) = sin(ωx + φᵢ), where f is the frequency of the signal, ω = 2πf is the angular frequency, and φᵢ is the phase.
●
If the signal is sampled to form a discrete signal, we get the same frequency-domain representation, but it is periodic in the range [−π, π] (or [0, 2π], or [0, N] for an N-point DFT).
●
You can consider an image as a signal which is sampled in two directions.
●
So taking Fourier transform in both X and Y directions gives you the frequency representation
of image.
Phase shift using convolution
● If we convolve the sinusoidal signal s(x) with a filter whose impulse response is h(x), we get another
sinusoid o(x) of the same frequency but different magnitude A and phase φo.
●
Closed-form equations for the Fourier transform exist in both the continuous and discrete domains.
●
Continuous domain for 1D: H(ω) = ∫ h(x) e^{−jωx} dx.
●
Discrete domain for 1D: H(k) = Σ_{x=0}^{N−1} h(x) e^{−j2πkx/N}, for k = 0, …, N − 1.
●
The discrete form of the Fourier transform is known as the
Discrete Fourier Transform (DFT).
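A small NumPy check of the DFT formula above against the library FFT; the signal length and values are arbitrary placeholders.

```python
import numpy as np

N = 8
h = np.random.rand(N)

k = np.arange(N).reshape(-1, 1)   # frequency index
x = np.arange(N).reshape(1, -1)   # sample index

# H(k) = sum_x h(x) * exp(-j * 2*pi * k * x / N)
H = (h * np.exp(-2j * np.pi * k * x / N)).sum(axis=1)

assert np.allclose(H, np.fft.fft(h))   # matches numpy's FFT
```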
Two-dimensional Fourier transforms
● The formulas and insights we have developed for one-dimensional signals and their transforms translate directly into two-dimensional images.
● Here, instead of just specifying a horizontal or vertical frequency ωx or ωy, we can create an oriented sinusoid of frequency (ωx, ωy).
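A short sketch of the 2D Fourier transform of an image with NumPy; fftshift moves the zero-frequency component to the center for visualization (the image is a placeholder).

```python
import numpy as np

image = np.random.rand(128, 128)            # placeholder grayscale image

F = np.fft.fft2(image)                      # 2D DFT: one coefficient per (wx, wy) frequency
F_shifted = np.fft.fftshift(F)              # put the zero frequency at the center
magnitude = np.log1p(np.abs(F_shifted))     # log-magnitude spectrum, as usually displayed
```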
Discrete Cosine Transform (DCT)
●
The 1D DCT is computed by taking the dot product of each N-wide block of pixels with a set of cosines of different frequencies:
F(k) = Σ_{i=0}^{N−1} cos( (π/N) k (i + ½) ) f(i),
● where k is the coefficient (frequency) index and the ½-pixel offset is used to make the basis coefficients symmetric.
● The two-dimensional version of the DCT is defined
similarly.
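A sketch checking the 1D DCT formula above against scipy's DCT-II, which carries an extra factor of 2 in its unnormalized form; the 2D version could be computed with scipy.fft.dctn (signal values are placeholders).

```python
import numpy as np
from scipy.fft import dct

N = 8
f = np.random.rand(N)

k = np.arange(N).reshape(-1, 1)
i = np.arange(N).reshape(1, -1)

# F(k) = sum_i cos(pi/N * k * (i + 1/2)) * f(i)
F = (np.cos(np.pi / N * k * (i + 0.5)) * f).sum(axis=1)

# scipy's unnormalized DCT-II is exactly twice this sum
assert np.allclose(2 * F, dct(f, type=2, norm=None))
```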
Applications:
Sharpening, blur, and noise removal
● Another common application of image processing is the
enhancement of images through the use of sharpening and
noise removal operations, which require some kind of
neighborhood processing.
●
Traditionally, these kinds of operations were performed
using linear filtering.
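A minimal unsharp-masking sketch with OpenCV: a Gaussian blur removes fine detail, and subtracting part of the blur from the original sharpens it; the weights 1.5 and −0.5 and the kernel sizes are arbitrary illustrative choices.

```python
import cv2
import numpy as np

image = (np.random.rand(128, 128) * 255).astype(np.uint8)

blurred = cv2.GaussianBlur(image, (0, 0), sigmaX=3)          # linear low-pass filter
sharpened = cv2.addWeighted(image, 1.5, blurred, -0.5, 0)    # boost the high-frequency residual
denoised = cv2.GaussianBlur(image, (5, 5), sigmaX=1)         # simple linear noise smoothing
```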
Image Pyramids
● We normally work with an image of constant size, but on some occasions we need to work with the same image at different resolutions.
●
For example, we may need to enlarge a small image to increase its resolution for better quality.
● Alternatively, we may want to reduce the size of an image to speed up the execution of an algorithm or to
save on storage space or transmission time.
●
The set of images at different resolutions is called an image pyramid (because when they are kept in a stack with the highest-resolution image at the bottom and the lowest-resolution image at the top, the stack looks like a pyramid).
●
Very useful for representing images.
●
A pyramid is built using multiple copies of the same image at different levels.
●
Each level in the pyramid is 1/4 the size of the previous level.
●
The lowest level has the highest resolution.
Image Interpolation (Upsampling)
●
Image interpolation is a technique for estimating pixel values in an image using nearby pixel
values.
●
It's used to resize, rotate, or enhance images, or to fill in missing parts.
● In order to interpolate (or upsample) an image to a higher resolution, we need to select some interpolation kernel h with which to convolve the image:
●
g(i, j) = Σ_{k,l} f(k, l) h(i − rk, j − rl), where r is the upsampling rate.
● The linear interpolation: Estimates pixel values between known pixels by linearly combining the values of
its four nearest neighbors.
●
The bilinear kernel: Bilinear interpolation is an extension of linear interpolation to a two-dimensional space.
●
The bicubic interpolation: Estimates the color of an image pixel by calculating a weighted average of the 16 pixels residing in the surrounding 4 × 4 neighborhood.
●
Bicubic interpolation
● Upon calculating the coefficients, we then multiply them with
the weights of the known pixels and interpolate the unknown
pixels.
● Let us take the same input 2x2 image we took in the two
examples above.
● Upon bicubic interpolation, we get the following result:
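A short OpenCV sketch comparing the interpolation kernels above when upsampling by a factor r = 2; the tiny 2 × 2 input mirrors the example but its values are placeholders.

```python
import cv2
import numpy as np

image = (np.random.rand(2, 2) * 255).astype(np.uint8)   # placeholder 2x2 input

nearest = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_NEAREST)
bilinear = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_LINEAR)
bicubic = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)   # 4x4 neighborhood
```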
Gaussian Pyramid
●
A Gaussian pyramid in image processing is a multi-resolution representation of an image created by repeatedly applying a Gaussian blur (smoothing) and downsampling to produce a series of images at different scales, useful for various image processing tasks.
●
A higher level (lower resolution) in a Gaussian pyramid is formed by removing consecutive rows and columns from the lower level (higher resolution) image.
●
Then each pixel in the higher level is formed by the contribution of 5 × 5 pixels in the underlying level, combined with Gaussian weights.
●
By doing so, an M × N image becomes an M/2 × N/2 image.
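A minimal Gaussian-pyramid sketch with OpenCV; cv2.pyrDown applies the Gaussian smoothing and drops every other row and column, so each level is M/2 × N/2 (image and level count are placeholders).

```python
import cv2
import numpy as np

image = (np.random.rand(256, 256) * 255).astype(np.uint8)

gaussian = [image]
for _ in range(4):
    # Blur with a 5x5 Gaussian kernel, then remove every other row and column
    gaussian.append(cv2.pyrDown(gaussian[-1]))
```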
Gaussian Pyramid Frequency Decomposition
● The "pyramid" is constructed by repeatedly calculating a weighted
average of the neighboring pixels of a source image and scaling the
image down.
● It can be visualized by stacking progressively smaller versions of the
image on top of one another.
● This process creates a pyramid shape with the base as the original
image and the tip a single pixel representing the average value of the
entire image.
Laplacian Pyramid
●
The Laplacian pyramid is a decomposition based on differences of low-pass filtered images.
●
The image is recursively decomposed into low-pass and high-pass bands.
●
g0, g1,…. are the levels of a Gaussian pyramid
●
L0, L1,…. are the levels of a Laplacian pyramid
●
We create the Laplacian pyramid from the Gaussian pyramid using the formula Lᵢ = gᵢ − expand(gᵢ₊₁), where expand() upsamples gᵢ₊₁ back to the size of gᵢ.
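A sketch of building a Laplacian pyramid from a Gaussian pyramid with OpenCV, following Lᵢ = gᵢ − expand(gᵢ₊₁); the input image and number of levels are placeholders.

```python
import cv2
import numpy as np

image = (np.random.rand(256, 256) * 255).astype(np.uint8)

gaussian = [image.astype(np.float32)]
for _ in range(4):
    gaussian.append(cv2.pyrDown(gaussian[-1]))

laplacian = []
for i in range(len(gaussian) - 1):
    # L_i = g_i - expand(g_{i+1}): upsample the coarser level and subtract
    size = (gaussian[i].shape[1], gaussian[i].shape[0])
    expanded = cv2.pyrUp(gaussian[i + 1], dstsize=size)
    laplacian.append(gaussian[i] - expanded)           # band-pass detail, may be negative
laplacian.append(gaussian[-1])                         # the coarsest Gaussian level closes the pyramid
```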
Wavelet Transforms
●
Fourier transforms are used extensively in computer vision applications, but wavelet decompositions are sometimes used as an alternative.
●
Wavelets can solve and model complex signal processing problems.
●
Wavelets are filters that localize a signal in both time and frequency and are defined over a
hierarchy of scales.
●
Wavelets provide a smooth way to decompose a signal into frequency components without
blocking and are closely related to pyramids.
●
Wavelet refers to small waves.
Wavelets provide a way of analyzing waveforms that are bounded in both frequency and time.
●
Convert a signal into a series of wavelets.
●
This helps in multi-resolution signal analysis.
Discrete Wavelet Transform
• The Discrete Wavelet Transform (DWT) is a more practical version of the CWT, where the scaling and
translation parameters a and b are discretized into powers of two.
• The DWT breaks down a digital signal into a series of coefficients representing different scales and resolutions.
• The DWT decomposes a signal into approximation (low-frequency) and detail (high-frequency) components at each step.
• This process is done iteratively, with each step dividing the approximation further, which yields a multi-level decomposition.
• Mathematically, the DWT is computed using filter banks consisting of low-pass and high-pass filters.
• At each level of decomposition, the DWT produces two sets of coefficients:
• Approximation coefficients (A): Represent the low-frequency components of the signal, capturing the
general trend.
• Detail coefficients (D): Represent the high-frequency components, capturing the finer details.
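A sketch of single-level and multi-level DWT decomposition using the PyWavelets package; the wavelet name 'db2' and the level count are arbitrary illustrative choices.

```python
import numpy as np
import pywt

signal = np.random.rand(256)

# Single-level DWT: approximation (low-frequency) and detail (high-frequency) coefficients
cA, cD = pywt.dwt(signal, 'db2')

# Multi-level decomposition: the approximation is split again at each level
coeffs = pywt.wavedec(signal, 'db2', level=3)   # [cA3, cD3, cD2, cD1]
```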
Scaling and Translation
• Scaling and translation are the two fundamental operations used to manipulate wavelets in
the Wavelet Transform.
• Scaling: When a wavelet is scaled, its width changes. Larger scales correspond to wider wavelets, which capture lower-frequency components (coarse features), while smaller scales correspond to narrower wavelets, which capture higher-frequency components (fine details). The ability to scale wavelets makes them versatile for capturing both a signal's broader trends and finer details.
• Translation: Translation refers to shifting the wavelet along the time axis. By translating the
wavelet, it can be aligned with different parts of the signal to analyze local features. This
localized analysis is one of the key advantages of the Wavelet Transform, allowing it to
detect transient or time-varying events that global transforms like the Fourier Transform
would miss.
1-D Wavelet Transform
Multiple-Level Decomposition of Wavelets
Geometric transformations
Parametric transformations
●
Parametric transformations apply a global deformation to an image, where
the behavior of the transformation is controlled by a small number of
parameters.
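A small OpenCV sketch of a parametric (similarity) transformation: the whole warp is controlled by just a rotation angle, a scale, and a center; all values below are placeholders.

```python
import cv2
import numpy as np

image = (np.random.rand(200, 300) * 255).astype(np.uint8)
h, w = image.shape

# 2x3 similarity matrix from a handful of parameters: center, angle (degrees), scale
M = cv2.getRotationMatrix2D((w / 2, h / 2), 30, 0.8)
warped = cv2.warpAffine(image, M, (w, h))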
Hierarchy of 2D coordinate transformations
Geometric transformations
●
In general, given a transformation specified by a formula x′ = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x)?
●
This process is called forward warping or forward
mapping and is shown in Figure 3.45a.
Forward warping
Limitations of forward warping
●
One option is to round the value of x′ to the nearest integer coordinate and copy the pixel there, but the resulting image has severe aliasing and pixels that jump around a lot when animating the transformation.
●
You can also “distribute” the value among its four nearest
neighbors in a weighted (bilinear) fashion, keeping track of the
per-pixel weights and normalizing at the end.
●
This technique is called splatting and is sometimes used for
volume rendering in the graphics community.
●
The second major problem with forward warping is the
appearance of cracks and holes, especially when magnifying an
image.
●
Filling such holes with their nearby neighbors can lead to further
aliasing and blurring.
Inverse warping
●
In inverse warping, each destination pixel g(x′) is sampled from the source image at f(ĥ(x′)), where ĥ is the inverse of the transformation h, so every output pixel receives a value and no holes appear.
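A sketch of inverse warping with cv2.remap: for every destination pixel x′ we evaluate an inverse mapping ĥ(x′) (here a placeholder 3 × 3 homography) and bilinearly resample the source, which avoids the cracks and holes of forward warping.

```python
import cv2
import numpy as np

def inverse_warp(src, h_inv, out_shape):
    """Sample the source image at h^{-1}(x') for every destination pixel x'."""
    H, W = out_shape
    xs, ys = np.meshgrid(np.arange(W, dtype=np.float32),
                         np.arange(H, dtype=np.float32))
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1) @ h_inv.T   # homogeneous coordinates
    map_x = (pts[..., 0] / pts[..., 2]).astype(np.float32)
    map_y = (pts[..., 1] / pts[..., 2]).astype(np.float32)
    # Bilinear resampling: no cracks or holes, unlike forward warping
    return cv2.remap(src, map_x, map_y, interpolation=cv2.INTER_LINEAR)

src = (np.random.rand(200, 200) * 255).astype(np.uint8)
h = np.array([[1.1, 0.1, 5.0], [0.0, 0.9, -3.0], [0.0, 0.0, 1.0]])  # placeholder transform x' = h(x)
warped = inverse_warp(src, np.linalg.inv(h), src.shape)
```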
Mesh-based warping
●
Many projection environments require images that are not the simple perspective projections that are the norm for flat-screen displays.
●
Examples include geometry correction for cylindrical displays and some new methods of
projecting into planetarium domes or upright domes intended for VR.
●
The standard approach is to create the image in a format that contains all the required visual information and distort it to compensate for the non-planar nature of the projection device or surface.
●
Mesh-based warping, a technique used in image processing and computer graphics, involves
deforming or warping an image by manipulating a mesh (a network of points and lines) that
represents the image's geometry.
Figure 1. Image applied as a texture to a mesh, each node is defined by a position (x,y) and texture
coordinate (u,v).
Algorithm:
●
Identify control points for each picture
●
Place them in mesh matrices
●
Iterate through each intermediate frame:
Find the intermediate mesh for that frame
Get the color mappings for this new mesh from each picture
Do a weighted average of the two color mappings (a sketch of the warp step is given after the example below).
Example:
●
Idea: use splines to specify curves on each image
●
This gives finer control over the warp.
●
Input
Source & destination images
2D array of control points in source
2D array of control points in destination
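A sketch of a mesh-based warp matching the inputs above: sparse control-point correspondences are interpolated into a dense inverse map with scipy's griddata, and the image is resampled with cv2.remap; the control points and image are placeholders.

```python
import cv2
import numpy as np
from scipy.interpolate import griddata

def mesh_warp(img, ctrl_dst, ctrl_src):
    """Warp img so that control points ctrl_src (x, y) move to ctrl_dst (x, y)."""
    h, w = img.shape[:2]
    grid_y, grid_x = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, interpolate where it comes from in the source
    map_x = griddata(ctrl_dst, ctrl_src[:, 0], (grid_x, grid_y), method='linear')
    map_y = griddata(ctrl_dst, ctrl_src[:, 1], (grid_x, grid_y), method='linear')
    # Outside the convex hull of the control points, fall back to the identity map
    map_x = np.where(np.isnan(map_x), grid_x, map_x).astype(np.float32)
    map_y = np.where(np.isnan(map_y), grid_y, map_y).astype(np.float32)
    return cv2.remap(img, map_x, map_y, interpolation=cv2.INTER_LINEAR)

h, w = 200, 200
img = (np.random.rand(h, w) * 255).astype(np.uint8)

# Placeholder mesh: corners stay fixed, the center node moves 20 px to the right
corners = np.array([[0, 0], [w - 1, 0], [0, h - 1], [w - 1, h - 1]], dtype=float)
ctrl_src = np.vstack([corners, [[w / 2, h / 2]]])
ctrl_dst = np.vstack([corners, [[w / 2 + 20, h / 2]]])
warped = mesh_warp(img, ctrl_dst, ctrl_src)
```

For a morph, the intermediate mesh at blend factor t would be (1 − t)·ctrl_src + t·ctrl_dst; both pictures are warped to it and then cross-dissolved.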
Feature-based morphing
Feature-based morphing in image processing transforms one image into another by identifying and warping
corresponding features.
Algorithm:
Step 1: Select corresponding lines in the source image I_s and the destination image I_d.
Step 2: Generate an intermediate frame I by generating a new set of line segments, interpolating each line from its position in I_s to its position in I_d.
Step 3: Map the pixels of the intermediate frame I to pixels of frame I_s (and likewise of I_d) using the multiple-line algorithm.
Step 4: Multiple-line algorithm:
For each pixel X in the destination,
find the corresponding position in the source for each line pair, combine the contributions with distance- and length-based weights, and sample the source at the weighted average position (a single-line sketch is given below).
Step 5: The warped image from I_s and the warped image from I_d are cross-dissolved with a given dissolve factor in [0, 1].
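A minimal sketch of the line-based mapping for a single line pair (in the style of Beier–Neely field morphing): each destination pixel X gets (u, v) coordinates relative to the destination line and is mapped to the point with the same (u, v) relative to the source line; with several line pairs the displacements would be blended with distance- and length-based weights, and the two warped frames are finally cross-dissolved. All line endpoints and the image are placeholders.

```python
import numpy as np

def warp_one_line(src, P, Q, P2, Q2):
    """Map every destination pixel to the source via one line pair (P,Q) -> (P2,Q2)."""
    h, w = src.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    X = np.stack([xs, ys], axis=-1).astype(np.float64)

    PQ = Q - P
    perp = np.array([-PQ[1], PQ[0]])                 # perpendicular to the destination line
    u = ((X - P) @ PQ) / (PQ @ PQ)                   # fraction along the line
    v = ((X - P) @ perp) / np.linalg.norm(PQ)        # signed distance from the line

    PQ2 = Q2 - P2
    perp2 = np.array([-PQ2[1], PQ2[0]])
    Xs = P2 + u[..., None] * PQ2 + v[..., None] * perp2 / np.linalg.norm(PQ2)

    xi = np.clip(np.round(Xs[..., 0]).astype(int), 0, w - 1)   # nearest-neighbour sampling
    yi = np.clip(np.round(Xs[..., 1]).astype(int), 0, h - 1)
    return src[yi, xi]

src = np.random.rand(100, 100)
P, Q = np.array([20.0, 50.0]), np.array([80.0, 50.0])      # line in the intermediate frame
P2, Q2 = np.array([25.0, 40.0]), np.array([75.0, 60.0])    # corresponding line in the source
warped_source = warp_one_line(src, P, Q, P2, Q2)

# Cross-dissolve the two warped frames with dissolve factor t in [0, 1]:
# morphed = (1 - t) * warped_source + t * warped_destination
```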
Feature-based morphing
Multiple-line algorithm