Module-2_Computer Vision Complete

The document discusses various non-linear filtering techniques used in image processing, including median filtering, weighted median filtering, bilateral filtering, and guided image filtering, highlighting their applications in noise reduction and edge preservation. It also covers morphological operations, distance transforms, connected components, Fourier transforms, and image pyramids, explaining their significance in enhancing and analyzing images. Additionally, it details interpolation methods such as nearest neighbor, linear, bilinear, and bicubic interpolation, as well as Gaussian and Laplacian pyramids for multi-resolution image representation.

Non-linear filtering


In linear filtering each output pixel is estimated as a weighted sum of neighborhood input pixels.

Linear filters are easy to compose and are amenable to frequency response analysis.

In many cases, however, better performance can be obtained by using a non-linear
combination of neighboring pixels.
Median Filter: A median filter is a technique that removes noise from images by replacing each pixel
with the median value of its neighboring pixels.

Median values can be computed in expected linear time using a randomized select algorithm.

The median filter is well suited to removing shot noise (salt-and-pepper noise) from the input image.

Since the shot noise value usually lies well outside the true values in the neighborhood, the median
filter is able to filter away such bad pixels.

Example (see figure): the noisy pixel value 150 is replaced by the neighborhood median 124.
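To make this concrete, here is a minimal median-filtering sketch in Python with OpenCV (the filename and the 3 × 3 kernel size are assumptions for illustration):

import cv2

# Read a grayscale image corrupted by salt-and-pepper noise (hypothetical file).
noisy = cv2.imread("noisy.png", cv2.IMREAD_GRAYSCALE)

# Replace each pixel with the median of its 3x3 neighborhood.
denoised = cv2.medianBlur(noisy, 3)

cv2.imwrite("denoised.png", denoised)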


2. Weighted Median filtering

Selecting only one input pixel value to replace each output pixel is less effective at averaging away regular noise than taking a weighted average.
● Another possibility is to compute a weighted median, in which each pixel is used a number of
times depending on its distance from the center.


This turns out to be equivalent to minimizing the weighted objective function

\sum_{k,l} w(k, l) \, |f(i + k, j + l) - g(i, j)|^p,

where g(i, j) is the desired output value and p = 1 for the weighted median.

Useful for edge-preserving image smoothing.
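A minimal NumPy sketch of a weighted median on a single 3 × 3 patch, assuming small integer weights that mimic distance-based weighting (the patch values and weights are illustrative):

import numpy as np

def weighted_median(patch, weights):
    # Repeat each value according to its integer weight, then take the median.
    values = np.repeat(patch.ravel(), weights.ravel())
    return np.median(values)

patch = np.array([[12, 14, 13],
                  [15, 150, 16],   # 150 is a shot-noise outlier
                  [14, 13, 15]])
weights = np.array([[1, 2, 1],
                    [2, 4, 2],     # larger weights nearer the center
                    [1, 2, 1]])
print(weighted_median(patch, weights))  # -> 15.0; the outlier is rejected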
3. Bilateral filtering

A bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images.


It replaces the intensity of each pixel with a weighted average of intensity values from nearby pixels.

Mathematically, it is given by

g(i, j) = \frac{\sum_{k,l} f(k, l) \, w(i, j, k, l)}{\sum_{k,l} w(i, j, k, l)},

where the weight w(i, j, k, l) combines a spatial Gaussian on pixel distance (σ_d) and a range Gaussian on intensity difference (σ_r):

w(i, j, k, l) = \exp\!\left( -\frac{(i - k)^2 + (j - l)^2}{2\sigma_d^2} - \frac{\|f(i, j) - f(k, l)\|^2}{2\sigma_r^2} \right).
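A minimal OpenCV sketch of bilateral filtering (the neighborhood diameter and the two sigmas are illustrative choices):

import cv2

img = cv2.imread("input.png")
# d: pixel neighborhood diameter; sigmaColor: range sigma (sigma_r);
# sigmaSpace: spatial sigma (sigma_d).
smoothed = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
cv2.imwrite("smoothed.png", smoothed)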
4. Guided image filtering

Guided image filtering uses context from another image, known as a guidance image, to
influence the output of image filtering.

Like other filtering operations, guided image filtering is a neighborhood operation.

However, guided image filtering takes into account the statistics of a region in the
corresponding spatial neighborhood in the guidance image when calculating the value of the output
pixel.

The guidance image can be the image itself, a different version of the image, or a completely
different image.
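A minimal guided-filtering sketch using OpenCV's ximgproc module (from the opencv-contrib-python package), here with the image guiding itself; the radius and eps values are illustrative:

import cv2

img = cv2.imread("input.png")
# Guide with the image itself; radius sets the neighborhood size and
# eps the regularization (edge-preservation strength) for 8-bit intensities.
out = cv2.ximgproc.guidedFilter(guide=img, src=img, radius=8, eps=0.02 * 255 ** 2)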
Binary Image Processing
● Non-linear filters are often used to enhance grayscale and color images, and they are also used
extensively to process binary images.

Such images often arise after a thresholding operation,

● e.g., converting a scanned grayscale document into a binary image for further processing, such as optical character recognition and biometric applications.
Morphological Operations

Morphological operations are image processing techniques that change the shape
and structure of objects in an image.

They are based on mathematical morphology, which studies the properties of shapes and patterns.

To perform such an operation, we first apply convolution on the binary image with a binary
structuring element.

Then we select a binary output value by thresholding the result of the convolution.

The structuring element can be any shape, from a simple 3 × 3 box filter to more complicated disc structures.

Different Structuring Elements


● The convolution of a binary image f with a 3 × 3 structuring element s is written c = f ⊗ s, where c is an integer-valued count of the number of 1s inside the structuring element as it is scanned over the image.


Let S be the size of the structuring element (number of pixels)

The standard operations include the following, where θ(c, t) = 1 if c ≥ t and 0 otherwise (a code sketch follows the list):
1) Dilation: dilate(f, s) = θ(c, 1);
2) Erosion: erode(f, s) = θ(c, S);
3) Majority: maj(f, s) = θ(c, S/2);
4) Opening: open(f, s) = dilate(erode(f, s), s);
5) Closing: close(f, s) = erode(dilate(f, s), s).
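A minimal OpenCV sketch of these operations with a 3 × 3 box structuring element (input filename assumed):

import cv2
import numpy as np

img = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
kernel = np.ones((3, 3), np.uint8)   # 3x3 box structuring element (S = 9)

eroded  = cv2.erode(img, kernel)     # 1 only where all pixels under the kernel are 1
dilated = cv2.dilate(img, kernel)    # 1 where at least one pixel under the kernel is 1
opened  = cv2.morphologyEx(img, cv2.MORPH_OPEN, kernel)   # erode, then dilate
closed  = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel)  # dilate, then erode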
1) Erosion: erode(f, s) = θ(c, S);


The basic idea of erosion is just like soil erosion: it erodes away the boundaries of the foreground object.

The kernel slides through the image (as in 2D convolution).
● A pixel in the original image (either 1 or 0) is kept as 1 only if all the pixels under the kernel are 1; otherwise it is eroded (set to zero).
2) Dilation: dilate(f, s) = θ(c, 1);


It is just the opposite of erosion: it expands the boundaries of the foreground object.


Here, a pixel element is '1' if at least one pixel under the kernel is '1'.


So it increases the white region in the image, i.e., the size of the foreground object grows.


Normally, in tasks like noise removal, erosion is followed by dilation: erosion removes white noise but also shrinks the object, so we dilate afterwards.

Since the noise is gone, it will not come back, while the object regains its area. Dilation is also useful for joining broken parts of an object.


3) Opening: open(f, s) = dilate(erode(f, s), s);


Opening is just another name for erosion followed by dilation.

It is useful for removing noise from images.

4) Closing: close(f, s) = erode(dilate(f, s), s).


Closing is the reverse of opening: dilation followed by erosion.


It is useful for closing small holes inside the foreground objects, or small black points on the object.

(Figure: binary image before and after closing.)
Distance transforms

The distance transform provides a metric or measure of the separation of points in the image.

● The distance transform is useful in quickly computing the distance between a point and a set of
points or a curve using a two-pass raster algorithm.

● It has many applications, including level sets, binary image alignment, feathering in image stitching
and blending, and nearest point alignment.


The distance transform D(i, j) of a binary image b(i, j) is defined as follows.

Let d(k, l) be some distance metric between pixel offsets.

Two commonly used metrics include the city block or Manhattan distance

d_1(k, l) = |k| + |l|

and the Euclidean distance

d_2(k, l) = \sqrt{k^2 + l^2}.

The distance transform is then defined as

D(i, j) = \min_{k, l \,:\, b(k, l) = 0} d(i - k, j - l),

i.e., it is the distance to the nearest background pixel whose value is 0.
City block distance

City block distance is the distance between two points when you can only move along grid lines, as in a city.

It is also known as Manhattan distance, boxcar distance, or absolute-value distance.

Euclidean Distance: Euclidean distance between two points in Euclidean space is the length
of the line segment between them.
Distance transforms

Applications:
• Object Segmentation and Feature Extraction: Helps delineate object
boundaries and extract features.
• Path Planning and Robotics: Used for finding shortest paths and obstacle
avoidance.
• Image Analysis and Pattern Recognition: Aids in analyzing and understanding
image features.
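A minimal OpenCV sketch computing the two distance metrics discussed above (filename and threshold are assumptions):

import cv2

b = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
_, b = cv2.threshold(b, 127, 255, cv2.THRESH_BINARY)

# Distance from each nonzero pixel to the nearest zero (background) pixel.
D_city = cv2.distanceTransform(b, cv2.DIST_L1, 3)  # city block / Manhattan
D_eucl = cv2.distanceTransform(b, cv2.DIST_L2, 3)  # Euclidean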
Connected components
● Connected components are defined as regions of adjacent pixels that have the same input value or label.

● Pixels are said to be N4 adjacent if they are immediately horizontally or vertically adjacent, and N8 if they
can also be diagonally adjacent.

● Both variants of connected components are widely used in a variety of applications, such as finding individual letters
in a scanned document or finding objects (say, cells) in a thresholded image and computing their area
statistics.

● Once a binary or multi-valued image has been segmented into its connected components, it is often useful to
compute the area statistics for each individual region R.
Such statistics include:
1) The area (number of pixels);
2) The perimeter (number of boundary pixels);
3) The centroid (average x and y values);
4) The second moments,

M = \sum_{(x, y) \in R} \begin{bmatrix} x - \bar{x} \\ y - \bar{y} \end{bmatrix} \begin{bmatrix} x - \bar{x} & y - \bar{y} \end{bmatrix},

from which the major and minor axis orientations and lengths can be computed using eigenvalue analysis.
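A minimal OpenCV sketch that labels N8-connected components and reports the area and centroid statistics listed above (input filename assumed):

import cv2

b = cv2.imread("binary.png", cv2.IMREAD_GRAYSCALE)
num, labels, stats, centroids = cv2.connectedComponentsWithStats(b, connectivity=8)

for r in range(1, num):  # label 0 is the background
    area = stats[r, cv2.CC_STAT_AREA]   # number of pixels in region r
    cx, cy = centroids[r]               # average x and y values
    print(f"region {r}: area={area}, centroid=({cx:.1f}, {cy:.1f})")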
Fourier transforms

Fourier analysis can be used to analyze the frequency characteristics of various signals.

Image operations in different domains

1) Gray value (histogram) domain

Histogram stretching, equalization, specification, etc...


2) Spatial (image) domain

Average filter, median filter, gradient, Laplacian, etc…


3) Frequency (Fourier) domain

Fast Fourier Transform, Wavelets etc...

● The Fourier Transform is an important image processing tool which is used to decompose an image into its
sine and cosine components.
● The output of the transformation represents the image in the Fourier or frequency domain, while the input image is
the spatial domain equivalent.


In the Fourier domain image, each point represents a particular frequency contained in the spatial domain image.


The Fourier Transform is used in a wide range of applications, such as image analysis, image filtering, image reconstruction, and image compression.


Basics of Fourier transforms

For a sinusoidal signal

s(x) = \sin(2\pi f x + \phi_i) = \sin(\omega x + \phi_i),

where f is the frequency of the signal, ω = 2πf is the angular frequency, and φi is the phase.


If the signal is sampled to form a discrete signal, we get the same frequency content, but it is periodic over the range [−π, π] (equivalently [0, 2π], or [0, N] for an N-point DFT).


You can consider an image as a signal that is sampled in two directions.

So taking the Fourier transform in both the X and Y directions gives you the frequency representation of the image.
Phase shift using convolution
● If we convolve the sinusoidal signal s(x) with a filter whose impulse response is h(x), we get another sinusoid of the same frequency but different magnitude A and phase φo:

o(x) = h(x) * s(x) = A \sin(\omega x + \phi_o).


Closed-form equations for the Fourier transform exist in both the continuous and discrete domains.

Continuous domain (1D):

H(\omega) = \int_{-\infty}^{\infty} h(x) \, e^{-j \omega x} \, dx

Discrete domain (1D):

H(k) = \frac{1}{N} \sum_{x=0}^{N-1} h(x) \, e^{-j \frac{2 \pi k x}{N}}, \quad k = 0, \ldots, N - 1


The discrete form of the Fourier transform is known as the
Discrete Fourier Transform (DFT).
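A minimal NumPy sketch of the DFT of a sampled sinusoid (length and frequency are illustrative); the magnitude spectrum peaks at the signal's frequency bin:

import numpy as np

N = 64
f = 5                                     # cycles over the N samples
x = np.sin(2 * np.pi * f * np.arange(N) / N)

X = np.fft.fft(x)                         # N-point DFT
print(np.argmax(np.abs(X[: N // 2])))     # -> 5, the signal's frequency bin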
Two-dimensional Fourier transforms
● The formulas and insights we have developed for one-dimensional signals and their transforms translate directly into two-dimensional images.
● Here, instead of just specifying a horizontal or vertical frequency ωx or ωy, we can create an oriented sinusoid of frequency (ωx, ωy):

s(x, y) = sin(ωx x + ωy y).


Two-dimensional Inverse Fourier transforms
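A minimal NumPy sketch of the forward and inverse two-dimensional transforms (a random array stands in for a grayscale image):

import numpy as np

img = np.random.rand(128, 128)       # stand-in for a grayscale image
F = np.fft.fft2(img)                 # forward 2D DFT
recon = np.fft.ifft2(F).real         # inverse 2D DFT recovers the image
print(np.allclose(img, recon))       # -> True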
Discrete cosine transform
● The discrete cosine transform (DCT) is a variant of the Fourier transform particularly well suited to compressing images in a block-wise fashion.


The 1D DCT is computed by taking the dot product of each N-wide block of pixels with a set of cosines of different frequencies:

F(k) = \sum_{i=0}^{N-1} \cos\!\left( \frac{\pi}{N} \, k \left( i + \tfrac{1}{2} \right) \right) f(i),

● where k is the coefficient (frequency) index and the 1/2-pixel offset is used to make the basis coefficients symmetric.
● The two-dimensional version of the DCT is defined
similarly.
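A minimal SciPy sketch of the 1D DCT on one 8-wide block of pixels (the sample values are illustrative; norm='ortho' selects the orthonormal type-II DCT):

import numpy as np
from scipy.fft import dct

block = np.array([52, 55, 61, 66, 70, 61, 64, 73], dtype=float)  # one 8-wide block
coeffs = dct(block, norm="ortho")   # dot products with cosines of increasing frequency
print(coeffs.round(1))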
Applications: sharpening, blur, and noise removal

● Another common application of image processing is the enhancement of images through sharpening and noise-removal operations, which require some kind of neighborhood processing.

Traditionally, these kinds of operations were performed using linear filtering.
Image Pyramids
● We usually work with an image of fixed size, but on some occasions we need to work with the same image at different resolutions.


For example, we may need to enlarge a small image to increase its resolution for better quality.
● Alternatively, we may want to reduce the size of an image to speed up the execution of an algorithm or to
save on storage space or transmission time.


The set of images with different resolutions is called an image pyramid (because when they are stacked with the highest-resolution image at the bottom and the lowest-resolution image at the top, the stack looks like a pyramid).

Image pyramids are very useful for representing images.


A pyramid is built using multiple copies of the same image at different levels.


Each level in the pyramid is 1/4 the size of the previous level.


The lowest level has the highest resolution.
Image Interpolation (Upsampling)

Image interpolation is a technique for estimating pixel values in an image using nearby pixel values.


It's used to resize, rotate, or enhance images, or to fill in missing parts.
● In order to interpolate (or upsample) an image to a higher resolution, we need to select an interpolation kernel h with which to convolve the image:

g(i, j) = \sum_{k,l} f(k, l) \, h(i - r k, j - r l),

where r is the upsampling rate.

Different types of Interpolations


● Nearest neighbor interpolation: a simple method that estimates the value of a function at a new point by using the value of the nearest known data point.

● Linear interpolation: estimates values between known samples by linearly combining the two nearest neighbors along a line.

● Bilinear interpolation: an extension of linear interpolation to a two-dimensional grid, combining the four nearest neighbors.

● Bicubic interpolation: estimates the pixel value from a weighted average of the 16 pixels in the surrounding 4 × 4 neighborhood.


Nearest neighbour interpolation
• Nearest neighbor interpolation is the simplest and most basic interpolation technique.
• It assigns to each unknown pixel the value of the nearest known pixel, without considering the surrounding values.
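A minimal OpenCV sketch comparing these interpolation kernels when upsampling by a factor of 4 (input filename assumed):

import cv2

img = cv2.imread("small.png")
h, w = img.shape[:2]
size = (4 * w, 4 * h)   # upsampling rate r = 4

nn = cv2.resize(img, size, interpolation=cv2.INTER_NEAREST)  # nearest neighbor
bl = cv2.resize(img, size, interpolation=cv2.INTER_LINEAR)   # bilinear
bc = cv2.resize(img, size, interpolation=cv2.INTER_CUBIC)    # bicubic (4x4 support)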
Linear interpolation (step-by-step)
Example pixel row: 16, 11, 28, 34.
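A minimal NumPy sketch that linearly interpolates this example row at half-pixel positions:

import numpy as np

xs  = np.arange(4)                            # sample positions 0..3
row = np.array([16, 11, 28, 34], dtype=float)

new_x = np.arange(0, 3.5, 0.5)                # half-pixel steps
print(np.interp(new_x, xs, row))
# -> [16.  13.5 11.  19.5 28.  31.  34. ]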
Bilinear interpolation

● Suppose the 2 × 2 image below has been enlarged to 5 × 5; its pixel values are as follows.

Now, find the values of t and u.
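A minimal sketch of a single bilinear lookup inside a 2 × 2 patch, where t and u are the fractional offsets referred to above (the corner values reuse the earlier example row):

def bilinear(f00, f10, f01, f11, t, u):
    # Linear interpolation in x, then in y (equivalently, one weighted sum).
    return ((1 - t) * (1 - u) * f00 + t * (1 - u) * f10
            + (1 - t) * u * f01 + t * u * f11)

# Center of the patch (t = u = 0.5) is the average of the four corners.
print(bilinear(16, 11, 28, 34, 0.5, 0.5))   # -> 22.25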


Bicubic interpolation
● After calculating the coefficients, we multiply them by the weights of the known pixels and interpolate the unknown pixels.
● Taking the same 2 × 2 input image used in the two examples above, bicubic interpolation gives the following result:
Gaussian Pyramid

A Gaussian pyramid is a multi-resolution representation of an image, created by repeatedly applying a Gaussian blur (smoothing) and downsampling to produce a series of images at different scales; it is useful for many image processing tasks.

A higher level (lower resolution) in a Gaussian pyramid is formed by removing every other row and column of the lower-level (higher resolution) image.

Each pixel in the higher level is then formed from the contribution of 5 pixels in the underlying level with Gaussian weights.

By doing so, an M × N image becomes an M/2 × N/2 image.
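A minimal OpenCV sketch building three Gaussian pyramid levels above the base image (filename assumed); each cv2.pyrDown call blurs with Gaussian weights and halves each dimension:

import cv2

img = cv2.imread("input.png")
G = [img]                          # level 0: highest resolution
for _ in range(3):
    G.append(cv2.pyrDown(G[-1]))   # M x N  ->  M/2 x N/2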
Gaussian Pyramid Frequency Decomposition
● The "pyramid" is constructed by repeatedly calculating a weighted
average of the neighboring pixels of a source image and scaling the
image down.
● It can be visualized by stacking progressively smaller versions of the
image on top of one another.
● This process creates a pyramid shape with the base as the original
image and the tip a single pixel representing the average value of the
entire image.
Laplacian Pyramid

The Laplacian pyramid is a decomposition based on differences of low-pass filters: the image is recursively decomposed into low-pass and high-pass bands.

● G0, G1, ... are the levels of a Gaussian pyramid.

Predict level Gl from level Gl+1 by expanding Gl+1 to G'l.

Denote by Ll the error in prediction: Ll = Gl − G'l.

Then L0, L1, ... are the levels of a Laplacian pyramid.
Laplacian Pyramid

Laplacian of Gaussian (LoG) can be approximated by
the difference between two different Gaussians.

We create the Laplacian pyramid from the Gaussian pyramid using the formula

L_i = g_i - \text{expand}(g_{i+1}),

where g0, g1, ... are the levels of a Gaussian pyramid and L0, L1, ... are the levels of a Laplacian pyramid.
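A minimal OpenCV sketch following this formula, with expand() realized by cv2.pyrUp (filename and the number of levels are assumptions):

import cv2

img = cv2.imread("input.png")
g = [img]
for _ in range(3):
    g.append(cv2.pyrDown(g[-1]))            # Gaussian pyramid g0..g3

L = []
for i in range(3):
    up = cv2.pyrUp(g[i + 1], dstsize=(g[i].shape[1], g[i].shape[0]))
    L.append(cv2.subtract(g[i], up))        # L_i = g_i - expand(g_{i+1})
L.append(g[-1])   # keep the coarsest level so the image can be reconstructed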
Wavelet Transforms

Fourier Transforms are used extensively in computer vision applications, but some people
use wavelet decompositions as an alternative.


Wavelets can solve and model complex signal processing problems.

Wavelets are filters that localize a signal in both time and frequency and are defined over a
hierarchy of scales.

Wavelets provide a smooth way to decompose a signal into frequency components without
blocking and are closely related to pyramids.

Wavelet refers to small waves.

A wavelet transform is a process that:

1. Converts a signal into a series of wavelets.

2. Provides a way of analyzing waveforms that are bounded in both frequency and time.

3. Allows signals to be stored more efficiently than with the Fourier transform.

4. Can better approximate real-world signals.

5. Is well suited to approximating data with sharp discontinuities.


Principles of wavelet transforms



Split the signal into a set of small signals.

Represent the same signal in different frequency bands.

Provide different frequency bands at different time intervals.

This helps in multi-resolution signal analysis.
Discrete Wavelet Transform
• The Discrete Wavelet Transform (DWT) is a more practical version of the CWT, where the scaling and
translation parameters a and b are discretized into powers of two.
• The DWT breaks down digital signal into a series of coefficients representing different scales and
resolutions.
• The DWT decomposes a signal into approximation (low-frequency) and detail (high-frequency) components at each step.
• This process is applied iteratively, with each step dividing the approximation further, which yields a multi-level decomposition.
• Mathematically, the DWT is computed using filter banks consisting of low-pass and high-pass filters.
• At each level of decomposition, the DWT produces two sets of coefficients:
• Approximation coefficients (A): Represent the low-frequency components of the signal, capturing the
general trend.
• Detail coefficients (D): Represent the high-frequency components, capturing the finer details.
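A minimal sketch using the PyWavelets package (pywt), assuming it is installed; the Haar wavelet and three decomposition levels are illustrative choices:

import numpy as np
import pywt

t = np.linspace(0, 1, 256)
signal = np.sin(2 * np.pi * 8 * t)
signal[128] += 1.0                    # a sharp transient

cA, cD = pywt.dwt(signal, "haar")     # one level: approximation (A) + detail (D)
coeffs = pywt.wavedec(signal, "haar", level=3)  # iterated multi-level decomposition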
Scaling and Translation
• Scaling and translation are the two fundamental operations used to manipulate wavelets in
the Wavelet Transform.
• Scaling: When a wavelet is scaled, its width changes. Larger scales correspond to wider wavelets, which capture lower-frequency components (coarse features), while smaller scales correspond to narrower wavelets, which capture higher-frequency components (fine details). The ability to scale wavelets makes them versatile for capturing both a signal's broader trends and its finer details.
• Translation: Translation refers to shifting the wavelet along the time axis. By translating the
wavelet, it can be aligned with different parts of the signal to analyze local features. This
localized analysis is one of the key advantages of the Wavelet Transform, allowing it to
detect transient or time-varying events that global transforms like the Fourier Transform
would miss.
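A minimal PyWavelets sketch of the continuous wavelet transform, which sweeps the scale parameter while translating the wavelet across the signal (the Morlet wavelet and scale range are illustrative):

import numpy as np
import pywt

t = np.linspace(0, 1, 512)
sig = np.sin(2 * np.pi * 16 * t) + np.sin(2 * np.pi * 64 * t)

scales = np.arange(1, 64)                       # larger scale -> wider wavelet -> lower frequency
coeffs, freqs = pywt.cwt(sig, scales, "morl")   # rows: scales, columns: translations
print(coeffs.shape)                             # -> (63, 512)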
1-D Wavelet Transform
Multiple-Level Decomposition of Wavelets
Geometric transformations
Parametric transformations


Parametric transformations apply a global deformation to an image, where
the behavior of the transformation is controlled by a small number of
parameters.
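A minimal OpenCV sketch of one such global transformation, a similarity transform (rotation plus scale about the image center) controlled by only a few parameters (filename, angle, and scale are illustrative):

import cv2

img = cv2.imread("input.png")
h, w = img.shape[:2]

M = cv2.getRotationMatrix2D(center=(w / 2, h / 2), angle=30, scale=0.8)
warped = cv2.warpAffine(img, M, (w, h))   # apply the 2x3 affine matrix globally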
Hierarchy of 2D coordinate transformations.
Geometric transformations

In general, given a transformation specified by a formula x′ = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x)?

This process is called forward warping or forward mapping and is shown in Figure 3.45a.
Forward warping
Limitations of forward warping

One option is to round the value of x′ to the nearest integer coordinate and copy the pixel there, but the resulting image has severe aliasing, and pixels jump around a lot when the transformation is animated.

You can also “distribute” the value among its four nearest
neighbors in a weighted (bilinear) fashion, keeping track of the
per-pixel weights and normalizing at the end.

This technique is called splatting and is sometimes used for
volume rendering in the graphics community.

The second major problem with forward warping is the
appearance of cracks and holes, especially when magnifying an
image.

Filling such holes with their nearby neighbors can lead to further
aliasing and blurring.
Inverse warping

In inverse warping, each pixel in the destination image g(x′) is sampled from the source image at f(ĥ⁻¹(x′)), which avoids the cracks and holes of forward warping.
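A minimal OpenCV sketch of inverse warping with cv2.remap: for every destination pixel, the maps give the source location to sample; here the inverse transform is a simple translation (filename and offsets are illustrative):

import cv2
import numpy as np

src = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
h, w = src.shape

# map_x/map_y hold, for each destination pixel, the source coordinates to sample.
map_x, map_y = np.meshgrid(np.arange(w, dtype=np.float32),
                           np.arange(h, dtype=np.float32))
dst = cv2.remap(src, map_x - 10, map_y - 5, interpolation=cv2.INTER_LINEAR)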
Mesh-based warping

Many projection environments require images that are not the simple perspective projections that are the norm for flat-screen displays.


Examples include geometry correction for cylindrical displays and some new methods of
projecting into planetarium domes or upright domes intended for VR.

The standard approach is to create the image in a format that contains all the required visual information and then distort it to compensate for the non-planar nature of the projection device or surface.

Mesh-based warping, a technique used in image processing and computer graphics, involves
deforming or warping an image by manipulating a mesh (a network of points and lines) that
represents the image's geometry.

Figure 1. Image applied as a texture to a mesh; each node is defined by a position (x, y) and a texture coordinate (u, v).
Algorithm:

1. Identify control points for each picture.
2. Place them in mesh matrices.
3. Iterate through each intermediary frame:
   - Find the intermediary mesh for the frame.
   - Get the color mappings for this new mesh from each picture.
   - Do a weighted average of the two new pictures formed.
   - That is the new image for this intermediary frame.

Example:

Idea: use splines to specify curves on each image, to gain control of the warping.

Input:
- Source and destination images
- 2D array of control points in the source
- 2D array of control points in the destination
Feature-based morphing
Feature-based morphing in image processing transforms one image into another by identifying and warping corresponding features.

Algorithm:
Step 1: Select lines in the source image Is and the destination image Id.
Step 2: Generate an intermediate frame I by generating a new set of line segments, interpolating each line from its position in Is to its position in Id.
Step 3: Map the pixels of each line segment in the intermediate frame I to pixels of frame Is using the multiple-line algorithm.
Step 4: Multiple-line algorithm: for each pixel X in the destination, find the corresponding weights in the source.
Step 5: The warped image Is and the warped image Id are cross-dissolved with a given dissolution factor in [0, 1].
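A minimal sketch of the cross-dissolve in Step 5 (the two warped frames are synthetic stand-ins; alpha is the dissolution factor):

import cv2
import numpy as np

def cross_dissolve(warped_src, warped_dst, alpha):
    # alpha = 0 -> pure source frame, alpha = 1 -> pure destination frame.
    return cv2.addWeighted(warped_src, 1.0 - alpha, warped_dst, alpha, 0.0)

a = np.full((4, 4), 100, np.uint8)   # stand-in for the warped source
b = np.full((4, 4), 200, np.uint8)   # stand-in for the warped destination
print(cross_dissolve(a, b, 0.5))     # -> all 150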
Multiple-line algorithm (see figure).
