Ip All Units
Image Segmentation

● Image segmentation involves converting an image into a collection of regions of pixels
that are represented by a mask or a labeled image. By dividing an image into segments,
you can process only the important segments of the image instead of processing the
entire image.
● Image segmentation refers to the process of dividing an image into meaningful and
visually distinct regions or segments. The goal is to partition the image into coherent and
semantically significant regions based on certain characteristics such as color, intensity,
texture, or object boundaries. The purpose of image segmentation is to simplify image
analysis, facilitate object recognition, and enable further processing tasks by breaking
down the image into smaller, more manageable components

● Image segmentation plays a crucial role in various fields, including computer vision,
medical imaging, robotics, and autonomous driving. By segmenting an image, we can
extract specific objects or regions of interest, separate foreground from background,
identify boundaries, and extract meaningful features for subsequent analysis.

● In a typical image analysis workflow, image segmentation is the first step; the segmented
regions then feed into subsequent feature extraction and recognition stages.

● There are several commonly used techniques for image segmentation, including:
○ Thresholding: This technique assigns pixels to different segments based on a
predefined threshold value applied to a specific image attribute, such as
grayscale intensity or color channel values.
○ Edge-based segmentation: It relies on detecting and linking edges, which are
sharp transitions in intensity or color, to segment the image. Edge detection
algorithms, such as the Sobel operator or Canny edge detector, are commonly
employed.
○ Region-based segmentation: This technique groups pixels into regions based
on their similarity in terms of color, texture, or other features. Region growing and
region splitting/merging are popular approaches within this category.
○ Clustering: Clustering algorithms, such as k-means or Gaussian mixture models,
are used to group pixels into clusters based on their similarity in feature space.
Each cluster represents a segment in the image.
○ Watershed transform: Inspired by hydrology, this technique treats the image as
a topographic surface and simulates flooding to segment regions based on
catchment basins.
○ Graph-based segmentation: This method represents the image as a graph,
where pixels are nodes, and edges represent connections. Graph algorithms like
normalized cuts or minimum spanning trees are applied to partition the graph into
segments.

● For example, a common application of image segmentation in medical imaging is to
detect and label pixels in an image that represent a tumor in a patient’s brain or other
organs.
● Image segmentation is needed for a variety of reasons, including:
○ Object detection: Image segmentation can be used to identify objects in an
image. For example, image segmentation can be used to identify cars, people,
and other objects in a traffic scene.
○ Image classification: Image segmentation can be used to classify images into
different categories. For example, image segmentation can be used to classify
images of animals, plants, and other objects.
○ Medical image analysis: Image segmentation can be used to analyze medical
images, such as X-rays, MRI scans, and CT scans. For example, image
segmentation can be used to identify tumors, blood clots, and other medical
conditions in medical images.
○ Video analysis: Image segmentation can be used to analyze video footage. For
example, image segmentation can be used to track the movement of objects in a
video or to identify faces in a video.
○ Robotics: Image segmentation can be used to help robots navigate their
environment. For example, image segmentation can be used to help robots
identify obstacles in their path or to identify objects that they need to interact with.
○ Self-driving cars: Image segmentation is a critical technology for self-driving
cars. Self-driving cars use image segmentation to identify objects in their
environment, such as other cars, pedestrians, and traffic signs. This information
is used by the self-driving car to navigate safely and avoid collisions.

Classification of Image Segmentation

Threshold Based Image Segmentation


Segmentation accuracy determines the eventual success or failure of computerized analysis
procedures. Segmentation procedures are usually done using two approaches – detecting
discontinuity in images and linking edges to form the region (known as edge-based
segmentation), and detecting similarity among pixels based on intensity levels (known as
threshold-based segmentation).

Thresholding is one of the segmentation techniques that generates a binary image (a binary
image is one whose pixels have only two values – 0 and 1 and thus requires only one bit to
store pixel intensity) from a given grayscale image by separating it into two regions based on a
threshold value. Hence pixels having intensity values greater than the said threshold will be
treated as white or 1 in the output image and the others will be black or 0.

So the output segmented image has only two classes of pixels – one having a value of 1 and
others having a value of 0.

If the threshold T is constant in processing over the entire image region, it is said to be global
thresholding. If T varies over the image region, we say it is variable thresholding.

Multiple-thresholding classifies the image into three regions. The histogram in such cases
shows three peaks and two valleys between them.

Global Thresholding
Global thresholding involves selecting a single threshold value that separates the image into
foreground and background regions. All pixels with values above the threshold are assigned to
one class (foreground), while those below the threshold are assigned to the other class
(background). Global thresholding assumes that the foreground and background have distinct
intensity distributions.

When the intensity distribution of objects and background are sufficiently distinct, it is possible to
use a single or global threshold applicable over the entire image.
The basic global thresholding algorithm finds a suitable threshold iteratively: starting from an
initial estimate T, the image is partitioned into two groups of pixels (those above T and those at
or below T), the mean intensity of each group is computed, and T is updated to the average of the
two means. The process repeats until the change in T between iterations is smaller than a
predefined parameter δ.

This algorithm works well for images that have a clear valley in their histogram. The larger the
value of δ, the smaller the number of iterations. The initial estimate of T can be made equal to
the average pixel intensity of the entire image.
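
A minimal NumPy sketch of this iterative procedure, assuming img is a 2-D grayscale array and
delta plays the role of δ (an illustration, not a fixed implementation):

import numpy as np

def global_threshold(img, delta=0.5):
    # Initial estimate of T: the average intensity of the whole image.
    t = img.mean()
    while True:
        g1 = img[img > t]                    # pixels above the current threshold
        g2 = img[img <= t]                   # pixels at or below it
        new_t = 0.5 * (g1.mean() + g2.mean())
        if abs(new_t - t) < delta:           # stop once the change is below delta
            return new_t
        t = new_t

# segmented = (img > global_threshold(img)).astype(np.uint8)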

Variable/Local Thresholding
Variable thresholding is useful when the image contains variations in lighting or contrast across
different regions. Instead of using a single global threshold for the entire image, variable
thresholding applies different thresholds to different regions of the image. This is achieved by
dividing the image into smaller regions and calculating a threshold value for each region based
on local image statistics, such as mean or median intensity.

There are broadly two different approaches to local thresholding. One approach is to partition
the image into non-overlapping rectangles. Then the techniques of global thresholding or Otsu’s
method are applied to each of the sub-images. The methods of global thresholding are applied
to each sub-image rectangle by assuming that each such rectangle is a separate image in itself.

The other approach is to compute a variable threshold at each point from the properties of its
neighborhood pixels, such as the local mean or standard deviation.

Otsu's Thresholding
Otsu's thresholding is an optimal thresholding method that automatically determines the
threshold value based on the image's histogram. It minimizes the intra-class variance,
effectively finding the threshold that maximizes the inter-class separability. Otsu's method is
particularly useful when the image contains multiple intensity peaks or when the foreground and
background intensities overlap.
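
As a quick illustration, OpenCV provides a built-in Otsu implementation; "input.png" below is a
placeholder file name:

import cv2

# Read the image as 8-bit grayscale.
gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# With THRESH_OTSU, OpenCV ignores the supplied threshold (0) and computes the
# one that maximizes the inter-class variance from the image histogram.
t, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print("Otsu threshold:", t)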

Edge Types
Edges in a digital image can be classified into different types based on their characteristics. The
classification of edges is often done to provide a more detailed analysis of the image content.

Here are some common types of edges:


● Step Edges: Step edges are the most basic and commonly encountered type of edges.
They occur when there is an abrupt change in intensity from one region to another. Step
edges represent the boundaries between objects or regions with different intensity
levels.
● Ramp Edges: Ramp edges occur when there is a gradual or continuous change in
intensity. Unlike step edges, ramp edges represent the transitions between regions with
smoothly varying intensity levels. Ramp edges can indicate shading or gradual changes
in texture.
● Roof Edges: Roof edges, also known as roof-like edges or roof structures, are
characterized by a linear or ridge-like pattern. They are typically formed by the
intersection of two opposing step edges. Roof edges often represent prominent
structures or lines in an image, such as the edges of buildings or geometric shapes.
● Corner Edges: Corner edges occur at points where two or more edges meet, forming a
corner-like structure. Corner edges represent the junctions or intersections of multiple
objects or edges in an image. They provide information about the spatial arrangement
and geometry of objects.
● T-Junction Edges: T-junction edges occur when one edge terminates or meets another
edge orthogonally, forming a T-shape. T-junction edges often indicate the presence of
occlusions or intersecting objects in the image. They are commonly used in shape
analysis and object recognition tasks.
● Ridge Edges: Ridge edges are similar to roof edges but represent more complex
structures with curvilinear or ridge-like patterns. Ridge edges can be found in various
natural and man-made structures, such as mountain ridges, elevated curves, or
topographical features.
● Boundary Edges: Boundary edges are edges that outline the boundaries of objects or
regions in an image. They represent the contours or perimeters of objects and are
crucial for tasks like object detection, segmentation, and shape analysis.
Edge Detection
Edge detection is a fundamental technique in image processing that aims to identify and
highlight the boundaries or edges between different objects or regions in an image. The edges
represent significant changes in pixel intensity, such as transitions from dark to light or vice
versa, which often correspond to object boundaries, surface discontinuities, or important
features in the image.

Edge detection is used to determine the location and presence of edges from changes in the
intensity of an image. Different operators are used in image processing to detect edges. Edge
detectors respond to variations in gray level, but they also respond strongly to noise. Edge
detection is a very important task in image processing: it is a main tool in pattern recognition,
image segmentation and scene analysis.

An edge can be defined as a set of connected pixels that forms a boundary between two disjoint
regions. There are three types of edges:
- Horizontal edges
- Vertical edges
- Diagonal edges

Edge Detection Operators are of two types:


● Gradient-based operators, which compute first-order derivatives of a digital image, such as
the Sobel operator, Prewitt operator and Roberts operator.
● Gaussian-based operators, which compute second-order derivatives of a digital image, such as
the Canny edge detector and the Laplacian of Gaussian.

Gradient Detection:
It works by computing the gradient of the image, which is a measure of how quickly the intensity
of the image changes at each point. Edges are found at points where the gradient is large.

The most common gradient-based operator is the Sobel operator. It consists of two separate
masks (one for horizontal gradients and one for vertical gradients) that are convolved with the
image. The Sobel operator computes the gradient magnitude by combining the horizontal and
vertical gradients and calculates the gradient direction as the arctangent of the vertical gradient
divided by the horizontal gradient.
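
A short OpenCV sketch of Sobel-based gradient detection along these lines (the file name and the
0.25 threshold factor are illustrative assumptions):

import cv2
import numpy as np

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# First-order derivatives in the x and y directions using the 3x3 Sobel masks.
gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)

magnitude = np.sqrt(gx**2 + gy**2)     # gradient strength at each pixel
direction = np.arctan2(gy, gx)         # gradient orientation in radians

# Simple edge map: keep pixels whose gradient magnitude is large.
edges = (magnitude > 0.25 * magnitude.max()).astype(np.uint8) * 255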

Gaussian Detection:
Gaussian-based operators first smooth the image with a Gaussian filter to suppress noise and then
use second-order derivatives to locate edges; the Laplacian of Gaussian and the Canny detector
described below fall into this category.

Laplacian Edge Detection


The Laplacian edge detection algorithm is a simple and effective way to detect edges in images.
It works by computing the second derivative of the image, which is a measure of how quickly the
intensity of the image changes at each point. Edges are found at points where the second
derivative is large.

The Laplacian operator is a 3x3 kernel that is convolved with the image to compute the second
derivative. The Laplacian operator can be used to detect edges in both grayscale and color
images.

The Laplacian operator, often combined with Gaussian smoothing to form the Laplacian of Gaussian
(LoG) operator, is a mathematical operator used in image processing for edge detection and image
enhancement. It calculates the second derivative of an image to identify areas of rapid intensity
changes, which often correspond to edges or boundaries between objects in the image.

The Laplacian operator can be represented as follows:


-1 -1 -1
-1 8 -1
-1 -1 -1

When the Laplacian operator is convolved with an image, the resulting image highlights the
regions with significant intensity changes as positive or negative values. Positive values indicate
areas of rapid increase in intensity (dark-to-bright transitions), while negative values indicate
areas of rapid decrease in intensity (bright-to-dark transitions).

The Laplacian edge detection algorithm is relatively simple to implement and can be used to
detect edges in real time. However, it is not as robust to noise as some other edge detection
techniques.
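
A minimal sketch of Laplacian edge detection using the kernel shown above with OpenCV's filter2D
(the file name and the 0.2 threshold factor are illustrative assumptions):

import cv2
import numpy as np

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)

# The 3x3 Laplacian kernel shown above (centre 8, neighbours -1).
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=np.float64)

laplacian = cv2.filter2D(gray, -1, kernel)    # second-derivative response
edges = (np.abs(laplacian) > 0.2 * np.abs(laplacian).max()).astype(np.uint8) * 255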

Canny Edge Detection


The Canny edge detection algorithm is a more sophisticated edge detection technique than the
Laplacian edge detection algorithm. It is designed to be more robust to noise and to produce a
smoother edge map.

The Canny edge detection algorithm works by first smoothing the image with a Gaussian filter to
reduce noise. The gradient magnitude and direction are then computed, non-maximum suppression is
applied to thin the edges, and finally hysteresis thresholding (a low and a high threshold) is
used to keep strong edges and discard spurious ones.

The Canny edge detection algorithm is more computationally expensive than the Laplacian
edge detection algorithm, but it produces a better edge map.
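
A minimal OpenCV sketch of this pipeline (the smoothing kernel size and the hysteresis thresholds
50 and 150 are illustrative values):

import cv2

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Smooth first to suppress noise, then apply Canny with low/high hysteresis thresholds.
blurred = cv2.GaussianBlur(gray, (5, 5), 1.4)
edges = cv2.Canny(blurred, 50, 150)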

Formulate how the derivatives are obtained in edge detection


In edge detection, derivatives are used to identify areas of rapid intensity change, which
correspond to edges in an image. The derivatives provide information about the rate of change
of pixel intensities, enabling the detection of transitions from dark to light or vice versa. There
are different methods to obtain derivatives for edge detection, depending on the specific
algorithm or technique used. Here, we'll discuss two common approaches: gradient-based
methods and Laplacian-based methods.

1. Gradient-based Methods:
Gradient-based edge detection methods utilize the first derivative of the image intensity
to detect edges. The magnitude and direction of the gradient at each pixel indicate the
strength and orientation of the intensity change, respectively. The following steps outline
how derivatives are obtained in gradient-based methods:

1. Convert the image to grayscale if it is in color.


2. Apply Gaussian smoothing to reduce noise in the image. This is done by convolving the
image with a Gaussian kernel.
3. Calculate the derivatives using the Sobel operator or other similar gradient operators.
a. The Sobel operator consists of two separate masks (one for horizontal gradients
and one for vertical gradients) that are convolved with the smoothed image.
b. Convolve the smoothed image with the horizontal and vertical gradient masks to
obtain the gradients in the x and y directions.
4. Compute the gradient magnitude by combining the horizontal and vertical gradients.
a. The gradient magnitude at each pixel is calculated using the formula: magnitude
= sqrt((gradient_x)^2 + (gradient_y)^2).
5. Compute the gradient direction at each pixel.
a. The gradient direction at each pixel is determined using the formula: direction =
arctan(gradient_y / gradient_x).

The resulting gradient magnitude and direction images provide information about the
strength and orientation of the detected edges, respectively.

2. Laplacian-based Methods:
Laplacian-based edge detection methods utilize the second derivative of the image
intensity to identify edges. The Laplacian operator highlights areas with rapid changes in
intensity. Here's how derivatives are obtained in Laplacian-based methods:

1. Convert the image to grayscale if needed.


2. Apply Gaussian smoothing to reduce noise.
3. Convolve the smoothed image with the Laplacian operator or the Laplacian of Gaussian
(LoG) operator.
a. The Laplacian operator is a second-order derivative operator that detects
intensity changes in all directions.
b. The LoG operator is a combination of the Laplacian operator and a Gaussian
smoothing operation. It is applied to the image to reduce noise before computing
the Laplacian.
4. The resulting image represents the second derivative of the intensity in the image. Areas
with significant positive or negative values indicate the presence of edges.
In both gradient-based and Laplacian-based methods, the derivatives are calculated to identify
the regions in the image where the intensity changes rapidly. These regions correspond to the
edges, and further processing steps, such as edge linking or thresholding, are performed to
obtain the final edge map or contour representation.

Edge Linking
Edge linking, also known as edge contour tracing or edge connection, is a post-processing step
in edge detection algorithms that aims to connect the detected edge pixels to form continuous
contours or curves. The initial edge detection step often results in isolated edge pixels or
fragmented edge segments, and edge linking helps in reconstructing complete edges or
contours for further analysis or visualization.

The edge linking process involves examining the neighboring pixels of an edge pixel and
determining if they belong to the same edge. By iteratively traversing the edge pixels,
connectivity is established, and a continuous contour or curve is formed.

There are different algorithms and approaches for edge linking, but a commonly used method is
the 8-connectivity approach, where each edge pixel is connected to its eight neighboring pixels.
The steps involved in edge linking using the 8-connectivity approach are as follows:
● Initialization:
○ Start with an edge pixel from the detected edge map.
○ Mark the selected pixel as part of a contour or curve and add it to a contour list.
● Neighbor Examination:
○ Examine the neighboring pixels (including diagonals) of the current pixel to
determine if they belong to the same edge.
○ If a neighboring pixel is also an edge pixel and has not been visited before, mark
it as part of the contour and add it to the contour list.
○ Continue this process for all neighboring pixels.
● Contour Tracing:
○ Choose one of the neighboring pixels that belong to the contour as the new
current pixel.
○ Repeat the neighbor examination step for the new current pixel, adding
neighboring pixels to the contour if they meet the criteria.
○ Continue this process until there are no more unvisited neighboring pixels that
meet the criteria.
● Contour Completion:
○ Once all connected edge pixels have been traversed, the contour is complete.
○ Store the contour information for further processing or visualization.
○ Repeat the process for any remaining unvisited edge pixels to find additional
contours.
The result of the edge linking process is a set of connected contours or curves that represent
the continuous edges in the image. These contours can be further utilized for various
applications such as shape analysis, object recognition, or region-based segmentation.
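
A simplified NumPy sketch of 8-connectivity edge linking as described above; edge_map is assumed
to be a binary array, and the sketch groups connected edge pixels rather than tracing ordered
contours:

import numpy as np

def link_edges(edge_map):
    # Returns a list of contours, each a list of (row, col) coordinates of
    # edge pixels that are 8-connected to one another.
    visited = np.zeros(edge_map.shape, dtype=bool)
    h, w = edge_map.shape
    contours = []
    for r in range(h):
        for c in range(w):
            if edge_map[r, c] and not visited[r, c]:
                stack, contour = [(r, c)], []
                visited[r, c] = True
                while stack:                          # iterative traversal
                    y, x = stack.pop()
                    contour.append((y, x))
                    for dy in (-1, 0, 1):             # examine the 8 neighbours
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and edge_map[ny, nx] and not visited[ny, nx]):
                                visited[ny, nx] = True
                                stack.append((ny, nx))
                contours.append(contour)
    return contours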

It's worth noting that edge linking algorithms may incorporate additional techniques to handle
challenges such as gaps in edges, noise, or branching structures. Techniques like gap filling,
noise removal, or branch merging may be applied during the edge linking process to improve
the continuity and accuracy of the resulting contours.

Edge Based Segmentation


Edge-based segmentation is a technique in image processing that uses the concept of edges to
separate different regions or objects in an image. It relies on detecting and utilizing the
boundaries or sharp transitions in pixel intensity or other image attributes to segment the image
into meaningful components.

The process of edge-based segmentation involves the following steps:


1. Edge Detection:
The first step is to detect the edges in the image. Various edge detection algorithms,
such as gradient-based methods or Laplacian-based methods, can be employed to
identify areas of rapid intensity change. These algorithms analyze the gradient or second
derivative of the image to locate the pixels that correspond to edges.
2. Edge Linking:
After the initial edge detection step, the detected edge pixels may be disconnected or
fragmented. Edge linking is performed to connect these edge pixels and form continuous
boundaries or contours. This is typically achieved by examining the neighborhood of
each edge pixel and determining if neighboring pixels belong to the same edge.
3. Edge Selection or Refinement:
In some cases, the edge map obtained from the edge detection step may contain
spurious or irrelevant edges. Edge selection or refinement techniques can be employed
to remove unwanted edges or refine the edge map by considering additional criteria
such as edge strength, curvature, or contextual information.
4. Region Formation:
Once the edges are detected, linked, and refined, the next step is to use the edge
information to segment the image into regions. This can be done by grouping the pixels
based on their connectivity to the edges or by utilizing other region growing or region
merging algorithms.

Edge-based segmentation offers several advantages in image analysis:


1. Object Boundaries: Edge-based segmentation is particularly effective in separating
objects or regions based on their boundaries. The detected edges often correspond to
the boundaries of objects, making it easier to extract meaningful regions.
2. Shape Extraction: Edges contain information about the shape and structure of objects
in an image. By leveraging edge-based segmentation, it becomes possible to extract
and analyze the shapes of objects.
3. Robustness to Lighting and Contrast Variations: Edge-based segmentation methods
are relatively robust to variations in lighting conditions and contrast. Since the focus is on
detecting intensity transitions, they can handle images with uneven illumination or low
contrast.

However, edge-based segmentation also has some limitations:


● Sensitivity to Noise: Edge detection is sensitive to noise, which can result in the
detection of spurious edges or false positives. Pre-processing steps like noise reduction
or smoothing are often necessary to enhance the accuracy of edge detection.
● Ambiguity: In some cases, edges may not provide a clear indication of object
boundaries or may be ambiguous due to overlapping or intersecting objects. Additional
contextual information or higher-level analysis may be required to resolve such
ambiguities.

Hough Transform
The Hough Transform is a feature extraction approach in image processing and computer vision
used for detecting and extracting geometric shapes or patterns in images, especially lines and
curves. It was initially developed for line detection but has since been extended to detect other
shapes like circles or ellipses.

The Hough Transform operates on the principle of parameter space representation. Instead of
directly detecting shapes in the image space, it transforms the image space into a parameter
space, where the parameters represent the properties of the desired shapes. By accumulating
votes in the parameter space, the Hough Transform identifies the most likely parameter
combinations, corresponding to the shapes present in the image.

Its purpose is to find imperfect instances of objects within a certain class of shapes by a voting
procedure. This voting procedure is carried out in a parameter space, from which objects are
obtained as local maxima in an accumulator space that is explicitly constructed by the algorithm
for computing the Hough transform.

The classical Hough transform is most commonly used for the detection of regular curves such
as lines, circles, ellipses, etc. The main advantage of the Hough transform technique is that it is
tolerant of gaps in feature boundary descriptions and is relatively unaffected by image noise.

The Hough transform works by computing a global description of a feature(s) given local
measurements. For example, when detecting lines in an image, the Hough transform maps
points in the image to curves in the Hough parameter space. When viewed in the Hough parameter
space, points that are collinear in the image become easy to identify, because their
corresponding curves intersect at a common point.
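
A short sketch using OpenCV's probabilistic Hough line transform on a Canny edge map (the
threshold, minLineLength and maxLineGap values are illustrative):

import cv2
import numpy as np

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 50, 150)                 # Hough is usually run on an edge map

# Each detected line segment is returned as its two end points (x1, y1, x2, y2).
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                        minLineLength=30, maxLineGap=5)

output = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(output, (x1, y1), (x2, y2), (0, 0, 255), 2)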
Watershed Transform
The Watershed Transform is a powerful image processing technique primarily used for object
segmentation. It is based on the concept of treating an image as a topographic map, where the
brightness of each point represents its height. The algorithm finds lines that run along the tops
of ridges. It is commonly used for segmenting images with complex or irregular regions.

The Watershed Transform is particularly useful for segmenting images with complex structures,
such as overlapping objects, touching boundaries, or irregular shapes. It is commonly used in
various applications, including medical image analysis, object detection, and image-based
measurements.

Watershed Transform process:


● Gradient Calculation: First, the gradient of the image is calculated using gradient-based
operators like the Sobel or Scharr operators. The gradient image highlights regions of
rapid intensity changes or edges.
● Marker Generation: Markers are created to indicate the regions or objects of interest.
These markers can be manually defined by the user or automatically generated using
techniques like thresholding, region growing, or morphological operations. Each marker
is assigned a unique label.
● Watershed Line Initialization: The gradient image is processed to identify areas where
flooding can start. This is typically achieved by finding regional minima or low points in
the gradient image. These minima serve as the starting points for flooding and are
considered as markers of basins.
● Flooding and Catchment Basins: Starting from the identified regional minima, the
flooding process begins. The intensity values of the image are treated as water levels,
and the flooding propagates from the minima throughout the image. As the water level
increases, the basins start to fill up, and watershed lines are formed where the basins
meet.
● Watershed Line Refinement: During the flooding process, the basins may merge or
over-segment due to noise or weak gradients. To address this, various techniques are
applied to refine the watershed lines and improve the segmentation accuracy.
● Segmentation Result: The final output of the Watershed Transform is a segmentation
map, where each pixel is assigned a label corresponding to its catchment basin or
segment. The watershed lines act as boundaries between the segments, separating
different regions in the image.
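
A sketch of this marker-based watershed pipeline following the common OpenCV approach;
"coins.png" is a placeholder image, and the Otsu step assumes bright objects on a dark
background:

import cv2
import numpy as np

img = cv2.imread("coins.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Rough foreground/background estimate via Otsu thresholding and dilation.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), np.uint8)
sure_bg = cv2.dilate(binary, kernel, iterations=3)

# Peaks of the distance transform serve as markers for the individual objects.
dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
_, sure_fg = cv2.threshold(dist, 0.5 * dist.max(), 255, 0)
sure_fg = sure_fg.astype(np.uint8)
unknown = cv2.subtract(sure_bg, sure_fg)

_, markers = cv2.connectedComponents(sure_fg)    # label each marker region
markers = markers + 1                             # background label becomes 1
markers[unknown == 255] = 0                       # undecided region stays 0

markers = cv2.watershed(img, markers)             # flood from the markers
img[markers == -1] = (0, 0, 255)                  # draw watershed lines in red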

Clustering Techniques
In this type of segmentation, we try to cluster together pixels that belong together. There are two
approaches to performing segmentation by clustering:
- Clustering by merging (agglomerative clustering)
- Clustering by splitting (divisive clustering)
Clustering techniques in image segmentation are used to group similar pixels or regions
together based on their pixel values or other image features. These techniques aim to partition
an image into distinct regions or objects by identifying similarities and differences in the pixel
characteristics.

K-means clustering is a very popular clustering algorithm which is applied when we have a
dataset without labels. The goal is to find a certain number of groups, K, based on some kind of
similarity in the data. This algorithm is generally used in areas like market segmentation,
customer segmentation, etc., but it can also be used to segment different objects in an image on
the basis of their pixel values.

The steps involved in K-Means clustering are:

● Select a particular value of K (the number of clusters).
● Represent each pixel by a feature vector, such as its RGB value.
● Initialize K cluster centers and assign each pixel to the nearest center, using a distance
measure such as Euclidean distance.
● Recompute each cluster center as the mean of the pixels assigned to it, and reassign the
pixels; repeat these two steps.
● Stop when the cluster centers change by less than a defined threshold between iterations;
each resulting cluster corresponds to a segment of the image.

A good way to find a suitable value of K is to try a small range of values (for example, 1-10) and
plot a clustering quality measure against K; this is popularly known as the Elbow method. The
point where the graph bends sharply downward (the "elbow") can be considered the optimal value of K.
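
A compact sketch of K-means colour segmentation with OpenCV; the image path and K = 4 are
illustrative assumptions:

import cv2
import numpy as np

img = cv2.imread("input.png")
pixels = img.reshape(-1, 3).astype(np.float32)    # one colour feature vector per pixel

K = 4                                             # chosen e.g. via the Elbow method
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, K, None, criteria, 10,
                                cv2.KMEANS_RANDOM_CENTERS)

# Replace every pixel with its cluster centre to visualise the K segments.
segmented = centers[labels.flatten()].astype(np.uint8).reshape(img.shape)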

Region approach
This process involves dividing the image into smaller segments according to a certain set of
rules. The technique employs an algorithm that divides the image into several components with
common pixel characteristics. The process looks for coherent chunks of segments within the image:
small segments absorb similar pixels from neighboring regions and subsequently grow in size, with
the algorithm comparing, for example, the gray levels of the surrounding pixels.

Advantages:
● The region growing method can correctly separate regions that share the properties we define.
● Region growing can give good segmentation results for original images that have clear edges.
● Only a small number of seed points is needed to represent the property we want.
● The process of region growing can be controlled by setting appropriate seed points,
similarity criteria, and stopping conditions. This allows users to fine-tune the
segmentation output based on their requirements and domain knowledge.
● Region growing is a straightforward and easy-to-understand segmentation technique. It
is based on the concept of growing regions from seeds, which is intuitive and can be
easily implemented.
● Region growing can handle irregularly shaped regions and objects.
Disadvantages:
● It is computationally expensive.
● It is a local method with no global view of the problem.
● The choice of seed points or regions in region growing can significantly affect the
segmentation result. Incorrect or inappropriate seed selection may lead to
under-segmentation or over-segmentation.
● Region growing algorithms often require the specification of similarity criteria and
stopping conditions. The performance and accuracy of the segmentation heavily depend
on selecting appropriate threshold values for these parameters.
● Region growing can be sensitive to noise and may generate spurious regions or false
detections.
● Region growing may struggle to handle overlapping regions or objects.

Region Growing
Region growing is a simple region-based image segmentation method. It is also classified as a
pixel-based image segmentation method since it involves the selection of initial seed points.

Region growing is a common technique used in region-based segmentation. It starts with seed
pixels or regions and expands them by adding neighboring pixels that meet certain similarity
criteria.

The similarity criteria can be based on color, intensity, texture, or other image features. Pixels
that satisfy the similarity criteria are added to the growing region, and the process continues
until the region stops expanding or reaches a predefined stopping condition.

Region-growing-based techniques perform better than edge-based techniques on noisy images.
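
A minimal NumPy sketch of seeded region growing with an intensity-difference criterion; the
4-connectivity and the tolerance value are simplifying assumptions:

import numpy as np

def region_grow(gray, seed, tol=10):
    # Grow a region from a single (row, col) seed pixel; a neighbour joins the
    # region when its intensity differs from the seed intensity by at most tol.
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = int(gray[seed])
    stack = [seed]
    mask[seed] = True
    while stack:
        y, x = stack.pop()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):    # 4-connected neighbours
            ny, nx = y + dy, x + dx
            if (0 <= ny < h and 0 <= nx < w and not mask[ny, nx]
                    and abs(int(gray[ny, nx]) - seed_val) <= tol):
                mask[ny, nx] = True
                stack.append((ny, nx))
    return mask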

Region splitting
Region splitting and merging is a region-based segmentation technique that involves dividing an
image into smaller regions through splitting and then merging them based on specific criteria.

The initial image is considered as a single region. The splitting process involves dividing the
region into smaller sub-regions based on predefined splitting criteria. The splitting criteria can be
based on intensity variations, texture differences, or other image properties. Common methods
for region splitting include thresholding, local variance analysis, or edge detection. The splitting
operation continues recursively until specific conditions are met, such as reaching a desired
number of regions or satisfying certain homogeneity criteria.

After the region splitting phase, the resulting image consists of multiple smaller regions. In the
region merging phase, adjacent regions are merged based on similarity criteria to form larger,
more coherent regions. The merging criteria can be based on the similarity of color, texture,
intensity, or other image features. Different merging techniques can be applied, such as
comparing the statistical properties of neighboring regions or measuring the similarity between
region boundaries. Merging continues until no further regions can be merged or until predefined
stopping conditions are met.

Region splitting and merging continue iteratively until certain stopping conditions are satisfied.
These conditions can be predefined thresholds for the number of regions, minimum region size,
or specific homogeneity measures. The stopping conditions help prevent over-segmentation or
under-segmentation and control the granularity of the final segmentation output.
Image Compression

Image compression is the process of reducing the size of an image file without significantly
degrading its visual quality. It is an essential technique used in various applications, such as
digital photography, image storage, transmission over networks, and multimedia systems. The
primary goal of image compression is to minimize the file size while preserving the essential
information and perceptual fidelity of the image.

Compressing images is an important step before the processing of larger images or videos.
Compression is carried out by an encoder, which outputs a compressed form of the image.
Mathematical transforms play a vital role in the compression process.

Need
Image compression is essential in our digital lives because large image files can cause slow
website loading times, difficulties in sharing images online, and limited storage space. By
compressing images, their size is reduced, making it easier to store and transmit them.

1. Reduced Storage Requirements: Image compression reduces file sizes, allowing for
efficient storage of images on devices with limited storage capacity.
2. Bandwidth Efficiency: Compressed images require less bandwidth, resulting in faster
upload and download times during image transmission over networks.
3. Faster Processing: Smaller file sizes from compression lead to faster loading times and
improved performance in image processing tasks.
4. Cost Reduction: Compression reduces storage, network infrastructure, and data
transfer costs associated with images.
5. Improved User Experience: Smaller compressed images result in quicker website
loading, enhanced multimedia streaming, and better user satisfaction.
6. Compatibility: Compression adapts images to meet the size and format limitations of
different devices and platforms.
7. Archiving and Preservation: Image compression reduces storage requirements,
making it more feasible to archive and preserve large collections of images over time.

Classification
There are two main types of image compression: lossless compression and lossy compression.

Lossless Compression
● Lossless compression algorithms reduce the file size of an image without any loss of
information. The compressed image can be perfectly reconstructed to its original form.
● This method is commonly used in scenarios where preserving every detail is crucial,
such as medical imaging or scientific data analysis
● Lossless compression achieves compression by exploiting redundancy and eliminating
repetitive patterns in the image data.
● Some common lossless compression algorithms include:
○ Run-Length Encoding (RLE): This algorithm replaces consecutive repetitions of
the same pixel value with a count and the pixel value itself.
○ Huffman coding: It assigns variable-length codes to different pixel values based
on their frequency of occurrence in the image.
○ Lempel-Ziv-Welch (LZW): This algorithm replaces repetitive sequences of pixels
with shorter codes, creating a dictionary of commonly occurring patterns.
● Lossless compression techniques typically achieve modest compression ratios
compared to lossy compression but ensure exact data preservation.
● Types of lossless image formats include:
○ RAW - these file types tend to be quite large in size. Additionally, there are
different versions of RAW, and you may need specific software to edit the files.
○ PNG - compresses images while keeping their size small by looking for patterns in a
photo and encoding them together. The compression is reversible, so when you open
a PNG file, the image is recovered exactly.
○ BMP - a format developed by Microsoft. It is lossless but not frequently used.

Lossy Compression
● Lossy compression algorithms achieve higher compression ratios by discarding some
information from the image that is less perceptually significant.
● This method is widely used in applications such as digital photography, web images, and
multimedia streaming, where a small loss in quality is acceptable to achieve significant
file size reduction.
● Lossy compression techniques exploit the limitations of human visual perception and the
characteristics of natural images to remove or reduce redundant or less noticeable
details
● The algorithms achieve this by performing transformations on the image data and
quantizing it to reduce the number of distinct values.
● The main steps involved in lossy compression are:
○ Transform Coding: The image is transformed from the spatial domain to a
frequency domain using techniques like Discrete Cosine Transform (DCT) or
Wavelet Transform. These transforms represent the image data in a more
compact manner by concentrating the energy in fewer coefficients.
○ Quantization: In this step, the transformed coefficients are quantized, which
involves reducing the precision or dividing the range of values into a finite set of
discrete levels. Higher levels of quantization lead to greater compression but also
more loss of information. The quantization process is typically designed to
allocate more bits to visually important coefficients and fewer bits to less
important ones.
○ Entropy Encoding: The quantized coefficients are further compressed using
entropy coding techniques like Huffman coding or Arithmetic coding. These
coding schemes assign shorter codes to more frequently occurring coefficients,
resulting in additional compression.
● The amount of compression achieved in lossy compression is customizable based on
the desired trade-off between file size reduction and visual quality. Different compression
algorithms and settings can be used to balance the compression ratio and the
perceptual impact on the image.

Methods of compression

Run-length Coding
Run-length coding is a simple and effective technique used in image compression, especially for
scenarios where the image contains long sequences of identical or highly similar pixels. It
exploits the redundancy present in such sequences to achieve compression.

The basic idea behind run-length coding is to represent consecutive repetitions of the same
pixel value with a count and the pixel value itself, instead of explicitly storing each pixel
individually. By doing so, run-length coding reduces the amount of data required to represent
these repetitive patterns.

Say you have a picture of red and white stripes, and there are 12 white pixels and 12 red pixels.
Normally, the data for it would be written as WWWWWWWWWWWWRRRRRRRRRRRR, with
W representing the white pixel and R the red pixel. Run length would put the data as 12W and
12R. Much smaller and simpler while still keeping the data unaltered.

Here's how run-length coding works for image compression:

Run-length Encoding (RLE):


● Scanning: The image is scanned row by row or column by column. The scanning
direction is not crucial, but it should be consistent for encoding and decoding.
● Finding Runs: During scanning, the algorithm identifies runs, which are sequences of
consecutive pixels with the same value. The length of each run is determined.
● Encoding: For each run, the algorithm stores the length of the run (count) and the pixel
value. This information is typically represented using a pair of values: <count, value>.
The count is usually represented using a fixed number of bits or a variable-length code,
depending on the specific implementation.
● Storing the Encoded Data: The encoded run-length information is stored, usually in a
compressed form, as a sequence of <count, value> pairs or a compressed bitstream.

Run-length Decoding:
● Retrieving the Encoded Data: The encoded run-length data is retrieved from storage.
● Decoding: The decoding process involves reconstructing the original image from the
run-length data. Starting from the first <count, value> pair, the algorithm repeats the
value count times to obtain the sequence of pixels. This process is repeated for each
<count, value> pair until the entire image is reconstructed.
The decoding step essentially reverses the encoding process, reconstructing the original
image by expanding the compressed run-length representation.
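
A minimal Python sketch of run-length encoding and decoding, demonstrated on the stripe example
above:

def rle_encode(data):
    # Encode a sequence of symbols as a list of (count, value) pairs.
    encoded, count = [], 1
    for prev, cur in zip(data, data[1:]):
        if cur == prev:
            count += 1
        else:
            encoded.append((count, prev))
            count = 1
    if data:
        encoded.append((count, data[-1]))
    return encoded

def rle_decode(pairs):
    # Expand (count, value) pairs back into the original string of symbols.
    return "".join(value * count for count, value in pairs)

pairs = rle_encode("WWWWWWWWWWWWRRRRRRRRRRRR")
print(pairs)                 # [(12, 'W'), (12, 'R')]
print(rle_decode(pairs))     # the original stripe data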

A drawback is that the original data is not instantly accessible: everything has to be decoded
before any part of it can be accessed, and the size of the decoded data cannot be known in advance.
Run-length coding is particularly effective for images with areas of solid color or regions with
uniform patterns, such as line drawings, text, or simple graphics. However, it may not be as
efficient for more complex and detailed images, as they tend to have fewer long runs of identical
pixels.

Run-length coding is often used in conjunction with other compression techniques, such as
Huffman coding or arithmetic coding, to achieve higher compression ratios. By combining
run-length coding with these entropy encoding techniques, the frequency of occurrence of
different runs or pixel values can be exploited to assign shorter codes to more frequent patterns,
resulting in additional compression.

Shannon Fano Coding


Shannon-Fano coding is a technique used for entropy encoding in image compression. It
assigns variable-length codes to different symbols (in this case, pixel values) based on their
probability of occurrence. The codes are designed in a way that ensures a prefix-free property,
meaning that no code is a prefix of another code. Shannon-Fano coding is a precursor to
Huffman coding and provides a foundation for understanding its concepts.

Here's how Shannon-Fano coding works for image compression:


● Probability Calculation:
○ Frequency Counting: The first step is to determine the frequency of occurrence
for each unique pixel value in the image. This is done by counting the number of
times each pixel value appears in the image.
○ Probability Calculation: Once the frequencies are determined, probabilities can
be calculated by dividing each frequency by the total number of pixels in the
image. These probabilities represent the likelihood of encountering each pixel
value.
● Sorting:
○ The pixel values are sorted based on their probabilities in descending order. This
step is essential for subsequent recursive splitting and assigning codes.
● Recursive Splitting:
○ Starting with the sorted list, the pixel values are divided into two groups such that
the sum of probabilities in one group is as close as possible to the sum of
probabilities in the other group. This division is performed recursively until each
group contains only a single pixel value or until the splitting is no longer possible.
● Code Assignment:
○ The codes are assigned to the pixel values based on the recursive splitting. The
assignment process follows a pattern where the left group is assigned a "0" as
the prefix, and the right group is assigned a "1" as the prefix. The splitting and
assignment process continues recursively for each group until individual codes
are assigned to all pixel values.

It's important to note that if the probabilities of two or more pixel values are the same, a different
mechanism may be employed to ensure the prefix-free property. For example, in such cases,
the pixel values can be sorted based on their original order in the image or some other
tie-breaking rule.
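
A small Python sketch of the recursive splitting and code assignment described above; the input is
assumed to be a list of (symbol, probability) pairs already sorted by descending probability:

def shannon_fano(symbols):
    # Returns a dict mapping each symbol to its binary code string.
    codes = {sym: "" for sym, _ in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(p for _, p in group)
        # Choose the split point that makes the two halves' probability sums
        # as close to each other as possible.
        best_i, best_diff, running = 1, float("inf"), 0.0
        for i, (_, p) in enumerate(group[:-1], start=1):
            running += p
            diff = abs((total - running) - running)
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i], group[best_i:]
        for sym, _ in left:
            codes[sym] += "0"          # left group gets prefix bit 0
        for sym, _ in right:
            codes[sym] += "1"          # right group gets prefix bit 1
        split(left)
        split(right)

    split(symbols)
    return codes

print(shannon_fano([("a", 0.4), ("b", 0.3), ("c", 0.2), ("d", 0.1)]))
# {'a': '0', 'b': '10', 'c': '110', 'd': '111'}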

Shannon-Fano coding is a basic technique that assigns codes based on probabilities but does
not guarantee the optimal code lengths. Huffman coding, which is an extension of
Shannon-Fano coding, provides a more efficient encoding scheme by considering the
probabilities and constructing a binary tree where shorter codes are assigned to more frequent
symbols.

In image compression, Shannon-Fano coding is often used as a stepping stone or as a
foundation for more advanced entropy encoding techniques like Huffman coding or Arithmetic
coding. By assigning variable-length codes based on pixel probabilities, Shannon-Fano coding
contributes to the overall compression of image data by efficiently representing frequent pixel
values with shorter codes.

Huffman coding
Huffman coding is a widely used entropy encoding technique for image compression. It assigns
variable-length codes to different symbols (in this case, pixel values) based on their probabilities
or frequencies of occurrence. Huffman coding achieves efficient compression by assigning
shorter codes to more frequently occurring symbols.

Here's how Huffman coding works for image compression:


● Probability Calculation:
○ Frequency Counting: The first step is to determine the frequency of occurrence
for each unique pixel value in the image. This is done by counting the number of
times each pixel value appears in the image.
○ Probability Calculation: Once the frequencies are determined, probabilities can
be calculated by dividing each frequency by the total number of pixels in the
image. These probabilities represent the likelihood of encountering each pixel
value.
● Construction of Huffman Tree:
○ Symbol Creation: Each unique pixel value is treated as a symbol.
○ Node Creation: A leaf node is created for each symbol, containing the symbol
value and its probability.
○ Combining Nodes: The nodes are sorted based on their probabilities, and the two
nodes with the lowest probabilities are combined to create a new parent node.
The probability of the parent node is the sum of the probabilities of its child
nodes.
○ Tree Formation: The process of combining nodes is repeated iteratively until all
nodes are combined into a single root node. This results in the creation of a
binary tree known as the Huffman tree.
● Code Assignment:
○ Traversing the Huffman Tree: Starting from the root node, a traversal is
performed through the Huffman tree. Moving to the left child represents a binary
digit "0," while moving to the right child represents a binary digit "1."
○ Code Assignment: The codes are assigned by concatenating the binary digits
encountered during the traversal from the root to each leaf node. The codes for
frequently occurring symbols are shorter, while the codes for less frequent
symbols are longer.
○ Building Codebook: The assigned codes for each symbol are stored in a
codebook, which is used for encoding and decoding.
● Encoding:
○ Replace each pixel value with its corresponding Huffman code from the
codebook to generate a sequence of Huffman codes representing the
compressed image.
● Decoding:
○ Traverse the sequence of Huffman codes from left to right, starting from the root
of the Huffman tree. Decode each code to its corresponding pixel value,
reconstructing the original image.

Huffman coding achieves efficient compression by assigning shorter codes to more frequently
occurring symbols, which results in a reduction of the overall number of bits required to
represent the image. This technique is widely used in image compression algorithms such as
JPEG (Joint Photographic Experts Group) and is known for its simplicity and effectiveness in
achieving good compression ratios while preserving visual quality
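
A compact Python sketch of Huffman code construction using a min-heap; the pixel values in the
usage example are made up for illustration:

import heapq
from collections import Counter

def huffman_codes(data):
    # Build a Huffman codebook mapping each symbol in data to a binary code string.
    freq = Counter(data)
    # Each heap entry: (frequency, tie-breaker, {symbol: partial code}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                          # degenerate single-symbol case
        return {sym: "0" for sym in heap[0][2]}
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)         # two least probable nodes
        f2, _, c2 = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in c1.items()}
        merged.update({sym: "1" + code for sym, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

pixels = [3, 3, 3, 7, 7, 1, 0, 3]
codes = huffman_codes(pixels)
encoded = "".join(codes[p] for p in pixels)     # shorter codes for frequent values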

Scalar and vector quantization


Scalar Quantization:
Scalar quantization, also known as scalar quantization coding (SQC), is a technique used in image
compression to reduce the number of bits required to represent an image by quantizing individual
pixel values. It assigns discrete levels or values to each pixel based on a quantization table or
codebook.

Here's how scalar quantization works for image compression:


● Quantization Table or Codebook Generation:
○ Range Determination: The range of pixel values in the image is determined,
typically by examining the minimum and maximum pixel values.

○ Division into Intervals: The range is divided into a set of non-overlapping intervals
or levels. The number of levels determines the number of bits used to represent
each pixel value.
● Pixel Quantization:
○ For each pixel in the image, the quantization process involves mapping its
original value to the nearest level in the quantization table or codebook. This
mapping is performed based on the proximity of the pixel value to the available
levels.
● Encoding:
○ The quantized values, which are represented by the assigned levels, are
encoded and stored using the corresponding number of bits for each quantized
pixel value.
● Decoding:
○ During decoding, the encoded quantized values are retrieved, and the inverse
process is applied. The quantized values are mapped back to their original pixel
values based on the inverse mapping of the quantization table or codebook.

Scalar quantization provides a simple and efficient means of reducing the number of bits
required to represent an image. However, it may introduce quantization errors and loss of fine
details since each pixel is quantized independently.
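
A minimal NumPy sketch of uniform scalar quantization; the number of levels and the mid-point
reconstruction rule are illustrative choices:

import numpy as np

def scalar_quantize(img, levels=16):
    # Map each pixel to one of `levels` uniform intervals; return the level
    # indices (what an encoder would store) and the reconstructed image.
    lo, hi = float(img.min()), float(img.max())
    step = max((hi - lo) / levels, 1e-12)
    indices = np.clip(((img - lo) / step).astype(int), 0, levels - 1)
    reconstructed = lo + (indices + 0.5) * step     # mid-point of each interval
    return indices, reconstructed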

Vector Quantization:
Vector quantization (VQ) is a technique used in image compression that extends the concept of
scalar quantization by grouping multiple pixels together into blocks or vectors. It aims to capture
the statistical dependencies and similarities among neighboring pixels, resulting in improved
compression performance and preservation of local image features.

Here's how vector quantization works for image compression:


● Block Division:
○ The image is divided into non-overlapping blocks or vectors, each containing
multiple pixels. The size of the blocks can vary depending on the specific
implementation.
● Codebook Generation:
○ For vector quantization, a codebook is generated by applying clustering
algorithms such as k-means to the blocks in the image. The codebook represents
a set of representative vectors that will be used for quantization.
● Vector Quantization:
○ Each block in the image is quantized by finding the codebook entry that best
approximates the block's content. The quantization is performed by assigning the
index of the closest codebook entry to the block.
● Encoding:
○ The indices of the selected codebook entries for each block are encoded and
stored, typically using a variable-length code or a fixed number of bits.
● Decoding:
○ During decoding, the encoded indices are retrieved, and the corresponding
codebook entries are used to reconstruct the quantized blocks.

Vector quantization offers improved compression performance compared to scalar quantization
as it takes into account the correlation among neighboring pixels. It captures the statistical
structure of the image more effectively and can preserve image details and features better.
However, vector quantization requires more computational complexity and memory to generate
and store the codebook compared to scalar quantization.

Compression Standards - JPEG/MPEG


Compression Standards for JPEG (Joint Photographic Experts Group) and MPEG (Moving
Picture Experts Group) are widely used in image and video compression, respectively. These
standards define the encoding and decoding processes for efficient compression and
decompression, ensuring interoperability across different devices and platforms.

JPEG Compression Standard:


The JPEG compression standard is primarily designed for still image compression. It provides a
lossy compression method that achieves high compression ratios while maintaining acceptable
image quality.

The main components of the JPEG compression standard include:


● Color Space Conversion: The input image is typically converted from the RGB color
space to the YCbCr color space, which separates the luminance (Y) and chrominance
(Cb and Cr) components. This color space transformation exploits the fact that the
human visual system is more sensitive to changes in brightness (luminance) than in
color (chrominance).
● Discrete Cosine Transform (DCT): The image is divided into blocks, typically 8x8
pixels, and a two-dimensional DCT is applied to each block. The DCT transforms the
spatial image data into frequency components, separating the low-frequency and
high-frequency information.
● Quantization: The DCT coefficients are quantized, reducing the precision of the
frequency components. This quantization step introduces loss of information, resulting in
a lossy compression scheme. The quantization parameters can be adjusted to control
the trade-off between compression ratio and image quality.
● Entropy Encoding: The quantized DCT coefficients are further compressed using
entropy encoding techniques such as Huffman coding. Huffman coding assigns shorter
codes to more frequently occurring coefficients, reducing the overall bit rate required for
encoding.
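
A simplified sketch of the block DCT and quantization steps using OpenCV; the quantization matrix
Q below is a made-up illustration rather than the standard JPEG table, and "input.png" is a
placeholder:

import cv2
import numpy as np

# Illustrative quantization matrix: larger steps for higher-frequency coefficients.
Q = np.full((8, 8), 16.0)
Q += np.arange(8) * 4 + np.arange(8).reshape(8, 1) * 4

gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
h, w = gray.shape[0] // 8 * 8, gray.shape[1] // 8 * 8     # crop to 8x8 multiples
gray = gray[:h, :w] - 128                                  # level shift as in JPEG

quantized = np.zeros_like(gray)
for r in range(0, h, 8):
    for c in range(0, w, 8):
        block = gray[r:r + 8, c:c + 8].copy()
        coeffs = cv2.dct(block)                            # 2-D DCT of the 8x8 block
        quantized[r:r + 8, c:c + 8] = np.round(coeffs / Q) # lossy quantization step

# The quantized coefficients are what gets entropy coded; a decoder multiplies
# them back by Q and applies cv2.idct on each block.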
MPEG Compression Standard:
The MPEG compression standard is designed for compressing digital video sequences. It
provides both lossy and lossless compression methods, enabling efficient video storage and
transmission.

The main components of the MPEG compression standard include:


● Intra-frame and Inter-frame Compression: MPEG uses a combination of intra-frame
and inter-frame compression techniques. Intra-frame compression compresses
individual frames using similar techniques as JPEG, treating each frame as a separate
image. Inter-frame compression exploits temporal redundancy by encoding the
difference between consecutive frames, known as motion compensation.
● Motion Compensation: In inter-frame compression, motion compensation is used to
estimate and encode the motion vectors between frames. By predicting the motion
between frames, only the differences (residuals) need to be encoded, resulting in
efficient compression.
● Discrete Cosine Transform (DCT): Similar to JPEG, MPEG applies DCT to the blocks
within frames or the residuals to transform the spatial information into frequency
components.
● Quantization and Entropy Encoding: The DCT coefficients or residuals are quantized
and entropy encoded using techniques such as Huffman coding or arithmetic coding.
● Bitrate Control: MPEG provides various profiles and levels that define different
compression capabilities and target applications. Bitrate control mechanisms help
achieve a desired compression level while maintaining a specified bitrate for video
streaming or storage.

The MPEG family includes several compression standards such as MPEG-1, MPEG-2, and
MPEG-4, each offering a different level of compression efficiency and targeting different video
applications, from low-quality video streaming to high-definition video storage. (MPEG-7, by
contrast, is a standard for describing multimedia content rather than for compressing it.)

Video compression
Video compression is the process of reducing the size of video data while maintaining an
acceptable level of visual quality. It involves applying various techniques to exploit spatial and
temporal redundancies in video sequences, resulting in efficient storage, transmission, and
streaming of videos.

Video compression techniques aim to achieve a balance between compression efficiency and
visual quality. The choice of compression algorithm, parameters, and settings depends on the
specific requirements, such as target bitrate, resolution, desired quality, and available resources.

Common video compression standards include MPEG-2, MPEG-4, H.264/AVC, H.265/HEVC
(High-Efficiency Video Coding), and VP9. These standards have been widely adopted in various
applications, including video streaming platforms, video conferencing systems, digital television,
and video storage devices.
Object Recognition
Object recognition refers to the process of identifying and classifying objects within digital
images or video frames. It is a fundamental task in computer vision, a field of study that focuses
on enabling computers to understand and interpret visual information. Object recognition
algorithms analyze visual data and extract meaningful features to make sense of the objects
present in the scene.

Object recognition can be approached using different techniques, ranging from traditional
computer vision methods to more advanced deep learning-based approaches. Traditional
methods often rely on handcrafted features and classifiers, while deep learning methods
leverage the power of neural networks to automatically learn discriminative features and
classifiers from large amounts of labeled data.

Computer Vision
Computer vision is a field of study and research that focuses on enabling computers to gain a
high-level understanding of visual information from digital images or video. It involves
developing algorithms and techniques that allow computers to analyze, interpret, and make
sense of visual data, mimicking human visual perception and understanding.

The main goals of computer vision include:


● Image and Video Understanding: Computer vision aims to enable machines to
understand and interpret the content of images and videos. This includes tasks such as
object detection and recognition, scene understanding, image segmentation, tracking,
and motion analysis.
● Feature Extraction and Representation: Computer vision algorithms extract
meaningful features from images or videos that can capture relevant information for
further analysis. These features can be low-level visual cues like edges, colors, and
textures, or higher-level semantic features that represent objects, shapes, or structures.
● Object Detection and Recognition: Computer vision algorithms can detect and
recognize objects within images or videos. This involves identifying specific objects or
classes of objects and localizing their positions or regions of interest. Object recognition
can be performed using machine learning techniques, such as support vector machines
(SVMs), convolutional neural networks (CNNs), or deep learning architectures.
● Scene and Context Understanding: Computer vision algorithms aim
to comprehend the overall scene context, including the relationships between objects,
spatial layout, and semantic understanding of the scene. This involves tasks such as
scene classification, scene segmentation, and understanding the interactions between
objects within the scene.
Computer vision algorithms utilize a range of techniques, including image processing, pattern
recognition, machine learning, deep learning, and probabilistic models. These algorithms
leverage mathematical and statistical methods to analyze visual data and extract meaningful
information.
Object recognition techniques
Object recognition techniques are methods used in computer vision to identify and classify
objects within images or video frames. These techniques aim to mimic human visual perception
and enable machines to understand and interpret visual information. Here are some commonly
used object recognition techniques:

● Template Matching:
○ Template matching compares a predefined template image with sub-regions of
the input image to find matching patterns. It involves calculating the similarity
between the template and image patches using metrics like correlation or sum of
squared differences. Template matching is straightforward but can be sensitive to
variations in scale, rotation, and lighting conditions (a minimal OpenCV sketch appears
after this list).
● Feature-Based Methods:
○ Feature-based methods extract distinctive features from images and use them to
recognize objects. Examples of feature descriptors include Scale-Invariant
Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Oriented
FAST and Rotated BRIEF (ORB). These methods detect keypoints in images and
compute descriptors that represent the local visual characteristics of the
keypoints. Object recognition is then performed by matching and comparing
these features across images.
● Deep Learning:
○ Deep learning, particularly convolutional neural networks (CNNs), has
revolutionized object recognition. CNNs are capable of automatically learning
hierarchical features from raw image data. Training involves feeding labeled
images to the network, and it learns to recognize objects by adjusting the weights
of its layers. Deep learning-based object recognition models, such as YOLO (You
Only Look Once), Faster R-CNN (Region-based Convolutional Neural Networks),
and SSD (Single Shot MultiBox Detector), have achieved impressive results in
terms of accuracy and real-time performance.
● Histogram-based Methods:
○ Histogram-based methods utilize color and texture information to recognize
objects. These methods analyze the distribution of color or texture features in
images and use statistical measures to compare and classify objects. Examples
include color histograms, local binary patterns (LBPs), and histogram of oriented
gradients (HOG). Histogram-based methods are effective for simple object
recognition tasks but may struggle with complex scenes or object variations.
● Ensemble Techniques:
○ Ensemble techniques combine multiple object recognition models or classifiers to
improve overall performance. This can involve techniques such as ensemble
averaging, boosting, or bagging. By combining the predictions of multiple models,
ensemble techniques can enhance robustness, accuracy, and generalization of
object recognition systems.
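
As referenced in the template-matching item above, the sketch below uses OpenCV's
matchTemplate with normalized cross-correlation on a synthetic image, so it runs without
external data; the image sizes and the chosen metric are illustrative.

```python
import cv2
import numpy as np

# Build a synthetic grayscale scene and cut a patch out of it as the template,
# so the example runs without any external image files.
rng = np.random.default_rng(42)
scene = rng.integers(0, 256, size=(240, 320), dtype=np.uint8)
template = scene[100:140, 150:200].copy()        # 40x50 patch taken at (row=100, col=150)

# Slide the template over the scene and score every position with
# normalized cross-correlation.
scores = cv2.matchTemplate(scene, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(scores)    # max_loc is given as (x, y)

print("best match at (x, y):", max_loc, "score:", round(float(max_val), 3))
# Expected: (150, 100) with a score close to 1.0, since the template was cut from the scene.
```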
Introduction
The principal goal of restoration techniques is to improve an image in some predefined sense.

Although there are areas of overlap, image enhancement is largely a subjective process, while
restoration is for the most part an objective process

Restoration attempts to recover an image that has been degraded by using a priori knowledge
of the degradation phenomenon. Thus, restoration techniques are oriented toward modeling the
degradation and applying the inverse process in order to recover the original image.

The restoration approach usually involves formulating a criterion of goodness that yields an
optimal estimate of the desired result, whereas enhancement techniques are heuristic
procedures that manipulate an image to exploit the characteristics of the human visual system.

Some restoration techniques are best formulated in the spatial domain, while others are better
suited for the frequency domain.

Image degradation is the loss of image quality caused by factors such as noise, blur, and other
distortions introduced during acquisition, transmission, or storage; a degraded image appears
noisy, blurred, or otherwise diminished in quality.

Image restoration is the process of recovering or improving the quality of such an image by
modeling the degradation and applying appropriate algorithms to reverse its effects.

Model of the Image Degradation/Restoration


The model of image degradation/restoration refers to the process of simulating the degradation
of an image and subsequently restoring it to its original or enhanced form using various
techniques and algorithms. This model helps us understand the factors that contribute to image
degradation and enables us to develop methods for image restoration.
The model typically consists of two main components: the degradation model and the
restoration model.

1. Degradation function/model:
The degradation function/model represents the processes that cause image
degradation. It simulates various factors that can affect an image, such as noise,
blurring, compression artifacts, and other distortions. These factors can occur during
image acquisition, transmission, or storage. In the spatial domain the model is commonly
written as g(x, y) = h(x, y) * f(x, y) + n(x, y), where f is the original image, h the
degradation function, * denotes convolution, and n is additive noise. The degradation
model aims to replicate these effects to create degraded versions of the original image.

Some common degradation models include:


● Additive noise: Simulating noise by adding random pixel values to the image.
● Gaussian blur: Applying a blurring filter to mimic the effect of motion or lens blur.
● Compression artifacts: Simulating the lossy compression effects by reducing
the bit rate or introducing compression algorithms.
2. Restoration function/model:
The restoration function/model is designed to reverse or minimize the effects of
degradation on the degraded image. It aims to recover the lost or corrupted information
and enhance the image quality. Restoration techniques can vary depending on the
specific degradation types and desired image quality.

Some common restoration methods include:


● Denoising: Removing or reducing the noise in the image using filters or
statistical algorithms.
● Deblurring: Recovering the sharpness and details lost due to blurring by
estimating the blur kernel and applying deconvolution techniques.
● Super-resolution: Increasing the resolution and level of detail in a low-resolution
image by utilizing information from multiple images or employing sophisticated
algorithms.
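
A minimal sketch of the degradation side of this model, assuming Gaussian blur as the
degradation function h and additive Gaussian noise as n; the function name and parameter
values are illustrative, not part of any standard API.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(image, blur_sigma=2.0, noise_sigma=10.0, seed=0):
    """Simulate g = h * f + n: Gaussian blur (h) followed by additive Gaussian noise (n)."""
    rng = np.random.default_rng(seed)
    blurred = gaussian_filter(image.astype(np.float64), sigma=blur_sigma)
    noisy = blurred + rng.normal(0.0, noise_sigma, size=image.shape)
    return np.clip(noisy, 0, 255)

# Synthetic test image: a bright square on a dark background.
f = np.zeros((64, 64))
f[20:44, 20:44] = 200.0
g = degrade(f)
print("original range:", f.min(), f.max(),
      "| degraded range:", round(float(g.min()), 1), round(float(g.max()), 1))
```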

Noise Models
The principle sources of noise in digital image are due to image acquisition and
transmission.
● During image acquisition, the performance of image sensors gets affected by a variety of
factors such as environmental conditions and the quality of sensing elements.
● During image transmission, the images are corrupted due to the interference introduced
in the channel used for transmission.
The Noise components are considered as random variables, characterized by a probability
density function.
Gaussian Noise
Because of its mathematical simplicity, the Gaussian noise model is often used in practice, even
in situations where it is only marginally applicable.

Gaussian noise arises in an image due to factors such as electronic circuit noise and sensor
noise due to poor illumination or high temperature

The Gaussian distribution is one of the most widely used probability density functions (PDFs) to
model noise. It is characterized by a bell-shaped curve and is often used to represent additive
white Gaussian noise (AWGN).

The PDF of a Gaussian distribution is defined by its mean and standard deviation.
Gaussian noise is symmetric about its mean, and white Gaussian noise has a flat power
spectral density.

Rayleigh Noise
Rayleigh noise is usually used to characterize noise phenomena in range imaging.

The Rayleigh distribution is commonly used to model noise in radar and ultrasound imaging.
It is characterized by a non-negative skewness and a right-skewed shape

The PDF of a Rayleigh distribution is defined by its scale parameter, which determines the
spread of the distribution.

Erlang (gamma) Noise


The gamma distribution is a versatile distribution used to model various types of noise. It can
represent both additive and multiplicative noise in images.
The PDF of a gamma distribution depends on two parameters: the shape parameter and the
scale parameter.
The gamma distribution can approximate various noise types, including exponential, Erlang, and
chi-squared distributions.

Exponential Noise
The exponential distribution is closely related to Poisson counting noise, which arises from
photon counting in low-light imaging scenarios: when the number of events in a fixed interval
follows a Poisson process, the time between events is exponentially distributed. The PDF of an
exponential distribution is characterized by its rate parameter, which determines the decay rate
of the distribution.
Uniform Noise
The uniform distribution represents noise that is equally likely to occur within a specified range.
In image processing, uniform noise is often used to model quantization noise, which occurs
when continuous values are discretized into a limited number of levels. The PDF of a uniform
distribution is a constant within a specified range and zero outside that range.

Uniform noise rarely occurs in practice, but it is often used in numerical simulations to analyze
systems.
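
The noise models above can be explored numerically by drawing samples from the
corresponding probability distributions. The NumPy sketch below uses arbitrary, illustrative
parameter values; in practice the parameters would be estimated from flat regions of a real
image.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # number of noise samples per model

noise = {
    "gaussian":    rng.normal(loc=0.0, scale=15.0, size=n),
    "rayleigh":    rng.rayleigh(scale=15.0, size=n),
    "erlang":      rng.gamma(shape=2.0, scale=10.0, size=n),  # gamma with integer shape = Erlang
    "exponential": rng.exponential(scale=15.0, size=n),
    "uniform":     rng.uniform(low=-20.0, high=20.0, size=n),
}

for name, samples in noise.items():
    print(f"{name:>11}: mean = {samples.mean():7.2f}, std = {samples.std():6.2f}")
```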

Classification of image restoration techniques


Image restoration techniques can be broadly classified as below:
● Blind deconvolution
● Lucy-Richardson filtering
● Wiener filtering
● Inverse (reverse) filtering

Blind Deconvolution
Blind deconvolution is a challenging image restoration technique that aims to estimate both the
unknown blur kernel and the original image from a single degraded image. It is referred to as
"blind" because it does not assume prior knowledge about the blur kernel or the true image,
making it more challenging than non-blind deconvolution methods.

Blind deconvolution is widely used in applications such as astronomical imaging, microscopy,
and forensic image analysis.

The goal of blind deconvolution is to recover the original image that has been convolved with an
unknown blur kernel and corrupted by noise. The blur kernel represents the blurring effect
applied to the original image, which could be caused by factors such as defocus, motion blur, or
optical aberrations. The blur kernel defines how each pixel in the original image contributes to
the neighboring pixels in the degraded image. By estimating the blur kernel and applying its
inverse, the original image can be recovered.

Blind deconvolution is a highly ill-posed problem, meaning that multiple solutions can potentially
match the observed degraded image. Challenges in blind deconvolution include dealing with
noise amplification, handling complex and spatially varying blur kernels, and avoiding overfitting
or underfitting of the estimated blur kernel.
To enhance the performance of blind deconvolution, additional information can be incorporated
into the process, such as multiple degraded images with different blurs, multiple channels of the
same scene, or constraints based on the scene content or prior knowledge about the blur type.

The blind deconvolution process typically involves the following steps:


● Initialization:
○ Blind deconvolution algorithms usually start by initializing the blur kernel and the
restored image. Common initialization methods include assuming a known or
simple blur model, such as a Gaussian blur or a uniform blur, or using a default
kernel.
● Iterative Estimation:
○ The blind deconvolution algorithm iteratively estimates the blur kernel and the
original image. This is done by minimizing an objective function that represents
the difference between the observed degraded image and the image predicted by
convolving the current estimate of the original image with the current estimate of
the blur kernel.
● Alternating Optimization:
○ In each iteration, the algorithm alternates between estimating the blur kernel and
estimating the original image. The blur kernel estimation involves updating the
kernel based on its relationship with the current estimate of the original image.
● Regularization:
○ Blind deconvolution algorithms often incorporate regularization techniques to
improve the stability and robustness of the estimation process. Regularization
helps to constrain the solution space by introducing prior knowledge or
assumptions about the blur kernel and the original image. Common regularization
methods include total variation (TV) regularization, sparse priors, or smoothness
assumptions.
● Stopping Criteria:
○ The iterative estimation process continues until a stopping criterion is met. This
criterion can be based on the convergence of the objective function or reaching a
predefined number of iterations.

Lucy-Richardson Filtering
The Lucy-Richardson algorithm, also known as iterative deconvolution, is an iterative image
restoration technique used to recover images that have undergone blurring or convolution.

Lucy-Richardson filtering is widely used in applications such as astronomy, microscopy, and
remote sensing.

The Lucy-Richardson algorithm assumes a known point spread function (PSF) or blur kernel,
which represents the blurring effect applied to the original image. The algorithm aims to
iteratively estimate the original image by alternating between forward and backward filtering
operations.
The iterative nature of the Lucy-Richardson algorithm allows it to refine the estimate of the
original image gradually. It leverages the known PSF to deblur the image iteratively, attempting
to recover fine details and sharpness.

The performance of the Lucy-Richardson algorithm depends on factors such as the accuracy of
the known PSF, the number of iterations, and the presence of noise in the observed degraded
image. It is a relatively simple and computationally efficient method, but it can be sensitive to
noise and model mismatches.

The steps involved in the Lucy-Richardson filtering algorithm are as follows:


1. Initialization:
The algorithm starts by initializing the restored image, often with a simple
estimate such as a uniformly gray image or a copy of the observed degraded
image.
2. Forward Filtering:
In the forward filtering step, the current estimate of the original image is
convolved with the known PSF. This operation simulates the blurring effect on the
image, producing a blurred estimate.
3. Ratio Calculation:
The ratio is calculated by dividing the observed degraded image by the blurred
estimate obtained in the forward filtering step. This ratio represents the relative
contribution of each pixel in the observed image to the estimated pixel intensity.
4. Backward Filtering:
In the backward filtering step, the ratio obtained in the previous step is convolved
with the transpose (or adjoint) of the PSF. This operation applies a reverse
blurring effect to the ratio, spreading the information from each pixel to its
neighboring pixels.
5. Update:
The current estimate of the original image is updated by multiplying it with the
result of the backward filtering operation. This step enhances the estimated
image by incorporating the information propagated from the ratio through the
backward filtering.
6. Iteration:
Steps 2 to 5 are repeated for a predefined number of iterations or until a
convergence criterion is met. The convergence criterion can be based on the
change in the estimated image between iterations or the overall improvement in
the restoration quality.
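
The steps above translate almost directly into code. Below is a minimal NumPy/SciPy sketch of
the Lucy-Richardson iteration, assuming a known PSF and a synthetic test image; the
mean-image initialization and the iteration count are illustrative choices.

```python
import numpy as np
from scipy.signal import fftconvolve

def richardson_lucy(degraded, psf, num_iters=50, eps=1e-12):
    """Minimal Richardson-Lucy deconvolution following the steps above:
    forward filtering, ratio, backward filtering with the flipped PSF, update."""
    estimate = np.full(degraded.shape, degraded.mean(), dtype=np.float64)  # step 1: initialization
    psf_mirror = psf[::-1, ::-1]                                           # adjoint (flipped) PSF
    for _ in range(num_iters):
        blurred = fftconvolve(estimate, psf, mode="same")                  # step 2: forward filtering
        ratio = degraded / (blurred + eps)                                 # step 3: ratio
        correction = fftconvolve(ratio, psf_mirror, mode="same")           # step 4: backward filtering
        estimate = estimate * correction                                   # step 5: update
    return estimate                                                        # step 6: stop after num_iters

# Synthetic example: blur a simple image with a known 5x5 box PSF, then restore it.
f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0
psf = np.ones((5, 5)) / 25.0
g = fftconvolve(f, psf, mode="same")
restored = richardson_lucy(g, psf)
print("max abs error after restoration:", round(float(np.abs(restored - f).max()), 3))
```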

Wiener filtering
Wiener filtering, also known as the Wiener deconvolution, is a widely used image restoration
technique that aims to restore degraded images by minimizing the mean square error between
the original image and the restored image.
It is particularly effective when the degradation process and the statistical properties of the noise
are known or can be estimated accurately.

The Wiener filter operates in the frequency domain and utilizes a statistical approach to restore
the image. The filter is designed based on the power spectral densities (PSDs) of the original
image and the degradation process.

The key idea is to find a filter that minimizes the expected mean square error between the
estimated image and the true image

The Wiener filter optimally balances noise reduction and preservation of image details by
minimizing the mean square error. It exploits the statistical properties of the degradation process
and the image to achieve restoration. However, it assumes stationarity of the signal and noise
properties, which may not hold in practice.

Wiener filter depends on the accuracy of the estimated PSDs, the assumptions made about the
noise statistics, and the degradation model. If these assumptions are incorrect or inaccurate, the
Wiener filter may produce suboptimal results

The steps involved in the Wiener filtering algorithm are as follows:


● Estimation of the Power Spectral Densities (PSDs):
○ The first step is to estimate the power spectral densities of the original image and
the degradation process. The PSD represents the frequency characteristics of
the image and the degradation. If the original image is not available, an estimate
can be obtained using statistical techniques or by analyzing a set of similar
images.
● Calculation of the Wiener Filter Transfer Function:
○ The Wiener filter transfer function W(u, v) is computed from the transfer function of
the degradation H(u, v) and the noise-to-signal power ratio:
W(u, v) = H*(u, v) / ( |H(u, v)|^2 + Sn(u, v)/Sf(u, v) ),
where H* is the complex conjugate of H, and Sn and Sf are the power spectra of the
noise and of the original image. The transfer function determines the frequency
response of the filter and governs the restoration process.
● Restoration in the Frequency Domain:
○ The degraded image is transformed into the frequency domain using a Fourier
transform. The Fourier transform of the degraded image is multiplied by the
Wiener filter transfer function H(w) to obtain the frequency representation of the
restored image.
● Inverse Transformation:
○ The frequency domain representation of the restored image is transformed back
into the spatial domain using an inverse Fourier transform, resulting in the
restored image.

To enhance the performance of the Wiener filter, additional considerations can be taken into
account, such as regularization.
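
A minimal frequency-domain sketch of Wiener deconvolution, assuming a known blur PSF and
approximating the noise-to-signal ratio Sn/Sf by a constant nsr; this is a simplified illustration,
not a full implementation of the method.

```python
import numpy as np

def wiener_deconvolve(degraded, psf, nsr=0.01):
    """Frequency-domain Wiener deconvolution with a constant noise-to-signal ratio:
    F_hat(u, v) = conj(H) / (|H|^2 + NSR) * G."""
    H = np.fft.fft2(psf, s=degraded.shape)        # transfer function of the blur
    G = np.fft.fft2(degraded)                     # spectrum of the degraded image
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)       # Wiener transfer function
    return np.real(np.fft.ifft2(W * G))

rng = np.random.default_rng(0)
f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0
psf = np.ones((5, 5)) / 25.0

# Degrade with circular blur (consistent with the FFT model) plus Gaussian noise.
g = np.real(np.fft.ifft2(np.fft.fft2(f) * np.fft.fft2(psf, s=f.shape)))
g = g + rng.normal(0.0, 0.01, size=f.shape)

restored = wiener_deconvolve(g, psf, nsr=0.01)
print("mean abs error:", round(float(np.mean(np.abs(restored - f))), 4))
```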
Medical Image Processing
Medical image processing refers to the application of various computational techniques and
algorithms to analyze and interpret medical images for diagnostic, therapeutic, and research
purposes. It involves the acquisition, enhancement, segmentation, and analysis of images
obtained from various medical imaging modalities such as X-rays, computed tomography (CT),
magnetic resonance imaging (MRI), ultrasound, and positron emission tomography (PET),
among others.

Irrespective of the methods used, Medical image processing involves the following steps:
● Image Enhancement
● Image Segmentation
● Image Quantification
● Image Registration
● Visualization

Image Enhancement
● Medical images are very often corrupted by noise which occurs due to various sources
of interference. This noise affects the process of measurements of various factors that
could lead to serious change in diagnosis and treatment.
● Medical images also suffer from low contrast. Medical Image Enhancement aims at
resolving problems of low contrast and high-level noise in accurate diagnosis of
particular disease.
● In all such cases improvement in the visual quality of images helps to correctly interpret
the condition of the patient.
● Histogram equalization is often used to correct low-contrast problems, while power-law
(gamma) transformation is used to correct non-uniform illumination.
● High-frequency noise is reduced using carefully designed low-pass filters, which can be
designed in either the spatial or the frequency domain. MRI images often suffer from
noise and can be improved using median filters (a short sketch of histogram equalization
and median filtering follows this list).

● Image enhancement techniques are applied to improve the quality, clarity, and visual
appearance of medical images. The goal is to highlight important structures, reduce
noise, enhance contrast, and improve overall image interpretability. Common
enhancement techniques include filtering (such as noise reduction filters or
edge-enhancing filters), histogram equalization, contrast stretching, and image
sharpening.
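
A short OpenCV sketch of two of the enhancement steps mentioned above, histogram
equalization and median filtering, applied to a synthetic low-contrast image; the synthetic data
and filter size are illustrative.

```python
import cv2
import numpy as np

# Synthetic low-contrast, noisy "scan" so the example runs without real data.
rng = np.random.default_rng(0)
img = np.clip(rng.normal(120, 10, size=(256, 256)), 0, 255).astype(np.uint8)

equalized = cv2.equalizeHist(img)          # spread the narrow histogram to improve contrast
denoised = cv2.medianBlur(equalized, 3)    # 3x3 median filter suppresses impulsive noise

print("input intensity range:", int(img.min()), "-", int(img.max()),
      "| after equalization:", int(equalized.min()), "-", int(equalized.max()))
```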

Image Segmentation
● Image segmentation basically partitions an image into various regions. Medical image
segmentation involves the extraction of regions of interest (ROI) from a medical image,
allowing more precise analysis of the data by isolating only those regions that are
necessary for diagnosis.
● Image segmentation removes unwanted parts from a medical image, allowing different
tissues such as bone and soft tissue to be isolated. Segmentation also requires
classification of pixels and hence is treated as a pattern recognition problem.
● The most common approaches to segmentation are edge-based and region-based
segmentation. Thresholding is the easiest and most common technique used in
segmentation.
● Thresholding can use a global threshold, where a single threshold value separates the
important objects within an image, or local thresholding, where the image is split into
sub-images and a threshold is calculated for each sub-image region (a minimal
thresholding sketch follows this section).

● Image segmentation is the process of dividing an image into distinct regions or objects
based on their characteristics. It helps in isolating structures or areas of interest from the
background or other surrounding tissues. Segmentation can be performed using various
algorithms, such as thresholding, region growing, active contours (or snakes), clustering,
or machine learning-based approaches. Segmentation is crucial for tasks like organ
delineation, tumor detection, or measurement of specific structures.
● (refer segmentation from unit 3)
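
As referenced above, the sketch below contrasts global, Otsu, and local (adaptive) thresholding
using OpenCV on a synthetic image; the threshold value, block size, and offset are illustrative
choices.

```python
import cv2
import numpy as np

# Synthetic image: a brighter "tissue" region on a darker background, plus noise.
rng = np.random.default_rng(0)
img = np.full((128, 128), 60, dtype=np.float64)
img[40:90, 40:90] = 160
img = np.clip(img + rng.normal(0, 15, img.shape), 0, 255).astype(np.uint8)

# Global thresholding with a fixed value, and Otsu's method, which chooses the
# threshold automatically from the histogram.
_, mask_fixed = cv2.threshold(img, 110, 255, cv2.THRESH_BINARY)
otsu_t, mask_otsu = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Local (adaptive) thresholding: a threshold is computed for each pixel's neighborhood
# (block size 31 and offset -5 are illustrative choices).
mask_local = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 31, -5)

print("Otsu threshold:", otsu_t,
      "| foreground pixels (Otsu):", int(np.count_nonzero(mask_otsu)))
```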

Image Quantification
● Image quantification involves extracting numerical or quantitative measurements from
medical images.
● It aims to derive meaningful and objective information from the image data.
Quantification techniques may involve measuring properties like size, shape, intensity,
texture, or other relevant features of structures or regions of interest.
● Medical image analysis requires fast, precise, and repeatable measurements. These
quantitative measurements help address many aspects of the image data, such as
tissue texture, size, and density.
● These measurements can assist in diagnosing and monitoring diseases, assessing
treatment responses, or comparing different patient populations. Various algorithms and
methodologies are used for image quantification, including statistical analysis, pattern
recognition, or machine learning algorithms.
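
A minimal sketch of image quantification: given an intensity image and a binary segmentation
mask (both synthetic here), a few simple measurements are computed; the pixel spacing and
the choice of statistics are illustrative assumptions.

```python
import numpy as np

# Illustrative data: `img` stands in for a grayscale scan and `mask` for a binary
# segmentation of a region of interest (e.g., from the thresholding sketch above).
rng = np.random.default_rng(0)
img = rng.normal(100, 20, size=(128, 128))
mask = np.zeros((128, 128), dtype=bool)
mask[40:90, 40:90] = True

pixel_spacing_mm = 0.5                           # assumed physical pixel size
area_mm2 = mask.sum() * pixel_spacing_mm ** 2    # size of the region
mean_intensity = img[mask].mean()                # average intensity inside the region
std_intensity = img[mask].std()                  # crude texture / heterogeneity measure

print(f"area: {area_mm2:.1f} mm^2, mean intensity: {mean_intensity:.1f}, std: {std_intensity:.1f}")
```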

Image Registration
● Image registration is the process of aligning two or more images of the same scene. This
is required for images obtained from CT and MRI scans, since images from these
modalities are stacked one over the other to give 3D structures of the organs being
imaged.
● The process of registration involves designating one image as the reference and
applying geometric transformation to the other image so that they align with the
reference. Image registration is a prerequisite for all imaging applications that compare
datasets across subjects.
● Image registration is the process of aligning or matching two or more medical images
acquired from different modalities, time points, or perspectives. It is essential for
combining information from multiple images, tracking changes over time, or creating
image overlays for visualization or surgical planning. Registration algorithms aim to find
the spatial transformation that brings images into alignment by accounting for differences
in scale, rotation, translation, or deformation. Registration techniques can be rigid (for
rigid body alignment) or non-rigid (for accounting for deformations).
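
A minimal sketch of translation-only (rigid) registration using phase correlation with NumPy's
FFT; real medical registration usually also handles rotation, scaling, and non-rigid deformation,
so this is only an illustration of the alignment idea.

```python
import numpy as np

def estimate_translation(ref, moving):
    """Estimate the (row, col) shift that re-aligns `moving` with `ref`
    using phase correlation (translation-only registration)."""
    cross_power = np.fft.fft2(ref) * np.conj(np.fft.fft2(moving))
    cross_power /= np.abs(cross_power) + 1e-12            # keep the phase information only
    correlation = np.real(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks beyond the midpoint correspond to negative shifts (circular wrap-around).
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, correlation.shape))

rng = np.random.default_rng(0)
ref = rng.normal(size=(128, 128))
moving = np.roll(ref, shift=(7, -3), axis=(0, 1))          # simulate a translated acquisition
shift = estimate_translation(ref, moving)
print("estimated shift:", shift)   # np.roll(moving, shift, axis=(0, 1)) re-aligns it with ref
```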

Visualization
● Visualization techniques are employed to present medical images and processed results
in an intuitive and informative manner.
● Visualization methods can range from simple 2D or 3D rendering of images to more
advanced techniques like volume rendering, surface rendering, or virtual reality-based
visualization.
● Visualization helps medical professionals better understand complex anatomical
structures, identify abnormalities, and assist in surgical planning or patient education. It
plays a vital role in conveying the information extracted from medical images effectively.

Satellite Image Processing


Satellite image processing is an important research field concerned with images of the Earth
and other celestial objects acquired by artificial satellites. It is a form of remote sensing that
works on pixel-level data to collect coherent information about the Earth's surface.

Remote Sensing Process


● Remote Sensing in general can be defined as acquiring information about an object
without being in direct contact with the object. Conventional Remote sensing can be
defined as a science of acquiring information about the Earth's surface without actually
being in contact with it.
● Remote sensing for satellite image processing refers to the use of satellite data to
extract information about the Earth's surface and atmosphere. It involves the acquisition,
preprocessing, analysis, and interpretation of satellite imagery for various applications.
Here is an overview of the remote sensing process for satellite image processing:
● This is done by sensing and recording reflected or emitted electromagnetic radiation
from the earth's surface. The sensors on board detect this emitted radiation from the
different areas of the earth's surface in different spectral regions. This energy is then
processed and analyzed to extract relevant information.
● Remote sensing is conducted from the space shuttle or, more commonly, from satellites.
Satellites are objects which revolve around the Earth.
● These satellites are placed in specific orbits relative to the earth in terms of altitude and
orientation. Because of their orbits, these satellites continue to revolve, permitting
repetitive coverage of the Earth's surface.
● Remote sensing typically uses electromagnetic radiation and involves an interaction
between incident radiation and the targets of interest. One of the common sources of
electromagnetic radiation is the Sun. The sun provides a very convenient source of
energy for remote sensing.
● In satellite image processing, passive and active remote sensing are two fundamental
techniques used to acquire information about the Earth's surface and atmosphere.
These techniques involve the use of satellites equipped with sensors to capture data
from the target area. However, they differ in how they interact with the target and
measure the reflected or emitted energy.

Passive Remote Sensing


Passive remote sensing relies on detecting natural energy emitted or reflected by the Earth's
surface and atmosphere. It measures the energy that is naturally present in the environment
without actively transmitting any signals. Passive sensors record the electromagnetic radiation
(EMR) coming from the target area. The most common source of EMR for passive remote
sensing is the Sun.

Passive sensors record the intensity and spectral characteristics of the EMR reflected or emitted
by different objects on the Earth's surface. The sensors capture the radiation across different
wavelengths, ranging from visible light to thermal infrared and even microwave regions. By
analyzing the patterns and properties of the captured EMR, scientists can gather valuable
information about land cover, vegetation, oceans, clouds, atmospheric conditions, and more.

Examples of passive remote sensing techniques include multispectral imaging, hyperspectral
imaging, and thermal imaging. Satellite systems like Landsat, MODIS, and Sentinel use passive
sensors to acquire data for various applications such as agriculture, urban planning, weather
forecasting, and environmental monitoring.

Active Remote Sensing


Active remote sensing involves the transmission of specific signals or pulses of energy from the
satellite towards the target area. The sensors in active remote sensing systems emit energy in
the form of microwaves, lasers, or radar signals and measure the reflected or backscattered
energy.

Active sensors measure the time it takes for the transmitted energy to return to the satellite,
allowing for the calculation of the distance between the satellite and the target. By analyzing the
properties of the returned energy, such as its intensity and phase, active remote sensing
provides valuable information about the shape, elevation, and surface properties of the target
area.

Some common active remote sensing techniques include radar imaging, lidar (light detection
and ranging), and synthetic aperture radar (SAR). These techniques are used for mapping
topography, monitoring ice cover, measuring vegetation height, detecting forest structure, and
studying geological features.
Unlike passive remote sensing, active remote sensing is not dependent on sunlight, making it
suitable for acquiring data in all weather and lighting conditions. However, active systems
require higher power consumption and sophisticated signal processing techniques.

Advantages:
● Wide Area Coverage: Remote sensing allows for data collection over large and
inaccessible areas.
● Temporal Coverage: Remote sensing provides information about changes and
dynamics over time.
● Multispectral and Multisensor Capability: Remote sensing systems capture data
across various wavelengths and use different sensors, enabling the analysis of multiple
spectral bands simultaneously.
● Cost-Effectiveness: Remote sensing can be a more economical option compared to
traditional ground-based surveys.
● Consistency and Standardization: Remote sensing follows standardized procedures,
ensuring consistent and repeatable measurements.
● Synoptic View: Remote sensing provides a comprehensive view of large-scale patterns
and features.
Limitations:
● Spatial and Spectral Resolution: Remote sensing systems have limitations in
capturing fine-scale details and complex spectral characteristics.
● Atmospheric Interference: The Earth's atmosphere can affect remote sensing data by
causing scattering, absorption, and reflection, leading to potential errors.
● Limited Penetration: Some remote sensing systems cannot penetrate through clouds,
vegetation, or dense canopies, limiting data collection in certain areas.
● Interpretation Complexity: Remote sensing data require advanced analysis techniques
and expertise for accurate interpretation.
● Lack of Ground Truth Validation: Ground truth data may be challenging to obtain,
leading to uncertainties in interpreting remote sensing data.
● Data Availability and Cost: High-quality remote sensing data may have limited
accessibility and be expensive to acquire and process

Photogrammetric Imaging Devices


Photogrammetric imaging devices are used to capture images of objects or scenes for the
purpose of extracting three-dimensional information. These devices can be used in a variety of
applications, including surveying, mapping, and 3D modeling.

There are two main types of photogrammetric imaging devices: cameras and scanners.
Cameras use lenses to project images of objects onto a light-sensitive surface, such as film or a
digital sensor. Scanners use a beam of light to scan objects and create a digital representation
of their surface.
Cameras are the most common type of photogrammetric imaging device. They can be used to
capture images from a variety of platforms, including ground, air, and space. Cameras can be
used to create a variety of photogrammetric products, including orthophotos, digital elevation
models, and 3D models.

Scanners are used to create more detailed representations of objects than cameras. They can
be used to capture images of objects that are too small or too large to be captured by a camera.
Scanners can also be used to capture images of objects that are in motion.

The choice of photogrammetric imaging device depends on the specific application. For
example, cameras are typically used for surveying and mapping, while scanners are typically
used for 3D modeling.

Photogrammetric imaging devices are used in satellite image processing to create
three-dimensional models of the Earth's surface. These models can be used for a variety of
purposes, including:
● Mapping: Photogrammetric models can be used to create detailed maps of the Earth's
surface. These maps can be used for a variety of purposes, such as navigation, land use
planning, and environmental monitoring.
● Change detection: Photogrammetric models can be used to track changes in the
Earth's surface over time. This information can be used to monitor natural disasters,
such as floods and earthquakes, and to track human-caused changes, such as
deforestation and urban growth.
● 3D modeling: Photogrammetric models can be used to create three-dimensional models
of objects or scenes. These models can be used for a variety of purposes, such as
virtual tourism, architectural design, and product visualization.

There are two main types of photogrammetric imaging devices used in satellite image
processing:
● Single-image photogrammetry: Single-image photogrammetry uses a single image to
create a three-dimensional model of the Earth's surface. This is done by using the
known geometry of the camera and the image to calculate the distance to each point in
the image.
● Multi-image photogrammetry: Multi-image photogrammetry uses multiple images to
create a three-dimensional model of the Earth's surface. This is done by using the
known geometry of the camera and the images to calculate the distance to each point in
the images.

Multi-image photogrammetry is more accurate than single-image photogrammetry, but it
requires more images. Single-image photogrammetry is less accurate, but it can be used to
create three-dimensional models from a single image.
Hyperspectral Sensing
Hyperspectral sensing is a technique used in satellite image processing that involves capturing
and analyzing a wide range of narrow, contiguous spectral bands to provide detailed information
about the Earth's surface and its properties.

In traditional remote sensing, sensors capture data in a few broad spectral bands, such as red,
green, and blue. In contrast, hyperspectral sensors measure the reflected or emitted energy
across hundreds of narrow and contiguous spectral bands, covering a much broader portion of
the electromagnetic spectrum. Each spectral band corresponds to a specific wavelength,
allowing for detailed analysis of the spectral signatures of different materials and features on the
Earth's surface.

The high spectral resolution of hyperspectral sensing enables the identification and
characterization of subtle variations in the reflectance or emission patterns of objects. This rich
spectral information can be used to discriminate between materials with similar visual
appearances but different spectral characteristics. It enables the detection and classification of
specific minerals, vegetation types, water bodies, pollution sources, and other features that
might be indistinguishable in lower spectral resolution images.

Hyperspectral data analysis involves several steps, including preprocessing, spectral signature
extraction, and classification. Preprocessing techniques correct for atmospheric effects, sensor
artifacts, and radiometric calibration to enhance the quality of the data. Spectral signature
extraction involves identifying unique spectral patterns associated with different materials or
land cover classes. Classification algorithms are then applied to categorize the image pixels into
different classes based on their spectral signatures, allowing for mapping and analysis of
specific features.
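
A common way to use the extracted spectral signatures for classification is the spectral angle
mapper (SAM), which compares each pixel's spectrum with a reference library spectrum. The
sketch below runs on synthetic data; the cube size, the reference signature, and the angle
threshold are illustrative assumptions.

```python
import numpy as np

def spectral_angle(pixel_spectrum, reference_spectrum):
    """Spectral angle (in radians) between a pixel spectrum and a reference
    library spectrum; smaller angles indicate more similar materials."""
    p = pixel_spectrum / (np.linalg.norm(pixel_spectrum) + 1e-12)
    r = reference_spectrum / (np.linalg.norm(reference_spectrum) + 1e-12)
    return float(np.arccos(np.clip(np.dot(p, r), -1.0, 1.0)))

rng = np.random.default_rng(0)
bands = 200                                    # hundreds of narrow, contiguous bands
cube = rng.random((64, 64, bands))             # synthetic hyperspectral cube (rows, cols, bands)
reference = rng.random(bands)                  # assumed library signature of one material

angles = np.apply_along_axis(spectral_angle, 2, cube, reference)
mask = angles < 0.5                            # illustrative angle threshold for classification
print("pixels classified as the reference material:", int(mask.sum()))
```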

Applications of hyperspectral sensing in satellite image processing include agriculture,
environmental monitoring, mineral exploration, urban planning, and disaster assessment. For
example, it can help monitor crop health, detect invasive species, identify mineral deposits, map
land cover changes, and assess water quality.

However, there are some challenges associated with hyperspectral sensing, such as the large
volume of data generated, the need for advanced data processing and analysis techniques, and
limitations in spatial resolution. Additionally, atmospheric effects and sensor noise can affect the
accuracy of hyperspectral data, requiring careful calibration and correction procedures.
