IP All Units
● Image segmentation plays a crucial role in various fields, including computer vision,
medical imaging, robotics, and autonomous driving. By segmenting an image, we can
extract specific objects or regions of interest, separate foreground from background,
identify boundaries, and extract meaningful features for subsequent analysis.
● In the typical image-analysis workflow, image segmentation is the first step.
● There are several commonly used techniques for image segmentation, including:
○ Thresholding: This technique assigns pixels to different segments based on a
predefined threshold value applied to a specific image attribute, such as
grayscale intensity or color channel values.
○ Edge-based segmentation: It relies on detecting and linking edges, which are
sharp transitions in intensity or color, to segment the image. Edge detection
algorithms, such as the Sobel operator or Canny edge detector, are commonly
employed.
○ Region-based segmentation: This technique groups pixels into regions based
on their similarity in terms of color, texture, or other features. Region growing and
region splitting/merging are popular approaches within this category.
○ Clustering: Clustering algorithms, such as k-means or Gaussian mixture models,
are used to group pixels into clusters based on their similarity in feature space.
Each cluster represents a segment in the image.
○ Watershed transform: Inspired by hydrology, this technique treats the image as
a topographic surface and simulates flooding to segment regions based on
catchment basins.
○ Graph-based segmentation: This method represents the image as a graph,
where pixels are nodes, and edges represent connections. Graph algorithms like
normalized cuts or minimum spanning trees are applied to partition the graph into
segments.
Thresholding is one of the segmentation techniques that generates a binary image (a binary
image is one whose pixels have only two values – 0 and 1 and thus requires only one bit to
store pixel intensity) from a given grayscale image by separating it into two regions based on a
threshold value. Hence pixels having intensity values greater than the said threshold will be
treated as white or 1 in the output image and the others will be black or 0.
So the output segmented image has only two classes of pixels – one having a value of 1 and
others having a value of 0.
If the threshold T is constant in processing over the entire image region, it is said to be global
thresholding. If T varies over the image region, we say it is variable thresholding.
Multiple-thresholding classifies the image into three regions. The histogram in such cases
shows three peaks and two valleys between them.
Global Thresholding
Global thresholding involves selecting a single threshold value that separates the image into
foreground and background regions. All pixels with values above the threshold are assigned to
one class (foreground), while those below the threshold are assigned to the other class
(background). Global thresholding assumes that the foreground and background have distinct
intensity distributions.
When the intensity distributions of objects and background are sufficiently distinct, it is possible to
use a single or global threshold applicable over the entire image.
The basic global thresholding algorithm iteratively finds the best threshold value for segmenting
the image. This algorithm works well for images that have a clear valley in their histogram. The larger the
value of the stopping tolerance δ (the change in T below which the iteration stops), the smaller the number of
iterations. The initial estimate of T can be made equal to the average pixel intensity of the entire image.
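A minimal NumPy sketch of this iterative scheme, assuming a grayscale image held in a NumPy array; delta is the stopping tolerance described above and the initial T is the mean intensity:

    import numpy as np

    def iterative_global_threshold(img, delta=0.5):
        """Iteratively estimate a global threshold T (basic global thresholding)."""
        T = img.mean()                              # initial estimate: average intensity
        while True:
            g1 = img[img > T]                       # pixels above the current threshold
            g2 = img[img <= T]                      # pixels at or below the threshold
            T_new = 0.5 * (g1.mean() + g2.mean())   # midpoint of the two group means
            if abs(T_new - T) < delta:              # stop when T changes by less than delta
                return T_new
            T = T_new

    # segmented = (img > iterative_global_threshold(img)).astype(np.uint8)  # binary output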
Variable/Local Thresholding
Variable thresholding is useful when the image contains variations in lighting or contrast across
different regions. Instead of using a single global threshold for the entire image, variable
thresholding applies different thresholds to different regions of the image. This is achieved by
dividing the image into smaller regions and calculating a threshold value for each region based
on local image statistics, such as mean or median intensity.
There are broadly two different approaches to local thresholding. One approach is to partition
the image into non-overlapping rectangles. Then the techniques of global thresholding or Otsu’s
method are applied to each of the sub-images. The methods of global thresholding are applied
to each sub-image rectangle by assuming that each such rectangle is a separate image in itself.
The other approach is to compute a variable threshold at each point from the properties of its
neighborhood pixels.
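For illustration, OpenCV's adaptive thresholding implements this second approach: the threshold at each pixel is taken from the mean of a local neighborhood minus an offset. The block size and offset below are illustrative values, and the input file name is hypothetical:

    import cv2

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
    binary = cv2.adaptiveThreshold(img, 255,
                                   cv2.ADAPTIVE_THRESH_MEAN_C,  # threshold = local mean - C
                                   cv2.THRESH_BINARY,
                                   11, 2)                        # 11x11 neighborhood, offset C = 2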
Otsu's Thresholding
Otsu's thresholding is an optimal thresholding method that automatically determines the
threshold value based on the image's histogram. It minimizes the intra-class variance,
effectively finding the threshold that maximizes the inter-class separability. Otsu's method is
particularly useful when the image contains multiple intensity peaks or when the foreground and
background intensities overlap.
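A minimal OpenCV sketch of Otsu's method (scikit-image's threshold_otsu would work equally well); the input file name is hypothetical:

    import cv2

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
    # Otsu picks the threshold that minimizes intra-class (maximizes inter-class) variance
    t, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    print("Otsu threshold:", t)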
Edge Types
Edges in a digital image can be classified into different types based on their characteristics. The
classification of edges is often done to provide a more detailed analysis of the image content.
Edge detection is used to find the location and presence of edges from changes in the
intensity of an image. Different operators are used in image processing to detect edges; they
respond to variations in gray level, but they also respond strongly to noise. In
image processing, edge detection is a very important task. Edge detection is the main tool in
pattern recognition, image segmentation and scene analysis.
An edge can be defined as a set of connected pixels that forms a boundary between two disjoint
regions. There are three types of edges:
- Horizontal edges
- Vertical edges
- Diagonal edges
Gradient Detection:
It works by computing the gradient of the image, which is a measure of how quickly the intensity
of the image changes at each point. Edges are found at points where the gradient is large.
The most common gradient-based operator is the Sobel operator. It consists of two separate
masks (one for horizontal gradients and one for vertical gradients) that are convolved with the
image. The Sobel operator computes the gradient magnitude by combining the horizontal and
vertical gradients and calculates the gradient direction as the arctangent of the vertical gradient
divided by the horizontal gradient.
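A minimal OpenCV/NumPy sketch of the Sobel operator as described above; the threshold of 100 and the file name are illustrative:

    import cv2
    import numpy as np

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
    gx = cv2.Sobel(img, cv2.CV_32F, 1, 0, ksize=3)    # horizontal derivative (responds to vertical edges)
    gy = cv2.Sobel(img, cv2.CV_32F, 0, 1, ksize=3)    # vertical derivative (responds to horizontal edges)
    magnitude = np.sqrt(gx**2 + gy**2)                # gradient strength at each pixel
    direction = np.arctan2(gy, gx)                    # gradient orientation (radians)
    edges = (magnitude > 100).astype(np.uint8) * 255  # illustrative threshold on the magnitude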
Gaussian-based Detection:
The Laplacian operator is typically approximated by a 3x3 kernel that is convolved with the image to compute the second
derivative. The Laplacian operator can be used to detect edges in both grayscale and color
images.
The Laplacian operator, also known as the Laplacian filter or the Laplacian of Gaussian (LoG)
operator, is a mathematical operator used in image processing for edge detection and image
enhancement. It calculates the second derivative of an image to identify areas of rapid intensity
changes, which often correspond to edges or boundaries between objects in the image.
When the Laplacian operator is convolved with an image, the resulting image highlights the
regions with significant intensity changes as positive or negative values. Positive values indicate
areas of rapid increase in intensity (dark-to-bright transitions), while negative values indicate
areas of rapid decrease in intensity (bright-to-dark transitions).
The Laplacian edge detection algorithm is relatively simple to implement and can be used to
detect edges in real time. However, it is not as robust to noise as some other edge detection
techniques.
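A minimal OpenCV sketch of Laplacian-of-Gaussian edge detection (Gaussian smoothing followed by the Laplacian); the kernel sizes and file name are illustrative:

    import cv2

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)      # hypothetical input file
    smoothed = cv2.GaussianBlur(img, (5, 5), 0)               # suppress noise before differentiation
    log = cv2.Laplacian(smoothed, cv2.CV_32F, ksize=3)        # second derivative; edges lie near zero crossings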
The Canny edge detection algorithm works by first smoothing the image with a Gaussian filter to reduce
noise. The intensity gradient is then computed (typically with Sobel masks), non-maximum suppression
is applied to thin the edges, and finally double thresholding with hysteresis is used to keep strong
edges and remove spurious ones.
The Canny edge detection algorithm is more computationally expensive than the Laplacian
edge detection algorithm, but it produces a better edge map.
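A minimal OpenCV sketch of Canny edge detection; the low/high hysteresis thresholds and file name are illustrative:

    import cv2

    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input file
    edges = cv2.Canny(img, 100, 200)                       # 100/200 = illustrative hysteresis thresholds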
1. Gradient-based Methods:
Gradient-based edge detection methods utilize the first derivative of the image intensity
to detect edges. The magnitude and direction of the gradient at each pixel indicate the
strength and orientation of the intensity change, respectively. The following steps outline
how derivatives are obtained in gradient-based methods:
- Smooth the image to suppress noise before differentiation.
- Convolve the image with horizontal and vertical derivative kernels (for example, the Sobel Gx and Gy masks) to obtain the partial derivatives.
- Combine the two derivatives into a gradient magnitude, sqrt(Gx^2 + Gy^2), and a gradient direction, arctan(Gy/Gx).
- Threshold the magnitude to keep only strong edges.
The resulting gradient magnitude and direction images provide information about the
strength and orientation of the detected edges, respectively.
2. Laplacian-based Methods:
Laplacian-based edge detection methods utilize the second derivative of the image
intensity to identify edges. The Laplacian operator highlights areas with rapid changes in
intensity. Here's how derivatives are obtained in Laplacian-based methods:
- Smooth the image with a Gaussian filter to suppress noise (combined with the next step, this gives the Laplacian of Gaussian, LoG).
- Convolve the smoothed image with a Laplacian kernel to obtain the second derivative.
- Mark edges at the zero crossings of the result.
Edge Linking
Edge linking, also known as edge contour tracing or edge connection, is a post-processing step
in edge detection algorithms that aims to connect the detected edge pixels to form continuous
contours or curves. The initial edge detection step often results in isolated edge pixels or
fragmented edge segments, and edge linking helps in reconstructing complete edges or
contours for further analysis or visualization.
The edge linking process involves examining the neighboring pixels of an edge pixel and
determining if they belong to the same edge. By iteratively traversing the edge pixels,
connectivity is established, and a continuous contour or curve is formed.
There are different algorithms and approaches for edge linking, but a commonly used method is
the 8-connectivity approach, where each edge pixel is connected to its eight neighboring pixels.
The steps involved in edge linking using the 8-connectivity approach are as follows:
● Initialization:
○ Start with an edge pixel from the detected edge map.
○ Mark the selected pixel as part of a contour or curve and add it to a contour list.
● Neighbor Examination:
○ Examine the neighboring pixels (including diagonals) of the current pixel to
determine if they belong to the same edge.
○ If a neighboring pixel is also an edge pixel and has not been visited before, mark
it as part of the contour and add it to the contour list.
○ Continue this process for all neighboring pixels.
● Contour Tracing:
○ Choose one of the neighboring pixels that belong to the contour as the new
current pixel.
○ Repeat the neighbor examination step for the new current pixel, adding
neighboring pixels to the contour if they meet the criteria.
○ Continue this process until there are no more unvisited neighboring pixels that
meet the criteria.
● Contour Completion:
○ Once all connected edge pixels have been traversed, the contour is complete.
○ Store the contour information for further processing or visualization.
○ Repeat the process for any remaining unvisited edge pixels to find additional
contours.
The result of the edge linking process is a set of connected contours or curves that represent
the continuous edges in the image. These contours can be further utilized for various
applications such as shape analysis, object recognition, or region-based segmentation.
It's worth noting that edge linking algorithms may incorporate additional techniques to handle
challenges such as gaps in edges, noise, or branching structures. Techniques like gap filling,
noise removal, or branch merging may be applied during the edge linking process to improve
the continuity and accuracy of the resulting contours.
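A minimal sketch of the 8-connectivity linking procedure described above, assuming a binary edge map stored as a NumPy array (1 = edge pixel, 0 = background):

    import numpy as np
    from collections import deque

    def link_edges(edge_map):
        """Group edge pixels into 8-connected contours (lists of (row, col) points)."""
        visited = np.zeros_like(edge_map, dtype=bool)
        neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
        contours = []
        rows, cols = edge_map.shape
        for r in range(rows):
            for c in range(cols):
                if edge_map[r, c] and not visited[r, c]:
                    contour, queue = [], deque([(r, c)])   # start a new contour at an unvisited edge pixel
                    visited[r, c] = True
                    while queue:                           # traverse all connected edge pixels
                        y, x = queue.popleft()
                        contour.append((y, x))
                        for dy, dx in neighbours:          # examine the 8 neighbours
                            ny, nx = y + dy, x + dx
                            if 0 <= ny < rows and 0 <= nx < cols \
                                    and edge_map[ny, nx] and not visited[ny, nx]:
                                visited[ny, nx] = True
                                queue.append((ny, nx))
                    contours.append(contour)
        return contours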
Hough Transform
The Hough Transform is a feature extraction approach in image processing and computer vision
used for detecting and extracting geometric shapes or patterns in images, especially lines and
curves. It was initially developed for line detection but has since been extended to detect other
shapes like circles or ellipses.
The Hough Transform operates on the principle of parameter space representation. Instead of
directly detecting shapes in the image space, it transforms the image space into a parameter
space, where the parameters represent the properties of the desired shapes. By accumulating
votes in the parameter space, the Hough Transform identifies the most likely parameter
combinations, corresponding to the shapes present in the image.
Its purpose is to find imperfect instances of objects within a certain class of shapes by a voting
procedure. This voting procedure is carried out in a parameter space, from which objects are
obtained as local maxima in an accumulator space that is explicitly constructed by the algorithm
for computing the Hough transform.
The classical Hough transform is most commonly used for the detection of regular curves such
as lines, circles, ellipses, etc. The main advantage of the Hough transform technique is that it is
tolerant of gaps in feature boundary descriptions and is relatively unaffected by image noise.
The Hough transform works by computing a global description of a feature(s) given local
measurements. For example, when detecting lines in an image, the Hough transform maps
points in the image to curves in the Hough parameter space. When viewed in the parameter
space, points that are collinear in the image produce curves that intersect at a common point, making the line easy to identify.
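A minimal NumPy sketch of the voting procedure for straight lines, using the normal parameterization rho = x*cos(theta) + y*sin(theta) and assuming a binary edge map as input; the angular resolution is illustrative:

    import numpy as np

    def hough_lines(edge_map, num_thetas=180):
        """Accumulate votes in (rho, theta) space for each edge pixel."""
        h, w = edge_map.shape
        diag = int(np.ceil(np.hypot(h, w)))                        # maximum possible rho
        thetas = np.deg2rad(np.arange(0, 180, 180 / num_thetas))
        accumulator = np.zeros((2 * diag + 1, num_thetas), dtype=np.int32)
        ys, xs = np.nonzero(edge_map)                              # coordinates of edge pixels
        for x, y in zip(xs, ys):
            for t_idx, theta in enumerate(thetas):
                rho = int(round(x * np.cos(theta) + y * np.sin(theta))) + diag  # shift so rho >= 0
                accumulator[rho, t_idx] += 1                       # cast a vote
        return accumulator, thetas

    # Peaks (local maxima) in the accumulator correspond to the most likely lines in the image.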
Watershed Transform
The Watershed Transform is a powerful image processing technique primarily used for object
segmentation. It is based on the concept of treating an image as a topographic map, where the
brightness of each point represents its height. The algorithm finds lines that run along the tops
of ridges. It is commonly used for segmenting images with complex or irregular regions.
The Watershed Transform is particularly useful for segmenting images with complex structures,
such as overlapping objects, touching boundaries, or irregular shapes. It is commonly used in
various applications, including medical image analysis, object detection, and image-based
measurements.
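A minimal sketch of marker-based watershed segmentation for separating touching objects, assuming scikit-image and SciPy are available and that the input is a binary foreground mask; the marker heuristic is illustrative:

    import numpy as np
    from scipy import ndimage
    from skimage.segmentation import watershed

    def watershed_segment(binary):
        """Separate touching objects in a binary foreground mask using marker-based watershed."""
        # distance to the background: object centres become "peaks" of the topographic surface
        distance = ndimage.distance_transform_edt(binary)
        # simple markers: label the regions that are clearly inside objects
        markers, _ = ndimage.label(distance > 0.5 * distance.max())
        # flood the negated distance map from the markers; watershed lines separate the catchment basins
        return watershed(-distance, markers, mask=binary)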
Clustering Techniques
In this type of segmentation, we try to cluster together pixels that are similar. There are two
approaches for performing segmentation by clustering:
- Agglomerative clustering (clustering by merging)
- Divisive clustering (clustering by splitting)
Clustering techniques in image segmentation are used to group similar pixels or regions
together based on their pixel values or other image features. These techniques aim to partition
an image into distinct regions or objects by identifying similarities and differences in the pixel
characteristics.
K-means clustering is a very popular clustering algorithm which is applied when we have an
unlabeled dataset. The goal is to find groups based on some kind of similarity
in the data, with the number of groups represented by K. This algorithm is generally used in
areas like market segmentation, customer segmentation, etc., but it can also be used to
segment different objects in an image on the basis of the pixel values.
A good way to find a suitable value of K is to evaluate a small range of values (for example 1-10)
and plot the within-cluster error against K, which is popularly known as the Elbow method. The point
where the curve bends sharply downward (the "elbow") can be considered the optimal value of K.
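A minimal sketch of k-means colour segmentation, assuming scikit-learn and a 3-channel (RGB) image held as a NumPy array; K = 3 is an illustrative choice:

    import numpy as np
    from sklearn.cluster import KMeans

    def kmeans_segment(image, k=3):
        """Cluster pixels by colour and return an image of cluster (segment) labels."""
        pixels = image.reshape(-1, 3).astype(np.float32)     # one row per pixel, RGB features
        kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
        return kmeans.labels_.reshape(image.shape[:2])       # segment label for every pixel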
Region approach
This process involves dividing the image into smaller segments that satisfy a certain set of rules.
This technique employs an algorithm that divides the image into several components with
common pixel characteristics. The process looks for chunks of segments within the image.
Small segments can absorb similar pixels from neighboring regions and subsequently grow in
size. The algorithm can pick up the gray level from surrounding pixels.
Advantage:
● The region growing method can correctly separate the regions of interest.
● Region growing can provide good segmentation results for original images that have clear
edges.
● We only need a small number of seeds to represent the property we want.
● The process of region growing can be controlled by setting appropriate seed points,
similarity criteria, and stopping conditions. This allows users to fine-tune the
segmentation output based on their requirements and domain knowledge.
● Region growing is a straightforward and easy-to-understand segmentation technique. It
is based on the concept of growing regions from seeds, which is intuitive and can be
easily implemented.
● Region growing can handle irregularly shaped regions and objects.
Disadvantage:
● Computationally expensive.
● It is a local method with no global view of the problem.
● The choice of seed points or regions in region growing can significantly affect the
segmentation result. Incorrect or inappropriate seed selection may lead to
under-segmentation or over-segmentation.
● Region growing algorithms often require the specification of similarity criteria and
stopping conditions. The performance and accuracy of the segmentation heavily depend
on selecting appropriate threshold values for these parameters.
● Region growing can be sensitive to noise and may generate spurious regions or false
detections.
● Region growing may struggle to handle overlapping regions or objects.
Region Growing
Region growing is a simple region-based image segmentation method. It is also classified as a
pixel-based image segmentation method since it involves the selection of initial seed points.
Region growing is a common technique used in region-based segmentation. It starts with seed
pixels or regions and expands them by adding neighboring pixels that meet certain similarity
criteria.
The similarity criteria can be based on color, intensity, texture, or other image features. Pixels
that satisfy the similarity criteria are added to the growing region, and the process continues
until the region stops expanding or reaches a predefined stopping condition.
Region growing based techniques are better than edge based techniques in noisy images.
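A minimal sketch of intensity-based region growing from a single seed, assuming a grayscale NumPy image; the tolerance is an illustrative similarity criterion:

    import numpy as np
    from collections import deque

    def region_grow(img, seed, tolerance=10):
        """Grow a region from 'seed', adding 4-connected neighbours within 'tolerance' of the seed value."""
        rows, cols = img.shape
        region = np.zeros((rows, cols), dtype=bool)
        seed_value = float(img[seed])
        queue = deque([seed])
        region[seed] = True
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):   # 4-connected neighbours
                ny, nx = y + dy, x + dx
                if 0 <= ny < rows and 0 <= nx < cols and not region[ny, nx] \
                        and abs(float(img[ny, nx]) - seed_value) <= tolerance:
                    region[ny, nx] = True                        # pixel meets the similarity criterion
                    queue.append((ny, nx))
        return region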
Region splitting
Region splitting and merging is a region-based segmentation technique that involves dividing an
image into smaller regions through splitting and then merging them based on specific criteria.
The initial image is considered as a single region. The splitting process involves dividing the
region into smaller sub-regions based on predefined splitting criteria. The splitting criteria can be
based on intensity variations, texture differences, or other image properties. Common methods
for region splitting include thresholding, local variance analysis, or edge detection. The splitting
operation continues recursively until specific conditions are met, such as reaching a desired
number of regions or satisfying certain homogeneity criteria.
After the region splitting phase, the resulting image consists of multiple smaller regions. In the
region merging phase, adjacent regions are merged based on similarity criteria to form larger,
more coherent regions. The merging criteria can be based on the similarity of color, texture,
intensity, or other image features. Different merging techniques can be applied, such as
comparing the statistical properties of neighboring regions or measuring the similarity between
region boundaries. Merging continues until no further regions can be merged or until predefined
stopping conditions are met.
Region splitting and merging continue iteratively until certain stopping conditions are satisfied.
These conditions can be predefined thresholds for the number of regions, minimum region size,
or specific homogeneity measures. The stopping conditions help prevent over-segmentation or
under-segmentation and control the granularity of the final segmentation output.
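A minimal sketch of the splitting phase only, assuming a grayscale NumPy image: a block is split into four quadrants whenever it fails a simple homogeneity test (intensity range above max_range); merging of adjacent similar regions would follow as described above. The thresholds are illustrative:

    import numpy as np

    def split_regions(img, max_range=20, min_size=8):
        """Recursively split the image into homogeneous rectangular regions (quadtree splitting)."""
        regions = []

        def split(y, x, h, w):
            block = img[y:y + h, x:x + w]
            homogeneous = (block.max() - block.min()) <= max_range   # simple homogeneity criterion
            if homogeneous or h <= min_size or w <= min_size:
                regions.append((y, x, h, w))                          # keep the block as one region
            else:
                h2, w2 = h // 2, w // 2                               # split into four quadrants
                split(y, x, h2, w2)
                split(y, x + w2, h2, w - w2)
                split(y + h2, x, h - h2, w2)
                split(y + h2, x + w2, h - h2, w - w2)

        split(0, 0, img.shape[0], img.shape[1])
        return regions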
Image Compression
Image compression is the process of reducing the size of an image file without significantly
degrading its visual quality. It is an essential technique used in various applications, such as
digital photography, image storage, transmission over networks, and multimedia systems. The
primary goal of image compression is to minimize the file size while preserving the essential
information and perceptual fidelity of the image.
The compression of images is an important step before storing or processing large images
or videos. Compression is carried out by an encoder, which outputs a compressed
form of the image. Mathematical transforms play a vital role in the compression
process.
Need
Image compression is essential in our digital lives because large image files can cause slow
website loading times, difficulties in sharing images online, and limited storage space. By
compressing images, their size is reduced, making it easier to store and transmit them.
1. Reduced Storage Requirements: Image compression reduces file sizes, allowing for
efficient storage of images on devices with limited storage capacity.
2. Bandwidth Efficiency: Compressed images require less bandwidth, resulting in faster
upload and download times during image transmission over networks.
3. Faster Processing: Smaller file sizes from compression lead to faster loading times and
improved performance in image processing tasks.
4. Cost Reduction: Compression reduces storage, network infrastructure, and data
transfer costs associated with images.
5. Improved User Experience: Smaller compressed images result in quicker website
loading, enhanced multimedia streaming, and better user satisfaction.
6. Compatibility: Compression adapts images to meet the size and format limitations of
different devices and platforms.
7. Archiving and Preservation: Image compression reduces storage requirements,
making it more feasible to archive and preserve large collections of images over time.
Classification
There are two main types of image compression: lossless compression and lossy compression.
Lossless Compression
● Lossless compression algorithms reduce the file size of an image without any loss of
information. The compressed image can be perfectly reconstructed to its original form.
● This method is commonly used in scenarios where preserving every detail is crucial,
such as medical imaging or scientific data analysis
● Lossless compression achieves compression by exploiting redundancy and eliminating
repetitive patterns in the image data.
● Some common lossless compression algorithms include:
○ Run-Length Encoding (RLE): This algorithm replaces consecutive repetitions of
the same pixel value with a count and the pixel value itself.
○ Huffman coding: It assigns variable-length codes to different pixel values based
on their frequency of occurrence in the image.
○ Lempel-Ziv-Welch (LZW): This algorithm replaces repetitive sequences of pixels
with shorter codes, creating a dictionary of commonly occurring patterns.
● Lossless compression techniques typically achieve modest compression ratios
compared to lossy compression but ensure exact data preservation.
● Types of lossless images include:
○ RAW - these file types tend to be quite large in size. Additionally, there are
different versions of RAW, and you may need certain software to edit the files.
○ PNG - compresses images while keeping their size small by looking for patterns in a
photo and compressing them together. The compression is reversible, so once
you open a PNG file, the image is recovered exactly.
○ BMP - a format native to Microsoft Windows. It is lossless, but not frequently
used.
Lossy Compression
● Lossy compression algorithms achieve higher compression ratios by discarding some
information from the image that is less perceptually significant.
● This method is widely used in applications such as digital photography, web images, and
multimedia streaming, where a small loss in quality is acceptable to achieve significant
file size reduction.
● Lossy compression techniques exploit the limitations of human visual perception and the
characteristics of natural images to remove or reduce redundant or less noticeable
details
● The algorithms achieve this by performing transformations on the image data and
quantizing it to reduce the number of distinct values.
● The main steps involved in lossy compression are:
○ Transform Coding: The image is transformed from the spatial domain to a
frequency domain using techniques like Discrete Cosine Transform (DCT) or
Wavelet Transform. These transforms represent the image data in a more
compact manner by concentrating the energy in fewer coefficients.
○ Quantization: In this step, the transformed coefficients are quantized, which
involves reducing the precision or dividing the range of values into a finite set of
discrete levels. Higher levels of quantization lead to greater compression but also
more loss of information. The quantization process is typically designed to
allocate more bits to visually important coefficients and fewer bits to less
important ones.
○ Entropy Encoding: The quantized coefficients are further compressed using
entropy coding techniques like Huffman coding or Arithmetic coding. These
coding schemes assign shorter codes to more frequently occurring coefficients,
resulting in additional compression.
● The amount of compression achieved in lossy compression is customizable based on
the desired trade-off between file size reduction and visual quality. Different compression
algorithms and settings can be used to balance the compression ratio and the
perceptual impact on the image.
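A minimal sketch of the transform coding and quantization steps on a single 8x8 grayscale block, assuming OpenCV; a uniform quantization step stands in for the visually weighted tables used by real codecs such as JPEG:

    import cv2
    import numpy as np

    def compress_block(block, q_step=20):
        """DCT-transform and quantize one 8x8 grayscale block (the lossy step)."""
        coeffs = cv2.dct(block.astype(np.float32) - 128.0)        # transform coding (DCT)
        quantized = np.round(coeffs / q_step).astype(np.int32)    # quantization: coarser step -> smaller, lossier
        return quantized                                           # entropy coding (e.g. Huffman) would follow

    def decompress_block(quantized, q_step=20):
        """Invert the quantization and the DCT to reconstruct an approximate block."""
        coeffs = quantized.astype(np.float32) * q_step
        return cv2.idct(coeffs) + 128.0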
Methods of compression
Run-length Coding
Run-length coding is a simple and effective technique used in image compression, especially for
scenarios where the image contains long sequences of identical or highly similar pixels. It
exploits the redundancy present in such sequences to achieve compression.
The basic idea behind run-length coding is to represent consecutive repetitions of the same
pixel value with a count and the pixel value itself, instead of explicitly storing each pixel
individually. By doing so, run-length coding reduces the amount of data required to represent
these repetitive patterns.
Say you have a picture of red and white stripes, and there are 12 white pixels and 12 red pixels.
Normally, the data for it would be written as WWWWWWWWWWWWRRRRRRRRRRRR, with
W representing the white pixel and R the red pixel. Run length would put the data as 12W and
12R. Much smaller and simpler while still keeping the data unaltered.
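A minimal sketch of run-length encoding and decoding for a 1-D sequence of pixel values; the stripe example above would encode to [(12, 'W'), (12, 'R')]:

    def rle_encode(pixels):
        """Replace runs of identical values with (count, value) pairs."""
        encoded = []
        for value in pixels:
            if encoded and encoded[-1][1] == value:
                encoded[-1][0] += 1               # extend the current run
            else:
                encoded.append([1, value])        # start a new run
        return [tuple(run) for run in encoded]

    def rle_decode(encoded):
        """Expand (count, value) pairs back into the original sequence."""
        pixels = []
        for count, value in encoded:
            pixels.extend([value] * count)
        return pixels

    # rle_encode("WWWWWWWWWWWWRRRRRRRRRRRR") -> [(12, 'W'), (12, 'R')]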
Run-length Decoding:
● Retrieving the Encoded Data: The encoded run-length data is retrieved from storage.
● Decoding: The decoding process involves reconstructing the original image from the
run-length data. Starting from the first <count, value> pair, the algorithm repeats the
value count times to obtain the sequence of pixels. This process is repeated for each
<count, value> pair until the entire image is reconstructed.
The decoding step essentially reverses the encoding process, reconstructing the original
image by expanding the compressed run-length representation.
A drawback is that the original data is not instantly accessible: everything must be decoded before any
part can be accessed, and the size of the decoded data cannot be known in advance.
Run-length coding is particularly effective for images with areas of solid color or regions with
uniform patterns, such as line drawings, text, or simple graphics. However, it may not be as
efficient for more complex and detailed images, as they tend to have fewer long runs of identical
pixels.
Run-length coding is often used in conjunction with other compression techniques, such as
Huffman coding or arithmetic coding, to achieve higher compression ratios. By combining
run-length coding with these entropy encoding techniques, the frequency of occurrence of
different runs or pixel values can be exploited to assign shorter codes to more frequent patterns,
resulting in additional compression.
It's important to note that if the probabilities of two or more pixel values are the same, a different
mechanism may be employed to ensure the prefix-free property. For example, in such cases,
the pixel values can be sorted based on their original order in the image or some other
tie-breaking rule.
Shannon-Fano coding is a basic technique that assigns codes based on probabilities but does
not guarantee the optimal code lengths. Huffman coding, which is an extension of
Shannon-Fano coding, provides a more efficient encoding scheme by considering the
probabilities and constructing a binary tree where shorter codes are assigned to more frequent
symbols.
Huffman coding
Huffman coding is a widely used entropy encoding technique for image compression. It assigns
variable-length codes to different symbols (in this case, pixel values) based on their probabilities
or frequencies of occurrence, giving shorter codes to more frequently occurring symbols. This
reduces the overall number of bits required to represent the image. Huffman coding is widely
used in image compression standards such as JPEG (Joint Photographic Experts Group) and is
known for its simplicity and effectiveness in achieving good compression ratios while preserving
visual quality.
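A minimal sketch of Huffman code construction over pixel-value frequencies, using Python's heapq; the tie-breaking index keeps the heap comparisons well defined when frequencies are equal:

    import heapq
    from collections import Counter

    def huffman_codes(pixels):
        """Build a prefix-free code: frequent pixel values get shorter bit strings."""
        freq = Counter(pixels)
        # each heap entry: (frequency, tie-breaker, {value: code-so-far})
        heap = [(f, i, {v: ""}) for i, (v, f) in enumerate(freq.items())]
        heapq.heapify(heap)
        if len(heap) == 1:                        # degenerate case: a single symbol
            _, _, codes = heap[0]
            return {v: "0" for v in codes}
        counter = len(heap)
        while len(heap) > 1:
            f1, _, c1 = heapq.heappop(heap)       # the two least frequent groups
            f2, _, c2 = heapq.heappop(heap)
            merged = {v: "0" + code for v, code in c1.items()}
            merged.update({v: "1" + code for v, code in c2.items()})
            heapq.heappush(heap, (f1 + f2, counter, merged))
            counter += 1
        return heap[0][2]

    # codes = huffman_codes(img.flatten()); bitstream = "".join(codes[p] for p in img.flatten())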
Scalar Quantization:
Scalar quantization provides a simple and efficient means of reducing the number of bits
required to represent an image. However, it may introduce quantization errors and loss of fine
details since each pixel is quantized independently.
Vector Quantization:
Vector quantization (VQ) is a technique used in image compression that extends the concept of
scalar quantization by grouping multiple pixels together into blocks or vectors. It aims to capture
the statistical dependencies and similarities among neighboring pixels, resulting in improved
compression performance and preservation of local image features.
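A minimal sketch of vector quantization that uses k-means to learn a codebook of 4x4 blocks, assuming scikit-learn; each block is then represented by a single codebook index, and the block and codebook sizes are illustrative:

    import numpy as np
    from sklearn.cluster import KMeans

    def vq_compress(img, block=4, codebook_size=64):
        """Group pixels into blocks (vectors) and quantize each block to its nearest codeword."""
        h, w = img.shape[0] // block * block, img.shape[1] // block * block
        blocks = (img[:h, :w]
                  .reshape(h // block, block, w // block, block)
                  .swapaxes(1, 2)
                  .reshape(-1, block * block)).astype(np.float32)   # one row per block (vector)
        kmeans = KMeans(n_clusters=codebook_size, n_init=4, random_state=0).fit(blocks)
        indices = kmeans.labels_                     # compressed representation: one index per block
        codebook = kmeans.cluster_centers_           # shared dictionary of representative blocks
        return indices, codebook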
The MPEG family of compression standards includes MPEG-1, MPEG-2, and MPEG-4, each offering
different levels of compression efficiency and supporting different video applications, from
low-quality video streaming to high-definition video storage.
Video compression
Video compression is the process of reducing the size of video data while maintaining an
acceptable level of visual quality. It involves applying various techniques to exploit spatial and
temporal redundancies in video sequences, resulting in efficient storage, transmission, and
streaming of videos.
Video compression techniques aim to achieve a balance between compression efficiency and
visual quality. The choice of compression algorithm, parameters, and settings depends on the
specific requirements, such as target bitrate, resolution, desired quality, and available resources.
Object recognition can be approached using different techniques, ranging from traditional
computer vision methods to more advanced deep learning-based approaches. Traditional
methods often rely on handcrafted features and classifiers, while deep learning methods
leverage the power of neural networks to automatically learn discriminative features and
classifiers from large amounts of labeled data.
Computer Vision
Computer vision is a field of study and research that focuses on enabling computers to gain a
high-level understanding of visual information from digital images or video. It involves
developing algorithms and techniques that allow computers to analyze, interpret, and make
sense of visual data, mimicking human visual perception and understanding.
● Template Matching:
○ Template matching compares a predefined template image with sub-regions of
the input image to find matching patterns. It involves calculating the similarity
between the template and image patches using metrics like correlation or sum of
squared differences. Template matching is straightforward but can be sensitive to
variations in scale, rotation, and lighting conditions (a minimal sketch is shown after this list).
● Feature-Based Methods:
○ Feature-based methods extract distinctive features from images and use them to
recognize objects. Examples of feature descriptors include Scale-Invariant
Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Oriented
FAST and Rotated BRIEF (ORB). These methods detect keypoints in images and
compute descriptors that represent the local visual characteristics of the
keypoints. Object recognition is then performed by matching and comparing
these features across images.
● Deep Learning:
○ Deep learning, particularly convolutional neural networks (CNNs), has
revolutionized object recognition. CNNs are capable of automatically learning
hierarchical features from raw image data. Training involves feeding labeled
images to the network, and it learns to recognize objects by adjusting the weights
of its layers. Deep learning-based object recognition models, such as YOLO (You
Only Look Once), Faster R-CNN (Region-based Convolutional Neural Networks),
and SSD (Single Shot MultiBox Detector), have achieved impressive results in
terms of accuracy and real-time performance.
● Histogram-based Methods:
○ Histogram-based methods utilize color and texture information to recognize
objects. These methods analyze the distribution of color or texture features in
images and use statistical measures to compare and classify objects. Examples
include color histograms, local binary patterns (LBPs), and histogram of oriented
gradients (HOG). Histogram-based methods are effective for simple object
recognition tasks but may struggle with complex scenes or object variations.
● Ensemble Techniques:
○ Ensemble techniques combine multiple object recognition models or classifiers to
improve overall performance. This can involve techniques such as ensemble
averaging, boosting, or bagging. By combining the predictions of multiple models,
ensemble techniques can enhance robustness, accuracy, and generalization of
object recognition systems.
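As referenced in the template-matching item above, a minimal OpenCV sketch; the file names are hypothetical and normalized cross-correlation is one of several available similarity metrics:

    import cv2

    image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)        # hypothetical input image
    template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # hypothetical template
    # normalized cross-correlation between the template and every image patch
    scores = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(scores)                # best-matching location and its score
    h, w = template.shape
    top_left, bottom_right = max_loc, (max_loc[0] + w, max_loc[1] + h)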
Introduction
The principal goal of restoration techniques is to improve an image in some predefined sense.
Although there are areas of overlap, image enhancement is largely a subjective process, while
restoration is for the most part an objective process.
Restoration attempts to recover an image that has been degraded by using a priori knowledge
of the degradation phenomenon. Thus, restoration techniques are oriented toward modeling the
degradation and applying the inverse process in order to recover the original image.
Restoration improves image in some predefined sense. Image enhancement techniques are
subjective processes, whereas image restoration techniques are objective processes.
The restoration approach usually involves formulating a criterion of goodness that will yield an
optimal estimate of the desired result, while enhancement techniques are heuristic procedures
to manipulate an image in order to take advantage of the human visual system.
Some restoration techniques are best formulated in the spatial domain, while others are better
suited for the frequency domain.
Image degradation is the loss of image quality for a variety of reasons. When an image
deteriorates, its quality is greatly diminished and it may become hazy or blurred.
Image restoration is the process of recovering or improving an image's quality by modeling the
degradation it has undergone and reversing it.
1. Degradation function/model:
The degradation function/model represents the processes that cause image
degradation. It simulates various factors that can affect an image, such as noise,
blurring, compression artifacts, and other distortions. These factors can occur during
image acquisition, transmission, or storage. The degradation model aims to replicate
these effects to create degraded versions of the original image.
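For reference, the standard degradation/restoration model expresses the observed image g as the original image f passed through a degradation function h plus additive noise η, in the spatial and frequency domains:

    g(x, y) = h(x, y) \ast f(x, y) + \eta(x, y)
    G(u, v) = H(u, v)\, F(u, v) + N(u, v)

Restoration then seeks an estimate \hat{f}(x, y) of the original image from g and whatever is known about h and η.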
Noise Models
The principal sources of noise in digital images are image acquisition and
transmission.
● During image acquisition, the performance of image sensors gets affected by a variety of
factors such as environmental conditions and the quality of sensing elements.
● During image transmission, the images are corrupted due to the interference introduced
in the channel used for transmission.
The Noise components are considered as random variables, characterized by a probability
density function.
Gaussian Noise
Because of its mathematical simplicity, the Gaussian noise model is often used in practice, even
in situations where it is marginally applicable at best.
Gaussian noise arises in an image due to factors such as electronic circuit noise and sensor
noise caused by poor illumination or high temperature.
The Gaussian distribution is one of the most widely used probability density functions (PDFs) to
model noise. It is characterized by a bell-shaped curve and is often used to represent additive
white Gaussian noise (AWGN)
The PDF of a Gaussian distribution is defined by its mean and standard deviation.
Gaussian noise is symmetric around the mean and has a flat power spectral density.
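In LaTeX form, with mean \mu and standard deviation \sigma, the Gaussian noise PDF is:

    p(z) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-(z - \mu)^2 / (2\sigma^2)}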
Rayleigh Noise
Rayleigh noise is usually used to characterize noise phenomena in range imaging.
The Rayleigh distribution is commonly used to model noise in radar and ultrasound imaging.
It is characterized by a non-negative skewness and a right-skewed shape
The PDF of a Rayleigh distribution is defined by its scale parameter, which determines the
spread of the distribution.
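With scale parameter \sigma, the standard Rayleigh PDF (some texts use a shifted form with an additional offset) is:

    p(z) = \frac{z}{\sigma^2} \, e^{-z^2 / (2\sigma^2)}, \quad z \ge 0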
Exponential Noise
The exponential distribution is commonly used to model Poisson noise, which arises from
photon counting in low-light imaging scenarios. Poisson noise occurs when the number of
events in a fixed interval follows a Poisson process. The PDF of an exponential distribution is
characterized by its rate parameter, which determines the decay rate of the distribution.
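With rate parameter a > 0 (mean 1/a), the exponential PDF is:

    p(z) = a \, e^{-a z} \text{ for } z \ge 0, \qquad p(z) = 0 \text{ for } z < 0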
Uniform Noise
The uniform distribution represents noise that is equally likely to occur within a specified range.
In image processing, uniform noise is often used to model quantization noise, which occurs
when continuous values are discretized into a limited number of levels. The PDF of a uniform
distribution is a constant within a specified range and zero outside that range.
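For an interval [a, b], the uniform PDF is:

    p(z) = \frac{1}{b - a} \text{ for } a \le z \le b, \qquad 0 \text{ otherwise}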
Uniform noise is not practically present but is often used in numerical simulations to analyze
systems.
Blind Deconvolution
Blind deconvolution is a challenging image restoration technique that aims to estimate both the
unknown blur kernel and the original image from a single degraded image. It is referred to as
"blind" because it does not assume prior knowledge about the blur kernel or the true image,
making it more challenging than non-blind deconvolution methods.
The goal of blind deconvolution is to recover the original image that has been convolved with an
unknown blur kernel and corrupted by noise. The blur kernel represents the blurring effect
applied to the original image, which could be caused by factors such as defocus, motion blur, or
optical aberrations. The blur kernel defines how each pixel in the original image contributes to
the neighboring pixels in the degraded image. By estimating the blur kernel and applying its
inverse, the original image can be recovered.
Blind deconvolution is a highly ill-posed problem, meaning that multiple solutions can potentially
match the observed degraded image. Challenges in blind deconvolution include dealing with
noise amplification, handling complex and spatially varying blur kernels, and avoiding overfitting
or underfitting of the estimated blur kernel.
To enhance the performance of blind deconvolution, additional information can be incorporated
into the process, such as multiple degraded images with different blurs, multiple channels of the
same scene, or constraints based on the scene content or prior knowledge about the blur type.
Lucy-Richardson Filtering
The Lucy-Richardson algorithm, also known as iterative deconvolution, is an iterative image
restoration technique used to recover images that have undergone blurring or convolution.
The Lucy-Richardson algorithm assumes a known point spread function (PSF) or blur kernel,
which represents the blurring effect applied to the original image. The algorithm aims to
iteratively estimate the original image by alternating between forward and backward filtering
operations.
The iterative nature of the Lucy-Richardson algorithm allows it to refine the estimate of the
original image gradually. It leverages the known PSF to deblur the image iteratively, attempting
to recover fine details and sharpness.
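For reference, the standard Richardson-Lucy multiplicative update for the estimate \hat{f} at iteration k, with observed image g, PSF h, and \tilde{h} the spatially flipped PSF (\ast denotes convolution and the division is element-wise), is:

    \hat{f}^{(k+1)} = \hat{f}^{(k)} \cdot \left[ \left( \frac{g}{\hat{f}^{(k)} \ast h} \right) \ast \tilde{h} \right]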
The performance of the Lucy-Richardson algorithm depends on factors such as the accuracy of
the known PSF, the number of iterations, and the presence of noise in the observed degraded
image. It is a relatively simple and computationally efficient method but can be sensitive to noise
and model mismatches.
Wiener filtering
Wiener filtering, also known as the Wiener deconvolution, is a widely used image restoration
technique that aims to restore degraded images by minimizing the mean square error between
the original image and the restored image.
It is particularly effective when the degradation process and the statistical properties of the noise
are known or can be estimated accurately.
The Wiener filter operates in the frequency domain and utilizes a statistical approach to restore
the image. The filter is designed based on the power spectral densities (PSDs) of the original
image and the degradation process.
The key idea is to find a filter that minimizes the expected mean square error between the
estimated image and the true image.
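In the frequency domain, the Wiener estimate of the original image spectrum takes the standard form:

    \hat{F}(u, v) = \left[ \frac{H^{*}(u, v)}{|H(u, v)|^{2} + S_{\eta}(u, v) / S_{f}(u, v)} \right] G(u, v)

where H is the degradation function, G is the spectrum of the degraded image, and S_\eta and S_f are the power spectra of the noise and the original image; when this ratio is unknown, it is often replaced by a constant K.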
The Wiener filter optimally balances noise reduction and preservation of image details by
minimizing the mean square error. It exploits the statistical properties of the degradation process
and the image to achieve restoration. However, it assumes stationarity of the signal and noise
properties, which may not hold in practice.
The Wiener filter's performance depends on the accuracy of the estimated PSDs, the assumptions made about the
noise statistics, and the degradation model. If these assumptions are incorrect or inaccurate, the
Wiener filter may produce suboptimal results.
To enhance the performance of the Wiener filter, additional considerations can be taken into
account, such as regularization.
Medical Image Processing
Medical image processing refers to the application of various computational techniques and
algorithms to analyze and interpret medical images for diagnostic, therapeutic, and research
purposes. It involves the acquisition, enhancement, segmentation, and analysis of images
obtained from various medical imaging modalities such as X-rays, computed tomography (CT),
magnetic resonance imaging (MRI), ultrasound, and positron emission tomography (PET),
among others.
Irrespective of the methods used, Medical image processing involves the following steps:
● Image Enhancement
● Image Segmentation
● Image Quantification
● Image Registration
● Visualization
Image Enhancement
● Medical images are very often corrupted by noise which occurs due to various sources
of interference. This noise affects measurements and could lead to serious changes in
diagnosis and treatment.
● Medical images also suffer from low contrast. Medical image enhancement aims at
resolving the problems of low contrast and high noise levels so that a particular disease
can be diagnosed accurately.
● In all such cases improvement in the visual quality of images helps to correctly interpret
the condition of the patient.
● Histogram equalization is often used to correct low contrast problems. Power law
transformation is used to correct non-uniform illumination issues.
● High frequency noise is reduced using carefully designed low pass filters. Filters could
be designed in spatial and frequency domain. MRI images suffer from noise and can be
improved using median filters.
● Image enhancement techniques are applied to improve the quality, clarity, and visual
appearance of medical images. The goal is to highlight important structures, reduce
noise, enhance contrast, and improve overall image interpretability. Common
enhancement techniques include filtering (such as noise reduction filters or
edge-enhancing filters), histogram equalization, contrast stretching, and image
sharpening.
Image Segmentation
● Image segmentation basically partitions an image into various regions. Medical image
segmentation involves the extraction of regions of interest (ROI) from a medical image.
Medical image segmentation allows for more precise analysis of data by isolating only
those regions that are necessary for diagnosis.
● Image segmentation removes unwanted parts from a medical image, allowing different
tissues such as bone and soft tissue to be isolated. Segmentation also requires
classification of pixels and hence is treated as a pattern recognition problem.
● The most common approaches to segmentation are edge-based segmentation and
region-based segmentation. Thresholding is the easiest and most common technique used
in segmentation.
● Thresholding can use a global threshold, where a single threshold value
separates important objects within an image, or local thresholding, where the image is
split into sub-images and a threshold is calculated for each sub-image region.
● Image segmentation is the process of dividing an image into distinct regions or objects
based on their characteristics. It helps in isolating structures or areas of interest from the
background or other surrounding tissues. Segmentation can be performed using various
algorithms, such as thresholding, region growing, active contours (or snakes), clustering,
or machine learning-based approaches. Segmentation is crucial for tasks like organ
delineation, tumor detection, or measurement of specific structures.
● (refer segmentation from unit 3)
Image Quantification
● Image quantification involves extracting numerical or quantitative measurements from
medical images.
● It aims to derive meaningful and objective information from the image data.
Quantification techniques may involve measuring properties like size, shape, intensity,
texture, or other relevant features of structures or regions of interest.
● Medical image analysis requires fast, precise and repeatable measurements. These
quantitative measurements help in addressing many aspects of the image data, such as
tissue texture, size and density.
● These measurements can assist in diagnosing and monitoring diseases, assessing
treatment responses, or comparing different patient populations. Various algorithms and
methodologies are used for image quantification, including statistical analysis, pattern
recognition, or machine learning algorithms.
Image Registration
● Image registration is the process of aligning two or more images of the same scene. This
is required for images obtained from CT and MRI scans, since images from these
methods are stacked one over the other to give us 3D structures of the organs that are
being imaged.
● The process of registration involves designating one image as the reference and
applying geometric transformation to the other image so that they align with the
reference. Image registration is a prerequisite for all imaging applications that compare
datasets across subjects.
● Image registration is the process of aligning or matching two or more medical images
acquired from different modalities, time points, or perspectives. It is essential for
combining information from multiple images, tracking changes over time, or creating
image overlays for visualization or surgical planning. Registration algorithms aim to find
the spatial transformation that brings images into alignment by accounting for differences
in scale, rotation, translation, or deformation. Registration techniques can be rigid (for
rigid body alignment) or non-rigid (for accounting for deformations).
Visualization
● Visualization techniques are employed to present medical images and processed results
in an intuitive and informative manner.
● Visualization methods can range from simple 2D or 3D rendering of images to more
advanced techniques like volume rendering, surface rendering, or virtual reality-based
visualization.
● Visualization helps medical professionals better understand complex anatomical
structures, identify abnormalities, and assist in surgical planning or patient education. It
plays a vital role in conveying the information extracted from medical images effectively.
Passive sensors record the intensity and spectral characteristics of the EMR reflected or emitted
by different objects on the Earth's surface. The sensors capture the radiation across different
wavelengths, ranging from visible light to thermal infrared and even microwave regions. By
analyzing the patterns and properties of the captured EMR, scientists can gather valuable
information about land cover, vegetation, oceans, clouds, atmospheric conditions, and more.
Active sensors measure the time it takes for the transmitted energy to return to the satellite,
allowing for the calculation of the distance between the satellite and the target. By analyzing the
properties of the returned energy, such as its intensity and phase, active remote sensing
provides valuable information about the shape, elevation, and surface properties of the target
area.
Some common active remote sensing techniques include radar imaging, lidar (light detection
and ranging), and synthetic aperture radar (SAR). These techniques are used for mapping
topography, monitoring ice cover, measuring vegetation height, detecting forest structure, and
studying geological features.
Unlike passive remote sensing, active remote sensing is not dependent on sunlight, making it
suitable for acquiring data in all weather and lighting conditions. However, active systems
require higher power consumption and sophisticated signal processing techniques.
Advantages:
● Wide Area Coverage: Remote sensing allows for data collection over large and
inaccessible areas.
● Temporal Coverage: Remote sensing provides information about changes and
dynamics over time.
● Multispectral and Multisensor Capability: Remote sensing systems capture data
across various wavelengths and use different sensors, enabling the analysis of multiple
spectral bands simultaneously.
● Cost-Effectiveness: Remote sensing can be a more economical option compared to
traditional ground-based surveys.
● Consistency and Standardization: Remote sensing follows standardized procedures,
ensuring consistent and repeatable measurements.
● Synoptic View: Remote sensing provides a comprehensive view of large-scale patterns
and features.
Limitations:
● Spatial and Spectral Resolution: Remote sensing systems have limitations in
capturing fine-scale details and complex spectral characteristics.
● Atmospheric Interference: The Earth's atmosphere can affect remote sensing data by
causing scattering, absorption, and reflection, leading to potential errors.
● Limited Penetration: Some remote sensing systems cannot penetrate through clouds,
vegetation, or dense canopies, limiting data collection in certain areas.
● Interpretation Complexity: Remote sensing data require advanced analysis techniques
and expertise for accurate interpretation.
● Lack of Ground Truth Validation: Ground truth data may be challenging to obtain,
leading to uncertainties in interpreting remote sensing data.
● Data Availability and Cost: High-quality remote sensing data may have limited
accessibility and be expensive to acquire and process
There are two main types of photogrammetric imaging devices: cameras and scanners.
Cameras use lenses to project images of objects onto a light-sensitive surface, such as film or a
digital sensor. Scanners use a beam of light to scan objects and create a digital representation
of their surface.
Cameras are the most common type of photogrammetric imaging device. They can be used to
capture images from a variety of platforms, including ground, air, and space. Cameras can be
used to create a variety of photogrammetric products, including orthophotos, digital elevation
models, and 3D models.
Scanners are used to create more detailed representations of objects than cameras. They can
be used to capture images of objects that are too small or too large to be captured by a camera.
Scanners can also be used to capture images of objects that are in motion.
The choice of photogrammetric imaging device depends on the specific application. For
example, cameras are typically used for surveying and mapping, while scanners are typically
used for 3D modeling.
There are two main photogrammetric approaches used in satellite image
processing:
● Single-image photogrammetry: Single-image photogrammetry uses a single image to
create a three-dimensional model of the Earth's surface. This is done by using the
known geometry of the camera and the image to calculate the distance to each point in
the image.
● Multi-image photogrammetry: Multi-image photogrammetry uses multiple images to
create a three-dimensional model of the Earth's surface. This is done by using the
known geometry of the camera and the images to calculate the distance to each point in
the images.
In traditional remote sensing, sensors capture data in a few broad spectral bands, such as red,
green, and blue. In contrast, hyperspectral sensors measure the reflected or emitted energy
across hundreds of narrow and contiguous spectral bands, covering a much broader portion of
the electromagnetic spectrum. Each spectral band corresponds to a specific wavelength,
allowing for detailed analysis of the spectral signatures of different materials and features on the
Earth's surface.
The high spectral resolution of hyperspectral sensing enables the identification and
characterization of subtle variations in the reflectance or emission patterns of objects. This rich
spectral information can be used to discriminate between materials with similar visual
appearances but different spectral characteristics. It enables the detection and classification of
specific minerals, vegetation types, water bodies, pollution sources, and other features that
might be indistinguishable in lower spectral resolution images.
Hyperspectral data analysis involves several steps, including preprocessing, spectral signature
extraction, and classification. Preprocessing techniques correct for atmospheric effects, sensor
artifacts, and radiometric calibration to enhance the quality of the data. Spectral signature
extraction involves identifying unique spectral patterns associated with different materials or
land cover classes. Classification algorithms are then applied to categorize the image pixels into
different classes based on their spectral signatures, allowing for mapping and analysis of
specific features.
However, there are some challenges associated with hyperspectral sensing, such as the large
volume of data generated, the need for advanced data processing and analysis techniques, and
limitations in spatial resolution. Additionally, atmospheric effects and sensor noise can affect the
accuracy of hyperspectral data, requiring careful calibration and correction procedures.