image_processing
Nikin Baidar
Year 2021
1. Fundamentals of Digital Image Processing
A monochrome image (or simply image) refers to a 2D light intensity function f(x, y), where x and y denote spatial coordinates. The value of f(x, y) at (x, y) is proportional to the brightness (or gray level) of the image at that point.
A digital image is an image f(x, y) that has been discretized both in spatial coordinates and in brightness.
A digital image is essentially an MxN matrix in which the value of each element (pixel) at (x, y) is given by the light intensity function I(x, y). The analog signals used to generate a digital image are discretized both in spatial coordinates and in intensity levels.
Figure 1.1: Key Stages in Digital Image Processing. Image processing begins with the problem domain.
• Problem Domain
• Image Acquisition
• Image Enhancement
• Image restoration
• Morphological processing
• Segmentation
• Object Recognition
• Image Compression∗ .
2. Entropy
3. Image Quality
4. Noise
Image Histogram
An image histogram is a graphical representation of the distribution of gray levels (pixel values) in an image. The histogram of a digital image f with intensities in the range [0, L−1] is given by the discrete function h(r_k) = n_k, where r_k is the kth intensity value and n_k is the number of pixels with intensity r_k.
Essentially, an image histogram represents the frequency of every pixel value in an
image. A histogram provides a global description of the appearance of the image that
can be used in the following:
3. Image Segmentation
4. Compression
In the histogram of a dark image the distribution is skewed towards the left; in a bright image it is skewed towards the right; in a low-contrast image the values cluster around a narrow band near the centre; whereas in a clear, high-contrast image the values are spread throughout the range and the bars of the histogram are fairly evenly distributed.
Histogram Normalization
Histogram normalization is a common practice. To normalize an image histogram, divide each component of the histogram by the total number of pixels in the image. For an image of resolution MxN with pixel values in [0, L−1], the normalized histogram is given by

p(r_k) = n_k / (MN)
The entropy of an image with n gray levels is given by

H(X) = − Σ_{i=1}^{n} P(x_i) log(P(x_i))

where P(x_i) is the probability of occurrence of gray level x_i.
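As a quick illustration, here is a minimal sketch (assuming an 8-bit grayscale image already loaded as a NumPy array) that computes the normalized histogram p(r_k) = n_k/MN and the entropy defined above:

import numpy as np

def histogram_entropy(image):
    # Normalized histogram: count of each gray level divided by the pixel count MN.
    counts = np.bincount(image.ravel(), minlength=256)
    p = counts / image.size
    # Entropy over the gray levels that actually occur; base-2 log gives bits.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))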
Here’s an example of how the size of an image increases as the number of bits used to represent it increases. Consider a grayscale image of size 1024x1024. A grayscale image is composed of discrete pixels with values (gray levels/brightness intensities) in the interval [0..255]. The number of bits per pixel (bpp) required to encode such a grayscale image is log2(255 − 0 + 1) = log2(256) = 8. The size of the image is resolution * bpp, which yields 8388608 bits. To express this in bytes we divide by 8 (as 8 bits make a byte), so the size of the image is 1048576 bytes. Now consider an RGB image with resolution 1024x1024. An RGB image has 3 channels: red, green and blue, and each of these channels has pixel values in the interval [0..255]. So the number of bits required to encode each RGB pixel is 3 times 8, which yields 24 bits, and the size of such an image is 1024 × 1024 × 24 bits; expressed in bytes (divide by 8) this gives 3145728 bytes. Clearly, the size of the image (with the same resolution) increases threefold when it is converted from a grayscale image to an RGB image.
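The same arithmetic can be checked in a couple of lines of Python (the resolution and bit depths are just the values from the example above):

# Grayscale: 8 bits per pixel.
gray_bits = 1024 * 1024 * 8
print(gray_bits, gray_bits // 8)    # 8388608 bits, 1048576 bytes

# RGB: 3 channels x 8 bits = 24 bits per pixel.
rgb_bits = 1024 * 1024 * 24
print(rgb_bits, rgb_bits // 8)      # 25165824 bits, 3145728 bytes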
Image Quality
Image quality is rather subjective and influences the viewer’s visual perception of the features present in an image. Several attributes are responsible for the ultimate quality of an image, as follows:
CPI is another measure of spatial resolution (not exactly used in image processing)
that defines the sensitivity of a mouse. CPI stands for Counts per inch, and defines
the number of pixels by which the cursor moves on your screen when you move your
mouse by an inch. So a CPI setting of 800 means that moving your mouse by one inch
will move the cursor by an equivalent of 800 pixels.
5. Brightness
6. Sharpness
Image Dtypes
Before performing any operations on images, it is important to get a general idea of
image dtypes.
Typecasting Images in Py
Module to use: scikit-image, i.e. skimage. The skimage.util sub-module provides some functions to typecast images:

Function       Description
img_as_float   Convert to floating point.
img_as_ubyte   Convert to 8-bit unsigned integer type.
img_as_uint    Convert to 16-bit unsigned integer type.
img_as_int     Convert to 16-bit signed integer type.
Note that floating point images must be restricted to the interval [-1, 1] even though the data type itself can exceed this range. To respect this property, amongst others, the astype method should never be used to typecast images, because it violates the assumptions about the dtype range of images.
import numpy as np
from skimage.util import img_as_float

image = np.arange(0, 50, 10, dtype=np.uint8)

# astype() only changes the dtype; these float values fall outside the
# expected floating point image range ([0, 1] for unsigned input).
print(image.astype(float))    # [ 0. 10. 20. 30. 40.]

# img_as_float() rescales the uint8 values into [0, 1] instead.
print(img_as_float(image))    # approx. [0. 0.039 0.078 0.118 0.157]
Point Operations
In point operations, the same conversion operation is applied to each and every pixel in an image. This means that the transformation of any given pixel is independent of its location and of its neighbouring pixels. This is in contrast to neighbourhood operations, where the transformation of a pixel depends on where it is located and on the pixels that surround it.
The transformation function of point operations is given by:

s = c T(r)

where s is the processed pixel, c is the scaling factor, T is the transformation function and r is the input pixel.
• Logarithmic Transformation
• Power Law Transformation (Gamma Transformation)
• Contrast Stretching
• Histogram Equalization
• Image Negatives
Logarithmic Transformation
Logarithmic transformation is used for contrast enhancement. The input pixels are replaced with their (natural) logarithmic values. Logarithmic transformation increases the detail in the darker regions of an image while at the same time decreasing the detail in the brighter regions, with respect to contrast.
The logarithmic transformation function is as follows:
s = c log(r + 1)
The input pixel is incremented by 1 in order to handle the case r = 0, where the logarithm log(0) is undefined. For logarithmic transformation, the scaling factor c is calculated as:

c = 255 / log(max input pixel value + 1)
255 is the maximum possible pixel value. Here, the scaling factor c is chosen such
that we get the maximum output value corresponding to the bit size used.
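A minimal sketch of the logarithmic transformation (assuming an 8-bit grayscale image stored in a NumPy array):

import numpy as np

def log_transform(image):
    image = image.astype(np.float64)
    # Scaling factor chosen so that the maximum input maps to 255.
    c = 255 / np.log(1 + image.max())
    s = c * np.log(1 + image)
    return s.astype(np.uint8)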
Contrast stretching
Contrast stretching is used to stretch intensity values to cover a desired range of pixels in a linear fashion. It enhances images whose histogram is narrow.
The contrast stretching function is as follows:

s = (r − c) (b − a)/(d − c) + a

where s is the processed/output pixel, r is the input pixel, a and b are the lower and upper bounds of the desired output range, and c and d are the minimum and maximum intensity values present in the image.
Algorithm:
Step 1. Set the lower bound of the output range, a = 0.
Step 2. Set the upper bound of the output range, b = 255 (for 8-bit images).
Step 3. Calculate the image histogram, i.e., the frequency of all grey levels in the image.
Step 4. Find the least (c) and maximum (d) grey level values that are present in the image.
Step 5. Compute the scaling factor (b-a)/(d-c) and apply the transformation to every pixel, as sketched below.
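A minimal sketch of this procedure (assuming an 8-bit grayscale NumPy array; the variable names follow the formula above):

import numpy as np

def contrast_stretch(image, a=0, b=255):
    image = image.astype(np.float64)
    c, d = image.min(), image.max()       # least and greatest grey levels present
    scaling_factor = (b - a) / (d - c)
    s = (image - c) * scaling_factor + a
    return s.astype(np.uint8)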
Power Law Transformation (Gamma Transformation)
The power law (gamma) transformation function is as follows:

s = c r^γ

where c is the scaling factor (usually 1), s is the processed or output pixel, r is the input pixel and γ is a controllable parameter such that
• when γ > 1, the contrast of the light gray area is enhanced
• when γ < 1, the contrast of the dark gray areas is enhanced
• when γ = 1, the contrast of the original image remains unchanged.
Figure 3.1: Curves for different values of γ.
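A minimal sketch of the gamma transformation (the 8-bit input is first normalized to [0, 1], transformed, then scaled back; gamma is the parameter described above):

import numpy as np

def gamma_transform(image, gamma, c=1.0):
    r = image.astype(np.float64) / 255.0   # normalize to [0, 1]
    s = c * np.power(r, gamma)
    return (s * 255).astype(np.uint8)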
Image Negatives
Image negatives, in the good old days, were used to produce images in Film Photog-
raphy.
An image negative is produced by subtracting each pixel from the maximum intensity value, e.g. for an 8-bit image the maximum intensity value is 255, so each pixel is subtracted from 255 to produce the image negative. The transformation function used to obtain the image negative is

s = c (L − 1) − r

where the scaling factor c is 1, (L − 1) is the maximum possible intensity value, and s and r are the output and input pixel values respectively.
Image negatives have their applications in images where the background is black and the foreground gray levels are not clearly visible; converting the background to white renders such an image clearer.
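For an 8-bit image the negative is a one-liner (minimal sketch; image is assumed to be a uint8 NumPy array):

import numpy as np

def negative(image, L=256):
    # s = (L - 1) - r for every pixel r.
    return (L - 1) - image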
Neighbourhood Operations: Spatial Filtering
With neighbourhood operations, an individual pixel value is modified with respect to its location. To put it another way, the final transformation of an individual pixel value also depends on the pixel values in its surroundings. Some simple neighbourhood operations include:
1. Min filter : Set the pixel value to the min in the neighbourhood.
2. Max filter: Set the pixel value to the max in the neighbourhood.
3. Median filter: Set the pixel value to the median of all pixel values in the neigh-
bourhood.
These min, max and median filters are termed as order statistics filters.
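These order statistics filters are readily available in scipy.ndimage; a minimal sketch with a 3x3 neighbourhood (the random array stands in for a real grayscale image):

import numpy as np
from scipy import ndimage

image = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in image

min_filtered = ndimage.minimum_filter(image, size=3)
max_filtered = ndimage.maximum_filter(image, size=3)
median_filtered = ndimage.median_filter(image, size=3)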
Pixel Neighbourhood
• N4 (p) 4-neighbours: 2 horizontal and 2 vertical pixels of the central pixel.
• ND (p) diagonal neighbours: Elements of the major and the minor diagonals of
the central pixel p. Two diagonals meet at the central pixel.
• N8 (p) 8-neighbours: The 2 horizontal and 2 vertical plus the elements of the
major and the minor diagonals of the central pixel p. i.e., N8 (p) = N4 (p) ∪ ND (p).
3. Compute a new set of pixels by performing some operation between the original pixels and the filter pixels, such as addition or multiplication.
4. Apply the filter function; it can be, for example, minimum, maximum, median, addition or multiplication. The result of whichever operation is performed is the value of the new pixel.
5. Replace the original pixel with this new pixel value.
6. Shift the window 1 step/stride to the right (or by the defined number of steps and in the defined direction).
7. Repeat until every pixel in the image has been processed.
Filter Properties
A filter can have different parameters such as its shape, size, weights and function.
Filter parameters:
• Filter size (size of the neighbourhood); usual filter sizes range anywhere from 3x3 to 21x21.
• Filter shape; filters do not necessarily need to be square, they can also be rectangular, circular and so on.
Correlation
For this illustration, a 2D filter will be represented as a 1D array such that each row of
the filter will be appended at the end of the array in a top-down approach.
Consider a 3x3 filter F = [a, b, c, d, e, f, g, h, i] placed over the first set of original pixels [j, k, l, m, n, o, p, q, r] at the origin. Now, we multiply the corresponding pixels in the original set and the filter to get [aj, bk, cl, dm, en, fo, gp, hq, ir].
Now, add up all the items in this new set of pixels and replace the central pixel in
the original set i.e. ‘n’ with its sum. So our processed set of pixels will become
[j, k, l, m, sum, o, p, q, r]. Now we shift the window by the defined number of step-
s/strides, say 1 and repeat this process for every pixel in the original image to generate
a new filtered image.
Figure 3.2: ./images/image-origin.jpg
Convolution
Start with pretty much the same (or similar) filter as correlation, F=[a, b, c, d, e, f, g,
h, i]. The only difference is that the filter is reversed (reversed in 1D, flipped in 2D) as
G=[i, h, g, f, e, d, c, b, a]. Apply this filter to the original set of pixels in the image.
Multiply the corresponding pixel values to obtain a new set of pixels [ji, kh, lg, mf, en,
do, cp, qb, ra]. Add up all pixels in this new set and use the sum to replace the central
pixel.
For symmetric filters, there is no difference between correlation and convolution, since a symmetric filter is equal to its own reverse. For example, F = [a, b, c, d, e, d, c, b, a] is symmetric, so its reverse is the same as F.
g(x, y) = Σ_s Σ_t w(s, t) f(x + s, y + t)

g(x, y) yields the new value of the central pixel.
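A minimal sketch comparing the two operations with scipy.ndimage (an asymmetric kernel is used so the difference is visible; the random array stands in for a real image):

import numpy as np
from scipy import ndimage

image = np.random.randint(0, 256, (64, 64)).astype(float)   # stand-in image
kernel = np.array([[1, 0, 0],
                   [0, 2, 0],
                   [0, 0, 3]], dtype=float)

correlated = ndimage.correlate(image, kernel)   # kernel applied as-is
convolved = ndimage.convolve(image, kernel)     # kernel flipped in both axes first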
The gradient magnitude and direction are given by

∇I = √((I'_x)² + (I'_y)²)        θ = tan⁻¹(I'_x / I'_y)
Step edge: [0, 0, 0, 0, 1, 1, 1, 1]
Ramp edge: [0, 0, 0.1, 0.2, 0.4, 0.8, 1, 1]
Roof edge: [0, 0.1, 0.2, 1, 1, 0.2, 0.1, 0]
A Gaussian filter of size (2n + 1) x (2n + 1) is defined by

H_ij = (1 / (2πσ²)) exp(−((i − (n + 1))² + (j − (n + 1))²) / (2σ²)),    1 ≤ i, j ≤ (2n + 1)
Looking at the range, we can see that the filter size should be 3 at minimum. Furthermore, the output of the first iteration must be placed at the centre of the filter. To achieve this, think of the origin as the centre of a mesh grid. So for a filter of size 3, the origin (0,0) is the centre and its N4(p) are (-1,0), (1,0), (0,-1), (0,1). To implement this in code, iterate from -n to n+1 and place the value for each pixel at (i+n, j+n). Here's a py function to create Gaussian filters:
#! /usr/bin/python
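# The original listing is not reproduced in these notes; the following is a
# minimal sketch reconstructing such a function from the description above
# (the function name gaussian_kernel and the final normalization are assumptions).
import numpy as np

def gaussian_kernel(n, sigma):
    """Create a (2n + 1) x (2n + 1) Gaussian filter."""
    size = 2 * n + 1
    H = np.zeros((size, size))
    # Iterate from -n to n and place each value at (i + n, j + n), so that
    # the origin (0, 0) lands at the centre of the kernel.
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            H[i + n, j + n] = np.exp(-(i**2 + j**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return H / H.sum()   # normalize so the weights sum to 1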
Laplacian Operator
1. Isotropic, or rotation invariant. This means that the results of first applying the
Laplacian operator and later rotating the image will be same as the results of first
rotating the image and then applying the Laplacian operator.
3. Digital implementation
∇²f = ∂²f/∂x² + ∂²f/∂y²

∂²f/∂x² = f(x + 1, y) + f(x − 1, y) − 2f(x, y)

∂²f/∂y² = f(x, y + 1) + f(x, y − 1) − 2f(x, y)

Adding these together, we get

∇²f = f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4f(x, y)

With this equation, we can compute the standard Laplacian operator kernel:
0 1 0
1 −4 1
0 1 0
Some commonly used variants of the standard Laplacian kernels are:
1 1 1 −1 2 −1
1 −8 1 and 2 −4 2
1 1 1 −1 2 −1
Laplacian filters are very sensitive to noise. So, to counter this, the image is Gaus-
sian smoothed before applying the Laplacian filter. This step reduces high frequency
noise components prior to the spatial differentiation.
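A minimal sketch of this smooth-then-differentiate approach with OpenCV (the random array stands in for a grayscale image that would normally come from cv2.imread):

import cv2
import numpy as np

src = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in image
blurred = cv2.GaussianBlur(src, (5, 5), 1.0)    # suppress high-frequency noise first
laplacian = cv2.Laplacian(blurred, cv2.CV_64F)  # then apply the Laplacian operator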
Prewitt Operator
import numpy
import cv2

# applyPrewitt is a helper (defined elsewhere in these notes) that returns the
# horizontal and vertical Prewitt edge responses of the source image.
edges_prewitt = applyPrewitt(src, 3)
edges_x = edges_prewitt.get('edges_x').astype(numpy.float64)
edges_y = edges_prewitt.get('edges_y').astype(numpy.float64)

# Compute gradient magnitude
magnitude = numpy.sqrt(edges_x**2 + edges_y**2)
# Gradient orientation (cv2.phase expects floating point arrays)
phase = cv2.phase(edges_x, edges_y, angleInDegrees=True)
Sobel Operator
• Like the Laplacian and Prewitt operators, the Sobel operator is another discrete differentiation operator. Discrete differentiation is only an approximation of continuous differentiation; therefore, the Sobel operator computes only an approximation of the gradient (derivative) of the image intensity function.
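A minimal sketch of the Sobel operator with OpenCV (src is again a stand-in grayscale image):

import cv2
import numpy as np

src = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in image
sobel_x = cv2.Sobel(src, cv2.CV_64F, 1, 0, ksize=3)   # derivative along x
sobel_y = cv2.Sobel(src, cv2.CV_64F, 0, 1, ksize=3)   # derivative along y

magnitude = np.sqrt(sobel_x**2 + sobel_y**2)
orientation = cv2.phase(sobel_x, sobel_y, angleInDegrees=True)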
Image Segmentation Approaches

Edge based:
• Works well when the edges are prominent.
Disadvantages:
• Cannot be applied to images having many or smooth edges.
• Not suitable for noisy images.

Region based:
• Better for noisy images where edges are hard to identify.
Disadvantages:
• A seed point must be specified.
• Different seeds may give different outputs.

Thresholding based:
• One of the best and easiest techniques.
Disadvantages:
• Cannot be applied to images with complex intensity distributions.
• Cannot process images with unimodal histograms.
Figure 4.1: Various approaches to image segmentation. (i) Edge based approaches depend on local changes in image intensity, but they cannot be applied to images that have smooth or far too many edges. (ii) Region based segmentation relies on a seed point, around which regions grow by checking whether neighbouring pixel intensities should be added or not, thus separating the regions. (iii) Thresholding based segmentation (a common pixel based method) involves calculating optimum thresholds which separate the different regions. Thresholding performs well on images with a bimodal intensity distribution, but cannot process images that have unimodal histograms.
Thresholding Techniques
Thresholding is one of the simplest operations for segmentation. Thresholding is used to separate an intended or target object from its background by selecting a threshold intensity value and categorizing every pixel as either a background pixel or a pixel that belongs to the object of interest.
For example, if we take a grayscale image and threshold it with a value T, all pixel values >= T become 1 and all pixels < T become 0, producing a binary image, as sketched below.
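A minimal sketch of such global thresholding with NumPy (image is an 8-bit grayscale array and T a chosen threshold):

import numpy as np

def threshold(image, T):
    # Pixels >= T become 1 (object), pixels < T become 0 (background).
    return (image >= T).astype(np.uint8)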
1. the weight (ω) of the background and foreground pixels. The weight of the background pixels for a threshold t is given by:

ω_b(t) = Σ_{i=0}^{t−1} p(i)

where p(i) is the probability of the intensity value i and H(i) is the histogram value, i.e. the frequency of the pixels with intensity i in the image. The weights must add up to one, i.e. ω_b + ω_f = 1.
2. the mean (µ) of the background and foreground pixels. The mean of the background pixels is given by:

µ_b(t) = (Σ_{i=0}^{t−1} i p(i)) / ω_b(t)
Step 4. Compute intraclass (within class) variance for the current threshold. This is sim-
ply the sum of the two variances (background and foreground) multiplied by their
associated weights.
ν_W(t) = ω_b(t) ν_b(t) + ω_f(t) ν_f(t)
Step 5. Update the stored intraclass (within class) variance if the value for the current threshold is lower than it. The desired threshold corresponds to the one with the minimum ν_W (W for within).
Step 6. Repeat steps 3 to 5 for all possible values of the threshold.
Step 7. Once the optimum threshold value is determined, change all background pixels
to 0 and all foreground pixels to 1 to get the segmented image.
A slight modification that allows for a much faster approach (because you skip through step 3.3 mentioned above) is to determine the maximum value of the interclass (between class) variance of the background and foreground:

ν_B(t) = ν − ν_W = σ² − σ²_W

ν_B(t) = ω_b ω_f (µ_b − µ_f)²
Here, we look for the maximum value of ν_B (B for between) to select the optimum value of the threshold. The threshold that yields the maximum ν_B also yields the minimum ν_W.
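A minimal sketch of this between class variance search in NumPy (image is an 8-bit grayscale array; it follows the faster ν_B formulation above):

import numpy as np

def otsu_threshold(image):
    hist = np.bincount(image.ravel(), minlength=256)
    p = hist / image.size                     # probability of each intensity
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w_b, w_f = p[:t].sum(), p[t:].sum()   # background / foreground weights
        if w_b == 0 or w_f == 0:
            continue
        mu_b = (np.arange(t) * p[:t]).sum() / w_b
        mu_f = (np.arange(t, 256) * p[t:]).sum() / w_f
        var_between = w_b * w_f * (mu_b - mu_f) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t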
The purpose of this step is to check whether the pixels along the same gradient direction are more or less intense than the one being processed. It helps to get rid of false edge detections by thinning the edges.
Step 4. Apply double threshold to determine potential edges
Post non-maximum suppression, remaining edge pixels provide a more accurate
representation of the real edges of objects in the image. However, some false
edge pixels may still exist due to noise and color variations. To account for these
spurious edges, perform double thresholding to filter out edge pixels with a weak
gradient value and preserve edges with a high gradient value. This differs from non-maximum suppression in that comparisons are made with respect to preset lower and upper bounds. If the edge pixel's gradient value is higher than the upper bound, it is marked as a strong edge pixel; if, on the other hand, it is higher than the lower bound but smaller than the upper bound, it is marked as a weak edge pixel. Pixels whose gradient values are lower than the lower bound are suppressed. There are no predefined values for the lower and upper bounds; these are determined experimentally depending on the content of the image.
Step 5. Track edge by hysteresis
As a final step, the edges are detected by suppressing those weak pixels that are not connected to any strong edges. Usually a weak edge pixel caused by a true edge will be connected to strong edge pixels, so this step eliminates any spurious weak pixels that may have resulted from noise or colour variations. To track the edge connection, blob analysis is applied by looking at a weak edge pixel and its 8-connected neighbourhood pixels. As long as at least one strong edge pixel is involved in the blob, that weak edge pixel is identified as one that should be preserved; otherwise it is suppressed.
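In practice the whole Canny pipeline is available in OpenCV; a minimal sketch (the random array stands in for a grayscale image, and the two bounds are example values to be tuned per image):

import cv2
import numpy as np

src = np.random.randint(0, 256, (64, 64), dtype=np.uint8)   # stand-in image
edges = cv2.Canny(src, 100, 200)   # lower and upper bounds for the double threshold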
Hough Transform
The Hough transform was originally described in the patent “Method and Means for Recognizing Complex Patterns”. When fitting a model (such as a line) to edge points, several difficulties arise:
• Extraneous data: Which points to fit to?
• Incomplete data: Only parts of the model are visible.
• Noise
The simplest case of the Hough transform (applied after edge detection) is detecting straight lines. In general, the straight line y = mx + c can be represented as a point (m, c) in the parameter space (the parameters being the slope m and the intercept c).
Line detection algorithm:
However, this would give rise to unbounded values of the slope parameter, meaning that −∞ < m < ∞. This would mean the accumulator would need to be massive to store all possible values of m. Thus, for computational efficiency, we resort to the Hesse normal form:
r = x cos θ + y sin θ
The intuition is that the line must be perpendicular to the straight line segment of length r that comes from the origin. The intersection point of the function line and the perpendicular line that comes from the origin is at Po = (r cos θ, r sin θ). So for any point P on the line, the vector P − Po must be orthogonal to the vector Po.
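A minimal sketch using OpenCV's implementation (the edge map would normally come from cv2.Canny; the vote threshold of 150 is an arbitrary example):

import cv2
import numpy as np

src = np.random.randint(0, 256, (128, 128), dtype=np.uint8)   # stand-in image
edges = cv2.Canny(src, 100, 200)
# Accumulator resolution: 1 pixel for r, 1 degree for theta.
lines = cv2.HoughLines(edges, 1, np.pi / 180, 150)
# Each detected line is returned as a pair (r, theta) in Hesse normal form.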
2. Region Splitting and Merging. This is a top-down approach where the algorithm completes in two steps: it starts with a bigger chunk and removes pixels as necessary to obtain the segment, and similar segments are then merged depending on the set criteria.
Region growing based techniques are better than edge-based techniques in noisy
images where edges are difficult to detect.
Here’s another example to show how erosion works. We start with the input image
I on which a 2x2 structuring element F traverses.
Input I =
1 1 1 1
1 1 0 1
0 1 1 1
0 1 0 0

Structuring element F =
1 1
1 0
Starting from the top left we traverse the structuring element across all the pixels
in a left-to-right and top-to-bottom basis and a new pixel value is computed after each
iteration such that the final output becomes:
Eroded E = I ⊖ F =
0 1 0 0
0 0 0 0
0 1 0 0
0 1 0 0
The following illustrates how the erosion process [I ⊖ F] works for the above input and filter.
(Step-by-step walkthrough: the 2x2 structuring element is slid across I from left to right and top to bottom, and the value computed at each position is shown in brackets.)
The following figure shows what the output of an erosion operation actually looks like:
Figure 4.6: Example of erosion on the original image (a) with a 3x3 (b) and a 5x5 (c) structuring element.
Figure 4.8: Dilation operation on an input image using a 2x2 structuring element
Here’s another example to show how dilation works. We start with the input image
I on which a 2x2 structuring element F traverses. The dot (.) represents the central
pixel viz. (0,0) for a 2x2 filter.
Input I =
0 1 0
1 0 0
0 0 0

Structuring element F =
1. 1
0  0
Starting from the top left we traverse the structuring element across all the pixels
in a left-to-right and top-to-bottom basis and a new pixel value is computed after each
iteration such that the final output becomes:
Dilated D = I ⊕ F =
0 1 1
1 1 0
0 0 0
The following illustrates how the dilation process [I ⊕ F] works for the above input and filter using the substitution method.
So Position_D(1) = {(0,1), (0,2), (1,0), (1,1)} can be used to compute the dilated output (notice that the vector addition and substitution methods give the same output).
Figure 4.9: Dilation Illustration on the original image (a) with a 3x3 (b) and a 5x5 (c)
structuring element.
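Both erosion and dilation are available in scipy.ndimage for binary images. A minimal sketch using the 4x4 input from the erosion example (note that scipy centres the structuring element by default, so the output can differ from the hand-worked examples depending on the chosen origin):

import numpy as np
from scipy import ndimage

I = np.array([[1, 1, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 1],
              [0, 1, 0, 0]])
F = np.array([[1, 1],
              [1, 0]])

eroded = ndimage.binary_erosion(I, structure=F).astype(np.uint8)
dilated = ndimage.binary_dilation(I, structure=F).astype(np.uint8)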
Dilation Properties:
Opening
• Erosion first then dilation with the same structuring element. Mathematically,
represented as A ◦ B = D(E(A, B)) = (A ⊖ B) ⊕ B
• Removes any narrow connections and lines between two regions.
• Renders the edges sharper and smoother.
Closing
• Dilation first then erosion with the same structuring element. Mathematically,
represented as A • B = E(D(A, B)) = (A ⊕ B) ⊖ B
• Removes noise in the form of small holes i.e. fills in any small black areas or
holes in the image while maintaining the shape and size of the object in the
image.
Morphological Gradient
Dilation and erosion have opposite effects: dilation adds a layer of pixels to the boundaries of regions, while erosion strips one away. The difference between the dilation and the erosion of an image is termed the morphological gradient.
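A minimal sketch of opening, closing and the morphological gradient with scipy.ndimage (I and F as in the erosion/dilation sketch above):

from scipy import ndimage

opened = ndimage.binary_opening(I, structure=F)    # erosion followed by dilation
closed = ndimage.binary_closing(I, structure=F)    # dilation followed by erosion
gradient = (ndimage.binary_dilation(I, structure=F).astype(int)
            - ndimage.binary_erosion(I, structure=F).astype(int))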
5. Image Compression
Compression, at its core, is the reduction of the original (file) size of the image. Compression encodes an original image with a smaller number of bits. The primary objective of image compression is to eliminate or minimize the redundancy (occurrence of similar bits) in the image, which allows for optimum resource utilization. For example, smaller images require lower bandwidth to be transmitted across a channel, which makes the transmission more efficient; on a similar note, storing a compressed image minimizes unnecessary use of storage.
Compression Ratio
Compression ratio is the ratio of the original or uncompressed image file to the reduced
or compressed file. The size of the compressed file depends on the compression ratio
as related by the following equation:
Compression ratio (C) = Size of uncompressed file / Size of compressed file

Compression ratio is often written as SIZE_U : SIZE_C.
Example#1 The original image is 256x256 pixels, single-band (grayscale), 8 bits per pixel. This file is 65,536 bytes (64k). After compression the image file is 6,554 bytes. The compression ratio is SIZE_U : SIZE_C, i.e. 65536 : 6554 = 9.99 ≈ 10 : 1. This is read as “10 to 1 compression” or “x10” compression.
Another way of stating this compression is to use the terminology of bits per pixel (bpp):

bpp = Number of bits / Number of pixels

So in the above case,

bpp = (8 × 6554) / (256 × 256), which yields 0.8.
CR is clearly a relative measure, while bpp is an absolute measure that represents the average number of bits needed to encode the information of each image pixel. CR is often represented as a normalized ratio such as 2 : 1, meaning that the compressed file is half the size of the original.
For compressed images, as they are usually transformed into different representations, the bpp is evaluated indirectly by taking the following average:

bpp = SIZE_C (in bits) / N_pixels

So, as the number of pixels (N_pixels) remains unchanged, CR can be related to bpp as follows:

CR = bpp_U / bpp_C

For the above example, CR is given by 8/0.8 = 10. Since it's a ratio, we write this as 10 : 1.
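The numbers above can be reproduced with a few lines of Python (file sizes in bytes as given in the example):

size_u_bytes, size_c_bytes = 65536, 6554
n_pixels = 256 * 256

cr = size_u_bytes / size_c_bytes       # ~10 : 1
bpp = (size_c_bytes * 8) / n_pixels    # ~0.8 bits per pixel
print(round(cr, 2), round(bpp, 2))     # 10.0 0.8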
Redundancy in Image Data
1. Coding redundancy
• Occurs when the data used to represent the image is not utilized in an opti-
mal fashion.
• If the gray levels of an image are coded in a way that uses more code sym-
bols than absolutely necessary to represent each gray level, then the result-
ing image is said to contain coding redundancy.
2. Interpixel redundancy
• Occurs because adjacent pixels tend to be highly correlated; in most images, the brightness levels do not change rapidly but change gradually.
3. Interband redundancy
• Occurs in color images due to the correlation between bands within an image: if we extract the red, green and blue bands, they look similar.
4. Psychovisual redundancy
• Some information is more important to the human visual system than other
types of information.
A typical compression system consists of two stages. Encoding: the input image I(r, c) passes through preprocessing / data reduction, mapping, quantization and coding to produce the compressed file. Decoding: the compressed file passes through decoding, inverse mapping and post-processing to produce the decoded image.
Lossless Compression
• Image size is reduced without any quality loss
• Lossless compression algorithms usually do not remove any pixels, instead they
just group similar pixels together. This renders lossless compression reversible.
• The biggest benefit to lossless compression is that the quality of the image is
retained and we can still achieve a smaller file size.
• The downside to lossless compression is that the file size, even though smaller than the original, can still be quite big. These days Google's WebP format is combined with lossless compression to achieve a significant reduction in file size while keeping the image quality the same.
Here is a list of a few lossless algorithms:
• Run-length encoding
• Huffman-coding
• Arithmetic Coding
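As an illustration of the simplest of these, here is a minimal sketch of run-length encoding on a 1D sequence of pixel values (a decoder would simply expand each (value, count) pair back into a run):

def rle_encode(pixels):
    # Collapse runs of identical values into [value, run_length] pairs.
    encoded = []
    for value in pixels:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded

print(rle_encode([5, 5, 5, 7, 7, 0, 0, 0, 0]))   # [[5, 3], [7, 2], [0, 4]]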
Lossy Compression
• Some of the data from the original image is lost.
• Images that have undergone lossy compression cannot be restored to the original, meaning that lossy compression algorithms are irreversible.
• If the compression is repeated a number of times, more degradation occurs each time, which isn't the case with non-lossy compression.
• Transform encoding
MRI
sMRI
sMRI, short for structural MRI, is the most commonly used MRI modality which is
used to record the anatomical structure of tissues i.e. to display the contrast between
different tissues in the brain.
dMRI
Diffusion MRI creates contrast based on the direction of diffusion of water molecules in the brain. This technique is used to map brain structures such as the white matter
tracts in the brain. WM tracts are insulated and transmit electrical signals over long
distances. This allows communications between brain regions with high bandwidth
and speed. Brain regions are made up of gray matter. GM is regarded as the brain’s
substrate for computation and memory. The magnitude and direction of diffusion of
water molecules measured with dMRI are used to provide us with an idea of the struc-
ture and directionality of WM tracts. Mapping WM tracts in the brain is called WM
tractography.
fMRI
Functional MRI records the activity in the brain, i.e. the functional connectivity between brain regions. Unlike sMRI, fMRI does not use the properties of hydrogen to create contrast but uses the blood-oxygen-level-dependent (BOLD) signal. The BOLD signal measures the amount of blood flow and blood oxygenation in different regions of the brain over time. An increase in energy consumption in a certain brain region increases the blood flow and the transportation of oxygen to that region. fMRI thus indirectly records neural activity, either while a person is performing some task inside the MRI scanner (like watching a movie) or while in the resting state, performing no task at all. fMRI signals are spatio-temporal (4D).
EEG
EEG records brain signals using electrodes.