Typical Tasks
There are four major categories in computer vision: recognition tasks, motion analysis,
image restoration and geometry reconstruction. The following figure illustrates those
tasks.
Recognition tasks
There are different types of recognition tasks in computer vision. Typical tasks involve the
detection of objects, persons, poses, or images. Object recognition deals with the estima-
tion of different classes of objects that are contained in an image (Zou et al., 2019). For
instance, a very basic classifier could be used to detect whether there is a hazardous mate-
rial label in an image or not. A more specific classifier could additionally recog-
nize information about the label type such as “flammable” or “poison.” Object recognition
is also important in the area of autonomous driving to detect other vehicles or pedes-
trians.
In object identification tasks, objects or persons that are in an image are identified using
unique features (Barik & Mondal, 2010). For person identification, for example, a computer
vision system can use characteristics, such as fingerprint, face or handwriting. Facial rec-
ognition, for instance, uses biometric features from an image and compares them to the
biometric features of other images from a given database. Person identification is com-
monly used to verify the identity of a person for access control.
Pose estimation tasks play an important role in autonomous driving. The goal is to esti-
mate the orientation and/or position of a given object relative to the camera (Chen et al.,
2020). This can, for instance, be the distance to another vehicle ahead or an obstacle on
the road.
In classical odometry, motion sensors are used to estimate the change of the position of
an object over time. Visual odometry, on the other hand, analyzes a sequence of images to
gather information about the position and orientation of the camera (Aqel et al., 2016).
Autonomous cleaning bots can, for instance, use this information to estimate the location
in a specific room.
In tracking tasks, an object is located and followed in successive frames. A frame can be
defined as a single image in a longer sequence of images, such as videos or animations
(Yilmaz et al., 2006). This can, for instance, be the tracking of people, vehicles, or animals.
Image restoration deals with the process of recovering a blurry or noisy image to an image
of better and clearer quality. This can, for instance, be old photographs, but also movies
that were damaged over time. To recover the image quality, filters like median or low-pass
filters can remove the noise (Dhruv et al., 2017). Nowadays, methods from image restoration can also be used to restore missing or damaged parts of an artwork.

Noise
In computer vision, noise refers to a quality loss of an image which is caused by a disturbed signal.

Geometry reconstruction tasks
In computer vision, there are five major challenges that must be tackled (Szeliski, 2022):
• The illumination of an object is very important. If lighting conditions change, this can
yield different results in the recognition process. For instance, red can easily be
detected as orange if the environment is bright.
• Differentiating similar objects can also be difficult in recognition tasks. If a system is
trained to recognize a ball it might also try to identify an egg as a ball.
• The size and aspect ratios of objects in images or videos pose another challenge in com-
puter vision. In an image, objects that are further away will appear to be smaller than
closer objects even if they are the same size.
• Algorithms must be able to deal with rotation of an object. If we look for instance at a
pencil on a table, it can look like a line when viewed from the top or like a circle
when we change to a different perspective.
• The location of objects can vary. In computer vision, this effect is called translation.
Going back to our example of the pencil, it should not make a difference to the algo-
rithm if the pencil is located on the center of a paper or next to it.
Because of these challenges, there is much research towards algorithms that are scale-, rotation-, and/or translation-invariant (Szeliski, 2022).
Pixels
Images are constructed as a two-dimensional pixel array (Lyra et al., 2011). A pixel is the
smallest unit of a picture. The word originates from the two terms "picture" (pix) and "element" (el) (Lyon, 2006). A pixel is normally represented as a single square with one
color. It becomes visible when zooming deep into a digital image. You can see an example
of the pixels of an image in the figure below.
The resolution of an image specifies the number of pixels it contains. The higher the resolution, the more detail the image shows. Conversely, if the resolution is low, the picture might look fuzzy or blurry.
Color representations
There are various ways to represent the color of a pixel as a numerical value. The easiest
way is to use monochrome pictures. In this case, the color of a pixel will be represented by
a single bit, being 0 or 1. In a true color image, a pixel will be represented by 24 bits.
The following table shows the most important color representations with the corresponding number of available colors (color depth).
One way to represent colors is the RGB color representation. We illustrate this using the
24-bit color representation. Using RGB, the 24 bits of a pixel are separated in three parts,
each 8 bits in length. Each of those parts represents the intensity of a color between 0 and
255. The first part represents red (R), the second green (G), and the last blue (B). Out of these three components, all other colors can be mixed additively. For instance, the color code
RGB(0, 255, 0) will yield 100 percent green. If all values are set to 0, the resulting color will
be black. If all values are set to 255 it will be white. The figure below illustrates how the
colors are mixed in an additive way.
Figure 19: Additive Mixing of Colors
Another way to represent colors is the CMYK model. In contrast to the RGB representation
it is a subtractive color model comprised of cyan, magenta, yellow and key (black). The
color values in CMYK range from 0 to 1. Therefore, to convert colors from RGB to CMYK, the
RGB values first have to be divided by 255. The values of key (black), cyan, magenta, and yellow can then be computed as follows:
$$K = 1 - \max\left(\frac{R}{255}, \frac{G}{255}, \frac{B}{255}\right)$$

$$C = \frac{1 - \frac{R}{255} - K}{1 - K}, \quad M = \frac{1 - \frac{G}{255} - K}{1 - K}, \quad Y = \frac{1 - \frac{B}{255} - K}{1 - K}$$
While the RGB is better suited for digital representation of images, CMYK is commonly
used for printed material.
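To make the conversion concrete, here is a minimal Python sketch of the formulas above (the function name and the example calls are illustrative only):

```python
def rgb_to_cmyk(r: int, g: int, b: int):
    """Convert 8-bit RGB values (0-255) into CMYK values in the range 0-1."""
    # Normalize the RGB channels to the range 0-1.
    r_n, g_n, b_n = r / 255, g / 255, b / 255
    # The key (black) component is determined by the brightest channel.
    k = 1 - max(r_n, g_n, b_n)
    if k == 1:  # pure black: avoid division by zero
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r_n - k) / (1 - k)
    m = (1 - g_n - k) / (1 - k)
    y = (1 - b_n - k) / (1 - k)
    return c, m, y, k

print(rgb_to_cmyk(0, 255, 0))      # pure green -> (1.0, 0.0, 1.0, 0.0)
print(rgb_to_cmyk(255, 255, 255))  # white      -> (0.0, 0.0, 0.0, 0.0)
```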
Images as functions
We will now discuss how an image can be built from single pixels. To do that, we need a
function that can map a two-dimensional coordinate (x,y) to a specific color value. On the
x-axis we begin on the left with a value of 0 and continue to the right until the maximum
width of an image is reached. On the y-axis, we begin with 0 at the top and reach the
height of an image at the bottom.
Let us look at the function f(x, y) for an 8-bit grayscale image. The function value f(42, 100) = 0 would mean that we have a black pixel 42 pixels to the right of and 100 pixels below the starting point. In a 24-bit image, the result of the function would be a triple indicating the RGB intensities of the specified pixel.
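As a small illustration of this view, the sketch below (assuming NumPy is available) treats a tiny grayscale array as the function f(x, y); note that NumPy arrays are indexed as (row, column), i.e., (y, x).

```python
import numpy as np

# A tiny 8-bit grayscale "image" as a 2D array (rows correspond to y, columns to x).
image = np.array([
    [  0,  64, 128],
    [192, 255,  32],
], dtype=np.uint8)

def f(x: int, y: int) -> int:
    """Treat the image as a function f(x, y): x grows to the right, y downwards."""
    return int(image[y, x])  # note the (row, column) = (y, x) indexing

print(f(0, 0))  # 0  -> black pixel at the top-left starting point
print(f(2, 1))  # 32 -> pixel 2 to the right of and 1 below the starting point
```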
Filters
Filters play an important role in computer vision when it comes to applying effects to an image, implementing techniques like smoothing or inpainting, or extracting useful information from an image, like the detection of corners or edges. A filter can be defined as a func-
tion that gets an image as an input, applies modifications to that image, and returns the
filtered image as an output (Szeliski, 2022).
2D convolution
The convolution of an image I with a kernel k with a size of n and a center coordinate a can
be calculated as follows:
$$I'(x, y) = \sum_{i=1}^{n} \sum_{j=1}^{n} I(x + i - a,\; y + j - a) \cdot k(i, j)$$

where $I'(x, y)$ is the value of the resulting image $I'$ at position $(x, y)$ and $I$ is the original image. The center coordinate for a 3x3 convolution matrix is 2, for a 5x5 convolution
matrix 3 and so forth. To understand the process, we will use the following example of a
3x3 convolution. The kernel matrix used for the convolution is shown in the middle col-
umn of the figure.
Figure 20: 2D Image Convolution
The kernel matrix is moved over each position of the input image. In our input image the
current position is marked orange. In our example we start with the center position of the
image and multiply the image values at this position with the values of the kernel matrix. The
resulting value for the center position of our filtered image is computed as follows:
0 · 41 + 0 · 26 + 0 · 86 + 0 · 27 + 0 · 42 + 1 · 47 + 0 · 44 + 0 · 88 + 0 · 41 = 47
In the next step, we shift the kernel matrix to the next position and compute the new value
of the filtered image:
0 · 26 + 0 · 86 + 0 · 41 + 0 · 42 + 0 · 47 + 1 · 93 + 0 · 88 + 0 · 41 + 0 · 24 = 93
The bottom row in our figure shows the result after all positions of the image have been
multiplied with the kernel matrix.
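The following sketch (assuming NumPy) implements this convolution naively by sliding the kernel over every interior position; the kernel at the end has a single 1 to the right of its center and therefore reproduces the behavior of the worked example, copying the right-hand neighbor of each pixel into the result.

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 2D convolution; border pixels are left unchanged (no padding)."""
    n = kernel.shape[0]   # kernel size (assumed square with odd side length)
    a = n // 2            # offset of the kernel center
    out = image.astype(float).copy()
    height, width = image.shape
    for y in range(a, height - a):
        for x in range(a, width - a):
            region = image[y - a:y + a + 1, x - a:x + a + 1]
            out[y, x] = np.sum(region * kernel)
    return out

# A kernel with a single 1 to the right of the center copies the right-hand
# neighbor into each position, as in the worked example above (47, 93, ...).
kernel = np.array([[0, 0, 0],
                   [0, 0, 1],
                   [0, 0, 0]], dtype=float)
```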
Padding techniques
If convolution techniques are applied to images, we face the problem that in the first and
last rows and columns of an image there will not be enough values to apply the matrix
multiplication with the convolution matrix. To solve this, we can add additional values at
the border of our input images. This process is referred to as padding (Szeliski, 2022).
There are three padding techniques that are commonly used: constant, replication, and
reflection padding.
In constant padding, a constant number (e.g., zero) is used to fill the empty cells. Replica-
tion padding uses a replication of the values from the nearest neighboring cells. In reflec-
tion padding, the value from the opposite side of a pixel is used to fill the cell. For instance,
the cell on the top left will be filled with the value on the bottom right (Szeliski, 2022).
The figure above illustrates how the three padding techniques are applied to an image.
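NumPy's padding modes roughly correspond to these three techniques, as the following sketch shows (the exact reflection convention may differ slightly from the description above):

```python
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

# Constant padding: fill the border with a fixed value (here zero).
print(np.pad(image, pad_width=1, mode="constant", constant_values=0))
# Replication padding: repeat the value of the nearest border pixel.
print(np.pad(image, pad_width=1, mode="edge"))
# Reflection padding: mirror the values at the image border.
print(np.pad(image, pad_width=1, mode="reflect"))
```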
Distortion
Image processing in computer vision is normally done with the assumption that an image
we receive from a camera is a linear projection of a scene. That means that if we have a
straight line in the real world we can expect it to be a straight line in the digital representa-
tion of the image (Szeliski, 2022). However, in practical scenarios camera lenses often
cause distortion. There are two kinds of distortion, radial and tangential, which are explained in the following.
Radial distortion
Radial distortion appears when lines that are normally straight bend towards the edge of
the camera lens (Wang et al., 2009). The intensity of the distortion depends on the size of
the lens. With smaller lenses we will find higher distortion. Moreover, radial distortion is
also more dominant when wide-angle lenses are used. In general, there are four types of
radial distortion (Szeliski, 2022):
1. Barrel distortion/positive radial distortion: Lines in the center of an image are bent to
the outside.
2. Pincushion distortion/negative radial distortion: Lines in the center of an image are
bent to the inside.
3. Complex distortion/mustache radial distortion: Lines with a combination of positive
and negative distortion.
4. Fisheye radial distortion: Occurs with ultra wide-angle lenses, e.g., a peephole.
Tangential distortion
Besides radial distortion, tangential distortion is another effect that can often be observed
in digital imaging. Tangential distortion is caused if the image sensor unit and the camera
lens are not properly aligned. If the camera lens and the image plane are not parallel, the
distortions will look as shown in the graphic below.
Figure 23: Tangential Distortion
To address distortion in digital image processing, mathematical models like the Brown-
Conrady model (Brown, 1966) can be used to describe and correct the effects of the distor-
tion. To be able to apply those models, it is important that the extrinsic and intrinsic
parameters of the camera are known. These parameters can be determined by calibration.
Calibration
Camera calibration estimates the extrinsic and intrinsic parameters of a camera (Szeliski,
2022). The calibration makes it possible to extract distortion from the images.
Extrinsic characteristics of a camera are, for instance, the orientation in real world coordi-
nates and the position of the camera. The intrinsic characteristics include parameters
such as the optical center, the focal length, and the lens distortion parameters.
If the camera is calibrated properly, distortion can reliably be removed from images, which allows us, for instance, to measure distances and sizes in those images in units such as meters and, therefore, to reconstruct a 3D model of the underlying real-world scene.
Techniques
Figure 24: Principle of the Pinhole Camera
The projection of a point from the 3D real world onto the 2D image plane can be described in two steps:
1. Transform the coordinates from the 3D world to the 3D camera coordinates. For this step, extrinsic parameters, such as the rotation and translation of the camera, are used.
2. Transform the 3D camera coordinates to the 2D image coordinates. In this step, intrin-
sic parameters, such as focal length, distortion parameters, and optical center are
applied.
To map the 3D coordinates from the real world to a two-dimensional image, a 3x4 projec-
tion matrix (often referred to as a camera matrix) is used. When we multiply the 3D coordi-
nates with this matrix, we will receive the 2D coordinates of the projected point on the
image plane.
The figure below illustrates the steps of the projection process when 3D real world coordi-
nates are transformed to the 2D image coordinates.
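The following sketch illustrates this matrix multiplication with made-up parameters; the focal length, optical center, rotation, and translation values are purely illustrative.

```python
import numpy as np

# Hypothetical intrinsic matrix K (focal lengths fx, fy and optical center cx, cy).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
# Hypothetical extrinsic parameters: identity rotation and a small translation.
R = np.eye(3)
t = np.array([[0.1], [0.0], [0.0]])

P = K @ np.hstack((R, t))                    # 3x4 projection (camera) matrix

X_world = np.array([1.0, 2.0, 5.0, 1.0])     # 3D point in homogeneous coordinates
x = P @ X_world                              # homogeneous 2D image coordinates
u, v = x[0] / x[2], x[1] / x[2]              # divide by the third component
print(u, v)                                  # pixel coordinates of the projected point
```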
To apply the projection steps illustrated above, we need to know the intrinsic and extrinsic
camera parameters. These can be estimated using camera calibration. To understand the
practical implementation of the calibration process, we will look at Zhang's flexible technique for camera calibration (Zhang, 2000).
This technique uses two or more images as an input as well as the size of the object. A
good object for camera calibration is, for instance, a checkerboard. After the calibration
process, we obtain the extrinsic parameters (rotation and translation) and the intrinsic camera parameters (optical center, focal length, and distortion coefficients). The calibration proceeds in the following steps:
1. Select at least two sample images, which should be well-structured patterns, such as
a checkerboard pattern.
2. Identify distinctive points in each image. If we use a checkerboard pattern, this can,
for instance, be the corners of the individual squares. Because of the clear structure of
the checkerboard pattern with the black and white squares, the corners are easy to
detect. They have a high gradient at the corners in both directions.
3. Localize the corners of the squares. For the checkerboard pattern, this can be
done in a very robust manner. To be able to identify the 3D coordinates of the corners
in the 3D real world, we need to know the size of the checkerboard and need two or
more sample images. Moreover, we know the 2D coordinates of the corners in the
image from the picture that was taken by the camera. Using this information, we can
calculate the camera matrix and the distortion coefficients. The distortion coefficients can then be used, for instance with the Brown-Conrady model (Brown, 1966), to correct the distortion.
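A minimal sketch of this procedure with OpenCV is shown below; the image folder, the checkerboard dimensions, and the square size are assumptions, and cv2.findChessboardCorners and cv2.calibrateCamera stand in for the corner detection and parameter estimation steps described above.

```python
import glob
import cv2
import numpy as np

pattern_size = (9, 6)   # inner corners of the checkerboard (assumed)
square_size = 0.025     # edge length of one square in meters (assumed)

# 3D coordinates of the corners in the checkerboard's own coordinate system (z = 0).
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2) * square_size

obj_points, img_points = [], []
for path in glob.glob("calibration_images/*.png"):   # hypothetical input folder
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Estimate the camera matrix and the distortion coefficients from all views.
ret, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(camera_matrix)   # intrinsic parameters
print(dist_coeffs)     # radial and tangential distortion coefficients
```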
5.3 Feature Detection
In the context of computer vision, features can be defined as points of interest of an
image, which contain the required information to solve a respective problem (Hassaballah
et al., 2016). To find those features in a picture, there exists a large variety of feature detec-
tion algorithms. Once the features are detected, the semantic information about them can
be extracted. The coordinates of a feature, i.e., the position at which it is located in an image, are called the feature keypoint. The semantic information extracted about a feature is stored in a
vector, which is also called a feature descriptor or feature vector. The detection and extrac-
tion of features is often an important part of the preprocessing in machine learning appli-
cations. The extracted feature vectors can subsequently be used as an input for image
classification. In motion tracking or recognition of individuals or similar objects in multi-
ple images, feature matching can be used.
The most common types of features are blobs, edges, and corners. Blobs are formed by a
group of pixels that have some properties in common. Regions that differ in properties
belong to different blobs. This can, for instance, be different color or brightness compared
to the areas surrounding a region. Edges are indicated by a significant change of the
brightness of pixels. They can be identified by a discontinuity of the image intensity, i.e., a
sudden change in the brightness of an image (Jain et al., 1995). Corners are the connec-
tion between two edges. The image below illustrates the difference between blobs (blue),
edges (red), and corners (yellow).
If we want to detect all tomatoes in the picture, we can use an algorithm to detect all
blobs. However, there will still be the challenge of distinguishing tomatoes from other
round objects, like olives or cucumbers. This challenge can be tackled if we use a feature
description algorithm to extract the information that is characteristic of a tomato and con-
struct a feature descriptor from this information. The feature descriptor could, for
instance, include information about the surrounding n pixel values or the color of the pix-
els.
Once we have the feature descriptor for our tomato candidate, it is possible to compare it with other feature descriptors from tomato images using a feature matching algorithm. This feature matching algorithm allows us to detect all the tomatoes in the
image. As we have seen in our example, feature engineering is usually performed in three
steps:
1. Feature detection
2. Feature description/extraction
3. Feature matching
Feature detection
To detect features such as edges or corners, there exist several methods. To detect edges
in images, 2D convolution can be used. Edges are characterized by a significant difference
of the pixel values to the surrounding pixels. If we look at an edge, there will be a clear
difference in brightness and/or color compared to the surrounding pixels.
The figure above shows an example of edge detection. The edge between the road and the
surrounding grass is clearly visible in this example. In the upper left part of the zoomed-in image, we can see some variations of dark green colors, while the lower right part is filled with
variations of light gray. The edge separates both parts of the image.
Two techniques that are commonly used for edge detection are the Canny edge detector
and the Sobel filter. The Canny edge detection (Canny, 1986) analyzes the change between
pixel values. For this purpose, it uses the derivatives of the x and y coordinates. The algo-
rithm works with two-dimensional values, i.e., it works only on single-channel images such as grayscale images. The figure below shows the result of the Canny edge detection in
our example picture.
Figure 29: Example for Canny Edge Detection
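With OpenCV, Canny edge detection can be applied in a few lines; the file name and the two hysteresis thresholds below are illustrative choices.

```python
import cv2

# Canny expects a single-channel image, so the picture is loaded as grayscale.
image = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(image, threshold1=100, threshold2=200)  # lower/upper hysteresis thresholds
cv2.imwrite("road_edges.png", edges)
```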
When using Sobel filters for edge detection, two special kernel matrices are used, one for
each of the axes. These Sobel operators use convolution to transfer the original image into
a gradient image. High frequencies in the gradient image indicate areas with the highest
changes in pixel intensity which are likely to be edges. Therefore, in a second step, the
algorithm is often combined with a threshold function to detect the edges. The figure
below shows the Sobel edge detection for the x and y direction.
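A sketch of this approach is shown below, using the two standard 3x3 Sobel kernels; the input file name and the threshold value are illustrative.

```python
import cv2
import numpy as np

# The standard 3x3 Sobel kernel for the x direction; its transpose covers the y direction.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
sobel_y = sobel_x.T

image = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)
gx = cv2.filter2D(image, -1, sobel_x)   # gradient image in x direction
gy = cv2.filter2D(image, -1, sobel_y)   # gradient image in y direction

# Gradient magnitude; applying a threshold turns it into a binary edge map.
magnitude = np.sqrt(gx ** 2 + gy ** 2)
edges = (magnitude > 150).astype(np.uint8) * 255   # threshold chosen arbitrarily
cv2.imwrite("road_sobel_edges.png", edges)
```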
For corner detection in images, one of the most prominent algorithms is the Harris corner
detection (Harris & Stephens, 1988). This algorithm analyzes the change of the pixel values
in a sliding window that is moved in different directions. The sliding window can be as
small as, for instance, 7x7 pixels. The figure illustrates how flat areas, edges, and corners
can be detected using the sliding window technique.
Figure 31: Harris Corner Detection
The left image shows the window in a flat area with no edges or corners. In the underlying
window, there is no significant change in the values of the pixels if the window is moved
in any direction. In the middle image, the window lies on an edge but does not touch another edge. This means we only have a change in the pixel values when we move the window in the horizontal direction. If we move the window in the vertical direction, there will be no changes in the pixel values. In the image on the right, the sliding window is moved over a corner. In this case, we will have a significant change in the pixel values no matter in which direction we move the window.
Therefore, if we want to detect corners, we have to find the window where the change of
the underlying pixels is maximized in all directions. To formalize this idea mathematically,
Harris corner detection uses the Sobel operators which were explained previously.
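In practice, Harris corner detection is available, for instance, in OpenCV; the sketch below uses a 7x7 window, and the file name and parameter values are illustrative.

```python
import cv2
import numpy as np

image = cv2.imread("checkerboard.png")                     # hypothetical input image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).astype(np.float32)

# blockSize: size of the sliding window, ksize: Sobel aperture, k: Harris free parameter.
response = cv2.cornerHarris(gray, blockSize=7, ksize=3, k=0.04)

# Keep positions whose corner response exceeds 1 percent of the maximum response.
corners = response > 0.01 * response.max()
image[corners] = (0, 0, 255)                               # mark detected corners in red
cv2.imwrite("corners.png", image)
```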
Feature description
For further processing of the features detected in the feature detection step, it is important
to be able to describe those features in a way that a computer can use them and distin-
guish one from another. For this purpose, we use feature vectors/feature descriptors,
which contain semantic information about the features. One possibility to describe fea-
tures is the Binary Robust Independent Elementary Features (BRIEF) algorithm (Calonder
et al., 2010). To describe a feature, a binary vector is used.
The vector is constructed from an image patch, i.e., a square region with a set pixel width and height, by comparing the intensities of pairs of pixels. First, the patch p is smoothed. Afterwards, the pixel intensity p(x) at position x is computed. In a test τ, the result of the comparison of two positions x and y is coded into a binary value according to the following equation:
$$\tau(p; x, y) := \begin{cases} 1 & \text{if } p(x) < p(y) \\ 0 & \text{otherwise} \end{cases}$$
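A toy version of this test is sketched below; the patch size, the number of comparisons, and the uniform random sampling of point pairs are simplifying assumptions (BRIEF smooths the patch first and draws the pairs from a fixed sampling pattern).

```python
import numpy as np

def brief_descriptor(patch, pairs):
    """Binary descriptor built from intensity comparisons tau(p; x, y) on an image patch."""
    # In the full BRIEF algorithm, the patch would be smoothed (e.g., with a Gaussian) first.
    bits = [1 if patch[y1, x1] < patch[y2, x2] else 0 for (x1, y1), (x2, y2) in pairs]
    return np.array(bits, dtype=np.uint8)

rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(31, 31))        # toy 31x31 image patch
pairs = [((rng.integers(0, 31), rng.integers(0, 31)),
          (rng.integers(0, 31), rng.integers(0, 31))) for _ in range(256)]
descriptor = brief_descriptor(patch, pairs)        # 256-bit binary feature vector
```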
The major advantage of the BRIEF algorithm is that it is fast to compute and easy to imple-
ment. However, feature extraction for features that are rotated more than 35 degrees is no
longer accurate (Hassaballah et al., 2016). Algorithms like Oriented FAST and Rotated
BRIEF (ORB) try to overcome this limitation (Rublee et al., 2011).
Another algorithm for feature description is the SIFT algorithm (Scale-Invariant Feature
Transform) (Lowe, 1999). The SIFT algorithm has been enhanced by the SURF algorithm
(Speeded-Up Robust Features) (Bay et al., 2008), which provides a performance improved
variation of the SIFT algorithm. However, as both algorithms have been patented, they
cannot be used as freely as, for instance, ORB. Additionally, compared to ORB, their accuracy is lower and their computational cost is higher (Rublee et al., 2011).
Feature matching
The goal of feature matching is to identify similar features in different images. This could,
for instance, be when detecting the same person in different scenarios. Feature matching
is an important component in tasks like camera calibration, motion tracking, object recog-
nition, and tracking.
One very simple technique for feature matching is brute force matching, which compares
the feature descriptors of the source and target images and computes the distance between those descriptors (Jakubovic & Velagic, 2018). For numeric feature vectors, we can use the Euclidean distance (Wang et al., 2005). For binary vectors, such as those generated by the BRIEF algorithm, the Hamming distance is an appropriate
approach to calculate the distance (Torralba et al., 2008).
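As an illustration, the OpenCV sketch below detects ORB keypoints in two hypothetical images and matches their binary descriptors by brute force using the Hamming distance.

```python
import cv2

img1 = cv2.imread("scene_a.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input images
img2 = cv2.imread("scene_b.jpg", cv2.IMREAD_GRAYSCALE)

# ORB produces binary descriptors, so the Hamming distance is the appropriate metric.
orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)   # brute force matcher
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matches, best distance: {matches[0].distance}")
```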
Especially when dealing with large datasets and high dimensional feature vectors, Fast
Library for Approximate Nearest Neighbors (FLANN) provides a more sophisticated
method for feature matching. It contains a set of algorithms using a nearest neighbors
search and has lower computational costs than brute force matching. The most appropri-
ate algorithm is automatically selected depending on the dataset. However, it is less accu-
rate than brute force matching (Muja & Lowe, 2009).
According to Hassaballah et al. (2016), there are several characteristics a good algorithm
for feature detection and extraction from images should have: robustness, repeatability,
accuracy, generality, efficiency and quantity. The characteristics are explained in the table
below.
Accuracy: Accurate localization of a feature in an image based on its pixel position.
When performing feature detection and extraction on an image, there are several chal-
lenges. While humans can easily identify objects no matter how they are located or lit,
those differences can pose a great challenge for a computer. Therefore, there is still much
ongoing research to develop algorithms that are less prone to factors, such as noise, vary-
ing lighting conditions, changes of camera perspectives, rotation or translation of objects,
and changes of scale.
To perform semantic segmentation, the algorithm receives an image with one or more
objects as an input, and outputs an image where each pixel is labeled according to its cat-
egory. The figure below illustrates how semantic segmentation can be applied to an
image. In the image, every pixel is either categorized as background, chair, or coffee table.
Figure 32: Example for Semantic Segmentation
The convolutional part of the network is used for feature extraction. It transforms the
image from the input into a multidimensional representation of its features. The deconvo-
lution network uses the features that have been extracted from the convolution network
to generate the shapes of the object segmentation. Its unpooling and deconvolution lay-
ers are used to identify class labels based on the pixels and predict the segmentation
masks. It generates a probability map as an output, which has the same size as the input
image. For each pixel this probability map indicates the probability of it belonging to one
of the given classes (Noh et al., 2015). Additionally, to refine the label map, it is possible to apply fully connected conditional random fields to the output of the network (Krähenbühl & Koltun, 2012).

Conditional random fields
An undirected probabilistic model that also considers neighboring samples for classification is known as a conditional random field.

Use Cases

Semantic image segmentation can be helpful in many use cases:
SUMMARY
Computer vision is an interdisciplinary field that combines methods
from computer science, engineering, and artificial intelligence. It dates
back to the 1960s when researchers first tried to mimic the visual system
of humans. Typical tasks in computer vision deal with topics such as rec-
ognition tasks, image restoration, motion analysis, and geometry recon-
struction.
In computer vision, images are represented using pixels. Models like the
Brown-Conrady model can be used to address the distortion of digital
images. Besides that, it is also important to know the calibration param-
eters of a camera to address radial and tangential distortion.
BACKMATTER
LIST OF REFERENCES
Aqel, M. O. A., Marhaban, M. H., Saripan, M. I., & Ismail, N. B. (2016). Review of visual odom-
etry: Types, approaches, challenges, and applications. SpringerPlus, 5(1), 1897. https:/
/doi.org/10.1186/s40064-016-3573-7
Barik, D., & Mondal, M. (2010). Object identification for computer vision using image seg-
mentation. In V. Mahadevan & G. S. Tomar (Eds.), ICETC 2010.2010 2nd international
conference on education technology and computer. IEEE. https://doi.org/10.1109/ICET
C.2010.5529412
Bay, H., Ess, A., Tuytelaars, T., & van Gool, L. (2008). Speeded-up robust features (SURF).
Computer Vision and Image Understanding, 110(3), 346–359.
Beel, J., Gipp, B., Langer, S., & Breitinger, C. (2016). Research-paper recommender systems:
A literature survey. International Journal on Digital Libraries, 17(4), 305–338. https://do
i.org/10.1007/s00799-015-0156-0
Buchanan, B. G. (2005). A (very) brief history of artificial intelligence. AI Mag, 26, 53–60.
Calonder, M., Lepetit, V., Strecha, C., & Fua, P. (2010). BRIEF: Binary Robust Independent
Elementary Features. In K. Daniilidis, P. Maragos, & N. Paragios (Eds.), Lecture Notes in
Computer Science. Computer Vision – ECCV 2010 (Vol. 6314, pp. 778–792). Springer. htt
ps://doi.org/10.1007/978-3-642-15561-1_56
Cer, D., Yang, Y., Kong, S., Hua, N., Limtiaco, N. L. U., John, R. S., Constant, N., Guajardo-
Céspedes, M., Yuan, S., Tar, C., Sung, Y., Strope, B., & Kurzweil, R. (2018). Universal Sen-
tence Encoder. EMNLP Demonstration. https://arxiv.org/abs/1803.11175
Chen, L.-C., Papandreou, G., Kokkinos, I., Murphy, K., & Yuille, A. L. (2016, June 7). Semantic
Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. https://d
oi.org/10.48550/arXiv.1412.7062
Chen, Y., Tian, Y., & He, M. (2020). Monocular human pose estimation: A survey of deep
learning-based methods. Computer Vision and Image Understanding, 192, 102897.
Chidambaram, M., Yang, Y., Cer, D., Yuan, S., Sung, Y.–H., Strope, B., & Kurzweil, R. (2019).
Learning cross-lingual sentence representations via a multi-task dual-encoder model. ht
tp://arxiv.org/pdf/1810.12836v4
Crevier, D. (1993). AI: The tumultuous history of the search for artificial intelligence. Basic
Books, Inc.
D’Acunto, F., Prabhala, N., & Rossi, A. G. (2019). The promises and pitfalls of robo-advising.
The Review of Financial Studies, 32(5), 1983–2020. https://doi.org/10.1093/rfs/hhz014
Devlin, J., Chang, M.–W., Lee, K., & Toutanova, K. (2018, October 11). BERT: Pre-training of
Deep Bidirectional Transformers for Language Understanding. http://arxiv.org/pdf/181
0.04805v2
Dhruv, B., Mittal, N., & Modi, M. (2017). Analysis of different filters for noise reduction in
images. In 2017 Recent Developments in Control, Automation & Power Engi-
neering (RDCAPE) (pp. 410–415). IEEE. https://doi.org/10.1109/RDCAPE.2017.8358306
Blosch, M., & Fenn, J. (2018, August 20). Understanding Gartner’s hype cycles. Gartner. http
s://www.gartner.com/en/documents/388776
Gartner. (2021, September 7). Gartner identifies four trends driving near-term artificial intel-
ligence innovation [Press release]. https://www.gartner.com/en/newsroom/press-rele
ases/2021-09-07-gartner-identifies-four-trends-driving-near-term-artificial-intelligenc
e-innovation
Ghosh, A., & Veale, D. T. (2016). Fracking sarcasm using neural network. In A. Balahur, E.
van der Goot, P. Vossen, & A. Montoyo (Eds.), Proceedings of the 7th workshop on com-
putational approaches to subjectivity, sentiment and social media snalysis (pp. 161–
169). Association for Computational Linguistics. https://doi.org/10.18653/v1/W16-042
5
Giles, M. (2018, December 19). The man turning China into a quantum superpower. MIT
Technology Review. https://www.technologyreview.com/2018/12/19/1571/the-man-t
urning-china-into-a-quantum-superpower/
Giles, T. D. (2016). Aristotle writing science. An application of his theory. Journal of Techni-
cal Writing and Communication, 46(1), 83–104. https://doi.org/10.1177/0047281615600
633
Grace, K., Salvatier, J., Dafoe, A., Zhang, B., & Evans, O. (2017, May 24). When will AI exceed
human performance? Evidence from AI Experts. https://arxiv.org/abs/1705.08807?xtor=
AL-32280680#:~:text=Researchers%20believe%20there%20is%20a,much%20sooner%
20than%20North%20Americans.
Han, X.–F., Laga, H., & Bennamoun, M. (2021). Image-based 3D object reconstruction:
State-of-the-art and trends in the deep learning era. IEEE Transactions on Pattern Anal-
ysis and Machine Intelligence, 43(5), 1578–1604. https://doi.org/10.1109/TPAMI.2019.2
954885
Harris, C., & Stephens, M. (1988, September). A combined corner and edge detector. In C. J.
Taylor (Ed.), Procedings of the Alvey Vision Conference 1988 (23.1-23.6). Alvey Vision
Club. https://doi.org/10.5244/C.2.23
Hassaballah, M., Abdelmgeid, A. A., & Alshazly, H. A. (2016). Image Features Detection,
Description and Matching. In A. I. Awad & M. Hassaballah (Eds.), Image Feature Detec-
tors and Descriptors: Foundations and Applications (Vol. 630, pp. 11–45). Springer
International Publishing. https://doi.org/10.1007/978-3-319-28854-3_2
Holler, M. J. (2012). Von Neumann, Morgenstern, and the creation of game theory: From
Chess to Social Science, 1900–1960 [Review of the book Von Neumann, Morgenstern,
and the creation of game theory: From Chess to Social Science, 1900–1960, by R. Leo-
nard]. The European Journal of the History of Economic Thought, 19(1), 131–135.
Horgan, J. (1993). The mastermind of artificial intelligence. Scientific American, 269(5), 35–
38. https://doi.org/10.1038/scientificamerican1193-35
Hutchins, J. (1995). "The wisky was invisible", or persistent myths of MT. MT News Interna-
tional, 11, 17–18. https://aclanthology.org/www.mt-archive.info/90/MTNI-1995-Hutchi
ns.pdf
Hutchins, J. (1997). From first conception to first demonstration: The nascent years of
machine translation, 1947–1954. A chronology. Machine Translation, 12(3), 195–252. ht
tps://doi.org/10.1023/A:1007969630568
Işın, A., Direkoğlu, C., & Şah, M. (2016). Review of MRI-based brain tumor image segmenta-
tion using deep learning methods. Procedia Computer Science, 102, 317–324. https://d
oi.org/10.1016/j.procs.2016.09.407
Islam, N., Islam, Z., & Noor, N. (2017, October 3). A survey on optical character recognition
system. Arxiv. http://arxiv.org/pdf/1710.05703v1
Iyyer, M., Manjunatha, V., Boyd-Graber, J., & Daumé III, H. (2015). Deep unordered compo-
sition rivals syntactic methods for text classification. In C. Zong & M. Strube (Eds.), Pro-
ceedings of the 53rd annual meeting of the association for computational linguistics and
the 7th international joint conference on natural language processing: Vol. 1. Long
papers (pp. 1681–1691). Association for Computational Linguistics. https://doi.org/10.
3115/v1/P15-1162
Jain, R., Kasturi, R., & Schunck, B. G. (1995). Machine vision. McGraw-Hill Professional.
Jakubovic, A., & Velagic, J. (2018). Image feature matching and object detection using
brute-force matchers. In M. Muštra, M. Grgić, B. Zovko-Cihlar & D. Vitas (Eds.), Proceed-
ings of ELMAR-2018. 60th international symposium ELMAR-2018 (pp. 83–86). IEEE. https:/
/doi.org/10.23919/ELMAR.2018.8534641
Kaddari, Z., Mellah, Y., Berrich, J., Belkasmi, M. G., & Bouchentouf, T. (2021). Natural lan-
guage processing: Challenges and future directions. In T. Masrour, I. El Hassani & A.
Cherrafi (Eds.), Lecture Notes in Networks and Systems. Artificial intelligence and
industrial applications (Vol. 144, pp. 236–246). Springer International Publishing. https
://doi.org/10.1007/978-3-030-53970-2_22
Kaymak, Ç., & Uçar, A. (2019). A brief survey and an application of semantic image seg-
mentation for autonomous driving. In V. E. Balas, S. S. Roy, D. Sharma, & P. Samui
(Eds.), Handbook of Deep Learning Applications (Vol. 136, pp. 161–200). Springer Inter-
national Publishing. https://doi.org/10.1007/978-3-030-11479-4_9
Kim, Y., Petrov, P., Petrushkov, P., Khadivi, S., & Ney, H. (2019, September 20). Pivot-based
transfer learning for neural machine translation between non-English languages. http:/
/arxiv.org/pdf/1909.09524v1
Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R. S., Torralba, A., Urtasun, R., & Fidler, S. (2015,
June 22). Skip-thought vectors. http://arxiv.org/pdf/1506.06726v1
Koehn, P., & Knowles, R. (2017, June 13). Six challenges for neural machine translation. http
://arxiv.org/pdf/1706.03872v1
Krähenbühl, P., & Koltun, V. (2012, 20 October). Efficient inference in fully connected CRFs
with Gaussian edge potentials. Advances in Neural Information Processing Systems, 24,
109—117. https://doi.org/10.48550/arXiv.1210.5644
Kuipers, M., & Prasad, R. (2021). Journey of Artificial Intelligence. Wireless Personal Com-
munications, 123, 3275—3290. https://doi.org/10.1007/s11277-021-09288-0
Kurzweil, R. (2014). The singularity is near. In R. L. Sandler (Ed.), Ethics and emerging tech-
nologies (pp. 393–406). Palgrave Macmillan. https://doi.org/10.1057/9781137349088_2
6
Laguarta, J., Hueto, F., & Subirana, B. (2020). COVID-19 artificial intelligence diagnosis
using only cough recordings. IEEE Open Journal of Engineering in Medicine and Biology,
1, 275–281. https://doi.org/10.1109/OJEMB.2020.3026928
Leonard, R. (2010). Von Neumann, Morgenstern, and the creation of game theory: From
chess to social science, 1900–1960. Historical perspectives on modern economics.
Cambridge University Press. https://search.ebscohost.com/login.aspx?direct=true&sc
ope=site&db=nlebk&db=nlabk&AN=783042
Li, B., Shi, Y., Qi, Z., & Chen, Z. (2018). A survey on semantic segmentation. In H. Tong, Z. Li,
F. Zhu & J. Yu (Eds.), 18th IEEE international conference on data mining workshops.
ICDMW 2018 (pp. 1233–1240). IEEE. https://doi.org/10.1109/ICDMW.2018.00176
Liu, Y., Gall, J., Stoll, C., Dai, Q., Seidel, H.-P., & Theobalt, C. (2013). Markerless motion cap-
ture of multiple characters using multiview image segmentation. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 35(11), 2720–2735. https://doi.org/10.1109/
TPAMI.2013.47
Long, J., Shelhamer, E., & Darrell, T. (2015, March 8). Fully convolutional networks for
semantic segmentation. https://doi.org/10.48550/arXiv.1411.4038
Lyra, M., Ploussi, A., & Georgantzoglou, A. (2011). MATLAB as a tool in nuclear medicine
image processing. In C. Ionescu (Ed.), MATLAB - A ubiquitous tool for the practical engi-
neer. IntechOpen. https://doi.org/10.5772/19999
Masnick, M. (2014, June 9). No, a 'Supercomputer' did not pass the Turing test for the first
time and everyone should know better. Techdirt. https://www.techdirt.com/articles/20
140609/07284327524/no-computer-did-not-pass-turing-test-first-time-everyone-shoul
d-know-better.shtml
May, C., Ferraro, F., McCree, A., Wintrode, J., Garcia-Romero, D., & van Durme, B. (2015).
Topic identification and discovery on text and speech. In L. Màrquez, C. Callison-
Burch, & J. Su (Eds.), Proceedings of the 2015 conference on empirical methods in natu-
ral language processing (pp. 2377–2387). Association for Computational Linguistics. ht
tps://doi.org/10.18653/v1/D15-1285
McCarthy, J., Minsky, M. L., Rochester, N., & Shannon, C. E. (1955). A proposal for the Dart-
mouth Summer Research Project on Artificial Intelligence. AI Magazine, 27(4). https://d
oi.org/10.1609/aimag.v27i4.1904
McKinsey & Company (2021). Global survey: The state of AI in 2021. https://www.mckinsey.c
om/~/media/McKinsey/Business%20Functions/McKinsey%20Analytics/Our%20Insigh
ts/Global%20survey%20The%20state%20of%20AI%20in%202021/Global-survey-The-
state-of-AI-in-2021.pdf
Mihalcea, R., & Tarau, P. (2004). TextRank: Bringing Order into Text. Proceedings of the
2004 Conference on empirical methods in natural language processing (pp. 404–411).
https://aclanthology.org/W04-3252
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013, September 7). Efficient estimation of
word representations in vector space. http://arxiv.org/pdf/1301.3781v3
Muja, M., & Lowe, D. G. (2009, February 5–8). Fast approximate nearest neighbors with
automatic algorithm configuration. In A. K. Ranchordas & H. Araújo (Eds.), Proceedings
of the fourth international conference on computer vision theory and applications (pp.
331–340). SciTePress. https://doi.org/10.5220/0001787803310340
Nasukawa, T., & Yi, J. (2003). Sentiment analysis: Capturing favorability using natural lan-
guage processing. In J. Gennari, B. Porter, & Y. Gil (Eds.), Proceedings of the 2nd Inter-
national Conference on Knowledge Capture, 70–77. https://doi.org/10.1145/945645.94
5658
Negnevitsky, M. (2011). Artificial Intelligence: A Guide to Intelligent Systems (3rd ed.). Addi-
son Wesley.
Newquist, H. P. (1994). The brain makers: The history of artificial intelligence – Genius, ego,
And greed in the quest for machines that think. Sams Publishing.
Nilsson, N. J. (2009). The quest for artificial intelligence. Cambridge University Press. https:
//doi.org/10.1017/CBO9780511819346
Noh, H., Hong, S., & Han, B. (2015, May 17). Learning deconvolution network for semantic
segmentation. http://arxiv.org/pdf/1505.04366v1
O’Mahony, N., Campbell, S., Carvalho, A., Harapanahalli, S., Hernandez, G. V., Krpalkova,
L., Riordan, D., & Walsh, J. (2020). Deep learning vs. traditional computer vision. In K.
Arai & S. Kapoor (Eds.), Advances in intelligent systems and computing. Advances in
computer vision (Vol. 943, pp. 128–144). Springer International Publishing. https://doi.
org/10.1007/978-3-030-17795-9_10
Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representa-
tion. In Q. C. R. I. Alessandro Moschitti, G. Bo Pang, & U. o. A. Walter Daelemans (Eds.),
Proceedings of the 2014 conference on empirical methods in natural language process-
ing (EMNLP) (pp. 1532–1543). Association for Computational Linguistics. https://doi.or
g/10.3115/v1/D14-1162
Pollatos, V., Kouvaras, L., & Charou, E. (2020, October 13). Land cover semantic segmenta-
tion using ResUNet. http://arxiv.org/pdf/2010.06285v1
PricewaterhouseCoopers. (2018). Sizing the prize. What’s the real value of AI for your busi-
ness and how can you capitalize? https://www.pwc.com/gx/en/issues/analytics/assets/
pwc-ai-analysis-sizing-the-prize-report.pdf
Rublee, E., Rabaud, V., Konolige, K., & Bradski, G. (2011, November 6–13). ORB: An efficient
alternative to SIFT or SURF. In 2011 International Conference on Computer Vision (ICCV
2011) (pp. 2564–2571). IEEE. https://doi.org/10.1109/ICCV.2011.6126544
Russell, S. J., & Norvig, P. (2022). Artificial intelligence: A modern approach (4th ed.). Pear-
son.
Schwartz, O. (2019). In the 17th century, Leibniz dreamed of a machine that could calculate
ideas. The machine would use an “alphabet of human thoughts” and rules to combine
them. IEEE Spectrum. https://spectrum.ieee.org/in-the-17th-century-leibniz-dreamed
-of-a-machine-that-could-calculate-ideas
Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–
424. https://doi.org/10.1017/S0140525X00005756
Sharif, W., Samsudin, N. A., Deris, M. M., & Naseem, R. (2016, August 24–26). Effect of nega-
tion in sentiment analysis. In E. Ariwa (Ed.), 2016 sixth international conference on
innovative computing technology (INTECH) (pp. 718–723). IEEE. https://doi.org/10.1109
/INTECH.2016.7845119
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., van den Driessche, G., Schritt-
wieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D.,
Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Grae-
pel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and
tree search. Nature, 529(7587), 484–489. https://doi.org/10.1038/nature16961
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L.,
Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., & Hassabis, D. (2018). A general
reinforcement learning algorithm that masters chess, shogi, and Go through self-play.
Science, 362(6419), 1140–1144. https://doi.org/10.1126/science.aar6404
Smith, S. W. (1997). The scientist and engineer's guide to digital signal processing. Califor-
nia Technical Publ.
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine
Learning, 3(1), 9–44. https://doi.org/10.1023/A:1022633531479
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.).
Adaptive computation and machine learning. MIT Press.
Szeliski, R. (2022). Computer vision: Algorithms and applications (2nd ed.). Springer Inter-
national Publishing. https://doi.org/10.1007/978-3-030-34372-9
Torralba, A., Fergus, R., & Weiss, Y. (2008, June 23–28). Small codes and large image data-
bases for recognition. In 2008 IEEE conference on computer vision and pattern recogni-
tion (pp. 1–8). IEEE. https://doi.org/10.1109/CVPR.2008.4587633
Turing, A. M. (1950). Computing machinery and intelligence. Mind, LIX(236), 433–460. https
://doi.org/10.1093/mind/LIX.236.433
van Otterlo, M., & Wiering, M. (2012). Reinforcement Learning and Markov Decision Proc-
esses. In M. Wiering & M. van Otterlo (Eds.), Reinforcement Learning: State-of-the-Art
(Vol. 12, pp. 3–42). Springer. https://doi.org/10.1007/978-3-642-27645-3_1
Wang, A., Qiu, T., & Shao, L. (2009). A simple method of radial distortion correction with
centre of distortion estimation. Journal of Mathematical Imaging and Vision, 35(3),
165–172. https://doi.org/10.1007/s10851-009-0162-1
Wang, L., Zhang, Y., & Feng, J. (2005). On the Euclidean distance of images. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence, 27(8), 1334–1339. https://doi.org/10
.1109/TPAMI.2005.165
Weizenbaum, J. (1966). ELIZA—A computer program for the study of natural language
communication between man and machine. Communications of the ACM, 9(1), 36–45.
https://doi.org/10.1145/365153.365168
Wiley, V., & Lucas, T. (2018). Computer Vision and Image Processing: A paper review. Inter-
national Journal of Artificial Intelligence Research, 2(1), 29–36. https://doi.org/10.2909
9/ijair.v2i1.42
Yilmaz, A., Javed, O., & Shah, M. (2006). Object tracking: A survey. ACM Computing Surveys,
38(4), 13. https://doi.org/10.1145/1177352.1177355
Zhang, D., Mishra, S., Brynjolfsson, E., Etchemendy, J., Ganguli, D., Grosz, B., Lyons, T.,
Manyika, J., Niebles, J. C., Sellitto, M., Shoham, Y., Clark, J., & Perrault, R. (2021, March
9). The AI index 2021 annual report. https://aiindex.stanford.edu/wp-content/uploads/
2021/11/2021-AI-Index-Report_Master.pdf
Zhang, Z. (2000). A flexible new technique for camera calibration. IEEE transactions on pat-
tern analysis and machine intelligence, 22(11), 1330–1334. https://doi.org/10.1109/34.8
88718
Zimmermann, T., Kotschenreuther, L., & Schmidt, K. (2016, June 21). Data-driven HR -
Résumé analysis based on natural language processing and machine learning. https://
doi.org/10.48550/arXiv.1606.05611