Cao 2020
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2020.2966881, IEEE Access
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.DOI
ABSTRACT Road pavement crack detection has long been an active research topic because of its practical importance for road maintenance and traffic safety. Many methods have been proposed to solve this problem. This paper reviews the three major types of methods used in road crack detection: image processing, machine learning and 3D imaging based methods. Image processing algorithms mainly include threshold segmentation, edge detection and region growing methods, which are used to process images and identify crack features. Crack detection based on traditional machine learning methods, such as neural networks and support vector machines, still relies on hand-crafted features obtained with image processing techniques. Deep learning methods have fundamentally changed the way crack detection is done and greatly improved detection performance. In this work, we review and compare the deep neural networks proposed for crack detection in three groups: classification based, object detection based and segmentation based. We also cover the performance evaluation metrics and the performance of these methods on commonly used benchmark datasets. With the maturity of 3D technology, crack detection using 3D data is a new line of research and application. We compare the three types of 3D data representations and study the corresponding performance of deep neural networks for 3D object detection. Traditional and deep learning based crack detection methods using 3D data are also reviewed in detail.
VOLUME 4, 2016 1
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/.
Researchers have conducted in-depth research on road pavement crack detection and proposed many methods to solve the problem, from image processing to machine learning methods, including the deep learning methods that are widely used nowadays. Image processing methods mainly fall into three categories [6]: threshold segmentation, edge detection and region growing. The threshold segmentation method divides the image pixels into several categories by setting a proper pixel intensity threshold, so as to separate the target crack from the background. The edge detection method detects the edges of the road crack through edge detection operators such as the Sobel operator [7], the Prewitt operator [8], and the Canny operator [9]. The region growing method depicts the specific information inside the crack by assembling pixels with similar characteristics to form a region.

The emergence of machine learning has raised road crack detection to a new level. Image processing techniques can only analyze some superficial defect features, while machine learning can learn deeper features. Machine learning takes advantage of the similarity between data through the design of algorithms, so that the computer can master the learning rules and make predictions on unknown data by itself. In particular, deep learning methods have greatly advanced the accuracy of pavement crack detection.

Unlike other types of surface defects, pavement cracks are usually deep and large, such as block cracks and alligator cracks [10]. It is practically meaningful to measure and detect the depth of the cracks. The depth of a crack is related to the strength of the crack [11], and detecting crack depth can help predict the future trend of the crack, which helps to repair the pavement in time and reduce potential safety risks. In recent years, 3D imaging technology has achieved great progress, making crack detection in 3D images a new research direction for scholars. Owing to the extra depth dimension, the 3D structure of road cracks can be constructed from the 3D images. Besides this, 3D images can reduce the effect of shadow and other noise [12].

In recent years, several reviews have become available in the literature. Sylvie et al. summarized the application of image processing technologies in road detection, and proposed a new protocol for evaluating and comparing automatic road crack detection methods [13]. In [14], Kasthurirangan et al. compared some deep learning frameworks, networks and hyper-parameters used in pavement crack detection, and classified the previous papers, which provided a good reference for developing pavement crack detection models. Tom et al. listed different kinds of pavement defects, discussed different defect detection methods and assessed different defect data acquisition devices [15]. In [16], Senthan et al. discussed the detection of road surface lesions from the perspective of 3D image defect detection, summarized the application of 3D imaging technologies in road surface monitoring, analyzed the imaging principle of different devices and compared the advantages and disadvantages of different detection technologies. These reviews address different emphases or aspects of road surface detection. In this review, we provide a comprehensive review of pavement crack detection methods, especially an in-depth analysis of deep learning and 3D image based methods.

The rest of the paper is organized as follows. Section II briefly reviews the crack detection methods mainly based on image processing techniques. Crack detection based on machine learning methods, including unsupervised learning, traditional supervised learning and deep learning, is reviewed in Section III. Section IV discusses the 3D imaging technologies and corresponding methods for pavement defect detection. A discussion of the existing problems and the prospects of crack detection is presented in Section V. Section VI concludes this work.

II. CRACK DETECTION BASED ON IMAGE PROCESSING
Pavement is exposed to the natural environment for a long time and is often affected by rain, shadows, stains and other factors. Therefore, the images captured by imaging sensors usually contain a lot of noise, texture and interference. Cracks in images appear as thin, irregular, dark curves surrounded by strong textured noise. Researchers have proposed various image processing methods to reduce the influence of the noise on detection. These methods mainly fall into three categories: threshold segmentation, edge detection and region growing.

A. THRESHOLD SEGMENTATION METHODS
Threshold segmentation [17] is a classical method in image segmentation. For each pixel in the image, we judge whether its characteristic attributes meet a threshold requirement to determine whether the pixel belongs to the target area or the background. In this way, we can convert a gray image into a binary image. Let f(x, y) be the original image and T be the threshold value; the segmented image g(x, y) can be written as

\[ g(x, y) = \begin{cases} 1, & f(x, y) \geq T \\ 0, & f(x, y) < T \end{cases} \]

Obtaining a reasonable threshold value is the key to this method. Dynamic threshold and local threshold methods have achieved good results in pavement defect detection. Oliveira et al. [18] recognized potential cracks by identifying dark pixels in images with a dynamic threshold. In their work, thresholded images are divided into non-overlapping blocks by entropy computation, and a secondary dynamic threshold of the generated Entropy Block Matrix is used as the basis for identifying image blocks containing crack pixels. Li et al. proposed a twice-threshold segmentation [19]. Firstly, an improved Otsu threshold segmentation algorithm was used to remove the road markers in the runway image. Then, an improved adaptive iterative threshold segmentation algorithm was used to segment the images from which the markers had been removed. Finally, the outline of the crack was obtained through morphological denoising. In [20], a new multi-scale local optimal threshold segmentation
algorithm was proposed to segment pavement cracks through crack density distribution. Compared with the global threshold method and the optimal threshold method, this method achieved a better segmentation effect.

B. EDGE DETECTION METHODS
Edge detection methods can also be used in crack detection. Common edge detection operators include the Sobel, Roberts, Prewitt and Canny operators. Different operators have different detection effects on edges of the same type. Fig. 2 shows an example. Simply using a single operator can hardly reach the expected effect, so many scholars have improved the edge detection operators. Zhao et al. proposed an improved Canny edge detection method for road edge detection [21]. The Mallat wavelet transform was used to enhance the blurred edge, and a better adaptive threshold Canny algorithm was obtained by using a genetic algorithm [22]. Ayenu et al. [23] studied a road crack detection method which combines bi-dimensional empirical mode decomposition (BEMD) and Sobel edge detection. BEMD is an extension of EMD [24], which removes noise from the signal without the need for complex convolution processes.

[Figure 2 panels: original image, Roberts, Sobel, Prewitt, Canny]
FIGURE 2: Detection effect of different edge operators.

C. REGION GROWING METHODS
The edge detection algorithm can obtain the edge distribution of crack defects and outline the crack contour, but it cannot concretely describe the information of the internal pixels of cracks. The recognition method based on region growing provides another idea for pavement crack detection. The basic idea of region growing is to gather similar pixels to form a region. The selection of seeds is very important and greatly affects the accuracy of image segmentation. In the work of [25], after the road surface image was preprocessed, the lane was marked and the uneven background part was also processed. Then, the crack seeds were selected by grid cell analysis and connected by a Euclidean minimum spanning tree structure. In this way, cracks can be detected quickly and effectively. Li et al. proposed an automatic crack detection method based on FoSA-F* seed growth for better detection of blurred and discontinuous cracks [26]. It exploited a seed-growing strategy to eliminate the requirement that the start and end points be set in advance. The global search space is reduced to the local space of interest to improve the search efficiency.

III. CRACK DETECTION BASED ON MACHINE LEARNING
Machine learning has become a hot research topic and is widely used in various areas. It can give predictions by learning the rules embedded in the data. Supervised learning and unsupervised learning are commonly used for crack detection and analysis.

A. UNSUPERVISED LEARNING METHODS
The biggest difference between unsupervised learning and supervised learning is the absence of data labels during training. Training samples for unsupervised learning have no labels and no definite output results, so the computer needs to learn the similarity between samples by itself and classify the samples. The advantage of unsupervised learning is that there is no need to label data, which reduces the influence of human subjective factors on the results.

Akagic et al. proposed a new unsupervised road crack detection method based on the gray histogram and the Otsu method, and better results were obtained under low signal-to-noise ratio conditions [27]. In [28], Amhaz et al. introduced an improved unsupervised learning algorithm based on minimum path selection, which reduced the loop and peak artifacts in crack detection by estimating the crack width. In [29], Li et al. used a method based on the minimum intensity path of the window to extract candidate cracks at each scale in the image, compared the corresponding relations of cracks at different scales, and established a crack evaluation model based on a multivariate statistical hypothesis.

B. SUPERVISED LEARNING METHODS
Supervised learning needs the labels of the training data. Common supervised learning algorithms include logistic regression [30], naive Bayes [31], support vector machines [32], artificial neural networks [33] and random forests [34]. In [35], Xu et al. used the self-learning characteristic of neural networks to transform crack recognition into a crack probability judgment for each sub-block image. They first divide the binary image of cracks into sub-images and extract the parameters representing the crack features from each sub-image, then select representative images to train a back propagation neural network. In [36], CrackForest, a road crack detection framework based on random structured forests, was proposed to effectively solve the problems of uneven edge cracks and cracks with complex topological structures. The authors extracted crack features from multiple levels and directions to train the random forest model. In [37], an automatic pavement crack detection scheme is proposed. Firstly, the crack image is preprocessed to smooth its texture and enhance any existing cracks. Then the image is divided into several non-overlapping blocks, each block produces a feature vector, and a support vector machine is used to detect the cracks. These methods rely heavily on high-quality features extracted from the images, which requires careful design of the algorithms.

1) Deep learning methods
In recent years, deep learning technologies have achieved tremendous success in various computer vision tasks such as
image classification, object detection and image segmentation [38]–[42]. Many deep learning based methods, especially deep convolutional neural networks, have been proposed for road crack detection. According to the way they handle the crack detection problem, these methods can roughly be divided into three categories: pure image classification methods, object detection based methods and pixel-level segmentation methods.

Crack detection based on classification
Basically, this category of methods divides the input image into overlapping blocks and then classifies each block image. If a block contains a certain number of defect pixels or more, the block is labeled as a defective block.

Crack detection based on binary classification
This kind of method divides the input images into overlapping blocks and then uses a deep convolutional network to decide whether each block contains a crack. For example, Lei et al. divided road images of 3264 × 2248 into small patches of size 99 × 99 × 3, and used their convolutional neural network to classify these small patches [43]. The output is the probability that the small patch is a crack or not. In the work of [44], Li et al. modified GoogLeNet [45] to classify image blocks and realized crack detection on real pavement using a smartphone. In [46], Cha et al. used MatConvNet [47] to classify 256 × 256 input pavement images. Similarly, in [43], the authors generated image patches of 99 × 99 from original pavement images, where a patch is defective if its center pixel is within 5 pixels of the crack center. The CNN model was compared with SVM and Boosting methods. Leo et al. studied the relationship between network depth and network accuracy using a self-designed CNN model [48]. Unlike the work mentioned above, Chen et al. processed pavement videos in [49]. In this work, a CNN model was designed to classify image patches of size 120 × 120 sampled from video frames, and a naive Bayes data fusion scheme was then adopted to aggregate the information obtained from each video frame to enhance the overall performance and robustness of the system.

Crack detection based on multi-class classification
Crack detection based on binary classification is not suitable when the defect types must be determined. In [50], Zhun et al. used one CNN model to learn the structure of pavement cracks as a multi-label classification problem. Small crack image patches of 27 × 27 were used as the input and the output layer had s × s nodes, representing the intensity states of the square block centered at the crack pixel. For example, if s = 5, the model predicts the 25 pixel states of the 5 × 5 block image. During training, the 27 × 27 input was resized to 5 × 5 as the ground truth. In [51], Li et al. proposed a deep CNN for pavement crack classification based on 3D pavement images, and classified pavement patches cut from 3D images into five categories including the normal category. They trained four supervised CNN classification models with different sizes of receptive field, and found that the size of the receptive field has a slight effect on the classification accuracy. The method proposed by Xianglong et al. in [52] is quite different from the above methods. In this work, the input pavement images are segmented into non-overlapping grids of size 32 × 32 or 64 × 64, then a simple CNN is used to classify each grid image to decide whether it contains a crack. After this, the crack skeleton can be represented by the grid cells containing cracks. PCA (principal component analysis) is used to process the coordinate vector of the crack grid cells to decide whether the crack type is longitudinal, transverse or alligator.

Crack detection based on pixel segmentation
Pixel segmentation assigns a label or a score to each pixel in the image. In [50], Zhun et al. proposed a network structure with 4 convolutional layers, 2 max-pooling layers and 3 fully connected layers to directly segment the original images. The output can have different resolutions, from 1 × 1 to 5 × 5. In [53], Mark et al. proposed a semantic segmentation algorithm for road cracks based on U-Net, which is basically an encoder-decoder structure [54]. This network can be divided into an encoder layer and a decoder layer. The encoder layer mainly realizes feature mapping of images, and the decoder layer is mainly used to promote feature vectors during segmentation and generate the probability distribution of each pixel. Similarly, Qin et al. [55] proposed DeepCrack, which uses an encoder-decoder architecture to segment pavement image pixels into crack and background. In [56], the proposed network structure used 4 convolution layers and max-pooling layers as the encoder to extract features and 4 subsequent modules as the decoder. The work of [57] employed residual connections inside each encoder and decoder block, and an attention gating block before the decoder, to retain only the spatially relevant features of the feature map in the shortcut connection. Fully convolutional networks are also often used for segmentation, such as in [58], [59].

Crack detection based on object detection
Object detection is an important task in computer vision. Its goal is to locate the object with a bounding box in the image and decide the object type. Many deep CNN models have been proposed to improve accuracy and efficiency, such as Faster R-CNN [60], SSD [61] and YOLO [62]. Object detection methods are also popular in road crack detection.

Faster R-CNN is widely used in object detection and has three major steps: 1) extract image features using a CNN structure such as VGG, 2) propose candidate regions for objects with a Region Proposal Network (RPN), and 3) classify the object types and regress the bounding box coordinates. The CNN structure in step 1 is shared by steps 2 and 3. In [63], Gahyun et al. used Faster R-CNN to detect damage in civil infrastructure. Young-Jin et al. modified Faster R-CNN by using a ZF-net to speed up the feature extraction in step 1 [64]. ZF-net [65] is slightly modified from AlexNet [66], which is relatively simple and fast. In [67] Li et al. used the faster R-CNN to detect six kinds
of road defects. The model can automatically identify and locate defects under different lighting conditions with high accuracy and stability.

SSD [61] combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. It completely eliminates proposal generation and encapsulates the region classification and coordinate regression in a single network. This makes SSD much faster than Faster R-CNN. MobileNet [68] is a well-known lightweight deep neural network for mobile applications. To test crack detection on devices with limited resources, Hiroya et al. compared SSD using MobileNet with SSD using Inception v2 [69] for object detection on smartphones and found that SSD using Inception v2 is two times slower than SSD-MobileNet [70]. This conclusion is not surprising, as MobileNet is designed for acceleration.

Unlike the above methods, the Crack-pot method in [71] combined traditional image processing techniques and deep learning methods to detect potholes and cracks in the road. In this method, edge detection, dilation and contour detection were applied to generate candidate bounding boxes for suspected potholes and cracks. These regions were then fed into a classification model modified from SqueezeNet [72] by replacing the last pooling layer with a learned dictionary [73].

Object detection methods such as SSD and Faster R-CNN, which propose multiple candidate regions and perform location regression using the image features extracted from a CNN structure, offer a systematic way to detect objects. For defects with compact shapes, these methods may work well. However, for defects like long curves or scratches on the surface, the methods may fail because of the overly large bounding box proposed by the Region Proposal Network (RPN).

2) Metrics to evaluate model performance
Precision, Recall and F1
The three most commonly used parameters for evaluating crack detection performance are precision, recall, and F1. Precision is the ratio of the correctly detected results to all the actual detected results, and recall is the ratio of the correctly detected results to all the results that should be detected. F1 is the harmonic mean of precision and recall:

\[ \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}, \quad F1 = \frac{2\,TP}{2\,TP + FP + FN}. \]

The detection accuracy is defined as

\[ \mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}. \]

Table 1 shows the definition of FN (False Negative), FP (False Positive), TN (True Negative) and TP (True Positive).

ROC, AUC, and IOU
The ROC (Receiver Operating Characteristic) [74] curve and the AUC (Area Under Curve) [75] can also be used to measure the detection performance. The ROC curve describes the relationship between the TP rate and the FP rate. Fig. 3 shows two ROC curves. If the ROC curve is closer to the upper left corner, that is, the FP rate is low and the TP rate is high, the model works better. Therefore, the area under the ROC curve, namely the AUC, is used to compare two ROC curves.

FIGURE 3: Two ROC curves and AUC.

In object detection using models such as SSD, IOU (Intersection over Union) is often used to decide whether an object is correctly detected. The IOU is the overlap rate between the bounding box given by the model and the ground truth bounding box. If the IOU is larger than a predefined threshold, which is usually 0.5, the object detection is considered successful.

\[ \mathrm{IOU} = \frac{\text{Detection Result} \cap \text{Ground Truth}}{\text{Detection Result} \cup \text{Ground Truth}} \]

AIU, ODS and OIS
In [76], the authors proposed three new evaluation metrics, AIU, ODS and OIS. AIU is the average intersection over union between the predicted area and the ground truth area. ODS represents the best F1 score on the dataset for a fixed threshold t, and OIS represents the aggregated F1 score on the dataset for the best threshold of each image. Here P_t and R_t are the precision and recall at threshold t, the superscript i denotes the i-th image, and N_img is the number of images. ODS and OIS are defined as follows:

\[ \mathrm{ODS} = \max \left\{ \frac{2\, P_t \times R_t}{P_t + R_t} : t = 0.01, 0.02, \ldots, 0.99 \right\} \]

\[ \mathrm{OIS} = \frac{1}{N_{img}} \sum_{i}^{N_{img}} \max \left\{ \frac{2\, P_t^i \times R_t^i}{P_t^i + R_t^i} : t = 0.01, 0.02, \ldots, 0.99 \right\} \]
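To make the relationship between these metrics concrete, the following minimal Python sketch computes precision, recall and F1 from pixel counts, the IOU of two bounding boxes, and ODS and OIS from per-image probability maps. It is an illustration only and is not taken from [76]: the function names are hypothetical, NumPy is assumed, boxes are assumed to be given as (x1, y1, x2, y2), and pooling the TP, FP and FN counts over all images to obtain P_t and R_t for ODS is an assumption about how the dataset-level values are aggregated.

```python
import numpy as np


def f1_score(precision, recall):
    """F1 = 2PR / (P + R), taken as 0 when P + R = 0."""
    total = precision + recall
    return 0.0 if total == 0 else 2.0 * precision * recall / total


def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


def confusion_counts(prob_map, gt_mask, t):
    """Count TP, FP, FN after binarizing a probability map at threshold t."""
    pred = prob_map >= t
    gt = gt_mask.astype(bool)
    tp = int(np.sum(pred & gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))
    return tp, fp, fn


def ods_ois(prob_maps, gt_masks, thresholds=None):
    """Return (ODS, OIS) for lists of probability maps and binary masks."""
    if thresholds is None:
        thresholds = np.arange(1, 100) / 100.0  # t = 0.01, ..., 0.99
    # counts[i, k] holds (TP, FP, FN) for image i at threshold k.
    counts = np.array([
        [confusion_counts(p, g, t) for t in thresholds]
        for p, g in zip(prob_maps, gt_masks)
    ])
    # ODS: one threshold for the whole dataset (counts pooled over images,
    # which is an assumption of this sketch).
    pooled = counts.sum(axis=0)
    ods = max(f1_score(*precision_recall(*c)) for c in pooled)
    # OIS: best threshold chosen separately per image, then averaged.
    per_image_best = [
        max(f1_score(*precision_recall(*c)) for c in image_counts)
        for image_counts in counts
    ]
    return float(ods), float(np.mean(per_image_best))


def box_iou(box_a, box_b):
    """IOU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Because OIS picks the threshold per image while ODS fixes one threshold for the whole dataset, the two values differ whenever the best threshold varies between images.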
3) Public Datasets for Road Crack Detection
Road crack detection has been a research topic for years, and many public datasets are available to support this research.

CrackForest Dataset (CFD)
The CrackForest dataset consists of 118 images of cracks on urban road surfaces in Beijing taken with an iPhone 5. Each image is resized to 480 × 320 pixels and has been labeled. It is available at https://github.com/cuilimeng/CrackForest-dataset.

AigleRN dataset
The AigleRN dataset contains 38 pre-processed gray-scale images of French pavement. Half of them are 991 × 462 and half of them are 311 × 462. The dataset is available at http://telerobot.cs.tamu.edu/bridge/Datasets.html.

CRACK500
500 pictures of pavement cracks with a size of 2000 × 1500 were taken by smartphone. Each crack image has a binary mask image for annotation. The dataset is divided into three parts: 250 images for training, 50 for validation, and 200 for testing. It is available at https://github.com/fyangneil/pavement-crack-detection.

GAPs dataset
The German asphalt pavement distress (GAPs) dataset, which includes 1969 gray-scale pavement images, is partitioned into 1418 training images, 51 validation images, and 500 test images. The image resolution is 1920 × 1080 pixels. It is available at http://www.tu-ilmenau.de/neurob/data-sets-code/gaps/.

Results on benchmark datasets
The following tables compare the results on different benchmark datasets. In Table 2 and Table 3, the tolerance margin is the number of pixels by which a predicted pixel may be away from the ground truth pixel when we count the true positives. For example, if the tolerance margin is 2, a ground truth pixel is hit if there is a predicted pixel within its 2-pixel neighborhood. AIU, ODS and OIS are used to compare the performance of different methods on the CRACK500 dataset in Table 4.

TABLE 2: Test results on CFD dataset.
Method            Tolerance margin (pixel)   Precision   Recall   F1
FPCNet [56]       2                          0.9748      0.9639   0.9693
FCN [58]          2                          0.9729      0.9456   0.9590
Fan et al. [50]   2                          0.9119      0.9481   0.9244
U-Net-A [59]      5                          0.9693      0.9345   0.95
U-Net-B [59]      5                          0.9731      0.9428   0.9575

TABLE 3: Test results on AigleRN dataset.
Method             Tolerance margin (pixel)   Precision   Recall   F1
König et al. [57]  2                          0.8690      0.9304   0.8986
Fan et al. [50]    2                          0.9178      0.8812   0.8954
CrackForest [36]   5                          0.9028      0.8658   0.8839
U-Net [77]         5                          0.9202      0.9321   0.9261

TABLE 4: Results comparison on CRACK500 dataset.
Method             AIU     ODS     OIS
HED [78]           0.481   0.575   0.625
RCF [79]           0.403   0.490   0.586
FCN [58]           0.379   0.513   0.577
CrackForest [36]   N/A     0.199   0.199
FPHBN [76]         0.489   0.604   0.635

[80] presented the GAPs dataset to test pavement defect type classification. On this dataset, the authors compared four methods, shown in Table 5. The RCD net [43] is a simple and small CNN with four blocks of alternating convolutional and max-pooling layers, the ASINVOS net [80] is modified from the RCD net by adding more blocks, and ASINVOS-mod [80] is a further version of the ASINVOS net that replaces large convolutional filters with multiple smaller filters.

TABLE 5: Test results on GAPs dataset.
Method             Acc      F1
Crack-pot [71]     0.9893   0.7314
ASINVOS net [80]   0.9772   0.7246
ASINVOS-mod [80]   0.9723   0.6707
RCD net [43]       0.9732   0.6642

4) Data Augmentation
The training of a deep neural network model requires a large amount of data. However, it is costly to acquire and label this amount of data. Data augmentation is an effective technique to relieve the problem. Common data augmentation methods include image rotation, flipping, mirroring, adding noise and changing the illumination. These techniques are usually combined to get more data. Table 6 shows the data augmentation techniques used in road crack detection.

IV. CRACK DETECTION BASED ON 3D DATA
Most existing crack detection methods are based on 2D images. With the development of stereo cameras and range-based sensors, stereovision is becoming a promising approach in crack detection, as it can provide accurate and robust depth information.

A. REPRESENTATION OF 3D DATA
Basically, there are three kinds of 3D data representations, namely, multi-view, point cloud and voxel data.

Earlier representations of 3D images were made through multi-view. Multi-view represents a collection of 2D images of a rendered polygon grid captured from different viewpoints to convey 3D geometry in a simple manner, as shown in Fig. 4(a). This method is easy to understand, but it has difficulty expressing the spatial structure of 3D data. Moreover, since multi-view projections can only represent 2D contours of 3D objects, some detailed geometrical information is inevitably lost during the projection process [81].
A point cloud is a set of points in 3D space, where each point is specified by its 3D coordinates (x, y, z) and other information such as the RGB color value. This huge number of points is used to interpolate the geometric shape of the object surface: the denser the point cloud is, the more accurate the model that can be created. This process is called 3D reconstruction, as shown in Fig. 4(b). 3D scanners and LiDAR devices can be used to generate point cloud data [82].

Point cloud data can be converted to structured 3D regular grids [83], namely, voxels. A voxel is the smallest unit of digital data in 3D space segmentation, and each unit can be viewed as a grid with fixed coordinates. Similar to a 2D image, it also has a resolution: the finer the 3D space is divided, the smaller each grid is, and the greater the resolution is. Fig. 4(c) shows 3D occupancy grids at different resolutions. For easy reference, we compare these three kinds of representation in Table 7.

B. COMPARISON OF DIFFERENT 3D REPRESENTATIONS
Different 3D data representations affect the effectiveness of the methods. We compared different methods in terms of object classification performance on the ModelNet40 benchmark [86]. ModelNet40 contains 40 categories of CAD 3D models and is a standard dataset for evaluating semantic segmentation and classification of 3D deep learning models [87]. For 3D object classification, we studied the 60 methods submitted to the website; Fig. 5 shows the distribution of these methods over different data types. We can see that 21.33% of these methods were based on multi-view, 17.27% were based on point cloud data, 18.29% were based on voxel, and 7.11% were based on other methods. The highest classification accuracy (97.37%) was achieved by RotationNet [88], which jointly estimates the object categories and viewpoints for each single-view image and aggregates object class predictions from partial multi-view image sets. As just mentioned, different data representation may affect
the classification performance. We analyzed three different 3D data representation methods in terms of classification performance. The average accuracy is 92.31% for multi-view methods, 90.43% for point cloud methods, and 86.73% for voxel methods, as shown in Fig. 6. In the classification task, methods based on multiple views and point clouds are therefore more accurate than those based on voxels.
FIGURE 6: Average accuracy of different classification methods.

C. DEEP NETWORKS FOR 3D OBJECT CLASSIFICATION
In the work of [84], the authors presented a CNN architecture that combines information from multiple views of a 3D shape into a single, compact shape descriptor, offering even better recognition performance. In this method, the image from each view is passed through a separate CNN to extract view-based features. An additional CNN then combines these features for the final classification.
Following 3D ShapeNets [86], the first volumetric CNN, Maturana et al. proposed VoxNet [85] to process volumetric data with a grid resolution of 32 × 32 × 32, where the model uses 4D convolution filters to hold 3D spatial features. Rahul Dev also proposed CNN models to classify 3D objects based on volumetric data [89]. LightNet [81] is a faster version of VoxNet that addresses the heavy computation problem for real-time 3D object recognition.
A point cloud is an unordered set of points scanned from a 3D object. The critical problem is to make the model invariant to the permutation of the data points. PointNet [90] is the first CNN model to work directly on the raw point cloud. The method operates on each point separately and accumulates features from all the points with a symmetric function, namely a max pooling layer. PointNet++ introduces a hierarchical neural network that applies PointNet recursively on a nested partitioning of the input point set. By exploiting metric space distances, the method is able to learn local features with increasing contextual scales [91]. To further address the problem, DGCNN was proposed in [92]. Instead of working on individual points like PointNet, this method constructs a neighborhood graph to capture the local geometric information and proposes the EdgeConv operation to apply convolution-like operations on the edges.
These methods were all tested on the ModelNet40 dataset. We compare them in terms of the number of model parameters, input type, forward time, accuracy and deep learning framework in Table 8. The multi-view model is much larger than the other two in terms of model parameters. In terms of classification accuracy, methods based on multi-view and point cloud representations score slightly higher than those based on voxels. This is caused by the voxel resolution: the higher the resolution, the larger the amount of computation and the more complex the model. Generally, only 32 × 32 × 32 or 64 × 64 × 64 resolutions are selected for training.
For multi-view, model performance improves as the number of images from different perspectives increases. The same is true of point cloud data: the more points used to describe an object, the more comprehensive the 3D information of the object, and the higher the classification accuracy. Similarly, the higher the resolution of the voxel data, the better the performance of the model.

D. FEATURE EXTRACTION USING 3D DATA
Feature extraction is a very important step in crack detection. 3D data can provide richer features than 2D images. Several methods explicitly extract features from 3D data to feed to traditional machine learning models. For example, in the work of [93], the authors combined features extracted from 2D and 3D data to train classifiers, and in [94], spatiotemporal features were extracted from videos using 3D ConvNets. These features, followed by a linear classifier, achieved state-of-the-art results at the time of publication.
1) Spatiotemporal features
In [94], Du et al. proposed a simple and efficient method that uses a 3D convolutional neural network to learn spatiotemporal features from videos. They found that using 3 × 3 × 3 convolution kernels in all layers is among the best-performing architectures for 3D ConvNets. In [95], Owoyemi et al. proposed an end-to-end spatiotemporal gesture learning method for 3D point cloud data, which maps the point cloud data into a dense occupancy grid and learns the spatiotemporal characteristics of the data. In that work, a 3D ROI jittering method is used during training to augment the 3D data.
2) Geometric features
In [96], Takahiko et al. proposed a deep local feature aggregation network (DLAN) for 3D model retrieval. It combines the extraction of rotation-invariant 3D local features with their aggregation in a single deep architecture. DLAN describes a local 3D region of a 3D model using a set of 3D geometric features that are not affected by local rotation. Zeng et al. proposed a data-driven model, 3DMatch [97], which learns a local volumetric patch descriptor to establish correspondences between local 3D data and can match local geometric features well in real depth images. Deng et al. proposed PPFNet [98], a learned 3D local feature descriptor that is aware of global context and can be matched to corresponding parts in unordered point cloud data. PPFNet uses a new n-tuple loss and architecture to naturally inject global information into local descriptors and enhance the representation of local features.

E. 3D PAVEMENT DEFECT DETECTION
As 3D data acquisition becomes easier, the application of 3D technology to pavement defect detection is increasingly common. 3D data can represent the spatial information (length, width and depth) of road defects well and supports multi-directional analysis of the area, volume and other aspects of defects.
Xu et al. [99] used 3D mobile LiDAR to collect road point cloud data and studied the automatic extraction of road curbs. To improve the robustness and accuracy of the model, they designed a new energy function to extract the constrained candidate points and refined the candidate points with a least-cost path model. They sampled the point cloud data at rates of 100%, 50%, 10% and 1%. Even when the point cloud is reduced to 1%, their method can still extract the road curbs.
1) Traditional methods for 3D crack detection
Zhang et al. utilized the Microsoft Kinect to reconstruct pavement surfaces and capture geometric features of pavement cracking, including crack width, length and depth, to identify the distress severities of three major types of pavement cracks, namely alligator cracking, transverse cracking and longitudinal cracking [100]. In the work of [101], Li et al. employed laser-imaging techniques to model the pavement surface with dense 3D points and used an algorithm based
on frequency analysis (Fourier transformation) to separate potential cracks from the control profile and material texture of the pavement, assuming that road pavement free of distresses commonly has a relatively uniform control profile. Tsai and Li proposed a dynamic-optimization-based crack segmentation method to test 1 to 5 mm wide cracks collected by 3D laser at different depths and lighting conditions [102]. To detect similar cracks in masonry, the work in [103] presented the mathematics for determining the minimum crack width detectable with a terrestrial laser scanner, in which the main features used include orthogonal offset, interval scan angle, crack orientation and crack depth. In [93], the whole image is divided into subimages of 128 × 128 pixels and filtered by a set of Gabor filters. The maximum magnitude of each filtered image is the feature used to train weak classifiers. Binary segmentation is a straightforward way to detect cracks in pavement images. Unlike most 2D thresholding techniques, which assume that distress pixels are darker than their surroundings, [104] proposed a probabilistic relaxation labeling technique to enhance the accuracy of distress detection, which takes account of the non-uniform illumination and complicated contents of pavement surface areas. The work of [105] proposed a unique method that uses Dempster-Shafer (D-S) theory to combine the 2D gray-scale image and 3D laser scanning data as a mass function, with the corresponding detection results fused at the decision-making level.
2) Deep network for 3D crack detection
Applying deep neural networks to 3D crack detection is currently a new and active research direction. In 2017, Zhang et al. proposed the CrackNet network to implement pixel-level detection of pavement cracks and defects [106]. The model consists of five layers: two fully connected layers, two convolution layers and one output layer. The feature extractor utilizes line filters oriented in various directions, with varied lengths and widths, to enhance the contrast between cracks and the background. The model was trained with 1,800 3D pavement images collected from DHDV [2]. Later, in the work of [107], the authors proposed an improved architecture called CrackNet II for enhanced learning capability and faster performance. CrackNet II has a deeper architecture with more hidden layers but fewer parameters, and it runs five times faster than the original CrackNet. Like the original CrackNet, CrackNet II keeps the image width and height invariant through all layers to place explicit requirements on pixel-perfect accuracy. In addition, a combination of repeated convolution and 1 × 1 convolution is used to learn local features with different local receptive fields. Recently, Zhang's team put forward CrackNet-V [108], which includes a pre-processing layer, eight convolutional layers and an output layer. They used 3 × 3 filters for the first six convolutions and stacked multiple 3 × 3 convolutions for deeper feature extraction, which reduces the number of parameters and improves the efficiency of feature extraction. In addition, they designed a new activation function to improve the detection accuracy of shallow cracks.
To improve the recall rate, they put forward CrackNet-R [109], which is based on a recurrent neural network. A gated recurrent multilayer perceptron (GRMLP) is designed as the recurrent unit to update the internal memory of CrackNet-R recursively. GRMLP aims to abstract the features of the input and hidden state more deeply through multi-layer nonlinear transformations at the gate units. The resulting model is about four times faster than CrackNet and brings tangible improvements in detection accuracy. The performance comparison of the networks is shown in Table 9.
3) Factors affecting 3D pavement defect detection
Many factors can influence the detection of pavement defects. Tsai and Li [102] proposed a dynamic-optimization-based crack segmentation method to test 1 to 5 mm wide cracks collected by 3D laser at different depths and lighting conditions. Their experiments show that cracks with a width of 2 mm or more can be effectively separated from the pavement background, while cracks with a width of 1 mm can only be partially separated. In addition, light intensity was found to have little effect on the test results.
Li et al. [101] used laser imaging technology to model the road surface with dense 3D points and proposed a 3D point cloud crack detection method based on sparse point grouping, which can reduce the influence of light variation and shadow on crack detection. They tested how the speed of the data acquisition vehicle (10 km/h to 80 km/h) affects the performance of the proposed method. The experimental results show that detection quality is roughly the same at different speeds, but the slower the speed, the more detailed the description of the crack contour.
Laefer et al. [103] found experimentally that the detection of crack depth depends on three factors: scanning distance, scanning angle and crack width. The scanning distance is the distance between the crack and the laser scanner, and the scanning angle is the offset angle between the crack and the laser scanner. Cracks with a width of 1 to 7 mm were scanned at distances of 5 m and 7.5 m and angles of 0°, 15° and 30°. The results show that the crack depth cannot be detected when the crack width is less than 1 mm, because the smaller the crack width, the more difficult it is to obtain depth information for the crack. As the crack width increases, the detection of the crack depth becomes more accurate. As the scanning angle increases, the error of crack depth detection also increases. The shorter the scanning distance, the higher the detection accuracy.
Khurram et al. [110] used Kinect to predict and analyze the depth and volume of potholes, with mean percentage errors of 2.58% and 5.47%, respectively. The test performance for potholes containing water, dust and oil is also discussed. Experimental results show that the error of the test results will
CrackNet     512 × 256    GPU: GeForce GTX 1080Ti               2568   500   1.21    90.86   80.96   85.62
CrackNet-V   512 × 256    GPU: GeForce GTX 1080Ti               2568   500   0.33    84.31   90.12   87.12
CrackNet     1024 × 512   GPU: Two NVidia GeForce GTX 1080 Ti   3000   500   2.894   83.89   89.41   86.57
CrackNet-R   1024 × 512   GPU: Two NVidia GeForce GTX 1080 Ti   3000   500   0.713   88.89   95.00   91.84
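The last three numbers in each row above are consistent with precision, recall and F-measure in percent: in every row the third value matches the harmonic mean of the first two to within rounding. The column headers are not reproduced with the rows, so this reading is an assumption rather than something the table states. The following minimal sketch (the names `f_measure`, `ROWS` and `max_deviation` are ours, chosen for illustration) recomputes the value for the four rows.

```python
from typing import List, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (same unit as inputs)."""
    total = precision + recall
    if total == 0:
        return 0.0  # convention when both inputs are zero
    return 2.0 * precision * recall / total


# Values copied from the four rows above. Treating the last three numbers of
# each row as (precision %, recall %, F-measure %) is an assumption.
ROWS: List[Tuple[str, str, float, float, float]] = [
    ("CrackNet", "512 x 256", 90.86, 80.96, 85.62),
    ("CrackNet-V", "512 x 256", 84.31, 90.12, 87.12),
    ("CrackNet", "1024 x 512", 83.89, 89.41, 86.57),
    ("CrackNet-R", "1024 x 512", 88.89, 95.00, 91.84),
]


def max_deviation(rows: List[Tuple[str, str, float, float, float]]) -> float:
    """Largest absolute gap between the recomputed and the tabulated value."""
    return max(abs(f_measure(p, r) - f) for _, _, p, r, f in rows)


if __name__ == "__main__":
    for name, size, p, r, f in ROWS:
        print(f"{name:<11}{size:<12}recomputed={f_measure(p, r):.2f}  tabulated={f:.2f}")
    print(f"max deviation: {max_deviation(ROWS):.3f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a network that trades a little precision for a large gain in recall can still end up with the higher combined score, which is the pattern the CrackNet-R row shows relative to the CrackNet row at the same image size.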
increase with the amount of water, dust and oil, and the error is also related to the type of medium.

V. EXISTING PROBLEMS AND RESEARCH PROSPECTS
After years of development, much has been achieved in pavement defect detection, contributing greatly to pavement maintenance and vehicle safety. However, some problems remain in practical application:
1) Because environmental factors are complex and dynamic, road crack detection may produce errors in poor light on rainy days or when there is water on the road.
2) Different algorithms are needed for different road surface conditions, and algorithms transfer poorly from one condition to another.
3) Defect detection is generally performed offline, so real-time performance in practice is poor.
Therefore, the detection accuracy and real-time performance of the algorithms need further improvement to ensure good detection results in real applications. The generalization and robustness of the methods are also very important, as factors such as road and weather conditions greatly affect detection. In 3D crack detection, the depth information of cracks is added, giving the cracks a spatial structure. Although the overall information about cracks is more complete, this increases the complexity of the algorithm and greatly increases the computational cost. The algorithms can be improved and the computing cost reduced by drawing on progress in deep convolutional neural networks for 2D images, such as network architectures and model compression techniques. On the other hand, there are few public 3D crack datasets; researchers collect their own pavement crack data for training and testing, so performance cannot be compared on the same dataset. Collecting 3D crack benchmark datasets will greatly benefit future study of 3D crack detection.

VI. CONCLUSION
The automatic detection of pavement cracks has been studied extensively due to its practical significance. Methods range from traditional image processing to machine learning to the deep learning algorithms that have become popular in recent years. In this work, we review these methods and focus on a detailed comparison and analysis of deep learning methods and 3D image based methods. In particular, deep learning methods are grouped and reviewed in three categories: image classification, object detection and pixel-level segmentation. For 3D crack detection methods, we compare the different data representations and study the corresponding performance of deep neural networks for 3D object classification. Traditional and deep learning based crack detection methods using 3D data are also reviewed.

REFERENCES
[1] G. Caroff, P. Joubert, F. Prudhomme, and G. Soussain, “Classification of pavement distresses by image processing (macadam system).” ASCE, 1989, pp. 46–51.
[2] K. C. Wang, Z. Hou, and W. Gong, “Automation techniques for digital highway data vehicle (dhdv).” Citeseer, 2008.
[3] L. Sjogren and P. Offrell, “Automatic crack measurement in sweden,” 2000.
[4] L. Jin_hui, L. Wei, and J. Shou_shan, “A study on road surface defects detecting technology with ccd camera [j],” Journal of Xi’an Institute of Technology, vol. 2, 2002.
[5] K. K. Singh and A. Singh, “A study of image segmentation algorithms for different types of images,” International Journal of Computer Science Issues (IJCSI), vol. 7, no. 5, p. 414, 2010.
[6] S. Kamdi and R. Krishna, “Image segmentation and region growing algorithm,” International Journal of Computer Technology and Electronics Engineering (IJCTEE), vol. 2, no. 1, 2012.
[7] N. Kanopoulos, N. Vasanthavada, and R. L. Baker, “Design of an image edge detection filter using the sobel operator,” IEEE Journal of solid-state circuits, vol. 23, no. 2, pp. 358–367, 1988.
[8] W. Dong and Z. Shisheng, “Color image recognition method based on the prewitt operator,” vol. 6. IEEE, 2008, pp. 170–173.
[9] L. Er-Sen, Z. Shu-Long, Z. Bao-shan, Z. Yong, X. Chao-gui, and S. Li-hua, “An adaptive edge-detection method based on the canny operator,” vol. 1. IEEE, 2009, pp. 465–469.
[10] B. J. Lee and H. D. Lee, “Position-invariant neural network for digital pavement crack analysis,” Computer-Aided Civil and Infrastructure Engineering, vol. 19, no. 2, pp. 105–118, 2004.
[11] J.-Y. Jung, H.-J. Yoon, and H.-W. Cho, “A study on crack depth measurement in steel structures using image-based intensity differences,” Advances in Civil Engineering, vol. 2018, 2018.
[12] F. Blais, M. Rioux, and J.-A. Beraldin, “Practical considerations for a design of a high precision 3-d laser scanner system,” vol. 959. International Society for Optics and Photonics, 1988, pp. 225–246.
[13] S. Chambon and J.-M. Moliard, “Automatic road pavement assessment with image processing: review and comparison,” International Journal of Geophysics, vol. 2011, 2011.
[14] K. Gopalakrishnan, “Deep learning in data-driven pavement image analysis and automated distress detection: A review,” Data, vol. 3, no. 3, p. 28, 2018.
[15] T. B. Coenen and A. Golroo, “A review on automated pavement distress detection methods,” Cogent Engineering, vol. 4, no. 1, p. 1374822, 2017.
[16] S. Mathavan, K. Kamal, and M. Rahman, “A review of three-dimensional imaging technologies for pavement distress detection and measurements,” IEEE Transactions on Intelligent Transportation Systems, vol. 16, no. 5, pp. 2353–2362, 2015.
[17] S. Zhu, X. Xia, Q. Zhang, and K. Belloulata, “An image segmentation algorithm in image processing based on threshold segmentation.” IEEE, 2007, pp. 673–678.
[18] H. Oliveira and P. L. Correia, “Automatic road crack segmentation using entropy and image dynamic thresholding.” IEEE, 2009, pp. 622–626.
[19] L. Peng, W. Chao, L. Shuangmiao, and F. Baocai, “Research on crack detection method of airport runway based on twice-threshold segmentation.” IEEE, 2015, pp. 1716–1720.
[20] S. Wang and W. Tang, “Pavement crack segmentation algorithm based on local optimal threshold of cracks density distribution.” Springer, 2011, pp. 298–302.
[21] H. Zhao, G. Qin, and X. Wang, “Improvement of canny algorithm based on pavement edge detection,” vol. 2. IEEE, 2010, pp. 964–967.
[22] C.-C. Zhou, G.-F. Yin, and X.-B. Hu, “Multi-objective optimization of material selection for sustainable products: artificial neural networks and genetic algorithm approach,” Materials & Design, vol. 30, no. 4, pp. 1209–1215, 2009.
[23] A. Ayenu-Prah and N. Attoh-Okine, “Evaluating pavement cracks with bidimensional empirical mode decomposition,” EURASIP Journal on Advances in Signal Processing, vol. 2008, no. 1, p. 861701, 2008.
[24] Z. Wu and N. E. Huang, “A study of the characteristics of white noise using the empirical mode decomposition method,” Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, vol. 460, no. 2046, pp. 1597–1611, 2004.
[25] Y. Zhou, F. Wang, N. Meghanathan, and Y. Huang, “Seed-based approach for automated crack detection from pavement images,” Transportation Research Record, vol. 2589, no. 1, pp. 162–171, 2016.
[26] Q. Li, Q. Zou, D. Zhang, and Q. Mao, “Fosa: F* seed-growing approach for crack-line detection from pavement images,” Image and Vision Computing, vol. 29, no. 12, pp. 861–872, 2011.
[27] A. Akagic, E. Buza, S. Omanovic, and A. Karabegovic, “Pavement crack detection using otsu thresholding for image segmentation.” IEEE, 2018, pp. 1092–1097.
[28] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, “Automatic crack detection on two-dimensional pavement images: An algorithm based on minimal path selection,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 10, pp. 2718–2729, 2016.
[29] H. Li, D. Song, Y. Liu, and B. Li, “Automatic pavement crack detection by multi-scale image fusion,” IEEE Transactions on Intelligent Transportation Systems, no. 99, pp. 1–12, 2018.
[30] R. E. Wright, “Logistic regression.” 1995.
[31] K. M. Leung, “Naive bayesian classifier,” Polytechnic University Department of Computer Science/Finance and Risk Engineering, 2007.
[32] C. J. Burges, “A tutorial on support vector machines for pattern recognition,” Data mining and knowledge discovery, vol. 2, no. 2, pp. 121–167, 1998.
[33] A. K. Jain, J. Mao, and K. Mohiuddin, “Artificial neural networks: A tutorial,” Computer, no. 3, pp. 31–44, 1996.
[34] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.
[35] G. Xu, J. Ma, F. Liu, and X. Niu, “Automatic recognition of pavement surface crack based on bp neural network.” IEEE, 2008, pp. 19–22.
[36] Y. Shi, L. Cui, Z. Qi, F. Meng, and Z. Chen, “Automatic road crack detection using random structured forests,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 12, pp. 3434–3445, 2016.
[37] A. Marques and P. L. Correia, “Automatic road pavement crack detection using svm,” Lisbon, Portugal: Dissertation for the Master of Science Degree in Electrical and Computer Engineering at Instituto Superior Técnico, 2012.
[38] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, p. 436, 2015.
[39] W. Cao, Q. Lin, and Z. He, “Hybrid representation learning for cross-modal retrieval,” Neurocomputing, vol. 345, pp. 45–47, 2019.
[40] W. Cao, J. Yuan, Z. He, Z. Zhang, and Z. He, “Fast deep neural networks with knowledge guided training and predicted regions of interests for real-time video object detection,” IEEE Access, vol. 6, pp. 8990–8999, 2018.
[41] M. Dan, L. Zhang, G. Cao, W. Cao, G. Zhang, and H. Bing, “Liver fibrosis classification based on transfer learning and fcnet for ultrasound images,” IEEE Access, vol. 5, no. 99, pp. 5804–5810, 2017.
[42] D. Meng, G. Cao, Y. Duan, M. Zhu, L. Tu, D. Xu, and J. Xu, “Tongue images classification based on constrained high dispersal network,” Evidence-Based Complementary and Alternative Medicine, vol. 2017, no. 4, pp. 1–12, 2017.
[43] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network.” IEEE, 2016, pp. 3708–3712.
[44] S. Li and X. Zhao, “Convolutional neural networks-based crack detection for real concrete surface,” vol. 10598. International Society for Optics and Photonics, 2018, p. 105983V.
[45] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” 2015, pp. 1–9.
[46] Y.-J. Cha, W. Choi, and O. Büyüköztürk, “Deep learning-based crack damage detection using convolutional neural networks,” Computer-Aided Civil and Infrastructure Engineering, vol. 32, no. 5, pp. 361–378, 2017.
[47] A. Vedaldi and K. Lenc, “Matconvnet: convolutional neural networks for matlab,” 2015.
[48] L. Pauly, D. Hogg, R. Fuentes, and H. Peel, “Deeper networks for pavement crack detection.” IAARC, 2017, pp. 479–485.
[49] F.-C. Chen and M. R. Jahanshahi, “Nb-cnn: deep learning-based crack detection using convolutional neural network and naïve bayes data fusion,” IEEE Transactions on Industrial Electronics, vol. 65, no. 5, pp. 4392–4400, 2017.
[50] Z. Fan, Y. Wu, J. Lu, and W. Li, “Automatic pavement crack detection based on structured prediction with the convolutional neural network,” arXiv preprint arXiv:1802.02208, 2018.
[51] B. Li, K. C. Wang, A. Zhang, E. Yang, and G. Wang, “Automatic classification of pavement crack using deep convolutional neural network,” International Journal of Pavement Engineering, pp. 1–7, 2018.
[52] X. Wang and Z. Hu, “Grid-based pavement crack analysis using deep learning.” IEEE, 2017, pp. 917–924.
[53] M. D. Jenkins, T. A. Carr, M. I. Iglesias, T. Buggy, and G. Morison, “A deep convolutional neural network for semantic pixel-wise segmentation of road and pavement surface cracks.” IEEE, 2018, pp. 2120–2124.
[54] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” 2015.
[55] Q. Zou, Z. Zhang, Q. Li, X. Qi, Q. Wang, and S. Wang, “Deepcrack: Learning hierarchical convolutional features for crack detection,” IEEE Transactions on Image Processing, vol. 28, no. 3, pp. 1498–1512, 2018.
[56] W. Liu, Y. Huang, Y. Li, and Q. Chen, “Fpcnet: Fast pavement crack detection network based on encoder-decoder architecture,” arXiv preprint arXiv:1907.02248, 2019.
[57] J. Konig, M. D. Jenkins, P. Barrie, M. Mannion, and G. Morison, “A convolutional neural network for pavement surface crack segmentation using residual connections and attention gating.” IEEE, 2019, pp. 1460–1464.
[58] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” 2015, pp. 3431–3440.
[59] U. Escalona, F. Arce, E. Zamora, and J. H. Sossa Azuela, “Fully convolutional networks for automatic pavement crack segmentation,” Computación y Sistemas, vol. 23, no. 2, pp. 451–460, 2019.
[60] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” 2015, pp. 91–99.
[61] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector.” Springer, 2016, pp. 21–37.
[62] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” 2016, pp. 779–788.
[63] G. Suh and Y.-J. Cha, “Deep faster r-cnn-based automated detection and localization of multiple types of damage,” vol. 10598. International Society for Optics and Photonics, 2018, p. 105980T.
[64] Y.-J. Cha, W. Choi, G. Suh, S. Mahmoudkhani, and O. Büyüköztürk, “Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types,” Computer-Aided Civil and Infrastructure Engineering, vol. 33, no. 9, pp. 731–747, 2018.
[65] M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” 2013.
[66] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” 2012, pp. 1097–1105.
[67] J. Li, X. Zhao, and H. Li, “Method for detecting road pavement damage based on deep learning,” vol. 10972. International Society for Optics and Photonics, 2019, p. 109722D.
[68] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, and H. Adam, “Mobilenets: Efficient convolutional
neural networks for mobile vision applications,” arXiv preprint arXiv:1704.04861, 2017.
[69] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” 2016, pp. 2818–2826.
[70] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, and H. Omata, “Road damage detection using deep neural networks with images captured through a smartphone,” arXiv preprint arXiv:1801.09454, 2018.
[71] S. Anand, S. Gupta, V. Darbari, and S. Kohli, “Crack-pot: Autonomous road crack and pothole detection,” 2018.
[72] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 mb model size,” arXiv preprint arXiv:1602.07360, 2016.
[73] J. Mairal, J. Ponce, G. Sapiro, A. Zisserman, and F. R. Bach, “Supervised dictionary learning,” in Advances in neural information processing systems, 2009, pp. 1033–1040.
[74] D. K. McClish, “Analyzing a portion of the roc curve,” Medical Decision Making, vol. 9, no. 3, pp. 190–195, 1989.
[75] A. P. Bradley, “The use of the area under the roc curve in the evaluation of machine learning algorithms,” Pattern recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[76] F. Yang, L. Zhang, S. Yu, D. Prokhorov, X. Mei, and H. Ling, “Feature pyramid and hierarchical boosting network for pavement crack detection,” arXiv preprint arXiv:1901.06340, 2019.
[77] J. Cheng, W. Xiong, W. Chen, Y. Gu, and Y. Li, “Pixel-level crack detection using u-net.” IEEE, 2018, pp. 0462–0466.
[78] H. Oliveira and P. L. Correia, “Automatic road crack detection and characterization,” IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 155–168, 2012.
[79] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, and X. Bai, “Richer convolutional features for edge detection,” 2017, pp. 3000–3009.
[80] M. Eisenbach, R. Stricker, D. Seichter, K. Amende, K. Debes, M. Sesselmann, D. Ebersbach, U. Stoeckert, and H.-M. Gross, “How to get pavement distress detection ready for deep learning? a systematic approach.” IEEE, 2017, pp. 2039–2047.
[81] S. Zhi, Y. Liu, X. Li, and Y. Guo, “Toward real-time 3d object recognition: A lightweight volumetric cnn framework using multitask learning,” Computers & Graphics, vol. 71, pp. 199–207, 2018.
[82] F. Chazal, L. J. Guibas, S. Y. Oudot, and P. Skraba, “Analysis of scalar fields over point cloud data.” SIAM, 2009, pp. 1021–1030.
[83] M. J. Lee, “Method and apparatus for transforming point cloud data to volumetric data,” Jan. 8 2008, US Patent 7,317,456.
[84] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view convolutional neural networks for 3d shape recognition,” 2015, pp. 945–953.
[85] D. Maturana and S. Scherer, “Voxnet: A 3d convolutional neural network for real-time object recognition.” IEEE, 2015, pp. 922–928.
[86] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3d shapenets: A deep representation for volumetric shapes,” 2015, pp. 1912–1920.
[87] T. P. ModelNet, https://modelnet.cs.princeton.edu/, online; accessed: October 2019.
[88] A. Kanezaki, Y. Matsushita, and Y. Nishida, “Rotationnet: Joint object categorization and pose estimation using multiviews from unsupervised viewpoints,” 2018, pp. 5010–5019.
[89] R. D. Singh, A. Mittal, and R. K. Bhatia, “3d convolutional neural network for object recognition.”
[90] C. R. Qi, H. Su, K. Mo, and L. J. Guibas, “Pointnet: Deep learning on point sets for 3d classification and segmentation,” 2017, pp. 652–660.
[91] C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,” 2017, pp. 5099–5108.
[92] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” ACM Transactions on Graphics (TOG), vol. 38, no. 5, p. 146, 2019.
[93] R. Medina, J. Llamas, E. Zalama, and J. Gómez-García-Bermejo, “Enhanced automatic detection of road surface cracks by combining 2d/3d image processing techniques.” IEEE, 2014, pp. 778–782.
[94] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3d convolutional networks,” 2015, pp. 4489–4497.
[95] J. Owoyemi and K. Hashimoto, “Spatiotemporal learning of dynamic gestures from 3d point cloud data,” 2018.
[96] T. Furuya and R. Ohbuchi, “Deep aggregation of local 3d geometric features for 3d model retrieval.” 2016, pp. 121–1.
[97] A. Zeng, S. Song, M. Nießner, M. Fisher, J. Xiao, and T. Funkhouser, “3dmatch: Learning local geometric descriptors from rgb-d reconstructions,” 2016.
[98] H. Deng, T. Birdal, and S. Ilic, “Ppfnet: Global context aware local features for robust 3d point matching,” 2018, pp. 195–205.
[99] S. Xu, R. Wang, and H. Zheng, “Road curb extraction from mobile lidar point clouds,” IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 2, pp. 996–1009, 2016.
[100] Y. Zhang, C. Chen, Q. Wu, Q. Lu, S. Zhang, G. Zhang, and Y. Yang, “A kinect-based approach for 3d pavement surface reconstruction and cracking recognition,” IEEE Transactions on Intelligent Transportation Systems, vol. 19, no. 12, pp. 3935–3946, 2018.
[101] Q. Li, D. Zhang, Q. Zou, and H. Lin, “3d laser imaging and sparse points grouping for pavement crack detection.” IEEE, 2017, pp. 2036–2040.
[102] Y.-C. J. Tsai and F. Li, “Critical assessment of detecting asphalt pavement cracks under different lighting and low intensity contrast conditions using emerging 3d laser technology,” Journal of Transportation Engineering, vol. 138, no. 5, pp. 649–656, 2012.
[103] D. F. Laefer, L. Truong-Hong, H. Carr, and M. Singh, “Crack detection limits in unit based masonry with terrestrial laser scanning,” NDT & E International, vol. 62, pp. 66–76, 2014.
[104] E. Salari and G. Bao, “Automated pavement distress inspection based on 2d and 3d information.” IEEE, 2011, pp. 1–4.
[105] J. Huang, W. Liu, and X. Sun, “A pavement crack detection method combining 2d with 3d information based on dempster-shafer theory,” Computer-Aided Civil and Infrastructure Engineering, vol. 29, no. 4, pp. 299–313, 2014.
[106] A. Zhang, K. C. Wang, B. Li, E. Yang, X. Dai, Y. Peng, Y. Fei, Y. Liu, J. Q. Li, and C. Chen, “Automated pixel-level pavement crack detection on 3d asphalt surfaces using a deep-learning network,” Computer-Aided Civil and Infrastructure Engineering, vol. 32, no. 10, pp. 805–819, 2017.
[107] A. Zhang, K. C. Wang, Y. Fei, Y. Liu, S. Tao, C. Chen, J. Q. Li, and B. Li, “Deep learning–based fully automated pavement crack detection on 3d asphalt surfaces with an improved cracknet,” Journal of Computing in Civil Engineering, vol. 32, no. 5, p. 04018041, 2018.
[108] Y. Fei, K. C. Wang, A. Zhang, C. Chen, J. Q. Li, Y. Liu, G. Yang, and B. Li, “Pixel-level cracking detection on 3d asphalt pavement images through deep-learning-based cracknet-v,” IEEE Transactions on Intelligent Transportation Systems, 2019.
[109] A. Zhang, K. C. Wang, Y. Fei, Y. Liu, C. Chen, G. Yang, J. Q. Li, E. Yang, and S. Qiu, “Automated pixel-level pavement crack detection on 3d asphalt surfaces with a recurrent neural network,” Computer-Aided Civil and Infrastructure Engineering, vol. 34, no. 3, pp. 213–229, 2019.
[110] K. Kamal, S. Mathavan, T. Zafar, I. Moazzam, A. Ali, S. U. Ahmad, and M. Rahman, “Performance assessment of kinect as a sensor for pothole imaging and metrology,” International Journal of Pavement Engineering, vol. 19, no. 7, pp. 565–576, 2018.