Review Article on Medical Artificial Intelligent Research

A review of the application of deep learning in medical image classification and segmentation
Lei Cai¹, Jingyang Gao¹, Di Zhao²
¹College of Information Engineering and Technology, Beijing University of Chemical Technology, Beijing, China; ²Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Contributions: (I) Conception and design: J Gao, D Zhao; (II) Administrative support: J Gao; (III) Provision of study materials or patients: None;
(IV) Collection and assembly of data: All authors; (V) Data analysis and interpretation: All authors; (VI) Manuscript writing: All authors; (VII) Final
approval of manuscript: All authors.
Correspondence to: Jingyang Gao. College of Information Engineering and Technology, Beijing University of Chemical Technology, No. 15 East
North Third Ring Road, Beijing, China. Email: gaojy@mail.buct.edu.cn; Di Zhao. Institute of Computing Technology, Chinese Academy of
Sciences, No. 6 South Road, Zhongguancun Academy of Sciences, Beijing, China. Email: zhaodi@escience.cn.

Abstract: Big medical data mainly include electronic health record data, medical image data, gene information data, etc. Among them, medical image data account for the vast majority of medical data at this stage. How can big medical data be applied to clinical practice? This is an issue of great concern to medical and computer researchers, and intelligent imaging and deep learning provide a good answer. This review introduces the application of intelligent imaging and deep learning in the field of big data analysis and early diagnosis of diseases, combining the latest research progress in big data analysis of medical images with the work of our team in this field, especially the classification and segmentation of medical images.

Keywords: Big medical data; deep learning; classification; segmentation; object detection

Submitted Sep 16, 2019. Accepted for publication Feb 06, 2020.
doi: 10.21037/atm.2020.02.44
View this article at: http://dx.doi.org/10.21037/atm.2020.02.44

Introduction

Since 2006, deep learning has emerged as a prominent branch of machine learning. It processes data using multiple layers of complex structures, or multiple processing layers composed of multiple nonlinear transformations (1). In recent years, deep learning has made breakthroughs in computer vision, speech recognition, natural language processing, audio recognition and bioinformatics (2). Deep learning has been praised as one of the top ten technological breakthroughs since 2013 due to its considerable application prospects in data analysis. The deep learning method simulates the human neural network. By combining multiple nonlinear processing layers, the original data is abstracted layer by layer, and different levels of abstract features are obtained from the data and used for target detection, classification or segmentation. The advantage of deep learning is that it replaces manual feature acquisition with efficient algorithms for unsupervised or semi-supervised feature learning and hierarchical feature extraction (3).

Medical care is about the health of people. At present, the amount of medical data is huge, and it is crucial to make good use of these data to benefit the medical industry. Despite this volume, many problems remain: medical data are diverse, including maps, texts, videos, magnets, etc.; the quality of the data varies greatly because different equipment is used; the data fluctuate over time and with specific events; and, because individuals differ, the patterns of a disease do not apply universally (4). These problems involve many factors that remain difficult to handle. Medical imaging is a very important part of medical data.

© Annals of Translational Medicine. All rights reserved. Ann Transl Med 2020;8(11):713 | http://dx.doi.org/10.21037/atm.2020.02.44

This paper first introduces the application of deep learning algorithms in medical image analysis, expounds the techniques of deep learning classification and segmentation, and introduces the classic and current mainstream network models. We then detail the application of deep learning in the classification and segmentation of medical images, including fundus, CT/MRI tomography, ultrasound and digital pathology, organized by imaging technique. Finally, we discuss possible problems and predict the development prospects of deep learning in medical imaging analysis.

Deep learning architectures

Deep learning algorithms

Deep learning has developed into a hot research field, and there are dozens of algorithms, each with its own advantages and disadvantages. These algorithms cover almost all aspects of image processing and mainly focus on classification and segmentation. Figure 1 is an overview of some typical network structures in these areas.

Classification

Image classification was the earliest application of deep learning to take off, and it remains a thriving subject. The convolutional neural network (CNN) is the most widely used structure. Since Krizhevsky et al. proposed AlexNet, based on the CNN deep learning model, in 2012 (5) and won the ImageNet image classification championship of that year, deep learning has grown explosively. In 2013, Lin et al. proposed the network in network (NIN) structure, which uses global average pooling to reduce the risk of overfitting (6). In 2014, GoogLeNet and VGGNet both improved the accuracy on the ImageNet dataset (7,8). GoogLeNet was further developed into the v2, v3 and v4 versions to improve performance (9-11). To address the CNN's requirement for a fixed input size, He et al. proposed the spatial pyramid pooling (SPP) model to enhance robustness to the input data (12). As deep learning models grew deeper, He et al. proposed the residual network ResNet to address the model degradation that may occur, continuing to advance deep learning technology (13).

Take AlexNet as an example. In 2012, AlexNet adopted an 8-layer network structure consisting of five convolutional layers and three fully connected layers. Max pooling follows the first, second and fifth convolutional layers to reduce the amount of data. AlexNet accepts input images of 227×227 pixels. After the five convolutional layers and the pooling operations, a 6×6×256 feature map is sent to the fully connected layers. The sixth layer, which is the first fully connected layer, has 4,096 units and produces a 4,096-dimensional feature vector, to which dropout is applied. After the last two layers, we get 1,000 floating-point output values, which form the final prediction. AlexNet's top-5 error rate on ImageNet was 15.3%, much lower than the 26.2% of the second-place entry. Its activation function is ReLU rather than sigmoid, and it showed that the ReLU function is more effective.

VGG16 was first proposed by the VGG group (Visual Geometry Group) of Oxford University. Compared with AlexNet, it uses several consecutive 3×3 kernels instead of AlexNet's larger convolution kernels such as 11×11 and 5×5. For a given receptive field, stacking several small convolution kernels works better than using one larger kernel, because the additional nonlinear layers increase the network depth so that more complex patterns can be learned, while the computational cost is lower.

GoogLeNet, which was introduced in the same year as VGGNet, also achieved good results. Unlike VGGNet, GoogLeNet introduced a module called Inception. It is a dense structure with a small number of convolution kernels of each size, and it uses 1×1 convolutional layers to reduce the amount of computation.

Segmentation

Semantic segmentation is an important research field of deep learning. With the rapid development of deep learning technology, excellent semantic segmentation networks have emerged in large numbers and continuously set the state of the art in various segmentation competitions. Following the success of CNNs in classification, researchers began to apply them to image segmentation. Although the convolutional layers of a CNN can process images of any size, the network loses some details during pooling for feature extraction, and it loses the spatial information of the input image because of the fully connected layers at the end of the network. It is therefore difficult for a classification CNN to pinpoint which category individual pixels belong to. As deep learning technology developed, segmentation networks based on convolutional structures were derived.

The fully convolutional network (FCN) (14) proposed by Long et al. is the originator of the semantic segmentation


Figure 1 The typical network structures of deep learning: (a) AlexNet; (b) VGG16; (c) Faster R-CNN; (d) SSD; (e) FCN; (f) U-Net.
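
As a quick check of the AlexNet figures quoted in the Classification subsection, the sketch below applies the standard output-size formula, floor((n + 2p - k)/s) + 1, layer by layer. The kernel, stride and padding values are assumptions taken from the original AlexNet paper rather than from this review, and the helper names are hypothetical. It also compares the weight count of stacked 3×3 convolutions with that of a single larger kernel covering the same receptive field, as discussed for VGG16.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) along the spatial path of AlexNet.
# Assumed hyperparameters: they come from the original AlexNet paper, not from this review.
ALEXNET_SPATIAL_LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def alexnet_feature_map_size(input_size=227):
    """Side length of the feature map that reaches the first fully connected layer."""
    size = input_size
    for _name, kernel, stride, padding in ALEXNET_SPATIAL_LAYERS:
        size = conv_output_size(size, kernel, stride, padding)
    return size


def receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions with the same kernel size."""
    return 1 + layers * (kernel - 1)


def conv_weights(kernel, channels, layers=1):
    """Weights (biases ignored) of stacked kernel x kernel convolutions, `channels` in and out."""
    return layers * kernel * kernel * channels * channels


side = alexnet_feature_map_size(227)
print("AlexNet feature map:", side, "x", side, "x 256 =", side * side * 256, "values")
print("two 3x3 layers: receptive field", receptive_field(3, 2),
      "weights", conv_weights(3, 64, layers=2))
print("one 5x5 layer:  receptive field", receptive_field(5, 1),
      "weights", conv_weights(5, 64))
```

Under these assumptions the 227×227 input is reduced to a 6×6 map with 256 channels, and two stacked 3×3 layers cover a 5×5 receptive field with 18C² weights instead of 25C² for C input and output channels.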


networks. It replaces the fully connected layers of the classification network VGG16 with convolutional layers, which retains the spatial information of the feature map and achieves pixel-level classification. Finally, FCN uses deconvolution and the fusion of feature maps to restore the image, and provides the segmentation result for each pixel through softmax. Because the densely connected fully connected layers are replaced by convolutional layers, which are locally connected and share weights, FCN greatly reduces the number of parameters that need to be trained. The performance of FCN on the Pascal VOC 2012 dataset (15) increased by 20% compared with the previous method, reaching an mIoU of 62.2%.

U-Net (16) was proposed by Ronneberger et al. based on FCN and has been widely used in medical imaging. Building on the FCN idea of using deconvolution to restore image size and features, U-Net constructs an encoder-decoder structure for semantic segmentation. The encoder gradually reduces the spatial dimension through successive pooling layers to extract feature information, and the decoder gradually restores the target detail and the spatial dimension according to that feature information. The step in which the encoder gradually reduces the image size is called downsampling, and the step in which the decoder gradually restores the image detail and size is called upsampling. Unlike FCN, which fuses features by direct addition during upsampling, U-Net first concatenates the feature maps from the downsampling path of the encoder with the corresponding feature maps in the upsampling path of the decoder. After concatenation, the feature map is deconvolved. In contrast to conventional convolution, pooling and other operations, this strategy of directly reusing shallow features is called a skip connection. U-Net adopts concatenation-based skip connections so that the features from the downsampling part of the encoder are fully used during upsampling. To achieve a finer restoration, this strategy is applied to the shallow feature information at every scale.

SegNet (17) is a deep semantic segmentation network designed at Cambridge for autonomous driving and intelligent robots, and it is also based on the encoder-decoder structure. SegNet's encoder and decoder each have 13 convolutional layers. The convolutional layers of the encoder correspond to the first 13 convolutional layers of VGG16. The upsampling part of the decoder uses unpooling. SegNet records the positions of the maximum elements selected by each max pooling operation when the encoder downsamples, and restores the image according to this position information when the decoder upsamples. With this strategy, SegNet does not need to learn the upsampling, and its training is more accurate and faster than that of FCN.

To fuse multi-scale context information at the same level, PSPNet (18) proposes a pyramid pooling structure, which lets the segmentation take the surroundings of the target into account and addresses a problem that FCN cannot handle effectively, namely the relationship between global information and the scene. Its pyramid pooling structure aggregates context information from different regions, thereby improving the ability to obtain global information.

Deep learning development framework

While deep learning technology is developing in theory, software development frameworks based on deep learning theory are also booming.

Convolutional architecture for fast feature embedding (Caffe)

Caffe originated at Berkeley, California, and is now hosted by BVLC. Caffe features high performance, seamless switching between CPU and GPU modes, and cross-platform support for Windows, Linux and Mac. Caffe has three basic atomic structures, Blobs, Layers and Nets, and its programming framework is built on these three atoms. It abstracts the structure of the deep neural network in terms of the "Layer", significantly optimizes execution efficiency through careful design, and remains flexible while maintaining an efficient implementation.

TensorFlow

TensorFlow is an open source software library that uses data flow graphs for numerical computation. Google open-sourced the TensorFlow computing framework on November 9, 2015, and released TensorFlow version 1.0 in 2017, marking its official use in production environments. The TensorFlow framework supports various deep learning algorithms such as CNN, RNN and LSTM well, but its application is not limited to deep learning: it also supports the construction of general machine learning models. TensorFlow's components are excellent, and it provides powerful


visualization capabilities through TensorBoard, which can generate very powerful visual representations of real-world network topologies and performance. At the same time, it supports heterogeneous distributed computing, can run on multiple GPUs simultaneously, and can automatically run the model on different platforms. Because the core of TensorFlow is developed in C++, it offers high performance.

PyTorch

PyTorch is the Python version of Torch, a neural network framework open-sourced by Facebook and specifically targeted at GPU-accelerated deep neural network programming. Unlike the static computation graph of TensorFlow 1.x, PyTorch's computation graph is dynamic and can be changed in real time according to the needs of the computation. In January 2017, the Facebook AI Research (FAIR) team open-sourced PyTorch on GitHub, where it quickly rose to the top of the trending list. PyTorch attracted widespread attention as soon as it was launched and quickly became popular in research.

High-performance computing based on GPU

The key factors of image processing in the medical imaging field are imaging speed, image size and resolution. Because of hardware limitations, medical images have been processed sequentially. The lack of computing resources also means that processing these images wastes a lot of valuable time for doctors and patients. In recent years, the GPU has made great progress and moved towards general-purpose computing. Its data processing capacity far exceeds that of the CPU, which makes it possible to realize high-performance computing on ordinary computers.

GPU stands for Graphics Processing Unit, a microprocessor that performs image computing on PCs, workstations, game consoles and some mobile devices. In August 1999, NVIDIA released the GeForce 256 graphics chip, codenamed NV10. Its architecture is very different from that of the CPU. At first, it was mainly oriented to the rendering of graphic images. Like the CPU, the GPU is a processor, but it sits in the graphics card and is designed to perform the complex mathematical and geometric calculations required for graphics rendering. With a GPU, the CPU does not need to perform graphics processing work and can carry out other system tasks, which can greatly improve the overall performance of the computer.

Deep learning for medical imaging analysis

With the development of deep learning, computer vision relies heavily on deep learning to deal with various image problems. Medical images, as a special kind of visual image, have attracted the attention of many researchers. In recent years, various types of medical image processing and recognition have adopted deep learning methods, including fundus images, endoscopic images, CT/MRI images, ultrasound images, pathological images, etc. At present, deep learning technology is mainly used for classification and segmentation of medical images. Figure 2 shows the main medical application scenarios of deep learning.

The classification of medical image

Diabetic retinopathy detection

In the field of deep learning, image classification and its applications have made great progress in recent years. On the one hand, academic researchers have made great efforts to design a variety of efficient CNN models, which have achieved high accuracy and have even exceeded human recognition ability. On the other hand, the application of CNN models in medical image analysis has become one of the most attractive directions of deep learning. In particular, the retinal fundus image obtained from a fundus camera has become one of the key research objects of deep learning in the field of image classification.

The main method for studying fundus diseases with deep learning techniques is to classify and detect fundus images, as in diabetic retinopathy detection and glaucoma detection. Table 1 lists deep learning methods applied to fundus image analysis in the past 3 years. These methods mainly use a large-scale dataset to train a deep CNN model and perform disease classification and detection on fundus images. The deep CNNs used have been updated iteratively as deep learning techniques developed, from the earliest shallow CNN models to deep CNN models and some combined models, together with transfer learning, data augmentation and other new methods and techniques.

Our own work mainly addresses the detection of fundus diseases with transfer learning. It is very difficult to obtain a large-scale annotated medical data set, and transfer learning is an effective way to address the problem of small data. To investigate how accuracy depends on the type of base model, we perform transfer learning using pre-trained models, for example CaffeNet, GoogLeNet and VGG19.

Figure 2 Deep learning application in medical image analysis. (A) Fundus detection; (B,C) hippocampus segmentation; (D) left ventricular
segmentation; (E) pulmonary nodule classification; (F,G,H,I) gastric cancer pathology segmentation. The staining method is H&E, and the
magnification is ×40.

The experimental results show that transfer learning based on pre-trained CNN models can be used to solve problems in medical image analysis and produces effective results. Figure 3 shows our fundus classification model.

Ultrasound detection of breast nodules

Deep learning technology has achieved research results in ultrasound imaging of, for example, breast cancer, the cardiovascular system and the carotid arteries. Compared with traditional machine learning, deep learning can automatically select features through multi-layer models to improve recognition performance. With its high efficiency and accuracy, deep learning has become an important tool for ultrasound image recognition and can effectively improve diagnostic accuracy. In the field of ultrasound imaging of breast nodules, Chen et al. combined detection methods based on texture features and morphology (25,26), using the AlexNet model to find the nodules in the image and predict whether they are benign or malignant; the AUC value reached 0.9325 (27). Shalev-Shwartz et al. applied deep learning to the segmentation of left ventricular ultrasound images of the heart and achieved a better segmentation effect. In a study of automatic classification of fetal facial ultrasound images, Yu integrated a CNN with a random two-coordinate descent optimization algorithm (28), achieving 96.98% accuracy. Yu also used the ResNet model for the automatic recognition of melanoma in dermoscopic images based on deep aggregation features (29,30), with an AUC of more than 80%.

For the classification of breast nodules, we mainly propose a data preprocessing enhancement method called the adaptive contrast enhancement (ACE) method. Directly

Table 1 The application of deep network in the fundus detection

Literature | Target task | Network structure | Method introduction | Result
Liskowski et al. 2016 (19) | Vessel segmentation | Deep CNN + complex data preparation | Proposing a supervised segmentation technique that uses a deep neural network. Using a relatively recent machine learning formalism of structured prediction to produce segmentation results | ROC >0.99; accuracy of classification >0.97
Fu et al. 2016 (20) | Vessel segmentation | Fully CNNs + conditional random fields | Formulating the vessel segmentation to a boundary detection problem. Using FCN and CRF to generate a vessel probability map and give a binary classification result | State-of-the-art vessel segmentation performance on the DRIVE and STARE datasets
Dasgupta et al. 2017 (21) | Vessel segmentation | FCN | Formulating the segmentation task as a multi-label inference task and utilize the implicit advantages of the combination of CNNs and structured prediction | On DRIVE dataset; accuracy 95.33%; AUC: 0.974
Zhu et al. 2017 (22) | Vessel segmentation | Extreme learning machine | Extracting feature vectors from pixels. Constructing matrix for pixels based on feature vectors and the manual labels. Give ELM the matrix and get the binary retinal vascular segmentation | Average accuracy 0.9607; sensitivity 0.7140; specificity 0.9868
Hu et al. 2018 (23) | Vessel segmentation | Multiscale CNN + CRF + improved cross-entropy loss | Combining with CNN and fully CRFs. Developing a multiscale CNN with an improved cross-entropy loss function | Competitive in the sensitivity while ensuring accuracy
Fu et al. 2018 (24) | Optic disc and cup segmentation for glaucoma detection | M-Net | Constructing image pyramid to achieve multiple level receptive field sizes. The U-shape CNN learn the rich hierarchical representation. The side-output layer provides early classification result. Using multi-label loss function to generate final segmentation result | State-of-the-art OD and OC segmentation result on ORIGA data set

CNN, convolutional neural network; FCN, fully convolutional network.

Figure 3 Fundus retina detection using deep learning network. The diagram shows five convolutional layers (Conv 1 to Conv 5) with max pooling, followed by fully connected layers with dropout and a softmax output over five classes: normal, mild NPDR, moderate NPDR, severe NPDR and PDR.

sending the original data to the neural network for training often gives poor results. We therefore use the ACE algorithm to enhance the ultrasound image. The ACE algorithm computes the difference between the value of the target pixel and the values of its surrounding pixels and uses this relative value to correct the final pixel value, which has a good enhancement effect on the image. Figure 4 shows the enhanced breast images.

Pulmonary nodule screening

Pulmonary nodule disease is a common lung disease. Figure 5 shows the whole picture of a pulmonary nodule in a CT image. The accuracy of the common chest X-ray film in the diagnosis of pulmonary nodules is less than 50%, and nodules can still be found in people whose chest film appears normal. With CT becoming the main method for diagnosing pulmonary nodules, more and more physical examinations include lung cancer screening. According to relevant statistics, the detection rate of pulmonary nodules has increased 5 times in recent years. With the development of deep learning technology, a series of deep learning methods for detecting pulmonary nodules is emerging.

We compared a variety of existing lung nodule detection methods and found that deep learning algorithms greatly improve the detection rate of nodules compared with traditional machine learning algorithms. On the other hand, deep learning algorithms also have bottlenecks in the detection of pulmonary nodules. First, a high detection rate of lung nodules depends on a large amount of data and, at the same time, on the accuracy of the annotations. Second, as the number of network layers increases, the gain in accuracy is not very large, which indicates that the deep learning algorithm itself has its limitations.

The segmentation of medical image analysis

Early detection of Alzheimer's disease (AD)

Brain MRI analysis mainly covers the segmentation of different brain regions and the diagnosis of brain diseases, such as brain tumor segmentation (31), schizophrenia diagnosis, early diagnosis of Parkinson's syndrome (32) and early diagnosis of AD. Among them, the broadest field of deep learning applications is the early diagnosis of AD. AD diagnosis based on deep learning relies mainly on segmentation of the hippocampus, cortical thickness and brain volume in brain MRI images. Sarraf et al. (33) trained the well-known LeNet-5 CNN framework on AD samples from sMRI and fMRI, yielding 98.84% and 96.85% accuracy, respectively. This was the first time that deep learning had been used to analyze fMRI data.


Figure 4 Images processed by the ACE algorithm. (A) The picture before processing; (B) the processed picture; (C,D) a second before/after pair. ACE, adaptive contrast enhancement.
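
The review does not spell out the ACE algorithm, so the following is only a sketch of one widely used formulation, in which each pixel is replaced by its local mean plus an amplified difference from that mean. The window radius, the gain rule (global standard deviation divided by local standard deviation, capped at `max_gain` and floored at 1) and all function and parameter names are assumptions for illustration, not the authors' implementation.

```python
def _mean_std(values):
    """Mean and population standard deviation of a non-empty list of numbers."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance ** 0.5


def adaptive_contrast_enhance(image, radius=1, strength=1.0, max_gain=3.0):
    """Enhance an 8-bit grayscale image given as a list of rows (lists of ints).

    out = local_mean + gain * (pixel - local_mean), clipped to [0, 255], where the
    gain grows in flat regions (small local standard deviation).
    """
    height, width = len(image), len(image[0])
    _, global_std = _mean_std([v for row in image for v in row])
    enhanced = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Square neighbourhood around (y, x), truncated at the image border.
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(height, y + radius + 1))
                      for i in range(max(0, x - radius), min(width, x + radius + 1))]
            local_mean, local_std = _mean_std(window)
            if local_std == 0:
                gain = max_gain  # flat window: the difference below is zero anyway
            else:
                gain = min(max_gain, strength * global_std / local_std)
            gain = max(1.0, gain)  # assumption: never reduce the existing contrast
            value = local_mean + gain * (image[y][x] - local_mean)
            out_row.append(int(round(min(255.0, max(0.0, value)))))
        enhanced.append(out_row)
    return enhanced


# A faint 100/104 texture next to a strong 0/255 edge.
sample = [[100, 104, 100, 104, 100, 104, 0, 255]]
result = adaptive_contrast_enhance(sample)
print(result)
```

In this sketch the faint texture is amplified because its local standard deviation is small compared with the global one, while pixels already at 0 or 255 are left at the limits by the clipping step.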


Figure 5 The whole picture of a pulmonary nodule. (A) The lung image; (B) the position of the pulmonary nodule in the lung image; (C,D) a second such pair.

Payan et al. (34) proposed a 3D CNN for AD diagnosis based on SAE pre-training. Randomly selected 3D MRI scans are used to pre-train the SAE, and the trained SAE weights are used to pre-train the convolutional filters of the 3D-CNN. Finally, the fully connected layers of the 3D-CNN are fine-tuned, but fine-tuning comes at the expense of a large amount of computation during the training phase. Hosseini-Asl et al. (35) observed that feature extraction techniques based on sMRI limit the accuracy of AD classification, and they proposed a new deeply supervised adaptive 3D-CNN, in which a 3D-CAE automatically learns and extracts features that identify AD and capture the changes it causes. The convolutional filters of the 3D-CAE, pre-trained on the CADDementia data, are then applied to another data domain, the AD neuroimaging (ADNI) data set. For the early diagnosis of AD, our research group also carried out related experiments and proposed a deep learning method based on an enhanced AlexNet. Given the characteristics of AD, we focus on how to segment the hippocampus precisely. We compared the segmentation of the hippocampus by mainstream deep learning networks, including FCN, Unet, SegNet, Unet-3D and Mask-RCNN. Figure 6 shows the segmentation results of these networks.

Furthermore, we use the Dice coefficient as a measure. From Figure 7, we can see that Mask-RCNN achieves high precision in hippocampus segmentation. At the same time, we noticed that Unet performs far worse than Unet-3D. The reason may be that MRI data are inherently three-dimensional, so three-dimensional convolution can better interpret the segmented object.

Left ventricular segmentation

Cardiac MRI analysis diagnoses heart disease by segmenting the left ventricle to measure left ventricular volume, ejection fraction and wall thickness. Deep learning is widely used in left ventricular segmentation. In recent years, deep learning algorithms for left ventricular segmentation on MRI images have emerged in an endless stream. Poudel

Page 10 of 15 Cai et al. Deep learning and medical imaging analysis

Figure 6 The hippocampus segmentation results. (A) FCN segmentation; (B) Unet segmentation; (C) SegNet segmentation; (D) Unet-3D segmentation; (E) Mask-RCNN segmentation. FCN, fully convolutional network.

Figure 7 Training accuracy (A) and loss (B) curves of the five segmentation networks (FCN, Unet, SegNet, Unet-3D and Mask-RCNN), plotted against training epoch (0 to 200) on a vertical scale from 0.0 to 1.0. FCN, fully convolutional network.
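The Dice coefficient used as the measure in the hippocampus comparison can be sketched as follows. This is a minimal illustration for binary masks, not the evaluation code used by the authors; the function name `dice_coefficient` and the convention of returning 1.0 when both masks are empty are assumptions made here.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal shape.

    Works for 2D slices and 3D volumes alike, since it only counts voxels.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return float(2.0 * intersection / total)


# Toy example: 3 predicted pixels, 3 labelled pixels, 2 in common.
pred = np.array([[1, 1, 0],
                 [0, 1, 0]])
target = np.array([[1, 0, 0],
                   [0, 1, 1]])
print(dice_coefficient(pred, target))  # 2 * 2 / (3 + 3), about 0.667
```

A value of 1 means the predicted and labelled regions coincide exactly, and 0 means they do not overlap at all.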

et al. (36) proposed a recurrent fully convolutional network (RFCN) that learns image representations from the entire 2D slice stack and exploits inter-slice spatial dependence through internal memory cells. Anatomical detection and segmentation are combined into a single architecture for end-to-end training, which significantly reduces computation time, simplifies the segmentation process, and enables real-time applications. Isensee et al. (37) proposed a fully automated processing pipeline that integrates segmentation and disease classification, using a set of U-Net structures to segment the cardiac structures at each point in the cardiac cycle. Liao et al. (38) designed a detector combined with a neural network classifier to detect the ROI containing the LV. The LV in the ROI is then segmented using a deep fully convolutional network called hypercolumns. The 2D segmentation results are integrated across different images to estimate the volume. This model is trained end to end and uses the real volume directly as the label. Lieman-Sifry et al. (39) developed the FastVentricle architecture based on the ENet architecture. FastVentricle is an FCN architecture for ventricular segmentation that runs four times faster than the best ventricular segmentation architecture and uses six times less memory, while maintaining good clinical accuracy.
In this field, our group proposed a precise left cardiac contour segmentation model based on group normalization and nearest neighbor interpolation, called GNNI U-Net. We constructed a convolution module based on the group normalization method for fast and accurate feature


Figure 8 Segmentation results of GNNI U-net on the Sunnybrook dataset (A) and the LVSC dataset (B). The figure shows the left ventricular contour segmentation results of GNNI U-net at six different scales for the two data sets. The green area is the correctly segmented contour area, the blue area is the mis-segmented area, and the red area is the missed area.
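The two building blocks named for GNNI U-Net, group normalization and nearest neighbor interpolation, can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function names `group_norm` and `upsample_nearest` are hypothetical, the (N, C, H, W) layout is an assumption, and the learnable scale and shift that normally follow group normalization are omitted for brevity.

```python
import numpy as np


def group_norm(x, num_groups, eps=1e-5):
    """Normalize a (N, C, H, W) array within groups of channels, per sample.

    Statistics are computed over each group of C // num_groups channels and
    all spatial positions, so they do not depend on the batch size.
    """
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    g = (g - mean) / np.sqrt(var + eps)
    return g.reshape(n, c, h, w)


def upsample_nearest(x, scale=2):
    """Nearest neighbor up-sampling: repeat each pixel `scale` times in H and W."""
    return x.repeat(scale, axis=2).repeat(scale, axis=3)


# Toy feature map: batch of 2, 4 channels, 3 x 3 spatial size.
features = np.random.default_rng(0).normal(size=(2, 4, 3, 3))
restored = upsample_nearest(group_norm(features, num_groups=2))
print(restored.shape)  # (2, 4, 6, 6)
```

Because the statistics are taken per sample rather than per batch, group normalization behaves the same for small batches, which is a common motivation for using it with memory-heavy medical images; nearest neighbor up-sampling has no learnable parameters, unlike transposed convolution.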

extraction, and an up-sampling module for feature restoration based on the nearest neighbor interpolation method. Our model achieved a Dice coefficient of 0.937 on the Sunnybrook dataset and 0.957 on the LVSC dataset, indicating high precision in left cardiac contour segmentation. Figure 8 shows our left ventricular segmentation results.

Gastric cancer pathological
Pathological diagnosis is the “gold standard” among cancer diagnosis methods and plays an important role in the medical field (40). The gap between high demand and a shortage of pathologists creates a great opportunity for deep learning in this field, and digital pathology technology also makes computer-assisted diagnosis possible (41).
Nowadays, deep learning technology has been applied to the pathological diagnosis of lung cancer, breast cancer and gastric cancer. Its applications mainly include early tumor screening and the diagnosis of benign versus malignant tumors. In the field of lung cancer pathology, Zhang et al. established the “early computer diagnosis system for lung cancer” to examine lung cancer pathological sections, so that several major types of lung cancer can be detected (42). In the field of breast cancer pathology, Qaiser et al. established a CNN-based intelligent image diagnosis system whose accuracy in judging pathological sections (cancer versus non-cancer tissue) was as high as 0.833 and whose accuracy in a four-class task (normal tissue, benign tumor, carcinoma in situ and invasive cancer) was as high as 0.778, which can reach the level of pathologists (43). Hanna et al. used a deep learning-based automatic scoring system to score human epidermal growth factor receptor, with results better than those of pathologists (44). An intelligent image diagnosis system constructed by Ehteshami Bejnordi et al. reached 69.9% accuracy, compared with pathologists, in the diagnosis of gastric cancer pathological images (45), and Yoshida et al. reached 0.556 in a three-class classification of gastric cancer pathology (normal tissue, adenoma, cancer cells) (46). In the field of gastric cancer pathology, our research team has established a deep learning-based system for diagnosing benign and malignant gastric cancer pathology, with a sensitivity of over 97% (47). However, among the above methods, the initial work of


Figure 9 Overview of gastric cancer pathology. (A) Pathological image with a cancerous tissue region; (B) label region corresponding to the cancerous tissue in (A); (C) pathological image of cancer-free tissue; (D) magnified view of the cancerous tissue in (A); (E) the diseased tissue extracted by combining the original pathological image with its label; (F) magnified view of a cancer-free image. The staining method is H&E, and the magnification is ×40.
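The accuracy and sensitivity values quoted for the pathology diagnosis systems in this section are ratios taken from a binary confusion matrix. The sketch below shows how they relate; the function name `classification_metrics` is hypothetical, and the counts in the usage example are invented purely for illustration and are not results from any of the cited studies.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion counts.

    tp / fn: malignant cases classified correctly / missed.
    tn / fp: benign cases classified correctly / flagged as malignant.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one case is required")
    positives = tp + fn
    negatives = tn + fp
    return {
        "accuracy": (tp + tn) / total,
        # Sensitivity (recall): share of malignant cases that were detected.
        "sensitivity": tp / positives if positives else float("nan"),
        # Specificity: share of benign cases that were correctly cleared.
        "specificity": tn / negatives if negatives else float("nan"),
    }


# Invented counts: 100 malignant and 100 benign slides.
metrics = classification_metrics(tp=97, fp=10, tn=90, fn=3)
print(metrics)
```

With these invented counts the sensitivity is 0.97 while the accuracy is 0.935, which shows why a high sensitivity alone does not determine the overall accuracy of a diagnostic system.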

our group rarely involves the field of image segmentation. In fact, the segmentation of pathological images remains a difficult problem. We further designed a multi-input model called MIFNet to segment the lesions in pathological images, raising the Dice coefficient to 81.87% in the segmentation of gastric cancer case images, much higher than some existing segmentation models such as U-Net (67.73%), SegNet (63.89%) and PSPNet (60.51%).
Figure 9 shows some sample experimental data. What we need to do is obtain a label of the style shown in Figure 9B, which marks the tissue area of the lesion on the image, as shown in Figure 9E.

Conclusions and future research
Deep learning is one of the powerful tools for medical image analysis. It has been successfully applied in target detection, segmentation, classification and registration. The development of deep learning in the medical field depends on the accumulation of medical big data, and medical data are inherently multi-modal, which provides a large amount of rich data for deep learning. In terms of disease treatment, deep learning can not only find the lesion area but also discriminate and classify specific lesions. When the lesion is positive, many detection networks can also segment the lesion area. While deep learning has clear advantages, it still has certain shortcomings. Deep learning models rely heavily on data sets. Each deep learning network requires massive data for training, which makes data set acquisition more demanding. The root cause is that the pixel features from the original input image are too complex, so it is a future development trend to focus


on designing a network with a smaller data size.
Deep learning is widely applied in all aspects of medical image analysis, including ophthalmology, neuroimaging, ultrasound, etc. With the development of deep learning, more and more medical fields will apply deep learning technology, and future deep learning will focus not only on the single aspect of neuroimaging but also on other aspects such as genomics and bioinformatics.
The rapid development of deep learning in the medical field is inseparable from a large amount of clinical practice. How to better apply deep learning to all stages of medical treatment becomes a more challenging task. It depends on two aspects: one is the constant iteration of technology, and the other is the continuous accumulation of medical experience.
At present, a number of excellent algorithms have emerged in the fields of autonomous driving, natural language processing, computer vision, etc. These algorithms have attracted great attention in their respective fields, and how to use these advanced deep learning algorithms is an aspect worthy of researchers’ constant thinking and innovation.
Deep learning in medical imaging applications is not limited to detecting routine diseases with big data, but also offers effective solutions for rare diseases. At present, we mainly use transfer learning on small-sample data to reach convergence and obtain prediction results. However, transfer learning has certain limitations. Not all rare diseases can be predicted in this way, which brings new challenges and opportunities for the diagnosis of intractable diseases.
The advent of the 5G era provides a new and broad space for medical deep learning. Our traditional machine learning algorithms are concentrated at the software level, and very little deep learning technology is implemented at the hardware level. The combination of 5G technology and deep learning technology enables the machine to achieve true intelligence. At the same time, the continuous development of intelligent medical devices and medical robots promotes the realization of deep learning at the hardware level and greatly facilitates patient treatment.

Acknowledgments
Funding: None.

Footnote
Provenance and Peer Review: This article was commissioned by the Guest Editors (Haotian Lin and Limin Yu) for the series “Medical Artificial Intelligent Research” published in Annals of Translational Medicine. The article was sent for external peer review organized by the Guest Editors and the editorial office.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at http://dx.doi.org/10.21037/atm.2020.02.44). The series “Medical Artificial Intelligent Research” was commissioned by the editorial office without any funding or sponsorship. The authors have no other conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References
1. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-44.
2. Deng L, Yu D. Deep learning: methods and applications. Foundations and Trends® in Signal Processing 2014;7:197-387.
3. Song HA, Lee SY. Hierarchical representation using NMF. In: International conference on neural information processing. Heidelberg: Springer, 2013:466-73.
4. Zhang QL, Zhao D, Chi XB. Review for deep learning based on medical imaging diagnosis. Computer Science 2017;44:1-7.
5. Krizhevsky A, Sutskever I, Hinton GE. Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems. 2012:1097-105.
6. Lin M, Chen Q, Yan S. Network in network. arXiv preprint arXiv:1312.4400, 2013.
7. Szegedy C, Liu W, Jia Y, et al. Going deeper with


convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015:1-9.
8. Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
9. Ioffe S, Szegedy C. Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
10. Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016:2818-26.
11. Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning. In: Thirty-first AAAI conference on artificial intelligence. 2017.
12. He K, Zhang X, Ren S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 2015;37:1904-16.
13. He K, Zhang X, Ren S, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016:770-8.
14. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2015:3431-40.
15. Everingham M, Van Gool L, Williams CKI, et al. The pascal visual object classes challenge 2012 (voc2012) results. Available online: http://www.pascal-network.org/challenges/VOC/voc2011/workshop/index
16. Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical image computing and computer-assisted intervention. Cham: Springer, 2015:234-41.
17. Badrinarayanan V, Kendall A, Cipolla R. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans Pattern Anal Mach Intell 2017;39:2481-95.
18. Zhao H, Shi J, Qi X, et al. Pyramid scene parsing network. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2017:2881-90.
19. Liskowski P, Krawiec K. Segmenting retinal blood vessels with deep neural networks. IEEE Trans Med Imaging 2016;35:2369-80.
20. Fu H, Xu Y, Wong DWK, et al. Retinal vessel segmentation via deep learning network and fully-connected conditional random fields. In: 2016 IEEE 13th international symposium on biomedical imaging (ISBI). New York: IEEE, 2016:698-701.
21. Dasgupta A, Singh S. A fully convolutional neural network based structured prediction approach towards the retinal vessel segmentation. In: 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017). New York: IEEE, 2017:248-51.
22. Zhu C, Zou B, Zhao R, et al. Retinal vessel segmentation in colour fundus images using extreme learning machine. Comput Med Imaging Graph 2017;55:68-77.
23. Hu K, Zhang Z, Niu X, et al. Retinal vessel segmentation of color fundus images using multiscale convolutional neural network with an improved cross-entropy loss function. Neurocomputing 2018;309:179-91.
24. Fu H, Cheng J, Xu Y, et al. Joint optic disc and cup segmentation based on multi-label deep network and polar transformation. IEEE Trans Med Imaging 2018;37:1597-605.
25. Su Y, Wang Y, Jiao J, et al. Automatic detection and classification of breast tumors in ultrasonic images using texture and morphological features. Open Med Inform J 2011;5:26-37.
26. Wang TC, Huang YH, Huang CS, et al. Computer-aided diagnosis of breast DCE-MRI using pharmacokinetic model and 3-D morphology analysis. Magn Reson Imaging 2014;32:197-205.
27. Chen SW, Liu YJ, Liu D, et al. AlexNet model and adaptive contrast enhancement based ultrasound imaging classification. Computer Science 2019;46:146-52.
28. Shalev-Shwartz S, Zhang T. Stochastic dual coordinate ascent methods for regularized loss minimization. Journal of Machine Learning Research 2013;14:567-99.
29. Baumgartner CF, Kamnitsas K, Matthew J, et al. Real-time standard scan plane detection and localisation in fetal ultrasound using fully convolutional neural networks. In: International conference on medical image computing and computer-assisted intervention. Cham: Springer, 2016:203-11.
30. Codella NCF, Nguyen QB, Pankanti S, et al. Deep learning ensembles for melanoma recognition in dermoscopy images. IBM Journal of Research and Development 2017;61:5:1-5:15.
31. Yang Y, Yan LF, Zhang X, et al. Glioma grading on conventional MR images: a deep learning study with transfer learning. Front Neurosci 2018;12:804.
32. Zhang QL, Chi XB, Zhao D. Early diagnosis of Parkinson’s disease based on deep learning. Computer


Systems & Applications 2018;27:1-9.
33. Sarraf S, Tofighi G. DeepAD: Alzheimer’s disease classification via deep convolutional neural networks using MRI and fMRI. BioRxiv 2016. doi: https://doi.org/10.1101/070441.
34. Payan A, Montana G. Predicting Alzheimer's disease: a neuroimaging study with 3D convolutional neural networks. arXiv preprint arXiv:1502.02506, 2015.
35. Hosseini-Asl E, Ghazal M, Mahmoud A, et al. Alzheimer’s disease diagnostics by a 3D deeply supervised adaptable convolutional network. Front Biosci (Landmark Ed) 2018;23:584-96.
36. Poudel RPK, Lamata P, Montana G. Recurrent fully convolutional neural networks for multi-slice MRI cardiac segmentation. In: Zuluaga MA, Bhatia K, Kainz B, et al. Reconstruction, segmentation, and analysis of medical images. Cham: Springer, 2016:83-94.
37. Isensee F, Jaeger PF, Full PM, et al. Automatic cardiac disease assessment on cine-MRI via time-series segmentation and domain specific features. In: International workshop on statistical atlases and computational models of the heart. Cham: Springer, 2017:120-9.
38. Liao F, Chen X, Hu X, et al. Estimation of the volume of the left ventricle from MRI images using deep neural networks. IEEE Trans Cybern 2019;49:495-504.
39. Lieman-Sifry J, Le M, Lau F, et al. FastVentricle: cardiac segmentation with ENet. In: International Conference on Functional Imaging and Modeling of the Heart. Cham: Springer, 2017:127-38.
40. Meng Y, Zhang D, Yandong LI, et al. Analysis of ultrasound and pathology images for special types of breast malignant tumors. Chinese Journal of Medical Imaging 2015;(3):188-91.
41. Gurcan MN, Boucheron LE, Can A, et al. Histopathological image analysis: a review. IEEE Rev Biomed Eng 2009;2:147-71.
42. Zhang Y, Yukun YE, Wang D. Clinical application of image processing and neural network in cytopathological diagnosis of lung cancer. Chinese Journal of Thoracic and Cardiovascular Surgery 2005;04.
43. Qaiser T, Mukherjee A, Reddy Pb C, et al. HER2 challenge contest: a detailed assessment of automated HER2 scoring algorithms in whole slide images of breast cancer tissues. Histopathology 2018;72:227-38.
44. Gown AM, Goldstein LC, Barry TS, et al. High concordance between immunohistochemistry and fluorescence in situ hybridization testing for HER2 status in breast cancer requires a normalized IHC scoring system. Mod Pathol 2008;21:1271-7.
45. Ehteshami Bejnordi B, Balkenhol M, Litjens G, et al. Automated detection of DCIS in whole-slide H&E stained breast histopathology images. IEEE Trans Med Imaging 2016;35:2141-50.
46. Yoshida H, Yamashita Y, Shimazu T, et al. Automated histological classification of whole slide images of colorectal biopsy specimens. Oncotarget 2017;8:90719-29.
47. Zhang ZZ, Gao JY, Lv G, et al. Pathological image classification of gastric cancer based on depth learning. Computer Science 2018. doi: 10.11896/j.issn.1002-137X.2018.Z11.053.

Cite this article as: Cai L, Gao J, Zhao D. A review of the application of deep learning in medical image classification and segmentation. Ann Transl Med 2020;8(11):713. doi: 10.21037/atm.2020.02.44
