Symmetry-Enhanced Attention Network For Acute Ischemic Infarct Segmentation With Non-Contrast CT Images
Kongming Liang1, Kai Han2, Xiuli Li2, Xiaoqing Cheng3, Yiming Li2,
Yizhou Wang4, and Yizhou Yu2,5
1 Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China
2 Deepwise AI Lab, Beijing, China
3 Department of Medical Imaging, Jinling Hospital, Nanjing University School of Medicine, Nanjing, Jiangsu, China
4 Department of Computer Science and Technology, Peking University, Beijing, China
5 The University of Hong Kong, Pokfulam, Hong Kong
yizhouy@acm.org
1 Introduction
Acute ischemic stroke is one of the leading causes of death and disability worldwide and imposes an enormous burden on health care systems [8]. The use of pretreatment neuroimaging is critical to improving the neurological outcomes of patients with stroke symptoms. Compared to MRI, non-contrast head CT is commonly used as the initial imaging modality because of its wide availability and short acquisition time.

Fig. 1. The anatomical symmetry of brain CT images. (a) and (b) show two symmetrical patches of the image in the axial view and across axial views. The dotted red line in (b) denotes the slice from the axial view. Due to the rotation of the patient's head, the symmetrical landmarks may appear in different images in the axial view.

To interpret early infarct signs in CT, the Alberta Stroke Program Early CT Score (ASPECTS) evaluation was proposed at the beginning of the 2000s [2] and has found increasing acceptance in clinical practice. However,
ASPECTS evaluation only approximates the assessment of early ischemic changes. Since the density change of an acute lesion is subtle and can be confounded by normal physiologic variation, quantitative estimation of acute ischemic infarct is challenging. In clinical practice [9], bilateral symmetry (illustrated in Fig. 1) provides useful information for the identification of acute ischemic infarct.
Anatomical asymmetry has been utilized in previous works to localize and segment abnormal regions in neuroimaging analysis. [12, 15] leverage symmetry by adding extra information beyond the input image. [12] calculates the difference at each voxel by subtracting the original brain from the mirrored brain; the difference map is then used as input to train a random forest classifier that yields the lesion segmentation. [15] extracts both the original patch and its symmetric patch and feeds them into the network simultaneously. Beyond computing asymmetry at the image level, [3, 11, 16] propose to explore feature-level fusion of the two symmetric regions. For instance, two-branch networks (e.g. siamese networks) can learn the features of the left and right hemispheres and measure the difference between them to analyze abnormalities such as Alzheimer's disease [11], ischemic stroke [3, 10] and brain tumors [16]. Even though the pixel-wise difference is widely used in previous methods, it cannot efficiently exploit the bilaterally symmetric information because of its limited context modeling. In addition, all of the above methods require the input images to already be calibrated, which cannot be guaranteed in practice.
In this paper, a symmetry enhanced attention network (SEAN) is proposed for acute ischemic infarct segmentation. The proposed SEAN automatically transforms an input image into the standard space without any human supervision. The transformed image is further processed by a U-shaped network that contains encoding and decoding stages. Different from the original design [13], the encoder performs 3D convolutions to leverage the context information of adjacent images in the axial view. Then, a symmetry enhanced attention module is integrated
between the encoding and decoding stages to efficiently model the anatomical
symmetry. In summary, the main contributions of our paper are as follows.
2 Method
Since the poses of patients are arbitrary when they undergo CT scans, the brain images are usually not in a standard space. In order to effectively use the symmetry of the brain, we attempt to align each image so that the brain region is centered in the image and horizontally symmetrical. However, traditional registration-based methods cannot be applied in clinical practice due to their high time complexity. Therefore, we propose a Symmetry based Alignment Network
as shown in Fig. 2, which automatically aligns the brain images using only the information in the images themselves. Inspired by Spatial Transformer Networks [7], we design the symmetry based alignment network as c2d[32,7,7]-relu-max2d[2,2]-c2d[32,5,5]-relu-max2d[2,2]-fc[3], where c2d[n, c_w, c_h] denotes a 2D convolutional layer with n filters of size c_w × c_h, max2d[s_h, s_w] is a 2D max-pooling layer with kernel size and stride s_h × s_w, and fc[n] is a fully connected layer with n units. The output of the network is interpreted as the parameters α (rotation, horizontal shift and vertical shift) of a rigid transformation matrix.
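A minimal PyTorch sketch of this alignment network follows; the single-channel 2D input and the lazily initialized fully connected layer (so the flattened feature size does not have to be hard-coded for a specific slice resolution) are implementation assumptions, not details stated above.

```python
import torch
import torch.nn as nn

class AlignmentNet(nn.Module):
    """Sketch of the symmetry based alignment network:
    c2d[32,7,7]-relu-max2d[2,2]-c2d[32,5,5]-relu-max2d[2,2]-fc[3]."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=7),          # c2d[32,7,7]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),    # max2d[2,2]
            nn.Conv2d(32, 32, kernel_size=5),         # c2d[32,5,5]
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),    # max2d[2,2]
        )
        # fc[3]: rotation angle, horizontal shift, vertical shift.
        self.fc = nn.LazyLinear(3)

    def forward(self, x):                              # x: (B, 1, H, W) axial slice
        f = self.features(x)
        alpha = self.fc(torch.flatten(f, 1))           # (B, 3) rigid-transform parameters
        return alpha
```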
Given an input volume, we define A_i as the i-th slice in the axial view. During training, the output parameters α are applied to the input slice to obtain A^t_i = f_α(A_i) using a parameterised sampling grid. Then we generate the flipped version of A^t_i, denoted Ã^t_i. The total loss is designed as the following:
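The loss equation is not reproduced in this excerpt. A plausible reading is an L1 penalty that forces each transformed slice A^t_i to match its horizontal mirror Ã^t_i; the PyTorch sketch below makes that assumption explicit, and the rigid-matrix parameterisation of α is likewise an assumption.

```python
import torch
import torch.nn.functional as F

def rigid_matrix(alpha):
    """Build 2x3 affine matrices from (rotation, horizontal shift, vertical shift).
    The exact parameterisation of alpha is an assumption."""
    theta, tx, ty = alpha[:, 0], alpha[:, 1], alpha[:, 2]
    cos, sin = torch.cos(theta), torch.sin(theta)
    row0 = torch.stack([cos, -sin, tx], dim=1)
    row1 = torch.stack([sin,  cos, ty], dim=1)
    return torch.stack([row0, row1], dim=1)            # (B, 2, 3)

def alignment_loss(align_net, slices):
    """slices: (B, 1, H, W) axial slices A_i."""
    alpha = align_net(slices)                           # predicted rigid parameters
    grid = F.affine_grid(rigid_matrix(alpha), slices.size(), align_corners=False)
    aligned = F.grid_sample(slices, grid, align_corners=False)  # A_i^t = f_alpha(A_i)
    mirrored = torch.flip(aligned, dims=[-1])            # horizontal flip, tilde{A}_i^t
    return F.l1_loss(aligned, mirrored)                  # assumed L1 symmetry loss
```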
The proposed segmentation network adopts the structure of UNet [13], which is mainly composed of two parts: the encoder stage and the decoder stage. Inspired by [5], we use 3D convolutions as the basic encoding block to keep the context information from adjacent images in the axial view. In the decoding stage, the middle slice of the input volume is retained as the target image and upsampled to the original resolution for pixel-wise labelling. We name the above network HybridUnet. Finally, we cascade the last encoding block with the symmetry enhanced attention module. In this way, the feature representation can be enhanced by its symmetric counterpart to efficiently assess the presence and extent of ischemic infarct.
To exploit the context information of the i-th axial image, HybridUnet takes its adjacent images {A_{i+t} | t = −T, ..., T} as the input. The input images are first processed by the 3D encoder. We design the encoder block as c3d-bn-relu-c3d-bn-relu-max3d, where c3d denotes a 3D convolutional layer, bn denotes a 3D batchnorm layer and max3d denotes a 3D max-pooling layer. The encoder stage contains five encoder blocks. The output feature from the last encoder block is
denoted as X_i ∈ R^{C×H'×W'} for the input image A_i, where H' and W' represent the height and width of the output feature respectively and C is the number of channels.
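For illustration, here is a sketch of one such 3D encoder block in PyTorch; the 3×3×3 kernels, the padding, and pooling only over the in-plane dimensions are assumptions, since the exact kernel and pooling sizes are not specified in this excerpt.

```python
import torch.nn as nn

def encoder_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """One HybridUnet encoder block: c3d-bn-relu-c3d-bn-relu-max3d."""
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),   # c3d
        nn.BatchNorm3d(out_ch),                                # bn
        nn.ReLU(inplace=True),                                 # relu
        nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),   # c3d
        nn.BatchNorm3d(out_ch),                                # bn
        nn.ReLU(inplace=True),                                 # relu
        nn.MaxPool3d(kernel_size=(1, 2, 2)),                   # max3d, keep slice depth
    )
```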
To model the anatomical symmetry, the encoder feature of each slice is flipped horizontally to obtain X̃_{i+t}, and a self-attention similarity and a symmetry-attention similarity are computed as

S^t_{i,j,k} = softmax( θ(X_{i,j,k})^T φ(X_{i+t,j,k}) / √d ),    S̃^t_{i,j,k} = softmax( θ(X_{i,j,k})^T φ(X̃_{i+t,j,k}) / √d ),

where S^t_{i,j,k}, S̃^t_{i,j,k} ∈ R^{N'×N'} (N' = H' × W') are the similarity matrices of the self attention and the symmetry attention respectively. θ(·) and φ(·) perform convolution operations to reduce the number of input channels to d (e.g. d = C/2) and reshape the output to R^{d×H'W'}. We use √d as a scaling factor for the inner product to alleviate the small-gradient problem of the softmax function. X_{i+t,j,k} and X̃_{i+t,j,k} are also fed into g(·) and h(·) to compute new representations by convolution operations, and the resulting feature maps are reshaped to R^{(C/2)×H'W'}. The outputs of the two attention branches are aggregated into the symmetry enhanced feature Y_{i,j,k} ∈ R^{C×H'W'}, which is reshaped to C × H' × W' and further treated as the residual mapping of X_{i,j,k} to obtain the final output of the symmetry enhanced attention.
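A minimal single-slice PyTorch sketch of this module is given below. It drops the aggregation over adjacent slices (the index t) and assumes that the self-attention and symmetry-attention outputs are fused by channel concatenation, which is an inference from the stated feature sizes rather than a detail confirmed by the text.

```python
import torch
import torch.nn as nn

class SymmetryEnhancedAttention(nn.Module):
    """Sketch of the symmetry enhanced attention (SEA) module,
    simplified to a single encoder feature map."""

    def __init__(self, channels: int):
        super().__init__()
        d = channels // 2
        self.theta = nn.Conv2d(channels, d, kernel_size=1)   # query projection
        self.phi = nn.Conv2d(channels, d, kernel_size=1)     # key projection
        self.g = nn.Conv2d(channels, d, kernel_size=1)        # value, self branch
        self.h = nn.Conv2d(channels, d, kernel_size=1)        # value, symmetry branch
        self.out = nn.Conv2d(channels, channels, kernel_size=1)
        self.d = d

    def _attend(self, q, k, v):
        # q, k, v: (B, d, N') with N' = H' * W'
        sim = torch.softmax(q.transpose(1, 2) @ k / self.d ** 0.5, dim=-1)  # (B, N', N')
        return v @ sim.transpose(1, 2)                                       # (B, d, N')

    def forward(self, x):
        # x: (B, C, H', W') encoder feature; x_sym is its left-right mirror
        b, c, hh, ww = x.shape
        x_sym = torch.flip(x, dims=[-1])
        q = self.theta(x).flatten(2)
        y_self = self._attend(q, self.phi(x).flatten(2), self.g(x).flatten(2))
        y_sym = self._attend(q, self.phi(x_sym).flatten(2), self.h(x_sym).flatten(2))
        y = torch.cat([y_self, y_sym], dim=1).view(b, c, hh, ww)  # C/2 + C/2 = C channels
        return x + self.out(y)                                     # residual connection
```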
3 Experiments
In this section, we first introduce the data acquisition and evaluation indicators
of our model in Sec. 3.1. Then we show the detail information of implementation
in Sec. 3.2. Finally, we compare our approach with the state-of-the-art methods
and conduct extensive ablation studies in Sec. 3.3.
and its flipped version as extra information. We also implement the above methods using HybridUnet [5] as the backbone for further comparison.
across-planar symmetry information, it can differentiate the infarct from normal physiologic changes more effectively. In general, the symmetry based methods achieve significant improvements in both Dice and F1, which is also consistent with how clinicians read these scans in practice. Regarding the effectiveness of the backbone network, HybridUnet achieves better performance than the original Unet, which demonstrates the importance of context information from adjacent images. The feature-level method outperforms the image-level method, since the feature-level method is more robust to misalignment of the input image. We conduct ablation studies of SEAN and show the results in Tab. 2, investigating the influence of the proposed alignment network and of the two types of attention mechanism: self-attention and symmetry attention. According to the results, the proposed alignment network improves over the baseline by a large margin. We can also see that symmetry enhanced attention yields a higher increase in both Dice and F1 compared to using self-attention alone. We also show some qualitative examples in Fig. 3.
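For reference, a common way to compute the Dice score used in such comparisons is sketched below; this is the standard overlap formulation and not necessarily the exact evaluation protocol of the paper.

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient between two binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)
```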
4 Conclusion
References
1. Avants, B.B., Tustison, N., Song, G.: Advanced normalization tools (ANTs). Insight Journal 2(365), 1–35 (2009)
2. Barber, P.A., Demchuk, A.M., Zhang, J., Buchan, A.M., ASPECTS Study Group, et al.: Validity and reliability of a quantitative computed tomography score in predicting outcome of hyperacute stroke before thrombolytic therapy. The Lancet 355(9216), 1670–1674 (2000)
3. Barman, A., Inam, M.E., Lee, S., Savitz, S.I., Sheth, S.A., Giancardo, L.: De-
termining ischemic stroke from ct-angiography imaging using symmetry-sensitive
convolutional networks. In: International Symposium on Biomedical Imaging. pp.
1873–1877 (2019)
4. Evans, A.C., Collins, D.L., Mills, S., Brown, E.D., Kelly, R.L., Peters, T.M.: 3d
statistical neuroanatomical models from 305 mri volumes. In: 1993 IEEE conference
record nuclear science symposium and medical imaging conference. pp. 1813–1817.
IEEE (1993)
5. Fang, C., Li, G., Pan, C., Li, Y., Yu, Y.: Globally guided progressive fusion net-
work for 3d pancreas segmentation. In: Shen, D., Liu, T., Peters, T.M., Staib,
L.H., Essert, C., Zhou, S., Yap, P.T., Khan, A. (eds.) Medical Image Comput-
ing and Computer Assisted Intervention – MICCAI 2019. pp. 210–218. Springer
International Publishing, Cham (2019)
6. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition.
In: Computer Vision and Pattern Recognition (2015)
7. Jaderberg, M., Simonyan, K., Zisserman, A., et al.: Spatial transformer networks.
In: Advances in neural information processing systems. pp. 2017–2025 (2015)
8. Katan, M., Luft, A.: Global burden of stroke. In: Seminars in neurology. vol. 38,
pp. 208–211. Georg Thieme Verlag (2018)
9. Khan Academy, A.A.o.C.o.N.: Diagnosing strokes with imaging: CT, MRI, and angiography. https://www.khanacademy.org
10. Kuang, H., Menon, B.K., Qiu, W.: Automated infarct segmentation from follow-
up non-contrast ct scans in patients with acute ischemic stroke using dense multi-
path contextual generative adversarial network. In: Shen, D., Liu, T., Peters, T.M.,
Staib, L.H., Essert, C., Zhou, S., Yap, P.T., Khan, A. (eds.) Medical Image Com-
puting and Computer Assisted Intervention – MICCAI 2019. pp. 856–863. Springer
International Publishing, Cham (2019)
11. Liu, C.F., Padhy, S., Ramachandran, S., Wang, V.X., Efimov, A., Bernal, A.,
Shi, L., Vaillant, M., Ratnanather, J.T., Faria, A.V., Caffo, B., Albert, M., Miller,
M.I.: Using deep siamese neural networks for detection of brain asymmetries associ-
ated with alzheimer’s disease and mild cognitive impairment. Magnetic Resonance
Imaging 64, 190 – 199 (2019)
12. Qiu, W., Kuang, H., Teleg, E., Ospel, J.M., Sohn, I., Almekhlafi, M., Goyal, M.,
Hill, M.D., Demchuk, A.M., Menon, B.K.: Machine learning for detecting early
infarction in acute stroke with non–contrast-enhanced ct. Radiology 294(3), 638–
644 (2020)
13. Ronneberger, O., Fischer, P., Brox, T.: U-net: Convolutional networks for biomed-
ical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F.
(eds.) Medical Image Computing and Computer-Assisted Intervention – MICCAI
2015. pp. 234–241. Springer International Publishing, Cham (2015)
14. Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Pro-
ceedings of the IEEE conference on computer vision and pattern recognition. pp.
7794–7803 (2018)
15. Wang, Y., Katsaggelos, A.K., Xue, W., Parrish, T.B.: A deep symmetry convnet for
stroke lesion segmentation. In: IEEE International Conference on Image Processing
(ICIP) (2016)
16. Zhang, H., Zhu, X., Willke, T.L.: Segmenting brain tumors with symmetry. In:
Proceedings of NIPS Workshop (2017)
Fig. 2. ASPECTS regions consist of 10 different parts, which are symmetrical with respect to the cerebral falx.