SynthBrainGrow: Synthetic Diffusion Brain Aging for Longitudinal MRI Data Generation in Young People
Netherlands
4 Boston Children's Hospital, Boston, MA, United States
azapaishchykova@bwh.harvard.edu,
benjamin_kann@dfci.harvard.edu
Abstract. Synthetic longitudinal brain MRI simulates brain aging and would
enable more efficient research on neurodevelopmental and neurodegenerative
conditions. Synthetically generated, age-adjusted brain images could serve as
valuable alternatives to costly longitudinal imaging acquisitions, serve as internal
controls for studies looking at the effects of environmental or therapeutic
modifiers on brain development, and allow data augmentation for diverse
populations. In this paper, we present a diffusion-based approach called
SynthBrainGrow for synthetic brain aging with a two-year step. To validate the
feasibility of using synthetically-generated data on downstream tasks, we
compared structural volumetrics of two-year-aged brains against synthetically-
aged brain MRI. Results show that SynthBrainGrow can accurately capture
substructure volumetrics and simulate structural changes such as ventricle
enlargement and cortical thinning. Our approach provides a novel way to
generate longitudinal brain datasets from cross-sectional data to enable
augmented training and benchmarking of computational tools for analyzing
lifespan trajectories. This work signifies an important advance in generative
modeling to synthesize realistic longitudinal data with limited lifelong MRI
scans. The code is available at XXX.
1 Introduction
Brain aging research relies heavily on magnetic resonance imaging (MRI) to track
longitudinal changes in brain structure and function [1], [2]. Modeling long-term
trajectories of different volumetric structures is critical for understanding healthy
approaches. In contrast to their work, we focus on younger subjects, aged 8-16 years,
from the ABCD study with a two-year scan interval and use a DDPM as the
backbone model, which allows us to synthetically age brain MRI using only one
baseline image.
Contribution. We propose the first diffusion model for the synthetic aging of subject-
specific brain MRIs and the first model of any kind for synthetic aging in young people.
Our model simulates two years of anatomically-plausible brain maturation based on
paired scans showing real aging effects. We demonstrate the utility of our synthesized
pseudo-longitudinal data by analyzing age-related substructural volumetrics and
volumetric changes.
2 Method
An overview of the workflow for an image of the ABCD dataset is shown in Fig. 1.
Fig. 1. Top panel: Method overview. Step 1: MRI preprocessing, pairwise co-
registration, intensity normalization, and rescaling to 64x64x64. Step 2: Diffusion
probabilistic model SynthBrainGrow for synthetic brain 2-year aging. Step 3: Image
upscaling using SynthSR. Step 4: Brain tissue segmentation using SynthSeg. For more
details on each step, please refer to the “3.1 Experimental Setup & Dataset” section.
Bottom panel: The training and sampling procedure of our method. In every step t, the
anatomical information is induced by concatenating the baseline brain MR image b to
the noisy aged brain $x_{c,t}$.
Our model was trained on paired 3D T1w MRI scans of the same subjects acquired
two years apart. The first scan serves as the baseline image of the healthy brain, while
the second scan provides the ground truth for the image after two years of aging. By
training the diffusion model on these input-output pairs, the model learns to take a
healthy brain as input and output a version that simulates two years of aging. We
follow the idea and implementation proposed by Wolleb et al. [12] and Dorjsembe et
4 A. Zapaishchykova, B.H. Kann et al.
al. [18]. Like DDPMs, our aging synthesis approach relies on a forward diffusion
process that adds Gaussian noise to brain MRI scans from young healthy individuals,
followed by a reverse generative process that denoises the images. However, we
incorporated the anatomical guidance from the baseline scan during diffusion.
Specifically, at each time step t, our model takes as input a noisy aged brain image xt
along with a corresponding input baseline brain scan c. We concatenate these along the
channel dimension to produce an augmented input:
$$\tilde{x}_t := x_t \oplus c \qquad (1)$$
This concatenated volume provides essential anatomical cues to guide the denoising
diffusion process. The forward diffusion process, which corrupts the clean aged scan x0
over T steps, is defined as in DDPM:
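For reference, the standard DDPM forward process has the form below. The symbols $\beta_t$ (variance schedule) and $\bar{\alpha}_t$ follow the usual DDPM convention and are assumed notation, not notation taken from this text.

```latex
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right),
\qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\,x_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
\qquad
\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s).
```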
The reverse generative modeling process relies on our conditional diffusion model
$p_\theta(x_{t-1} \mid \tilde{x}_t)$. At each timestep, the model takes $\tilde{x}_t$ as input and outputs the
denoised $x_{t-1}$; repeating this for T steps yields the generated image:
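In the usual DDPM parameterization with a learned variance, a conditional reverse step of this kind can be written as below. Here $\mu_\theta$ and $\Sigma_\theta$ denote the mean and covariance predicted by the network from the concatenated input; both symbols are assumed notation.

```latex
p_\theta(x_{t-1} \mid \tilde{x}_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(\tilde{x}_t, t),\ \Sigma_\theta(\tilde{x}_t, t)\right),
\qquad
\tilde{x}_t = x_t \oplus c.
```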
We evaluated our method on the ABCD dataset (Data Release 5.1). The ABCD Study®
operates as a consortium comprising 21 data collection sites across the continental US
that sample in an epidemiologically informed and inclusive way [6]. For each patient's
pair of 3D T1w MRI scans, we performed pairwise registration using the Elastix [19]
package, followed by skull stripping using HD-BET [20]. Image intensity was
then normalized using the brain mask as guidance. Additionally, each image was
downsampled to a voxel size of 3×3×2.5 mm3, and the resulting volume was cropped to
64×64×64 voxels. To overcome memory constraints and save
computational time during model training, we pre-computed all preprocessing steps
before training the deep learning model.
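The last two preprocessing operations (mask-guided intensity normalization and cropping to a 64×64×64 grid) can be sketched as follows. This is a minimal illustration, not the study code: the z-score scheme, the symmetric center crop, and the function names `normalize_with_mask` and `center_crop_or_pad` are all assumptions, since the text does not specify them.

```python
import numpy as np

def normalize_with_mask(volume, mask):
    """Z-score intensities using statistics from voxels inside the brain mask.

    The z-score scheme is an assumption; the text only says the brain mask
    guided the normalization.
    """
    brain = volume[mask > 0]
    out = (volume - brain.mean()) / (brain.std() + 1e-8)
    out[mask == 0] = 0.0  # keep the skull-stripped background at zero
    return out

def center_crop_or_pad(volume, size=64):
    """Symmetrically crop (or zero-pad) each axis to `size` voxels."""
    out = np.zeros((size,) * 3, dtype=volume.dtype)
    src, dst = [], []
    for dim in volume.shape:
        if dim >= size:  # crop this axis around its center
            start = (dim - size) // 2
            src.append(slice(start, start + size))
            dst.append(slice(0, size))
        else:            # pad this axis around its center
            start = (size - dim) // 2
            src.append(slice(0, dim))
            dst.append(slice(start, start + dim))
    out[tuple(dst)] = volume[tuple(src)]
    return out
```

Running both functions once per scan and caching the result on disk matches the stated strategy of pre-computing preprocessing before training.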
The dataset contains 9324 3D T1w MRI pairs from 7843 patients aged
8-16 years (53% male). We performed a random 70/15/15 train/validation/test split,
which resulted in 6526/1399/1399 MRI scan pairs. We chose a linear noise schedule with
T=1000 steps. The U-Net was trained with the loss objectives given in the study by
SynthBrainGrow 5
Nichol et al. [11] using the MONAI framework v1.4, with the Adam optimizer, a learning
rate of 10^-4, and a batch size of 1. We trained the model for 4,000 epochs on a single
NVIDIA A6000 GPU, with a validation step every 100 epochs; training took
around one day per 100 epochs.
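As an illustration of the diffusion setup described above, the sketch below builds a linear noise schedule with T=1000, draws a noisy sample in closed form, and forms the two-channel network input of Eq. (1). It is not the MONAI-based training code: the schedule endpoints (1e-4 and 0.02, the common DDPM defaults) and the helper names `q_sample` and `conditioned_input` are assumptions, and random arrays stand in for real scans.

```python
import numpy as np

T = 1000  # number of diffusion steps, as stated in the text

# Linear variance schedule. The endpoints are the common DDPM defaults and
# are an assumption; the text does not state them.
betas = np.linspace(1e-4, 0.02, T)
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Draw x_t ~ q(x_t | x_0) in closed form; t is a 0-based step index."""
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise
    return xt, noise

def conditioned_input(xt, baseline):
    """Stack the noisy aged scan and the baseline scan along a channel axis."""
    return np.stack([xt, baseline], axis=0)

rng = np.random.default_rng(0)
aged = rng.standard_normal((64, 64, 64))      # stand-in for the aged scan
baseline = rng.standard_normal((64, 64, 64))  # stand-in for the baseline scan c
xt, noise = q_sample(aged, t=500, rng=rng)
net_in = conditioned_input(xt, baseline)      # shape (2, 64, 64, 64)
```

With a batch size of 1, the network would receive `net_in` with one extra leading batch axis.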
For MRI postprocessing, we upsampled the image ×2 using spline interpolation,
resampled it back to a voxel size of 1×1×1 mm3, and increased image resolution using
FreeSurfer v7.4.1 SynthSR v2.0 [21]. To segment brain structures, we used FreeSurfer
v7.4.1 SynthSeg v1.0 [22] (see Fig. 2 for an example of a synthetically-aged brain MRI).
We discarded test cases with anatomically implausible ground truth
segmentations, defined as WMV below 30 and sGMV below 1 (both in units of 10,000 mm3).
All implementation details can be found in the study git repository XX.
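The grid sizes implied by the postprocessing steps can be checked with simple arithmetic. The sketch below assumes the field of view is preserved throughout (no extra cropping or padding), which the text does not state; the helper name `grid_after_resampling` is hypothetical.

```python
def grid_after_resampling(shape, spacing_mm, new_spacing_mm):
    """Voxel counts needed to cover the same field of view at a new spacing."""
    return tuple(
        round(n * s / ns) for n, s, ns in zip(shape, spacing_mm, new_spacing_mm)
    )

low_res_shape = (64, 64, 64)
low_res_spacing = (3.0, 3.0, 2.5)  # mm, from the preprocessing step

# Physical extent covered by the low-resolution volume.
fov_mm = tuple(n * s for n, s in zip(low_res_shape, low_res_spacing))

# Step 1: x2 upsampling doubles the grid and halves the voxel spacing.
up_shape = tuple(2 * n for n in low_res_shape)
up_spacing = tuple(s / 2 for s in low_res_spacing)

# Step 2: resample to 1 mm isotropic voxels over the same field of view.
iso_shape = grid_after_resampling(up_shape, up_spacing, (1.0, 1.0, 1.0))

print(fov_mm, up_shape, up_spacing, iso_shape)
```

Under these assumptions the volume passed to SynthSR and SynthSeg would be 192×192×160 voxels at 1 mm isotropic spacing.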
Fig. 2. A. An example of synthetically-aged brain MRI in axial, sagittal, and coronal view cuts
(z=80, x=85, y=93) with an overlaid heatmap (blue) of the normalized delta, which was
calculated as the difference between a ground truth scan and a synthetically-aged scan. A lighter
color indicates more difference. B. SynthSeg bilateral segmentation mask of synthetically-aged
scan with an overlay heatmap (blue) of the normalized delta.
The evaluation of synthetic medical image quality requires robust metrics to ensure
accuracy and reliability. The use of structural similarity indices such as the structural
similarity index measure (SSIM) for evaluating synthetic medical images has come
under recent scrutiny, as it may not effectively capture perceptual quality or clinical
usefulness in synthesized radiology scans [23]. This limitation seems especially
relevant for synthetic brain MRIs modeling neurodevelopment, where clinical value is
derived from quantitative biomarkers like volumetrics [24]. Similarly, SSIM does not
reflect image quality well, suggesting its inadequacy in evaluating image quality in
Table 1. Volumetric structural comparison between ground truth intra-subject two-year-aged
brains and synthetically generated ones (N=1399). GMV: gray matter volume; WMV: white
matter volume; sGMV: subcortical gray matter volume; VV: ventricular volume; MAE: mean
absolute error; Delta, %: the difference between the synthetically generated brain and the
ground truth intra-subject two-year-aged brain, normalized by the ground truth.
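The two summary metrics named in the caption can be computed as in the sketch below. It assumes Delta is the signed per-subject relative difference (prediction minus ground truth, divided by ground truth) averaged over subjects; the caption does not say whether the averaging happens before or after normalization. The volumes are hypothetical values, not study data.

```python
def mae(gt, pred):
    """Mean absolute error between ground truth and predicted volumes."""
    return sum(abs(p - g) for g, p in zip(gt, pred)) / len(gt)

def delta_percent(gt, pred):
    """Mean signed difference relative to ground truth, in percent.

    Per-subject normalization followed by averaging is an assumption.
    """
    return 100.0 * sum((p - g) / g for g, p in zip(gt, pred)) / len(gt)

# Hypothetical WMV values in units of 10,000 mm^3 (not data from the study).
gt_wmv = [40.0, 50.0, 45.0]
pred_wmv = [42.0, 49.0, 45.0]

wmv_mae = mae(gt_wmv, pred_wmv)
wmv_delta = delta_percent(gt_wmv, pred_wmv)
```

A positive Delta under this convention means the synthetic scan overestimates the structure volume on average.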
Fig. 3. A. Scatterplots with fitted regression lines comparing ground truth (GT) and
synthetically-aged (prediction) scans for bilateral WMV, GMV, sGMV, and VV
(N=1399). Axes are scaled in units of 10,000 mm3. B. Bland-Altman plots of substructure
volumetric agreement between GT and prediction for bilateral WMV, GMV, sGMV, and VV.
GMV: gray matter volume; WMV: white matter volume; sGMV: subcortical gray matter
volume; VV: ventricular volume.
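The quantities behind a Bland-Altman plot are the per-subject mean and difference of the two measurements, the mean difference (bias), and the limits of agreement. The sketch below uses the conventional bias ± 1.96 standard deviations; that convention and the function name `bland_altman` are assumptions, as the caption does not state how the limits were drawn.

```python
import statistics

def bland_altman(gt, pred):
    """Return per-subject means and differences, the bias, and limits of agreement."""
    diffs = [p - g for g, p in zip(gt, pred)]
    means = [(p + g) / 2 for g, p in zip(gt, pred)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits

# Hypothetical volumes in units of 10,000 mm^3 (not data from the study).
gt = [10.0, 20.0, 30.0, 40.0]
pred = [11.0, 19.0, 32.0, 40.0]
means, diffs, bias, limits = bland_altman(gt, pred)
```

Plotting `diffs` against `means` with horizontal lines at `bias` and `limits` gives the usual Bland-Altman layout.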
Fig. 4. Combined uncertainty mean maps with variance heatmap overlay computed from ten
sampled T1w brain MRIs for a single subject, shown in five axial slices (z: axial slice
number). A lighter color indicates higher variance.
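Voxel-wise mean and variance maps of this kind can be obtained by sampling the model several times from the same baseline and reducing over the sample axis. The sketch below is a minimal illustration with random arrays standing in for the ten synthetic volumes; the function name `uncertainty_maps` is hypothetical.

```python
import numpy as np

def uncertainty_maps(samples):
    """Voxel-wise mean and variance over repeated samples.

    `samples` has shape (n_samples, X, Y, Z), one volume per sampling run
    started from the same baseline scan.
    """
    samples = np.asarray(samples, dtype=np.float64)
    return samples.mean(axis=0), samples.var(axis=0)

rng = np.random.default_rng(0)
# Stand-in for ten synthetic volumes of one subject (random values, not model output).
samples = rng.normal(size=(10, 64, 64, 64))
mean_map, var_map = uncertainty_maps(samples)
```

Overlaying `var_map` as a heatmap on `mean_map` for selected axial slices reproduces the layout described in the caption.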
Our model was trained on a relatively narrow age range and on a sample from a single study
that is representative of the population within the United States. Performance needs to be
tested when extrapolating beyond the training data to younger or older ages. Real longitudinal
within-person trajectories may show more variability and nonlinearity than the model
approximates. Incorporating diverse scans from multi-site datasets spanning different
demographics, health statuses, and neurodegenerative conditions might reveal where
synthesis quality drops and additional training is required.
Mapping synthetic scans back to brain age versus chronological age biomarkers may
offer a universal framework for validation. Ideal outputs would mirror consistent but
variable patterns of within-person maturation and decline in large-scale studies. This
could indicate utility for personalized prediction of neurocognitive trajectories.
Extending to longitudinal training and evaluating scan trajectories against real
neuropsychological, molecular, and clinical aging biomarkers is an exciting future
direction.
Additionally, we will consider sampling with the DDIM approach to speed up the
sampling process in future work.
4 Conclusion
Acknowledgments.
Disclosure of Interests.
References
[13] Y. Xie and Q. Li, “Measurement-conditioned Denoising Diffusion Probabilistic Model for
Under-sampled Medical Image Reconstruction.” arXiv, Mar. 05, 2022. doi:
10.48550/arXiv.2203.03623.
[14] A. Zapaishchykova et al., “Diffusion Deep Learning for Brain Age Prediction and
Longitudinal Tracking in Children Through Adulthood.” medRxiv, p.
2023.10.17.23297166, Oct. 20, 2023. doi: 10.1101/2023.10.17.23297166.
[15] A. Durrer et al., “Diffusion Models for Contrast Harmonization of Magnetic Resonance
Images.” arXiv, Mar. 14, 2023. doi: 10.48550/arXiv.2303.08189.
[16] S. Bao et al., “Prediction of brain age using quantitative parameters of synthetic magnetic
resonance imaging,” Front. Aging Neurosci., vol. 14, 2022, Accessed: Jan. 12, 2024.
[Online]. Available: https://www.frontiersin.org/articles/10.3389/fnagi.2022.963668
[17] J. Fu et al., “Fast three-dimensional image generation for healthy brain aging using
diffeomorphic registration,” Hum. Brain Mapp., vol. 44, no. 4, pp. 1289–1308, 2023, doi:
10.1002/hbm.26165.
[18] Z. Dorjsembe, H.-K. Pao, S. Odonchimed, and F. Xiao, “Conditional Diffusion Models for
Semantic 3D Medical Image Synthesis.” arXiv, Jul. 31, 2023. doi:
10.48550/arXiv.2305.18453.
[19] S. Klein, M. Staring, K. Murphy, M. A. Viergever, and J. P. W. Pluim, “elastix: A Toolbox
for Intensity-Based Medical Image Registration,” IEEE Trans. Med. Imaging, vol. 29, no.
1, pp. 196–205, Jan. 2010, doi: 10.1109/TMI.2009.2035616.
[20] F. Isensee et al., “Automated brain extraction of multisequence MRI using artificial neural
networks,” Hum. Brain Mapp., vol. 40, no. 17, pp. 4952–4964, 2019, doi:
10.1002/hbm.24750.
[21] “SynthSR: A public AI tool to turn heterogeneous clinical brain scans into high-resolution
T1-weighted images for 3D morphometry,” Sci. Adv. Accessed: Jan. 12, 2024.
[Online]. Available: https://www.science.org/doi/10.1126/sciadv.add3607
[22] B. Billot et al., “SynthSeg: Segmentation of brain MRI scans of any contrast and resolution
without retraining,” Med. Image Anal., vol. 86, p. 102789, May 2023, doi:
10.1016/j.media.2023.102789.
[23] “Properties of the SSIM metric in medical image assessment: Correspondence between
measurements and the spatial frequency spectrum.” Accessed: Jan. 11, 2024. [Online].
Available: https://www.researchsquare.com
[24] Gourdeau et al., “On the proper use of structural similarity for the robust evaluation of
medical image synthesis models,” Med. Phys., 2022. Accessed:
Jan. 10, 2024. [Online]. Available:
https://aapm.onlinelibrary.wiley.com/doi/10.1002/mp.15514
[25] C.-Y. Yang, C. Ma, and M.-H. Yang, “Single-Image Super-Resolution: A Benchmark,” in
Computer Vision – ECCV 2014, D. Fleet, T. Pajdla, B. Schiele, and T. Tuytelaars, Eds., in
Lecture Notes in Computer Science. Cham: Springer International Publishing, 2014, pp.
372–386. doi: 10.1007/978-3-319-10593-2_25.
[26] G. P. Renieblas, A. T. Nogués, A. M. González, N. Gómez-Leon, and E. G. Del Castillo,
“Structural similarity index family for image quality assessment in radiological images,” J.
Med. Imaging Bellingham Wash, vol. 4, no. 3, p. 035501, Jul. 2017, doi:
10.1117/1.JMI.4.3.035501.