1 Introduction

Motion management is a key element in external beam radiotherapy of thoracic or abdominal tumours prone to respiratory movement. Pioneered in photon therapy, 4D treatment planning and motion monitoring techniques have also gained importance in the field of particle treatments [13]. Due to higher dose conformity and the absence of radiation dose distal to the Bragg peak, proton therapy enables precise target treatment while sparing healthy tissue and organs at risk. However, in the presence of organ motion, actively scanned proton beam therapies are hampered by interplay effects and inhomogeneous dose distributions [1], emphasising the need for sophisticated motion mitigation strategies, such as rescanning, gating or tracking [1, 13]. In tracking, for example, the treatment beam is adapted to follow the tumour motion with the goal of ensuring optimal target coverage. To do so, however, predictive methods and motion models are crucial in order to cope with respiratory motion variabilities and system latency.

In the field of radiotherapy, motion variabilities are classified into two categories: intra-fractional and inter-fractional motion variations [6]. Intra-fractional variations refer to motion variations between different respiratory cycles observed within a single treatment session; inter-fractional variations include anatomical and physiological differences between treatment sessions. Such motion variabilities should be considered for both treatment planning and dose delivery [5]. In this context, 4D imaging and motion modelling are widely discussed techniques. Motion models are necessary when direct imaging of the internal motion is not feasible. The idea is to estimate the motion of interest based on more readily available surrogate data. 4D imaging provides dense internal motion information and therefore constitutes an important element for respiratory motion modelling. While 4D imaging is traditionally performed with computed tomography (4D CT), respiratory-correlated magnetic resonance imaging (4D MRI) methods have increasingly been developed in the last decade due to their superior soft-tissue contrast and the lack of radiation dose [12].

Fig. 1. Illustration of the pretreatment phase. See Sect. 2 and Sect. 3.1 for details.

In this work, we present an inter-fractional respiratory motion management pipeline for the lungs based on abdominal ultrasound (US) imaging as illustrated in Fig. 1. It involves hybrid US/MR imaging, principal component regression, and a novel 4D MRI technique [4]. The proposed approach follows a typical motion management scheme: in a pretreatment phase, simultaneous US and MR acquisitions are performed and a motion model is computed. During treatment delivery, online US imaging is used to predict the respiratory motion for tumour tracking. We demonstrate the feasibility of our approach on datasets from five healthy volunteers; for two of them, the US probe was repositioned between motion modelling and prediction. Although these data are not truly inter-fractional, in the sense that days or weeks would lie between two acquisitions, they serve as preliminary data in this feasibility study.

US imaging has been proposed for image-guided interventions and radiotherapy before due to its advantages over other imaging modalities and surrogate signals [8]: it provides internal organ motion information at high temporal resolution and can therefore potentially detect phase shifts and organ drift [11]; it is also non-invasive and available during treatment delivery. However, as the lungs cannot be imaged directly, US guidance has mainly been applied to the liver, heart or prostate. In [9], for example, a US-driven respiratory motion model for the liver was presented. It requires precise co-registration of US and MR images in order to establish correspondence between tracked liver points. Indirect lung tumour tracking strategies based on 2D abdominal US have only been proposed recently [2, 7]. Mostafaei et al. [7] combine US imaging and cone-beam CT (CBCT) in order to reduce the CBCT imaging frequency and therefore the imaging dose to the patient. However, the tumour motion is estimated in the superior-inferior (SI) direction only. In [2], dense motion information was predicted based on an adversarial neural network. Although promising, it is not clear how this approach performs if the US imaging plane is shifted.

With this work we address the clinically relevant question of how the respiratory motion model performs in case of US probe repositioning between two imaging sessions. The novelty of our work does not primarily lie in the methodological components themselves but rather in their combination into a complete respiratory motion management pipeline. We combine US imaging with a recently presented 4D MRI technique and present first results in a feasibility study.

2 Background

Dense motion estimation is generally represented as a 3D deformation field which can be derived from any 4D imaging technique in combination with deformable image registration (DIR) methods. The 4D MRI sequence applied here uses 3D readouts and, unlike most other approaches, is a time-resolved imaging method [4]. As opposed to respiratory-correlated 4D MRI methods [12], it does not assume periodic respiration but provides continuous motion information. It is based on the assumption that the respiratory motion information is mapped mainly to the low-frequency k-space center. Following this rationale, circular patches at the k-space center \(C_t \subset \mathbb {C}^3\) capture low-frequency image components with motion information while peripheral patches \(H_t \subset \mathbb {C}^3\) account for image sharpness and structural details. Since these patches consist of a small portion of the k-space only, they can be acquired at a much higher temporal resolution as compared to the entire k-space. In Fig. 1, the 3D k-space is represented as a cube and the patches are illustrated as cylinders with the height pointing into the phase encoding direction.

Center and peripheral patches are acquired alternately and combined into patch pairs \(P_t = \{C_t, H_t\}\). The center patches \(C_t\) are transformed to the spatial domain by applying the inverse Fourier transform \(I_t = \mathcal {F}^{-1}(C_t)\). Then, a diffeomorphic registration method is applied to obtain the 3D deformation field between a reference image and \(I_t\) [10]. For further details, the reader is referred to [4]. In the following, we refer to the vectorised deformation field at time t as \(\varvec{y}_t \in \mathbb {R}^{d}\) with dimension d. Note that the peripheral patches \(H_t\) are not required for motion modelling but might be necessary for the treatment planning.
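As a rough illustration of the step \(I_t = \mathcal {F}^{-1}(C_t)\), the following NumPy sketch zero-fills k-space outside a cylindrical center patch and applies the inverse Fourier transform. The function names, the `cylinder_axis` argument, the strict-inequality disc and the zero-filling are assumptions made for this example only; the actual reconstruction is described in [4].

```python
import numpy as np

def center_patch_mask(shape, radius, cylinder_axis=0):
    """Boolean mask of a cylindrical k-space center patch (hypothetical helper).

    The disc lies in the two axes other than `cylinder_axis`; the cylinder
    extends over the full length of `cylinder_axis`. Coordinates follow the
    fftshift convention (zero frequency at index n // 2).
    """
    axes = [a for a in range(3) if a != cylinder_axis]
    ka = np.arange(shape[axes[0]]) - shape[axes[0]] // 2
    kb = np.arange(shape[axes[1]]) - shape[axes[1]] // 2
    grid_a, grid_b = np.meshgrid(ka, kb, indexing="ij")
    disc = grid_a**2 + grid_b**2 < radius**2  # assumed: strict inequality
    mask = np.expand_dims(disc, axis=cylinder_axis)
    return np.broadcast_to(mask, shape)

def low_res_image(kspace_shifted, mask):
    """Zero-fill everything outside the patch and transform to image space."""
    patch = np.where(mask, kspace_shifted, 0)
    return np.fft.ifftn(np.fft.ifftshift(patch))

# Synthetic example: a random volume stands in for the measured k-space data.
rng = np.random.default_rng(0)
volume = rng.standard_normal((16, 16, 16))
kspace = np.fft.fftshift(np.fft.fftn(volume))
mask = center_patch_mask(volume.shape, radius=6)
I_t = low_res_image(kspace, mask)  # complex-valued low-frequency image
```

The resulting image contains only the low spatial frequencies of the volume, which is the component assumed to carry the respiratory motion information.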

3 Method

3.1 Pretreatment Phase

Data Acquisition. Simultaneous US/MR acquisitions are performed in order to ensure temporal correspondence between the center patches \(C_t\) and the US images \(U_t\) as shown in Fig. 1. The US imaging plane is chosen such that parts of the liver and the diaphragm motion are clearly visible.

Image Processing and Reconstruction. Following the data acquisition, the 4D MRI is reconstructed and the motion vectors \(\varvec{y}_t\) are computed. Given 2D abdominal US images \(U_t\), a low-dimensional respiratory motion surrogate is extracted using principal component analysis (PCA). By selecting only a small subset of principal components the model complexity is reduced. In order to cope with system latencies during dose delivery, it is important for the model to forecast the motion vectors into the future. Let \(\varvec{s}_t \in \mathbb {R}^k\) denote the standardised scores of the k most dominant principal components for image \(U_t\). We apply an element-wise autoregressive (AR) model of order p for the time series \(\{\varvec{s}_t\}_{t=1}^T\):

$$\begin{aligned} s_t^j = \theta _0^j + \sum _{i=1}^p \theta _i^j s_{t-i}^j + \epsilon _t \quad \forall j \in \{1, \ldots , k\}, \end{aligned}$$
(1)

where \(s_t^j\) is the jth element of \(\varvec{s}_t\), \(\varvec{\theta }^j = \begin{bmatrix} \theta _0^j&\theta _1^j&\ldots&\theta _p^j \end{bmatrix}^T\) denotes the model parameters, and \(\epsilon _t\) is white noise. The parameters \(\varvec{\theta }^j\) are estimated using ordinary least squares. To predict the surrogate n steps ahead of time, the AR model in (1) is repeatedly applied.
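The AR model in (1) and its repeated application can be sketched as follows for a single surrogate component. This is a minimal NumPy illustration rather than the authors' implementation; the function names `fit_ar` and `predict_ar` and the synthetic sinusoidal signal are assumptions for the example.

```python
import numpy as np

def fit_ar(s, p):
    """Ordinary least-squares fit of s_t = theta_0 + sum_i theta_i * s_{t-i}."""
    s = np.asarray(s, dtype=float)
    T = len(s)
    # Each row of the design matrix is [1, s_{t-1}, ..., s_{t-p}] for t = p..T-1.
    lags = [s[p - i : T - i] for i in range(1, p + 1)]
    X = np.column_stack([np.ones(T - p)] + lags)
    theta, *_ = np.linalg.lstsq(X, s[p:], rcond=None)
    return theta  # [theta_0, theta_1, ..., theta_p]

def predict_ar(theta, history, n):
    """Apply the AR model n times; `history` holds past samples, oldest first."""
    p = len(theta) - 1
    buf = [float(v) for v in history[-p:]]
    for _ in range(n):
        newest_first = buf[::-1]  # s_t, s_{t-1}, ..., s_{t-p+1}
        nxt = theta[0] + float(np.dot(theta[1:], newest_first))
        buf = buf[1:] + [nxt]     # slide the window forward by one step
    return buf[-1]

# Synthetic breathing-like signal sampled at 15 Hz (assumed for illustration).
t = np.arange(200) / 15.0
s = np.sin(2 * np.pi * 0.25 * t)
theta = fit_ar(s[:150], p=5)
s_pred = predict_ar(theta, s[:150], n=2)  # estimate of s[151]
```

For a surrogate with k components, the same two functions would be applied to each component independently, in line with the element-wise formulation of (1).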

Motion Modelling. In order for the motion model to capture non-linear relationships between the surrogates and the motion estimates, we formulate a cubic regression model. Let \(\varvec{x}_t \in \mathbb {R}^{3k+1}\) denote the input vector for the regression model, which includes \(\varvec{s}_t\), its element-wise squares and cubes, and a constant bias, i.e. \( \varvec{x}_t = \begin{bmatrix} 1&s_t^1&\ldots&s_t^k&(s_t^1)^2&\ldots&(s_t^k)^2&(s_t^1)^3&\ldots&(s_t^k)^3 \end{bmatrix}^T. \) The motion model can thus be written as

$$\begin{aligned} \varvec{y}_t&= \varvec{\beta } \varvec{x}_t + \varvec{\epsilon }_t, \end{aligned}$$
(2)

with regression coefficients \(\varvec{\beta } \in \mathbb {R}^{d \times (3k+1)}\) and white noise \(\varvec{\epsilon }_t \in {\mathbb {R}^d}\). Given the pretreatment data \(\{\varvec{s}_t, \varvec{y}_t\}_{t=1}^T\), the model parameters \(\varvec{\beta }\) are again approximated in the least-squares sense.
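A minimal sketch of the cubic regression in (2) is given below, assuming the surrogates are stacked row-wise in a matrix of shape (T, k) and the vectorised deformation fields in a matrix of shape (T, d). The function names and the synthetic data are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np

def cubic_features(S):
    """Map surrogates S of shape (T, k) to inputs of shape (T, 3k + 1).

    Each row is [1, s^1..s^k, (s^1)^2..(s^k)^2, (s^1)^3..(s^k)^3].
    """
    S = np.atleast_2d(np.asarray(S, dtype=float))
    return np.hstack([np.ones((S.shape[0], 1)), S, S**2, S**3])

def fit_motion_model(S, Y):
    """Least-squares estimate of beta with shape (d, 3k + 1), so y_t ~ beta @ x_t."""
    X = cubic_features(S)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)  # solves X @ B ~ Y
    return B.T

def predict_motion(beta, s):
    """Motion estimate for a single surrogate vector s of length k."""
    return beta @ cubic_features(s)[0]

# Synthetic check with k = 3 surrogates and d = 4 motion components.
rng = np.random.default_rng(1)
S = rng.standard_normal((50, 3))
beta_true = rng.standard_normal((4, 10))
Y = cubic_features(S) @ beta_true.T
beta_hat = fit_motion_model(S, Y)
y_hat = predict_motion(beta_hat, S[0])
```

In practice d is the number of deformation field components (three per voxel), so \(\varvec{\beta }\) is a tall matrix with only \(3k+1\) columns, which keeps inference cheap.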

3.2 Online Motion Prediction

Having computed both the AR parameters in (1) and the regression coefficients in (2), the inference during dose delivery is straightforward and computationally efficient. However, since the motion modelling and treatment planning are performed several days or weeks prior to the dose delivery, the US probe has to be reattached to the patient’s abdominal wall when they return for the treatment delivery. Although the location of the probe with respect to the patient’s chest can be marked by skin tattoos or similar approaches, it is hardly possible to recover the exact same imaging plane due to inter-fractional motion, anatomical changes, or different body positions with respect to the treatment couch [13]. The online US images can therefore not be projected onto the pretreatment PCA basis directly; instead, a new principal component transformation has to be computed. We use the first minutes of US imaging after the patient has been set up for treatment as training data for recomputing the PCA basis. Since the first principal components capture the most dominant motion information and the scores \(\varvec{s}_t\) are standardised, we expect the signals to be comparable. Furthermore, the motion vectors \(\varvec{y}_t\) have to be warped in order to correspond to the present anatomy. This requires a 3D reference scan of the patient prior to treatment, using either CT or MRI.

The surrogate signal \(\varvec{s}_t\) at time t is obtained by projecting the US image \(U_t\) onto the new PCA basis. Given the p latest surrogates \(\{\varvec{s}_{t-i}\}_{i=0}^{p-1}\), the signal \(\varvec{s}_{t+n}\) at time \(t+n\) is approximated by applying the AR model n times. Finally, the motion estimate \(\varvec{y}_{t+n}\) is computed given Eq. (2) and warped in order to match the actual patient position.
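The recomputation of the PCA basis and the projection of a new US frame onto it could look roughly like the sketch below. It uses a plain SVD on vectorised frames; the function names, the synthetic frames and the choice of the population standard deviation for standardisation are assumptions. Note that the sign of each principal axis is arbitrary, and the sketch does not attempt to align signs between two sessions.

```python
import numpy as np

def fit_pca(frames, k):
    """PCA on US frames of shape (N, h, w); returns mean, axes and score scale."""
    A = frames.reshape(len(frames), -1).astype(float)
    mean = A.mean(axis=0)
    # Rows of Vt are the principal axes of the centred data.
    _, _, Vt = np.linalg.svd(A - mean, full_matrices=False)
    components = Vt[:k]
    scores = (A - mean) @ components.T
    return mean, components, scores.std(axis=0)

def surrogate(frame, mean, components, std):
    """Standardised scores s_t of a single frame on the k principal axes."""
    return ((frame.reshape(-1).astype(float) - mean) @ components.T) / std

# Synthetic frames: one spatial pattern modulated by a breathing-like signal.
rng = np.random.default_rng(2)
a = np.sin(0.3 * np.arange(60))
pattern = rng.standard_normal((8, 8))
frames = a[:, None, None] * pattern + 0.01 * rng.standard_normal((60, 8, 8))

mean, components, std = fit_pca(frames, k=3)
S = np.array([surrogate(f, mean, components, std) for f in frames])
```

The resulting vector \(\varvec{s}_t\) would then be forecast with the AR model in (1) and passed to the regression in (2); the final warping to the current patient position is not covered here.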

4 Experiments and Results

Data Acquisition. The proposed motion management pipeline was tested on 5 healthy volunteers. The 4D MRI sequence [4] was acquired on a 1.5 T MR-scanner (MAGNETOM Aera, Siemens Healthineers, Erlangen, Germany) under free respiration and with the following parameters: \(\text {TE}\) = 1.0 ms, \(\text {TR}\) = 2.5 ms, flip angle \(\alpha \) = 5\(^\circ \), bandwidth 1560 Hz px\(^{-1}\), isotropic pixel spacing 3.125 mm, image matrix \(128 \times 128 \times 88\) and field of view 400 \(\times 400\) \(\times \) 275 mm\(^{3}\) (in \(\text {LR} \times \text {SI} \times \text {AP}\)). The radii of \(C_t\) and \(H_t\) were set to 6 px and 5 px, respectively, resulting in 109 k-space points or 272.5 ms per center patch \(C_t\), and 69 k-space points or 172.5 ms per peripheral patch \(H_t\). The total acquisition time per subject was set to 11.1 min or \(T=1500\) center-peripheral patch pairs, \(P_t\). For the reconstruction of the 4D MRI, a sliding organ mask was created semi-automatically [14].

US imaging was performed simultaneously at \(f_{\mathrm {US}}\) = 15 Hz on an Acuson clinical scanner (Antares, Siemens Healthineers, Mountain View, CA). A specifically developed MR-compatible US probe was attached to the volunteer’s abdominal wall by means of a strap. The MRI and US systems were synchronised via optical triggers emitted by the MR scanner every 6.675 s, i.e. after every 15 patch pairs \(P_t\). The optical signal triggered the US device to record a video for a duration of 5 s. The time gap of 1.675 s was chosen to compensate for the US system latency while storing the video file. As a consequence, however, 4 patch pairs \(P_t\) per trigger interval are not usable due to missing US images. Despite this time gap, the trigger signal sporadically occurred before the preceding video file had been stored, so that the newly triggered video was omitted. The time delay between the MR trigger and the start of the US video was negligible.
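The timing figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that each k-space point of a patch costs one TR and that a patch contains the integer grid points strictly inside a disc of the given radius; both are readings that happen to reproduce the stated numbers, not statements taken from [4]. Rounding the lost patch pairs up is likewise an assumption.

```python
import math

TR_MS = 2.5  # repetition time in ms

def disc_points(radius):
    """Integer grid points (a, b) with a^2 + b^2 < radius^2 (assumed patch shape)."""
    r = range(-radius, radius + 1)
    return sum(1 for a in r for b in r if a * a + b * b < radius * radius)

n_center = disc_points(6)           # 109 points in the center patch
n_peripheral = disc_points(5)       # 69 points in the peripheral patch
t_center_ms = n_center * TR_MS      # 272.5 ms
t_peripheral_ms = n_peripheral * TR_MS  # 172.5 ms
t_pair_ms = t_center_ms + t_peripheral_ms  # 445.0 ms per patch pair

total_min = 1500 * t_pair_ms / 1000 / 60   # about 11.1 min for T = 1500
trigger_s = 15 * t_pair_ms / 1000          # 6.675 s between optical triggers
gap_s = trigger_s - 5.0                    # 1.675 s without US recording
lost_pairs = math.ceil(gap_s * 1000 / t_pair_ms)  # 4 pairs without US images
```

Under these assumptions, 11 of the 15 patch pairs in each trigger interval overlap with recorded US frames.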

For subjects 4 and 5, the US probe was removed and reattached after they had been standing for several minutes. The US imaging plane was visually matched with the preceding imaging plane as well as possible. The MR images were aligned based on diffeomorphic image registration of two end-exhalation master volumes and inverse displacement field warping [3, 10].

Table 1. Overview of the model settings and the respiratory motion characteristics for each subject s separately. The datasets with US probe repositioning are marked in grey.

Model Details. The first 8 US videos, corresponding to 200 images, were used to determine the AR parameters \(\varvec{\theta }\). The remaining data was split into a training and test set according to Table 1 in order to estimate \(\varvec{\beta }\) and validate the motion model performance, respectively. For each subject, the last 200 US/MR image pairs, or 133.5 s of data acquisition, were used for validation. For subject 2, however, a drastic change in respiratory motion characteristics was observed in the test set; the baseline motion more than doubled as compared to the training set. To take this observation into account, two test sets were created by dividing the last 267 s into equal parts. Below, the test set which includes deep respiratory motion, referred to as 2.2, is discussed separately. Table 1 shows the maximum and the baseline respiratory motion for each subject. The baseline motion \(\mu _{95}\) is defined as the 95th percentile of the deformation field magnitude averaged over all time points.

For subjects 4 and 5, the parameters \(\varvec{\theta }\) and \(\varvec{\beta }\) were estimated based on the primary dataset. After the probe repositioning, the first 270 US images were used for recomputing the PCA basis. For all the experiments, the number of principal components was set to \(k=3\), and an AR model of order \(p=5\) was built. The surrogate \(\varvec{s}_t\) was predicted \(n=2\) steps, or \(t_n = n/f_{\mathrm {US}}\) = 133 ms, into the future.

Fig. 2. Coronal cuts through sample end-inhalation volumes of volunteers 4 and 5, both with and without repositioning. From left to right: master volume with the masked region marked in yellow, reference deformation field magnitude, predicted deformation field magnitude, and prediction error. (Color figure online)

Fig. 3. Respiratory motion and prediction error over time for volunteers 4 and 5. For illustration purposes, only the first 200 test samples after repositioning are shown.

Validation. The predicted deformation field \(\varvec{\hat{y}}_t\) was compared to the reference \(\varvec{y}_t\). We define the prediction error as the magnitude of the deformation field difference for the masked region including the lungs as well as parts of the liver and the stomach. As an example, Figure 2 shows the organ mask for volunteers 4 and 5 on a coronal slice of the master volume. In addition, the reference and the predicted deformation field magnitude, and the prediction error are shown. The highest motion magnitudes are observed in the region of the diaphragm. As expected, the prediction errors are higher for both volunteers after repositioning. It can further be observed that the motion model has a tendency to underestimate the respiratory motion. For volunteer 5, this becomes more evident when comparing the respiratory motion characteristics in Table 1 or Fig. 3: the respiratory motion has substantially increased after repositioning and therefore cannot be predicted precisely. Additionally, an organ drift of about 2 mm is observed in volunteer 5 after repositioning if all 830 test samples are considered, which further decreases the prediction accuracy. The highest prediction errors are found at the lung boundaries.

Figure 3 shows the mean prediction error and the respiratory motion for the first 200 test samples. The shaded area marks the 5th and 95th percentile of the prediction error. The respiratory motion is defined as the 95th percentile of the reference deformation field magnitude. Since an end-exhalation master volume was used for registration, in general higher prediction errors are observed at end-inhalation. However, despite the decreased performance of the motion model after repositioning, the mean prediction error is substantially lower than the respiratory motion for most time points.
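The error measures used in the validation can be sketched as follows, assuming the deformation fields are stored as arrays of shape (T, X, Y, Z, 3) in millimetres and the organ mask as a boolean array of shape (X, Y, Z). The function name, the array layout and the dictionary keys are assumptions for illustration.

```python
import numpy as np

def prediction_error_stats(Y_ref, Y_pred, mask):
    """Per-time-point error and motion statistics inside the organ mask."""
    # Magnitude of the deformation field difference, restricted to the mask.
    err = np.linalg.norm(Y_pred - Y_ref, axis=-1)[:, mask]  # (T, n_voxels)
    motion = np.linalg.norm(Y_ref, axis=-1)[:, mask]        # (T, n_voxels)
    return {
        "mean_error": err.mean(axis=1),
        "p05_error": np.percentile(err, 5, axis=1),
        "p95_error": np.percentile(err, 95, axis=1),
        "motion_p95": np.percentile(motion, 95, axis=1),
    }

# Synthetic example: zero reference motion, constant prediction offset of 5 mm.
Y_ref = np.zeros((4, 6, 6, 6, 3))
Y_pred = np.zeros_like(Y_ref)
Y_pred[..., 0] = 3.0
Y_pred[..., 1] = 4.0
mask = np.zeros((6, 6, 6), dtype=bool)
mask[1:5, 1:5, 1:5] = True
stats = prediction_error_stats(Y_ref, Y_pred, mask)
```

Under one reading of the definition given in the model details, the baseline motion \(\mu _{95}\) would correspond to `stats["motion_p95"].mean()`.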

Fig. 4. Prediction error distribution for all volunteers; without (white background) and with (grey background) US probe repositioning. The whiskers of the box plots extend to the most extreme values within 1.5 times the interquartile range.

The box plots in Fig. 4 show the distributions of both the mean prediction error and the 95th percentile computed for each time point. Without US probe repositioning, the mean error is less than or equal to 3 mm for all subjects except for volunteer 4, where it shows an outlier at 3.5 mm. The 95th percentile reaches a maximum value of 7.0 mm for volunteer 4, while 95% of the prediction errors for subjects 1, 2.1, and 3 are smaller than 6.0 mm, 5.4 mm, and 5.2 mm, respectively. The last column in Fig. 4 shows the results for the second test set of subject 2, where the respiratory motion was more pronounced than in the training data. The maximum values for the mean prediction error and 95th percentile are 12.7 mm and 27.4 mm, respectively.

After US probe repositioning, the mean prediction error is below 8.0 mm and 6.0 mm for volunteers 4 and 5, respectively. There are, however, outliers of up to 14.5 mm for the 95th percentile of volunteer 4. By visual inspection of the prediction errors, it could be observed that these major discrepancies are located in the region of the stomach at the organ mask boundaries. In summary, the overall mean prediction error is 2.9 mm and 3.4 mm for volunteers 4 and 5, respectively.

5 Discussion and Conclusion

In this feasibility study, we examined the performance of abdominal US surrogate signals in combination with a novel 4D MRI technique for lung motion estimation. The model predicts dense motion information 133 ms into the future, which allows for system latency compensation. The obtained results are similar in terms of accuracy to those presented in previous studies [2, 9]. However, we additionally present preliminary findings for inter-fractional motion modelling, which involves a repositioning of the US probe. Although the accuracy decreased when compared to intra-fractional modelling, overall mean prediction errors of 2.9 mm and 3.4 mm demonstrate that the proposed US surrogate signal is suitable even if the imaging plane is not identical for two fractions.

The presented results should, however, be treated with caution, as the repositioning of the US probe has only been tested on two healthy volunteers and the time interval between the two measurements was in the range of minutes rather than days or weeks. Also, there exists no real ground-truth data for the respiratory motion. The reference deformation field might itself be corrupted by registration errors. An additional error source is introduced with the alignment of the MR volumes between the two imaging sessions. Since this transformation was computed based on two exhalation master volumes, it might not be accurate for other respiratory states. Moreover, it was observed that the motion model does not generalise well if the respiration characteristics vary substantially, as was the case for subject 2. Although this limitation is inherent to the problem formulation and occurs in most motion models, it demands further investigation and characterisation. Also, further work is necessary to investigate the effect of dense motion predictions on treatment plan adaptations and dose distribution in proton therapy.