1 Introduction

Thoracic motion estimation is central to the analysis of respiratory dynamics and of the physiology of organs such as the lung. It is usually performed by non-rigid registration of images captured at different time points, e.g. at an inhalation and an exhalation state. A main challenge in this scenario is that organs slide along each other, causing discontinuous changes in correspondence. At sliding organ boundaries, a high number of degrees of freedom is therefore required to express discontinuities in the spatial mapping. This conflicts with the interior of organs, where smooth deformations are presumed and are usually achieved by reducing the degrees of freedom of the admissible transformations.

In this paper, we integrate a low-dimensional statistical motion model (SMM), which already accounts for the discontinuous correspondence changes, as the transformation model into the registration. The SMM is built from empirical motion fields, from the exhalation to the inhalation state, which are derived in a controlled semi-automatic setup where, for example, landmarks and image masks are applied in order to deal with discontinuities. The SMM is then brought into correspondence with the subject of interest, for which no landmarks or masks are available. Thus, the learned motion patterns, which contain the characteristic discontinuities at sliding organ boundaries, can be transferred to the subject of interest to finally perform the registration.

Discontinuity-preserving registration approaches have gained increasing attention in the literature, ranging from semi-automatic approaches [13], where moving organs are segmented and registered separately, to approaches with image-dependent inhomogeneous smoothness priors [5, 9], approaches with sparse regularizers [14, 15], and motion segmentation approaches [12]. None of these approaches considers statistical knowledge about the respiratory motion.

In [3, 7, 11], PCA-based motion models are proposed for mean-motion-based diagnosis and model-based shape prediction. In such models, each transformation lies within the linear span of the empirical motion patterns. In [8], localized and bias-reduced statistical models were introduced with a focus on inter-subject registration. However, these richer models need to be approximated by an orthogonal basis in order to be fitted to the images. As the eigenvalues decrease slowly when modeling local deformations, such an approximation becomes infeasible and the number of basis functions to store exceeds standard memory capacities.

The contribution of this paper is the integration of an SMM as a reproducing kernel into image registration. In the registration, only correlations between image points are considered, which makes it possible to localize the SMM and to reduce an over-restrictive model bias without the need for a model basis approximation.

2 Background

In this section, we recap the kernel framework for image registration which was elaborated in [5, 6] and borrow the notation used therein. Given a reference image \(I_R\) and a target image \(I_T\), which map the d-dimensional input domain \(\mathcal {X}\) to intensity values, and given a spatial mapping \(u\) which transforms the reference coordinate system, image registration is performed by optimizing

$$\begin{aligned} \mathop {\hbox {arg min}}\limits _u \int _\mathcal {X} \mathcal {L}\left( I_R\left( x+u\left( x\right) \right) ,I_T\left( x\right) \right) dx + \eta \mathcal {R}[u], \end{aligned}$$
(1)

where \(\mathcal {L}\) is a loss function which quantifies the matching between the transformed reference and the target image, \(\mathcal {R}\) is a regularization term which enforces additional criteria on u, and \(\eta \) is a trade-off parameter. As the transformation model, a reproducing kernel Hilbert space (RKHS) is defined

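$$\begin{aligned} \mathcal {H} := \Bigg \{ u \,\Bigg \vert \, u(x) = \sum _{i=1}^{\infty } k(x,x_i)c_i,\ x_i\in \mathcal {X},\ c_i\in \mathbb {R}^d,\ \Vert u\Vert _\mathcal {H}<\infty \Bigg \}, \end{aligned}$$
(2)

where \(k\) is a reproducing kernel and \(\Vert \cdot \Vert _\mathcal {H}\) is the RKHS norm. For more details about kernel methods we refer to [4]. In [5], the existence of a finite-dimensional solution to Eq. 1 was shown by applying a regularization term which operates solely on the finitely many parameters \(c:=\{c_i\}_{i=1}^N\)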

$$\begin{aligned} \mathop {\hbox {arg min}}\limits _{u\in \mathcal {H}}\sum _{i=1}^N\mathcal {L}\Bigg (I_R\Bigg (x_i+\sum _{j=1}^N k(x_i,x_j)c_j\Bigg ),I_T(x_i)\Bigg )+\eta \cdot g\left( p\left( c\right) \right) , \end{aligned}$$
(3)

for N pair-wise distinct sampled domain points \(x_i\) and a regularizer comprising a strictly increasing function \(g\) and a function \(p\) which is weakly semi-continuous and bounded from below. Examples are the non-informative regularizer \(\mathcal {R}_2\) or the homogeneity-favoring radial differences regularizer \(\mathcal {R}_{rd}\)

$$\begin{aligned} \mathcal {R}_2 = \sum _i \Vert c_i \Vert _2,\quad \mathcal {R}_{rd} = \sum _{i,j} \Vert c_i-c_j\Vert ^2 k(x_i,x_j). \end{aligned}$$
(4)
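As a minimal sketch of how the finite-dimensional parametrization in Eq. 3 and the two regularizers in Eq. 4 can be evaluated, the following NumPy snippet assumes a scalar-valued kernel whose values are collected in an N x N matrix `K` and coefficients stored row-wise in an N x d array `c`. The function names are hypothetical and are not taken from [5, 6].

```python
import numpy as np

def displacement(K, c):
    """u(x_i) = sum_j k(x_i, x_j) c_j for a scalar kernel matrix K (N x N) and c (N x d)."""
    return K @ c

def reg_l2(c):
    """Non-informative regularizer R_2 = sum_i ||c_i||_2 (Eq. 4, left)."""
    return np.linalg.norm(c, axis=1).sum()

def reg_radial_differences(K, c):
    """Radial differences regularizer R_rd = sum_{i,j} ||c_i - c_j||^2 k(x_i, x_j) (Eq. 4, right)."""
    diff = c[:, None, :] - c[None, :, :]   # (N, N, d) pairwise coefficient differences
    sq_norm = (diff ** 2).sum(axis=-1)     # (N, N) squared Euclidean norms
    return (sq_norm * K).sum()
```

The pairwise array in `reg_radial_differences` needs memory of order N^2 d, so this form is only meant for small illustrative problems. A compactly supported kernel would allow a sparse evaluation instead.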

3 Method

In the following, we distinguish between correspondence fields, which match images of different subjects, and motion fields, which match exhalation and inhalation images of the same subject. We first formulate a model of motion fields; the correspondence fields are needed afterwards for building the SMM (see Sect. 3.2).

3.1 Statistical Motion Model

Suppose we are given some sample transformations \(F:=\{f_i\}_{i=1}^n\) which are in correspondence and known to be useful for the registration of exhalation and inhalation images. Based on the central limit theorem, we model F by assuming a Gaussian process over the transformations \(f_i\). We estimate the mean function and the matrix-valued covariance function

$$\begin{aligned} \mu _F(x) = \frac{1}{n}\sum _{i=1}^n f_i(x),\quad k_F(x,y) = \frac{1}{n-1}\sum _{i=1}^n (f_i - \mu _F)(x)(f_i-\mu _F)(y)^T. \end{aligned}$$
(5)

We adjust the transformation model as follows

$$\begin{aligned} f(x) = \mu _F(x) + \sum _{i=1}^N k_F(x,x_i)c_i. \end{aligned}$$
(6)

Thus, the transformation model for the motion estimation yields transformations f which are linear combinations of the sample transformations at a point x.
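The estimates in Eq. 5 can be sketched for motion fields that are sampled on a common set of m grid points. This is an illustrative discretization under that assumption; the array layout `(n, m, d)` and the name `fit_smm` are hypothetical.

```python
import numpy as np

def fit_smm(F):
    """Empirical mean and matrix-valued covariance of sample fields (Eq. 5).

    F has shape (n, m, d): n sample fields, m grid points, d components.
    Returns the mean field (m, d) and a function k_F(a, b) that yields the
    d x d covariance block between grid points with indices a and b.
    """
    n = F.shape[0]
    mu = F.mean(axis=0)        # (m, d) mean field
    R = F - mu                 # (n, m, d) centered samples

    def k_F(a, b):
        # sum_i (f_i - mu)(x_a) (f_i - mu)(x_b)^T / (n - 1)
        return R[:, a, :].T @ R[:, b, :] / (n - 1)

    return mu, k_F
```

Each call to `k_F` sums over all n centered samples, which is the per-evaluation cost that motivates the dimensionality reduction discussed next.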

Note that the complexity of Eq. 3 is \(\mathcal {O}(N^2)\) kernel evaluations, which makes the optimization problem computationally intensive for 3D medical images. In addition, the evaluation of \(k_F\) requires a sum over all samples \(f_i\).

Dimensionality Reduction. To reduce the sum in \(k_F\), we rewrite the kernel in its Mercer expansion

$$\begin{aligned} k_F(x,y) = \sum _{i=1}^{\infty }\lambda _i\phi _i(x)\phi _i(y)^T, \end{aligned}$$
(7)

where \(\lambda _i\ge \lambda _{i+1}\ge 0\) and \(i>n\Leftrightarrow \lambda _i=0\). The basis functions \(\phi _i\) are orthonormal. We approximate the kernel in Eq. 7 by truncating the sum

$$\begin{aligned} k_\mathcal {M}(x,y) = \sum _{i=1}^{p} \psi _i(x)\psi _i(y)^T, \end{aligned}$$
(8)

where \(\psi _i=\sqrt{\lambda _i}\phi _i\) and \(p = \max \{i\vert \lambda _i>\theta \}\). In Eq. 7, \(\lambda _i\) and \(\phi _i\) are the eigenvalue/eigenfunction pairs of the Hilbert-Schmidt integral operator of \(k_F\). Thus, the basis functions \(\psi _i\) are the principal modes of variation of the sample F. The amount of variation kept by considering p basis functions is therefore maximal when using the first p orthogonal functions \(\psi _i\).
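In a discretized setting, the truncated expansion of Eq. 8 can be sketched with a singular value decomposition of the centered sample data matrix. The snippet assumes a matrix `A` whose columns are the flattened sample fields and ignores quadrature weights, so the discrete eigenvalues only approximate those of the integral operator. The names `smm_basis` and `theta` (for the threshold \(\theta \)) are hypothetical.

```python
import numpy as np

def smm_basis(A, theta):
    """Scaled principal modes psi_i = sqrt(lambda_i) * phi_i with lambda_i > theta.

    A has shape (m*d, n): one flattened sample field per column.
    Returns the mean vector, all eigenvalues and the (m*d, p) basis matrix Psi,
    so that Psi @ Psi.T is the discrete counterpart of k_M in Eq. 8.
    """
    n = A.shape[1]
    mu = A.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(A - mu, full_matrices=False)
    lam = s ** 2 / (n - 1)            # eigenvalues of the sample covariance, descending
    p = int(np.sum(lam > theta))      # p = max{i | lambda_i > theta}
    Psi = U[:, :p] * np.sqrt(lam[:p])
    return mu[:, 0], lam, Psi
```

Because the centered data matrix has rank at most n - 1, at most n - 1 modes survive any positive threshold, and storing `Psi` replaces the sum over all samples by a sum over p modes.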

Locality. The SMM kernel \(k_\mathcal {M}\) has infinite support. That is, for each pair of points \((x,y)\), \(k_\mathcal {M}\) yields a possibly non-zero value. In the following, we damp the correlation between two points according to the Euclidean distance between them in order to reduce the support. Using the Wendland kernel [6]

$$\begin{aligned} k_W(x,y) = \omega _{3,2}\left( \frac{\Vert x-y\Vert }{\sigma }\right) ,\quad \omega _{3,2}(r) = (1-r)^6_+ \frac{3+18r+35r^2}{1680} \end{aligned}$$
(9)

with \(a_+=\max (0,a)\) and \(\sigma >0\), which is a compactly supported kernel, we derive

$$\begin{aligned} k(x,y) = \sigma _\mathcal {M}k_\mathcal {M}(x,y) \cdot \sigma _\omega k_W(x,y)+\sigma _s\mathbf {I}_{d\times d} k_W(x,y) \end{aligned}$$
(10)

with the d-dimensional identity matrix \(\mathbf {I}\) and scaling parameters \(\sigma _\mathcal {M}>0,\sigma _\omega >0,\sigma _s\ge 0\). The effect of this manipulation (Eq. 10) on the SMM \(k_\mathcal {M}\) is two-fold. First, the quadratic complexity can be overcome since k now has compact support of radius \(\sigma \), and second, the model is enhanced in such a way that f is no longer in the strict linear span of the samples. Nonetheless, it is locally a linear combination of the samples (when setting \(\sigma _s=0\)).

With a small sample size n, even a localized model tends to be over-restrictive. In order to reduce this restrictive model bias, we add a Wendland kernel in Eq. 10 whose scale can be controlled with \(\sigma _s\).
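The Wendland function of Eq. 9 and the combined kernel of Eq. 10 can be sketched as follows. The snippet assumes that the d x d block \(k_\mathcal {M}(x,y)\) has already been evaluated and is passed in as `kM_xy`; all function and parameter names are hypothetical.

```python
import numpy as np

def wendland(r):
    """omega_{3,2}(r) = (1 - r)_+^6 (3 + 18 r + 35 r^2) / 1680 (Eq. 9)."""
    r = np.asarray(r, dtype=float)
    return np.clip(1.0 - r, 0.0, None) ** 6 * (3.0 + 18.0 * r + 35.0 * r ** 2) / 1680.0

def k_W(x, y, sigma):
    """Compactly supported Wendland kernel; zero whenever ||x - y|| >= sigma."""
    dist = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    return wendland(dist / sigma)

def combined_kernel(kM_xy, x, y, sigma, sigma_M, sigma_w, sigma_s):
    """Localized and bias-reduced kernel of Eq. 10 for one point pair (x, y)."""
    w = k_W(x, y, sigma)
    d = kM_xy.shape[0]
    return sigma_M * kM_xy * sigma_w * w + sigma_s * np.eye(d) * w
```

Since both terms are multiplied by `k_W`, the result is the zero matrix for every pair of points that is at least `sigma` apart, so only neighboring point pairs have to be evaluated.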

Scaling. If we zero-out correlation values \(k_\mathcal {M}(x,y)\) the remaining scale of the transformation f is damped as well. Therefore, the scaling factors \(\sigma _\mathcal {M},\sigma _\omega \) have to be chosen appropriately

$$\begin{aligned} \sigma _\mathcal {M} := \sum _{i=1}^N \Vert k_\mathcal {M}(x_i,x_i)\Vert _F,\quad \sigma _\omega := \begin{cases} \frac{34650}{4\pi \sigma ^3} & \text {if}~d=3,\\ \frac{10080}{2\pi \sigma ^2} & \text {if}~d=2, \end{cases} \end{aligned}$$
(11)

where \(\Vert \cdot \Vert _F\) is the Frobenius norm. The scale \(\sigma _\mathcal {M}\) is a heuristic estimate of the expected scale of the transformation. The scale of the Wendland kernel \(\sigma _\omega \) is chosen such that it integrates to one within its support. The Wendland kernel thus acts as a weighted average of \(k_\mathcal {M}\).
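The statement that \(\sigma _\omega \) normalizes the Wendland kernel to unit integral can be checked numerically. The sketch below integrates \(\omega _{3,2}(r/\sigma )\) over a disc (d = 2) or ball (d = 3) of radius \(\sigma \) in polar or spherical coordinates with a simple trapezoid rule; the helper names are hypothetical.

```python
import numpy as np

def wendland(r):
    """omega_{3,2}(r) from Eq. 9."""
    r = np.asarray(r, dtype=float)
    return np.clip(1.0 - r, 0.0, None) ** 6 * (3.0 + 18.0 * r + 35.0 * r ** 2) / 1680.0

def sigma_omega(d, sigma):
    """Normalization factor of Eq. 11 for d = 2 or d = 3."""
    return {3: 34650.0 / (4.0 * np.pi * sigma ** 3),
            2: 10080.0 / (2.0 * np.pi * sigma ** 2)}[d]

def support_integral(d, sigma, samples=200001):
    """Integral of omega_{3,2}(||x|| / sigma) over its support in d dimensions."""
    r = np.linspace(0.0, sigma, samples)
    shell = {2: 2.0 * np.pi * r, 3: 4.0 * np.pi * r ** 2}[d]   # circle length / sphere area
    y = wendland(r / sigma) * shell
    return float(np.sum((y[1:] + y[:-1]) * np.diff(r)) / 2.0)  # trapezoid rule

for d in (2, 3):
    print(d, sigma_omega(d, 40.0) * support_integral(d, 40.0))
```

Both printed products should be close to one up to the discretization error of the trapezoid rule, independently of the chosen \(\sigma \).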

3.2 Model Construction

The goal in this paper is to guide the motion estimation for a subject of interest \(S_j\) with an SMM built from motion fields of other subjects \(S_{i}\) with \(i\ne j\). The motion fields \(f_i\) have to be in correspondence with \(S_j\) in order to be comparable and thus usable for building the SMM. In Fig. 1, the relation between the different subjects is illustrated.

Fig. 1. Relation between four subjects for constructing a statistical motion model for subject \(S_3\).

Fig. 2. Spatial transformation of the motion field f by a correspondence field u where \((f\circ u)(x_1):=f(x_1)+(u(x_2)-u(x_1))\). In the discrete case, a backward warp needs to be performed, which requires the inverse transformation \(u^{-1}\).

Let an exhalation and an inhalation image \(I^E,I^I\) be given for each subject. Furthermore, let the sample motion fields \(f_i\) be derived in a controlled setup. That is, they can be derived semi-automatically by registration of \(I^E_i\) and \(I^I_i\), including manual ground truth landmarks, image masks, etc. The correspondence to the subject \(S_j\) is now derived by registration of the exhalation images \(I^E_i\) to the exhalation image \(I^E_j\), yielding the correspondence fields \(u_i\). Given the correspondence fields \(u_i\), the motion fields \(f_i\) can be warped to the coordinate system of \(S_j\). Note that for a motion field warp the inverse of the correspondence field is needed (see Fig. 2). In our case, we approximate the inverse correspondence field with the fixed-point iteration proposed in [2].
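To illustrate the idea of approximating an inverse field by fixed-point iteration, the following one-dimensional sketch uses a common scheme in which the inverse displacement v satisfies v(y) = -u(y + v(y)). Whether this matches the method of [2] in every detail is an assumption, and the function name is hypothetical.

```python
import numpy as np

def invert_displacement(x, u, iterations=50):
    """Approximate the inverse displacement v of x -> x + u(x) on a 1D grid.

    Iterates v_{k+1}(y) = -u(y + v_k(y)), with u evaluated by linear interpolation.
    The iteration contracts when the displacement u has a Lipschitz constant below one.
    """
    v = np.zeros_like(u)
    for _ in range(iterations):
        v = -np.interp(x + v, x, u)
    return v

x = np.linspace(0.0, 1.0, 201)
u = 0.05 * np.sin(2.0 * np.pi * x)      # smooth test displacement, slope below one
v = invert_displacement(x, u)
residual = v + np.interp(x + v, x, u)   # vanishes when v inverts u on the grid
print(np.abs(residual).max())
```

For 3D fields the same update applies per component, with the linear interpolation replaced by a trilinear one.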

4 Experiments

We tested our method on the Dirlab data set [1], comprising 10 subjects, each with an inhalation and an exhalation 3D CT image of the thorax. For evaluation, 300 ground truth landmarks are provided. We use the leave-one-out setup shown in Fig. 1. The exhalation images \(I^E_i\) are first brought into correspondence with \(I^E_j\) in three steps. First, the rib cages are threshold-segmented at 1150 HU on smoothed versions of \(I^E\) and rigidly registered using the Dice coefficient as image metric. Second, the rib cage segmentations are dilated and non-rigidly pre-registered using Eq. 3, again applying the Dice metric, no regularization and a Wendland kernel \(k_W\). Finally, the images are non-rigidly registered using Eq. 3, applying the normalized cross-correlation (NCC) metric and the regularizer \(\mathcal {R}_{rd}\), again with \(k_W\). In this step, we cropped the images to a region of interest and used threshold-segmented body masks to exclude the background.
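The two image metrics named above can be written in a few lines. This is a generic sketch of the Dice coefficient for binary masks and of the global normalized cross-correlation for intensity images, not the implementation used in the experiments; the function names are hypothetical.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice coefficient 2 |A and B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0                      # two empty masks are treated as identical
    return 2.0 * np.logical_and(a, b).sum() / total

def ncc(image_a, image_b):
    """Normalized cross-correlation of two images with non-constant intensities."""
    a = np.asarray(image_a, dtype=float).ravel()
    b = np.asarray(image_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

The normalized cross-correlation is invariant to affine intensity changes with positive gain, which makes it a natural choice when images of different subjects differ in contrast.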

Fig. 3. Coronal slice through subject 3. From left to right: magnitude of SMM mean and SMM transformation after optimizing level 1. Final registration result and the warped image as background.

The sample motion fields \(f_i\) are derived on three scale levels, again using Eq. 3 with the NCC metric, the regularizer \(\mathcal {R}_{rd}\) and \(k_W\). Additionally, a landmark cost term was added in order to guide the registration with the 300 landmarks. Semi-automatically derived lung masks are used to consider only lung regions in the image metric.

The semi-automatically derived \(f_i\) are warped by the fully automatically derived \(u_i\) in order to build the SMM. Finally, the exhalation/inhalation images \(I^E_j,I^I_j\) are non-rigidly registered using Eq. 3, applying the localized and bias-reduced kernel k of Eq. 10 and the non-informative regularizer \(\mathcal {R}_2\). Again, three scale levels were used, where k is applied only on the first level. On the remaining levels, \(k_W\) is used. We empirically set \(\eta =\{{1}\mathrm {e}{-7},{1}\mathrm {e}{-6},{1}\mathrm {e}{-6}\}\), \(\sigma =\{100,80,40\}\) and \(\sigma _s={2}\mathrm {e}{-3}\) and used the same values for all cases. The orthogonal basis \(\psi _i(x)\) is numerically derived using the singular value decomposition of the sample data matrix A where \(a_{ij}=f_j(x_i)\). For optimizing Eq. 3, we perform averaged stochastic gradient descent [10] on the analytically derived derivative.

In Fig. 3, an example of a mean transformation, an SMM registration (only first level) and a final registration result are shown. A clear discontinuous change in the motion field can be identified between the thoracic cavity and the lung. In Table 1, quantitative measures are provided. This experiment shows that our method achieves reasonable registration results which on average differ by 0.5 mm from the intra-observer error (IOE). Since the Maxwell-Boltzmann (MB) distribution is more appropriate for modeling TREs, we additionally provide the expected TRE and variance of a fitted MB distribution. A complete comparison with the Dirlab benchmark considering the full landmark sets remains to be done.
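How the MB distribution was fitted is not stated in the text. One plausible option is a maximum-likelihood fit of its scale parameter, sketched below; the expected value and variance then follow in closed form. The function name and the choice of maximum likelihood are assumptions.

```python
import numpy as np

def fit_maxwell_boltzmann(tre):
    """Maximum-likelihood fit of a Maxwell-Boltzmann distribution to TRE magnitudes.

    For the density sqrt(2/pi) r^2 exp(-r^2 / (2 a^2)) / a^3 the ML estimate of the
    scale is a = sqrt(mean(r^2) / 3). Returns (a, expected value, variance).
    """
    tre = np.asarray(tre, dtype=float)
    a = np.sqrt(np.mean(tre ** 2) / 3.0)
    mean = 2.0 * a * np.sqrt(2.0 / np.pi)
    var = a ** 2 * (3.0 * np.pi - 8.0) / np.pi
    return a, mean, var

# Synthetic check: norms of isotropic 3D Gaussian errors follow an MB distribution.
rng = np.random.default_rng(0)
errors = rng.normal(scale=1.5, size=(200000, 3))
a_hat, mean_hat, var_hat = fit_maxwell_boltzmann(np.linalg.norm(errors, axis=1))
print(a_hat, mean_hat, var_hat)
```

The MB model corresponds to landmark errors that are independent, zero-mean and equally distributed along the three coordinate axes, which is why the norms of the synthetic Gaussian errors recover a scale close to 1.5.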

Table 1. Expected TRE [mm] of 300 landmarks. IOE: intra-observer error (on all landmarks) taken from [1]. Dirlab: best performing results in snap-to-voxel (sv) TRE, where no masking was used and the TRE was evaluated on 300 landmarks (13.2.2017). The results of our method are listed in the right three columns.

5 Conclusion

We presented a method for modeling statistical knowledge about motion patterns which can be integrated into image registration in order to estimate thoracic motion. In contrast to standard linear motion models, our model is formulated as a reproducing kernel and integrated into the kernel framework for image registration. This allows localized and bias-reduced SMMs to be applied without the need for a basis approximation. With the leave-one-out models which we applied to the Dirlab data set, we presented an example of how such SMMs can be built and showed that they achieve reasonable registration performance. We think that our method opens the possibility for other types of SMMs which are built, e.g., in a group-wise manner.