Author contributions: M. Lindmark conducted the simulation testing and the Breeding Bird case
study. S. C. Anderson contributed to software writing and wrote simulation tests. J. T. Thorson
derived statistical methods, wrote software, conducted the Bering Sea case study, and curated the
Breeding Bird case study data. All authors discussed and interpreted results, and wrote the paper
together.
Data Availability Statement: Code and data to reproduce the results will be made available on
GitHub upon publication.
bioRxiv preprint doi: https://doi.org/10.1101/2024.12.17.628864; this version posted December 20, 2024. The copyright holder for this preprint
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
available under a CC-BY 4.0 International license.
Abstract
1. Species distribution models (SDMs) are widely used to standardize spatially unbalanced data,
project climate impacts, and identify habitat for conservation. SDMs typically estimate the [...]
ecological responses integrate across local habitat conditions, such that the species density [...]
2. To address this, we extend the Stochastic Partial Differential Equation (SPDE) method that is
widely used in INLA, which approximates spatial correlations based on local diffusion over a
finite-element mesh (FEM). We specifically introduce the sparse inverse-diffusion operator on a
FEM, and apply this operator to covariates to efficiently calculate a spatially weighted average
of local habitat that is then passed through pointwise basis-expansion to predict species
densities. We show that this operator has several useful properties, i.e., conservation of mass,
linear computational time with spatial resolution, and a uniform stationary distribution, where
the latter ensures that estimated responses are in-[...]
3. We test this covariate-diffusion method using a simulation experiment, and show that it can
[...] the eastern Bering Sea and 20 bird species in the western United States. This application
confirms that non-local responses in the eastern Bering Sea case study are parsimonious for 25
species-maturity combinations, while 20 collapse to the null model. Estimates suggest that some
species-maturity combinations avoid proximity to the continental slope, beyond what [...] is the
diffused human population density covariate more parsimonious than the original covariate.
4. The covariate-diffusion method introduced here constitutes a fast and efficient approach to
modelling non-local covariate effects. This flexible method may be useful in cases when
covariates influence nearby population densities, for instance due to movement of the sampled [...]
Keywords: Species distribution models, geostatistical models, Gaussian Markov random fields,
diffusion, TMB, breeding bird survey, northeastern Bering Sea, spatial scale
Introduction
[...]tors, and how that affects dynamics of ecological communities, is central to spatial ecology.
At local scales, the abundance of species and demographic processes are shaped by both local
habitat conditions, such as physical structure, competition and predation, and larger-scale
processes (Menge and Olson 1990). The latter include, e.g., temperature and climate indices such
as the NAO (Millon et al. 2014), and variables related to dispersal pathways (Gómez-Pompa et al.
1972, Jonsson et al. 2016). Understanding how processes across spatial and temporal scales
interact to shape species distributions and community structure is an important area of research
in these times of rapid shifts in species distributions (Pinsky et al. 2013, Roberts et al. 2019,
McCabe and Cobb 2021).
Species distribution models (SDMs) fitted to local occurrence, count, or biomass data are key
tools in spatial ecology (Elith and Leathwick 2009). They can be used to quantify a species’
distribution, abundance, and realized environmental niche, and thereby to forecast range shifts
(Liu et al. 2023, Pinsky et al. 2018). Over time, there has been a trend towards larger data sets
over broader spatial and temporal scales (Rollinson et al. 2021). This has led to more power to
detect effects and estimate functional relationships between covariates and responses, but also
to challenges related to non-stationarity. Non-stationarity here refers to the situation where
the relationship between covariates and responses varies across space and/or time (Banerjee et
al. 2014, Rollinson et al. 2021). In regression-based SDMs, which are the focus of this study,
this form of non-stationarity can be accounted for by specifying effects of covariates that are
allowed to evolve through time or vary in space (Hastie and Tibshirani 1993, Bartolino et al.
2011, Thorson et al. 2023, Anderson et al. 2024). Some examples include allowing the association
of bottom-dwelling fishes with depth to change over time as they shift their distribution due to
warming (English et al. 2022), and allowing regional ocean condition indices to cause a density
response that varies spatially (Lehodey et al. [...]
Another challenge related to spatial non-stationarity that has received less attention is the
[...] infer the relationship between habitat covariates and the response variable. However, the
true habitat an individual uses corresponds to the area it integrates via individual movement.
Hence, for sessile species, local covariates may be warranted, but as species mobility increases,
local-scale covariates would increasingly underestimate the habitat use in a typical scenario
with a limited sample size. One could average covariates (and/or the response) prior to fitting
the model to account for the relevant spatial scale linking covariates to the response being
larger than the observation scale (e.g., Lindmark et al. 2023, McKeon et al. 2024), or evaluate
multiple scales and identify which best fits the data (Bartolino et al. 2012). However, a
limitation of this approach is that it is impossible to know the optimal scale of aggregation
beforehand, and the scale resulting in the strongest effect is not necessarily the most relevant
scale.
In this study, we introduce an approach that involves applying a diffusion operator to a
covariate within the SPDE framework. This allows us to estimate the optimal spatial scale for
computing a weighted average of a covariate, and can be thought of as a way to measure the
effective “habitat area” that individuals are integrating via movement. Using simulation testing,
we show how this covariate-diffusion model can correctly recover diffused covariate effects, or
collapse to the raw covariate when no covariate diffusion is present. We then apply the
covariate-diffusion model to two real-world datasets on bottom-associated fishes and birds, and
find that it is a parsimonious model for more than half the species-maturity combinations in the
fish case study, and approxi-[...]
Methods
Covariate-diffusion
The Stochastic Partial Differential Equation (SPDE) method (Lindgren et al. 2011) is widely used
to define spatially correlated variables in statistical models. We briefly summarize the method
here, before discussing how our covariate-diffusion model arises as a novel reuse of the
underlying math.
At the highest level, the SPDE method seeks to specify a Gaussian random field (GRF) $Z$, where
the value of this random field $z_s$ at a set of locations $s \in D$ within a spatial domain $D$
follows a [...] function, $\kappa$ is the decorrelation rate, $\nu$ is the smoothness parameter,
and $\tau^{-2}$ is the pointwise variance. This covariance function then allows the GRF to be
evaluated at a fixed set of locations as a [...] where $\mathbf{V}$ is the matrix of covariance
among those locations. We could then calculate the value of the GRF $z^*$ at a new location using
bilinear interpolation, represented by a matrix $\mathbf{A}$, $z^* = \mathbf{A}\mathbf{z}$.
[...] method then approximates this GRF as a Gaussian Markov random field (GMRF), i.e., by
speci-[...] where evaluating the multivariate normal density function involves the precision
matrix, and hence can be directly calculated from $\mathbf{Q}$ without matrix inversion.
Importantly, the sparse precision matrix can also be constructed directly using the SPDE method
$$\mathbf{Q} = \tau^2 (\kappa^4 \mathbf{M}_0 + 2\kappa^2 \mathbf{M}_1 + \mathbf{M}_2), \tag{4}$$
where $\mathbf{M}_0$ is a diagonal matrix, $\mathbf{M}_1$ has first-order adjacency within a
triangulated mesh, and $\mathbf{M}_2$ has second-order adjacency. These three matrices are
typically constructed by lower-level software (e.g., the R package fmesher; Lindgren 2023), and
fitting this model does not require advanced
understanding of the model derivation. However, we here summarize the underlying theory to [...]
In particular, the SPDE approximation to a GMRF is derived by specifying a diffusive process
(the partial differential equation from the method’s name) and a stochastic “shock” $\epsilon$ as
a simulta-[...] where $\tilde{\mathbf{C}}$ is a diagonal matrix and
$\mathrm{diag}(\tilde{\mathbf{C}})$ is the volume of the linear basis functions centered at each
location and $\mathbf{G}$ is a sparse matrix representing the spatial overlap between basis
functions (i.e., is zero for nonadjacent locations). Subtracting the right-hand-side from the
left yields
$$(\kappa^2 \tilde{\mathbf{C}} + \mathbf{G}) \mathbf{z} = \epsilon, \tag{7}$$
and then dividing the left-hand-side across and expressing as a GMRF yields [...]
Multiplying out the quadratic form for the precision matrix then results in the original
expression [...]
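The multiplication mentioned above can be written out explicitly. The following is a sketch under one assumption that is not stated in the surviving text, namely that the shock has covariance $\tau^{-2} \tilde{\mathbf{C}}$. The identification of $\mathbf{M}_0$, $\mathbf{M}_1$ and $\mathbf{M}_2$ follows the standard SPDE construction of Lindgren et al. (2011) and uses the symmetry of $\tilde{\mathbf{C}}$ and $\mathbf{G}$.

```latex
% Assumption (hypothetical, for illustration): Var(\epsilon) = \tau^{-2} \tilde{C}.
\begin{align*}
\mathbf{z} &= (\kappa^2 \tilde{\mathbf{C}} + \mathbf{G})^{-1} \epsilon, \\
\mathbf{Q} = \operatorname{Var}(\mathbf{z})^{-1}
  &= \tau^2 (\kappa^2 \tilde{\mathbf{C}} + \mathbf{G})^{T}
     \tilde{\mathbf{C}}^{-1} (\kappa^2 \tilde{\mathbf{C}} + \mathbf{G}) \\
  &= \tau^2 \left( \kappa^4 \tilde{\mathbf{C}} + 2 \kappa^2 \mathbf{G}
     + \mathbf{G} \tilde{\mathbf{C}}^{-1} \mathbf{G} \right).
\end{align*}
% Comparing with Eq. 4:
%   M_0 = \tilde{C}  (diagonal),
%   M_1 = G          (first-order adjacency),
%   M_2 = G \tilde{C}^{-1} G  (second-order adjacency).
```

Because $\tilde{\mathbf{C}}$ is diagonal, $\mathbf{G} \tilde{\mathbf{C}}^{-1} \mathbf{G}$ only links vertices that share a neighbour, which is why $\mathbf{M}_2$ has second-order adjacency.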
Having reiterated the diffusion process that underlies the SPDE precision matrix, we now [...]
$$\mathbf{D}^{-1} = (1 + \kappa^2)^{-1} \tilde{\mathbf{C}}^{-1} (\tilde{\mathbf{C}} + \kappa^2 \tilde{\mathbf{C}} + \mathbf{G}), \tag{10}$$
where the inverse-diffusion $\mathbf{D}^{-1}$ has the same sparsity as $\mathbf{G}$, which
follows first-order adjacency. [...]
1. Conservation of mass: Given a field approximated as vector $\mathbf{x}$ at the vertices of the
finite-element mesh, $N$ evenly spaced locations $\mathbf{s}$ that cover the domain, and bilinear
interpolation matrix $\mathbf{A}$ that projects to those $N$ locations, we can approximate the
average value of the field by predicting and then averaging across those locations,
$\bar{x} = N^{-1} \mathbf{1}^T \mathbf{A}\mathbf{x}$. Pre-multiplying by the diffusion operator
has (almost) no effect on this average mass,
$\mathbf{1}^T \mathbf{A}\mathbf{x} = \mathbf{1}^T \mathbf{A}\mathbf{D}\mathbf{x}$, i.e., [...]
2. Invariance to centering or scaling: Given that we approximate diffusion using a linear
operator, we can apply a linear transformation to any vector, $\mathbf{z}^* = a + b\mathbf{z}$,
and this will result in the same linear transformation of the diffused version,
$\mathbf{D}\mathbf{z}^* = a + b\mathbf{D}\mathbf{z}$. For example, if we measure temperature in
Celsius at a set of sites, convert to Fahrenheit, and then apply the diffusion operator, this
will be equivalent to applying the diffusion operator and then [...]
3. Efficient computation: The diffusion matrix $\mathbf{D}$ is “dense” (i.e., values become small
but remain nonzero even as distances become large), and hence the time to compute
$\mathbf{D}\mathbf{v}$ scales as $S^2$ where $S$ is the number of sites (Fig. S1). However, we
can instead calculate $\mathbf{D}\mathbf{v}$ efficiently by first calculating a sparse LU
decomposition of $\mathbf{D}^{-1}$ and then applying this to $\mathbf{v}$. Doing so works
directly with the sparse matrix $\mathbf{D}^{-1}$, and avoids computing or storing the dense
matrix $\mathbf{D}$.
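The three properties above can be checked numerically on a toy problem. The following Python sketch is not the authors' implementation (which uses R and TMB on a two-dimensional mesh). It assumes a one-dimensional mesh with linear elements, a lumped mass diagonal for $\tilde{\mathbf{C}}$, and the form $\mathbf{D}^{-1} = \mathbf{I} + (1 + \kappa^2)^{-1} \tilde{\mathbf{C}}^{-1} \mathbf{G}$, which is one reading of Eq. 10; the scaling is an assumption. All function names are hypothetical. In one dimension $\mathbf{D}^{-1}$ is tridiagonal, so the sparse LU solve reduces to the Thomas algorithm. Mass is measured here with the lumped mass weights rather than with $\mathbf{1}^T \mathbf{A}$.

```python
def fem_1d(nodes):
    """Lumped mass diagonal c and tridiagonal stiffness G for linear elements on a 1-D mesh."""
    n = len(nodes)
    c = [0.0] * n        # diag(C~): half the length of each adjacent element
    g_diag = [0.0] * n   # G[i][i]
    g_off = []           # G[i][i+1] == G[i+1][i]
    for i in range(n - 1):
        h = nodes[i + 1] - nodes[i]
        c[i] += h / 2.0
        c[i + 1] += h / 2.0
        g_diag[i] += 1.0 / h
        g_diag[i + 1] += 1.0 / h
        g_off.append(-1.0 / h)
    return c, g_diag, g_off


def inverse_diffusion(nodes, kappa):
    """Bands of D^-1 = I + s * C~^-1 G, with s = 1 / (1 + kappa^2) (assumed scaling)."""
    c, g_diag, g_off = fem_1d(nodes)
    s = 1.0 / (1.0 + kappa ** 2)
    n = len(nodes)
    diag = [1.0 + s * g_diag[i] / c[i] for i in range(n)]
    upper = [s * g_off[i] / c[i] for i in range(n - 1)]      # entries (i, i+1)
    lower = [s * g_off[i] / c[i + 1] for i in range(n - 1)]  # entries (i+1, i)
    return lower, diag, upper


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: an O(n) LU solve that never forms the dense inverse."""
    n = len(diag)
    d, r = list(diag), list(rhs)
    for i in range(1, n):
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    out = [0.0] * n
    out[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        out[i] = (r[i] - upper[i] * out[i + 1]) / d[i]
    return out


def diffuse(nodes, x, kappa):
    """Return D x by solving the sparse system D^-1 y = x."""
    lower, diag, upper = inverse_diffusion(nodes, kappa)
    return solve_tridiagonal(lower, diag, upper, x)


def mass(nodes, x):
    """Integral of the piecewise-linear field: sum_i c_i * x_i."""
    c, _, _ = fem_1d(nodes)
    return sum(ci * xi for ci, xi in zip(c, x))


nodes = [0.5 * i for i in range(41)]
x = [1.0 if i == 20 else 0.0 for i in range(41)]   # a single spike of covariate
dx = diffuse(nodes, x, kappa=1.0)

# Property 1: the total mass before and after diffusion should agree.
print(mass(nodes, x), mass(nodes, dx))

# Property 2: converting units then diffusing equals diffusing then converting.
celsius = [10.0 + 5.0 * xi for xi in x]
fahrenheit = [32.0 + 1.8 * ci for ci in celsius]
convert_then_diffuse = diffuse(nodes, fahrenheit, kappa=1.0)
diffuse_then_convert = [32.0 + 1.8 * v for v in diffuse(nodes, celsius, kappa=1.0)]
print(max(abs(a - b) for a, b in zip(convert_then_diffuse, diffuse_then_convert)))
```

Both checks rely on $\mathbf{G}$ having zero row and column sums: zero column sums give conservation of the mass-weighted total, and zero row sums give $\mathbf{D}\mathbf{1} = \mathbf{1}$, which is what makes the operator commute with centering and scaling.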
In the following, we therefore define a vector of covariate values $\mathbf{x}$ at the vertices
of the finite-element mesh, where the covariate $x^*$ is interpolated at a new location using the
same interpolation matrix, $x^* = \mathbf{A}\mathbf{x}$. We then replace the covariate value
$x^*$ for sample $i$ at location $s_i$ with its diffused value
$\mathbf{A}\mathbf{D}\mathbf{x}$, and use $\mathbf{A}\mathbf{D}\mathbf{x}$ to predict local
densities in a species distribution model. The effect of applying the diffusion operator,
including its effect on the total mass of the covariate, is visualized in Fig. 1. We then
estimate parameter $\kappa$ (used to construct diffusion matrix $\mathbf{D}$) simultaneously with
other regression coefficients representing habitat associations. As $\kappa \to \infty$ in
Eq. 10, $\mathbf{D}^{-1} \to \mathbf{I}$ and the diffused covariate collapses to its local value,
$\mathbf{D}\mathbf{x} = \mathbf{x}$. Alternatively, as $\kappa \to 0$,
$\mathbf{D}\mathbf{x} = c\mathbf{1}$ and the diffused covariate collapses to a constant value
$c$. We are therefore interested in
intermediate values of $\kappa$ where $\mathbf{D}\mathbf{x}$ represents the impact of covariates
within the neighborhood [...]
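To make the role of $\mathbf{A}\mathbf{D}\mathbf{x}$ concrete, the Python sketch below projects a diffused covariate to a few sample sites for several values of $\kappa$. It is an illustration only: it assumes a uniform one-dimensional mesh, linear interpolation between neighbouring vertices as a stand-in for the interpolation matrix $\mathbf{A}$, and the form $\mathbf{D}^{-1} = \mathbf{I} + (1 + \kappa^2)^{-1} \tilde{\mathbf{C}}^{-1} \mathbf{G}$ as one reading of Eq. 10. The function names and the step-shaped covariate are hypothetical.

```python
def diffuse_uniform(x, h, kappa):
    """D x on a uniform 1-D mesh with spacing h, assuming
    D^-1 = I + C~^-1 G / (1 + kappa^2) with a lumped mass diagonal C~."""
    n = len(x)
    s = 1.0 / (1.0 + kappa ** 2)
    # Lumped mass: h for interior vertices, h / 2 at the two boundary vertices.
    c = [h / 2.0 if i in (0, n - 1) else h for i in range(n)]
    # G_ii / c_i equals 2 / h^2 for interior and boundary vertices alike.
    diag = [1.0 + s * 2.0 / h ** 2] * n
    upper = [s * (-1.0 / h) / c[i] for i in range(n - 1)]      # entries (i, i+1)
    lower = [s * (-1.0 / h) / c[i + 1] for i in range(n - 1)]  # entries (i+1, i)
    # Thomas algorithm: forward elimination, then back substitution.
    d, r = list(diag), list(x)
    for i in range(1, n):
        w = lower[i - 1] / d[i - 1]
        d[i] -= w * upper[i - 1]
        r[i] -= w * r[i - 1]
    out = [0.0] * n
    out[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        out[i] = (r[i] - upper[i] * out[i + 1]) / d[i]
    return out


def interpolation_row(site, h, n):
    """Nonzero weights of one row of A: linear interpolation between two vertices."""
    j = min(int(site // h), n - 2)
    t = site / h - j
    return {j: 1.0 - t, j + 1: t}


def project(sites, values, h):
    """Apply A to a vector of vertex values, returning one value per site."""
    n = len(values)
    return [
        sum(w * values[j] for j, w in interpolation_row(site, h, n).items())
        for site in sites
    ]


h = 1.0
x = [0.0 if i < 25 else 1.0 for i in range(51)]   # an abrupt step in the covariate
sites = [10.3, 23.0, 26.0, 40.7]

print("local  ", [round(v, 3) for v in project(sites, x, h)])
for kappa in (100.0, 3.0, 0.5):
    adx = project(sites, diffuse_uniform(x, h, kappa), h)
    print("kappa", kappa, [round(v, 3) for v in adx])
```

For large $\kappa$ the projected values should match the local covariate closely, while for smaller $\kappa$ sites near the step take values between the two plateaus. This is the sense in which a site can respond to being near, but not on, a feature such as a slope.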
We used simulation testing to evaluate 1) how well the estimate for a diffused covariate could
be recovered under varying observation error, 2) how often marginal AIC favoured the correct
estimation model (diffusion or null model) in a self-and-cross experiment, 3) how well the
diffusion parameters could be recovered under varying strengths of diffusion, and 4) how well the
diffusion model can collapse to the null model (i.e., how well the diffusion model can match the
null model estimates when data are simulated without diffusion). We simulated 200 datasets from a
Poisson model with an observation-level random intercept in link space to allow for additional
dispersion beyond the 1:1 mean-variance relationship of the Poisson, i.e., a lognormal Poisson.
Parameters were largely taken from a model fitted to counts of juvenile Pacific cod (Gadus
macrocephalus) in a subsequent case study, with a scaled depth covariate (subtracting the mean
and dividing by the standard deviation). Each dataset contained 15592 spatially correlated
observations, and for every dataset, a new GMRF was simulated. For a more detailed description of
the models we refer to the northeastern Bering [...]
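The observation model described above can be illustrated with a short Python sketch. It is a simplification, not the simulation code used in the study: it omits the spatial random field and the covariate diffusion, uses a uniform random stand-in for scaled depth, and the function names are hypothetical. The intercept (-0.4), slope (-2.4) and observation-level standard deviations (0.1, 1, 2) are the values quoted in the text for the simulation experiment.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw a Poisson variate. Knuth's method for small means; a rounded normal
    approximation (an illustrative shortcut) for large means."""
    if lam > 30.0:
        return max(0, int(round(rng.gauss(lam, math.sqrt(lam)))))
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_counts(covariate, beta0, beta1, sigma_eta, rng):
    """Counts with log(mu_i) = beta0 + beta1 * x_i + eta_i, eta_i ~ N(0, sigma_eta^2)."""
    counts = []
    for x in covariate:
        eta = rng.gauss(0.0, sigma_eta)   # observation-level random intercept
        log_mu = beta0 + beta1 * x + eta
        counts.append(sample_poisson(rng, math.exp(log_mu)))
    return counts


def dispersion(counts):
    """Sample variance divided by sample mean (1 for a pure Poisson with fixed mean)."""
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean


rng = random.Random(1)
depth = [rng.uniform(-1.0, 1.0) for _ in range(5000)]   # stand-in for scaled depth
for sigma_eta in (0.1, 1.0, 2.0):
    y = simulate_counts(depth, beta0=-0.4, beta1=-2.4, sigma_eta=sigma_eta, rng=rng)
    print(sigma_eta, round(dispersion(y), 2))
```

The variance-to-mean ratio exceeds 1 even for small values of the standard deviation because the mean varies with the covariate, and it grows quickly as the standard deviation increases, which is the extra dispersion the observation-level random intercept is meant to capture.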
In the first exercise (questions 1 and 2), we generated data by simulating from models without
and with covariate-diffusion. In the former, we set the intercept $\beta_0$ to -0.4, the linear
effect of the raw or diffused covariate $\beta_j$ to -2.4, the scalar of the precision matrix
$\log(\tau)$ to -1.4, and the decorrelation rate $\log(\kappa)$ to -0.8. For the diffusion model,
we in addition set the strength of the diffusion $\log(\kappa)$ to 2.5 (which corresponds to
moderate diffusion). In both operating models, we varied the observation-level standard
deviation, setting $\sigma_\eta$ to 0.1, 1, and 2, and tested how well both models could return
the true depth coefficient, and how often marginal AIC favoured the correct [...]
In the second exercise (questions 3 and 4), we simulated data from a diffusion model to evaluate
how well the true value of the diffused covariate could be retrieved, given varying strengths of
the diffusion, and how often marginal AIC favoured the correct operating model. We used the same
parameters for the diffusion model as above ($\sigma_\eta = 1$), and set $\log(\kappa)$ to 0
(strong diffusion), 2.5 [...]
To illustrate a diffused covariate in practice, we also present two real-world case studies. In
the first case study, we use count data for 20 species from the US Breeding Bird Survey (Sauer et
al. 1997) in the western United States (westward of Wyoming, Colorado, Montana, and New Mexico)
in 2019. We test if there is support for non-local effects of log human population density. This
could, for instance, indicate a response to urbanization affecting the habitat quality. For
example, outside densely populated areas there may still be large impacts on habitat due to
infrastructure. We use marginal AIC to determine if the more complex covariate-diffusion model is
supported. The second case study is based on bottom trawl survey data from the northeastern
Bering Sea in 2019, collected by the NOAA Alaska Fisheries Science Center (AFSC) using a fixed
station design. Each trawled site contains information on catch in numbers of 45 species-maturity
combinations. Here we test if there is support for non-local effects of sea floor depth. A
diffused depth effect could, for instance, indicate a response to being near (but not actually
on) the continental slope.
In both case studies, we modelled the counts at each site using a lognormal Poisson observation
[...]
$$\omega_{\mathbf{s}} \sim \mathrm{MVN}(\mathbf{0}, \boldsymbol{\Sigma}_\omega), \tag{13}$$
where $\mu_{\mathbf{s},t}$ represents the mean count, $\mathbf{X}_{\mathbf{s},t}$ is the design
matrix, $\alpha_g$ is an observation-level random intercept, and $\omega_{\mathbf{s}}$ represents
spatial random effects drawn from a Gaussian Markov random field with inverse precision (i.e.,
covariance) matrix $\boldsymbol{\Sigma}_\omega$ constrained by a Matérn covariance function.
We constructed meshes using the function fm_mesh_2d() in the R package fmesher (Lindgren
2023), using a cutoff distance (minimum triangle edge length) of 0.1 degrees in the northeastern
Bering Sea case study (1295 knots) and 1 degree in the Breeding Bird Survey (170 knots)
(Figs. S2–S3).
[...]
We fit the SPDE-based spatial models in the simulation experiment and the case studies using the
R (R Core Team 2024) package TMB (Kristensen et al. 2016), with matrices in Eq. 5 constructed
with the R package fmesher (Lindgren 2023). Parameter estimation is done via maximum marginal
likelihood using the non-linear minimizer nlminb (R Core Team 2024).
Results
The covariate-diffusion estimation model is able to retrieve the true parameters accurately both
when the underlying model (“operating model”) generating the data had covariate-diffusion and
when it did not, since the diffusion model reverts to the sub-model without covariate-diffusion
as $\kappa$ becomes large (Fig. 2). However, when the operating model is a covariate-diffusion
model, the null estimation model leads to biased parameter estimates (Fig. 2a). We also find that
marginal AIC favours the covariate-diffusion model in >98% of cases when the operating model is
covariate-diffusion (Fig. 2a), and favours the null model in >94% of cases when the operating
model is null (Fig. 2b). Neither the ability to retrieve the true parameter estimate nor the
assignment based on marginal AIC is affected by the observation error standard deviation given
the ranges tested here (Fig. 2).
We also find that marginal AIC identifies the true operating model more frequently when the
strength of the diffusion is larger (86–98% of cases; Fig. 3a–b), but for low diffusion, marginal
AIC favours the null model (Fig. 3c). The covariate-diffusion model is able to retrieve the true
parameter value on average regardless of the strength of the diffusion, but the spread of
individual [...]
Our case studies show that covariate diffusion is supported to varying degrees in both bird
species and fish groups. In two of 20 bird species, covariate-diffusion is supported for the
human population density covariate as they have $\Delta\mathrm{AIC} > 2$ (Fig. 4a). In contrast,
we find support for
[...] the eastern Bering Sea case study on fishes (Fig. 4b). In both case studies,
$\Delta\mathrm{AIC} = -2$ indicates that the covariate-diffusion and the null model have the same
marginal log likelihood (the covariate-diffusion model is then penalized only for its one
additional parameter), and the correlation between the raw and diffused covariate approaches 1
(Fig. 4).
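The arithmetic behind the value of −2 can be sketched in a few lines of Python. The sign convention (null minus covariate-diffusion) and the single extra parameter are inferred from the statements in the text that $\Delta\mathrm{AIC} > 2$ supports covariate-diffusion and that equal log likelihoods give $\Delta\mathrm{AIC} = -2$. The function names, log-likelihood values and parameter counts are hypothetical.

```python
def marginal_aic(log_lik, n_params):
    """AIC = 2 * (number of estimated fixed parameters) - 2 * (marginal log likelihood)."""
    return 2.0 * n_params - 2.0 * log_lik


def delta_aic(log_lik_null, k_null, log_lik_diffusion, k_diffusion):
    """AIC of the null model minus AIC of the covariate-diffusion model.
    Positive values favour the covariate-diffusion model."""
    return marginal_aic(log_lik_null, k_null) - marginal_aic(log_lik_diffusion, k_diffusion)


# Same log likelihood, one extra parameter for the diffusion model.
print(delta_aic(-100.0, 4, -100.0, 5))

# Diffusion improves the log likelihood by 3 units at the cost of one parameter.
print(delta_aic(-100.0, 4, -97.0, 5))
```

Under this convention the covariate-diffusion model needs to improve the marginal log likelihood by more than 2 units before $\Delta\mathrm{AIC}$ exceeds the threshold of 2 used in the case studies.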
A lower correlation between the diffused and raw covariate is typically found in species where
the covariate-diffusion model is supported (Fig. 4). For example, in the Breeding Bird case, the
covariate-diffusion model is not supported for the common starling (Sturnus vulgaris; top row)
and the diffused covariate (middle column) is nearly identical to the original covariate (left
column), while for black-headed grosbeak (Pheucticus melanocephalus; bottom row) the human
population density covariate is smoothed with a relatively strong diffusion
($\log \kappa = -8.07$) (Fig. 5b). Similarly, for the example fishes from the northeastern Bering
Sea case, in adult Alaska pollock (Gadus chalcogrammus), the depth covariate from the
covariate-diffusion model is nearly identical to the raw covariate, while for capelin (Mallotus
villosus), the diffusion is strong ($\log \kappa = -7.44$), and the [...]
For several species-maturity combinations in the fish case study, there is a notable difference
in the partial effect of depth on density (Fig. S4). However, the covariate-diffusion model and
the null model often generate similar predictions, even in cases of strong diffusion, presumably
because the spatial random effects can change between the models (Figs. S5, S6).
Discussion
We have introduced a sparse inverse-diffusion operator based on the SPDE method, which can be
used to efficiently model non-local covariate effects, such as to approximate the effective
habitat area that individuals integrate via movement. Specifically, when applied to a covariate,
this operator calculates a spatially weighted average covariate given the estimated range of the
diffusion process. With simulation testing, we have demonstrated that the diffusion model can
correctly identify the underlying process model and estimate the density response to the diffused
covariate.
We then tested the approach on spatial models fitted to datasets for birds and fishes.
Covariate-diffusion was more parsimonious than the null model for only two of 20 bird species,
but for a [...] species-maturity combinations in the fish case study, the partial effect of depth
was smaller for the covariate-diffusion model than the null model near the continental shelf
slope, suggesting that these groups avoid these habitats despite their being of similar depth to
other more inshore areas. Hence, our approach could aid in generating hypotheses as to what
drives non-stationarity across space, which is important for improving large-scale species
distribution modeling (Rollinson et al. 2021).
Covariate-diffusion could result from any ecological teleconnections such that local ecological
properties are influenced by patterns happening at a broader scale. For example, fish move over
time and therefore their body condition (how plump they are given their length) may be affected
by the combination of habitat and spatially varying prey they encounter over their lifetime
(Lindmark et al. 2023). Covariate diffusion could represent how this broader scale of conditions
the fish moved through might affect their body condition. Alternatively, a species may be
stationary with environmental processes changing around it. For example, the number of eggs
produced by sessile clams may be influenced by environmental conditions as ocean currents move
water past the clams. Covariate diffusion could represent how this broader scale of experienced
environment [...]
We observe that predictions from the diffusion model and the null model tend to be similar.
While both the covariate and the estimate of its coefficient change when the diffusion model is
applied and supported, the predicted counts do not substantially differ between the two models,
partly because the spatial random effects also change. Which model to use then depends on the
objectives of the analysis: whether it is to learn about ecologically relevant scales of
covariates and non-local effects, or whether a model that generates similar predictions by
placing additional variation in the spatial random effects will suffice. Since we have also shown
that the diffusion model can revert to a non-diffused model in the absence of diffusion, the
diffusion model can be applied at little cost even when it is not known a priori whether
diffusion is supported.
We recommend two topics for future research. The first is to augment our covariate-diffusion
model by incorporating advection, i.e., where local densities respond to environmental conditions
that are centered on a location that is geographically distant. This “covariate-advection” is
feasible using the SPDE method (Clarotto et al. 2023) and would presumably represent advective
movement, e.g., where densities during summer sampling respond to habitat conditions in a winter
habitat. Second, we note that covariate-diffusion collapses to an index of regionally averaged
conditions as diffusion becomes large. In this case, fitting a spatially varying coefficient
(SVC) (Hastie and Tibshirani 1993, Gelfand et al. 2003) response to the diffused covariate across
multiple years would allow a wide range of model behaviors, from a stationary and local response
to a [...]
We note several drawbacks to the covariate-diffusion approach. First, the approach replaces the
high-resolution covariate measured at each unique location with an interpolated value that is
defined at each vertex of the finite-element mesh. This mesh can be defined at a high resolution,
but still entails some loss of fine-scale variation. Second, although computationally efficient
due to working with the sparse inverse-diffusion matrix, the approach is still more
computationally intensive than fitting a model without covariate diffusion. Third, the model
requires users to define covariate values not just for the location of samples, but at all
locations across a given domain. This results in a more complex user interface than the
regression models typically used for SDMs and will therefore require some consideration before
integrating into GMRF- and TMB-based SDM software such as sdmTMB (Anderson et al. 2024) or
tinyVAST (Thorson et al. 2024).
Despite these drawbacks, we conclude that covariate-diffusion using the SPDE method is
computationally efficient, statistically performant, and ecologically important for a wide range
of species. We therefore recommend that ecologists estimate non-local habitat responses across
the [...]
Funding
M.L. was supported by a research grant from the Swedish Research Council Formas (grant no. [...]
References
Anderson, S.C., Ward, E.J., English, P.A., Barnett, L.A.K. and Thorson, J.T. (2024) sdmTMB: An R
package for fast, flexible, and user-friendly generalized linear mixed effects models with
spatial and spatiotemporal random fields. bioRxiv 2022.03.24.485545. URL
https://doi.org/10.1101/2022.03.24.485545.
Banerjee, S., Carlin, B.P. and Gelfand, A.E. (2014) Hierarchical Modeling and Analysis for
Spatial [...]
Bartolino, V., Ciannelli, L., Bacheler, N.M. and Chan, K.S. (2011) Ontogenetic and sex-specific
differences in density-dependent habitat selection of a marine fish population. Ecology 92, [...]
https://onlinelibrary.wiley.com/doi/pdf/10.1890/09-1129.1.
Bartolino, V., Ciannelli, L., Spencer, P., Wilderbuer, T.K. and Chan, K.S. (2012) Scale-dependent
detection of the effects of harvesting a marine fish population. Marine Ecology Progress Series
[...]
Clarotto, L., Allard, D., Romary, T. and Desassis, N. (2023) The SPDE approach for
spatio-temporal [...]
Elith, J. and Leathwick, J.R. (2009) Species Distribution Models: Ecological Explanation and
Prediction Across Space and Time. Annual Review of Ecology, Evolution, and Systematics [...]
English, P.A., Ward, E.J., Rooper, C.N., Forrest, R.E. et al. (2022) Contrasting climate velocity
impacts in warm and cool locations show that effects of marine warming are worse in already
warmer temperate waters. Fish and Fisheries 23, 239–[...]
https://onlinelibrary.wiley.com/doi/pdf/10.1111/faf.12613.
Gelfand, A.E., Kim, H.J., Sirmans, C.F. and Banerjee, S. (2003) Spatial modeling with spatially
varying coefficient processes. Journal of the American Statistical Association 98, 387–396.
Gómez-Pompa, A., Vázquez-Yanes, C. and Guevara, S. (1972) The Tropical Rain Forest: A Non-[...]
science.177.4051.762.
Hastie, T. and Tibshirani, R. (1993) Varying-Coefficient Models. Journal of the Royal Statistical
[...]
Jonsson, P.R., Corell, H., André, C., Svedäng, H. and Moksnes, P.O. (2016) Recent decline in cod
stocks in the North Sea–Skagerrak–Kattegat shifts the sources of larval supply. Fisheries [...]
Kristensen, K., Nielsen, A., Berg, C.W., Skaug, H. and Bell, B.M. (2016) TMB: Automatic
Differentiation and Laplace Approximation. Journal of Statistical Software 70, 1–21. URL
https:[...]
Lehodey, P., Bertignac, M., Hampton, J., Lewis, A. and Picaut, J. (1997) El Niño Southern
Oscillation and tuna in the western Pacific. Nature 389, 715–718. URL
https://www.nature.com/articles/39575.
Lindgren, F. (2023) fmesher: Triangle Meshes and Related Geometry Tools. URL https://CRAN.[...]
Lindgren, F., Rue, H. and Lindström, J. (2011) An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73, 423–498. URL http://onlinelibrary.wiley.com/doi/10.1111/j.1467-9868.2011.00777.x/abstract.
Lindmark, M., Anderson, S.C., Gogina, M. and Casini, M. (2023) Evaluating drivers of spatiotemporal variability in individual condition of a bottom-associated marine fish, Atlantic cod (Gadus morhua). ICES Journal of Marine Science 80, 1539–1550. URL https://doi.org/10.1093/icesjms/fsad084.
Liu, O.R., Ward, E.J., Anderson, S.C., Andrews, K.S. et al. (2023) Species redistribution creates unequal outcomes for multispecies fisheries under projected climate change. Science Advances 9,
McCabe, L.M. and Cobb, N.S. (2021) From Bees to Flies: Global Shift in Pollinator Communities Along Elevation Gradients. Frontiers in Ecology and Evolution 8. URL https://www.frontiersin.org/journals/ecology-and-evolution/articles/10.3389/
McKeon, C.M., Buckley, Y.M., Moriarty, M., Lundy, M. and Kelly, R. (2024) Increased signal of fishing pressure on community life-history traits at larger spatial scales. Global Ecology and
Menge, B.A. and Olson, A.M. (1990) Role of scale and environmental factors in regulation of community structure. Trends in Ecology & Evolution 5, 52–57. URL https://www.cell.com/trends/
Millon, A., Petty, S.J., Little, B., Gimenez, O., Cornulier, T. and Lambin, X. (2014) Dampening prey cycle overrides the impact of climate change on predator population dynamics: a long-term demographic study on tawny owls. Global Change Biology 20, 1770–
https://onlinelibrary.wiley.com/doi/pdf/10.1111/gcb.12546.
Pinsky, M.L., Reygondeau, G., Caddell, R., Palacios-Abrantes, J., Spijkers, J. and Cheung, W.W.L. (2018) Preparing ocean governance for species on the move. Science 360, 1189–1191. URL
Pinsky, M.L., Worm, B., Fogarty, M.J., Sarmiento, J.L. and Levin, S.A. (2013) Marine Taxa Track
R Core Team (2024) R: A Language and Environment for Statistical Computing. R Foundation for
Roberts, C.P., Allen, C.R., Angeler, D.G. and Twidwell, D. (2019) Shifting avian spatial regimes
Rollinson, C.R., Finley, A.O., Alexander, M.R., Banerjee, S. et al. (2021) Working across space and time: nonstationarity in ecological research and application. Frontiers in Ecology and the En-
fee.2298.
Sauer, J.R., Hines, J.E., Gough, G., Thomas, I. and Peterjohn, B.G. (1997) The North American Breeding Bird Survey results and analysis. Technical report, Eastern Ecological Science Center, Laurel, MD.
Thorson, J.T. (2019) Measuring the impact of oceanographic indices on species distribution shifts: The spatially varying effect of cold-pool extent in the eastern Bering Sea. Limnology and Oceanography 64, 2632–2645. URL https:
https://aslopubs.onlinelibrary.wiley.com/doi/pdf/10.1002/lno.11238.
Thorson, J.T., Anderson, S.C., Goddard, P. and Rooper, C.N. (2024) tinyVAST: R package with an expressive interface to specify lagged and simultaneous effects in multivariate spatio-temporal models.
Thorson, J.T., Barnes, C.L., Friedman, S.T., Morano, J.L. and Siple, M.C. (2023) Spatially varying coefficients can improve parsimony and descriptive power for species distribution models. Ecogra-
Figures
[Figure 1 image omitted. Colour scale: scaled covariate. Bottom-row label: IID standard normal distributions.]
Figure 1: Applying the diffusion operator D when interpolating covariate x with interpolation
matrix A to vertices of the finite-element mesh largely conserves the mass across different values
of 𝜅 (i.e., ∑Ax ≈ ∑ADx). In the top row, diffusion is visualised for a single central point; in
the bottom row, diffusion is applied to a vector of draws from IID standard normal distributions
to visualise diffusion on the full covariate field. Columns correspond to different 𝜅 values, from
large 𝜅 (low diffusion) to small 𝜅 (high diffusion). Note that the covariate values are scaled within
each 𝜅 scenario to visualise the diffusion; the number in the top left of each panel is the total
mass of the covariate.
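
The mass-conservation property in Figure 1 can be illustrated with a minimal sketch. It does not use the paper's finite-element operator. Instead it assumes a one-dimensional chain of equally spaced vertices and a hypothetical operator D = (I + 𝜅⁻²L)⁻¹, where L is the graph Laplacian of the chain. The function name `diffuse_1d` and the example values are illustrative only. Because every column of I + 𝜅⁻²L sums to one, the diffused covariate has the same total as the raw covariate, and a spatially constant covariate is left unchanged.

```python
def diffuse_1d(x, kappa):
    """Solve (I + kappa**-2 * L) y = x on a chain of vertices.

    L is the graph Laplacian of a path graph (an assumption made for this
    sketch, standing in for the finite-element matrices of a real mesh).
    The system is tridiagonal, so the Thomas algorithm solves it in O(n).
    """
    n = len(x)
    w = 1.0 / kappa ** 2
    # Diagonal: 1 + w * (number of neighbours); off-diagonals: -w.
    diag = [1.0 + w * ((i > 0) + (i < n - 1)) for i in range(n)]
    lower = [0.0] + [-w] * (n - 1)   # sub-diagonal, lower[0] unused
    upper = [-w] * (n - 1) + [0.0]   # super-diagonal, upper[-1] unused

    # Forward sweep.
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = upper[0] / diag[0]
    d_prime[0] = x[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c_prime[i - 1]
        c_prime[i] = upper[i] / denom
        d_prime[i] = (x[i] - lower[i] * d_prime[i - 1]) / denom

    # Back substitution.
    y = [0.0] * n
    y[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        y[i] = d_prime[i] - c_prime[i] * y[i + 1]
    return y


# A unit point mass at the central vertex, as in the top row of Figure 1.
point = [0.0] * 10 + [1.0] + [0.0] * 10
low_diffusion = diffuse_1d(point, kappa=10.0)   # large kappa
high_diffusion = diffuse_1d(point, kappa=0.3)   # small kappa

print(sum(low_diffusion), sum(high_diffusion))  # both totals stay close to 1
print(low_diffusion[10], high_diffusion[10])    # the peak flattens as kappa shrinks
```

On a real mesh the interpolation matrix A enters as well, which is why the caption describes the mass as largely, rather than exactly, conserved.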
[Figure 2 image omitted. Y-axis: estimated depth coefficient.]
Figure 2: The diffusion model can recover diffused covariate effects and collapses to the null model
in the absence of diffusion. The simulation tests the ability to recover the true depth
coefficient under diffusion and null operating models (left and right panels, respectively), using
diffusion and null estimation models (x-axis), for three levels of lognormal Poisson observation
error 𝜎𝜂 (color). Each point represents a fit to one simulated data set; black points and vertical
lines show the median and the 50% and 95% quantile ranges. Horizontal lines show the true value.
Numbers above the vertical bars give the proportion of simulated data sets (n=200) assigned to each
estimation model (per value of 𝜎𝜂) based on marginal AIC.
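
The proportions printed above the bars in Figure 2 could be computed along the following lines. This is a sketch under stated assumptions: marginal AIC is taken as 2k − 2 log L with k the number of fixed effects, the diffusion model is assumed to carry one more fixed effect (𝜅) than the null model, and the negative log-likelihood values are invented for illustration rather than taken from the simulation.

```python
def marginal_aic(neg_log_lik, n_fixed):
    """Marginal AIC from a negative marginal log-likelihood and the fixed-effect count."""
    return 2.0 * neg_log_lik + 2.0 * n_fixed


def proportion_selecting_diffusion(nll_null, nll_diffusion, k_null, k_diffusion):
    """Share of replicates in which the diffusion model has the lower marginal AIC."""
    wins = sum(
        marginal_aic(nll_d, k_diffusion) < marginal_aic(nll_0, k_null)
        for nll_0, nll_d in zip(nll_null, nll_diffusion)
    )
    return wins / len(nll_null)


# Illustrative values for four replicates (not results from the paper).
nll_null = [100.0, 100.0, 100.0, 100.0]
nll_diffusion = [95.0, 99.5, 98.0, 100.0]

share = proportion_selecting_diffusion(nll_null, nll_diffusion, k_null=3, k_diffusion=4)
print(share)
```

The extra parameter means the diffusion model must lower the negative log-likelihood by more than one unit before AIC prefers it.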
[Figure 3 image omitted. Y-axis: estimated depth coefficient.]
Figure 3: As diffusion declines (𝜅 increases, from the left to the right column), the difference
between the depth coefficients estimated by the diffusion and null models decreases. Each
point represents a fit to one simulated data set; black points and vertical lines show the
median and the 50% and 95% quantile ranges. Horizontal lines show the true value. Numbers
above the vertical bars give the proportion of simulated data sets (n=200) assigned to each
estimation model based on AIC.
[Figure 4 image omitted. X-axis: ΔAIC.]
Figure 4: Marginal AIC favours covariate-diffusion in more than half of fishes and two bird species.
The points depict the difference in marginal AIC (ΔAIC) between the null and diffusion models, where
positive values indicate support for the diffusion model and negative values indicate support for
the null model. Point colours correspond to the correlation between the raw and the diffused
covariate. Points in the grey rectangle have ΔAIC > 2, indicating strong support for the diffusion
model. Points within the two vertical dashed lines have inconclusive ΔAIC results. Letters in
brackets in the Eastern Bering Sea fish case study refer to the life stage (j = juvenile, a = adult,
ej = early juvenile), except for P. camtschaticus, where Bb stands for Bristol Bay. Note that the
x-axis is fourth-root power transformed.
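
A short sketch of the quantities plotted in Figure 4 follows. The sign convention (null minus diffusion) comes from the caption. Placing the dashed lines at ±2 and applying the fourth-root transformation to the absolute value while keeping the sign are assumptions made here for illustration, as are the function names.

```python
import math


def delta_aic(aic_null, aic_diffusion):
    """Positive values indicate support for the diffusion model."""
    return aic_null - aic_diffusion


def classify(delta, threshold=2.0):
    """Label a comparison, assuming a symmetric threshold of 2 AIC units."""
    if delta > threshold:
        return "diffusion"
    if delta < -threshold:
        return "null"
    return "inconclusive"


def signed_fourth_root(delta):
    """Fourth-root transform of |delta| that keeps the sign of delta."""
    return math.copysign(abs(delta) ** 0.25, delta)


# Illustrative AIC values (not results from the paper).
d = delta_aic(aic_null=1016.0, aic_diffusion=1000.0)
print(d, classify(d), signed_fourth_root(d))
print(classify(0.5), classify(-7.0))
```

A transformation of this kind compresses very large ΔAIC values while keeping the region near zero readable.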
[Figure 5 image omitted. Map axes: longitude (125°W to 105°W) and latitude (30°N to 50°N). Map labels: Canada, Mexico, Pacific Ocean. Panel annotations: log(κ)=7.82 and log(κ)=-8.07.]
Figure 5: Human population density covariate, a diffused version of the covariate, and predicted
counts from the breeding bird case study. Panel (a) depicts the raw human population density
covariate. Panel (b) depicts the diffused covariate for two species with contrasting support for
diffusion. Common starling (Sturnus vulgaris) in the top row does not show support for the diffused
covariate and the diffusion is estimated to be small whereas black-headed grosbeak (Pheucticus
melanocephalus) in the bottom row shows strong support for the diffused covariate and has a
relatively strong estimated diffusion. The strength of the diffusion (log(𝜅)) is shown towards
the bottom of the (b) panels, where a low value indicates strong diffusion. Panel (c) depicts the
predicted log counts for the two species.
[Figure 6 image omitted. Panel (a): Depth. Map axis: latitude (54°N to 66°N). Map labels: Bering Sea, USA. Panel label: M. villosus.]
Figure 6: Bottom depth covariate, a diffused version of the covariate, and predicted counts from
the northeastern Bering Sea bottom trawl data. Panel (b) depicts the diffused covariate for two
species with contrasting support for diffusion. Adult Alaska pollock (Gadus chalcogrammus) in
the top row does not support the diffusion model and the diffused covariate is similar to the raw
covariate, while capelin (Mallotus villosus) in the bottom row shows strong support for the diffusion
model. The strength of the diffusion (log(𝜅)) is shown in the bottom-right corner of the (b) panels;
a low value indicates strong diffusion. Panel (c) depicts the predicted log counts for the two species
(values < 1% of the maximum density are omitted for visualization).