Abstract
We prove asymptotically efficient inference results for an Ornstein–Uhlenbeck regression model driven by a non-Gaussian stable Lévy process, where the output process is observed at high frequency over a fixed period. We present the local asymptotics of non-ergodic type for the likelihood function, followed by a way to construct an asymptotically efficient estimator from a suboptimal, yet very simple, preliminary estimator.
1 Introduction
1.1 Objective and background
Given an underlying filtered probability space \((\Omega ,\mathcal {F},(\mathcal {F}_t)_{t\in [0,T]},P)\), we consider the following Ornstein–Uhlenbeck (OU) regression model
$$\begin{aligned} Y_{t}=Y_{0}+\int _0^t \left( \mu \cdot X_{s}-\lambda Y_{s}\right) ds+\sigma J_{t},\qquad t\in [0,T], \end{aligned}$$(1.1)
where J is the symmetric \(\beta \)-stable (càdlàg) Lévy process characterized by
$$\begin{aligned} E_\theta \left[ e^{iuJ_t}\right] =e^{-t|u|^{\beta }},\qquad u\in \mathbb {R},\ t\ge 0, \end{aligned}$$
and is independent of the initial variable \(Y_0\), and where \(X=(X_{t})_{t\in [0,T]}\) is an \(\mathbb {R}^{q}\)-valued non-random càdlàg function such that
$$\begin{aligned} \lambda _{\min }\left( \int _0^T X_t X_t^{\top } dt\right) >0, \end{aligned}$$(1.2)
with \(\lambda _{\min }(A)\) denoting the minimum eigenvalue of a square matrix A. Throughout, the terminal sampling time \(T>0\) is a fixed constant. Let
$$\begin{aligned} \theta :=(\lambda ,\mu ,\beta ,\sigma )\in \Theta , \end{aligned}$$
where \(\Theta \subset \mathbb {R}^{p}\) (\(p:=q+3\)) is a bounded convex domain such that its closure \(\overline{\Theta }\subset \mathbb {R}\times \mathbb {R}^q \times (0,2)\times (0,\infty )\). The primary objective of this paper is the asymptotically efficient estimation of \(\theta \) when available data is \((X_t)_{t\in [0, T]}\) and \((Y_{t_{j}})_{j=1}^n\), where \(t_j=t_j^n:=jh\) with \(h=h_n:=T/n\); later on, we will consider cases where we observe \((X_{t_{j}})_{j=1}^n\) instead of the full continuous-time record. We will denote the true value of \(\theta \) by \(\theta _0=(\lambda _0,\mu _0,\beta _0,\sigma _0)\in \Theta \).
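To fix ideas, the following minimal Python sketch generates a synthetic high-frequency sample \(\{(X_{t_j},Y_{t_j})\}_{j=0}^{n}\) from the model (1.1), using the exact autoregressive transition described in Sect. 2.1 below; the covariate path, the parameter values, the reading \(\eta (x)=(1-e^{-x})/x\) of the scale factor, and scipy's parameterization of the symmetric stable law are illustrative assumptions rather than prescriptions of the paper.

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(1)

# Illustrative values of theta_0 = (lambda, mu, beta, sigma) with q = 2 covariates.
T, n = 1.0, 5000
h = T / n
lam, mu, beta, sigma = 1.0, np.array([1.0, 0.5]), 1.5, 0.3

t = np.linspace(0.0, T, n + 1)
# Non-random cadlag covariate path X_t (an illustrative choice, not from the paper).
X = np.column_stack([np.ones_like(t), np.cos(2.0 * np.pi * t)])

def eta(x):
    # eta(x) = (1 - exp(-x)) / x, continuously extended at x = 0 (our reading of the scale factor).
    return 1.0 - 0.5 * x if abs(x) < 1e-8 else -np.expm1(-x) / x

# Exact autoregressive transition (cf. Sect. 2.1):
#   Y_{t_j} = e^{-lam h} Y_{t_{j-1}} + int_j e^{-lam (t_j - s)} mu . X_s ds
#             + sigma h^{1/beta} eta(lam beta h)^{1/beta} eps_j,   eps_j iid ~ phi_beta.
scale = sigma * h ** (1.0 / beta) * eta(lam * beta * h) ** (1.0 / beta)
# alpha = stability index, beta = skewness (0 gives the symmetric stable law).
eps = levy_stable.rvs(alpha=beta, beta=0.0, size=n, random_state=rng)

Y = np.empty(n + 1)
Y[0] = 0.0
for j in range(1, n + 1):
    drift = h * float(X[j - 1] @ mu)   # left-endpoint approximation of the dt-integral
    Y[j] = np.exp(-lam * h) * Y[j - 1] + drift + scale * eps[j - 1]

# Available data: (X_{t_j})_j (or the whole path of X) and (Y_{t_j})_j.
```

The simulated pair (X, Y) can then be fed into the estimation sketches given in Sect. 3.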
Analysis of the (time-homogeneous) OU process driven by a stable Lévy process goes back to Doob (1942), where Doob treated the model in a genuinely analytic manner without Itô’s formula, which had not yet been published at that time. Nowadays, OU models are used in a wide variety of applications, such as electricity consumption modeling (Perninge et al. 2011; Borovkova and Schmeck 2017; Verdejo et al. 2019), ecology (Jhwueng and Maroulas 2014), and protein dynamics modeling (Challis and Schmidler 2012), to mention a few.
The model (1.1) may be seen as a continuous-time counterpart of the simple first-order ARX (autoregressive exogenous) model. Nevertheless, a proper form of the efficient-estimation result has been missing from the literature, probably due to the lack of background theory for estimating all the parameters involved under the bounded-domain infill asymptotics. Let us note that, when J is a Wiener process (\(\beta =2\)), the drift parameters are consistently estimable only when the terminal sampling time tends to infinity, and the associated statistical experiments are known to possess essentially different properties according to the sign of \(\lambda \). That is to say, the model is: locally asymptotically normal for \(\lambda >0\) (ergodic case); locally asymptotically Brownian functional for \(\lambda =0\) (unit-root case); locally asymptotically mixed normal (LAMN) for \(\lambda <0\) (non-ergodic (explosive) case). Turning back to the stable-driven case, we should note that the least-squares type estimator would not work unless the terminal sampling time \(T_n\rightarrow \infty \), as is expected from Hu and Long (2009) and Zhang and Zhang (2013); there, the authors proved that (when \(\beta \) is known) the rate of convergence for \(\lambda >0\) equals \((T_n/\log n)^{1/\beta }\) and the asymptotic distribution is given by a ratio of two independent stable distributions.
1.2 Contributions in brief
First, in Sect. 2, we will show that the model is locally asymptotically mixed normal (LAMN) at \(\theta _0\in \Theta \), and also that the likelihood equation has a root that is asymptotically efficient in the classical sense of Hájek-Le Cam-Jeganathan. The asymptotic results presented here are valid uniformly over any compact subset of the parameter space \(\Theta \). In particular, the sign of the autoregressive parameter \(\lambda _0\) does not matter, revealing that (i) the results can be described in a unified manner regardless of whether the model is ergodic or not, and also that (ii) the conventional unit-root problem (see Samarakoon and Knight (2009) and the references therein) is not relevant here at all; this is in sharp contrast to the case of ARX time-series models and the Gaussian OU models. Besides, in Sect. 3, we will provide a way to construct an asymptotically efficient estimator through a suboptimal, yet very simple preliminary estimator, which enables us to bypass not only the computationally demanding numerical optimization of the likelihood function involving the \(\beta \)-stable density, but also the possible multiple-root problem (Lehmann 1999, Section 7.3).
2 Local likelihood asymptotics
2.1 Preliminaries and result
Let \(P_\theta \) denote the image measure of (J, Y) associated with the value \(\theta \in \Theta \). We show the non-trivial stochastic expansion of the log-likelihood ratio of \(P_{\theta +\varphi _{n}(\theta )v_n,n}^Y\) with respect to \(P_{\theta ,n}^Y\) for an appropriate norming matrix \(\varphi _n(\theta )\) introduced later and a bounded sequence \((v_n)\subset \mathbb {R}^p\), where \(P_{\theta ,n}^Y\) stands for the restriction of \(P_\theta \) to \(\sigma (Y_{t_j}:\,j\le n)\). The distribution \(\mathcal {L}(Y_0)\) may vary according to \(\theta \); we will assume that for any \(\epsilon >0,\) there exists an \(M>0\) such that \(\sup _{\theta \in \overline{\Theta }} P_\theta [|Y_0|\ge M]<\epsilon \).
Let \(\phi _\beta \) denote the \(\beta \)-stable density of \(J_1\): \(P_\theta [J_1\in dy]=\phi _\beta (y)dy\). It is known that \(\phi _{\beta }(y)>0\) for each \(y\in \mathbb {R}\), that \(\phi _{\beta }\) is smooth in \((y,\beta )\in \mathbb {R}\times (0,2)\), and that for each \(k,l\in \mathbb {Z}_{+}\),
See DuMouchel (1973) for details. Here, we write \(\partial ^{k}\partial _{\beta }^{l}\phi _{\beta }(y):=(\partial ^{k}/\partial y^k)(\partial ^l/\partial \beta ^{l}) \phi _{\beta }(y)\); analogous notation for the partial derivatives will be used in the sequel.
To proceed, we need to introduce further notation. Any asymptotics will be taken as \(n\rightarrow \infty \) unless otherwise mentioned. We denote by \(\rightarrow _{u}\) the uniform convergence of non-random quantities with respect to \(\theta \) over \(\overline{\Theta }\). We write C for a positive universal constant which may vary at each appearance, and \(a_{n}\lesssim b_{n}\) when \(a_{n}\le C b_{n}\) for every n large enough. Given positive functions \(a_{n}(\theta )\) and \(b_{n}(\theta )\), we write \(b_n(\theta )=o_u(a_n)\) and \(b_n(\theta )=O_u(a_n)\) if \(a_n^{-1}b_n(\theta )\rightarrow _u 0\) and \(\sup _\theta |a_n^{-1}b_n(\theta )| =O(1)\), respectively. The symbol \(a_{n}(\theta ) \lesssim _{u} b_{n}(\theta )\) means that \(\sup _{\theta }|a_{n}(\theta )/b_{n}(\theta )| \lesssim 1\). We write \(\int _j\) instead of \(\int _{t_{j-1}}^{t_j}\).
By integration by parts applied to the process \(t\mapsto e^{\lambda t}Y_t\), we obtain the explicit càdlàg solution process: under \(P_\theta \),
For \(x,\lambda \in \mathbb {R}\), we write
The basic property of the Lévy integral and the fact that \(\log E_\theta [e^{iu J_1}]=-|u|^\beta \) give
Hence,
Now, the exact log-likelihood function \(\ell _n(\theta )=\ell _n\left( \theta ;\,(X_t)_{t\in [0,T]},(Y_{t_j})_{j=0}^n\right) \) is given by
We introduce the non-random \(p\times p\)-matrix
where the real entries \(\varphi _{kl,n}=\varphi _{kl,n}(\theta )\) are assumed to be continuously differentiable in \(\theta \in \Theta \) and to satisfy the following conditions for some finite values \(\overline{\varphi }_{kl}=\overline{\varphi }_{kl}(\theta )\):
The matrix \(\varphi _n(\theta )\) will turn out to be the right norming with which \(u \mapsto \ell _{n}\left( \theta +\varphi _{n}(\theta )u\right) - \ell _{n}\left( \theta \right) \) under \(P_\theta \) has an asymptotically quadratic structure in \(\mathbb {R}^{p}\); see Brouste and Masuda (2018) and Clément and Gloter (2020) for the related previous studies. Note that \(\sqrt{n}h_n^{1-1/\beta }\rightarrow _{u}\infty \) and \(|\varphi _{21,n}(\theta )|\vee |\varphi _{22,n}(\theta )| \lesssim \log (1/h)\). By the same reasoning as in Brouste and Masuda (2018, page 292), we have \(\inf _{\theta }|\varphi _{11,n}(\theta )\varphi _{22,n}(\theta ) - \varphi _{12,n}(\theta )\varphi _{21,n}(\theta )| \gtrsim 1\) and \(|\varphi _{n}(\theta )| \rightarrow _{u} 0\) under (2.6).
Let
and define the block-diagonal random matrix
where, for a random variable \(\epsilon {\mathop {\sim }\limits ^{P_{\theta }}} \phi _\beta (y)dy\) and by denoting by \(A^\top \) the transpose of matrix A,
Note that \(\mathcal {I}(\theta )\) does depend on the choice of \(\overline{\varphi }(\theta )=\{\overline{\varphi }_{kl}(\theta )\}\); if \(\overline{\varphi }(\theta )\) is free from \((\lambda ,\mu )\), then so is \(\mathcal {I}(\theta )\).
Also, we note that \(\mathcal {I}(\theta ) >0\) (\(P_\theta \)-a.s., \(\theta \in \Theta \)) under (1.2). Indeed, it was verified in Brouste and Masuda (2018, Theorem 1) that \(\mathcal {I}_{\beta ,\sigma }(\theta )>0\) a.s. To deduce that \(\mathcal {I}_{\lambda ,\mu }(\theta )>0\) a.s., we note that \(\int _0^T Y^2_t dt>0\) a.s. and that, by Schwarz’s inequality,
for every nonzero \(u\in \mathbb {R}^q\), since for any constant real \(\xi \) we have \(Y\ne (u\cdot X)\xi \) a.s. as functions on [0, T]. Apply the identity \(\det \begin{pmatrix} A &{} B^\top \\ B &{} C\end{pmatrix}=\det (A)\det (C-BA^{-1}B^\top )\) to conclude the \(P_\theta \)-a.s. positive definiteness of \(\mathcal {I}(\theta )\).
The normalized score function \(\Delta _n(\theta _0)\) and the normalized observed information matrix \(\mathcal {I}_n(\theta _0)\) are given by
respectively. Let \(MN_{p,\theta }(0,\mathcal {I}(\theta )^{-1})\) denote the covariance mixture of p-dimensional normal distribution, corresponding to the characteristic function \(u\mapsto E_\theta \big [\exp (-u^\top \mathcal {I}(\theta )^{-1}u/2)\big ]\). Finally, we write \(M[u]=\sum _i M_i u_i\) for a linear form \(M=\{M_i\}\) and similarly \(Q[u,u]=Q[u^{\otimes 2}]=\sum _{i,j}Q_{ij}u_i u_j\) for a quadratic form \(Q=\{Q_{ij}\}\). Now, we are ready to state the main claim of this section.
Theorem 2.1
The following statements hold for any \(\theta \in \Theta \).
-
(1)
For any bounded sequence \((v_n)\subset \mathbb {R}^p\), it holds that
$$\begin{aligned} \ell _{n}\left( \theta +\varphi _{n}(\theta )v_n\right) - \ell _{n}\left( \theta \right) = \Delta _n(\theta )[v_n] - \frac{1}{2} \mathcal {I}_n(\theta ) [v_n,v_n] + o_{P_{\theta }}(1), \nonumber \end{aligned}$$where we have the convergence in distribution under \(P_\theta \): \(\mathcal {L}\left( \Delta _{n}(\theta ), \, \mathcal {I}_n(\theta ) |P_\theta \right) \Rightarrow \mathcal {L}\left( \mathcal {I}(\theta )^{1/2}Z,\, \mathcal {I}(\theta ) \right) \), where \(Z\sim N_{p}(0,I)\) is independent of \(\mathcal {I}(\theta )\), defined on an extended probability space.
-
(2)
There exists a local maximum point \(\hat{\theta }_{n}\) of \(\ell _{n}(\theta )\) with \(P_\theta \)-probability tending to 1 for which
$$\begin{aligned} \varphi _{n}(\theta )^{-1}(\hat{\theta }_{n}-\theta ) = \mathcal {I}_{n}(\theta )^{-1}\Delta _n(\theta ) + o_{P_\theta }(1) \Rightarrow MN_{p,\theta }\left( 0,\, \mathcal {I}(\theta )^{-1} \right) . \nonumber \end{aligned}$$
It is worth mentioning that the particular non-diagonal form of \(\varphi _n(\theta )\) is, as in Brouste and Masuda (2018), inevitable for deducing the asymptotically non-degenerate joint distribution of the maximum-likelihood estimator (MLE), that is, the good local maximum point \(\hat{\theta }_{n}\) in Theorem 2.1(2).
Remark 2.2
Here are some comments on the model timescale.
-
(1)
We fix the terminal sampling time T, so that the rate of convergence for \((\lambda ,\mu )\) is \(\sqrt{n}h^{1-1/\beta }=n^{1/\beta -1/2}T^{1-1/\beta }=O(n^{1/\beta -1/2})\). If \(\beta >1\) (resp. \(\beta <1\)), then a longer period would lead to a better (resp. worse) performance in estimating \((\lambda ,\mu )\). The Cauchy case \(\beta =1\), where the two rates of convergence coincide, is exceptional.
-
(2)
We can explicitly associate a change of the terminal sampling time T with changes of the components of \(\theta \). Specifically, changing the model timescale from t to tT in (1.1), we see that the process
$$\begin{aligned} Y^T=(Y^T_t)_{t\in [0,1]}:=(Y_{tT})_{t\in [0,1]} \nonumber \end{aligned}$$satisfies exactly the same integral equation as in (1.1), except that \(\theta = (\lambda ,\mu ,\beta ,\sigma )\) is replaced by
$$\begin{aligned} \theta _T = \big ( \lambda _T,\mu _T,\beta _T,\sigma _T):= ( T\lambda ,T\mu ,\beta ,T^{1/\beta }\sigma \big ) \nonumber \end{aligned}$$(\(\beta \) is unchanged), \(X_t\) by \(X^T_t:=X_{tT}\), and \(J_t\) by \(J^T_t := T^{-1/\beta }J_{tT}\):
$$\begin{aligned} Y^T_t = Y^T_0 +\int _0^t (\mu _T \cdot X^T_s-\lambda _T Y^T_s)ds + \sigma _T J^T_t, \qquad t\in [0,1]. \nonumber \end{aligned}$$Note that \((J^T_t)_{t\in [0,1]}\) defines the standard \(\beta \)-stable Lévy process. This indeed shows that we may set \(T\equiv 1\) in the virtual (model) world without loss of generality. This is impossible for diffusion-type models, where we cannot consistently estimate the drift coefficient unless we let the terminal sampling time T tend to infinity.
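For the record, the substitution behind this rescaling is a one-line computation: with the change of variables \(s=uT\) in the integral form of (1.1) and with \(J^T_t:=T^{-1/\beta }J_{tT}\) as above (again a standard \(\beta \)-stable Lévy process, by self-similarity),
$$\begin{aligned} Y^T_t = Y_{tT}&= Y_0 + \int _0^{tT}(\mu \cdot X_s-\lambda Y_s)ds + \sigma J_{tT} \nonumber \\&= Y^T_0 + \int _0^{t}\big ( T\mu \cdot X_{uT}-T\lambda Y_{uT}\big )du + T^{1/\beta }\sigma \big ( T^{-1/\beta }J_{tT}\big ) \nonumber \\&= Y^T_0 + \int _0^{t}\big ( \mu _T\cdot X^T_u-\lambda _T Y^T_u\big )du + \sigma _T J^T_t, \qquad t\in [0,1]. \nonumber \end{aligned}$$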
Remark 2.3
The present framework allows us to conduct unit-period-wise (for example, day-by-day) inference for both trend and scale structures, providing a sequence of period-wise estimates with theoretically valid approximate confidence sets. This, though informally, suggests an aspect of change-point analysis in high-frequency data: if we have a high-frequency sample over \([k-1,k]\) for each \(k=1,\dots ,[T]\), then we can construct a sequence of estimators \(\{\hat{\theta }_{n}(k)\}_{k=1}^{[T]}\); it would then be possible in some way to reject the constancy of \(\theta \) over [0, [T]] if \(k\mapsto \hat{\theta }_{n}(k)\) (\(k=1,\dots ,[T]\)) is not likely to stay unchanged.
Remark 2.4
It is formally straightforward to extend the model (1.1) to the following form:
Under mild regularity conditions on the functions (a, b, c) as well as on the non-random process X, a solution process Y is explicitly given by (see Cheridito et al. 2003, Appendix)
where \(\psi (s,t;\lambda ) := \int _s^t a(X_v,\lambda ) dv\). However, the corresponding likelihood asymptotics becomes much messier. It is worth mentioning that the optimal rate matrix can be diagonal if, for example, \(\partial _t\partial _\theta \log c(t,\sigma )\not \equiv 0\) with \(X_t=t\): for details, see the previous study Clément and Gloter (2020) that treated the general time-homogeneous Markovian case.
2.2 Proof of Theorem 2.1
In this proof, we make use of the general result of Sweeting (1980) about exact-likelihood asymptotics, in a way more or less analogous to that of Brouste and Masuda (2018, Theorem 1): exploiting the uniform nature of the exact-likelihood asymptotics, we will deduce the joint convergence in distribution of the normalized score \(\Delta _n(\theta _0)\) and the normalized observed information \(\mathcal {I}_n(\theta _0)\) from the uniform convergence in probability of \(\mathcal {I}_n(\cdot )\) in an appropriate sense. Consequently, we will not need to derive the stable convergence in law of \(\Delta _n(\theta _0)\), which is often crucial when dealing with high-frequency sampling of a process with dependent increments.
We have \(\sup _{t\in [0,T]}|X_t|<\infty ,\) since \(X: [0,T]\rightarrow \mathbb {R}^q\) is assumed to be càdlàg. Through the localization procedure, we may and do suppose that the driving stable Lévy process does not have jumps of size greater than some fixed threshold (see Masuda 2019, Section 6.1 for a concise account). In that case, the Lévy measure of J is compactly supported; hence in particular,
for any \(K>0\). Further, since the Lévy measure of J is symmetric, the removal of large-size jumps does not change the parametric form of the drift coefficient. We also localize the initial variable \(Y_0\) so that \(|Y_0|\) is essentially bounded uniformly in \(\theta \). It follows from (2.2) and (2.10) that \(\sup _{\theta \in \overline{\Theta }}\sup _{0\le t\le T}E_\theta \left[ |Y_t|^K\right] <\infty \) as well.
To proceed, we introduce some further notation. Given continuous random functions \(\xi _{0}(\theta )\) and \(\xi _{n}(\theta )\), \(n\ge 1\), we write \(\xi _{n}(\theta )\xrightarrow {p}_u \xi _{0}(\theta )\) if the joint distribution of \(\xi _n\) and \(\xi _0\) is well defined under \(P_\theta \) and if \(P_{\theta }[ |\xi _{n}(\theta )-\xi _{0}(\theta )|>\epsilon ] \rightarrow _{u} 0\) for every \(\epsilon >0\) as \(n\rightarrow \infty \). Additionally, for a sequence \(a_n>0,\) we write \(\xi _n(\theta )=o_{u,p}(a_n)\) if \(a_n^{-1}\xi _{n}(\theta )\xrightarrow {p}_u 0\), and also \(\xi _n(\theta )=O_{u,p}(a_n)\) if for every \(\epsilon >0\) there exists a constant \(K>0\) for which \(\sup _\theta P_\theta [|a_n^{-1}\xi _n(\theta )| > K]<\epsilon \). Similarly, for any random functions \(\chi _{nj}(\theta )\) doubly indexed by n and \(j\le n\), we write \(\chi _{nj}(\theta )=O^*_p(a_n)\) if
for any \(K>0\). Finally, let
We will complete the proof of Theorem 2.1 by verifying the three statements corresponding to the conditions (12), (13), and (14) in Brouste and Masuda (2018), which here read
respectively, where (2.12) and (2.13) should hold for all \(c>0\) and where \(\partial _{\theta }^{2}\ell _{n}(\theta ^1,\dots ,\theta ^{p})\), \(\theta ^k\in \Theta \), denotes the \(p\times p\) Hessian matrix of \(\ell _n(\theta )\), whose (k, l)th element is given by \(\partial _{\theta _k}\partial _{\theta _l}\ell _{n}(\theta ^k)\), in which \(\theta =:(\theta _l)_{l=1}^{p}\). Having obtained (2.11), (2.12) and (2.13), Sweeting (1980, Theorems 1 and 2) immediately concludes Theorem 2.1. We can verify (2.12) exactly as in Brouste and Masuda (2018), so we will look at (2.11) and (2.13).
Proof of (2.11). Recall the expression (2.4). To look at the entries of \(\partial _\theta ^2\ell _n(\theta )\), we introduce several shorthands for notational convenience; they may look somewhat bold, but should not cause confusion. We omit the subscript \(\beta \) and the argument \(\epsilon _j\) from the aforementioned notation, writing \(\phi :=\phi _\beta (\epsilon _j)\), \(g:=g_\beta (\epsilon _j)\), and so on. For brevity, we also write
so that (2.4) becomes
Further, partial differentiation with respect to a variable will be denoted by a parenthesized subscript, such as \(\epsilon _{(a)}:=\partial _a\epsilon _j(\theta )\) and \(\epsilon _{(a,b)}:=\partial _a\partial _b\epsilon _j(\theta )\). Then, direct computations give the first-order partial derivatives:
followed by the second-order ones:
It is straightforward to see which term is the leading one in each expression above. We do not list all the details here, but for later reference mention a few of the points:
-
\(\partial ^k\log \eta (y) = O_u(1)\) for \(|y|\rightarrow 0\) whatever \(k\in \mathbb {Z}_{+}\) is;
-
\((\log c)_{(\lambda ,\dots ,\lambda )}=O_u(h^k)\) (k-times, \(k\in \mathbb {Z}_{+}\)), \((\log c)_{(\lambda ,\beta )}=O_u(h^2)\), \((\log c)_{(\beta )}=O_u(h^2)\), \((\log c)_{(\beta ,\beta )}=O_u(h^4)\), and so forth;
-
\(\max _{j\le n}|\partial _\lambda ^k\zeta _j(\lambda )|=O(h^k)\) for \(k\in \mathbb {Z}_{+}\);
-
recalling the definition (2.3) and because of the consequence (2.10) of the localization, concerning the partial derivatives of \(\epsilon _j(\theta )\) we obtain the asymptotic representations: \(\epsilon _{(\mu ,\sigma )} = (1+o_{u}(1)) \sigma ^{-2}h^{1-1/\beta }\), \(\epsilon _{(\mu ,\lambda )} = (1+o_{u}(1)) \sigma ^{-1}h^{2-1/\beta }/2\), \(\epsilon _{(\sigma ,\lambda )} = (1+o_{u}(1)) \{-\sigma ^{-2}h^{1-1/\beta }Y_{t_{j-1}} + O^*_p(h \vee h^{2-1/\beta })\}\), \(\epsilon _{(\lambda ,\lambda )} = O^*_p(h^{2-1/\beta })\), \(\epsilon _{(\beta ,\beta )} = O_p^*(h^2 (l')^2) + \epsilon \, O^*_p((l')^2)\), \(\epsilon _{(\lambda ,\beta )} = O^*_p(l' h^{1-1/\beta })\), and so on; the terms “\(o_u(1)\)” therein are all valid uniformly in \(j\le n\).
Now, we write
with \(\mathcal {I}_{11,n}(\theta ) \in \mathbb {R}^{1+q}\otimes \mathbb {R}^{1+q}\), \(\mathcal {I}_{22,n}(\theta ) \in \mathbb {R}^{2}\otimes \mathbb {R}^{2}\) and \(\mathcal {I}_{12,n}(\theta ) \in \mathbb {R}^{1+q}\otimes \mathbb {R}^{2}\). We can deduce \(\mathcal {I}_{22,n}(\theta ) \xrightarrow {p}_u \mathcal {I}_{\beta ,\sigma }(\theta )\) in exactly the same way as in the proof of Eq.(12) in Brouste and Masuda (2018). Below, we will show \(\mathcal {I}_{11,n}(\theta ) \xrightarrow {p}_u \mathcal {I}_{\lambda ,\mu }(\theta )\) and \(\mathcal {I}_{12,n}(\theta ) \xrightarrow {p}_u 0\).
The Burkholder inequality ensures that
for any continuous \(\pi (x,y;\theta )\) and for any \(U(\epsilon _j(\theta ))\) such that \(E_\theta [U(\epsilon _j(\theta ))]=0\) (\(\theta \in \Theta \)) and that the left-hand side of (2.14) is continuous over \(\theta \in \overline{\Theta }\). Also, note that the right continuity of \(t\mapsto X_{t}\) implies that (\(X_t^{\otimes 1}:=X_t\))
These basic facts will be repeatedly used below without mentioning them.
For convenience, we will write
and denote by \(\varvec{1}_{u,p}\) any random array \(\xi _{nj}(\theta )\) such that \(\max _{j\le n}|\xi _{nj}(\theta ) - 1| \xrightarrow {p}_u 0\). Direct computations give the following expressions for the components of \(\mathcal {I}_{11,n}(\theta )=- r_{n}^{-2}\partial _{(\lambda ,\mu )}^2\ell _{n}(\theta )\):
We can deduce that \(\mathcal {I}_{11,n}(\theta ) \xrightarrow {p}_u \mathcal {I}_{\lambda ,\mu }(\theta )\) as follows.
-
First, noting that \(\epsilon _j=\epsilon _j(\theta ) {\mathop {\sim }\limits ^{P_{\theta }}} \text {i.i.d.}~\mathcal {L}(J_1)\), we make the compensation \(g^2= E_\theta [g^2]+(g^2 - E_\theta [g^2])\) in the summands in the rightmost sides of the last three displays and then pick up the leading part involving \(E_\theta [g^2]\); the other one becomes negligible by the Burkholder inequality.
-
Then, the a.s. Riemann integrability of \(t\mapsto (X_{t}(\omega ),Y_{t}(\omega ))\) allows us to conclude that, for \(k,l\in \{0,1,2\}\) and under \(P_\theta \) for each \(\theta \),
$$\begin{aligned} D_{n}(k,l)&:= \left| \frac{1}{n} \sum _{j=1}^{n}Y_{t_{j-1}}^k\,X_{t_{j-1}}^{\otimes l} - \frac{1}{T} \int _0^T Y_{t}^k \, X_{t}^{\otimes l} dt \right| \nonumber \\&\lesssim \frac{1}{n} \sum _{j=1}^{n}\frac{1}{h} \int _j \Bigg (|Y_t- Y_{t_{j-1}}|(1+|Y_t|+|Y_{t_{j-1}}|)^C \nonumber \\&{}\qquad + |Y_t|^k \left| \left( \frac{1}{h}\int _j X_t dt + O(h)\right) ^{\otimes l} - X_t ^{\otimes l} \right| \Bigg )dt \nonumber \\&\lesssim \frac{1}{n} \sum _{j=1}^{n}\frac{1}{h} \int _j \Bigg (|Y_t- Y_{t_{j-1}}|(1+|Y_t|+|Y_{t_{j-1}}|)^C + |Y_t|^k o(1)\Bigg )dt \xrightarrow {p}0, \nonumber \end{aligned}$$where the order symbols in the estimates are valid uniformly in \(j\le n\). By (2.2), under the localization, we have \(\max _{j\le n}\sup _\theta E_\theta [|Y_t|^M]=O(1)\) and \(\max _{j\le n}\sup _\theta E_\theta [|Y_t- Y_{t_{j-1}}|^M]=o_u(1)\) for any \(M>0\), from which it follows that \(D_{n}(k,l) = o_{u,p}(1)\).
Specifically, for the case of \(-r_n^{-2}\partial _\mu ^2\ell _n(\theta )\), we have
with \(\mathcal {I}_{\lambda ,\mu ;22}(\theta )\) denoting the lower-right \(q\times q\) block of \(\mathcal {I}_{\lambda ,\mu }(\theta )\). The others can be handled analogously.
Next, we turn to looking at \(\mathcal {I}_{12,n}(\theta )=\{\mathcal {I}_{12,n}^{kl}(\theta )\}_{k,l}\):
We can deduce that \(\mathcal {I}_{12,n}(\theta ) \xrightarrow {p}_u 0\) just by inspecting the four components separately, in a similar way to how we handled \(\mathcal {I}_{11,n}(\theta )\). Let us only mention the lower-left \(q\times 1\) component: recalling the properties (2.1) and \(|\varphi _{22,n}|\lesssim _u l'\), we see that
Thus, the claim (2.11) follows.
Proof of (2.13). Note that
where \(|\overline{s}_{nj}(\theta ;c)| \lesssim o_{u}(1)(1+|Y_{t_{j-1}}|)\). Also, for each \(k,l,m\in \mathbb {Z}_{+}\), we have \(P_\theta \)-a.s. the (rough) estimate:
Then, as in the proof of Eq.(14) in Brouste and Masuda (2018), for each \(c>0,\) we can find a constant \(R=R(c)>0\) such that (still rough, but sufficient)
where \(\overline{B}(\beta ;R/l_n^{\prime })\) denotes the closed ball with center \(\beta \) and radius \(R/l'\). This shows (2.13). The proof of Theorem 2.1 is complete.
3 Asymptotically efficient estimator
From now on, we fix a true value \(\theta _0\in \Theta \), and the stochastic symbols and convergences will be taken under \(P:=P_{\theta _0}\); accordingly, we write \(E:=E_{\theta _0}\). Having Theorem 2.1 in hand, we can proceed with the construction of an asymptotically efficient estimator. It is known that any asymptotically centering estimator \(\hat{\theta }_{n}^*\):
is regular; by Theorem 2.1, the right-hand side converges in distribution to \(MN_{p,\theta _0}\left( 0,\, \mathcal {I}(\theta _0)^{-1} \right) \). This, together with the convolution theorem, in turn gives the asymptotic minimax theorem: for any measurable (loss) function \(\mathfrak {L}:\,\mathbb {R}^p \rightarrow \mathbb {R}_{+}\) such that \(\mathfrak {L}(u)=\tau (|u|)\) for some non-decreasing \(\tau :\,\mathbb {R}_{+}\rightarrow \mathbb {R}_{+}\) with \(\tau (0)=0\), we have
Recalling that \(\mathcal {L}\left( \Delta _{n}(\theta ), \, \mathcal {I}_n(\theta ) |P_\theta \right) \Rightarrow \mathcal {L}\left( \mathcal {I}(\theta )^{1/2}Z,\, \mathcal {I}(\theta ) \right) \), where \(Z\sim N_{p}(0,I)\) (Theorem 2.1), and in view of the lower bound in (3.2), we may call any estimator \(\hat{\theta }_{n}^*\) satisfying (3.1) asymptotically efficient. Again by Theorem 2.1, the good local maximum point \(\hat{\theta }_{n}\) of \(\ell _n(\theta )\) is asymptotically efficient. We refer to Jeganathan (1982, Theorems 2 and 3, and Proposition 2) and also Jeganathan (1995, Theorem 8) for more information and details of the above arguments.
Theorem 2.1 is based on the classical Cramér-type argument. Its well-known shortcoming is its local character: the result just tells us the existence of an asymptotically well-behaving root of the likelihood equation, but does not tell us which local maximum is the right one when there are multiple local maxima, equivalently, multiple roots of the likelihood equation (Lehmann 1999, Section 7.3). Indeed, the log-likelihood function \(\ell _n\) of (2.4) is highly nonlinear and non-concave. In this section, we try to get rid of the locality by a Newton–Raphson type of improvement, which in our case will not only remedy the aforementioned inconvenience of the multiple-root problem, but also enable us to bypass the numerical optimization involving the stable density \(\phi _\beta \). In Brouste and Masuda (2018, Section 3), for the \(\beta \)-stable Lévy process (the special case of (1.1) with \(\lambda =0\) and \(X\equiv 1\)), we provided an initial estimator based on the sample median and the method of moments associated with logarithmic and/or lower-order fractional moments. However, it was essential in Brouste and Masuda (2018) that the model is a Lévy process, for which we could apply the median-adjusted central limit theorem for an i.i.d. sequence of random variables. In the present case, we need a different sort of argument.
In Theorem 2.1, the process \(X=(X_t)_{t\in [0,T]}\) was assumed to be observed continuously in [0, T]. In this section, we will instead deal with a discrete-time sample \((X_{t_j})_{j=0}^{n}\) under the additional condition:
We will explicitly construct an estimator \(\hat{\theta }_{n}^*\) which is asymptotically equivalent to the MLE \(\hat{\theta }_{n}\), by verifying the asymptotically centering property (3.1); for this much-thinned sample, we may and do keep calling such a \(\hat{\theta }_{n}^*\) asymptotically efficient.
3.1 Newton–Raphson procedure
To proceed with a discrete-time sample \(\{(X_{t_j},Y_{t_j})\}_{j=0}^{n}\), we introduce the approximate-likelihood function \(\mathbb {H}_n(\theta )\) by replacing \(\zeta _{j}(\lambda )\) by \(X_{t_{j-1}}\) in the definition (2.4) of the genuine log-likelihood function \(\ell _n(\theta )\) (recall the notation \(l':=\log (1/h)\)):
where
Of course, this approximation is not for free: to manage the resulting discretization error specified later on, we additionally impose that
Then we have at least \(\beta _0 > 2/3\), so that small values of \(\beta _0\) are excluded; this is the price we have to pay for dealing with a discrete-time sample from X in an efficient way. Accordingly, in the sequel, we will reset the parameter space of \(\beta \) to be a domain \(\Theta _\beta \) such that \(\overline{\Theta _\beta } \subset (2/3,2)\).
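Since (3.4) is a simple plug-in modification of (2.4), it is straightforward to code; here is a hedged Python sketch of the approximate log-likelihood. The residual form below, \(Y_{t_j}-e^{-\lambda h}Y_{t_{j-1}}-h\,\mu \cdot X_{t_{j-1}}\) scaled by \(\sigma h^{1/\beta }\eta (\lambda \beta h)^{1/\beta }\) with \(\eta (x)=(1-e^{-x})/x\), is our reading of \(\mathbb {H}_n\) (an assumption, since the display (3.4) is not reproduced above), and the \(\beta \)-stable density is evaluated numerically via scipy.

```python
import numpy as np
from scipy.stats import levy_stable

def approx_loglik(theta, Y, X, h):
    """Hedged reconstruction of the approximate log-likelihood H_n(theta) of (3.4).

    theta = (lam, mu_1, ..., mu_q, beta, sigma); Y has length n+1, X has shape (n+1, q).
    """
    lam, beta, sigma = theta[0], theta[-2], theta[-1]
    mu = np.asarray(theta[1:-2])
    if not (0.0 < beta < 2.0 and sigma > 0.0):
        return -np.inf
    x = lam * beta * h
    eta = 1.0 - 0.5 * x if abs(x) < 1e-8 else -np.expm1(-x) / x
    scale = sigma * h ** (1.0 / beta) * eta ** (1.0 / beta)
    # Residuals eps_j(theta), with zeta_j(lambda) replaced by the observable X_{t_{j-1}}.
    resid = Y[1:] - np.exp(-lam * h) * Y[:-1] - h * (X[:-1] @ mu)
    eps = resid / scale
    # Sum_j { log phi_beta(eps_j) - log(scale) }; phi_beta evaluated by scipy (symmetric stable).
    return np.sum(levy_stable.logpdf(eps, alpha=beta, beta=0.0)) - len(eps) * np.log(scale)
```

Only a fixed number of derivative evaluations of such a function are needed by the Newton–Raphson scheme below, which is how the costly full numerical optimization involving \(\phi _\beta \) is bypassed.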
Toward construction of an asymptotically efficient estimator \(\hat{\theta }_{n}^*\) satisfying (3.1), we will prove a basic result about a Newton–Raphson type of procedure. As in (2.15), we write \(r_n=r_n(\beta _0)=\sqrt{n}h^{1-1/\beta _0}\). Write \(n^{-1/2}\tilde{\varphi }_n(\theta )\) for the lower-right \(2\times 2\)-part of \(\varphi _n(\theta )\), so that the definition (2.5) with \(\theta =\theta _0\) becomes \(\varphi _n(\theta _0) = \textrm{diag}(r_{n}^{-1}I_{q+1},\, n^{-1/2}\tilde{\varphi }_n(\theta _0))\). We then introduce the diagonal matrix
for a constant
The difference between \(\varphi _n\) and \(\varphi _{0,n}\) is only in the lower-right component for \((\beta ,\sigma )\), and note that the matrix \(\varphi _n^{-1}\varphi _{0,n}\) may diverge in norm. Then, suppose that we are given an initial estimator \(\hat{\theta }_{0,n}=(\hat{\lambda }_{0,n},\hat{\mu }_{0,n},\hat{\beta }_{0,n},\hat{\sigma }_{0,n})\) such that \(\varphi _{0,n}^{-1}(\hat{\theta }_{0,n} - \theta _0) = O_p(1)\), namely,
Let us write \(a=(\lambda ,\mu )\) and \(b=(\beta ,\sigma )\). Based on the approximate-likelihood function (3.4) and \(\hat{\theta }_{0,n}\), we recursively define the k-step estimator \(\hat{\theta }_{k,n}\) (\(k\ge 1\)) by
on the event \(F_{k-1,n} := \{|\det (\partial _a^2\mathbb {H}_n(\hat{\theta }_{k-1,n}))| \wedge |\det (\partial _b^2\mathbb {H}_n(\hat{\theta }_{k-1,n}))| > 0\}\) and assign an arbitrary value to \(\hat{\theta }_{k,n}\) on the complement set \(F_{k-1,n}^c\); below, it will be seen (as in the proof of Theorem 2.1) that \(P[F_{k-1,n}]\rightarrow 1\). Hence, this arbitrariness does not matter asymptotically, and we may and do suppose that \(P[F_{k-1,n}]=1\) for \(k\ge 1\). In our subsequent arguments, the inverse-matrix part in (3.9) must be block diagonal: see Remark 3.3 below.
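A minimal Python sketch of one update (3.9), with the block-diagonal gradients and Hessians of \(\mathbb {H}_n\) obtained by central finite differences; the finite-difference step size and the use of the reconstructed approx_loglik above are illustrative choices (in practice one may instead code the analytic derivatives listed in Sect. 2.2).

```python
import numpy as np

def newton_step(theta, Y, X, h, loglik, eps_fd=1e-5):
    """One block-diagonal Newton-Raphson update as in (3.9): a = (lam, mu), b = (beta, sigma)."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    blocks = (np.arange(p - 2), np.array([p - 2, p - 1]))  # indices of a and of b

    def grad_hess(idx):
        g = np.zeros(idx.size)
        H = np.zeros((idx.size, idx.size))
        f0 = loglik(theta, Y, X, h)
        for a, i in enumerate(idx):
            ei = np.zeros(p); ei[i] = eps_fd
            fp = loglik(theta + ei, Y, X, h)
            fm = loglik(theta - ei, Y, X, h)
            g[a] = (fp - fm) / (2.0 * eps_fd)
            H[a, a] = (fp - 2.0 * f0 + fm) / eps_fd ** 2
            for b, j in enumerate(idx[:a]):
                ej = np.zeros(p); ej[j] = eps_fd
                fpp = loglik(theta + ei + ej, Y, X, h)
                fpm = loglik(theta + ei - ej, Y, X, h)
                fmp = loglik(theta - ei + ej, Y, X, h)
                fmm = loglik(theta - ei - ej, Y, X, h)
                H[a, b] = H[b, a] = (fpp - fpm - fmp + fmm) / (4.0 * eps_fd ** 2)
        return g, H

    new = theta.copy()
    for idx in blocks:
        g, H = grad_hess(idx)
        if abs(np.linalg.det(H)) > 0.0:          # the event F_{k-1,n} for this block
            new[idx] = theta[idx] - np.linalg.solve(H, g)
    return new

# K iterations starting from a preliminary estimate theta0 (see Sect. 3.2):
#   theta_k = theta0
#   for _ in range(K):
#       theta_k = newton_step(theta_k, Y, X, h, approx_loglik)
```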
In what follows, \(\hat{\theta }_{n}\) denotes the good local maximum of the likelihood function \(\ell _n(\theta )\) when \((X_t)_{t\le T}\) is observable; by Theorem 2.1, we have \(P[\partial _\theta \ell _n(\hat{\theta }_{n})=0] \rightarrow 1\) and \(\varphi _{n}(\theta _0)^{-1}(\hat{\theta }_{n}-\theta _0) = \mathcal {I}_{n}(\theta _0)^{-1}\Delta _n(\theta _0) + o_{p}(1)\Rightarrow MN_{p,\theta _0}\left( 0,\, \mathcal {I}(\theta _0)^{-1} \right) \). Define the number
We deduce the asymptotic equivalence of \(\hat{\theta }_{n}\) and \(\hat{\theta }_{K,n}\):
starting from the initial estimator \(\hat{\theta }_{0,n}\); (3.11) concludes (3.1) (hence, (3.2) as well) with \(\hat{\theta }_{n}^*= \hat{\theta }_{K,n}\).
We assume that \(\varphi _{0,n}^{-1}(\hat{\theta }_{0,n}-\theta _0)=O_p(1)\) (hence also \(\varphi _{0,n}^{-1}(\hat{\theta }_{0,n}-\hat{\theta }_{n})=O_p(1)\)). Then, to establish (3.11), we first look at the amount of improvement through (3.9) with \(k=1\). Write \(\varphi _n = \varphi _{n}(\theta _0)\) and \(\tilde{\varphi }_n = \tilde{\varphi }_{n}(\theta _0)\), and introduce
We apply Taylor’s expansion around \(\hat{\theta }_{n}\) to (3.9) with \(k=1\): for some random point \(\hat{\theta }'_{0,n}\) on the segment joining \(\hat{\theta }_{0,n}\) and \(\hat{\theta }_{n}\),
In what follows, we will derive the rate of convergence of \(\hat{\theta }_{1,n} -\hat{\theta }_{n}\) in several steps. Here again, we may and do work under the localization (see Sect. 2.2).
Step 1. First, we show that \(\hat{\mathcal {I}}_{0,n}^{-1}=O_p(1)\). We have
The first terms on the right-hand sides above tend to \(\mathcal {I}_{\lambda ,\mu }(\theta _0)\) and \(\mathcal {I}_{\beta ,\sigma }(\theta _0)\) in probability, respectively. The second terms equal \(o_p(1)\), by similar considerations to the verification of (2.13) in the proof of Theorem 2.1. Hence, \(\hat{\mathcal {I}}_{0,n} \xrightarrow {p}\mathcal {I}(\theta _0)\) and in particular \(\hat{\mathcal {I}}_{0,n}^{-1}=O_p(1)\) since \(\mathcal {I}(\theta _0)\) is a.s. positive definite.
Step 2. Next, we show that \(R'_{0,n} =\varphi _n^\top \partial _\theta \mathbb {H}_n(\hat{\theta }_{n})\) is \(o_p(1)\). Observe that
For the first term, we have \(\varphi _n^\top \partial _\theta \ell _n(\hat{\theta }_{n}) = o_p(1),\) since \(P[|s_n \partial _\theta \ell _n(\hat{\theta }_{n})|>\epsilon ] \le P[|\partial _\theta \ell _n(\hat{\theta }_{n})|\ne 0]\rightarrow 0\) for every \(\epsilon >0\) and \(s_n\uparrow \infty \). To manage the second term, we need to estimate the gap between \(\mathbb {H}_n(\theta )\) and \(\ell _n(\theta )\) by taking the different convergence rates of their components into account. By the definitions (2.4) and (3.4),
From the expressions (2.3) and (3.5) and since \(\kappa \le 1\), a series of straightforward computations shows that the partial derivatives of
satisfy the following bounds: \(|\partial _\mu d_{\epsilon ,j}(\theta )| \lesssim h^{1+\kappa -1/\beta }\), \(|\partial _\lambda d_{\epsilon ,j}(\theta )| \lesssim h^{2-1/\beta }\), \(|\partial _\beta d_{\epsilon ,j}(\theta )| \lesssim h^{1+\kappa -1/\beta } l'\), and \(|\partial _\sigma d_{\epsilon ,j}(\theta )| \lesssim h^{1+\kappa -1/\beta }\). Obviously, \(h^{1/\tilde{\beta }_n - 1/\beta _0} = 1 + o_p(1)\) for any \(\tilde{\beta }_n\) such that \(n^{v}(\tilde{\beta }_n-\beta _0)=O_p(1)\) for some \(v>0\); below, we will repeatedly make use of this fact without mention. Further, under (3.6), it holds that
By piecing together these observations, the basic property (2.1), and the expression (3.15), under (3.3) we can obtain
This concludes that \(R'_{0,n}=o_p(1)\).
Step 3. Let \(R''_{0,n}=:(R''_{0,a,n},R''_{0,b,n}) \in \mathbb {R}^{q+1}\times \mathbb {R}^2\). The goal of this step is to show \(R''_{0,a,n} = o_p(1)\) and \(R''_{0,b,n} = O_p(n^{1/2-r} (l')^C)\); at this stage, the latter component may not be stochastically bounded if \(r\le 1/2\) (recall (3.8)). We have \(R''_{0,n} = A_{0,n} H_{0,n}\), where
Under the assumption \(\varphi _{0,n}^{-1}(\hat{\theta }_{0,n}-\theta _0)=O_p(1)\), recalling the block-diagonal forms (2.5) and (3.7), we see that
where the component \(O_p(1)\) is \(\mathbb {R}^{q+1}\)-valued and \(O_p(n^{(1-r)/2} l')\) is \(\mathbb {R}^2\)-valued; here and in what follows, we use the stochastic-order symbols for random variables of different dimensions, which will not cause any confusion.
We will show that all the components of \(A_{0,n}\) are at most \(O_p\big ( n^{-r/2} (l')^C\big )\):
For the diagonal parts of \(A_{0,n}\), from the same arguments as in proving (3.13) and (3.14) with the assumption \(\varphi _{0,n}^{-1}(\hat{\theta }_{0,n}-\theta _0)=O_p(1)\), it holds that
Write \(\theta =(\theta _l)_{l=1}^{p}\) and so on, and note that \(\partial _{a}\partial _{b}\mathbb {H}_n(\hat{\theta }'_{0,n}) \in \mathbb {R}^{2}\otimes \mathbb {R}^{q+1}\), which specifies the size of the matrix. Then, we expand the non-diagonal part of \(A_{0,n}\) as follows:
As in the previous diagonal case, the second term on the right-hand side equals \(O_p(n^{-r/2} (l')^C)\). As for the first term, we write
We have seen the explicit expressions of the components of \(\partial _\theta ^2\ell _n(\theta )\) in Sect. 2.2. Based on them, it can be seen that all the components of \(r_n^{-1}n^{-1/2}\partial _{a}\partial _{b}\ell _n(\theta _0)\) take the form:
for some \(\mathcal {F}_{t_{j-1}}\)-measurable random variable \(\pi _{j-1}(\theta _0)\) such that \(|\pi _{j-1}(\theta _0)|\lesssim (1+|Y_{t_{j-1}}|)(l')^C\) and for some odd function \(\psi \) (hence, \(E[\psi (\epsilon _j(\theta _0))]=0\)); the last term “\(O(h^2)\)” only appears in \(\partial _\lambda \partial _\beta \ell _n(\theta )\). Burkholder’s inequality for the martingale difference arrays gives \(n^{-1}\sum _{j=1}^{n}\pi _{j-1}(\theta _0)\psi (\epsilon _j(\theta _0)) = O_p(n^{-1/2}(l')^{C})\). We conclude that \(r_n^{-1}n^{-1/2}\partial _{a}\partial _{b}\ell _n(\theta _0) = O_p(n^{-1/2}(l')^{C})\). Next, we write \(\mathbb {H}_n(\theta ) - \ell _n(\theta ) = \sum _{j=1}^{n}B_j(\theta )d_{\epsilon ,j}(\theta )\) for the expression (3.15). The following estimates hold: \(|d_{\epsilon ,j}(\theta )| \lesssim h^{1+\kappa -1/\beta }\), \(|\partial _a \partial _b d_{\epsilon ,j}(\theta )| \lesssim h^{1+\kappa -1/\beta }(1+l')\), \(|B_j(\theta )| \lesssim 1\), \(|\partial _a B_j(\theta )| \lesssim (1+|Y_{t_{j-1}}|)h^{1-1/\beta }\), \(|\partial _b B_j(\theta )| \lesssim 1+l'\), and \(|\partial _a\partial _b B_j(\theta )| \lesssim (1+l')(1+|Y_{t_{j-1}}|)h^{1-1/\beta }\). Therefore, by (3.16),
Since \(r\le 1\), we have concluded (3.18).
The desired stochastic orders now follow from (3.17) and (3.18):
Step 4. We are now able to derive the convergence rate of \(\hat{\theta }_{1,n}-\hat{\theta }_{n}\). Recall the definition (3.10) of \(K\in \mathbb {N}\) and the initial rate of convergence (3.7).
-
First, we consider \(r>1/2\). Then, \(R''_{0,n}=o_p(1)\) from (3.19), so that we can take \(\varphi _{1,n}=\varphi _n\): by Steps 1 to 3 and (3.12), \(\varphi _n^{-1}(\hat{\theta }_{1,n}-\hat{\theta }_{n})=o_p(1)\). This means that a single iteration is enough if we can take \(r>1/2\) from the beginning.
-
Turning to \(r\in (0,1/2]\), we pick a constant \(\epsilon '\in (0,r/2)\) (hence \(r-\epsilon '>r/2\)), which is to be taken sufficiently small later. Define
$$\begin{aligned} \varphi _{1,n}=\varphi _{1,n}(\epsilon ') := \textrm{diag}\left( r_{n}^{-1}I_{q+1},\, n^{-(r-\epsilon ')} \begin{pmatrix} 1 &{} 0 \\ 0 &{} l' \end{pmatrix} \right) . \nonumber \end{aligned}$$Again by Steps 1–3 and (3.12), \(\varphi _{1,n}^{-1}\varphi _n \hat{\mathcal {I}}_{0,n}^{-1}=\textrm{diag}(O_p(1),O_p\big ((l')^C n^{r-\epsilon '-1/2}\big ))\) and
$$\begin{aligned} \varphi _{1,n}^{-1}(\hat{\theta }_{1,n}-\hat{\theta }_{n})&= \begin{pmatrix} O_p(1) &{} O \\ O &{} O_p\big ((l')^C n^{r-\epsilon '-1/2}\big ) \end{pmatrix} \left\{ o_p(1) + \begin{pmatrix} o_p(1) \\ O_p\left( n^{1/2-r} (l')^C\right) \end{pmatrix} \right\} \nonumber \\&= o_p(1) + \begin{pmatrix} o_p(1) \\ O_p\big (n^{-\epsilon '} (l')^C\big ) \end{pmatrix} =o_p(1). \nonumber \end{aligned}$$It follows that the rate of convergence for estimating \((\beta ,\sigma )\) gets improved from \(\textrm{diag}(n^{r/2}, n^{r/2}/ l')\) of \(\hat{\theta }_{0,n}\) to \(\textrm{diag}(n^{r-\epsilon '}, n^{r-\epsilon '}/ l')\) of \(\hat{\theta }_{1,n}\); this can be seen as a matrix-norming counterpart of the (near-)doubling phenomenon in the one-step estimation (see for example Zacks 1971, Section 5.5). To improve the rate further, we apply (3.9) to obtain \(\hat{\theta }_{2,n}\) from \(\hat{\theta }_{1,n}\), so that the rate of convergence for estimating \((\beta ,\sigma )\) gets improved from \(\textrm{diag}(n^{r-\epsilon '}, n^{r-\epsilon '}/ l')\) to \(\textrm{diag}(n^{2r-3\epsilon '}, n^{2r-3\epsilon '}/ l')\); here again, we can control the constant \(\epsilon '>0\) to be sufficiently small. This procedure is iterated \(K-1\) times, resulting in the rate \(\textrm{diag}(n^{2^{K-2}r-\epsilon '_0}, n^{2^{K-2}r-\epsilon '_0}/ l')\) with \(\epsilon '_0\) being small enough to ensure that \(2(2^{K-2}r-\epsilon '_0)>1/2\). Then, the last (Kth-step) application of (3.9) is the same as in the case of \(r>1/2\) mentioned above.
These observations conclude (3.11).
Thus, we have arrived at the following claim.
Theorem 3.1
Suppose that \(\hat{\theta }_{0,n}\) satisfies \(\varphi _{0,n}^{-1}(\hat{\theta }_{0,n} - \theta _0) = O_p(1)\) with (3.7) and (3.8), and define K as in (3.10). Then, the K-step estimator \(\hat{\theta }_{K,n}\) defined through (3.9) satisfies (3.11), and hence is asymptotically efficient (by Theorem 2.1):
Because of the diagonality of \(\varphi _{0,n}\), Theorem 3.1 makes it possible to construct an initial estimator \(\hat{\theta }_{0,n}=(\hat{\lambda }_{0,n},\hat{\mu }_{0,n},\hat{\beta }_{0,n},\hat{\sigma }_{0,n})\) individually for each component.
Having (3.20) in hand, we can construct consistent estimators \(\hat{\mathcal {I}}_{\lambda ,\mu ,n}\xrightarrow {p}\mathcal {I}_{\lambda ,\mu }(\theta _0)\) and \(\hat{\mathcal {I}}_{\beta ,\sigma ,n}\xrightarrow {p}\mathcal {I}_{\beta ,\sigma }(\theta _0)\), and then prove the Studentization:
Indeed, this follows by noting the following facts.
-
For construction of \(\hat{\mathcal {I}}_{\lambda ,\mu ,n}\) and \(\hat{\mathcal {I}}_{\beta ,\sigma ,n}\):
-
In the expressions (2.8) and (2.9), we can replace the (Riemann) dt-integrals by the corresponding sample quantities:
$$\begin{aligned} \frac{1}{n} \sum _{j=1}^{n}\big ( Y_{t_{j-1}}^2, Y_{t_{j-1}}X_{t_{j-1}}\big ) \xrightarrow {p}\frac{1}{T} \int _0^T \big ( Y_{t}^2, Y_{t}X_{t}\big ) dt. \nonumber \end{aligned}$$ -
The elements of the form \(E_{\theta _0}[H(\epsilon ;\beta _0)] = \int H(\epsilon ;\beta _0)\phi _{\beta _0}(\epsilon )d\epsilon \) with \(H(\epsilon ;\beta )\) smooth in \(\beta \) can be evaluated through numerical integration involving the density \(\phi _\beta (\epsilon )\) and its partial derivatives with respect to \((\beta ,\epsilon )\), by plugging in the estimate \(\hat{\beta }_{K,n}\) for the value of \(\beta \) (the initial estimator \(\hat{\beta }_{0,n}\) is enough).
-
Again, note that \(n^{v}(\hat{\beta }_{K,n} - \beta _0,\, \hat{\sigma }_{K,n} - \sigma _0) = o_p(1)\) for any sufficiently small \(v \in (0,1/2)\), so that \(h^{1-1/\hat{\beta }_{K,n}} / h^{1-1/\beta _{0}} = (1/h)^{1/\hat{\beta }_{K,n}-1/\beta _{0}} \xrightarrow {p}1\). The values \(\overline{\varphi }_{lm}(\theta _0)\) contained in \(\mathcal {I}_{\beta ,\sigma }(\theta _0)\) are estimated by plugging in \(\hat{\theta }_{K,n}\) in (2.6):
$$\begin{aligned} \left\{ \begin{array}{l} \hat{\beta }_{K,n}^{-2} l' \varphi _{11,n}(\hat{\theta }_{K,n}) + \hat{\sigma }_{K,n}^{-1}\varphi _{21,n}(\hat{\theta }_{K,n}) \xrightarrow {p}\overline{\varphi }_{21}(\theta _0), \nonumber \\ \hat{\beta }_{K,n}^{-2} l' \varphi _{12,n}(\hat{\theta }_{K,n}) + \hat{\sigma }_{K,n}^{-1}\varphi _{22,n}(\hat{\theta }_{K,n}) \xrightarrow {p}\overline{\varphi }_{22}(\theta _0), \nonumber \\ \varphi _{11,n}(\hat{\theta }_{K,n}) \xrightarrow {p}\overline{\varphi }_{11}(\theta _0), \nonumber \\ \varphi _{12,n}(\hat{\theta }_{K,n}) \xrightarrow {p}\overline{\varphi }_{12}(\theta _0). \nonumber \\ \end{array}\right. \nonumber \end{aligned}$$We can replace \((\hat{\beta }_{K,n},\hat{\sigma }_{K,n})\) by \((\hat{\beta }_{0,n},\hat{\sigma }_{0,n})\) all through the above.
-
-
Since \(\varphi _{n}^{-1}(\hat{\theta }_{K,n}-\theta _0)=O_p(1)\), it follows that
$$\begin{aligned} \sqrt{n}\,\tilde{\varphi }_n(\hat{\theta }_{K,n})^{-1}\left( {\begin{array}{c}\hat{\beta }_{K,n}-\beta _0\\ \hat{\sigma }_{K,n}-\sigma _0\end{array}}\right)&= \sqrt{n}\, \left( \tilde{\varphi }_n(\theta _0)^{-1} + O_p\big ( (l')^C n^{-1/2}\big )\right) \left( {\begin{array}{c}\hat{\beta }_{K,n}-\beta _0\\ \hat{\sigma }_{K,n}-\sigma _0\end{array}}\right) \nonumber \\&= \sqrt{n}\, \tilde{\varphi }_n(\theta _0)^{-1} \left( {\begin{array}{c}\hat{\beta }_{K,n}-\beta _0\\ \hat{\sigma }_{K,n}-\sigma _0\end{array}}\right) + O_p\big ( (l')^C n^{-1/2}\big ) \nonumber \\&= \sqrt{n}\, \tilde{\varphi }_n(\theta _0)^{-1} \left( {\begin{array}{c}\hat{\beta }_{K,n}-\beta _0\\ \hat{\sigma }_{K,n}-\sigma _0\end{array}}\right) + o_p(1). \nonumber \end{aligned}$$
The property (3.21) entails
which can be used for constructing an approximate confidence ellipsoid and for goodness-of-fit testing, in particular variable selection among the components of X.
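As a concrete illustration of this use, here is a minimal hedged sketch of a Wald-type check based on the Studentized quantity of (3.21)–(3.22): it assumes a user-supplied helper studentized_error (hypothetical; it should return the p-dimensional Studentized vector built from \(\hat{\mathcal {I}}_{\lambda ,\mu ,n}\), \(\hat{\mathcal {I}}_{\beta ,\sigma ,n}\) and the plugged-in norming), whose squared norm is compared with a \(\chi ^2_p\) quantile.

```python
import numpy as np
from scipy.stats import chi2

def in_confidence_ellipsoid(studentized_error, theta, p, alpha=0.05):
    """Approximate (1 - alpha) confidence-ellipsoid check / Wald-type test.

    studentized_error(theta) is a hypothetical helper returning the p-dimensional
    Studentized quantity of (3.21)-(3.22) evaluated at the hypothesized value theta.
    """
    z = np.asarray(studentized_error(theta), dtype=float)
    wald = float(z @ z)          # asymptotically chi-square with p degrees of freedom
    return wald <= chi2.ppf(1.0 - alpha, df=p), wald

# Variable selection among the components of X: test mu_i = 0 through the corresponding
# squared Studentized coordinate, compared with chi2.ppf(1 - alpha, df=1).
```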
Remark 3.2
From the proof of Theorem 3.1, we see that it is possible to weaken (3.6) to \(\beta _0 > 2/3\) if the integrated-process sequence \((\int _j X_s ds)_{j=1}^n\) is observable. Moreover, it is possible to remove (3.6) if the model is the Markovian \(Y_t = Y_0 + \int _0^t (\mu -\lambda Y_s)ds + \sigma J_t\) with constant \(\mu \in \mathbb {R}\), by modifying the definition (3.5) as in the estimating function of Clément and Gloter (2020). However, we worked under (3.3) and (3.5) to deal with a possibly time-varying X.
Remark 3.3
The standard form of the one-step estimator is not (3.9), but
By inspecting the proof of Theorem 3.1, we find that the off-block-diagonal part \(-\partial _a \partial _b \mathbb {H}_n(\hat{\theta }_{k-1,n})\) would invalidate the claim therein; this happens because the rate of convergence for estimating the component \(b=(\beta ,\sigma )\) can be too slow. Still, because of the block-diagonality of the original form (2.7), it seems natural and reasonable to use the block-diagonal form from the outset in defining (3.9).
Remark 3.4
The necessity of more than one iteration (\(K\ge 2\)) would be a technical matter. If we could verify the tail-probability estimate \(\sup _n P[|r_n(\hat{\lambda }_{0,n}-\lambda _0,\hat{\mu }_{0,n}-\mu _0)| \ge s]\lesssim s^{-M}\) for a sufficiently large \(M>0\), then it would be possible to deduce the optimality of the one-step Newton–Raphson procedure even when the construction of \((\hat{\beta }_{0,n},\hat{\sigma }_{0,n})\) is not smooth in \((\hat{\lambda }_{0,n},\hat{\mu }_{0,n})\), as is the case for the function \(\hat{M}_n(a')\) in Sect. 3.2.2. However, the model under consideration is heavy-tailed, and it seems impossible to deduce such a bound since we cannot make use of the localization for that purpose.
3.2 Specific preliminary estimators
In this section, we consider a specific construction of \(\hat{\theta }_{0,n}=(\hat{\lambda }_{0,n},\hat{\mu }_{0,n},\hat{\beta }_{0,n},\hat{\sigma }_{0,n})\) satisfying \(\varphi _{0,n}^{-1}(\hat{\theta }_{0,n} - \theta _0) = O_p(1)\) with \(\varphi _{0,n}\) given by (3.7). We keep assuming that the available sample is \(\{(X_{t_j}, Y_{t_j})\}_{j=0}^{n}\) and the conditions (3.3) and (3.6) are in force. We will proceed in two steps.
-
(1)
First, we will estimate the trend parameter \((\lambda ,\mu )\) by the least absolute deviation (LAD) estimator, which will turn out to be rate-optimal and asymptotically mixed-normally distributed; although the identification of the asymptotic distribution is not necessary here, it would be of independent interest (see Sect. 3.2.3).
-
(2)
Next, by plugging in the LAD estimator we construct a sequence of residuals for the noise term, based on which we will consider the lower-order fractional moment matching.
Recall that we are working under the localization (2.10) by removing large jumps of J.
3.2.1 LAD estimator
Let us recall the autoregressive structure (2.2) together with the approximation of the (non-random) integral:
where
is an \(\mathcal {F}_{t_{j-1}}\)-measurable random variable such that
We define the LAD estimator \((\hat{\lambda }_{0,n},\hat{\mu }_{0,n}) \in \mathbb {R}^{q+1}\) as any element of \({\text {argmin}}_{(\lambda ,\mu )} M_n(\lambda ,\mu )\), leaving \((\beta ,\sigma )\) unknown, where
This is a slight modification of the previously studied approximate LAD estimator in Masuda (2010) concerning the ergodic locally stable OU process.
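Numerically, the LAD step is a low-dimensional minimization that does not involve the stable density at all; the following Python sketch uses a derivative-free optimizer, and the residual form inside the objective is our hedged reading of \(M_n\) (the defining display is not reproduced above).

```python
import numpy as np
from scipy.optimize import minimize

def lad_objective(par, Y, X, h):
    """Hedged reading of M_n(lambda, mu): sum of absolute one-step residuals."""
    lam, mu = par[0], np.asarray(par[1:])
    resid = Y[1:] - np.exp(-lam * h) * Y[:-1] - h * (X[:-1] @ mu)
    return np.abs(resid).sum()

def lad_estimator(Y, X, h):
    q = X.shape[1]
    par0 = np.zeros(1 + q)                     # crude starting point
    # Derivative-free search; the objective is non-differentiable at zero residuals.
    res = minimize(lad_objective, par0, args=(Y, X, h), method="Nelder-Mead",
                   options={"xatol": 1e-8, "fatol": 1e-10, "maxiter": 20000})
    return res.x[0], res.x[1:]                 # (lam_hat, mu_hat)
```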
We introduce the following convex random function on \(\mathbb {R}\times \mathbb {R}^q\) (recall the notation (2.15)):
The minimizer of \(\Lambda _n\) is \(\hat{w}_n:=(\hat{u}_n,\hat{v}_n),\) where \(\hat{u}_n := r_n(\hat{\lambda }_{0,n}-\lambda _0)\) and \(\hat{v}_n := r_n(\hat{\mu }_{0,n}-\mu _0)\). Further, letting \(z_{j-1}:=(-Y_{t_{j-1}}, X_{t_{j-1}})\), \(w:=(u,v)\), and
we also introduce the quadratic random function
where
where \(s_{0,n}:=\sigma _0 \eta (\lambda _0\beta _0 h)^{1/\beta _0} = (1+o(1)) \sigma _0\). The a.s. positive definiteness of \(\Gamma _{0}\) (see Sect. 2) implies that \({\text {argmin}} \Lambda _{n}^{\sharp }\) a.s. consists of the single point \(\hat{w}_{n}^{\sharp }:=-\Gamma _{0}^{-1}\Delta '_{n}\). Then, our objective is to prove that
The proof is analogous to Masuda (2010, Proof of Theorem 2.1); hence, we will appropriately omit the full technical details, referring to the corresponding parts therein.
As in Masuda (2010, Eq.(4.6)), we can write \(\Lambda _n(w) = \Delta '_n[w] + Q_n(w), \) where
Let us suppose that
Then, we can make use of the argument of Hjort and Pollard (2011) to conclude (3.25). To see this, we note the inequality due to Hjort and Pollard (2011, Lemma 2): for any \(\epsilon >0\),
where \(\delta _n(w) := \Lambda _n(w) - \Lambda _n^\sharp (w)\). Obviously, \(\Lambda _{n}^{\sharp }(\hat{w}_n^\sharp ) = -(1/2)\Delta '_n \cdot \Gamma _0^{-1}\Delta '_n\). By straightforward computations, we obtain
Also, because of the convexity, we have the uniform convergence \(\sup _{w\in A}\left| \delta _n(w)\right| \xrightarrow {p}0\) for each compact \(A\subset \mathbb {R}^{1+q}\) (see Hjort and Pollard (2011, Lemma 1)). Note that \(\hat{w}_{n}^{\sharp } = O_p(1)\) by (3.26) and the a.s. positive definiteness of \(\Gamma _0\). Given any \(\epsilon ,\epsilon '>0\), we can find sufficiently large \(K>0\) and \(N\in \mathbb {N}\) for which the following three estimates hold simultaneously:
Piecing together the above arguments concludes that, for any \(\epsilon ,\epsilon '>0\), there exists an \(N\in \mathbb {N}\) such that \(\sup _{n\ge N} P\left[ |\hat{w}_{n}-\hat{w}_{n}^{\sharp }|\ge \epsilon \right] < \epsilon '\). This establishes (3.25), and it follows that
It remains to prove (3.26) and (3.27). Below, we will write \(P^{j-1}\) and \(E^{j-1}\) for the conditional probability and expectation given \(\mathcal {F}_{t_{j-1}}\), respectively.
Proof of (3.26). It suffices to show that \(\Delta _n(1)=O_p(1)\) and \(R_{1,n}=o_p(1)\), where
The (matrix-valued) predictable quadratic variation process of \(\{\Delta _{n}(\cdot )\}_{t\in [0,1]}\) is given by
We apply the Lenglart inequality (Jacod and Shiryaev 2003, I.3.31) to the submartingale \(|\Delta _n(t)|^2\): for any \(K,L>0\),
We have \(n^{-1}\sum _{j=1}^{n}(1+|Y_{t_{j-1}}|)^2=O_p(1)\). To conclude that \(\Delta _n:=\Delta _n(1)=O_p(1)\), take L and then K sufficiently large, in that order. To see that \(R_{1,n}=o_p(1)\), we proceed in exactly the same way as in Masuda (2010, pp. 544–545): partly using (3.6) and (3.23),
Thus, we have obtained (3.26), and now we can replace \(\Delta _n'\) by \(\Delta _n\) in (3.28):
Proof of (3.27). We decompose \(Q_{n}(w) =: \sum _{j=1}^{n}\zeta _j(w)\) as \(Q_n(w)=Q_{1,n}(w) + Q_{2,n}(w)\), where \(Q_{1,n}(w) := \sum _{j=1}^{n}E^{j-1}[\zeta _j(w)]\) and \(Q_{2,n}(w) := \sum _{j=1}^{n}(\zeta _j(w) - E^{j-1}[\zeta _j(w)])\). Then, for each \(w\in \mathbb {R}^{1+q},\) we can readily mimic the flow of Masuda (2010, pp.545–546) (for handling the term \(\mathbb {Q}_n(u)\) therein). The sketches are given below.
-
We have
$$\begin{aligned} Q_{1,n}(w) = \frac{1}{2} \Gamma _n [w,w] + A_n(w), \nonumber \end{aligned}$$where
$$\begin{aligned} \Gamma _n := \frac{2\phi _{\beta _0}(0)}{s_{0,n}^2}\frac{1}{n}\sum _{j=1}^{n} \begin{pmatrix} Y_{t_{j-1}}^2 &{} -Y_{t_{j-1}}X_{t_{j-1}}^\top \\ -Y_{t_{j-1}}X_{t_{j-1}} &{} X_{t_{j-1}}^{\otimes 2} \end{pmatrix} = \Gamma _0 + o_p(1), \nonumber \end{aligned}$$and where
$$\begin{aligned} |A_n(w)|&\lesssim \left| \frac{1}{n}\sum _{j=1}^{n}(w\cdot z_{j-1})^{2}\left\{ \phi _{\beta _0}\left( -{\frac{\delta _{j-1}'}{s_{0,n}}}\right) - \phi _{\beta _0}(0)\right\} \right| \nonumber \\&{}\qquad + \left| \sum _{j=1}^{n}\int _{0}^{w\cdot z_{j-1}/(s_{0,n}\sqrt{n})}s^{2}\int _{0}^{1}(1-y) \partial \phi _{\beta _0}\left( sy-{\frac{\delta _{j-1}'}{s_{0,n}}}\right) dyds \right| \nonumber \\&\lesssim \frac{1}{n} \sum _{j=1}^{n}(1+|Y_{t_{j-1}}|)^4 (1+|w|)^4 \left( h^{2(1+\kappa -1/\beta _0)} \vee \frac{1}{n} \vee \frac{h^{1+\kappa -1/\beta _0}}{\sqrt{n}} \right) =o_p(1). \nonumber \end{aligned}$$ -
We have \(Q_{2,n}(w)=o_p(1)\): by the Burkholder–Davis–Gundy inequality,
$$\begin{aligned}&E\left[ \left( \sum _{j=1}^{n}(\zeta _j(w) - E^{j-1}[\zeta _j(w)]) \right) ^2 \right] \nonumber \\&\lesssim \sum _{j=1}^{n}E\left[ \left( \int _{0}^{|w\cdot z_{j-1} / (s_{0,n}\sqrt{n})|} I\left( \left| \epsilon '_j + {\frac{\delta _{j-1}'}{s_{0,n}}}\right| \le s\right) ds \right) ^2 \right] \nonumber \\&\lesssim \sum _{j=1}^{n}\frac{|w|}{\sqrt{n}} E\left[ |z_{j-1}| \int _{0}^{|w\cdot z_{j-1} / (s_{0,n}\sqrt{n})|} P^{j-1}\left[ \left| \epsilon '_j + {\frac{\delta _{j-1}'}{s_{0,n}}}\right| \le s\right] ds \right] \nonumber \\&\lesssim \sum _{j=1}^{n}\frac{|w|}{\sqrt{n}}E\left[ |z_{j-1}| \int _{0}^{|w\cdot z_{j-1} / (s_{0,n}\sqrt{n})|} \left( s+\left| {\frac{\delta _{j-1}'}{s_{0,n}}}\right| \right) ds \right] \nonumber \\&\lesssim (1+|w|)^3 \frac{1}{n} \sum _{j=1}^{n}E\left[ (1+|Y_{t_{j-1}}|)^3\right] \left( \frac{1}{\sqrt{n}} \vee h^{1+\kappa -1/\beta _0} \right) \nonumber \\&=O\left( \frac{1}{\sqrt{n}} \vee h^{1+\kappa -1/\beta _0} \right) =o(1). \nonumber \end{aligned}$$
Summarizing the above yields (3.27).
The tightness (3.30) is sufficient for our purpose. As a matter of fact, the LAD estimator \((\hat{\lambda }_{0,n},\hat{\mu }_{0,n})\) is asymptotically mixed-normally distributed. We give the details in Sect. 3.2.3.
3.2.2 Rates of convergence at the moment matching for \((\beta ,\sigma )\)
The remaining task is to construct a specific estimator \((\hat{\beta }_{0,n},\hat{\sigma }_{0,n})\) such that
This can be achieved simply by matching some appropriate moments; for this purpose, the localization is not appropriate, since the precise expressions of the moments that genuinely exist without the localization come into play. Here we consider, as in Brouste and Masuda (2018), the pair of absolute moments of orders r and 2r.
Let \(a'\in (0, \beta _0/2)\) and define
Let
which are approximately i.i.d. with common distribution \(\mathcal {L}(J_1)\), and also let
We can apply the central limit theorem to ensure that \(\sqrt{n}\Big ( h^{-a'/\beta _0} \sigma _0^{-a'}M_n(a') - m (a';\beta _0) \Big ) = O_p(1)\) as soon as \(a'<\beta _0/2\), where
Moreover, it follows from the discussions in Sect. 3.2.1 that
which in turn gives
It follows that
Now we want to take \(a'=r,2r\), which necessitates that \(r\in (0, \beta _0/4)\) in the current argument. Then, we conclude that
so that
There exists a bijection \(f_r\) such that \(f_r(m(r;\beta )^2/m(2r;\beta ))=\beta \); see Brouste and Masuda (2018, Section 3.2) and the references therein for the related details. Therefore, taking \(\hat{\beta }_{0,n} := f_r(\hat{M}_n(r)^2 /\hat{M}_n(2r))\) results in \(n^{r/2}(\hat{\beta }_{0,n} -\beta _0)=O_p(1)\), as was to be shown. The bisection method is sufficient for numerically finding \(\hat{\beta }_{0,n}\).
Turning to \(\hat{\sigma }_{0,n}\), we note that
Let \(\hat{\sigma }_{0,n} := \left( \frac{h^{-r/\hat{\beta }_{0,n}}\hat{M}_n(r)}{m(r;\hat{\beta }_{0,n})}\right) ^{1/r}\): we claim that \(\frac{n^{r/2}}{l'}(\hat{\sigma }_{0,n}-\sigma _0) = O_{p}(1)\). Since \(\frac{m(r;\hat{\beta }_{0,n})}{m(r;\beta _0)} = O_p(1)\),
Recall that \(n^{r/2}(\hat{\beta }_{0,n} -\beta _0)=O_p(1)\), hence the second term in the upper bound equals \(O_p(n^{-r/2})\). As for the first term, using that \((1/\hat{\beta }_{0,n} - 1/\beta _0) l' = O_p( l' /n^{r/2}) = o_p(1)\), we observe
These estimates combined with (3.32) conclude the claim: we have (3.31) for the above-constructed \((\hat{\beta }_{0,n},\hat{\sigma }_{0,n})\). Given an \(r\in (0,\beta _0/4)\), by Theorem 3.1, the K-step estimator with \(K> \log _2(1/r)\) is asymptotically efficient; if \(\beta _0> 1\) is supposed beforehand, then we can take an \(r\in (1/4,\beta _0/4)\), for which \(K=2\) is enough.
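A minimal Python sketch of the moment matching, assuming that the residuals behind \(\hat{M}_n(a')\) are the one-step residuals of Sect. 3.2.1 with the LAD estimate plugged in (the defining displays are not reproduced above), and using the classical closed form \(m(p;\beta )=\Gamma (1-p/\beta )/\{\Gamma (1-p)\cos (p\pi /2)\}\), \(0<p<\beta \), \(p\ne 1\), for the fractional absolute moment of the standard symmetric \(\beta \)-stable law; the bisection over \(\beta \) implements the bijection \(f_r\) mentioned above.

```python
import numpy as np
from scipy.special import gamma as Gamma
from scipy.optimize import brentq

def m(p, beta):
    """Fractional absolute moment E|J_1|^p of the standard symmetric beta-stable law (0 < p < beta)."""
    return Gamma(1.0 - p / beta) / (Gamma(1.0 - p) * np.cos(0.5 * np.pi * p))

def moment_matching(Y, X, h, lam_hat, mu_hat, r=0.2):
    """Preliminary (beta, sigma) by matching the |.|^r and |.|^{2r} moments of the residuals."""
    resid = Y[1:] - np.exp(-lam_hat * h) * Y[:-1] - h * (X[:-1] @ np.asarray(mu_hat))
    M_r = np.mean(np.abs(resid) ** r)
    M_2r = np.mean(np.abs(resid) ** (2.0 * r))
    ratio = M_r ** 2 / M_2r                      # scale-free; estimates m(r;b)^2 / m(2r;b)
    # Invert b -> m(r;b)^2/m(2r;b) by bisection; this requires the empirical ratio to lie in the
    # range attained on (2r, 2), which is typically the case when r < beta_0/4 and n is large.
    beta_hat = brentq(lambda b: m(r, b) ** 2 / m(2.0 * r, b) - ratio, 2.0 * r + 1e-3, 2.0 - 1e-6)
    sigma_hat = (h ** (-r / beta_hat) * M_r / m(r, beta_hat)) ** (1.0 / r)
    return beta_hat, sigma_hat
```

Together with the LAD sketch of Sect. 3.2.1, this provides an initial \(\hat{\theta }_{0,n}\) to feed into the K-step recursion of Sect. 3.1.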
3.2.3 Asymptotic mixed normality of the LAD estimator
Recall (3.30): \(\hat{w}_{n} = -\Gamma _{0}^{-1}\Delta _{n} + o_{p}(1)\). To deduce the asymptotic mixed normality, it suffices to identify the appropriate asymptotic distribution of \((\Delta _{n},\Gamma _{0})\), equivalently of \((\Delta _{n},\Gamma _{n})\).
First, we clarify the leading term of \(\Delta _n\) in a simpler form. We have \(E[\textrm{sgn}(\epsilon '_{j})]=0\) and \(E[\textrm{sgn}(\epsilon '_{j})^{2}]=1\). Observe that \(\Delta _n = \Delta _{0,n} + R_{1,n} + R_{2,n}\), where \(R_{1,n}\) is given in (3.29) and
We have already seen that \(R_{1,n}=o_p(1)\). We claim that \(R_{2,n}=o_p(1)\). Write \(R_{2,n} = \sum _{j=1}^{n} \xi _{j}\). The claim follows on showing that both \(\sum _{j=1}^{n} E^{j-1}[\xi _{j}]=o_p(1)\) and \(|\sum _{j=1}^{n} E^{j-1}[\xi _{j}^{\otimes 2}]|=o_p(1)\), but the first one obviously follows from \(R_{1,n}=o_p(1)\). The second one can be shown as follows: first, we have
Moreover,
for some \(\mathcal {F}_{t_{j-1}}\)-measurable term \(D_{j-1}\) satisfying the estimate \(|D_{j-1}|\lesssim |\delta '_{j-1}|\lesssim (1+|Y_{t_{j-1}}|) h^{1+\kappa -1/\beta _0}\). These observations show that \(|\sum _{j=1}^{n} E^{j-1}[\xi _{j}^{\otimes 2}]|=o_p(1)\).
It remains to look at \(\Delta _{0,n}\). Mere convergence in distribution is not enough here, since the matrix \(\Gamma _0\) is random. We will apply a weak limit theorem for stochastic integrals; we refer the reader to Jacod and Shiryaev (2003, VI.6) for a detailed account of the relevant limit theorems as well as the standard notation used below.
We introduce the partial sum process
We apply Jacod (2007, Lemma 4.3) to derive \(S^{n} \xrightarrow {\mathcal {L}_{s}}w'\) in \(\mathcal {D}(\mathbb {R})\) (the Skorokhod space of \(\mathbb {R}\)-valued functions, equipped with the Skorokhod topology), where \(w'=(w'_{t})_{t\in [0,1]}\) denotes a standard Wiener process defined on an extended probability space and independent of \(\mathcal {F}\). Here, the symbol \(\xrightarrow {\mathcal {L}_{s}}\) denotes (\(\mathcal {F}\)-)stable convergence in law, which is strictly stronger than mere weak convergence and in particular implies the joint weak convergence in \(\mathcal {D}(\mathbb {R}^{q+2})\):
for any \(\mathbb {R}^{q+1}\)-valued \(\mathcal {F}\)-measurable càdlàg processes \(H^{n}\) and \(H^{\infty }\) such that \(H^{n} \xrightarrow {p}H^{\infty }\) in \(\mathcal {D}(\mathbb {R}^{q+1})\).
We note the following two points.
-
We have \(S^{n}\xrightarrow {\mathcal {L}}w'\) in \(\mathcal {D}(\mathbb {R})\), and for each \(n\in \mathbb {N}\) the process \((S^{n}_{t})_{t\in [0,1]}\) is an \((\mathcal {F}_{[nt]/n})\)-martingale such that \(\sup _{n,t}|\Delta S^{n}_{t}|\le 1\). These facts combined with Jacod and Shiryaev (2003, VI.6.29) imply that the sequence \((S^{n})\) is predictably uniformly tight.
-
Given any continuous function \(f:\mathbb {R}^{q+1}\rightarrow \mathbb {R}^{q'}\) (for some \(q'\in \mathbb {N}\)), we consider the process \(H^{n}=(H^{1,n},H^{2,n})\) with
$$\begin{aligned} H^{1,n}_{t}&:= \left( -Y_{[nt]/n},\,X_{[nt]/n}\right) , \nonumber \\ H^{2,n}_{t}&:=\frac{1}{n}\sum _{j=1}^{[nt]}f(Y_{t_{j-1}},X_{t_{j-1}}). \nonumber \end{aligned}$$ Then, we have \(H^{1,n}\xrightarrow {p}H^{1,\infty }:={(-Y,X)}\) in \(\mathcal {D}(\mathbb {R}^{q+1})\) and \(H^{2,n}\xrightarrow {p}H^{2,\infty }:=\int _{0}^{\cdot }f(Y_s,X_{s})ds\) in \(\mathcal {D}(\mathbb {R}^{q'})\), which, combined with (3.33), yields the joint weak convergence in \(\mathcal {D}(\mathbb {R}^{2+q+q'})\):
$$\begin{aligned} (S^{n},H^{1,n},H^{2,n})\xrightarrow {\mathcal {L}}(w',H^{1,\infty },H^{2,\infty }). \nonumber \end{aligned}$$
With these observations, we can apply Jacod and Shiryaev (2003, VI.6.22) to derive the weak convergence of stochastic integrals:
which entails that, for any continuous function f,
where \(Z\sim N(0,1)\) is independent of \(\mathcal {F}\). Now, by taking
we arrive at
In sum, applying Slutsky's theorem, we conclude that
References
Borovkova, S., & Schmeck, M. D. (2017). Electricity price modeling with stochastic time change. Energy Economics, 63, 51–65.
Brouste, A., & Masuda, H. (2018). Efficient estimation of stable Lévy process with symmetric jumps. Statistical Inference for Stochastic Processes, 21(2), 289–307.
Challis, C. J., & Schmidler, S. C. (2012). A stochastic evolutionary model for protein structure alignment and phylogeny. Molecular Biology and Evolution, 29(11), 3575–3587.
Cheridito, P., Kawaguchi, H., & Maejima, M. (2003). Fractional Ornstein-Uhlenbeck processes. Electronic Journal of Probability, 8(3), 14 pp. (electronic).
Clément, E., & Gloter, A. (2020). Joint estimation for SDE driven by locally stable Lévy processes. Electronic Journal of Statistics, 14(2), 2922–2956.
Doob, J. L. (1942). The Brownian movement and stochastic equations. Annals of Mathematics, 2(43), 351–369.
DuMouchel, W. H. (1973). On the asymptotic normality of the maximum-likelihood estimate when sampling from a stable distribution. Annals of Statistics, 1, 948–957.
Hjort, N. L., & Pollard, D. (2011). Asymptotics for minimisers of convex processes. Statistical Research Report, University of Oslo, 1993. Available as arXiv preprint arXiv:1107.3806.
Hu, Y., & Long, H. (2009). Least squares estimator for Ornstein-Uhlenbeck processes driven by \(\alpha \)-stable motions. Stochastic Processes and Their Applications, 119(8), 2465–2480.
Jacod, J. (2007). Asymptotic properties of power variations of Lévy processes. ESAIM: Probability and Statistics, 11, 173–196.
Jacod, J., & Shiryaev, A. N. (2003). Limit theorems for stochastic processes (2nd ed.). Grundlehren der Mathematischen Wissenschaften, vol. 288. Springer-Verlag.
Jeganathan, P. (1982). On the asymptotic theory of estimation when the limit of the log-likelihood ratios is mixed normal. Sankhyā Series A, 44(2), 173–212.
Jeganathan, P. (1995). Some aspects of asymptotic theory with applications to time series models. Econometric Theory, 11(5), 818–887. Trending multiple time series (New Haven, CT, 1993).
Jhwueng, D.-C., & Maroulas, V. (2014). Phylogenetic Ornstein-Uhlenbeck regression curves. Statistics and Probability Letters, 89, 110–117.
Lehmann, E. L. (1999). Elements of large-sample theory. Springer Texts in Statistics. Springer-Verlag.
Masuda, H. (2010). Approximate self-weighted LAD estimation of discretely observed ergodic Ornstein-Uhlenbeck processes. Electronic Journal of Statistics, 4, 525–565.
Masuda, H. (2019). Non-Gaussian quasi-likelihood estimation of SDE driven by locally stable Lévy process. Stochastic Processes and Their Applications, 129(3), 1013–1059.
Perninge, M., Knazkins, V., Amelin, M., & Söder, L. (2011). Modeling the electric power consumption in a multi-area system. European Transactions on Electrical Power, 21(1), 413–423.
Samarakoon, D. M. M., & Knight, K. (2009). A note on unit root tests with infinite variance noise. Econometric Reviews, 28(4), 314–334.
Sweeting, T. J. (1980). Uniform asymptotic normality of the maximum likelihood estimator. Annals of Statistics, 8(6), 1375–1381. Corrections: (1982) Annals of Statistics 10, 320.
Verdejo, H., Awerkin, A., Kliemann, W., & Becker, C. (2019). Modelling uncertainties in electrical power systems with stochastic differential equations. International Journal of Electrical Power & Energy Systems, 113, 322–332.
Zacks, S. (1971). The theory of statistical inference. Wiley Series in Probability and Mathematical Statistics. John Wiley & Sons.
Zhang, S., & Zhang, X. (2013). A least squares estimator for discretely observed Ornstein-Uhlenbeck processes driven by symmetric \(\alpha \)-stable motions. Annals of the Institute of Statistical Mathematics, 65(1), 89–103.
Acknowledgements
The author would like to thank the anonymous reviewers for their detailed comments, which helped to fix some essential mistakes and to substantially improve the quality of the paper. This work was partly supported by JSPS KAKENHI Grant Number 22H01139 and JST CREST Grant Number JPMJCR2115, Japan.
Funding
Open access funding provided by The University of Tokyo.
Ethics declarations
Conflicts of interest
The author declares that there is no conflict of interest.