Guyon Joint SPX Vix Calibration
1. Introduction
In [11, 14], it was shown how to build a nonparametric discrete-time arbitrage-free model that perfectly
matches market data on Standard & Poor’s 500 index value (SPX) futures, SPX options, Chicago Board
Options Exchange Volatility Index (VIX) futures, and VIX options. The probability distribution is built by
minimizing the relative entropy with respect to a reference probability measure, and this Schrödinger problem
is numerically solved using an extended Sinkhorn algorithm. This provided the first exact solution to this
joint calibration problem, a difficult problem (especially for short maturities) that had eluded quants for many
years. Jointly calibrating to SPX and VIX futures and options is important to prevent arbitrage and ensure
accurate pricing of liquid hedging instruments; calibrating to VIX derivatives means incorporating market
information on SPX forward volatilities. Figure 1.1 showcases the additional information contributed by the
VIX by quantifying how model-free bounds for SPX path-dependent payoffs tighten when the prices of VIX
futures and VIX options are included. The extra information is also seen in Table 2 where we compare the
prices of several options in models that are all calibrated to SPX smiles but not all calibrated to VIX futures
and smiles. Table 2 shows that to avoid mispricing some payoffs, in particular forward-starting payoffs, which
are sensitive to forward volatilities, it is important that the model also fits VIX futures and VIX options,
even when the payoff depends only on SPX prices.
The aim of this paper is twofold:
(1) Speed up the construction of the discrete-time model by turning to Newton-type methods for
solving the Schrödinger system. In Section 3, we numerically show that a mixed Newton–Sinkhorn
method and an implied Newton method converge much faster than the Sinkhorn algorithm.
(2) Quickly build a continuous-time extension of the model, allowing the pricing of options
depending on SPX values at any date t, while ensuring calibration to the market smiles of SPX and
VIX (Section 4).
Part (1) draws inspiration from De March [6], who explored the entropic approximation for discrete-time
multidimensional martingale optimal transport, excluding VIX, using Newton’s method. In Part (2), our
continuous-time construction may appear similar to the Bass local volatility of [4] but is fundamentally
different. First and foremost, unlike in [4], our purely forward Markov functional construction does not
require solving a fixed-point problem; it is thus much faster. This is because we first build an arbitrage-free
multimarginal discrete-time model consistent with market data. Second, our continuous-time interpolated
model fits not only SPX option prices but also VIX market data.
of Quantitative Finance.
Figure 1.1. Model-free bounds of forward-starting calls (ST2 /ST1 −K)+ , with and without
VIX options data. LB (resp., UB) denotes the lower bound (resp., upper bound) along with
prices obtained with the local volatility model (LV), the two-factor Bergomi stochastic local
volatility model (LV + Bergomi 2F), and the discrete minimum-entropy jointly calibrated
model µ∗ [11] that we build in the next section. We refer to [18] for details on how to
compute model-free bounds.
Combining Parts (1) and (2), we thus quickly (in less than a minute) build a continuous-time model
that is, by construction, exactly calibrated to SPX and VIX smiles and futures. By contrast, other known
continuous-time exact solutions to the joint calibration problem [10, 12] are more involved and demand
significantly more computation time. Approximate parametric continuous-time solutions, including those
based on rough or rough-like path-dependent volatility models, [9, 15, 16, 19], classical stochastic volatility
models [1], or signature-based models [5], are also costly in terms of computation time. This is the main
benefit of our novel discrete-time-continuous-time calibration method: both steps (1) and (2) are much faster
than the known methods that directly calibrate a continuous-time model.
A natural practical application of our continuous-time model is the pricing and hedging of structured
products by exotics desks, see Table 2. With our model, the pricing and hedging of structured products on
the SPX indeed take into account the whole information given by SPX smiles (the risk-neutral distributions of
future SPX values) as well as the whole information brought by VIX futures and VIX smiles (the risk-neutral
distributions of some future SPX implied volatilities). Once the model is calibrated (this takes less than a
minute per VIX expiry), it is straightforward to implement and use, as it is a Markov functional model that
involves simulating only one Brownian motion, along with the VIX at VIX future expiries. The model can
also be used for computing reserves and other valuation adjustments, and for assessing model risk.
After a brief reminder on martingale optimal transport in Section 2, Section 3 deals with Part (1), while
Section 4 is devoted to Part (2), i.e., the continuous-time extension.
written on $S_i$ is the expectation $\mathbb{E}^i[u_i(S_i)] := \mathbb{E}^{\mu_i}[u_i(S_i)]$ of the payoff under $\mu_i$. Similarly, by the absence
of static VIX arbitrage, there exists a risk-neutral measure $\mu_V := \partial^2 C_V / \partial K^2$ such that the price of any vanilla
option $u_V(\cdot)$ written on $V$ is the expectation $\mathbb{E}^V[u_V(V)] := \mathbb{E}^{\mu_V}[u_V(V)]$ of the payoff under $\mu_V$.
From the absence of dynamic SPX arbitrage (or calendar arbitrage), µ1 and µ2 are in the convex order,
i.e., E1 [f (S1 )] ≤ E2 [f (S2 )] for any convex function f : R>0 → R, even if we allow trading in the FSLC at T1 .
By the absence of arbitrage, the price of Si at time 0 is the initial SPX spot value S0 > 0, i.e., Ei [Si ] = S0 .
Furthermore, EV [V ] = FV ≥ 0 where FV is the value at time 0 of the VIX future maturing at T1 . Finally,
for the log contracts and the VIX squared to have finite prices, the following assumption is in force in what
follows.
Assumption 1. The given marginals $\mu_1, \mu_V, \mu_2$ satisfy, for any $i \in \{1, 2\}$, $\mathbb{E}^i[S_i] = S_0$, $\mathbb{E}^i[|\ln S_i|] < \infty$ and
$\mathbb{E}^V[V] = F_V$, $\mathbb{E}^V[V^2] < \infty$.
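Let $\mathcal{X} := \mathbb{R}_{>0} \times \mathbb{R}_{\ge 0} \times \mathbb{R}_{>0}$ and define the strictly convex function $L : x \in \mathbb{R}_{>0} \mapsto -\frac{2}{\tau} \ln x$. For a probability
distribution $\rho$ on $\mathbb{R}$, we denote the associated cumulative distribution as $F_\rho$, i.e., $F_\rho(x) = \rho((-\infty, x])$ for every
$x \in \mathbb{R}$. Let $\mathcal{P}(\mathbb{R}^2_{>0})$ (resp., $\mathcal{P}(\mathcal{X})$) denote the set of all probability measures on $\mathbb{R}^2_{>0}$ (resp., $\mathcal{X}$).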
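Let $\mathcal{U}^V$ be the set of all measurable functions $u_1, u_2 : \mathbb{R}_{>0} \to \mathbb{R}$, $u_V : \mathbb{R}_{\ge 0} \to \mathbb{R}$, $\Delta_S, \Delta_L : \mathbb{R}_{>0} \times \mathbb{R}_{\ge 0} \to \mathbb{R}$
satisfying $u_i \in L^1(\mu_i)$ for $i \in \{1, V, 2\}$, and $\Delta_S, \Delta_L$ bounded. We use the shorthand notation
$$\Delta_S^{(S)}(s_1, v, s_2) := \Delta_S(s_1, v)\,(s_2 - s_1), \qquad \Delta_L^{(L)}(s_1, v, s_2) := \Delta_L(s_1, v)\left(L(s_2/s_1) - v^2\right),$$
to denote the P&Ls from delta-hedging at time T1 in the SPX and the log-contract, respectively. Finally, let
$\mathcal{M}_c(\mu_1, \mu_V, \mu_2)$ denote the set of all VIX-constrained martingale probability measures:
$$\mathcal{M}_c(\mu_1, \mu_V, \mu_2) := \left\{ \mu \in \mathcal{P}(\mathcal{X}) : S_1 \overset{\mu}{\sim} \mu_1,\ V \overset{\mu}{\sim} \mu_V,\ S_2 \overset{\mu}{\sim} \mu_2,\ \mathbb{E}^\mu[S_2 \mid S_1, V] = S_1,\ \mathbb{E}^\mu[L(S_2/S_1) \mid S_1, V] = V^2 \right\}.$$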
2.2. Martingale Optimal Transport (MOT). We first consider a market with two trading days T0 = 0
and T1 , where the financial instruments are the SPX (tradable at T0 and T1 ), the vanilla options on it with
maturities T1 and T2 (tradable at T0 ).
From the theory of MOT [18], the model-free upper bound price for a payoff ψ(s1 , s2 ) is the smallest price
at time 0 of a super-replication portfolio:
(2.1) $$P_\psi := \inf_{(u_1, u_2, \Delta_S) \in \mathcal{U}_\psi} \mathbb{E}^1[u_1(S_1)] + \mathbb{E}^2[u_2(S_2)]$$
where Uψ ⊂ U is the set of all integrable super-replicating portfolios, i.e., the portfolios satisfying the super-
replication constraint:
∀ (s1 , s2 ) ∈ R2>0 , u1 (s1 ) + u2 (s2 ) + ∆S (s1 ) (s2 − s1 ) ≥ ψ (s1 , s2 ) .
This is known as the primal problem, which corresponds to the “physical” or portfolio problem, hence the
notation P . At time T1 , delta-hedging in the SPX is allowed; as the price of S2 at time T1 is precisely S1 ,
the price of the superreplicating portfolio is Eµ1 [u1 (S1 )] + Eµ2 [u2 (S2 )] . The delta ∆S (s1 ) may depend on
the SPX value at time T1 .
The problem Pψ is a linear program, which can be solved with the simplex method
by discretizing (s1 , s2 ) on a two-dimensional grid. When the number of discretization points is high and/or
when dealing with a multi-dimensional asset, a cutting-plane algorithm can be used; see [17] for details. The
lower bound is obtained similarly by replacing inf by sup and F (s1 , s2 ) ≥ ψ(s1 , s2 ) by F (s1 , s2 ) ≤ ψ(s1 , s2 ),
where F (s1 , s2 ) := u1 (s1 ) + u2 (s2 ) + ∆S (s1 ) (s2 − s1 ) denotes the payoff of the portfolio.
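The discretized linear program can be sketched as follows. This is a minimal illustration, assuming finitely supported marginals and using `scipy.optimize.linprog` with the HiGHS solver in place of a hand-written simplex routine; the function name `mot_upper_bound`, the three-point marginals and the quadratic test payoff are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog


def mot_upper_bound(s1, p1, s2, p2, payoff):
    """Discretized primal (2.1): cheapest super-replicating portfolio.

    s1, s2 : grid points for S1 and S2; p1, p2 : their marginal weights.
    Unknowns: u1 on the s1-grid, u2 on the s2-grid, the delta on the s1-grid.
    """
    s1, p1, s2, p2 = (np.asarray(a, dtype=float) for a in (s1, p1, s2, p2))
    n1, n2 = len(s1), len(s2)
    # Objective E^1[u1] + E^2[u2]; the delta hedge has zero cost.
    cost = np.concatenate([p1, p2, np.zeros(n1)])
    A = np.zeros((n1 * n2, 2 * n1 + n2))
    b = np.zeros(n1 * n2)
    for i in range(n1):
        for j in range(n2):
            row = i * n2 + j
            # u1 + u2 + delta * (s2 - s1) >= payoff, written as -(...) <= -payoff
            A[row, i] = -1.0
            A[row, n1 + j] = -1.0
            A[row, n1 + n2 + i] = -(s2[j] - s1[i])
            b[row] = -payoff(s1[i], s2[j])
    res = linprog(cost, A_ub=A, b_ub=b,
                  bounds=[(None, None)] * len(cost), method="highs")
    if res.status != 0:
        raise RuntimeError(res.message)
    return res.fun


# Hypothetical marginals in convex order, both with mean 100.
s1, p1 = [90.0, 100.0, 110.0], [0.25, 0.5, 0.25]
s2, p2 = [80.0, 100.0, 120.0], [0.25, 0.5, 0.25]
quad = lambda x, y: (y - x) ** 2
upper = mot_upper_bound(s1, p1, s2, p2, quad)
lower = -mot_upper_bound(s1, p1, s2, p2, lambda x, y: -quad(x, y))
```

For the quadratic payoff (s2 − s1)², every martingale coupling gives Var(S2) − Var(S1) = 200 − 50 = 150, so the two bounds coincide; for a forward-starting call the same routine would return a genuine interval.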
The dual problem (or “measure problem”) of super-replicating the payoff ψ (S1 , S2 ) is one of maximizing
the expected payoff Eµ [ψ (S1 , S2 )] over all probability measures µ ∈ M (µ1 , µ2 ):
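$$D_\psi := \sup_{\mu \in \mathcal{M}(\mu_1, \mu_2)} \mathbb{E}^\mu[\psi(S_1, S_2)]$$
where $\mathcal{M}(\mu_1, \mu_2)$ denotes the set of all martingale probability measures $\mu$ on $\mathbb{R}_{>0} \times \mathbb{R}_{>0}$:
(2.2) $$\mathcal{M}(\mu_1, \mu_2) = \left\{ \mu \in \mathcal{P}(\mathbb{R}^2_{>0}) : S_1 \overset{\mu}{\sim} \mu_1,\ S_2 \overset{\mu}{\sim} \mu_2,\ \mathbb{E}^\mu[S_2 \mid S_1] = S_1 \right\}.$$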
2.3. Dispersion-constrained MOT. When we add VIX information, we allow for static hedging in VIX
vanilla options and dynamic hedging at T1 in the FSLC, and the hedges at T1 can now depend on the VIX.
The theory of VIX-constrained MOT was studied in [11, 14] and the model-free upper bound price for a
payoff ψ(s1 , v, s2 ) becomes
(2.3) $$P^V_\psi := \inf_{(u_1, u_V, u_2, \Delta_S, \Delta_L) \in \mathcal{U}^V_\psi} \mathbb{E}^1[u_1(S_1)] + \mathbb{E}^V[u_V(V)] + \mathbb{E}^2[u_2(S_2)],$$
where UψV ⊂ U V is the set of all integrable super-replicating portfolios, i.e., the portfolios satisfying the
super-replication constraint:
$$u_1(s_1) + u_V(v) + u_2(s_2) + \Delta_S(s_1, v)\,(s_2 - s_1) + \Delta_L(s_1, v)\left(L(s_2/s_1) - v^2\right) \ge \psi(s_1, v, s_2)$$
for all (s1 , v, s2 ) ∈ R>0 × R≥0 × R>0 . Observe now that at time T1 , delta-hedging in the SPX and the FSLC
is allowed. The respective deltas ∆S (s1 , v) and ∆L (s1 , v) may depend on the values s1 and v of the SPX and
the VIX at T1 .
Similarly, the linear program PψV can be solved using the simplex algorithm by discretizing (s1 , v, s2 ) on
a three-dimensional grid, and the dual problem (or “measure problem”) becomes
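$$D^V_\psi := \sup_{\mu \in \mathcal{M}_c(\mu_1, \mu_V, \mu_2)} \mathbb{E}^\mu[\psi(S_1, V, S_2)]$$
where $\mathcal{M}_c(\mu_1, \mu_V, \mu_2)$ denotes the set of all VIX-constrained martingale probability measures $\mu$ on $\mathbb{R}_{>0} \times \mathbb{R}_{\ge 0} \times \mathbb{R}_{>0}$:
(2.4) $$\mathcal{M}_c(\mu_1, \mu_V, \mu_2) := \left\{ \mu \in \mathcal{P}(\mathbb{R}_{>0} \times \mathbb{R}_{\ge 0} \times \mathbb{R}_{>0}) : S_1 \overset{\mu}{\sim} \mu_1,\ V \overset{\mu}{\sim} \mu_V,\ S_2 \overset{\mu}{\sim} \mu_2,\ \mathbb{E}^\mu[S_2 \mid S_1, V] = S_1,\ \mathbb{E}^\mu[L(S_2/S_1) \mid S_1, V] = V^2 \right\}.$$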
As shown in [11, 14], the absence of joint SPX/VIX arbitrage is equivalent to $\mathcal{M}_c(\mu_1, \mu_V, \mu_2) \neq \emptyset$ and to
$P^V_0 > -\infty$.
Additionally, in any case, Dµ̄ = Pµ̄ . Note that (P) is an unconstrained concave maximization problem. Both
problems are dual to each other; following a terminology proposed by Dupire, (M) is a measure problem
while (P) is a portfolio problem.
As in practice, only a finite number of SPX and VIX vanilla options are available for trading, we consider
vanilla payoffs u1 , uV , and u2 that are linear combinations of finitely many call options, along with one
position in the bond, one position in S1 , and one position in the VIX futures. Therefore we consider
market data $\mathcal{K}$ composed of call options on $S_1$, $V$, and $S_2$, denoted $(C^1_K)_{K \in \mathcal{K}_1}$, $(C^V_K)_{K \in \mathcal{K}_V}$, and $(C^2_K)_{K \in \mathcal{K}_2}$,
with respective strikes $\mathcal{K}_1$, $\mathcal{K}_V$, and $\mathcal{K}_2$, and we build a model of the form
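$$\frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} = e_\theta(S_1, V, S_2),$$
$$e_\theta(s_1, v, s_2) := \exp\Big( c + \Delta^0_S s_1 + \Delta^0_V v + \sum_{\mathcal{K}_1} a^1_K (s_1 - K)_+ + \sum_{\mathcal{K}_V} a^V_K (v - K)_+ + \sum_{\mathcal{K}_2} a^2_K (s_2 - K)_+ + \Delta_S^{(S)}(s_1, v, s_2) + \Delta_L^{(L)}(s_1, v, s_2) \Big),$$
where for $i \in \{1, V, 2\}$, $\sum_{\mathcal{K}_i}$ is a shorthand notation for $\sum_{K \in \mathcal{K}_i}$ and $\theta$ is an element of the set $\Theta$ of all
$$\theta := \left( c, \Delta^0_S, \Delta^0_V, a^1, a^V, a^2, \Delta_S, \Delta_L \right)$$
such that $c, \Delta^0_S, \Delta^0_V \in \mathbb{R}$, $a^1 \in \mathbb{R}^{\mathcal{K}_1}$, $a^V \in \mathbb{R}^{\mathcal{K}_V}$, $a^2 \in \mathbb{R}^{\mathcal{K}_2}$, and $\Delta_S, \Delta_L : \mathbb{R}_{>0} \times \mathbb{R}_{\ge 0} \to \mathbb{R}$ are bounded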
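measurable functions of $(s_1, v)$. The measure $\mu_{\mathcal{K},\theta}$ is then a consistent, arbitrage-free model that jointly
calibrates to the market prices of SPX/VIX futures and options if and only if $\theta$ solves the so-called $\mathcal{K}$-Schrödinger system

(3.1)
$$\begin{aligned}
&\mathbb{E}^{\bar\mu}\left[ \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \right] = 1, \\
&\mathbb{E}^{\bar\mu}\left[ S_1 \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \right] = S_0, \\
&\mathbb{E}^{\bar\mu}\left[ V \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \right] = F_V, \\
&\forall K \in \mathcal{K}_1, \quad \mathbb{E}^{\bar\mu}\left[ (S_1 - K)_+ \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \right] = C^1_K, \\
&\forall K \in \mathcal{K}_V, \quad \mathbb{E}^{\bar\mu}\left[ (V - K)_+ \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \right] = C^V_K, \\
&\forall K \in \mathcal{K}_2, \quad \mathbb{E}^{\bar\mu}\left[ (S_2 - K)_+ \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \right] = C^2_K, \\
&\forall s_1 > 0,\ v \ge 0, \quad \mathbb{E}^{\bar\mu}\left[ (S_2 - S_1) \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \,\Big|\, S_1 = s_1, V = v \right] = 0, \\
&\forall s_1 > 0,\ v \ge 0, \quad \mathbb{E}^{\bar\mu}\left[ \left( L\left(\frac{S_2}{S_1}\right) - V^2 \right) \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \,\Big|\, S_1 = s_1, V = v \right] = 0.
\end{aligned}$$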
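The first equation states that $\mu_{\mathcal{K},\theta}$ is a probability measure, while the others state that it belongs to the set
$\mathcal{M}_{c,\mathcal{K}}(\mu_1, \mu_V, \mu_2)$ of probability measures $\mu$ satisfying
$$\mathbb{E}^\mu[S_1] = S_0, \quad \mathbb{E}^\mu[V] = F_V, \quad \forall K \in \mathcal{K}_1,\ \mathbb{E}^\mu[(S_1 - K)_+] = C^1_K, \quad \forall K \in \mathcal{K}_V,\ \mathbb{E}^\mu[(V - K)_+] = C^V_K,$$
$$\forall K \in \mathcal{K}_2,\ \mathbb{E}^\mu[(S_2 - K)_+] = C^2_K, \quad \mathbb{E}^\mu[S_2 \mid S_1, V] = S_1, \quad \mathbb{E}^\mu[L(S_2/S_1) \mid S_1, V] = V^2.$$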
Remark 2 (On the choice of the reference measure µ̄). The prior measure µ̄ is left to the modeler’s choice.
Examples include:
(1) lognormal prior: $\bar\mu(ds_1, dv, ds_2) = \nu(ds_1, dv)\, T(s_1, v, ds_2)$ where $\nu = \mu_1 \otimes \mu_V$ and $T(s_1, v, ds_2)$ is the
distribution of $s_1 \exp\left(v \sqrt{\tau}\, G - \frac{1}{2} v^2 \tau\right)$ where $G \sim N(0, 1)$. That is, $S_1$ and $V$ are independent with
their respective marginals $\mu_1$ and $\mu_V$, and conditional on $S_1$ and $V$, $S_2$ is lognormal with mean $S_1$
and annualized lognormal volatility $V$, with a probability density function of the form¹
$$T(s_1, v, s_2) = \frac{1}{v \sqrt{\tau} \sqrt{2\pi}\, s_2} \exp\left( -\frac{1}{2 v^2 \tau} \left( \ln\frac{s_2}{s_1} + \frac{v^2 \tau}{2} \right)^2 \right) \mathbf{1}_{s_2 > 0}.$$
(2) independent prior (product measure): µ̄ = µ1 ⊗ µV ⊗ µ2 . This is the standard choice made in the
entropic optimal transport literature. However, financially speaking, this reference measure is not
natural as it does not satisfy both the martingality and consistency conditions.
¹ Assuming that $S_2$ is lognormal conditioned on $S_1$ and $V$ is financially natural but may not be the best choice in practice.
Indeed, in this case, $\mathbb{E}^{\bar\mu}[e^{\delta S_2} \mid S_1, V] = +\infty$ for $\delta > 0$, so the integrals are not well defined for many reasonable values of the
parameters $u_1, u_V, u_2, \Delta_S, \Delta_L$. In practice, we avoid those integrability issues by working with a finite support approximation
of µ̄ stemming from the Gaussian quadrature approximation of the integrals (see Remark 3).
Remark 3 (Quadrature). To evaluate the expectations arising in (3.1), we use a Gauss–Legendre quadrature
when integrating with respect to $s_1$ (resp., $v$) with grid $G_1 := \{s_1^{(1)} \le \cdots \le s_1^{(n_1)}\}$ (resp., $G_V := \{v^{(1)} \le \cdots \le v^{(n_V)}\}$).
For $s_2$, in the case of the lognormal prior, we use a Gauss–Hermite quadrature with knots
$\{z^{(1)}, \ldots, z^{(n_2)}\}$. For example, in the lognormal case, we use the approximation
$$\mathbb{E}^{\bar\mu}[e_\theta(S_1, V, S_2)] = \int_{\mathbb{R}^3} e_\theta(s_1, v, s_2)\, \bar\mu(ds_1, dv, ds_2)$$
(3.2) $$\approx \sum_{i=1}^{n_1} \sum_{j=1}^{n_V} \sum_{k=1}^{n_2} \omega_{Le}^{(i)} \omega_{Le}^{(j)} \omega_{He}^{(k)}\, e_\theta\left( s_1^{(i)}, v^{(j)}, s_1^{(i)} e^{v^{(j)} \sqrt{\tau}\, z^{(k)} - \frac{1}{2} (v^{(j)})^2 \tau} \right) \mu_1(s_1^{(i)})\, \mu_V(v^{(j)})$$
where the $\omega_{He}$ (resp., $\omega_{Le}$) denote the Hermite (resp., Legendre) quadrature weights.
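The inner sum over k in (3.2) can be illustrated with a short sketch. It assumes the probabilists' Gauss–Hermite convention (`numpy.polynomial.hermite_e.hermegauss`, weight exp(−z²/2)) with weights normalized to sum to one; the helper name `lognormal_transition_nodes` and the numerical values of s1, v and τ are hypothetical. The check confirms that the discretized lognormal prior satisfies the martingality and consistency conditions at a given (s1, v).

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def lognormal_transition_nodes(s1, v, tau, n2=25):
    """Nodes and normalized weights for S2 given (S1, V) = (s1, v) under the lognormal prior."""
    z, w = hermegauss(n2)            # quadrature for the weight exp(-z^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)     # rescale so the weights are N(0, 1) probabilities
    s2 = s1 * np.exp(v * np.sqrt(tau) * z - 0.5 * v**2 * tau)
    return s2, w


def L(x, tau):
    """Log-contract payoff L(x) = -(2 / tau) ln x."""
    return -2.0 / tau * np.log(x)


tau = 30.0 / 365.0                   # hypothetical T2 - T1 in years
s1_val, v_val = 1.0, 0.2             # hypothetical grid point (s1, v)
s2_nodes, w = lognormal_transition_nodes(s1_val, v_val, tau)

martingale_gap = np.dot(w, s2_nodes) - s1_val                        # E[S2 | S1, V] - S1
consistency_gap = np.dot(w, L(s2_nodes / s1_val, tau)) - v_val**2    # E[L(S2/S1) | S1, V] - V^2
```

Both gaps vanish up to rounding, which is why the lognormal prior is a natural starting point: the exponential tilt only has to correct the marginals.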
3.2. Solving the Schrödinger system (3.1).
3.2.1. The Sinkhorn algorithm. The classical method for solving Schrödinger systems is the Sinkhorn al-
gorithm, an iterative method that sequentially solves the individual equations in (3.1) to converge to the
optimizer θ∗ of the whole system. This algorithm has recently gained popularity in machine learning where
it is used to quickly compute Wasserstein distances, and more generally solve optimal transport problems,
via a small entropic penalty. It has also been applied to martingale optimal transport problems [6] and in
particular in quantitative finance to quickly build arbitrage-free smiles [7], see also [18] and references therein
for a large overview of MOT. In [14] the Sinkhorn algorithm was extended to accommodate the martingality
and consistency constraints in (3.1), and shown to converge toward a jointly calibrated model. However,
the convergence was somewhat slow (see Section 3.2.4). In the following sections, we present two faster
alternatives for numerically solving (3.1); both rely on solving the portfolio problem (P).
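For readers unfamiliar with the method, here is a minimal sketch of the classical two-marginal Sinkhorn iteration on a finite grid. It deliberately omits the martingality and consistency constraints handled by the extension of [14]; the function name `sinkhorn` and the toy marginals and prior are illustrative assumptions.

```python
import numpy as np


def sinkhorn(prior, mu1, mu2, n_iter=500):
    """Alternately rescale rows and columns of `prior` until both marginals match."""
    a = np.ones_like(mu1)
    b = np.ones_like(mu2)
    for _ in range(n_iter):
        a = mu1 / (prior @ b)        # solve the first-marginal equations given b
        b = mu2 / (prior.T @ a)      # solve the second-marginal equations given a
    return a[:, None] * prior * b[None, :]


mu1 = np.array([0.2, 0.5, 0.3])
mu2 = np.array([0.4, 0.6])
prior = np.array([[0.3, 0.1], [0.2, 0.2], [0.1, 0.1]])   # any positive reference measure
coupling = sinkhorn(prior, mu1, mu2)
```

Each update solves one block of equations exactly while holding the other potential fixed, which is the "sequential" structure the text refers to.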
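3.2.2. The Newton–Sinkhorn algorithm. Observe that if we define the concave function
$$J_{\bar\mu,\mathcal{K}}(\theta) := c + \Delta^0_S S_0 + \Delta^0_V F_V + \sum_{i \in \{1, V, 2\}} \sum_{K \in \mathcal{K}_i} a^i_K C^i_K - \mathbb{E}^{\bar\mu}[e_\theta(S_1, V, S_2)] + 1,$$
solving the $\mathcal{K}$-Schrödinger system is equivalent to canceling the gradient of $J_{\bar\mu,\mathcal{K}}$. Hence, to solve the system
and build µ∗ , one can directly solve the portfolio problem
(3.3) $$P_{\bar\mu,\mathcal{K}} := \sup_{\theta \in \Theta} J_{\bar\mu,\mathcal{K}}(\theta),$$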
which is the finitely-many-payoff version of (P). To this end, we suggest the following Newton–Sinkhorn
algorithm. Each iteration involves a Newton step followed by a Sinkhorn step.
Newton step. Starting from an initial guess $\theta^{(0)}$, we first solve for every iteration $n \in \mathbb{N}$ the portfolio
problem (3.3)
(3.4) $$\theta_{-\Delta}^{(n+1)} = \arg\max_{\theta_{-\Delta} \in \Theta_{-\Delta}} J_{\bar\mu,\mathcal{K}}\left(\theta_{-\Delta}, \Delta_S^{(n)}(s_1, v), \Delta_L^{(n)}(s_1, v)\right), \quad \forall (s_1, v) \in G_1 \times G_V,$$
where $\theta_{-\Delta} := \left(c, \Delta^0_S, \Delta^0_V, a^1, a^V, a^2\right)$. Since the Hessian of $J_{\bar\mu,\mathcal{K}}\left(\theta_{-\Delta}, \Delta_S^{(n)}(s_1, v), \Delta_L^{(n)}(s_1, v)\right)$ is known in
closed form, this Newton step is extremely fast. We solve (3.4) using the function scipy.optimize.minimize
(method="trust-exact") from the scipy library.
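A toy one-marginal analogue of this Newton step may help fix ideas. The sketch below maximizes an objective of the same form as Jµ̄,K (linear term minus the expected exponential tilt) with closed-form gradient and Hessian, using `scipy.optimize.minimize(method="trust-exact")` on the negated objective. The grid, the Gaussian-shaped prior weights, the single strike and the synthetic targets are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Toy prior: weights w on a grid x; features are (1, x, (x - K)+) with K = 1.
x = np.linspace(0.5, 1.5, 201)
w = np.exp(-0.5 * ((x - 1.0) / 0.2) ** 2)
w /= w.sum()
phi = np.stack([np.ones_like(x), x, np.maximum(x - 1.0, 0.0)], axis=1)

# Synthetic "market" targets generated from a known tilt, so that a solution exists.
q = w * np.exp(2.0 * x - 5.0 * np.maximum(x - 1.0, 0.0))
q /= q.sum()
targets = phi.T @ q                  # (total mass, forward, call price)


def neg_J(theta):
    # J(theta) = theta . targets - E[exp(theta . phi)] + 1 (concave), negated for minimize
    return -(theta @ targets - w @ np.exp(phi @ theta) + 1.0)


def neg_grad(theta):
    return -(targets - phi.T @ (w * np.exp(phi @ theta)))


def neg_hess(theta):
    return phi.T @ (phi * (w * np.exp(phi @ theta))[:, None])


res = minimize(neg_J, np.zeros(3), jac=neg_grad, hess=neg_hess,
               method="trust-exact", options={"gtol": 1e-10})
theta_star = res.x
```

Canceling the gradient is exactly the calibration condition: at `theta_star` the tilted measure reproduces the three targets, and the last two components recover the tilt (2, −5) used to generate them.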
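Sinkhorn step. Then, for all $(s_1, v) \in G_1 \times G_V$, we jointly solve for $\Delta_S^{(n+1)}(s_1, v)$, $\Delta_L^{(n+1)}(s_1, v)$ the two-dimensional
nonlinear system
(3.5) $$f_{s_1,v}\left( \Delta_S^{(n+1)}(s_1, v), \Delta_L^{(n+1)}(s_1, v), a^{2,(n+1)} \right) = 0,$$
(3.6) $$g_{s_1,v}\left( \Delta_S^{(n+1)}(s_1, v), \Delta_L^{(n+1)}(s_1, v), a^{2,(n+1)} \right) = 0,$$
where $a^{2,(n+1)}$ is the optimal vector $a^2$ from the previous step (3.4) and for all $(x, y) \in \mathbb{R}^2$,
$$f_{s_1,v}\left(x, y, a^2\right) := \int_{\mathbb{R}} (s_2 - s_1)\, e^{\sum_{K \in \mathcal{K}_2} a^2_K (s_2 - K)_+ + x (s_2 - s_1) + y \left( L\left(\frac{s_2}{s_1}\right) - v^2 \right)}\, \bar\mu(s_1, v, ds_2),$$
$$g_{s_1,v}\left(x, y, a^2\right) := \int_{\mathbb{R}} \left( L\left(\frac{s_2}{s_1}\right) - v^2 \right) e^{\sum_{K \in \mathcal{K}_2} a^2_K (s_2 - K)_+ + x (s_2 - s_1) + y \left( L\left(\frac{s_2}{s_1}\right) - v^2 \right)}\, \bar\mu(s_1, v, ds_2).$$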
This second step is a Sinkhorn step, where the last two equations of (3.1) are jointly solved, hence the name
Newton–Sinkhorn. We use the Levenberg–Marquardt algorithm via scipy.optimize.root(method="lm")
to solve (3.5)-(3.6).
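As an illustration, the system (3.5)-(3.6) at a single grid point can be sketched as follows, with the integrals against the lognormal prior replaced by a normalized Gauss–Hermite sum. The values of s1, v, τ, the single strike and the call weight `a2` are hypothetical; the closed-form Jacobian is passed to `scipy.optimize.root` for robustness, which the text does not specify.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.optimize import root

tau, s1, v = 30.0 / 365.0, 1.0, 0.2            # hypothetical grid point and horizon
z, w = hermegauss(25)
w = w / np.sqrt(2.0 * np.pi)                   # N(0, 1) probabilities
s2 = s1 * np.exp(v * np.sqrt(tau) * z - 0.5 * v**2 * tau)   # lognormal prior nodes
L = -2.0 / tau * np.log(s2 / s1)               # log-contract payoff at the nodes
strikes = np.array([1.0])                      # hypothetical strike set K_2
a2 = np.array([0.5])                           # hypothetical weights from the Newton step


def density(xy):
    x, y = xy
    calls = np.maximum(s2[:, None] - strikes[None, :], 0.0) @ a2
    return w * np.exp(calls + x * (s2 - s1) + y * (L - v**2))


def fg(xy):
    d = density(xy)
    return np.array([np.dot(s2 - s1, d), np.dot(L - v**2, d)])   # (f, g)


def fg_jac(xy):
    d = density(xy)
    a, b = s2 - s1, L - v**2
    return np.array([[np.dot(a * a, d), np.dot(a * b, d)],
                     [np.dot(a * b, d), np.dot(b * b, d)]])


sol = root(fg, x0=np.zeros(2), jac=fg_jac, method="lm",
           options={"xtol": 1e-12, "ftol": 1e-12})
delta_S, delta_L = sol.x
```

The two-by-two systems are independent across grid points, so in a full implementation they could be solved in a loop or in parallel over G1 × GV.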
Remark 4 (Full Newton algorithm and parametrization of the deltas). As an alternative approach, we
could use a full Newton algorithm on the vector θ. However, the number of parameters for the deltas
(∆S (s1 , v), ∆L (s1 , v))(s1 ,v)∈G1 ×GV is typically too large for the Newton algorithm to be fast: the inversion
of the Hessian matrix required in Newton-type algorithms is too costly. For example, taking n1 = nv = 45
knots for each Legendre quadrature leads to 2 × 45 × 45 = 4050 values for ∆S (·, ·) and ∆L (·, ·), while only a
few dozen parameters for θ−∆ are required for the algorithm to converge (see section 3.2.4 for further details).
This is the reason why we mix a Newton solver for θ−∆ with a Sinkhorn solver for ∆S (·, ·) and ∆L (·, ·).
The previous algorithms are nonparametric in the sense that we find the values of ∆S and ∆L on the two
given quadrature grids G1 and GV . To reduce the number of parameters, it is natural to try parametrizing
the functions $(s_1, v) \mapsto \Delta_S(s_1, v)$ and $(s_1, v) \mapsto \Delta_L(s_1, v)$ and see if we can make the algorithm converge
even faster. We tried multivariate polynomials, i.e., for two-dimensional bases $\Psi^S(\cdot, \cdot)$, $\Psi^L(\cdot, \cdot)$ and degrees
$d_S, d_L \in \mathbb{N}_{>0}$, we set
$$\Delta_S(s_1, v) = \sum_{|\ell| \le d_S} b^S_\ell\, \Psi^S_\ell(s_1, v), \qquad \Delta_L(s_1, v) = \sum_{|\ell| \le d_L} b^L_\ell\, \Psi^L_\ell(s_1, v).$$
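Here, the martingality and consistency constraints of (3.1) (last two equations) become
$$\mathbb{E}^{\bar\mu}\left[ \Psi^S_\ell(S_1, V)\, (S_2 - S_1)\, \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \right] = 0, \quad \forall |\ell| \le d_S,$$
$$\mathbb{E}^{\bar\mu}\left[ \Psi^L_\ell(S_1, V) \left( L\left(\frac{S_2}{S_1}\right) - V^2 \right) \frac{d\mu_{\mathcal{K},\theta}}{d\bar\mu} \right] = 0, \quad \forall |\ell| \le d_L.$$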
They are weaker than the original constraints as we are only projecting the random variables onto the space
generated by our basis functions. In practice, we tried many different choices of basis (Legendre, Hermite,
monomials, Fourier) but in each case, the martingale and consistency conditions were not satisfied with
good enough accuracy, even with large degrees such as dS = dL = 20. We also tried to parametrize ∆S , ∆L
with two feed-forward neural networks. We tried many different architectures (different numbers of hidden
layers, neurons and activation functions). Again, the algorithm converged but the martingale and consistency
conditions were never satisfied with sufficient accuracy.
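$$\theta^* = \arg\max_{\theta \in \Theta} J_{\bar\mu,\mathcal{K}}(\theta) = \arg\max_{\theta_{-\Delta} \in \Theta_{-\Delta}} \tilde{J}_{\bar\mu,\mathcal{K}}\left( \theta_{-\Delta}, \Delta^*_S(\cdot, \cdot, a^2), \Delta^*_L(\cdot, \cdot, a^2) \right)$$
$$\begin{cases} f_{s_1,v}\left( \Delta^*_S(\cdot, \cdot), \Delta^*_L(\cdot, \cdot), a^2 \right) = 0, \\ g_{s_1,v}\left( \Delta^*_S(\cdot, \cdot), \Delta^*_L(\cdot, \cdot), a^2 \right) = 0. \end{cases}$$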
That is, for each θ−∆ , we first optimize over ∆S (·, ·) and ∆L (·, ·), and then we optimize over θ−∆ . Note that
the inner optimization depends on θ−∆ only through a2 .
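Like for $J_{\bar\mu,\mathcal{K}}$, the gradient and Hessian of $\tilde{J}_{\bar\mu,\mathcal{K}}$ are known in closed form. In fact, $\tilde{J}_{\bar\mu,\mathcal{K}}$ has the same
gradient and Hessian as $J_{\bar\mu,\mathcal{K}}$, except for the terms involving differentiation with respect to $a^2$. Namely, for
every $m, n \in \mathcal{K}_2$, we have
$$\partial_{a^2_m} \tilde{J}_{\bar\mu,\mathcal{K}}(\lambda) = \int_{\mathbb{R}^3} \Big[ (s_2 - m)_+ + \partial_{a^2_m} \Delta^*_S(s_1, v, a^2)\,(s_2 - s_1) + \partial_{a^2_m} \Delta^*_L(s_1, v, a^2)\,(L(s_2/s_1) - v^2) \Big]\, e^*_{\theta_{-\Delta}}(s_1, v, s_2)\, \bar\mu(ds_1, dv, ds_2) - C^2_m$$
and
$$\begin{aligned}
\partial_{a^2_m, a^2_n} \tilde{J}_{\bar\mu,\mathcal{K}}(\lambda) = \int_{\mathbb{R}^3} \Big\{ & \partial_{a^2_m, a^2_n} \Delta^*_S\,(s_2 - s_1) + \partial_{a^2_m, a^2_n} \Delta^*_L\,(L(s_2/s_1) - v^2) \\
& + \Big[ (s_2 - m)_+ + \partial_{a^2_m} \Delta^*_S\,(s_2 - s_1) + \partial_{a^2_m} \Delta^*_L\,(L(s_2/s_1) - v^2) \Big] \\
& \times \Big[ (s_2 - n)_+ + \partial_{a^2_n} \Delta^*_S\,(s_2 - s_1) + \partial_{a^2_n} \Delta^*_L\,(L(s_2/s_1) - v^2) \Big] \Big\}\, e^*_{\theta_{-\Delta}}(s_1, v, s_2)\, \bar\mu(ds_1, dv, ds_2),
\end{aligned}$$
where $e^*_{\theta_{-\Delta}}(s_1, v, s_2) := e_{(\theta_{-\Delta},\, \Delta^*_S(\cdot, \cdot, a^2),\, \Delta^*_L(\cdot, \cdot, a^2))}(s_1, v, s_2)$.
The derivatives $\partial_{a^2_m} \Delta^*_S$ and $\partial_{a^2_m} \Delta^*_L$ are computed as follows. Taking the derivative $\partial_{a^2_m}$ in
$$f_{s_1,v}\left( \Delta^*_S(s_1, v, a^2), \Delta^*_L(s_1, v, a^2), a^2 \right) = 0,$$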
3.2.4. Comparison of the different algorithms. All the numerical tests were performed on a MacBook Pro
laptop with a 2.6 GHz 6-Core Intel Core i7 processor and 32 GB memory using the programming language
Python 3.9.6.
To compare the various algorithms, we plot the logarithm (in base 10) of the calibration error for the
SPX smiles at T1 , T2 and the VIX smile at T1 as a function of the computational time as of August 1, 2018
(T1 = 21 days), April 20, 2020 (T1 = 30 days), and April 25, 2022, (T1 = 23 days), see Figure 3.1. The
calibration error is computed as
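(3.10) $$\sum_{\ell \in \{1, V, 2\}} \frac{1}{|\mathcal{K}_\ell|} \sum_{K \in \mathcal{K}_\ell} \frac{\left| \sigma_{BS}^{\ell,\mu^*}(K) - \sigma_{BS}^{\ell,\mathrm{mkt}}(K) \right|}{\sigma_{BS}^{\ell,\mathrm{mkt}}(K)} + \sum_{A \in \{S_1, V, S_2\}} \frac{\left| \mathbb{E}^{\bar\mu}\left[ A \frac{d\mu^*}{d\bar\mu} \right] - F_A \right|}{F_A} + \left| \mathbb{E}^{\bar\mu}\left[ \frac{d\mu^*}{d\bar\mu} \right] - 1 \right|,$$
where, for any $\ell \in \{1, V, 2\}$, $\sigma_{BS}^{\ell,\mu^*}(K)$ corresponds to the Black–Scholes implied volatility computed with one
of the different algorithms (Sinkhorn, Newton–Sinkhorn, or implied Newton) and $\sigma_{BS}^{\ell,\mathrm{mkt}}(K)$ is the market
implied volatility. The calibration error (3.10) is thus defined as the sum of the absolute relative errors on
the three futures, the mean absolute relative errors on the three smiles, and the absolute error on the total mass
of µ∗ (recall the constraint that µ∗ must be a probability measure).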
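The error metric (3.10) is straightforward to compute once model and market quantities are available. The sketch below assumes they are supplied as plain dictionaries; the function name `calibration_error`, the dictionary keys and the numbers in the usage example are hypothetical.

```python
import numpy as np


def calibration_error(model_ivs, market_ivs, model_futures, market_futures, total_mass):
    """Mean relative smile errors + relative futures errors + mass error, as in (3.10).

    model_ivs / market_ivs : dicts keyed by smile label, implied vols per strike.
    model_futures / market_futures : dicts keyed by underlying label.
    total_mass : model value of E[d mu* / d mu_bar], which should equal 1.
    """
    smile_term = sum(
        np.mean(np.abs(np.asarray(model_ivs[k]) - np.asarray(market_ivs[k]))
                / np.asarray(market_ivs[k]))
        for k in market_ivs
    )
    futures_term = sum(
        abs(model_futures[a] - market_futures[a]) / market_futures[a]
        for a in market_futures
    )
    return smile_term + futures_term + abs(total_mass - 1.0)


# Made-up numbers: one smile with two strikes, one future, slightly wrong mass.
err = calibration_error(
    model_ivs={"1": [0.22, 0.20]}, market_ivs={"1": [0.20, 0.20]},
    model_futures={"S1": 101.0}, market_futures={"S1": 100.0},
    total_mass=1.002,
)
```

In this made-up example the three contributions are 0.05 (smile), 0.01 (future) and 0.002 (mass).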
We choose the lognormal prior for the reference measure µ̄ (see Remark 2) and we take n1 = nv = 45
knots for the integration with respect to s1 and v, and n2 = 25 knots for the integration with respect to s2 .
We set $s_1^{(1)} = F_{\mu_1}^{-1}(q)$, $v^{(1)} = F_{\mu_V}^{-1}(q)$ and $s_1^{(n_1)} = F_{\mu_1}^{-1}(1 - q)$, $v^{(n_V)} = F_{\mu_V}^{-1}(1 - q)$ with $q = 10^{-3}$ for the
lowest and highest values of the quadrature grids of $s_1$ and $v$.
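A possible way to build such a grid is sketched below. It assumes the q and 1 − q quantiles are used as the endpoints of the integration interval onto which the Gauss–Legendre knots are mapped (the knots themselves then lie strictly inside that interval), and it uses a lognormal distribution as a stand-in for the market marginal µ1; both choices, as well as the name `legendre_grid`, are assumptions.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.stats import lognorm


def legendre_grid(lower, upper, n):
    """Gauss-Legendre knots and weights mapped from [-1, 1] to [lower, upper]."""
    x, w = leggauss(n)
    knots = 0.5 * (upper - lower) * x + 0.5 * (upper + lower)
    weights = 0.5 * (upper - lower) * w
    return knots, weights


mu1 = lognorm(s=0.06, scale=1.0)     # stand-in for the market marginal of S1
q = 1e-3
knots, weights = legendre_grid(mu1.ppf(q), mu1.ppf(1.0 - q), n=45)
mass = np.dot(weights, mu1.pdf(knots))   # quadrature estimate of the mass between the two quantiles
```

The captured mass is 1 − 2q by construction, which gives a quick sanity check on the grid before it is used in (3.2).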
We compare the calibration speed of the Sinkhorn (S), Newton–Sinkhorn (NS), and implied Newton
(IN) algorithms. For IN, we initialize the algorithm with 10 iterations of a (pure) S algorithm (warm-start
procedure). A warm-start initialization of NS yielded no improvement. The fact that a Sinkhorn warm-start
speeds up the convergence of IN was already reported in [6, section 7.1] where the author states: “We notice
that the Bregman projection [i.e., Sinkhorn] algorithm is more effective at the beginning, to find the optimal
region, and then it converges slower. In contrast, the Newton algorithm is slow at the beginning when it is
searching the neighborhood of the optimum, but when it finds this neighborhood, the convergence gets very
fast. Then it makes sense to apply a hybrid algorithm that starts with Bregman projections, and concludes
with the Newton method.”
In Figure 3.1, we observe that IN is the fastest, and S the slowest. To emphasize the performance of IN,
we report in Table 1 the ratio of the calibration error obtained with either S or NS over the calibration error
obtained with IN after 30 and 60 seconds. For example, after 60 seconds the implied Newton is 17 (resp.,
12) times more accurate than the Newton–Sinkhorn (resp., Sinkhorn) as of August 01, 2018.
In Figure 3.2, we plot the individual calibration errors on the futures, implied volatilities, and total mass
of the measure, as defined in (3.10). Interestingly, it seems that the calibration error is mainly due to errors on the
smile for S2 for all maturities. A penalty term on the errors on the smile for S2 could be added in the
objective function Jµ̄,K ; we leave this for further research.
In Figure 5.1, we plot the futures and smiles for S1 , V, S2 obtained with the implied Newton algorithm
after 60 seconds and the market smiles for the three calibration dates. The fits are perfect. We also show
in Figure 5.2 the functions ∆S , ∆L and martingality and consistency conditions with respect to (s1 , v) as
of August 01, 2018. The constraints are perfectly satisfied. Similar plots are obtained for the other two
calibration dates.
Table 1. Ratio of the Newton–Sinkhorn (NS) and Sinkhorn (S) calibration errors over the
implied Newton (IN) calibration error after 30 and 60 seconds. The calibration error is
defined in (3.10).
We also tested the three methods when the reference measure is the product measure (independent prior).
The calibration errors as of August 1, 2018 (with a lognormal prior and independent prior) are reported in
Figure 5.3 and the functions ∆S , ∆L along with the martingality and consistency checks are reported (with
an independent prior) in Figure 5.4. Observe that the functions ∆S and ∆L again have similar shapes, but
those shapes are significantly different from the ones obtained with a lognormal prior, see Figure 5.2 for
a comparison. We observe that the three methods seem to be less stable with the independent prior and
typically require a higher number of nodes; we chose n1 = nV = n2 = 45. With the independent prior, IN
needs more steps to converge.
4. Continuous-time extension
We now extend the discrete-time model µ∗ (and the probability space) to build a continuous-time model for
(St )t∈[0,T2 ] . The model will also include the VIX at T1 , V . That is, we build a probability P on C([0, T2 ], R) ×
[Figure 3.1 plots. Panel shown: April 25, 2022, T1 = 23 days; x-axis: time (in seconds); y-axis: log10 (calib error); curves: Sinkhorn 10 iter + implied Newton, Newton-Sinkhorn, Sinkhorn.]
Figure 3.1. Comparison of the performance between the Newton–Sinkhorn, the implied
Newton (with a 10-iteration pure Sinkhorn warm start), and the Sinkhorn algorithms. We
plot the decadic logarithm of the calibration error as a function of the computational time
as of August 1, 2018, with T1 = 21 days, April 20, 2020, with T1 = 30 days, and April 25,
2022, with T1 = 23 days.
R+ representing the distribution of ((St )t∈[0,T2 ] , V ). V plays the role of a discrete-time stochastic volatility,
representing the stochastic volatility anticipated at T1 for the [T1 , T2 ] period, but our model involves no
continuous-time stochastic volatility process. Similar to [4], our model is computationally efficient as it only
requires simulating one Brownian motion (and V ).
The key advantage of our construction, compared with the Bass local volatility of [4], is that it directly
starts from a joint discrete distribution of (S1 , S2 ), making the continuous-time interpolation a purely forward
construction with no need for solving a fixed-point problem. Moreover, it includes a stochastic volatility
component V for the calibration to VIX futures and VIX smiles in addition to SPX smiles. As a result,
in contrast with [4] and in line with the path-dependency observed in financial markets [15], our model is
path-dependent: the SPX dynamics after T1 depends on both S1 and V .
4.1. Step 1: Simulation of (St )t∈[0,T1 ] . We want (St )t∈[0,T1 ] to be a P-martingale and S1 to have distri-
bution µ1 under P. To achieve this, one possible choice is to use a Markov functional model St = u(t, Wt ),
t ∈ [0, T1 ], where W is a P-Brownian motion and u satisfies the heat equation
$$\partial_t u + \frac{1}{2} \partial_x^2 u = 0.$$
[Figure 3.2 plots. Panel shown: April 25, 2022, T1 = 23 days; x-axis: time (in seconds); y-axis: log10 (error); curves: mean_iv_s1, mean_iv_v, mean_iv_s2, F_s1, F_v, F_s2, tot.]
Figure 3.2. Distribution of the different calibration errors (3.10) (implied volatilities,
futures, probability measure). We plot the decadic logarithm of the errors as a function of the
computational time as of August 1, 2018, with T1 = 21 days, April 20, 2020, with T1 = 30
days, and April 25, 2022, with T1 = 23 days.
In such a case, (St )t∈[0,T1 ] is an ((Ft ), P)-martingale and the terminal condition u(T1 , ·) = g is determined
via quantiles so that u(T1 , WT1 ) has distribution µ1 , i.e., for all x ∈ R, we set
$$g(x) := F_{\mu_1}^{-1}\left( F_{N(0,1)}\left( \frac{x}{\sqrt{T_1}} \right) \right).$$
The solution u to the heat equation is explicit and given by
(4.1) u (t, x) = E [ g(WT1 )| Wt = x] = E [g (x + WT1 − Wt )] = (g ∗ KT1 −t ) (x) ,
where $*$ is the convolution operator, and for $t > 0$, $K_t : x \in \mathbb{R} \mapsto \frac{e^{-x^2/(2t)}}{\sqrt{2\pi t}}$
is the heat kernel.
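The Markov functional u of (4.1) can be sketched with a Gauss–Hermite evaluation of the convolution. In the sketch below, a lognormal marginal stands in for the market µ1 so that the result can be checked against a closed form; the helper name `make_u`, the number of nodes and the clipping of the uniform variable away from 0 and 1 (a numerical safeguard) are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss
from scipy.stats import lognorm, norm


def make_u(inv_cdf, T1, n_nodes=20):
    """Return u(t, x) = E[g(x + W_T1 - W_t)] with g(x) = inv_cdf(Phi(x / sqrt(T1)))."""
    z, w = hermegauss(n_nodes)
    w = w / w.sum()                  # probabilities of a discretized N(0, 1)

    def g(x):
        # Clip the uniform away from 0 and 1 so that the quantile stays finite.
        p = np.clip(norm.cdf(x / np.sqrt(T1)), 1e-12, 1.0 - 1e-12)
        return inv_cdf(p)

    def u(t, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        y = x[:, None] + np.sqrt(T1 - t) * z[None, :]   # x + (W_T1 - W_t) at the nodes
        return g(y) @ w

    return u


S0, sigma, T1 = 100.0, 0.2, 30.0 / 365.0    # hypothetical spot, volatility and maturity
mu1 = lognorm(s=sigma * np.sqrt(T1), scale=S0 * np.exp(-0.5 * sigma**2 * T1))
u = make_u(mu1.ppf, T1)

spot_check = u(0.0, 0.0)[0]                 # should recover S0 (martingale property)
mid_check = u(0.5 * T1, 0.1)[0]             # compare with S0 * exp(sigma * x - sigma^2 * t / 2)
mid_exact = S0 * np.exp(sigma * 0.1 - 0.5 * sigma**2 * 0.5 * T1)
```

With a lognormal marginal, u reduces to the Black–Scholes functional S0 exp(σx − σ²t/2), so the two checks validate the quadrature; with a market marginal, only `inv_cdf` changes.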
Remark 5 (Other approaches). To generate $(S_t)_{t \in [0, T_1]}$ with the given constraints, we could have also calibrated
a local volatility model
$$dS_t = S_t\, \ell(t, S_t)\, dW_t, \quad t \in [0, T_1],$$
to the SPX smile at $T_1$. For instance, when SPX smiles at all maturities before $T_1$ are known, simply use the
market local volatility given by Dupire’s formula [8]. Another possibility could have been to use any stopping
time $\nu$ solution of the Skorokhod embedding problem for the distribution $\mu_1$ and define $S_t := W_{\nu \wedge \frac{t}{T_1 - t}}$, where
the Brownian motion $W$ starts at $S_0$.
4.2. Step 2: Simulation of V given (St )t∈[0,T1 ] . At this stage, we have simulated (St )t∈[0,T1 ] such that
(St )t∈[0,T1 ] is an ((Ft ), P)-martingale, and S1 has distribution µ1 under P. Now, we simulate V given
σ((Ws )s∈[0,T1 ] ) under P as follows: the distribution of V given σ((Ws )s∈[0,T1 ] ) under P is assumed to de-
pend only on S1 , and is taken equal to the distribution of V given S1 under µ∗ . Since S1 has distribution µ1
under both µ∗ and P, this means that the distribution of (S1 , V ) is the same under µ∗ and P; in particular,
V has distribution µV under P.
4.3. Step 3: Simulation of (St )t∈[T1 ,T2 ] given FT1 . In this last step, we build dynamics for (St )t∈[T1 ,T2 ]
conditional on FT1 such that (St )t∈[T1 ,T2 ] is an ((Ft ), P)-martingale starting from S1 , and S2 has distribution
µ2 . We use once again a Markov functional construction. Given S1 = s and V = v, we consider
$$g_{s,v}(x) := F_{\mu_{2|s,v}}^{-1}\left( F_{N(0,1)}\left( \frac{x}{\sqrt{\tau}} \right) \right)$$
for all x ∈ R and where µ2|s,v is the distribution of S2 given S1 = s and V = v under µ∗ . Then, given FT1 ,
we define for t ∈ (T1 , T2 ],
St := uS1 ,V (t, Wt − WT1 )
where for every s, v > 0,
(4.2) us,v (t, x) = E [ gs,v (WT2 − WT1 )| Wt − WT1 = x] = E [gs,v (x + WT2 − Wt )] = (gs,v ∗ KT2 −t ) (x) .
It is easy to check that (St )t∈[T1 ,T2 ] is an ((Ft ), P)-martingale starting from S1 . Moreover, the distribution of
S2 given (S1 , V ) is the same under µ∗ and P. As the distribution of (S1 , V ) is the same under µ∗ and P, we
conclude that the distribution of (S1 , V, S2 ) is the same under µ∗ and P. In particular S1 , V , and S2 have
distributions µ1 , µV , and µ2 under P. We have thus built a model P on ((St )t∈[0,T2 ] , V ) such that (a) S1 , V ,
and S2 have distributions µ1 , µV , and µ2 under P; (b) (St )t∈[0,T2 ] is an ((Ft ), P)-martingale; and (c) V is the
VIX at T1 , since by construction
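$$\mathbb{E}^{\mathbb{P}}\left[ L\left( \frac{S_2}{S_1} \right) \Big|\, \mathcal{F}_{T_1} \right] = \mathbb{E}^{\mathbb{P}}\left[ L\left( \frac{S_2}{S_1} \right) \Big|\, S_1, V \right] = \mathbb{E}^{\mu^*}\left[ L\left( \frac{S_2}{S_1} \right) \Big|\, S_1, V \right] = V^2.$$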
Remark 6 (Extension to several VIX maturities). Note that this approach can easily be iterated on intervals
[Ti , Ti+1 ]. For example, after Step 3, we will have generated S2 ∼ µ2 . Then, we just need to generate
VT2 ∼ µVT2 and repeat the same procedure. Here, we disregard the Wednesday/Friday issue and are making
the approximation that the VIX future maturities are exactly 30 days apart.
4.4. Numerical implementation. In this section, we detail how to implement the continuous-time extension.
Let $\mathcal{T}_N = \{t_0 = 0, \dots, T_1, \dots, t_N = T_2\}$ be a given time grid with $N \in \mathbb{N}_{>0}$, and let us simulate $M$
Brownian motions $(W^{(i)}_t)_{t \in \mathcal{T}_N}$, $i = 1, \dots, M$, where $M \in \mathbb{N}_{>0}$ denotes the number of Monte Carlo paths. For every
Monte Carlo path $i \in \{1, \dots, M\}$, we perform the following steps:
Step 1: Simulation of $(S^{(i)}_t)_{t \in [0,T_1]}$, see section 4.1. For every time-step $t \in \mathcal{T}_N$ such that $0 \le t \le T_1$,
we set $S^{(i)}_t = u(t, W^{(i)}_t)$ where the function $u(t, \cdot)$ (computed using an Hermite quadrature) has been defined
in (4.1). Note that we use the (true) market inverse c.d.f. $F^{-1}_{\mu_1}$.
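As an illustration of the quadrature in Step 1, the following sketch evaluates $u(t, x) = \mathbb{E}[g(x + W_{T_1} - W_t)]$ with a Gauss-Hermite rule, taking $g(x) = F^{-1}_{\mu_1}(F_{\mathcal{N}(0,1)}(x/\sqrt{T_1}))$ by analogy with (4.2). The lognormal inverse c.d.f. `lognormal_inv_cdf`, its parameters, the clipping level `1e-12`, and the number of nodes are assumptions made to keep the example self-contained; the paper uses the market inverse c.d.f. instead.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()


def lognormal_inv_cdf(p, s0, sigma, T):
    """Hypothetical stand-in for the market inverse c.d.f. F^{-1}_{mu_1}."""
    z = np.array([_STD_NORMAL.inv_cdf(float(q)) for q in np.atleast_1d(p)])
    return s0 * np.exp(sigma * math.sqrt(T) * z - 0.5 * sigma**2 * T)


def g(x, inv_cdf, T):
    """g(x) = F^{-1}(Phi(x / sqrt(T))); Phi is clipped away from 0 and 1."""
    p = np.array([_STD_NORMAL.cdf(float(y) / math.sqrt(T)) for y in np.atleast_1d(x)])
    return inv_cdf(np.clip(p, 1e-12, 1.0 - 1e-12))


def u(t, x, inv_cdf, T, n_nodes=32):
    """u(t, x) = E[g(x + sqrt(T - t) * Z)], Z ~ N(0, 1), by Gauss-Hermite quadrature."""
    # Probabilists' Hermite rule: nodes and weights for the weight exp(-z^2 / 2).
    z, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    values = g(x + math.sqrt(T - t) * z, inv_cdf, T)
    return float(np.dot(w, values) / math.sqrt(2.0 * math.pi))
```

With a lognormal stand-in, $u(t, x) = S_0 \exp(\sigma x - \sigma^2 t / 2)$ in closed form, which gives a convenient check of the quadrature before plugging in a market inverse c.d.f.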
Step 2: Simulation of $V^{(i)}$ given $(S^{(i)}_t)_{t \in [0,T_1]}$, see section 4.2. In general, we define $V^{(i)} = h_i(U^{(i)})$
where $h_i$ is the linear interpolation of the two inverse c.d.f. $F^{-1}_{V|S_1=s}$ of $V$ given $S_1$ under $\mu^*$ calculated at
the two nearest points $s$ of $S^{(i)}_1$ in the grid $\mathcal{G}_1$, where $U^{(i)} \sim U([0,1])$ are i.i.d. However, if $S^{(i)}_1 < s^{(1)}_1$ (resp.,
$S^{(i)}_1 > s^{(n_{S_1})}_1$), we set $V^{(i)} = F^{-1}_{V|S_1=s^{(1)}_1}(U^{(i)})$ (resp., $V^{(i)} = F^{-1}_{V|S_1=s^{(n_{S_1})}_1}(U^{(i)})$).
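Step 2 can be sketched as follows. The storage layout is an assumption: the conditional inverse c.d.f.s of $V$ are taken to be tabulated on a common grid of probability levels `p_grid`, one row of `inv_cdf_table` per grid point of `s_grid`; the function name and arguments are hypothetical.

```python
import numpy as np


def sample_v_given_s1(s, u, s_grid, p_grid, inv_cdf_table):
    """Return V = h(u), with h interpolated linearly in s between grid neighbours.

    s_grid        : increasing grid of S_1 values, shape (n_s,)
    p_grid        : increasing probability levels in [0, 1], shape (n_p,)
    inv_cdf_table : inv_cdf_table[j, k] = F^{-1}_{V | S_1 = s_grid[j]}(p_grid[k])
    """
    s_grid = np.asarray(s_grid, dtype=float)
    table = np.asarray(inv_cdf_table, dtype=float)
    # Outside the grid, use the inverse c.d.f. at the nearest end point.
    if s <= s_grid[0]:
        return float(np.interp(u, p_grid, table[0]))
    if s >= s_grid[-1]:
        return float(np.interp(u, p_grid, table[-1]))
    # Bracketing index j such that s_grid[j] <= s < s_grid[j + 1].
    j = int(np.searchsorted(s_grid, s, side="right")) - 1
    lam = (s - s_grid[j]) / (s_grid[j + 1] - s_grid[j])
    q_lo = np.interp(u, p_grid, table[j])
    q_hi = np.interp(u, p_grid, table[j + 1])
    return float((1.0 - lam) * q_lo + lam * q_hi)
```

In a simulation, `u` would be drawn as `rng.uniform()` independently for each path, with `rng = np.random.default_rng(seed)`.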
Step 3: Simulation of $(S^{(i)}_t)_{t \in [T_1,T_2]}$ given $(S^{(i)}_{T_1}, V^{(i)})$, see section 4.3. For every time-step $t \in \mathcal{T}_N$
such that $T_1 < t \le T_2$, we set $S^{(i)}_t = u_{S^{(i)}_1, V^{(i)}}(t, W^{(i)}_t - W^{(i)}_{T_1})$ where for every $s, v > 0$, the function $u_{s,v}(\cdot, \cdot)$
(computed using an Hermite quadrature) was defined in (4.2). To compute $g_{S^{(i)}_1, V^{(i)}}(\cdot)$, we use a bilinear
interpolation in $(s, v)$ of the inverse c.d.f. $F^{-1}_{\mu_{2|s,v}}$ when $(S^{(i)}_1, V^{(i)})$ lies within the range of the grid points; for
all other $(S^{(i)}_1, V^{(i)})$, we use their projection onto their nearest point in the grid.
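The interpolation rule of Step 3 can be sketched as below. The tabulation is an assumption: `table[i, j, :]` is taken to hold the inverse c.d.f. of $S_2$ given $S_1 = $ `s_grid[i]` and $V = $ `v_grid[j]` on a common grid of probability levels, and the function names are hypothetical. On a rectangular grid, the nearest grid point can be found coordinate by coordinate, which is what `nearest_index` does.

```python
import numpy as np


def nearest_index(grid, x):
    """Index of the grid value closest to x."""
    return int(np.argmin(np.abs(np.asarray(grid, dtype=float) - x)))


def bracket(grid, x):
    """Left index i and weight a with x = (1 - a) * grid[i] + a * grid[i + 1]."""
    i = min(int(np.searchsorted(grid, x, side="right")) - 1, len(grid) - 2)
    return i, (x - grid[i]) / (grid[i + 1] - grid[i])


def s2_quantiles_given(s, v, s_grid, v_grid, table):
    """Quantile curve of S_2 given (S_1, V) = (s, v) on the common p-grid."""
    s_grid = np.asarray(s_grid, dtype=float)
    v_grid = np.asarray(v_grid, dtype=float)
    table = np.asarray(table, dtype=float)
    inside = (s_grid[0] <= s <= s_grid[-1]) and (v_grid[0] <= v <= v_grid[-1])
    if not inside:
        # Outside the grid range: project onto the nearest grid point.
        return table[nearest_index(s_grid, s), nearest_index(v_grid, v)]
    i, a = bracket(s_grid, s)
    j, b = bracket(v_grid, v)
    # Bilinear interpolation of the four neighbouring quantile curves.
    return ((1 - a) * (1 - b) * table[i, j] + a * (1 - b) * table[i + 1, j]
            + (1 - a) * b * table[i, j + 1] + a * b * table[i + 1, j + 1])
```

The returned curve, combined with `np.interp` in the probability level, plays the role of $F^{-1}_{\mu_{2|s,v}}$ when evaluating $g_{s,v}$.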
Remark 7 (General remarks on the algorithm). The different sources of errors in the proposed algorithm
come from the quadrature grid, approximation of the inverse c.d.f., Monte Carlo simulation, and numerical
integration for the functions (4.1) and (4.2). Once the discrete-time calibration has been achieved, the whole
procedure is extremely fast. It essentially only requires the simulation of a one-dimensional Brownian motion
along with two one-dimensional integrations.
The resulting smiles (market, discrete, and continuous time) for the SPX at T1 and T2 and that of the
VIX at T1 are displayed in Figure 4.1; they were computed by Monte Carlo simulation with $10^5$ paths.
4.5. Pricing. Our continuous-time model can be used to price path-dependent options on the SPX with
the guarantee that the model exactly matches the SPX smiles at T1 and T2 and the VIX future and VIX
smile at T1 , thus taking into account information about the forward volatility at T1 not included in the
SPX smiles. As a pricing exercise, we compare it with models commonly used by practitioners: the Dupire
local volatility model [8] and the local stochastic version of the two-factor Bergomi model [3]; those models
are calibrated to the full SPX implied volatility surface, but not to VIX smiles. We consider spot-starting,
forward-starting, and mixed versions of lookback and Asian options. The prices are reported in Table 2 along
with their 95% confidence interval. We used $10^5$ Monte Carlo paths and a trapezoidal rule to approximate the
time integral for the Asian option. Note that for forward-starting options, the price in our jointly calibrated
model is always larger than in the other two models: ignoring the VIX information leads to underpricing
these forward-starting payoffs.
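The path functionals and the confidence interval used in this pricing exercise can be sketched as follows. The function names, the array layout (`paths[i, k]` is path `i` at time `times[k]`), and the normal-approximation factor 1.96 are assumptions; `t_start` is assumed to lie on the time grid, and no discounting is applied.

```python
import numpy as np


def running_max(paths, times, t_start=0.0):
    """M_{t,T}: maximum of each path over the grid times >= t_start."""
    mask = np.asarray(times) >= t_start
    return paths[:, mask].max(axis=1)


def time_average(paths, times, t_start=0.0):
    """A_{t,T}: trapezoidal approximation of (1 / (T - t)) * int_t^T S_u du."""
    times = np.asarray(times, dtype=float)
    mask = times >= t_start
    t, s = times[mask], paths[:, mask]
    integral = np.sum(0.5 * (s[:, 1:] + s[:, :-1]) * np.diff(t), axis=1)
    return integral / (t[-1] - t[0])


def mc_estimate(payoffs):
    """Monte Carlo mean and half-width of the 95% confidence interval."""
    payoffs = np.asarray(payoffs, dtype=float)
    half_width = 1.96 * payoffs.std(ddof=1) / np.sqrt(payoffs.size)
    return float(payoffs.mean()), float(half_width)
```

For instance, a forward-starting Asian call with a hypothetical strike `K` would be estimated with `mc_estimate(np.maximum(time_average(paths, times, T1) - K, 0.0))`.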
Table 2. Prices of various options in the Dupire local volatility model, local Bergomi
model, and our continuous-time model. We choose the same parameters as [13, Table 4] for
the local two-factor Bergomi, jointly calibrated to the term-structures of SPX ATM skew
and VIX$^2$ implied volatility: k1 = 21.91, k2 = 1.04, ρXY = 1, ρSX = −1, ρSY = −1, θ1 =
0.77, ω = 6.64. We have defined: $M_{t,T} := \max_{t \le u \le T} S_u$, $A_{t,T} := \frac{1}{T-t} \int_t^T S_u \, du$.
[Figure 4.1 (image not reproduced). Panels: "Smile of SPX as of August 1, 2018, T1 = 21 days"; "Smile of VIX as of August 1, 2018, T1 = 21 days"; "Smile of SPX as of August 1, 2018, T2 = 51 days". Legend in each panel: Market, Discrete-time model, Continuous-time model.]
Figure 4.1. SPX smiles at T1 and T2 and VIX smile at T1 (market, discrete-time model
computed using the implied Newton algorithm, and continuous-time extension), as of August
1, 2018.
5. Conclusion
In this article, we have:
• improved model-free bounds on SPX options by incorporating VIX options data;
• built the minimum-entropy jointly calibrated discrete-time model µ∗ very fast using an implied Newton
method;
• seamlessly extended this discrete-time model to continuous time in a purely forward fashion, using
Markov functionals.
Thus, we have established a swift process for creating an arbitrage-free continuous-time model for SPX that
accurately calibrates to SPX smiles, VIX futures, and VIX smiles. Such a model can be used for pricing and
hedging exotic options, computing reserves or valuation adjustments, and assessing model risk. Our main
methodological contribution is that we first build a jointly calibrated discrete-time model (STi ), where the
Ti are the calibrated maturities, that is later extended to continuous time using an arbitrage-free martingale
time-interpolation. Since the discrete-time model can be exactly calibrated much faster than continuous-time
models, and since extremely fast extrapolations exist, this novel approach seems to be a promising new avenue
for calibrating models.
References
[1] Eduardo Abi Jaber, Camille Illand, and Shaun Li. The quintic Ornstein-Uhlenbeck model for joint SPX and VIX calibration.
Risk, June 2023.
[2] Marco Avellaneda, Craig Friedman, Richard Holmes, and Dominick Samperi. Calibrating volatility surfaces via
relative-entropy minimization. Applied Mathematical Finance, 4(1):37–64, 1997.
[3] Lorenzo Bergomi. Smile Dynamics II. Risk Magazine, 2005.
[4] Antoine Conze and Pierre Henry-Labordère. A New Fast Local Volatility Model. Risk, April 2022.
[5] Christa Cuchiero, Guido Gazzani, Janka Möller, and Sara Svaluto-Ferro. Joint calibration to SPX and VIX options with
signature-based models. arXiv preprint arXiv:2301.13235, 2023.
[6] Hadrien De March. Entropic approximation for multi-dimensional martingale optimal transport. arXiv preprint
arXiv:1812.11104, 2018.
[7] Hadrien De March and Pierre Henry-Labordère. Building arbitrage-free implied volatility: Sinkhorn’s algorithm and
variants. Available at SSRN 3326486, 2019.
[8] Bruno Dupire. Pricing with a smile. Risk, 7(1):18–20, 1994.
[9] Jim Gatheral, Paul Jusselin, and Mathieu Rosenbaum. The quadratic rough Heston model and the joint S&P 500/Vix smile
calibration problem. Risk, 2020.
[10] Ivan Guo, Grégoire Loeper, Jan Obłój, and Shiyi Wang. Joint Modeling and Calibration of SPX and VIX by Optimal
Transport. SIAM Journal on Financial Mathematics, 13(1):1–31, 2022.
[11] Julien Guyon. The joint S&P 500/VIX smile calibration puzzle solved. Risk, April, 2020.
[12] Julien Guyon. Dispersion-Constrained Martingale Schrödinger Bridges: Joint Entropic Calibration of Stochastic Volatility
Models to S&P 500 and VIX Smiles. Available at SSRN: https://ssrn.com/abstract=3853237, 2022.
[13] Julien Guyon. The VIX future in Bergomi models: Fast approximation formulas and joint calibration with S&P 500 skew.
SIAM Journal on Financial Mathematics, 13(4):1418–1485, 2022.
[14] Julien Guyon. Dispersion-Constrained Martingale Schrödinger Problems and the Exact Joint S&P 500/VIX Smile
Calibration Puzzle. Finance and Stochastics, 28(1):27–79, 2024.
[15] Julien Guyon and Jordan Lekeufack. Volatility is (mostly) path-dependent. Quantitative Finance, 23(9):1221–1258, 2023.
[16] Julien Guyon and Scander Mustapha. Neural joint S&P 500/VIX smile calibration. Risk, December 2023.
[17] Pierre Henry-Labordère. Automated option pricing: Numerical methods. International Journal of Theoretical and Applied
Finance, 16(08):1350042, 2013.
[18] Pierre Henry-Labordère. Model-free hedging: A martingale optimal transport viewpoint. Chapman and Hall/CRC, 2017.
[19] Mathieu Rosenbaum and Jianfei Zhang. Deep calibration of the quadratic rough Heston model. Risk, 2022.
[Figure 5.1 (image not reproduced). Legend: Market, Model. Panel titles recovered: "Smile of SPX as of April 25, 2022, T1 = 23 days"; "Smile of VIX as of April 25, 2022, T1 = 23 days"; "Smile of SPX as of April 25, 2022, T2 = 53 days".]
Figure 5.1. Futures and smiles of S1 , V, S2 after 60 seconds in the calibrated model with
the Newton-implied algorithm versus market smiles as of August 1, 2018, with T1 = 21 days;
April 20, 2020, with T1 = 30 days; and April 25, 2022, with T1 = 23 days.
[Figure 5.2 (image not reproduced). Four surface plots over the (S1, V) plane. Upper panel titles as extracted: "S(S1, V) as of August 1, 2018, T1 = 21 days" and "L(S1, V) as of August 1, 2018, T1 = 21 days".]
Figure 5.2. Functions (s1 , v) ↦ ∆S (s1 , v) and (s1 , v) ↦ ∆L (s1 , v) (upper figures) and
martingale and consistency checks (bottom figures) as of August 1, 2018 for T1 = 21 days.
[Figure 5.3 (image not reproduced). Two panels; horizontal axis in each: time (in seconds), from 0 to 200.]
Figure 5.3. Comparison of the calibration errors as of August 01, 2018 for T1 = 21 days
with a lognormal prior (left figure) and an independent prior (right figure).
[Figure 5.4 (image not reproduced). Four surface plots over the (S1, V) plane. Upper panel titles as extracted: "S(S1, V) as of August 1, 2018, T1 = 21 days" and "L(S1, V) as of August 1, 2018, T1 = 21 days".]
Figure 5.4. Functions (s1 , v) ↦ ∆S (s1 , v) and (s1 , v) ↦ ∆L (s1 , v) (upper figures) and
martingale and consistency checks (bottom figures) as of August 1, 2018 for T1 = 21 days
when the reference measure is the product measure.