An Application of Stochastic Maximum Principle for a Constrained System with Memory
Emel SAVKU
Department of Computer Engineering, Atilim University, Ankara, TÜRKİYE
Abstract. In this research article, we study a stochastic control problem in a theoretical framework to solve
a constrained task under memory impact. The nature of memory is modeled by Stochastic Differential
Delay Equations (SDDEs), and our state process evolves according to a jump-diffusion process with time delay.
We work on two specific types of constraints, which are described in the stochastic control problem
as running gain components. We develop two theorems for the corresponding deterministic and stochastic
Lagrange multipliers. Furthermore, these theorems are applicable to a wide range of continuous-time
stochastic optimal control problems in diverse scientific areas such as Operations Research, Biology,
Computer Science, Engineering and Finance. Here, we apply our results to a financial
application to investigate the optimal consumption process of a company via its wealth process with
historical performance. We utilize the stochastic maximum principle, which is one of the main methods
of continuous-time Stochastic Optimal Control theory. Moreover, we compute a real-valued Lagrange
multiplier and clarify the relation between this value and the specified constraint.
2020 Mathematics Subject Classification. 93E20, 93E03, 49N90, 60G07, 60J76, 91B16, 91G80.
Keywords. Stochastic optimal control, stochastic maximum principle, stochastic differential delay equa-
tions, Lagrange multiplier, anticipated backward stochastic differential equations
attention from researchers in the Stochastic Optimal Control field so far; see [7, 8, 14, 20, 23, 25, 28]
and references therein.
Let us introduce the technical details and mathematical structure of our work:
As we stated, we use a jump-diffusion process with delay as the state process of our control task (for
a detailed theory of continuous-time stochastic processes, see [1, 13, 19] and references therein).
Let R0 := R \ {0}, and let B0 denote the Borel σ-field generated by the open subsets O of R0 whose
closure does not contain the point 0.
Let (N(dt, dz) : t ∈ [0, T], z ∈ R0) be a Poisson random measure on ([0, T] × R0, B([0, T]) ⊗ B0). The
Lévy measure of N(·, ·) is denoted by ν, and Ñ(dt, dz) := N(dt, dz) − ν(dz)dt is the compensated Poisson
random measure.
Let (W (t) : t ∈ [0, T ]) be a Brownian motion. (Ω, F, Ft , P) represents a complete filtered probability
space generated by the Brownian motion W (·) and the Poisson random measure N (·, ·). We define
F = (Ft : t ∈ [0, T ]) as a right-continuous, P-completed filtration and assume that the Brownian motion
and the Poisson random measure are independent of each other and adapted to F.
We follow a controlled jump-diffusion model with a constant delay term δ > 0, which is one of the
most general representations of such systems and is introduced in [20] as follows:
$$
\begin{aligned}
dX(t) &= b(t, X(t), Y(t), A(t), u(t))\,dt + \sigma(t, X(t), Y(t), A(t), u(t))\,dW(t) \\
&\quad + \int_{\mathbb{R}_0} \eta(t, X(t), Y(t), A(t), u(t), z)\,\tilde{N}(dt, dz), \qquad t \in [0, T], \\
X(t) &= \theta(t), \qquad t \in [-\delta, 0],
\end{aligned} \tag{1}
$$

where for t ∈ [0, T],

$$ Y(t) = X(t - \delta), \qquad A(t) = \int_{t-\delta}^{t} e^{-\rho(t-r)}\,X(r)\,dr. $$
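To make the dynamics concrete, the following minimal Python sketch (an illustration added here, not part of the original analysis) simulates Equation (1) by an Euler scheme; all function names and parameters are assumptions of this example, and the jump integral is approximated by a compound Poisson sum with standard normal marks, which corresponds to a finite Lévy measure.

```python
import numpy as np

def simulate_delayed_jump_diffusion(b, sigma, eta_jump, u, theta,
                                    T=1.0, delta=0.25, rho=0.1,
                                    jump_rate=1.0, n_steps=400, seed=0):
    """Euler-Maruyama sketch for the delayed state equation (1).

    b, sigma : coefficients (t, x, y, a, u) -> float.
    eta_jump : (t, x, y, a, u, z) -> jump size eta(t, x, y, a, u, z).
    theta    : initial path on [-delta, 0].
    The jump integral is approximated by a compound Poisson sum with
    intensity `jump_rate` and standard normal marks (an assumption).
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    lag = int(round(delta / dt))                  # delay measured in grid steps
    t_grid = dt * np.arange(-lag, n_steps + 1)    # grid over [-delta, T]
    X = np.empty(n_steps + lag + 1)
    X[:lag + 1] = [theta(t) for t in t_grid[:lag + 1]]
    for i in range(lag, lag + n_steps):
        t, x, y = t_grid[i], X[i], X[i - lag]     # y = X(t - delta)
        # A(t): exponentially weighted average of the path on [t - delta, t]
        weights = np.exp(-rho * (t - t_grid[i - lag:i + 1]))
        a = np.trapz(weights * X[i - lag:i + 1], dx=dt)
        dW = rng.normal(0.0, np.sqrt(dt))
        jumps = sum(eta_jump(t, x, y, a, u(t), rng.standard_normal())
                    for _ in range(rng.poisson(jump_rate * dt)))
        X[i + 1] = x + b(t, x, y, a, u(t)) * dt \
                     + sigma(t, x, y, a, u(t)) * dW + jumps
    return t_grid, X
```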
For all u ∈ A, let us define the objective criterion in the classical sense (for a broad survey of the
Stochastic Optimal Control theory, see [21, 22, 32] and references therein) as follows:
$$ J(u) = J(x, y, a, u) = \mathbb{E}\left[\int_0^T f(t, X(t), Y(t), A(t), u(t))\,dt + g(X(T))\right], \tag{2} $$
Hence, in a classical unconstrained stochastic control problem, our goal is to find the optimal control
u∗ ∈ A such that
$$ J(u^*) = \sup_{u \in \mathcal{A}} J(u). \tag{3} $$
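As a hedged illustration, the objective (2) can be estimated for a fixed admissible control by plain Monte Carlo; this sketch reuses the hypothetical simulator from the previous code block, and f and g are illustrative integrands.

```python
import numpy as np

def estimate_objective(f, g, u, n_paths=2000, delta=0.25, **model):
    """Monte Carlo estimate of J(u) in (2) for a fixed control u (sketch).

    For brevity, f is evaluated with (t, X(t), X(t - delta), u(t)) only;
    the exponentially weighted term A(t) is omitted from this sketch.
    """
    values = []
    for k in range(n_paths):
        t, X = simulate_delayed_jump_diffusion(u=u, delta=delta, seed=k, **model)
        lag = int(np.searchsorted(t, 0.0))        # index of t = 0 on the grid
        dt = t[1] - t[0]
        running = sum(f(t[i], X[i], X[i - lag], u(t[i])) * dt
                      for i in range(lag, len(t) - 1))
        values.append(running + g(X[-1]))
    return float(np.mean(values))
```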
On the other hand, in this work, we formulate the constraints inspired by Theorem 11.3.1 of [19], but
with completely different constraint types. In that theorem, the author presents an approach for stochastic
control tasks with a condition at the terminal time T > 0 for a diffusion process. Later, [4] gave an
application of this theorem and [28] extended it to stochastic differential games with regimes.
Furthermore, [5] stated a version of Theorem 11.3.1 of [19] with the constraint types (5) and (6) for a
jump-diffusion process. These constraints describe deterministic and stochastic Lagrange multipliers,
respectively, and differ from the terminal conditions given in Theorem 11.3.1 of [19]. However, the
authors of [5] do not investigate the Lagrange multipliers themselves, although they claim that their existence is a
crucial condition for applying the proved theorems; see Theorems 5.2 and 5.4 of [5]. In our work, we study
a stochastic control problem for a jump-diffusion process with memory and the constraints defined
by (5) and (6). Hence, our work extends the theorems of [5] to a delayed model. Moreover, we develop
an application for which the corresponding Lagrange multiplier exists. In that sense, we should underline
that ours is the first work that completes this task with the constraints (5)-(6), and also, by
inserting a delay term, we study a larger model.
We prefer not to impose many technical conditions on b, σ, η in this section. In Section 2, we will
develop two fundamental theorems to approach stochastic control problems with the constraints (5) and
(6). These can be solved both by the Stochastic Maximum Principle (SMP) and by the Dynamic Programming
Principle (DPP). Thus, the technical assumptions have to be determined specifically depending on the
preferred method. We will highlight them in Section 3, while studying an optimal consumption
problem.
This article is organized as follows: In Section 2, we introduce the mathematical formulation of our
constrained stochastic control problem and demonstrate the corresponding theorems in a Lagrangian
environment. Section 3 is devoted to developing a financial application, which formulates the optimal
consumption process of a company with memory. The final section gives a conclusion.
Here, J(·) is defined by Equation (2), and the supremum is taken over the set Θ of all admissible controls
u : R → U ⊂ R such that

$$ \mathbb{E}\left[\int_0^T M(t, X(t), Y(t), A(t), u(t))\,dt\right] = 0, \tag{5} $$

or

$$ \int_0^T M(t, X(t), Y(t), A(t), u(t))\,dt = 0 \quad \text{a.s.} \tag{6} $$

M : [0, T] × R × R × R × U → R is a C¹ function with respect to x, y, and a such that, for $x_i = x, y, a, u$:

$$ \mathbb{E}\left[\int_0^T \left( \big|M(t, X(t), Y(t), A(t), u(t))\big| + \left|\frac{\partial M}{\partial x_i}(t, X(t), Y(t), A(t), u(t))\right| \right)^2 dt\right] < \infty. $$
Here, we study two types of constraints: constraint type (5) leads to a real-valued Lagrange
multiplier, while type (6) yields a stochastic one.
Thus, we specify the set of stochastic Lagrange multipliers as in [27]:

$$ \Delta = \big\{ \lambda : \Omega \to \mathbb{R} \,\big|\, \lambda \text{ is } \mathcal{F}_T\text{-measurable and } \mathbb{E}[|\lambda|] < \infty \big\}. $$
Now, in view of Equation (4) and the constraints (5) and (6), let us present the unconstrained
stochastic control problem in the following way:

$$
\begin{aligned}
\varphi^{\lambda}(x, y, a) &= \sup_{u \in \Theta} J(x, y, a, u) \\
&= \sup_{u \in \Theta} \mathbb{E}^{x, y, a}\left[\int_0^T f(t, X(t), Y(t), A(t), u(t))\,dt + g(X^u(T)) + \lambda \int_0^T M(t, X(t), Y(t), A(t), u(t))\,dt\right].
\end{aligned} \tag{7}
$$
Theorem 1. Assume that for all λ ∈ ∆1 ⊂ ∆, we can determine $\varphi^{\lambda}(x, y, a)$ and the optimal control process
$u^{*,\lambda}$, which solves the unconstrained stochastic control problem (7) subject to the system (1). Moreover,
assume that there exists λ0 ∈ ∆1 such that

$$ \int_0^T M\big(t, X_t^{u^{*,\lambda_0}}, Y_t^{u^{*,\lambda_0}}, A_t^{u^{*,\lambda_0}}, u_t^{*,\lambda_0}\big)\,dt = 0 \quad \text{a.s.} \tag{8} $$

Then $\varphi(x, y, a) = \varphi^{\lambda_0}(x, y, a)$ is obtained, and $u^* = u^{*,\lambda_0}$ solves the constrained stochastic control problem
(3) subject to (1) and (6).
Proof. The first inequality follows from the definition of the optimal value function:

$$
\begin{aligned}
\varphi^{\lambda}(x, y, a) &= J(x, y, a, u^{*,\lambda}) \\
&= \mathbb{E}^{x, y, a}\bigg[\int_0^T f\big(t, X_t^{u^{*,\lambda}}, Y_t^{u^{*,\lambda}}, A_t^{u^{*,\lambda}}, u_t^{*,\lambda}\big)\,dt + \lambda \int_0^T M\big(t, X_t^{u^{*,\lambda}}, Y_t^{u^{*,\lambda}}, A_t^{u^{*,\lambda}}, u_t^{*,\lambda}\big)\,dt + g\big(X_T^{u^{*,\lambda}}\big)\bigg] \\
&\geq J(x, y, a, u^{\lambda}) \\
&= \mathbb{E}^{x, y, a}\bigg[\int_0^T f\big(t, X_t^{u^{\lambda}}, Y_t^{u^{\lambda}}, A_t^{u^{\lambda}}, u_t^{\lambda}\big)\,dt + \lambda \int_0^T M\big(t, X_t^{u^{\lambda}}, Y_t^{u^{\lambda}}, A_t^{u^{\lambda}}, u_t^{\lambda}\big)\,dt + g\big(X_T^{u^{\lambda}}\big)\bigg].
\end{aligned} \tag{9}
$$

In particular, if λ = λ0 exists, then, since $u^{\lambda} \in \Theta$ is feasible in the constrained control problem (3),
by (8):

$$ \int_0^T M\big(t, X_t^{u^{*,\lambda_0}}, Y_t^{u^{*,\lambda_0}}, A_t^{u^{*,\lambda_0}}, u_t^{*,\lambda_0}\big)\,dt = \int_0^T M\big(t, X_t^{u^{\lambda}}, Y_t^{u^{\lambda}}, A_t^{u^{\lambda}}, u_t^{\lambda}\big)\,dt = 0 \tag{10} $$

for all u ∈ Θ. Note that $u^{*,\lambda_0} \in \Theta$, and this completes the proof. □
The following theorem can be proved similarly for the constraint type (5).
Theorem 2. Assume that for all λ ∈ K ⊂ R, we can determine $\varphi^{\lambda}(x, y, a)$ and the optimal control
process $u^{*,\lambda}$ solving the unconstrained stochastic control problem (7) subject to (1). Furthermore, assume
that there exists λ0 ∈ K such that

$$ \mathbb{E}\left[\int_0^T M\big(t, X_t^{u^{*,\lambda_0}}, Y_t^{u^{*,\lambda_0}}, A_t^{u^{*,\lambda_0}}, u_t^{*,\lambda_0}\big)\,dt\right] = 0. $$

Then $\varphi(x, y, a) = \varphi^{\lambda_0}(x, y, a)$, and $u^* = u^{*,\lambda_0}$ solves the constrained stochastic control problem (3) subject
to the model (1) and the constraint (5).
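In practice, Theorem 2 suggests a two-level numerical scheme: solve the unconstrained problem (7) for a fixed λ, then search for a λ0 at which the constraint expectation (5) vanishes. The following minimal sketch (an illustration under assumed interfaces; `solve_unconstrained` and `constraint_expectation` are hypothetical helpers, not objects from this paper) uses a bracketing root finder:

```python
from scipy.optimize import brentq

def find_lagrange_multiplier(solve_unconstrained, constraint_expectation,
                             lam_lo=-10.0, lam_hi=10.0, tol=1e-8):
    """Search for lambda_0 with E[int_0^T M(...) dt] = 0 (Theorem 2, sketch).

    solve_unconstrained(lam)      -> optimal control u^{*,lam} of problem (7);
    constraint_expectation(u)     -> value of the left-hand side of (5)
                                     under the control u (Monte Carlo or
                                     closed form, depending on the model).
    """
    def gap(lam):
        u_star = solve_unconstrained(lam)       # inner, unconstrained problem
        return constraint_expectation(u_star)   # distance of u^{*,lam} from (5)

    # brentq needs a sign change on [lam_lo, lam_hi]; widen the bracket if not
    lam0 = brentq(gap, lam_lo, lam_hi, xtol=tol)
    return lam0, solve_unconstrained(lam0)
```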
Remark 1. Theorems 1 and 2 can be applied to a wide range of stochastic control problems by both SMP
and DPP, as long as it is possible to determine the corresponding Lagrange multipliers. If we prefer to
apply DPP, we should be careful about the Markov property: SDDEs provide a more realistic environment,
but we lose the Markov property. Moreover, since the system (1) has an initial path instead of an initial
value, the corresponding partial differential equations, the so-called Hamilton-Jacobi-Bellman (HJB)
equations, live in an infinite-dimensional space. Hence, a direct application of DPP
is not mathematically possible (for more details on handling such problems by DPP, see [7, 8, 14] and references
therein).
Remark 2. To utilize the SMP, unlike the DPP, we do not need any Markovian assumption. Hence,
in this work, we will combine the method described in Theorem 2 of our paper with Theorem 3.1 and
Theorem 4.1 of [20] to find the optimal consumption process by the SMP.
Remark 3. Our work is inspired by Theorem 11.3.1 of [19], but we should highlight that the constraint
of Theorem 11.3.1 of [19] is defined at the terminal time T as:

$$ \mathbb{E}[M(X_T^u)] = 0, \tag{11} $$

which is completely different from our constraints (5)-(6). We impose a condition on the running gain
component rather than the terminal gain. Moreover, similar constraints appear in [5], but neither [19]
nor [5] includes a memory impact.
Remark 4. In [30], we studied memory impact within the framework of Lagrange multipliers similar
to Equation (11), which is a different type of constraint, as we stated in Remark 3. Furthermore, in [30],
we focused on a dividend policy application in a regime-switching environment with a different control
formulation. Our present work and [30] share a similar philosophy but with completely different constraints
and financial formulations.
Now, let us present an application of Theorem 2 in finance.
3. Application to Finance
In this section, we will develop the formulation of an optimal consumption process that corresponds
to the wealth process of a company with memory. This process evolves according to a time-delayed
jump-diffusion model. The dynamics of the model carry past values of the wealth process in the form
of Y (t) = X(t − δ), t ∈ [0, T ], where δ > 0 is a constant. Our purpose is to develop a more realistic
consumption policy, which depends on the information about the historical performance of the company
as well.
µ(·) is a deterministic function and represents the appreciation rate of the company. Furthermore,
we suppose that σ(t) and η(t, z), t ∈ [0, T ], are given bounded, square integrable and adapted processes.
U is a non-empty, closed and convex subset of R. In this section, our problem formulation satisfies the
technical assumptions provided in [20]; thus, we are allowed to apply Theorem 3.1 and Theorem 4.1 of
that article.
The consumption process is a càdlàg, Ft-adapted control process, which satisfies:

$$ \mathbb{E}\left[\int_0^T |c(t)|^2\,dt\right] < \infty. $$
Let us state the wealth process X(t) = X c (t), which is a special form of Equation (1) as follows:
$$
\begin{aligned}
dX(t) &= \big(X(t-\delta)\mu(t) - c(t)\big)\,dt + X(t-\delta)\left(\sigma(t)\,dW(t) + \int_{\mathbb{R}_0} \eta(t, z)\,\tilde{N}(dt, dz)\right), \qquad t \in [0, T], \\
X(t) &= \theta(t), \qquad t \in [-\delta, 0],
\end{aligned} \tag{12}
$$
where θ(·) is a given nonnegative, deterministic and continuous function.
We assume that the company wants to maximize its wealth despite a quadratic running loss, by balancing
it against a constraint of linear running gain, which is described in terms of the control
process. Moreover, the company aims to reach a level of a constant K times the terminal time T > 0; so
we assume that the company takes time restrictions into account as well. We will develop and highlight
the conditions on K at the end of our computations. Hence, our goal is to find the optimal consumption
process c*(·) by solving:
$$ J(c^*) = \sup_{c \in \Theta} J(c) = \sup_{c \in \Theta} \mathbb{E}\left[\int_0^T \alpha(t)\,c^2(t)\,dt + \beta X(T)\right], $$

subject to the system (12) and to the constraint:

$$ \mathbb{E}\left[\int_0^T \gamma(t)\,c(t)\,dt\right] = TK, \qquad K \in \mathbb{R}, \tag{13} $$

where α(·) < 0 and γ(·) are deterministic functions and β ∈ R.
Now we can develop the Lagrangian form of this stochastic control problem as follows:

$$ J(c^*) = \sup_{c \in \Theta} J(c) = \sup_{c \in \Theta} \mathbb{E}\left[\int_0^T \alpha(t)\,c^2(t)\,dt + \lambda \int_0^T \big(\gamma(t)\,c(t) - K\big)\,dt + \beta X(T)\right], \tag{14} $$

for which we aim to find $c^* = c^{\lambda,*}$ and the real-valued Lagrange multiplier λ = λ0 described in Theorem
2.
Since we apply the SMP to solve the problem (14), first, we define the Hamiltonian corresponding to the
wealth process (12):

$$ H(t, x, y, a, c, p, q, r(\cdot)) = \alpha(t)\,c^2 + \lambda\big(\gamma(t)\,c - K\big) + \big(\mu(t)\,y - c\big)\,p + y\,\sigma(t)\,q + y \int_{\mathbb{R}_0} \eta(t, z)\,r(t, z)\,\nu(dz). \tag{15} $$
Note that the Hamiltonian H is clearly a concave function of x, y, a and c; hence, the concavity
condition on H required in Theorem 3.1 of [20] is satisfied.
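As a brief sketch of the pointwise maximization used below (assuming an interior maximizer in U): since α(t) < 0, H in (15) is strictly concave in c, and the candidate optimal consumption c*(t) is characterized by the first-order condition

$$ \frac{\partial H}{\partial c} = 2\alpha(t)\,c + \lambda\gamma(t) - p(t) = 0, $$

which expresses c*(t) through the adjoint process p(t), the multiplier λ, and the coefficients γ(t) and α(t).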
Furthermore, we should present the corresponding Anticipated Backward Stochastic Differential Equa-
tion (Anticipated BSDE) and solve it for unknown p(t), q(t), and r(t, z).
For t ∈ [0, T], let us introduce:

$$ dp(t) = -\mathbb{E}\left[\left(\mu(t+\delta)\,p(t+\delta) + \sigma(t+\delta)\,q(t+\delta) + \int_{\mathbb{R}_0} \eta(t+\delta, z)\,r(t+\delta, z)\,\nu(dz)\right) 1_{[0, T-\delta]}(t) \,\middle|\, \mathcal{F}_t\right] dt + q(t)\,dW(t) + \int_{\mathbb{R}_0} r(t, z)\,\tilde{N}(dt, dz), \tag{16} $$

$$ p(T) = \beta. \tag{17} $$
We call this type of BSDE Anticipated since, as seen in the arguments of µ, σ, η, p(·), q(·), and r(·, ·),
the terms involve time-advanced values of the form t + δ for t ∈ [0, T]. This type of BSDE was first
introduced and developed by Peng and Yang, see [23]. For technical definitions of the Hamiltonian (15)
and the system (16)-(17), please see the Appendix or Section 2 in [20]. Furthermore, see [25, 28] for
the formulation of Anticipated BSDEs and their relation with SDDEs via different models.
We follow the technique described in [20] to find the solution for p(·), q(·), and r(·, ·), which will be
computed inductively in the following way:
Step 1: For t ∈ [T − δ, T], the corresponding adjoint equation becomes:

$$ dp(t) = q(t)\,dW(t) + \int_{\mathbb{R}_0} r(t, z)\,\tilde{N}(dt, dz), \qquad p(T) = \beta, $$

for which we have the solution:

$$ p(t) = \mathbb{E}[p(T) \mid \mathcal{F}_t] = \beta, \qquad t \in [T-\delta, T]. $$

By the martingale representation theorem, and since the Lagrange multiplier is a real value, we choose
q = r = 0. Hence, the Anticipated BSDE takes the form:

$$ dp(t) = -\mu(t+\delta)\,p(t+\delta)\,1_{[0, T-\delta]}(t)\,dt, \qquad t \leq T, $$
$$ p(t) = \beta, \qquad t \in [T-\delta, T]. $$
Step 2: We define:

$$ h(t) = p(T - t), \qquad t \in [0, T]. \tag{18} $$

In this way, we obtain a deterministic delay equation:

$$
\begin{aligned}
dh(t) &= -dp(T-t) = \mu(T-t+\delta)\,p(T-t+\delta)\,dt = \mu(T-t+\delta)\,h(t-\delta)\,dt, \qquad t \in [\delta, T], \\
h(t) &= p(T-t) = \beta, \qquad t \in [0, \delta].
\end{aligned}
$$

For such equations, again, we have an inductive solution approach. Since we can compute h(t) on
[(j − 1)δ, jδ], we obtain, for t ∈ [jδ, (j + 1)δ]:

$$ h(t) = h(j\delta) + \int_{j\delta}^{t} h'(s)\,ds = h(j\delta) + \int_{j\delta}^{t} \mu(T-s+\delta)\,h(s-\delta)\,ds. \tag{19} $$
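The recursion (19) is straightforward to evaluate numerically; the following short Python sketch (an illustration with an assumed deterministic µ, not part of the original derivation) advances h interval by interval on a grid:

```python
import numpy as np

def solve_h(mu, beta, T, delta, n_per_interval=200):
    """Solve h(t) = p(T - t) from (19) on a grid over [0, T] (sketch).

    h = beta on [0, delta]; on later intervals h grows by the integral of
    mu(T - s + delta) * h(s - delta), evaluated with the left-endpoint rule.
    """
    dt = delta / n_per_interval
    n_total = int(round(T / dt))
    lag = n_per_interval                      # delta measured in grid steps
    t = dt * np.arange(n_total + 1)
    h = np.full(n_total + 1, float(beta))     # h = beta on the first interval
    for i in range(lag, n_total):
        s = t[i]
        h[i + 1] = h[i] + mu(T - s + delta) * h[i - lag] * dt
    return t, h

# Example with constant mu: on [delta, 2*delta], (19) gives
# h(t) = beta * (1 + mu * (t - delta)), which the grid values reproduce.
t, h = solve_h(mu=lambda s: 0.05, beta=1.0, T=0.5, delta=0.25)
```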
By also observing (18) and (19), we provide the following open form of the solution process, where L(·)
denotes the martingale noise term of (12), that is, dL(s) = σ(s)dW(s) + ∫_{R0} η(s, z)Ñ(ds, dz):

$$ X(t) = \theta(t), \qquad -\delta \le t \le 0, $$

$$ X(t) = \theta(0) + \int_0^t \Big( \theta(s-\delta)\mu(s) - \tfrac{1}{2}\,\alpha(s)\big(h(T-s) - \lambda\gamma(s)\big) \Big)\,ds + \int_0^t \theta(s-\delta)\,dL(s), \qquad 0 \le t \le \delta, \tag{20} $$

$$
\begin{aligned}
X(t) ={}& X(\delta) + \int_\delta^t \bigg[ \Big( \theta(0) + \int_0^{v-\delta} \Big( \theta(s-\delta)\mu(s) - \tfrac{1}{2}\,\alpha(s)\big(h(T-s) - \lambda\gamma(s)\big) \Big)\,ds \\
&\qquad\qquad + \int_0^{v-\delta} \theta(s-\delta)\,dL(s) \Big)\,\mu(v) - \tfrac{1}{2}\,\alpha(v)\big(h(T-v) - \lambda\gamma(v)\big) \bigg]\,dv \\
& + \int_\delta^t \Big( \theta(0) + \int_0^{v-\delta} \Big( \theta(s-\delta)\mu(s) - \tfrac{1}{2}\,\alpha(s)\big(h(T-s) - \lambda\gamma(s)\big) \Big)\,ds \\
&\qquad\qquad + \int_0^{v-\delta} \theta(s-\delta)\,dL(s) \Big)\,dL(v), \qquad \delta \le t \le 2\delta = T.
\end{aligned} \tag{21}
$$
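Note that, since L(·) is a martingale and the integrands above are square integrable, the dL-integrals are centered,

$$ \mathbb{E}\left[\int_0^{t} \theta(s-\delta)\,dL(s)\right] = 0, $$

which is what allows the expectation of X(2δ) below to be expressed through deterministic integrals.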
Now, the values of h(T − t), t ∈ [0, T], in the above integrals can be determined by following the
boundary values of the integrals and their relation with t. Remember that T = 2δ. Then, by (19):

If 0 ≤ s ≤ t ≤ δ, then δ ≤ T − s ≤ 2δ, and

$$ h(2\delta - s) = h(\delta) + \int_\delta^{2\delta - s} \mu(3\delta - u)\,h(u - \delta)\,du, $$
$$ h(2\delta - s) = \beta\left(1 + \int_\delta^{2\delta - s} \mu(3\delta - u)\,du\right). $$

Moreover, if 0 ≤ s ≤ v − δ and δ ≤ v ≤ t ≤ 2δ, then 0 ≤ v − δ ≤ t − δ ≤ δ, so 0 ≤ s ≤ δ and
δ ≤ 2δ − s ≤ 2δ; then, by (19),

$$ h(2\delta - s) = \beta\left(1 + \int_\delta^{2\delta - s} \mu(3\delta - u)\,du\right). $$

Finally, if δ ≤ v ≤ t ≤ 2δ, then 0 ≤ 2δ − v ≤ δ; then, by (18), h(2δ − v) = β.
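As a quick sanity check (an illustration assuming a constant appreciation rate µ(·) ≡ µ), the first two cases above reduce to

$$ h(2\delta - s) = \beta\big(1 + \mu(\delta - s)\big), \qquad 0 \le s \le \delta, $$

which agrees with integrating the delay equation for h directly over [δ, 2δ − s].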
Firstly, we change the value of h(·) according to the relevant intervals in the above solution process
and integrate Equation (21) from 0 to 2δ by following the above δ-length description of X(·).
Then, we take the expectation of both sides of Equation (21).
Now, let us introduce the following terms:

$$ A = \int_0^{2\delta} \alpha(s)\gamma(s)\,ds + \mathbb{E}\left[\int_\delta^{2\delta} \left( \int_0^{v-\delta} \alpha(s)\gamma(s)\,ds \right) \big( \mu(v)\,dv + dL(v) \big)\right] $$

and

$$
\begin{aligned}
B ={}& \theta(0) - \mathbb{E}[X(2\delta)] + \int_0^{\delta} \theta(s-\delta)\mu(s)\,ds - \frac{\beta}{2}\int_0^{\delta} \alpha(s)\left(1 + \int_\delta^{2\delta - s} \mu(3\delta - u)\,du\right) ds + \mathbb{E}\left[\int_0^{\delta} \theta(s-\delta)\,dL(s)\right] \\
& + \int_\delta^{2\delta} \theta(0)\,\mu(v)\,dv + \int_\delta^{2\delta} \left( \int_0^{v-\delta} \theta(s-\delta)\mu(s)\,ds \right) \mu(v)\,dv \\
& - \frac{\beta}{2}\int_\delta^{2\delta} \left( \int_0^{v-\delta} \alpha(s)\left(1 + \int_\delta^{2\delta - s} \mu(3\delta - u)\,du\right) ds \right) \mu(v)\,dv + \mathbb{E}\left[\int_\delta^{2\delta} \left( \int_0^{v-\delta} \theta(s-\delta)\,dL(s) \right) \mu(v)\,dv\right] \\
& - \frac{\beta}{2}\int_\delta^{2\delta} \alpha(v)\,dv + \mathbb{E}\Bigg[\int_\delta^{2\delta} \bigg( \theta(0) + \int_0^{v-\delta} \theta(s-\delta)\mu(s)\,ds - \frac{\beta}{2}\int_0^{v-\delta} \alpha(s)\left(1 + \int_\delta^{2\delta - s} \mu(3\delta - u)\,du\right) ds \\
&\qquad\qquad + \int_0^{v-\delta} \theta(s-\delta)\,dL(s) \bigg)\,dL(v)\Bigg].
\end{aligned}
$$

Then, we get:

$$ \lambda = \frac{2B}{A}, \qquad \text{on condition that } A \neq 0. \tag{22} $$
Now, by (18) and (19), let us make some observations about the constraint (13):

$$
\begin{aligned}
\mathbb{E}\left[\int_0^T \gamma(t)\,c(t)\,dt\right] &= \frac{1}{2}\int_0^{\delta} \gamma(t)\,\alpha(t)\left(\beta\Big(1 + \int_\delta^{2\delta - t} \mu(3\delta - u)\,du\Big) - \lambda\gamma(t)\right) dt \\
&\quad + \frac{1}{2}\int_\delta^{2\delta} \gamma(t)\,\alpha(t)\big(\beta - \lambda\gamma(t)\big)\,dt = 2\delta K.
\end{aligned}
$$
Then, let us utilize the above equality to clarify λ and define the following terms:

$$ D = \frac{\beta}{2}\left[\int_0^{2\delta} \gamma(t)\,\alpha(t)\,dt + \int_0^{\delta} \gamma(t)\,\alpha(t)\left(\int_\delta^{2\delta - t} \mu(3\delta - u)\,du\right) dt\right] - 2\delta K, $$

and

$$ C = \int_0^{2\delta} \gamma^2(t)\,\alpha(t)\,dt. $$

Then, we obtain:

$$ \lambda = \frac{2D}{C}, \qquad \text{on condition that } C \neq 0. \tag{23} $$
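For concreteness, the following small numerical sketch (an illustration; the constant coefficients γ, α < 0, µ and the values of δ, β, K are assumptions of this example, not from the paper) evaluates C, D, and λ = 2D/C from (23) by quadrature:

```python
import numpy as np
from scipy.integrate import quad

# Illustrative constant coefficients (assumptions of this sketch)
delta, beta, K = 0.5, 1.0, 0.2
gamma = lambda t: 0.8
alpha = lambda t: -1.5          # alpha(.) < 0, as required
mu    = lambda t: 0.05

T = 2 * delta

# C = int_0^{2 delta} gamma^2(t) alpha(t) dt
C, _ = quad(lambda t: gamma(t) ** 2 * alpha(t), 0.0, T)

# D = (beta/2) * [ int_0^{2 delta} gamma alpha dt
#                + int_0^{delta} gamma alpha (int_delta^{2 delta - t} mu(3 delta - u) du) dt ]
#     - 2 delta K
inner = lambda t: quad(lambda u: mu(3 * delta - u), delta, 2 * delta - t)[0]
first, _ = quad(lambda t: gamma(t) * alpha(t), 0.0, T)
second, _ = quad(lambda t: gamma(t) * alpha(t) * inner(t), 0.0, delta)
D = 0.5 * beta * (first + second) - 2 * delta * K

lam = 2 * D / C if C != 0 else float("nan")   # Equation (23)
print(f"C = {C:.4f}, D = {D:.4f}, lambda = {lam:.4f}")
```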
Finally, by the observations (22)-(23), we conclude that, in order to use Theorem 2, we have to specify
the value of K in Equation (5) carefully, such that

$$ \frac{D}{C} = \frac{B}{A}. $$

By this final result, we have determined explicitly the control process c*(·), the Lagrange multiplier λ0,
and consequently the solution for p(·) corresponding to the Anticipated BSDE (16)-(17), together with
all the required technical assumptions.
4. Conclusion
Hansen and Sargent [10] used a Lagrange multiplier theorem to convert the entropy constraint into a penalty on
perturbations from the model. Therefore, we would like to underline the potential of our work towards
robust stochastic control and stochastic games.
Risk minimization and worst-case scenarios have significant value in quantitative finance and insurance,
since each action under uncertainty carries a potential for loss that cannot be underestimated. Therefore,
as a further study, we aim to focus on the relation between Lagrange multipliers and robust control.
These structures can be approached from the side of relative entropy as well as from the side of VaR
and CVaR concepts, see [9, 15, 16]. Furthermore, within the wide scope of risk management, Lagrange
multipliers can be handled via computational methods such as deep learning and deep reinforcement
learning, see [24, 31].
On the other hand, we strongly believe that, although delay systems are demanding and challenging,
they will be highlighted within the context of other active fields such as Deep Learning. While the aim
of our research article is to provide theoretical and technical approaches, in [29] we present a collection
of novel aspects at the intersection of computer science and stochastic optimal control under the
memory component.
References
[1] Bichteler, K., Stochastic Integration with Jumps, Cambridge University Press, 2002. https://doi.org/10.1017/CBO9780511549878
[2] Cartea, A., Donnelly, R., Jaimungal, S., Algorithmic trading with model uncertainty, SIAM Journal on Financial
Mathematics, 8(1) (2017), 635-671. https://doi.org/10.1137/16M106282X
[3] Cont, R., Tankov, P., Financial Modelling with Jump Processes, Chapman and Hall/CRC, 2003. https://doi.org/
10.1201/9780203485217
[4] Dahl, K. R., Stokkereit, E., Stochastic maximum principle with Lagrange multipliers and optimal consumption
with Lévy wage, Afrika Matematika, 27(3) (2016), 555-572. https://doi.org/10.1007/s13370-015-0360-5
[5] Dordevic, J., Dahl, K. R., Stochastic optimal control of pre-exposure prophylaxis for HIV infection, Mathematical
Medicine and Biology: A Journal of the IMA, 39(3) (2022), 197-225. https://doi.org/10.1007/s00285-024-02151-3
[6] Elliott, R. J., Siu, T. K., Robust optimal portfolio choice under Markovian regime-switching model, Methodology and
Computing in Applied Probability, 11 (2009), 145-157. https://doi.org/10.1007/s11009-008-9085-3
[7] Federico, S., A stochastic control problem with delay arising in a pension fund model, Finance and Stochastics, 15(3)
(2011), 421-459. https://doi.org/10.1007/s00780-010-0146-4
[8] Gozzi, F., Masiero, F., Stochastic optimal control with delay in the control I: Solving the HJB equation through
partial smoothing, SIAM Journal on Control and Optimization, 55(5) (2017), 2981-3012. https://doi.org/10.1137/
16M1070128
[9] Gueant, O., Lehalle, C. A., Fernandez-Tapia, J., Dealing with the inventory risk: a solution to the market making
problem, Mathematics and Financial Economics, 7(4) (2013), 477-507. https://doi.org/10.1007/s11579-012-0087-0
[10] Hansen L. P., Sargent T. J., Robustness, Princeton University Press, 2008. https://doi.org/10.1515/9781400829385
[11] Korn, R., Menkens, O., Steffensen, M., Worst-case-optimal dynamic reinsurance for large claims, European Actuarial Journal,
2 (2012), 21-48. https://doi.org/10.1007/s13385-012-0050-8
[12] Korn, R., Melnyk, Y., Seifried, F. T., Stochastic impulse control with regime-switching dynamics, European Journal
of Operational Research, 260(3) (2017), 1024-1042. https://doi.org/10.1016/j.ejor.2016.12.029
[13] Lamberton, D., Lapeyre, B., Introduction to Stochastic Calculus Applied to Finance, Chapman and Hall/CRC, 2011.
https://doi.org/10.1201/9781420009941
[14] Larssen, B., Risebro, N. H., When are HJB-equations in stochastic control of delay systems finite dimensional?,
Stochastic Analysis and Applications, 21(3) (2003), 643-71. https://doi.org/10.1081/SAP-120020430
[15] Mataramvura, S., Øksendal, B., Risk minimizing portfolios and HJBI equations for stochastic differential games, Stochastics:
An International Journal of Probability and Stochastic Processes, 80(4) (2008), 317-337. https://doi.org/10.1080/17442500701655408
[16] Miller, C. W., Yang, I., Optimal control of conditional value-at-risk in continuous time, SIAM Journal on Control and
Optimization, 55(2) (2017), 856-884. https://doi.org/10.1137/16M1058492
[17] Mohammed, S.E.A., Stochastic Functional Differential Equations, Pitman, London, 1984.
[18] Mohammed, S. E. A., Stochastic differential systems with memory: theory, examples and applications. In Stochastic
Analysis and Related Topics VI: Proceedings of the Sixth Oslo—Silivri Workshop Geilo 1996 (pp. 1-77), Boston,
MA: Birkhauser Boston, 1998. https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?article=1064&context=math_
articles
[19] Øksendal, B., Stochastic Differential Equations: an Introduction with Applications, Springer Science & Business Media,
2013. https://doi.org/10.1007/978-3-642-14394-6
[20] Øksendal, B., Sulem, A., Zhang, T., Optimal control of stochastic delay equations and time-advanced backward
stochastic differential equations, Advances in Applied Probability, 43(2) (2011), 572-596. https://doi.org/10.1239/aap/1308662493
[21] Øksendal, B., Sulem, A., Stochastic Control of Jump Diffusions. In: Applied Stochastic Control of Jump Diffusions,
Springer, 2019. https://doi.org/10.1007/978-3-030-02781-0_5
[22] Pham, H., Continuous-Time Stochastic Control and Optimization with Financial Applications, Vol. 61, Springer Science
& Business Media, 2009. https://doi.org/10.1007/978-3-540-89500-8
[23] Peng, S., Yang, Z., Anticipated backward stochastic differential equations, The Annals of Probability, 37(3) (2009),
877-902. https://doi.org/10.1214/08-AOP423
[24] Peters, J., Mulling, K., Altun, Y., Relative entropy policy search, In Proceedings of the AAAI Conference on Artificial
Intelligence, 24 (2010), 1607-1612. https://doi.org/10.1609/aaai.v24i1.7727
[25] Savku, E., Weber, G. W., A stochastic maximum principle for a Markov regime-switching jump-diffusion model with
delay and an application to finance, Journal of Optimization Theory and Applications, 179(2) (2018), 696-721. https:
//doi.org/10.1007/s10957-017-1159-3
[26] Savku, E., Fundamentals of Market Making Via Stochastic Optimal Control, Operations Research: New Paradigms
and Emerging Applications, (pp.136-154), Purutçuoğlu, V., Weber, G.W., & Farnoudkia, H. (Eds.), (1st ed.), CRC
Press, 2022. https://doi.org/10.1201/9781003324508-10
[27] Savku, E., A stochastic control approach for constrained stochastic differential games with jumps and regimes, Math-
ematics, 11(14) (2023), 3043. https://doi.org/10.1007/s10957-017-1159-3
[28] Savku, E., Deep-Control of Memory via Stochastic Optimal Control and Deep Learning, In International Conference
on Mathematics and its Applications in Science and Engineering (pp. 219-240), Cham: Springer Nature Switzerland,
2023. https://doi.org/10.1007/978-3-031-49218-1_16
[29] Savku, E., Memory and anticipation: two main theorems for Markov regime-switching stochastic processes, Stochastics,
(2024), 1-18. https://doi.org/10.1080/17442508.2024.2427733
[30] Savku, E., An approach for regime-switching stochastic control problems with memory and terminal conditions, Opti-
mization, (2024), 1–18. https://doi.org/10.1080/17442508.2024.2427733
[31] Tamar, A., Glassner, Y., Mannor, S. Optimizing the cvar via sampling, In Proceedings of the AAAI Conference on
Artificial Intelligence, 29 (2015). https://doi.org/10.1609/aaai.v29i1.9561
[32] Touzi, N., Optimal Stochastic Control, Stochastic Target Problems, and Backward SDE, Vol. 29, Springer Science &
Business Media, 2012. https://doi.org/10.1007/978-1-4614-4286-8
[33] Uğurlu, K., Tomasz, B., Distorted probability operator for dynamic portfolio optimization in times of socio-
economic crisis, Central European Journal of Operations Research, 31(4) (2023), 1043-1060. https://doi.org/10.
1007/s10100-022-00834-0
[34] Uğurlu, K., Robust utility maximization of terminal wealth with drift and volatility uncertainty, Optimization, 70(10)
(2021), 2081-2102. https://doi.org/10.1080/02331934.2020.1774586
[35] Fleming, W. H., Soner, H. M., Controlled Markov Processes and Viscosity Solutions, Vol. 25, Springer Science &
Business Media, 2006. https://doi.org/10.1007/0-387-31071-1
Appendix
In order to apply the SMP, we have to define the corresponding Hamiltonian for a delayed system as follows:
H : [0, T] × R × R × R × U × R × R × R → R,

$$ H(t, x, y, a, u, p, q, r) = f(t, x, y, a, u) + b(t, x, y, a, u)\,p + \sigma(t, x, y, a, u)\,q + \int_{\mathbb{R}_0} \eta(t, x, y, a, u, z)\,r(t, z)\,\nu(dz), \tag{24} $$

where R denotes the set of all functions r : [0, T] × R0 → R for which the integral in (24) converges.
Associated to H, the adjoint, unknown and adapted processes (p(t) ∈ R : t ∈ [0, T]), (q(t) ∈ R : t ∈ [0, T]),
and (r(t, z) ∈ R : t ∈ [0, T], z ∈ R0) are described by the following Anticipated BSDE with jumps:

$$ dp(t) = \mathbb{E}[\mu(t) \mid \mathcal{F}_t]\,dt + q(t)\,dW(t) + \int_{\mathbb{R}_0} r(t, z)\,\tilde{N}(dt, dz), \qquad p(T) = g_x(X(T)), $$
where

$$
\begin{aligned}
\mu(t) :={}& -\frac{\partial H}{\partial x}\big(t, X(t), Y(t), A(t), u(t), p(t), q(t), r(t, \cdot)\big) \\
& - \frac{\partial H}{\partial y}\big(t+\delta, X(t+\delta), Y(t+\delta), A(t+\delta), u(t+\delta), p(t+\delta), q(t+\delta), r(t+\delta, \cdot)\big)\,1_{[0, T-\delta]}(t) \\
& - e^{\rho t} \int_t^{t+\delta} \frac{\partial H}{\partial a}\big(s, X(s), Y(s), A(s), u(s), p(s), q(s), r(s, \cdot)\big)\, e^{-\rho s}\,1_{[0, T]}(s)\,ds.
\end{aligned} \tag{25}
$$
As seen in µ(t), Equation (25) involves the future values of X(s), u(s), p(s), q(s), and r(s, ·) for
s ≤ t + δ; hence, we call this type of BSDE Anticipated.
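For instance, specializing (25) to the application of Section 3 recovers the driver of (16) (with a slight abuse of notation: the µ on the left-hand side of (25) is the adjoint driver, while µ(t) below is the appreciation rate). Since H in (15) does not depend on x or a,

$$ \frac{\partial H}{\partial x} = \frac{\partial H}{\partial a} = 0, \qquad \frac{\partial H}{\partial y}(t) = \mu(t)\,p(t) + \sigma(t)\,q(t) + \int_{\mathbb{R}_0} \eta(t, z)\,r(t, z)\,\nu(dz), $$

so (25) reduces to $-\frac{\partial H}{\partial y}(t+\delta)\,1_{[0, T-\delta]}(t)$, which is exactly the conditional drift appearing in Equation (16).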