Data Driven Market Making Via Model Free Learning
Abstract
This paper studies when a market-making firm should place orders to maximize their expected net profit, while also constraining risk, assuming orders are maintained on an electronic limit order book (LOB). To do this, we use a model-free and off-policy method, Q-learning, coupled with state aggregation, to develop a proposed trading strategy that can be implemented using a simple lookup table. Our main training dataset is derived from event-by-event data recording the state of the LOB. Our proposed trading strategy has passed both in-sample and out-of-sample testing in the backtester of the market-making firm with whom we are collaborating, and it also outperforms other benchmark strategies. As a result, the firm desires to put the strategy into production.

1 Introduction
We consider a financial asset traded on an electronic exchange. Market participants, including institutional investors, market makers, and speculators, can post two types of buy/sell orders. A market order is an order to buy/sell a certain quantity of the asset at the best available price in the market. A limit order is an order to trade a certain amount at a specified price, known as an ask price for a sell order, and a bid price for a buy order. Limit orders are posted to an electronic trading system, and all the outstanding limit orders are summarized by stating the quantities posted at each price level in a limit order book (LOB), as shown in Figure 1, which is the dominant market structure among exchange-traded U.S. equities and futures. The LOB is available to all market participants.

Figure 1: An illustration of a limit order book (LOB)

The limit orders rest or wait in the LOB, and are matched against incoming market orders. A market buy (sell) order executes first at the lowest ask (highest bid) price, and next in ascending (descending) order with higher (lower) priced asks (bids). The execution within each price level is prioritized in accordance with the limit order time of arrival, in a first-come-first-served (FCFS) fashion.

In this paper, we take the perspective of a market-making firm. The market-making firm provides liquidity by submitting limit orders, and removes liquidity by canceling existing limit orders. Provided that the lowest ask price exceeds the highest bid price,¹ the market-making firm earns profit when one market order to buy trades with its resting limit sell order and another market order to sell trades with its resting limit buy order. The challenge is that the market-making firm cannot guarantee always being on both sides of the trade due to the stochasticity of order arrivals, and the resulting movements of the lowest ask and highest bid prices. The market-making firm with whom we partnered prefers to begin with the simplest possible strategy that places at most one order per side. Furthermore, the firm is most interested in a strategy for placing orders at the best bid and ask prices.

¹ The arrivals of limit buy orders with bid prices higher than the lowest ask price will be fulfilled immediately, similarly for the arrivals of limit sell orders with ask prices lower than the highest bid price; thus the highest bid price does not exceed the lowest ask price.
² The Sharpe ratio measures the return of an investment compared to its risk. Usually, any Sharpe ratio greater than 1.0 is considered acceptable to good by investors. A ratio higher than 2.0 is rated as very good. A ratio of 3.0 or higher is considered excellent.

Our objective is to provide real-time guidance for how to manage the firm's portfolio of limit buy and sell orders on the LOB, so as to maximize the expected net profit, while penalizing mismatch between the amount bought and sold, and ensuring a sufficiently high Sharpe ratio.² To do this, we use historical trading data to train a model for real-time decision making. More specifically, we formulate this problem as a Markov decision problem (MDP). Two main issues in solving the MDP are: (1) difficulty in estimating the transition probabilities, and (2) a very large state space (the notorious curse of dimensionality). To overcome these issues and be able to find a well-performing heuristic, we implement a model-free
Q-learning algorithm together with state aggregation.

2 Model
We model this problem as a finite-horizon discrete-time MDP. The simplified assumed timing of events happening in the LOB is illustrated in Figure 2. The objective is to provide a strategy for when (and when not) to have one buy and/or one sell order resting on the LOB. The assumption that at most one buy and one sell order can rest on the buy and sell side respectively is based on a high-frequency trading convention to (1) backtest whether the simple strategy is profitable, (2) see how the simple strategy performs in production, and (3) expand to more complicated order strategies (such as order stacking).

Figure 2: Timing of LOB events

2.1 LOB State Variable
Assume there are $n$ price levels in the order book, indexed by $\mathcal{P} := \{1, 2, \dots, n\}$. At time $t \in \mathcal{T} := \{0, 1, 2, \dots, T\}$, $|R_{tp}^{1}| \in \{0, 1\}$ denotes whether there exists a limit order belonging to us at price $p \in \mathcal{P}$, and $|R_{tp}^{2}| \in \{0, 1, 2, \dots\}$ denotes the total number of limit orders resting from other market participants at price $p \in \mathcal{P}$. We distinguish between the bid and the ask side according to whether $R_{tp}^{i}$ ($i = 1, 2$) is negative or positive; $R_{tp}^{i} < 0$ ($i = 1, 2$) for the bid side, and $R_{tp}^{i} > 0$ ($i = 1, 2$) for the ask side. Whenever the state is such that we have an order resting, we conservatively assume that our order rests at the back of the queue. The implication is that our model will tend to underestimate the frequency at which our orders are executed, resulting in an underestimation of profit.

The best bid and ask prices (also called the market bid and ask prices) can be expressed as a function of $R_t = (R_{tp}^{1}, R_{tp}^{2})_{p \in \mathcal{P}}$.
• The best bid price (which is the highest bid price) is $\beta_{R_t} := \max\{p \in \{0, 1, \dots, n\} : R_{tp}^{1} + R_{tp}^{2} < 0\}$.
• The best ask price (which is the lowest ask price) is $\alpha_{R_t} := \min\{p \in \{1, \dots, n, n + 1\} : R_{tp}^{1} + R_{tp}^{2} > 0\}$.
In the above, $p = 0$ and $p = n + 1$ represent the degenerate cases of no bids and no asks, respectively. Since the best bid and ask prices can be determined from $R_t$, there is no need to include them as part of the state variable. Then, the pre-decision state variable at time $t$ is given by $R_t$.

2.2 Decision Variable
The available actions are to add, cancel, or do nothing, and we encode this using 0 and 1. A 0 on the bid side implies we do not want an order resting at the best bid price, and so we cancel any existing order on the bid side, and otherwise do nothing. A 1 implies we do want an order resting at the best bid price, and so we place an order at the best bid price and simultaneously cancel any existing order on the bid side. This leads to the allowable action space for any state $R_t$ being $\mathcal{A} := \{(0, 0), (0, 1), (1, 0), (1, 1)\}$, where the two components in an action pair correspond to the action on the bid side and the ask side, respectively. Later, this will be useful for us to restrict the action space when there is too much mismatch between the amounts bought and sold, in which case the allowable actions will be a subset of $\mathcal{A}$. The state after taking an action $A_t = (A_{t1}, A_{t2}) \in \mathcal{A}$ can be expressed by two $n$-dimensional vectors $R_t^{a1}$ and $R_t^{a2}$, defined as

$$
\begin{aligned}
R_{tp}^{a2} &:= R_{tp}^{2}, \quad \text{for all } p \in \{1, 2, \dots, n\}, \\
R_{tp}^{a1} &:= \begin{cases} 1, & \text{if } (A_{t1} = 1 \text{ and } p = \beta_{R_t}) \text{ or } (A_{t2} = 1 \text{ and } p = \alpha_{R_t}), \\ 0, & \text{otherwise.} \end{cases}
\end{aligned}
\tag{1}
$$

2.3 Exogenous Order Arrivals and Cancellations
Let $\hat{D}_t^{MB}$ and $\hat{D}_t^{MS}$ be the number of units demanded respectively by market buy and sell orders, which arose between time $t$ and $t + 1$. We have at most one resting order at $\alpha_{R_t}$ and one at $\beta_{R_t}$, which rest at the end of the queue, and none elsewhere. The implication is that our orders execute if $\hat{D}_t^{MB}$ and/or $\hat{D}_t^{MS}$ is no fewer than the number of orders resting at the best ask and/or best bid; the state can then be updated in terms of the first $n$-dimensional vector, for all $p \in \mathcal{P}$,

$$
R_{tp}^{m1} := \begin{cases} 0, & \text{if } p = \alpha_{R_t} \text{ and } \hat{D}_t^{MB} \ge R_{tp}^{a1} + R_{tp}^{a2}, \\ 0, & \text{if } p = \beta_{R_t} \text{ and } \hat{D}_t^{MS} \ge R_{tp}^{a1} + R_{tp}^{a2}, \\ R_{tp}^{a1}, & \text{otherwise.} \end{cases}
\tag{2}
$$

In order to update the second $n$-dimensional vector, which represents the resting orders from other market participants, we require more detailed knowledge. Define $p_{R_t}^{\alpha}$ to be the highest ask price against which a market buy order will execute.³ If there are enough limit orders resting at the lowest ask price to fill the incoming market buy orders (i.e., $\hat{D}_t^{MB} \le R_{t\alpha_{R_t}}^{a1} + R_{t\alpha_{R_t}}^{a2}$), then $p_{R_t}^{\alpha} = \alpha_{R_t}$ and the trade quantity at price $\alpha_{R_t}$ is $k_{R_t}^{\alpha} := \hat{D}_t^{MB}$. Otherwise $p_{R_t}^{\alpha} > \alpha_{R_t}$, the trade quantities at any ask prices $p$ lower than $p_{R_t}^{\alpha}$ exactly equal the number of resting orders at the price, and the trade quantity at price $p_{R_t}^{\alpha}$ can be expressed by

$$
k_{R_t}^{\alpha} := \hat{D}_t^{MB} - \left(R_{t\alpha_{R_t}}^{a1} + R_{t\alpha_{R_t}}^{a2}\right) - \sum_{p=\alpha_{R_t}+1}^{p_{R_t}^{\alpha}-1} R_{tp}^{a2},
$$

assuming $\hat{D}_t^{MB} \le \left(R_{t\alpha_{R_t}}^{a1} + R_{t\alpha_{R_t}}^{a2}\right) + \sum_{p=\alpha_{R_t}+1}^{n} R_{tp}^{a2}$ (where the summation in the above display is empty if $p_{R_t}^{\alpha} = \alpha_{R_t} + 1$). In the rare case that the total number of limit orders resting on the book is not enough to fill all the incoming market buy orders (that is, if $\hat{D}_t^{MB} > \left(R_{t\alpha_{R_t}}^{a1} + R_{t\alpha_{R_t}}^{a2}\right) + \sum_{p=\alpha_{R_t}+1}^{n} R_{tp}^{a2}$), then $p_{R_t}^{\alpha} = n$ and the
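To make the definitions of Sections 2.1 to 2.3 concrete, the following Python sketch computes the best bid and ask prices, applies an action according to equation (1), and walks the ask side for an incoming market buy order. It is an illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, price level $p$ is stored at list index $p - 1$, and the case in which the book cannot fill the whole market buy order is reported as an error rather than modeled.

```python
from typing import List, Tuple


def best_bid_ask(r1: List[int], r2: List[int]) -> Tuple[int, int]:
    """Return (beta, alpha) for signed quantities (negative = bid, positive = ask).

    beta = 0 means no bids; alpha = n + 1 means no asks.
    """
    n = len(r1)
    total = [own + others for own, others in zip(r1, r2)]
    bids = [p for p in range(1, n + 1) if total[p - 1] < 0]
    asks = [p for p in range(1, n + 1) if total[p - 1] > 0]
    return (max(bids) if bids else 0, min(asks) if asks else n + 1)


def apply_action(r1: List[int], r2: List[int],
                 action: Tuple[int, int]) -> Tuple[List[int], List[int]]:
    """Post-decision state per equation (1); action = (bid flag, ask flag).

    Equation (1) as printed sets the entry to 1 on either side, and so does this sketch.
    """
    beta, alpha = best_bid_ask(r1, r2)
    bid_on, ask_on = action
    ra1 = [
        1 if (bid_on == 1 and p == beta) or (ask_on == 1 and p == alpha) else 0
        for p in range(1, len(r1) + 1)
    ]
    return ra1, list(r2)


def market_buy_fill(ra1: List[int], ra2: List[int],
                    alpha: int, demand: int) -> Tuple[int, int]:
    """Return (highest ask price hit, trade quantity at that price)."""
    n = len(ra1)
    if alpha > n:
        raise ValueError("no asks resting on the book")
    # Quantity left after consuming everything at the best ask.
    remaining = demand - (ra1[alpha - 1] + ra2[alpha - 1])
    if remaining <= 0:
        return alpha, demand
    # Walk up the ask side; only other participants rest above the best ask.
    for p in range(alpha + 1, n + 1):
        if remaining <= ra2[p - 1]:
            return p, remaining
        remaining -= ra2[p - 1]
    raise ValueError("demand exceeds total ask depth (case not covered here)")
```

For example, with five price levels, no orders of our own, and other participants' quantities `[-3, -2, 0, 4, 6]`, the best bid is 2 and the best ask is 4. After the action `(1, 1)`, a market buy order for 7 units consumes the 5 units at price 4 (ours plus 4 others) and 2 units at price 5.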
• cumPnL (CP) $\in \{0, 1\}$: indicates whether the cumulative PnL, defined from equation (6) as

$$
pnl_t := \sum_{i=0}^{t} C(R_i, A_i, \hat{D}_i^{MB}, \hat{D}_i^{MS}), \tag{9}
$$

[...]

[...] and we record these in a lookup table called the Q table. The resulting size of the Q table used to look up the recommended action associated with any given state is $2 \times 2 \times 5 \times 3 \times 2 \times 4 + 2 \times 2 \times 5 \times 2 \times 2 \times 2 = 640$.
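The table size above can be checked directly, and the role of the lookup table can be illustrated with a short Python sketch. Everything beyond the size arithmetic is an assumption for illustration: the names `q_table`, `q_update`, and `recommended_action` are hypothetical, the aggregated state is represented as an arbitrary hashable key, and the update shown is the standard textbook tabular Q-learning rule with placeholder step size and discount, not necessarily the exact variant or parameters used here.

```python
import math
from collections import defaultdict

# Action pairs (bid flag, ask flag) from Section 2.2.
ACTIONS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# The two products quoted in the text; what each factor stands for is not assumed here.
block_a = (2, 2, 5, 3, 2, 4)
block_b = (2, 2, 5, 2, 2, 2)
q_table_size = math.prod(block_a) + math.prod(block_b)

# Hypothetical Q table: aggregated state key -> {action: estimated value}.
q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def q_update(state, action, reward, next_state, allowed_next=ACTIONS,
             lr=0.1, gamma=1.0):
    """One standard tabular Q-learning step (lr and gamma are placeholders)."""
    best_next = max(q_table[next_state][a] for a in allowed_next)
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += lr * td_error


def recommended_action(state, allowed=ACTIONS):
    """Greedy lookup: the allowed action with the largest stored value."""
    return max(allowed, key=lambda a: q_table[state][a])
```

Restricting `allowed` to a subset of the four action pairs mirrors the restriction of the action space described in Section 2.2 for the case of too much mismatch between the amounts bought and sold.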