
Useful Financial Features

Prof. Marcos López de Prado


Advances in Financial Machine Learning
ORIE 5256



What are we going to learn today? (1/2)
• Structural Breaks
– CUSUM tests
– Explosiveness tests
• Right-tail unit-root tests
• Sub/super-martingale tests

• Entropy Features
– Shannon entropy
– The plug-in estimator
– Lempel-Ziv estimators
– Encoding schemes
– Entropy of a Gaussian process
– Entropy and the generalized mean

What are we going to learn today? (2/2)
• Microstructural Features
– First generation: Price sequences
• The tick rule
• The roll model
• The high-low volatility estimator
• The Corwin-Schulz bid-ask spread model
– Second generation: Strategic trade models
• Kyle’s lambda
• Amihud’s lambda
• Hasbrouck’s lambda
– Third generation: Sequential trade models
• Probability of information-based trading
• Volume-synchronized probability of informed trading

SECTION I
Structural Breaks



CUSUM test: Brown-Durbin-Evans
• It estimates standardized recursive least squares forecasting errors as
$$\omega_t = \frac{y_t - \hat{\beta}_{t-1}' x_t}{\sqrt{f_t}}, \qquad f_t = \hat{\sigma}_\varepsilon^2 \left[ 1 + x_t' \left( X_t' X_t \right)^{-1} x_t \right]$$
• It compares the observed cumulative sum of forecasting errors ($S_t$) against its theoretical distribution,
$$S_t = \sum_{j=k+1}^{t} \frac{\omega_j}{\hat{\sigma}_\omega}, \qquad \hat{\sigma}_\omega^2 = \frac{1}{T-k} \sum_{t=k}^{T} \left( \omega_t - \mathrm{E}\left[ \omega_t \right] \right)^2$$
• Under the null hypothesis $H_0: \beta_t = \beta$, $S_t \sim N\left[ 0, t-k-1 \right]$.
• Caveat: Results are sensitive to the starting point, which is chosen arbitrarily.
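As a rough illustration (not from the deck), the statistic can be computed with recursive OLS in numpy; the function name and the warm-up parameter k are ours:

```python
import numpy as np

def bde_cusum(y, X, k=20):
    """Brown-Durbin-Evans CUSUM of standardized recursive forecasting errors.

    y: (T,) target array; X: (T, p) regressor array; k: warm-up length (k >= p).
    """
    omega = []
    for t in range(k, len(y)):
        Xt, yt = X[:t], y[:t]                          # expanding sample up to t-1
        beta, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
        resid = yt - Xt @ beta
        s2 = resid @ resid / max(t - X.shape[1], 1)    # sigma_eps^2 estimate
        f = s2 * (1.0 + X[t] @ np.linalg.inv(Xt.T @ Xt) @ X[t])
        omega.append((y[t] - X[t] @ beta) / np.sqrt(f))
    omega = np.asarray(omega)
    return np.cumsum(omega / omega.std(ddof=1))        # S_t ~ N(0, t-k-1) under H0
```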
CUSUM test: Chu-Stinchcombe-White
• It assumes $\mathrm{E}_{t-1}\left[ \Delta y_t \right] = 0$, and works directly with levels $y_t$ (log-prices).
• We compute the standardized departure of log-price $y_t$ relative to the log-price at $y_n$, $t > n$, as
$$S_{n,t} = \frac{y_t - y_n}{\hat{\sigma}_t \sqrt{t-n}}, \qquad \hat{\sigma}_t^2 = \frac{1}{t-1} \sum_{i=2}^{t} \left( \Delta y_i \right)^2$$
• Under the null hypothesis $H_0: \beta_t = 0$, $S_{n,t} \sim N\left[ 0, 1 \right]$.
• The time-dependent critical value for the one-sided test is ($b_\alpha \approx 4.6$ for $\alpha = 0.05$)
$$c_\alpha\left[ n, t \right] = \sqrt{b_\alpha + \log\left[ t - n \right]}$$
• Caveat: Results are sensitive to the reference level, $y_n$, which is chosen arbitrarily.
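A minimal numpy sketch of the test on a log-price series, with the reference index n passed in (names are illustrative, not the author's implementation):

```python
import numpy as np

def csw_cusum(y, n):
    """Chu-Stinchcombe-White statistics S_{n,t} and one-sided critical values."""
    y = np.asarray(y, dtype=float)
    dy2 = np.diff(y) ** 2
    sigma = np.sqrt(np.cumsum(dy2) / np.arange(1, len(dy2) + 1))  # sigma_t estimates
    t = np.arange(n + 1, len(y))                                  # evaluation points t > n
    S = (y[t] - y[n]) / (sigma[t - 1] * np.sqrt(t - n))
    c = np.sqrt(4.6 + np.log(t - n))                              # b_alpha = 4.6, alpha = .05
    return S, c                                                   # flag a break where S > c
```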

EXPLOSIVENESS: Chow-Type Dickey-Fuller (1/2)
• Consider the first-order autoregressive process with white noise $\varepsilon_t$,
$$y_t = \rho y_{t-1} + \varepsilon_t$$
• The null hypothesis is that $y_t$ follows a random walk, $H_0: \rho = 1$, and the alternative hypothesis is that $y_t$ starts as a random walk but changes at time $\tau^* T$, where $\tau^* \in (0,1)$, into an explosive process:
$$H_1: y_t = \begin{cases} y_{t-1} + \varepsilon_t & \text{for } t = 1, \dots, \tau^* T \\ \rho y_{t-1} + \varepsilon_t & \text{for } t = \tau^* T + 1, \dots, T, \text{ with } \rho > 1 \end{cases}$$
• At time $T$ we can test for a switch (from random walk to explosive process) having taken place at time $\tau^* T$ (the break date). In order to test this hypothesis, we fit the following specification,
$$\Delta y_t = \delta y_{t-1} D_t\left[ \tau^* \right] + \varepsilon_t$$
where $D_t\left[ \tau^* \right]$ is a dummy variable that takes the value zero if $t < \tau^* T$, and the value one if $t \ge \tau^* T$.

EXPLOSIVENESS: Chow-Type Dickey-Fuller (2/2)
• Then, the null hypothesis $H_0: \delta = 0$ is tested against the (one-sided) alternative $H_1: \delta > 0$:
$$DFC_{\tau^*} = \frac{\hat{\delta}}{\hat{\sigma}_{\hat{\delta}}}$$
• The main drawback of this method is that 𝜏 ∗ is unknown.
• To address this issue, Andrews [1993] proposed a new test where all possible $\tau^*$ are tried, within some interval $\tau^* \in \left[ \tau_0, 1 - \tau_0 \right]$.
• The test statistic for an unknown $\tau^*$ is the maximum of all $T\left( 1 - 2\tau_0 \right)$ values of $DFC_{\tau^*}$:
$$SDFC = \sup_{\tau^* \in \left[ \tau_0, 1-\tau_0 \right]} \left\{ DFC_{\tau^*} \right\}$$
• Another drawback of Chow's approach is that it assumes that there is only one break date $\tau^* T$, and that the bubble runs up to the end of the sample (there is no switch back to a random walk). For situations where three or more regimes (random walk → bubble → random walk → …) exist, this is problematic.
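A compact sketch of the Chow-type test with Andrews' sup over candidate break dates; the regression is fit without intercept, per the specification above, and names are illustrative:

```python
import numpy as np

def sdfc(y, tau0=0.15):
    """sup of DFC_{tau*} over the interval [tau0, 1 - tau0]."""
    y = np.asarray(y, dtype=float)
    dy, ylag, T = np.diff(y), y[:-1], len(y)
    stats = []
    for tstar in range(int(tau0 * T), int((1 - tau0) * T)):
        x = ylag * (np.arange(1, T) >= tstar)          # y_{t-1} * D_t[tau*]
        delta = (x @ dy) / (x @ x)
        resid = dy - delta * x
        se = np.sqrt(resid @ resid / (len(dy) - 1) / (x @ x))
        stats.append(delta / se)
    return max(stats)
```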

EXPLOSIVENESS: SADF (1/2)
• In the words of Phillips, Wu and Yu [2011]: “[S]tandard unit root and cointegration tests are
inappropriate tools for detecting bubble behavior because they cannot effectively distinguish
between a stationary process and a periodically collapsing bubble model. Patterns of
periodically collapsing bubbles in the data look more like data generated from a unit root or
stationary autoregression than a potentially explosive process.”
• To address this flaw, these authors propose fitting the regression specification

$$\Delta y_t = \alpha + \beta y_{t-1} + \sum_{l=1}^{L} \gamma_l \Delta y_{t-l} + \varepsilon_t$$
where we test for $H_0: \beta \le 0$ against $H_1: \beta > 0$. Inspired by Andrews [1993], Phillips and Yu [2011] and Phillips, Wu and Yu [2011] proposed the Supremum Augmented Dickey-Fuller test (SADF).

EXPLOSIVENESS: SADF (2/2)
• SADF fits the above regression at each end point $t$ with backwards-expanding start points, then computes
$$SADF_t = \sup_{t_0 \in \left[ 1, t-\tau \right]} ADF_{t_0,t} = \sup_{t_0 \in \left[ 1, t-\tau \right]} \frac{\hat{\beta}_{t_0,t}}{\hat{\sigma}_{\hat{\beta}_{t_0,t}}}$$
where $\hat{\beta}_{t_0,t}$ is estimated on a sample that starts at $t_0$ and ends at $t$, $\tau$ is the minimum sample length used in the analysis, $t_0$ is the left bound of the backwards-expanding window, and $t = \tau, \dots, T$. For the estimation of $SADF_t$, the right side of the window is fixed at $t$. The standard ADF test is a special case of $SADF_t$, where $\tau = t$.
• There are two critical differences between $SADF_t$ and SDFC: First, $SADF_t$ is computed at each $t \in \left[ \tau, T \right]$, whereas SDFC is computed only at $T$. Second, instead of introducing a dummy variable, SADF recursively expands the beginning of the sample ($t_0 \in \left[ 1, t-\tau \right]$). By trying all combinations of a nested double loop on $(t_0, t)$, SADF does not assume a known number of regime switches or break dates.
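One way to sketch the inner sup, leaning on statsmodels' ADF implementation for each backwards-expanding window; min_len plays the role of τ, and the fixed lag count is an assumption for illustration:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def sadf_t(y, min_len=60, lags=3):
    """SADF at the series' end point: sup of ADF stats over start points t0."""
    y = np.asarray(y, dtype=float)
    stats = [adfuller(y[t0:], maxlag=lags, autolag=None)[0]
             for t0 in range(len(y) - min_len)]
    return max(stats)

# The SADF_t series is obtained by repeating this for each end point t in [tau, T].
```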

EXPLOSIVENESS: Sub/Super-Martingale (1/3)
• Consider a process that is either a sub- or super-martingale. Given some observations 𝑦𝑡 ,
we would like to test for the existence of an explosive time trend, 𝐻0 : 𝛽 = 0, 𝐻1 : 𝛽 ≠ 0,
under alternative specifications.
• Polynomial trend (SM-Poly1):
$$y_t = \alpha + \gamma t + \beta t^2 + \varepsilon_t$$
• Polynomial trend (SM-Poly2):
$$\log y_t = \alpha + \gamma t + \beta t^2 + \varepsilon_t$$
• Exponential trend (SM-Exp):
$$y_t = \alpha e^{\beta t} + \varepsilon_t \implies \log y_t = \log \alpha + \beta t + \xi_t$$
• Power trend (SM-Power):
$$y_t = \alpha t^{\beta} + \varepsilon_t \implies \log y_t = \log \alpha + \beta \log t + \xi_t$$

EXPLOSIVENESS: Sub/Super-Martingale (2/3)
• Similar to SADF, we fit any of these specifications to each end point $t = \tau, \dots, T$, with backwards-expanding start points, then compute
$$SMT_t = \sup_{t_0 \in \left[ 1, t-\tau \right]} \left\{ \frac{\left| \hat{\beta}_{t_0,t} \right|}{\hat{\sigma}_{\hat{\beta}_{t_0,t}}} \right\}$$
• The reason for the absolute value is that we are equally interested in explosive growth and collapse. In the simple regression case (Greene [2008], p. 48), the variance of $\hat{\beta}$ is $\hat{\sigma}_{\hat{\beta}}^2 = \frac{\hat{\sigma}_\varepsilon^2}{\hat{\sigma}_{xx}^2 \left( t - t_0 \right)}$, hence $\lim_{t \to \infty} \hat{\sigma}_{\hat{\beta}_{t_0,t}} = 0$. The same result is generalizable to the multivariate linear regression case (Greene [2008], pp. 51-52).
• Problem: The $\hat{\sigma}_{\hat{\beta}}^2$ of a weak long-run bubble may be smaller than the $\hat{\sigma}_{\hat{\beta}}^2$ of a strong short-run bubble, hence biasing the method towards long-run bubbles.

EXPLOSIVENESS: Sub/Super-Martingale (3/3)
• Solution: We can penalize large sample lengths by determining the coefficient $\varphi \in \left[ 0,1 \right]$ that yields the best explosiveness signals,
$$SMT_t = \sup_{t_0 \in \left[ 1, t-\tau \right]} \left\{ \frac{\left| \hat{\beta}_{t_0,t} \right|}{\hat{\sigma}_{\hat{\beta}_{t_0,t}} \left( t - t_0 \right)^{\varphi}} \right\}$$
• For instance,
– when $\varphi = 0.5$, we compensate for the lower $\hat{\sigma}_{\hat{\beta}_{t_0,t}}$ associated with longer sample lengths, in the simple regression case.
– For $\varphi \to 0$, $SMT_t$ will exhibit longer trends, as that compensation wanes and long-run bubbles mask short-run bubbles.
– For $\varphi \to 1$, $SMT_t$ becomes noisier, because more short-run bubbles are selected over long-run bubbles.
• Consequently, this is a natural way to adjust the explosiveness signal, so that it filters
opportunities targeting a particular holding period.
• The features used by the ML algorithm may include 𝑆𝑀𝑇𝑡 estimated from a wide range of 𝜑
values.
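A sketch for the SM-Poly1 specification, computing the penalized t-statistic of the $t^2$ coefficient over backwards-expanding windows; τ and φ are passed in, and the function name is ours:

```python
import numpy as np

def smt_t(y, tau=60, phi=0.5):
    """SMT at the last observation under SM-Poly1: y = a + g*t + b*t^2 + eps."""
    y = np.asarray(y, dtype=float)
    t = len(y)
    best = -np.inf
    for t0 in range(t - tau):
        idx = np.arange(t0, t, dtype=float)
        X = np.column_stack([np.ones_like(idx), idx, idx ** 2])
        beta, *_ = np.linalg.lstsq(X, y[t0:], rcond=None)
        resid = y[t0:] - X @ beta
        s2 = resid @ resid / (len(idx) - 3)
        se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])   # std error of the t^2 coefficient
        best = max(best, abs(beta[2]) / se / (t - t0) ** phi)
    return best
```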

SECTION II
Entropy Features



Entropy
• Let $X$ be a discrete random variable that takes a value $x$ from the set $S_X$ with probability $p\left[ x \right]$. The entropy of $X$ is defined as
$$H\left[ X \right] = -\sum_{x \in S_X} p\left[ x \right] \log p\left[ x \right]$$
• A few observations:
– The value $\frac{1}{p\left[ x \right]}$ measures how surprising an observation is, because surprising observations are characterized by their low probability.
– Entropy is the expected value of those surprises, where the $\log\left[ \cdot \right]$ function prevents that $p\left[ x \right]$ cancels $\frac{1}{p\left[ x \right]}$, and endows entropy with desirable mathematical properties.
– Accordingly, entropy can be interpreted as the amount of uncertainty associated with $X$. Entropy is zero when all probability is concentrated in a single element of $S_X$. Entropy reaches a maximum at $\log \lVert S_X \rVert$ when $X$ is distributed uniformly, $p\left[ x \right] = \frac{1}{\lVert S_X \rVert}\ \forall x \in S_X$.
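For concreteness, a tiny sketch estimating Shannon entropy (in bits) from the empirical symbol distribution of a sequence:

```python
import numpy as np

def shannon_entropy(msg):
    """Entropy of the empirical symbol distribution, in bits."""
    _, counts = np.unique(list(msg), return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

shannon_entropy("abab")   # 1.0: two equiprobable symbols
shannon_entropy("aaaa")   # 0.0: all probability on one symbol
```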

Time Series Entropy: The Plug-In Estimator
• Given a data sequence $x_1^n$, comprising the string of values starting in position 1 and ending in position $n$, we can form a dictionary of all words of length $w < n$ in that sequence, $A^w$.
• Consider an arbitrary word $y_1^w \in A^w$ of length $w$. We denote $\hat{p}_w\left[ y_1^w \right]$ the empirical probability of the word $y_1^w$ in $x_1^n$, which means that $\hat{p}_w\left[ y_1^w \right]$ is the frequency with which $y_1^w$ appears in $x_1^n$.
• Assuming that the data is generated by a stationary and ergodic process, the law of large numbers guarantees that, for a fixed $w$ and large $n$, the empirical distribution $\hat{p}_w$ will be close to the true distribution $p_w$.
• Under these circumstances, a natural estimator for the entropy rate (i.e., average entropy per bit) is
$$\hat{H}_{n,w} = -\frac{1}{w} \sum_{y_1^w \in A^w} \hat{p}_w\left[ y_1^w \right] \log_2 \hat{p}_w\left[ y_1^w \right]$$
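A direct sketch of this estimator: count all words of length w, then average their information content per symbol:

```python
import numpy as np

def plugin_entropy_rate(msg, w):
    """Plug-in (maximum likelihood) entropy-rate estimate over words of length w."""
    words = [msg[i:i + w] for i in range(len(msg) - w + 1)]
    _, counts = np.unique(words, return_counts=True)
    p = counts / counts.sum()                  # empirical word distribution
    return -(p * np.log2(p)).sum() / w         # average entropy per symbol
```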

Time Series Entropy: Lempel-Ziv Estimators
• Plug-in estimators require large samples.
• Kontoyiannis [1998] attempts to make a more efficient use of the information available in a message.
• Let us define $L_i^n$ as 1 plus the length of the longest match found in the $n$ bits prior to $i$,
$$L_i^n = 1 + \max \left\{ l : x_i^{i+l} = x_j^{j+l} \text{ for some } i - n \le j \le i - 1,\ l \in \left[ 0, n \right] \right\}$$
• The general intuition is that, as we increase the available history, we expect messages with high entropy to produce relatively shorter non-redundant substrings. In contrast, messages with low entropy will produce relatively longer non-redundant substrings as we parse through the message.
• The sliding-window LZ estimator $\hat{H}_{n,k} = \hat{H}_{n,k}\left[ x_{-n+1}^{n+k-1} \right]$ is defined by
$$\hat{H}_{n,k} = \left[ \frac{1}{k} \sum_{i=1}^{k} \frac{L_i^n}{\log_2 n} \right]^{-1}$$
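A brute-force sketch of the sliding-window estimator; the match search here is O(window²) per position, which is fine for illustration but slower than the published implementations:

```python
import math

def lz_entropy_rate(msg, window):
    """Kontoyiannis sliding-window LZ entropy-rate estimate, in bits per symbol."""
    terms = []
    for i in range(window, len(msg)):          # requires len(msg) > window
        longest = 0                            # longest match in the prior `window` symbols
        for j in range(i - window, i):
            k = 0
            while i + k < len(msg) and k < window and msg[j + k] == msg[i + k]:
                k += 1
            longest = max(longest, k)
        terms.append((longest + 1) / math.log2(window))   # L_i^n / log2(n)
    return len(terms) / sum(terms)             # inverse of the average
```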

Time Series Entropy: Encoding Schemes (1/2)
• Entropy rate estimation requires the discretization of a continuous variable, so that each
value can be assigned a code from a finite alphabet.
• Binary Encoding:
– A stream of returns 𝑟𝑡 can be encoded according to the sign, 1 for 𝑟𝑡 > 0, 0 for 𝑟𝑡 < 0, removing cases where 𝑟𝑡 = 0.
– Binary encoding arises naturally in the case of returns series sampled from price bars (i.e., bars that contain prices
fluctuating between two symmetric horizontal barriers, centered around the start price), because 𝑟𝑡 is
approximately constant.
• Quantile Encoding:
– Unless price bars are used, it is likely that more than two codes will be needed.
– One approach consists in assigning a code to each 𝑟𝑡 according to the quantile it belongs to.
– The quantile boundaries are determined using an in-sample period (training set).
– There will be the same number of observations assigned to each letter for the overall in-sample, and close to the
same number of observations per letter out-of-sample.
– This uniform (in-sample) or close to uniform (out-of-sample) distribution of codes tends to increase entropy readings
on average.

Time Series Entropy: Encoding Schemes (2/2)
• Sigma Encoding:
– As an alternative approach, rather than fixing the number of codes, we could let the price stream determine the actual dictionary.
– Suppose we fix a discretization step, $\sigma$. Then, we assign the value 0 to $r_t \in \left[ \min r, \min r + \sigma \right)$, the value 1 to $r_t \in \left[ \min r + \sigma, \min r + 2\sigma \right)$, and so on, until every observation has been encoded with a total of $\operatorname{ceil}\left[ \frac{\max r - \min r}{\sigma} \right]$ codes, where $\operatorname{ceil}\left[ \cdot \right]$ is the ceiling function.
– Unlike quantile encoding, now each code covers the same fraction of $r_t$'s range.
– Because codes are not uniformly distributed, entropy readings will tend to be smaller than in quantile encoding on average; however, the appearance of a "rare" code will cause spikes in entropy readings.
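Sketches of the two non-binary schemes; quantile boundaries come from an in-sample training set, while sigma encoding fixes the step size (parameter names are illustrative):

```python
import numpy as np

def quantile_encode(r, r_train, n_codes=10):
    """Code each return by the in-sample quantile bucket it falls into."""
    bounds = np.quantile(r_train, np.linspace(0, 1, n_codes + 1)[1:-1])
    return np.searchsorted(bounds, r)

def sigma_encode(r, sigma):
    """Code each return by fixed steps of size sigma over the observed range."""
    r = np.asarray(r, dtype=float)
    return np.floor((r - r.min()) / sigma).astype(int)
```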

Entropy and the Generalized Mean (1/4)
• Here is an interesting way of thinking about entropy.
• Consider a set of real numbers $x = \left\{ x_i \right\}_{i=1,\dots,n}$ and weights $p = \left\{ p_i \right\}_{i=1,\dots,n}$, such that $0 \le p_i \le 1\ \forall i$ and $\sum_{i=1}^{n} p_i = 1$.
• The generalized weighted mean of $x$ with weights $p$ on a power $q \ne 0$ is defined as
$$M_q\left[ x, p \right] = \left( \sum_{i=1}^{n} p_i x_i^q \right)^{\frac{1}{q}}$$
• For $q < 0$, we must require that $x_i > 0\ \forall i$.

Entropy and the Generalized Mean (2/4)
• The reason this is a generalized mean is that other means can be obtained as special cases:
– Minimum: $\lim_{q \to -\infty} M_q\left[ x, p \right] = \min_i x_i$
– Harmonic mean: $M_{-1}\left[ x, p \right] = \left( \sum_{i=1}^{n} p_i x_i^{-1} \right)^{-1}$
– Geometric mean: $\lim_{q \to 0} M_q\left[ x, p \right] = e^{\sum_{i=1}^{n} p_i \log x_i} = \prod_{i=1}^{n} x_i^{p_i}$
– Arithmetic mean: $M_1\left[ x, \left\{ n^{-1} \right\}_{i=1,\dots,n} \right] = n^{-1} \sum_{i=1}^{n} x_i$
– Weighted mean: $M_1\left[ x, p \right] = \sum_{i=1}^{n} p_i x_i$
– Quadratic mean: $M_2\left[ x, p \right] = \left( \sum_{i=1}^{n} p_i x_i^2 \right)^{1/2}$
– Maximum: $\lim_{q \to +\infty} M_q\left[ x, p \right] = \max_i x_i$

• In the context of information theory, an interesting special case is $x = \left\{ p_i \right\}_{i=1,\dots,n}$, hence
$$M_q\left[ p, p \right] = \left( \sum_{i=1}^{n} p_i p_i^q \right)^{\frac{1}{q}}$$

Entropy and the Generalized Mean (3/4)
• Let us define the quantity $N_q\left[ p \right] = \frac{1}{M_{q-1}\left[ p, p \right]}$, for some $q \ne 1$.
• Again, for $q < 1$ in $N_q\left[ p \right]$, we must have $p_i > 0\ \forall i$.
• If $p_i = \frac{1}{k}$ for $k \in \left[ 1, n \right]$ different indices and $p_i = 0$ elsewhere, then the weight is spread evenly across $k$ different items, and $N_q\left[ p \right] = k$ for $q > 1$.
• In other words, $N_q\left[ p \right]$ gives us the effective number or diversity of items in $p$, according to some weighting scheme set by $q$.
• Using Jensen's inequality, we can prove that $\frac{\partial M_q\left[ p,p \right]}{\partial q} \ge 0$, hence $\frac{\partial N_q\left[ p \right]}{\partial q} \le 0$. Smaller values of $q$ assign a more uniform weight to elements of the partition, giving relatively more weight to less common elements, and $\lim_{q \to 0} N_q\left[ p \right]$ is simply the total number of nonzero $p_i$.
Entropy and the Generalized Mean (4/4)
• Shannon's entropy is
$$H\left[ p \right] = -\sum_{i=1}^{n} p_i \log p_i = -\log\left[ \lim_{q \to 0} M_q\left[ p, p \right] \right] = \log\left[ \lim_{q \to 1} N_q\left[ p \right] \right]$$
• This shows that entropy can be interpreted as the logarithm of the effective number of items in a list $p$, where $q \to 1$.
• Intuitively, entropy measures information as the level of diversity contained in a random variable. This intuition is formalized through the notion of the generalized mean.
• The implication is that Shannon's entropy is a special case of a diversity measure (hence its connection with volatility).
• We can now define and compute alternative measures of diversity, other than entropy, where $q \ne 1$.
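A small numeric check of the claims above: $N_q\left[ p \right]$ via the generalized mean, with Shannon's entropy recovered as $\log N_q\left[ p \right]$ at $q = 1$ (function names are ours):

```python
import numpy as np

def gen_mean(x, p, q):
    """Generalized weighted mean M_q[x, p]; the geometric mean is the q -> 0 limit."""
    x, p = np.asarray(x, float), np.asarray(p, float)
    if q == 0:
        return np.exp((p * np.log(x)).sum())
    return ((p * x ** q).sum()) ** (1.0 / q)

def effective_number(p, q):
    """N_q[p] = 1 / M_{q-1}[p, p]: the effective number of items in p."""
    return 1.0 / gen_mean(p, p, q - 1)

p = np.array([0.5, 0.25, 0.25])
H = -(p * np.log(p)).sum()
np.isclose(np.log(effective_number(p, 1)), H)   # True: H = log N_q[p] at q = 1
```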

SECTION III
Microstructural Features



A Brief History of Market Microstructure Models
• First generation: price sequences
o The Roll model [1984]
o High-Low Volatility Estimator: Beckers [1983], Parkinson [1980]
o Corwin and Schultz [2012]
• Second generation: strategic trade models
o Kyle’s lambda [1985]
o Amihud’s lambda [2002]
o Hasbrouck’s lambda [2009]
• Third generation: sequential trade models
o Probability of information-based trading (PIN): Easley et al. [1996]
o Volume-synchronized probability of informed trading (VPIN): Easley et al. [2011]



Roll Measure

$$2 \sqrt{\left| \operatorname{cov}\left[ \Delta p_t, \Delta p_{t-1} \right] \right|}$$

• $\Delta p_t$ is the change in close price between two bars at time $t$.

• The covariance is evaluated on a rolling basis for different window sizes.



Roll Impact

$$\frac{2 \sqrt{\left| \operatorname{cov}\left[ \Delta p_t, \Delta p_{t-1} \right] \right|}}{\mathrm{Dollar\ Volume}_t}$$
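A pandas sketch of both quantities, assuming close and dollar_volume are bar-level Series; the 50-bar default is only an illustrative window choice:

```python
import pandas as pd

def roll_measure(close, window=50):
    """2 * sqrt(|rolling cov(dp_t, dp_{t-1})|) on close-to-close price changes."""
    dp = close.diff()
    return 2 * dp.rolling(window).cov(dp.shift(1)).abs() ** 0.5

def roll_impact(close, dollar_volume, window=50):
    """Roll measure per unit of dollar volume traded in the bar."""
    return roll_measure(close, window) / dollar_volume
```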



Kyle’s Lambda

$$\Delta p_t = \lambda b_t V_t$$

• $\lambda$ is estimated from the regression over a rolling window.

• $V_t$ is the bar's volume, and $b_t = \operatorname{sign}\left[ p_t - p_{t-1} \right]$ is the trade sign, computed at the bar level.
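A sketch of the rolling no-intercept OLS slope, using the closed-form estimate rather than an explicit regression call (names are illustrative):

```python
import numpy as np

def kyle_lambda(close, volume, window=50):
    """Rolling slope of dp_t on b_t * V_t, with b_t = sign(dp_t)."""
    dp = close.diff()
    x = np.sign(dp) * volume                      # signed volume b_t * V_t
    return (x * dp).rolling(window).sum() / (x * x).rolling(window).sum()
```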



Amihud’s Lambda

$$\mathrm{Moving\ Average}\left[ \frac{\left| \mathrm{Return}_t \right|}{\mathrm{Dollar\ Volume}_t},\ \mathrm{window} \right]$$
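A one-function sketch, assuming pandas Series inputs; the absolute value of the return follows Amihud's [2002] definition:

```python
import numpy as np

def amihud_lambda(close, dollar_volume, window=50):
    """Moving average of |bar return| per unit of dollar volume."""
    abs_ret = np.log(close).diff().abs()
    return (abs_ret / dollar_volume).rolling(window).mean()
```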



VPIN

$$\mathrm{VPIN}_t = \frac{\sum_{\tau = t-n+1}^{t} \left| V_\tau^S - V_\tau^B \right|}{n V}$$

• $V_t^B = V_t \, Z\!\left[ \frac{\Delta p_t}{\sigma_{\Delta p_t}} \right]$, $V_t^S = V_t - V_t^B$, where $Z$ is the standard normal CDF.

• $n$ is the number of bars used.
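A sketch using bulk volume classification, with Z the standard normal CDF; taking the rolling sum of volume as the denominator stands in for nV, an assumption since bar volumes are not exactly constant here:

```python
import numpy as np
from scipy.stats import norm

def vpin(close, volume, n=50):
    """VPIN over n bars, splitting each bar's volume by bulk volume classification."""
    dp = close.diff()
    buy_frac = norm.cdf(dp / dp.rolling(n).std())   # Z[dp_t / sigma_dp]
    v_buy = volume * buy_frac
    v_sell = volume - v_buy
    return (v_sell - v_buy).abs().rolling(n).sum() / volume.rolling(n).sum()
```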



Kyle & Amihud are best In-Sample (1/2)
• MDI results show strong similarity across all labels. MDI is known to be biased towards features with higher variance (Altmann et al. [2010]). See below for MDI results with a 50-bar window.

Corwin-Schultz Realized volatility JB statistics


Kyle & Amihud are best In-Sample (2/2)
• MDI results show strong similarity across all labels. MDI is known to be biased towards features with higher variance (Altmann et al. [2010]). See below for MDI results with a 50-bar window.

Sequential correlation Return skewness Return kurtosis


VPIN is best Out-Of-Sample (1/2)
• When the backward window is large, only VPIN contributes positively to out-of-sample prediction across all labels except sequential correlation. Below are MDA results with a 500-bar window.

Corwin-Schultz Realized volatility JB statistics Return skewness Return kurtosis



VPIN is best Out-Of-Sample (2/2)
• VPIN's MDA importance at predicting realized volatility remains significant at large window sizes, even when the other variables become irrelevant.
1000 bars 1500 bars 2000 bars


For Additional Details
The first wave of quantitative innovation in finance was led by Markowitz
optimization. Machine Learning is the second wave and it will touch every
aspect of finance. López de Prado’s Advances in Financial Machine Learning is
essential for readers who want to be ahead of the technology rather than
being replaced by it.
— Prof. Campbell Harvey, Duke University. Former President of the American
Finance Association.

Financial problems require very distinct machine learning solutions. Dr. López
de Prado’s book is the first one to characterize what makes standard machine
learning tools fail when applied to the field of finance, and the first one to
provide practical solutions to unique challenges faced by asset managers.
Everyone who wants to understand the future of finance should read this
book.
— Prof. Frank Fabozzi, EDHEC Business School. Editor of The Journal of
Portfolio Management.

Disclaimer
• The views expressed in this document are the author's, and do not necessarily reflect those of the organizations he is affiliated with.
• No investment decision or particular course of action is recommended by this
presentation.
• All Rights Reserved. © 2017-2020 by True Positive Technologies, LP

www.QuantResearch.org

Market Microstructure
in the Age of Machine Learning
Marcos López de Prado
Lawrence Berkeley National Laboratory
Computational Research Division



A Brief History of Market Microstructure Models
• First generation: price sequences
o The Roll model [1984]
o High-Low Volatility Estimator: Beckers [1983], Parkinson [1980]
o Corwin and Schultz [2012]
• Second generation: strategic trade models
o Kyle’s lambda [1985]
o Amihud’s lambda [2002]
o Hasbrouck’s lambda [2009]
• Third generation: sequential trade models
o Probability of information-based trading (PIN): Easley et al. [1996]
o Volume-synchronized probability of informed trading (VPIN): Easley et al. [2011]



Advantages of the AI Age
• Financial Big Data (tick/book level)

• New and advanced statistical techniques (Machine Learning)

• Unprecedented computational power (Supercomputers)

Microstructural relationships often are non-linear and hard to parameterize. When used properly, these technologies can uncover relationships unknown to traditional approaches.



Goals of this Presentation
• Investigate all three generations of market microstructure variables jointly on the most recent 10 years of market data

• Apply ML-based feature importance analysis to test the usefulness of microstructure variables at predicting various market movements

• Study how the feature importance pattern changes with different labels and different time scales



Section I
Data & Variables



Market Data
• 87 liquid futures traded globally (index, currency, commodity and fixed-income) with 10 years of history.
• Tick-level trade data, aggregated into price-volume bars. Each bar is formed with a timestamp $t$ when
$$\sum_{\tau = t-1}^{t} p_\tau V_\tau \ge L$$
where the sum accumulates price times volume over the ticks since the previous bar.
• The threshold $L$ is chosen to yield roughly 50 bars per day in the year 2017. Each bar contains Close, Open, High, Low and Volume.
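A rough pandas sketch of this bar construction, assuming a tick DataFrame with 'price' and 'volume' columns; a bar closes each time cumulative price × volume crosses another multiple of L:

```python
import pandas as pd

def dollar_bars(ticks, L):
    """Aggregate ticks into bars accumulating roughly L of dollar volume each."""
    bar_id = ((ticks["price"] * ticks["volume"]).cumsum() // L).astype(int)
    g = ticks.groupby(bar_id)
    return pd.DataFrame({
        "open": g["price"].first(),
        "high": g["price"].max(),
        "low": g["price"].min(),
        "close": g["price"].last(),
        "volume": g["volume"].sum(),
    })
```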



Market Data
A snapshot of ES1 Index data



Microstructure variables
Roll measure

$$2 \sqrt{\left| \operatorname{cov}\left[ \Delta p_t, \Delta p_{t-1} \right] \right|}$$

• $\Delta p_t$ is the change in close price between two bars at time $t$.

• The covariance is evaluated on a rolling basis for different window sizes (all plots use window = 50 bars).



Microstructure variables

Roll impact

$$\frac{2 \sqrt{\left| \operatorname{cov}\left[ \Delta p_t, \Delta p_{t-1} \right] \right|}}{\mathrm{Dollar\ Volume}_t}$$



Microstructure variables
Kyle’s 𝜆

$$\Delta p_t = \lambda b_t V_t$$

• $\lambda$ is estimated from the regression over a rolling window.

• $V_t$ is the bar's volume, and $b_t = \operatorname{sign}\left[ p_t - p_{t-1} \right]$ is the trade sign, computed at the bar level.



Microstructure variables
Amihud measure
$$\mathrm{Moving\ Average}\left[ \frac{\left| \mathrm{Return}_t \right|}{\mathrm{Dollar\ Volume}_t},\ \mathrm{window} \right]$$



Microstructure variables
VPIN

$$\mathrm{VPIN}_t = \frac{\sum_{\tau = t-n+1}^{t} \left| V_\tau^S - V_\tau^B \right|}{n V}$$

• $V_t^B = V_t \, Z\!\left[ \frac{\Delta p_t}{\sigma_{\Delta p_t}} \right]$, $V_t^S = V_t - V_t^B$, where $Z$ is the standard normal CDF.

• $n$ is the number of bars used.



Section II
Algorithms & Research Tools



Selection of ML algorithm: Random Forest
• Random Forest is an ensemble method. The bootstrapping process introduces randomness that can reduce potential overfitting, a common issue in financial problems.

• As a tree-based algorithm, Random Forest has a native tree-based feature importance method (Mean Decreased Impurity). We can compare it with a more generic ML feature importance method (Mean Decreased Accuracy).



ML-based feature importance analysis
• Mean-Decreased Impurity (MDI). MDI is available for all tree-based ML algorithms, including random forest. All tree-based algorithms consist of multiple data splits on selected features, and each split is obtained by minimizing the impurity. MDI measures how much each feature reduces the sample-size-weighted total impurity during training, and ranks highest the feature with the largest impurity decrease.

• MDI feature importance is in-sample, as it is extracted from the training data only. It is a statement on explanatory importance rather than predictive importance.



ML-based feature importance analysis
• Mean-Decreased Accuracy (MDA). MDA is a generic feature importance method that applies to all ML algorithms. The idea is to randomly permute the values of each feature and measure how much the permutation decreases the model's out-of-sample accuracy. The larger the decrease in accuracy, the more important the feature.

• In contrast to MDI, MDA tests a feature's actual importance for out-of-sample performance.
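Both quantities are available off the shelf in scikit-learn; a sketch on synthetic data, where the unshuffled split stands in for a proper time-series split:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, shuffle=False)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
mdi = clf.feature_importances_                 # MDI: in-sample impurity decrease
mda = permutation_importance(clf, X_te, y_te,  # MDA: out-of-sample permutation test
                             n_repeats=10, random_state=0).importances_mean
```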



Analysis
• Feature importance rankings from MDI and MDA are generally different. Features that rank as important under MDI might contribute little, or even negatively, to the out-of-sample prediction performance measured by MDA.

return skewness prediction MDI | 250 bars return skewness prediction MDA | 250 bars
Prediction labels
Binary label generation: at time $t$, we compute a measure $M_t$, constructed with a backward window, and compare it to its value at $t + h$, with $h = 250$ bars (around 5 trading days) for all labels hereafter in this study. The label at $t$ is
$$\operatorname{sign}\left[ M_{t+h} - M_t \right]$$



Prediction labels
Label 1. Sign of change in bid-ask spread estimator (Corwin-Schultz)

$$\operatorname{sign}\left[ S_{t+h} - S_t \right]$$
$$S_t = \frac{2\left( e^{\alpha_t} - 1 \right)}{1 + e^{\alpha_t}}, \quad \alpha_t = \frac{\sqrt{2\beta_t} - \sqrt{\beta_t}}{3 - 2\sqrt{2}} - \sqrt{\frac{\gamma_t}{3 - 2\sqrt{2}}}, \quad \beta_t = \mathrm{E}\left[ \sum_{j=0}^{1} \left( \log \frac{H_{t-j}}{L_{t-j}} \right)^2 \right], \quad \gamma_t = \left( \log \frac{H_{t-1,t}}{L_{t-1,t}} \right)^2$$
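A pandas sketch of the estimator from bar highs and lows; flooring α at zero is a common adjustment for negative spread estimates, not something stated on the slide:

```python
import numpy as np

def corwin_schultz(high, low, window=20):
    """Corwin-Schultz bid-ask spread estimate S_t from two-bar highs and lows."""
    hl = np.log(high / low) ** 2
    beta = (hl + hl.shift(1)).rolling(window).mean()       # E[sum over 2 bars]
    gamma = np.log(high.rolling(2).max() / low.rolling(2).min()) ** 2
    denom = 3 - 2 * np.sqrt(2)
    alpha = ((np.sqrt(2 * beta) - np.sqrt(beta)) / denom
             - np.sqrt(gamma / denom)).clip(lower=0)
    return 2 * (np.exp(alpha) - 1) / (1 + np.exp(alpha))
```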

Label 2. Sign of change in realized volatility

$$\operatorname{sign}\left[ \sigma_{t+h} - \sigma_t \right]$$
where $\sigma_t$ is the standard deviation of realized returns.



Prediction labels
Label 3. Sign of change in Jarque-Bera statistics of realized returns

$$\operatorname{sign}\left[ JB\left[ r_{t+h} \right] - JB\left[ r_t \right] \right]$$
$$JB\left[ r \right] = \frac{n}{6} \left( S^2 + \frac{1}{4} \left( C - 3 \right)^2 \right)$$
where $S$ is the skewness and $C$ is the kurtosis of the realized return $r$ in the past window.

Label 4. Sign of change in serial correlation of realized returns

$$\operatorname{sign}\left[ sc_{t+h} - sc_t \right]$$
where $sc_t = \operatorname{corr}\left[ r_t, r_{t-1} \right]$ is the correlation between returns of two consecutive bars.



Prediction labels
Label 5. Sign of change in absolute skewness of realized returns

$$\operatorname{sign}\left[ \mathit{skew}_{t+h} - \mathit{skew}_t \right]$$

Label 6. Sign of change in kurtosis of realized returns

$$\operatorname{sign}\left[ \mathit{Kurt}_{t+h} - \mathit{Kurt}_t \right]$$



Section III
Results



Microstructure variable feature importance
1. Sign of change in Corwin-Schultz estimator
MDI result

Increasing backward window size for feature generation



Microstructure variable feature importance
1. Sign of change in Corwin-Schultz estimator
MDA result

Increasing backward window size for feature generation



Microstructure variable feature importance
2. Sign of change in realized volatility
MDI result

Increasing backward window size for feature generation



Microstructure variable feature importance
2. Sign of change in realized volatility
MDA result

Increasing backward window size for feature generation



Microstructure variable feature importance
3. Sign of change in Jarque-Bera statistics of realized returns
MDI result

Increasing backward window size for feature generation



Microstructure variable feature importance
3. Sign of change in Jarque-Bera statistics of realized returns
MDA result

Increasing backward window size for feature generation



Microstructure variable feature importance
4. Sign of change in serial correlation of realized returns
MDI result

Increasing backward window size for feature generation



Microstructure variable feature importance
4. Sign of change in serial correlation of realized returns
MDA result

Increasing backward window size for feature generation



Microstructure variable feature importance
5. Sign of change in absolute skewness of realized returns
MDI result

Increasing backward window size for feature generation



Microstructure variable feature importance
5. Sign of change in absolute skewness of realized returns
MDA result

Increasing backward window size for feature generation



Microstructure variable feature importance
6. Sign of change in kurtosis of realized returns
MDI result

Increasing backward window size for feature generation



Microstructure variable feature importance
6. Sign of change in kurtosis of realized returns
MDA result

Increasing backward window size for feature generation



Section IV
Analysis



Kyle & Amihud are best In-Sample (1/2)
• MDI results show strong similarity across all labels. MDI is known to be biased towards features with higher variance (Altmann et al. [2010]). See below for MDI results with a 50-bar window.

Corwin-Schultz Realized volatility JB statistics


Kyle & Amihud are best In-Sample (2/2)
• MDI results show strong similarity across all labels. MDI is known to be biased towards features with higher variance (Altmann et al. [2010]). See below for MDI results with a 50-bar window.

Sequential correlation Return skewness Return kurtosis


VPIN is best Out-Of-Sample (1/2)
• When the backward window is large, only VPIN contributes positively to out-of-sample prediction across all labels except sequential correlation. Below are MDA results with a 500-bar window.

Corwin-Schultz Realized volatility JB statistics Return skewness Return kurtosis



VPIN is best Out-Of-Sample (2/2)
• VPIN's MDA importance at predicting realized volatility remains significant at large window sizes, even when the other variables become irrelevant.

1000 bars 1500 bars 2000 bars



Section V
Conclusions



Conclusions (1/3)
• For the prediction of the estimated bid-ask spread, the feature importance remains almost the same across all window sizes, for both MDI and MDA, indicating universality.

• For the prediction of realized volatility, while the Amihud and Roll measures remain stable, VPIN's importance increases and Kyle's lambda decreases as the window size expands, indicating that VPIN's predictive power grows with a larger look-back window.

• For the prediction of the JB statistic, the result is similar to realized volatility: VPIN's importance increases and the Amihud measure decreases as the window size expands, while the others remain almost the same, again indicating that VPIN's predictive power grows with a larger look-back window.



Conclusions (2/3)
• For the prediction of sequential correlation, MDA results demonstrate that the Roll measure is much more predictive than all other variables, consistent with the fact that it is built on the past sequential correlation of returns.

• For the prediction of various moments of realized returns (volatility, skewness and kurtosis), MDA feature importance shows that VPIN consistently gives the largest contribution, indicating universality.



Conclusions (3/3)
• Technologies of the AI age provide new methods and perspectives for market microstructure research.
• ML offers a new prediction framework, based on classification of categorical outputs, compared to traditional regression-based models.
• The 5 prototypical market microstructure variables (Roll Measure, Roll Impact, Kyle's Lambda, Amihud's Lambda and VPIN) show different importances in-sample and out-of-sample. This demonstrates that explanatory power does not necessarily translate into predictive power.
• For all prediction labels tested, VPIN shows consistently high importance.



Future directions
• Incorporate more market indicators as features (e.g. VIX)
• Experiment with different bar formations (Volume/Dollar Imbalance
Bars)
• Vary forecast horizon and test other prediction labels



For Additional Details
How does one make sense of today's financial markets, in which complex algorithms route orders, financial data is voluminous, and trading speeds are measured in nanoseconds? For academics and practitioners alike, this book fills an important gap in our understanding of investment management in the machine age.
— Prof. Maureen O’Hara, Cornell University. Former President of the
American Finance Association.

The first wave of quantitative innovation in finance was led by Markowitz optimization. Machine Learning is the second wave and it will touch every aspect of finance. López de Prado's Advances in Financial Machine Learning is essential for readers who want to be ahead of the technology rather than being replaced by it.
— Prof. Campbell Harvey, Duke University. Former President of the American
Finance Association.

THANKS FOR YOUR ATTENTION!

Disclaimer

• The views expressed in this document are the author's, and do not necessarily reflect those of the organizations he is affiliated with.
• No investment decision or particular course of action is recommended by this
presentation.
• All Rights Reserved. © 2017-2019 by True Positive Technologies, LP


You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy