Asset Management

Paolo Vanini
University of Basel
Chapter 1
Introduction
Asset management (AM) is a key discipline in a modern economy: we manage our
assets to maintain our standard of living after retirement, to buy property later,
or because a sovereign wealth fund does not want to lose the assets of future
generations. AM is the process of building, distributing, and maintaining assets
throughout the life cycle in a cost-efficient and compliant way. Pension funds,
institutional investors and private investors are different users of the AM process.
Game Changers
PwC (2015, 2012), McKinsey (2015), Oliver Wyman (2016) and many others identify the
following game changers for the asset management industry (note that the data
published by consulting firms are proprietary; the results can neither be verified
nor replicated by a third party):
• Growth of wealth: Global assets under management (AuM) will exceed USD 100
trillion by 2020, up from USD 64 trillion in 2012.
• Regulation: In the past, banks dominated the financial industry; they were the
innovators. After the 2008 Great Financial Crisis (GFC), regulation focused on banks
and insurers. AM initially faced fewer regulatory requirements and is now moving
more and more to center stage.
• Longevity and demographics: Retirement and health care will become critical issues
as populations age. The ratio of pensioners to the working-age population will reach
25.4 percent by 2050, up from 11.7 percent in 2010. This puts a strain on pension
systems. Still-increasing life expectancy - each new generation in the developed
world will live three months longer - increases the need for individual wealth
management solutions in retirement. Asset managers will therefore focus
on long-term investments and on individual asset decumulation. This change affects
in particular the US, Japan, most European countries, South Korea, Singapore,
Taiwan and China.
• Fees will continue to decrease for most asset management solutions, and regulation
requires the transformation of many existing fee models.
• Alternative investments transform into traditional ones and exchange traded funds
(ETFs) continue to proliferate.
Climate change is missing from the list above, although it will be one of the most
important game changers. Furthermore, the game changer 'performance' is missing,
although performance is a notorious problem for many investors and there is no
consensus about optimal investment behavior. We will give this topic wide scope.
While regulation dominated the decade after the GFC, the changes caused by tech-
nology are even more profound for the future of AM.
The current digitization wave differs from well-known automation. Technology has
matured to a level where abstract banking and asset management products can be
understood, researched and valued by clients in a completely different way than in
the past. Today's technology is closer to humans than it ever was. Technology is
also able to replace human labor even for complex activities in the AM value chain -
which work will still be human-specific in the AM industry?
Contents
The content is, from a methodological point of view, split into two parts: classical
methods and innovation. The former considers some of the main developments of the
last decades that are in use in the AM industry: the many ways portfolios are
constructed using the models or methods of Markowitz, factor investing,
Black-Litterman and many others, but also the way the AM value chain is structured
and organized. In innovation we focus on two topics. The first is data science,
i.e. how possibly better forecasts can be made or customer needs measured. The
second is platforms and blockchain, i.e. new forms in which the asset management
infrastructure and value chain can be designed. The traditional models are discussed
in Chapter 4 and innovation is considered in Chapter 5.
From a topical perspective, standard and trend topics can be differentiated. The
first includes understanding how different assets or asset classes behave and how
they are selected and managed. Besides the technological trends described above, the
focus is on retirement provision. The standard material appears in all first five
chapters. The trends in retirement provision are presented in Chapter 5.
I am grateful for the assistance of Dave Brooks and Theresia Büsser. I would like to
thank Sean Flanagan, Barbara Doebeli, Bruno Gmür, Jacqueline Henn-Overbeck, Tim
Jenkinson, Andrew Lo, Helma Klüver-Trahe, Roger Kunz, Tom Leake, Robini Matthias,
Attilio Meucci, Tobias Moskowitz, Tarun Ramadorai, Blaise Roduit, Olivier Scaillet,
Stephen Schaefer and Andreas Schlatter for their collaboration, their support or the
possibility to learn from them.
Chapter 2

Asset Management Overview
Definition 1. Financial assets are financial contracts that define resources over
which property rights are enforced and from which future economic benefits can flow
to the owner. An asset class is a group of financial assets that share predefined
economic, legal and regulatory characteristics.
Financial assets are intangible, non-physical assets. They are often more liquid
than tangible assets. Securities are tradable financial assets. They are issued
through financial intermediaries (primary market) and can often be traded on the
secondary market. They differ, among other things, in their ownership, complexity,
liquidity, risk and reward profile, transaction fees, accessibility and regulatory
compliance. Traditional asset classes are equities, fixed income securities, money
market instruments and currencies. Alternative asset classes include real estate,
commodities and private equity. Hedge funds are not an asset class but an investment
strategy defined on liquid asset classes.
The goal of investing is to save today for the benefit of future consumption. The
benefit after an investment period should be greater than the present direct
consumption of all resources. Investments are made through the use of securities of
all kinds - that is, money, stocks, bonds, ETFs, mutual funds or derivatives.
The AM firm's role in channeling savings towards investment can be structured as
follows. It creates products that match investors' needs. By trading the assets, AM
contributes to the liquidity of financial markets. Investments are used by firms and
governments; AM firms are among the biggest investors in government bonds.
AM makes investments in issued bonds and stocks accessible to small private
investors by using wrappers such as funds: investors get, for a small amount of
money, access to the economics of a diversified portfolio of assets. AM firms also
engage with investee companies. As shareholders they hold the companies accountable
and integrate environmental, social and governance (ESG) concerns into their
investment processes.
AM firms are required by law to act in the best interests of their clients and to
invest in accordance with a predefined set of rules and principles. They charge a
fee based on the value of the assets under management (AuM). AuM grow if the
investment performs, which leads to higher fees for the AM firm and higher returns
for investors; the incentives of investors and asset managers to achieve positive
returns are aligned. AuM refers to all the assets managed by a financial service
provider. This includes assets managed under a discretionary asset management
mandate as well as assets managed under an advisory asset management mandate.
Definitions and formulas for calculating AuM vary from company to company. Some
financial institutions include bank deposits, investment funds and cash in their
calculations; others limit them to funds where the investor assigns responsibility
for investment decisions to the company.
Pricing and price forecasts of assets are important for investors. There are two
ways to price assets in theory: absolute pricing as an equilibrium outcome in an
economy, and relative pricing using the concept of no arbitrage. Equilibrium pricing
is not relevant for the AM industry except for the CAPM as a benchmark model, while
no-arbitrage pricing is key in derivative pricing. To price stocks and bonds, also
called cash assets, empirical pricing models are often used. They follow from
working with data, such as the Fama-French model or, more recently, machine learning
and AI. This approach is by far the most used one in the industry, although a lack
of theoretical foundations and misuse of statistics often lead to flawed investment
strategies - data mining, data snooping and inaccurate backtests are examples.
1. Who decides?

4. How are asset management services produced and distributed in different
jurisdictions? - the profitability, process, client segmentation, regulation and
technology question.
In the past, technology was mostly needed to implement the investment strategies.
New technologies enable radically new investment approaches that differ from
traditional statistical models such as the Capital Asset Pricing Model (CAPM). But
technology is also the key factor in scaling the business and managing regulatory
complexity, i.e. in keeping or increasing profitability.
Question 4 attracted a large part of asset management resources in the decade after
the GFC, due to regulatory and technological changes and also to different client
expectations. This question can be considered as the sum of the following strategic
business issues (UBS [2015]):
• In which countries does an AM firm want to compete? The answer to this
geographical question depends on the AM firm's actual strength, its potential, the
costs to comply with country-specific regulation, the costs to build up the human
capital, and the business and technological complexity.
• Which products and investment areas should the AM firm focus on? Large AM firms
often offer up to several hundred investment strategies.
• What services should be provided and which technologies should be used for them?
• What operating model should be used? This question has a distribution dimension
(global vs. (multi-)local offering), an operational one (centralized vs.
decentralized), a value-chain one (in-house vs. outsourcing) and a legal/tax one
(onshore vs. offshore).
Figure 2.1: The size of the area indicates the proportion of global GDP produced in
that area during the years concerned. GDP is measured in USD to offset purchasing
power parity. In each chart, the total assets are displayed in USD. 1 AD means the
year 1 anno Domini in the Julian calendar (worldmapper.org).
Assets under Management (AuM) is the market value of assets that an investment
company manages on behalf of investors. AuM is often used as a measure for comparing
growth between asset managers. As profitability varies widely for different types of
assets, AuM should be used with caution to draw conclusions about an asset manager's
profitability. GIPS (Global Investment Performance Standards) is the market standard
for AuM reporting to investors.
PwC (2015) estimates that global AuM will exceed USD 100 trillion by 2020, up from
USD 64 trillion in 2012. Other estimates are similar. These figures imply an annual
global compound growth rate of roughly 6 percent (a quick check follows the list
below). This rate varies for different geographic regions (Boston Consulting Group
[2016]):
• Emerging Markets (EM): South America, BRIC states, Middle East, Eastern Europe:
8.5% p.a.
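As a quick check of the roughly 6 percent rate quoted above, a minimal Python sketch
(the 2012 and 2020 AuM figures are from PwC [2015]; the annual compounding
convention is our assumption):

```python
# Sanity check of the global AuM growth rate quoted in the text.
# Assumption: annual compounding over the 8 years from 2012 to 2020.
aum_2012 = 64.0   # global AuM, USD trillion (2012)
aum_2020 = 100.0  # projected global AuM, USD trillion (2020)
years = 2020 - 2012

cagr = (aum_2020 / aum_2012) ** (1 / years) - 1
print(f"Implied compound annual growth rate: {cagr:.2%}")  # ~5.74%, i.e. about 6%
```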
The different growth rates define opportunities for wealth managers in developed
markets to offer solutions in fast-growing markets. Therefore, market access plays a
prominent role in the development of AM. At the individual level, per capita GDP in
2016 was USD 11'000 for the emerging economies and USD 47'000 for the industrialized
countries. The growth estimates for the period 2016-2021 are 150% for the EM and 50%
for the developed economies (IMF World Economic Outlook [2016]).
The evolution of EM can also be seen by considering specific assets; see Table 2.1
for the emerging market (EM) bond market share growth. Twenty years ago almost 100%
of the EM bonds had a high-yield creditworthiness; in 2016 only 45% had such a
rating in the JP Morgan EM bond index, and the remaining 55% therefore had an
investment grade rating. Figure 2.2 shows other dimensions of the EM developments.

Table 2.1: Bond market shares (Barclays Capital, BIS, FactSet, J.P. Morgan Asset
Management [2016]).
Wealth growth must be compared to the dynamics of wealth inequality. An increase in
inequality is likely to destabilize the growth of wealth, as it leads to social and
political instability. Inequality risks are among the highest risks in the annual
global risk map of the World Economic Forum. On the one hand, the global increase in
wealth has been the main reason for poverty falling worldwide to a level never seen
before in history. On the other hand, CO2 emissions due to changed living
conditions, mobility, meat-dominated diets and tourism, among others, will trigger
or reinforce global economic and social tensions.
The global wealth projections of PwC (2015) for different types of investors are
shown in Table 2.2.
Clients               2012, USD tr.   2016, USD tr.   E2020, USD tr.   Growth rate p.a.
Pension funds              33.9            38.3            53.1             6.5%
Insurance companies        24.1            29.4            38.4             4.8%
SWF                         5.2             7.4            10.0             6.9%
HNWIs                      52.4            72.3            93.4             4.9%
Mass affluent              59.5            67.2            84.4             6.7%
Table 2.2: There are double counts: assets of wealthy individuals (HNWIs) are
invested in insurance and pension funds. Mass affluent refers to individuals with
liquid assets between USD 1-3 mn; HNWIs possess liquid assets of USD 3-20 mn. The
categorization is not unique. The predictions of the 2020 AuM changed between the
2015 and 2018 vantage points: while the numbers were stable for pension funds and
insurance companies, the forecast for HNWIs was significantly corrected upwards and
the mass affluent number for 2020 is now significantly lower (PwC [2015], PwC
[2018]).
Mass affluent clients and HNWIs in emerging markets are the main drivers of AuM
growth.
Figure 2.2: Upper left panel: EM and US shares of global nominal consumption,
measured in current USD expenditures, 1990-2013. Upper right panel: EM country
fundamentals at the time of the taper tantrum and at the beginning of 2017. Lower
panel: Creditworthiness of EM countries; the right panel shows the divergence across
EM countries. (J.P. Morgan Guide to the Markets, UN, World Bank, J.P. Morgan Global
Economics Research [2013, 2015, 2016])
The global middle class is projected to grow by 180 percent between 2010 and 2040,
with Asia replacing Europe as home to the highest proportion of the middle class as
early as 2015 (OECD, European Environment Agency, PwC [2014]). The growth of pension
funds will be large in countries with fast-growing GDPs, weak demographics and
defined contribution pension schemes.
2.2 Investors
There are different types of investors: private clients, high net worth individuals,
pension funds, family offices or state investment funds. At a higher level,
investors are divided into private investors and institutional investors. The
ownership of assets between these two categories changes over time; see Figure 2.3
for the US.
Figure 2.3: Equity ownership in the US. In the 1950s, 90% of equity in the US was
held by private investors. This number dropped almost linearly to 40% by the end of
2010 and then began to rise slightly. The fraction of equity ownership held by
institutional investors follows the opposite evolution. Source: Rohner [2014].
Private investors show a strong real estate dependence in their balance sheets; see
Figure 2.4 for the Swiss case. In particular, younger investors face a large
leverage effect from mortgage financing: the ratio of assets (real estate) to own
capital is large. Small changes in the property price have a significant impact on
the balance sheet equity of the investor. Interest rate risk and real estate market
price risk affect the asset. The latter risk is the more dangerous one for the
investor's default.
Consider a private investor who bought a house worth CHF 1 million. The 'golden rule
of affordability' in Swiss banking states that the investor needs to cover 20% of
the house price with his own capital and that the interest rate charge for the
mortgage should not exceed 1/3 of regular income, assuming a hypothetical high
interest rate level of 5%. For a mortgage of CHF 800'000, the regular income of the
investor must therefore not be lower than CHF 3 × 0.05 × 800'000 = 120'000. Suppose
that the investor gets a mortgage with a fixed 5-year rate of 1%, which is a
plausible number in a zero interest rate environment. He therefore pays, for the
next 5 years and without any amortization payments, CHF 8'000 per annum, which is
much less than renting the same object. Assume that the remaining liquid capital of
the investor is CHF 100'000 and his annual salary is CHF 150'000.

The leverage ratio of the investor, the ratio of the asset value to the equity
value, is λ = 1'000'000 / 100'000 = 10. Consider two scenarios. First, interest
rates rise to 3% within five years. Second, house prices fall by 15% in the next
five years. The first scenario implies that the investor has to pay CHF 24'000 per
annum in interest on the new mortgage after five years - three times more than
before, but still an affordable part of income. In the second scenario, the house is
only worth CHF 850'000. Since the investor must always cover 20% of the house price,
a maximum mortgage of 80% means a mortgage value of CHF 680'000. The investor has to
pay the difference between the old and the new mortgage value of CHF 120'000. This
is almost an annual salary! Hence, house price risk is a more severe risk than
interest rate risk.
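The arithmetic of this example can be reproduced in a few lines. The following
Python sketch (the variable names are ours, for illustration) recomputes the
affordability bound, the carrying cost and the two stress scenarios:

```python
# Swiss 'golden rule of affordability' example, all figures in CHF.
house_price = 1_000_000
mortgage = 0.80 * house_price              # 20% own capital => CHF 800'000 mortgage

# Affordability: interest at a hypothetical 5% must not exceed 1/3 of income.
min_income = 3 * 0.05 * mortgage
print(f"Minimum required income:    {min_income:,.0f}")      # 120'000

# Actual carrying cost at the fixed 5-year rate of 1%.
print(f"Annual interest at 1%:      {0.01 * mortgage:,.0f}") # 8'000

# Scenario 1: refinancing after five years at 3%.
print(f"Annual interest at 3%:      {0.03 * mortgage:,.0f}") # 24'000

# Scenario 2: house price falls 15%; the mortgage stays capped at 80%.
new_price = 0.85 * house_price             # 850'000
max_mortgage = 0.80 * new_price            # 680'000
print(f"Required capital injection: {mortgage - max_mortgage:,.0f}")  # 120'000
```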
Given the importance of real estate risk for private clients, it is hard to
understand why the myriad of sophisticated wealth management tools almost always
consider only the financial assets, leaving aside the house asset and the mortgage
debt. But retail investors are not the only ones who mostly use an asset-only
approach to investment. Research from State Street (2014), using data from a
worldwide survey of 3,744 investors, shows that although nearly 80 percent of
investors realize the importance of achieving long-term goals, proficiency in
achieving them deviates strongly. In the US, public pension funds were on average
less than 70 percent funded, with more than USD 1.3 trillion of unfunded
liabilities. A similar picture holds for private investors: while 73 percent cited
long-term goals, only 12 percent could say with confidence that they were on target
to meet those goals. Many academic papers address the misalignment between what
investors say is important (ALM) and what they do (asset only); the myriad of
possible reasons for this difference between what they state and what they do are
discussed in those papers.
Investors also differ in the type of financial assets they buy. The more
professional investors are, the more they invest in cash products. They do not use
mutual funds or structured products, since they can create the same payoffs without
paying the wrapper costs. Figure 2.4 shows, on the aggregate of all investors, that
bond investments and structured products did not grow in the last decade, in
contrast to the growth of funds and shares. Individuals and smaller pension funds
prefer mutual funds and structured products.
One reason is a lack of capital to reach a reasonable diversification. We discuss
below that a Swiss investor needs about CHF 1.5 million in order to achieve a
reasonable diversification by investing in cash products. The second reason is that
individuals lack direct access to some markets: they cannot enter into short
positions or are not allowed to trade derivatives under the International Swaps and
Derivatives Association (ISDA) agreement. They are forced to buy derivatives in the
packaged form of a mutual fund or a structured product.
Why are there so many SWFs in emerging markets? More than 50 percent of all large
SWFs originate in oil, and Asian governments are much more active in managing their
economies than some of their Western counterparts. According to Ang (2014), another
reason is that the US, after the many state bankruptcies of the 1980s and 1990s,
told emerging markets to save more. In recent years, a debate has begun on whether
it is productive to accumulate so much capital in sovereign wealth funds. Would it
not be more productive to invest the capital directly in the local economy?
Many SWFs accumulate liquid assets as reserves against unexpected future economic
shocks. This forms a long-term precautionary savings motive for future generations.
This motivation is crucial for the acceptance of an SWF: an SWF can only exist if it
has public support. This public support is a sensitive issue. Scandals due to
incompetent fund management, lack of integration of the fund into economic
strategies, political mismanagement and criminal acts should be avoided. All changes
in the risk policy for asset management must be documented and communicated to the
owners of the fund. For example, the Norwegian SWF initially invested only in bonds.
Only after a broad public discussion was a diversification of investments into other
asset classes considered. This behavior of the Norwegians is unique and rooted in
their democratic tradition.
Pension funds are one part of the total pension system of a country, which is often
divided into three pillars:
• Pillar I - This pillar should cover the subsistence level and is often organized ac-
cording to the pay-as-you-go system. Each month, employees pay part of their
salary, which is immediately distributed to pensioners.
• Pillar II - This is the pillar of the pension funds. Together with Pillar I, it
should be enough to cover the cost of living after retirement. The asset owners have
only limited access to their assets. There are two types of funds: Defined Benefit
(DB) and Defined Contribution (DC). DB plans are based on predetermined future
benefits for the beneficiaries but keep the contributions flexible. DC plans fix the
contributions but not the future benefits. In summary, the contributions define the
benefits in DC plans and the benefits define the contributions in DB plans.
• Pillar III - Privately managed investments, which often have tax advantages. Access
to assets before retirement is usually limited.
Figure 2.5 illustrates the importance of the different pillars in different
countries. Retirement systems are under pressure in most industrialized countries
due to demographic change and increasing longevity. For the first pillar,
demographic change means that the working population pays on average for a growing
number of retirees. This jeopardizes the first pillar.
Figure 2.5: Left panel: The importance of the three pillars as a percentage of
retirement income (ABP [2014]). Right panel: Basic form of DC and DB pension plans.
The threat to the first pillar has major implications for national budgets. The
first pillar accounts for more than 90 percent of retirement income in Spain. For
Germany, France and Italy, the value is between 75 percent and 82 percent. Given the
extremely low fertility rates and high youth unemployment in Spain and Italy, the
first pillar cannot survive. Shifts into the second or third pillar are required,
which represents an opportunity for asset management. But this only makes sense for
workers with a regular income. The pension problem of the mass of today's young
people without work remains unresolved.
• Benchmark
• Longevity
The results indicate the increase in first pillar contributions. The assumptions in
the scenarios are mild, given that in southern Europe the unemployment rate for
generations of young workers is higher than 20% and that the working population in
Japan will most likely drop by 50% in the next 20 years. This is the reason why
Japan invests heavily in robot technology to substitute for the missing human
workforce.
In DB plans, the pension is set in relation to the last average salaries; see Figure
2.5. The contributions are calculated in such a way that they generate a predefined
capital stock at the end of working life. Therefore, an increase in salary requires
additional funds in order to maintain the full pension. On the other side, a year
with very low income can have dramatic effects for the contributor in the retirement
period. Since the financing amount can change on an annual basis, DB plans are
considered intransparent.
In DC plans, the fixed contributions are invested in several asset classes and the
pension is only weakly related to the most recent salary of the contributor. The
growth of the invested capital, including interest payments, implies a final capital
value at the end of working life. The conversion rate applied to that final capital
level then defines the annual pension. Contributors to DC plans - contrary to those
who contribute to DB plans - bear the investment risk. This makes this form of
pension system cheaper for employers to offer. Unlike in DB plans, the contributors
can at least partially influence investment decisions - that is, choose the risk and
return levels of the investments. This is one reason why DC plans have become more
attractive to contributors than their DB counterparts. Finally, in some
jurisdictions, DC plans are portable from one job to the next, while DB plans often
are not.
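A stylized sketch of the DC mechanics just described; the contribution, return,
horizon and conversion rate below are illustrative assumptions, not figures from the
text:

```python
# Stylized DC plan: fixed contributions accumulate with investment returns;
# a conversion rate turns the final capital into an annual pension.
contribution = 10_000   # fixed annual contribution, CHF (assumed)
annual_return = 0.03    # assumed average investment return
years = 40              # assumed working life
conversion_rate = 0.05  # assumed annual pension per unit of final capital

capital = 0.0
for _ in range(years):
    capital = capital * (1 + annual_return) + contribution

annual_pension = conversion_rate * capital
print(f"Final capital:  {capital:,.0f}")         # ~754'000
print(f"Annual pension: {annual_pension:,.0f}")  # ~37'700
```

Note that the investment risk sits entirely in the realized return path: a weak
decade lowers the final capital and hence the pension, which is exactly the risk the
DC contributor bears.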
Underfunding is a serious problem. The S&P 500's biggest pension plans faced a USD
382 bn funding gap in 2018; of the 200 biggest DB plans in the S&P 500, 186 were not
fully funded in 2018. Companies like Intel have a ratio of pension assets to pension
obligations of less than fifty percent. In Switzerland, the average funding ratio of
private pension funds in 2013 was 107.9 percent (Kunz [2014]). The ratio for public
funds was 87.8 percent, showing strong underfunding. Private and public pension
funds differ even more severely when comparing the overfunding and underfunding
gaps: for the Swiss private sector, there is CHF 16.2 billion of overfunding capital
and CHF 6.4 billion of underfunding. In the public domain, the situation is the
opposite: CHF 1.4 billion of overfunding versus a CHF 49.5 billion funding gap.
Another perspective associated with the transition to DC-based plans is the average
undersaving in such plans. Munnell et al. (2014) report that in 2013 the average DC
portfolio at retirement was USD 110,000, while over USD 200,000 is needed. Finally,
DB and DC plans differ in their costs. The CEM benchmarking study (2011), which
considers 360 global DB plans with USD 7 trillion of assets, finds a fee range
between 36 and 46 bps. Munnell and Soto (2007) estimate the fees for DC plans at
between 60 and 170 bps.
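Compounded over a working life, this fee difference is substantial. A minimal
sketch, using the midpoints of the two fee ranges cited above together with an
assumed 5% gross annual return on an assumed CHF 100'000 investment:

```python
# Long-run fee drag: DB-style vs. DC-style fees (midpoints of the cited ranges).
gross_return = 0.05     # assumed gross annual return
capital = 100_000       # assumed initial investment, CHF
years = 30              # assumed horizon

for label, fee_bps in [("DB-style fees, 41 bps ", 41), ("DC-style fees, 115 bps", 115)]:
    net_return = gross_return - fee_bps / 10_000
    final = capital * (1 + net_return) ** years
    print(f"{label}: {final:,.0f}")
# The 74 bps difference costs roughly 20% of final wealth over 30 years.
```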
Furthermore, private savings are becoming more important due to the problems in the
first pillar. Individuals will be responsible for a larger part of their assets and
will bear the investment risk. Given the inability to recover retirement losses,
pension fund clients will ask for less risky assets.
Several financing and redistribution risks exist between the actively insured and
retirees. Many countries define a legal minimum fixed interest rate which has to be
applied to the minimum benefit pension plan. In Switzerland this rate was 1.75% for
2015 and 1.25% for 2016. Given that the 10-year CHF swap rate was close to zero in
2015, it is not possible for a pension fund to generate the legally fixed rate using
risk-free investments. This defines a financing risk for the population contributing
to a pension plan.
Figure 2.7: The return of the 10y Swiss government bond, the minimum legal rate for
Swiss pension plans and the technical rate for privately insured retired
individuals. If this status remains unchanged in the next years, underfunding
becomes a serious issue, and no significant return can be expected from investment
in the fixed income asset class. The technical rates are even higher than the
minimum rates, which indicates the extent to which actual pensions are too high.
(Swisscanto [2015], SNB [2015], OAK [2014])
If interest rates are low, pension funds are forced to consider alternative
investments: investing more or newly in stock markets, credit-linked notes, private
markets, liquid investment strategies (smart beta or factor investing),
insurance-linked investments, high-grade securitized mortgages or senior unsecured
loans. These alternatives induce different risks, and the experience of many pension
funds with them is limited. Pension funds can also reduce their costs. This would
help, but it would not solve any of the above problems due to demographics, low
interest rates or longevity risk.
Another reason is implicit or explicit return guarantees on the liability side.
Guarantees cut the linear payoffs of liabilities; i.e., options are generated.
Unlike standard financial derivatives on stocks, the pricing of these options is
much more complex and opaque: the underlying assets are not tradeable, and
risk-sharing mechanisms must be considered in the option pricing. These options are
often neither valued nor hedged, but they exist and adversely affect the goals of a
pension fund. A third reason is the overlapping of generations in the design of the
pension system, i.e. generation x also pays for, say, an already retired generation.
We pursue the less ambitious task of taking only the asset side into account, with
the liability side implicitly included in the asset return benchmark. It is
customary to divide the yield contribution into three parts: strategic asset
allocation (SAA), tactical asset allocation (TAA) and stock selection. The SAA is an
asset allocation over a long-term period of 5-10 years. It is based on unconditional
past information; returns are unconditional expectations. The TAA seeks to exploit
the predictability of returns over a short- to medium-term horizon. TAA forecasts
are conditional expectations; the current state of the financial market or the
business cycle matters. As a result, SAA weights change slowly over time while TAA
weights are more dynamic. Formally, with F_t the information set at time t, the SAA
works with the unconditional expectation E[r] of asset returns r, while the TAA
works with the conditional expectation E[r | F_t]. The definition of this set is
basic for the Efficient Market Hypothesis and the predictability of asset prices.
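A minimal sketch of the two forecast types on simulated data; the single predictor,
the linear model and all parameters are illustrative assumptions, not from the text:

```python
import numpy as np

# SAA vs. TAA forecasts: the SAA uses the unconditional mean E[r]; the TAA
# conditions on current information F_t, here proxied by one predictor.
rng = np.random.default_rng(1)

n = 240                                   # months of simulated history
signal = rng.normal(0.0, 1.0, n)          # predictor observed at the start of each month
returns = 0.004 + 0.002 * signal + rng.normal(0.0, 0.04, n)

# SAA-style forecast: one unconditional number for all states of the world.
saa_forecast = returns.mean()

# TAA-style forecast: conditional expectation E[r | F_t] via linear regression.
beta, alpha = np.polyfit(signal, returns, 1)   # slope, intercept
current_signal = 1.5                           # today's state of the market
taa_forecast = alpha + beta * current_signal

print(f"SAA (unconditional) forecast: {saa_forecast:.4f}")
print(f"TAA (conditional) forecast:   {taa_forecast:.4f}")
```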
The SAA establishes exposures to permissible asset classes and currencies. The end
result is a set of portfolio weights (of asset classes) that defines the investor's
risk-return trade-off.
The SAA's primary objective is to create a long-term optimal expected risk and
return asset mix. The SAA divides assets into different asset classes, geographic
regions, sectors, currencies and various credit rating levels.
The TAA bets on the predictability of asset returns. But are asset returns
predictable? Although the concept of a TAA has existed for more than 40 years,
practitioners and academics attribute different meanings to a TAA. Practitioners use
a one-period setup to define a TAA. Academics often use intertemporal portfolio
theory to derive dynamic optimal investment rules. This theoretically optimal TAA
has a myopic one-period component and a dynamic hedging demand component. The myopic
part corresponds to the TAA of practitioners; the other component is missing in
practice, see Sections 4.3.4.6 and 3.1.9.
The first investment firm to consider a TAA was Wells Fargo in the 1970s. The
decline in many assets during the 1973-1974 oil crisis increased investor demand for
alternatives to shifts within a particular asset class. Wells Fargo proposed shifts
across asset classes, i.e. between stocks and bonds. The system was able to generate
positive returns over a period when stock markets fell by more than 40 percent. In
the 1980s, portfolio insurance became popular, based on option pricing theory. These
dynamic strategies seek to maintain a guaranteed minimum portfolio return (floor).
The Constant Proportion Portfolio Insurance (CPPI) approach largely simplified the
option approach, making portfolio insurance even more attractive to investors. The
global stock crash of 1987 shifted investors' interest away from portfolio insurance
back to TAA, as portfolio insurance strategies mostly did not deliver the guaranteed
floor, while TAA strategies suffered before the crash but outperformed shortly
thereafter. We refer to Lee (2000) for a detailed discussion.
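A minimal CPPI sketch; the multiplier, floor and simulated price path are
illustrative assumptions (the text does not specify an implementation). The rule
keeps the risky exposure at a fixed multiple of the cushion above the floor:

```python
import numpy as np

# Minimal CPPI: risky exposure = m * (portfolio value - floor), rebalanced daily.
rng = np.random.default_rng(0)

value = 100.0        # initial portfolio value
floor = 90.0         # guaranteed minimum (floor), assumed constant
m = 4.0              # CPPI multiplier (assumed)

risky_returns = rng.normal(0.0005, 0.01, 250)   # simulated daily returns

for r in risky_returns:
    cushion = max(value - floor, 0.0)
    exposure = min(m * cushion, value)   # risky allocation, capped at 100%
    value += exposure * r                # riskless rate assumed to be zero

print(f"Final value: {value:.2f} (floor: {floor})")
```

In continuous time the floor is never breached; with discrete rebalancing a large
overnight drop can push the value below the floor (gap risk), which is one reason
portfolio insurance failed to deliver in the 1987 crash.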
Let us go back to the management of pension funds. We assume that the people at the
top of the funds have little investment knowledge; their decisions concern the SAA.
At the lower end of the fund hierarchy are the experienced asset managers. Their
success is measured relative to the TAA, and they seek to generate excess returns
over the TAA benchmark by selecting assets. However, many empirical studies show
that the SAA is the most important determinant of the total return and risk of a
broadly diversified portfolio. This defines the discrepancy between economic
relevance and know-how in the hierarchy of decision-makers:
• Brinson et al. (1986) report that around 90 percent of the return variance arises
from the passive investment part. Subsequent papers clarified these findings and
estimate the importance of these returns at between 40 percent and 90 percent (see,
for example, Ibbotson and Kaplan [2000]). Schaefer (2015), one author of the
professors' report to Norway's Government Pension Fund Global, states that the
variance attribution to the benchmark return was 99.1% and only 0.9% was attributed
to the active return.
• Between 5 and 25 percent is due to the TAA and related to the Chief Investment
Officer (CIO) function.

• Between 1 and 5 percent is due to security selection by the portfolio managers.
The economically most advanced societies face another population problem: each
future generation will be smaller than the one that preceded it. For some countries,
this has already become a matter of national survival. Triggered by low fertility
rates, this phenomenon is gaining ground worldwide: 46 percent of the world's
population has fallen into a low-fertility regime, and there is nothing to indicate
that this rate is going to recover. Magnus (2013) states that (i) the ratio of
children to older citizens stands at about 3:1 but is declining - by 2050, there
will be twice as many older citizens as children; (ii) the number of over-60s in the
rich world is predicted to rise by 2.5 times by 2050, to 418 million; and (iii) in
the emerging and developing worlds, the number of over-60s will grow by more than
seven times to over 1.5 billion by 2050, with, behind this, a 17-fold increase in
the expected population of those aged over 80, to about 262 million.
Malthus (1798) was the first to study the interdependence between economic growth
and population growth. He assumed that as long as there was enough to eat, people
would continue to produce children. Since this would lead to population growth rates
in excess of the growth in the food supply, people would be pushed down to the
subsistence level. According to Malthus's theory, sustained growth in per capita
incomes was not possible; population growth would always catch up with increases in
production and push per capita incomes down. Of course, today we know that Malthus
was wrong, at least as far as the now industrialized countries are concerned. Still,
his theory was an accurate description of population dynamics before the industrial
revolution, and in many countries it seems to apply even today (Doepke [2012]).
Hence, for Malthus, children were a normal good: when income went up, more children
were 'consumed' by parents. A microeconomic equilibrium model supports this
intuition: an increase in productivity causes a rise in the population, but only
until the wage is driven back down to its steady-state level. Even sustained growth
in productivity will not raise per capita incomes. The population size will catch up
with technological progress and put downward pressure on per capita incomes. This
model explains the relationship between population and output for almost all of
history, and it still applies to large parts of the world today (Doepke [2012]).
But ageing in developed countries occurs in parallel with better health, more
extensive education, and related societal changes. We are not just living longer, we
are slower to age. Boersch-Supan et al. (2005, 2006, 2013) make this precise. They
find that:
• The average healthy life expectancy of men at the age of 65 is larger than 5 years
in every European country.

• Using more than 4.8 million data records of a large insurance company, the authors
measured the productivity of workers of different ages for different types of work:
contract negotiation (the most challenging jobs), standard customer advice, and
repetitive jobs. They found that older workers made more errors in the repetitive
jobs than the younger ones but were significantly more productive in the challenging
jobs than their younger counterparts.
Fact 3. For the average population, the discussion of whether we can work until the
age of, say, 67 is not related to health: from a health point of view, the
retirement age could be raised to 70 years. The tendency to fire older workers
destroys productivity, since the experience of older, motivated workers generates
higher productivity in demanding jobs than is achieved by younger ones.
We spend longer in education; we travel more before permanently joining the
workforce; we start families later. We do not think of ourselves as being as old as
previous generations would have at the same age. The effect of all these changes
taken together is not that society is ageing but that it is getting younger.
Finally, a society with a predominantly young population has a different
productivity level than a more aged population. Syl and Galenson show that 40
percent of productivity increases are down to young people who enter new markets.
These young people break with tradition and manifest new ways of thinking; Google
and Facebook are two prominent examples. Older individuals possess more experience
and wisdom, but Syl and Galenson state that this only gradually changes
productivity.
To manage the emerging demographic regime, innovative policies and new ways of
thinking about population are called for (Romaniuk [2012]). This change in the
structure of society will have many consequences. One of the most significant will
be a labor shortage. If societies are going to maintain their standard of living,
they will have to avoid any reduction in the workforce as a proportion of the total
population. At the same time, many people are going to reach retirement age and
realize that they do not have enough income to maintain what they feel is an
acceptable standard of living. The combination of these two issues will put a lot of
pressure on our current views of the relationship between working and retirement.
Employment and retirement laws designed for a young and growing population no longer
suit populations that are predominantly old but healthy and capable of being
productive, all the more so in a work environment of automated technology.
Prevailing family assistance policies are equally antiquated. Though the maternity
instinct may still be present as it always was, women's conditions have radically
changed. The women of today in developed countries, and throughout the modernizing
world, are faced with many deterrents to maternity (e.g., widespread celibacy,
marital instability, financial insecurity) on the one hand, and with many
fulfilling, financially well-rewarded opportunities on the other - so much so that
they are left with little incentive to trade the latter for the uncertainties of
motherhood.
'It is easier to bring population down than to make it up,' writes John May (2012).
And that is why - in order to escape the sub-replacement fertility trap and to bring
the fertility rate to, and sustain it at, even a generational replacement level
(Romaniuk [2012]) - we need to bring to bear meaningful financial and social rewards
for maternity. The current family allowance and other welfare-type assistance to
families cannot do this. Societies under a demographic maturity regime may need to
have in place permanent, 'life-sustaining' mechanisms to prevent fertility from
sliding ever lower. Instead we need a more balanced resource allocation between
production and reproduction.
The Melbourne Mercer Global Pension Index report (MMGPI [2015]) from the Australian
Centre for Financial Studies and Mercer compared the status of the retirement
systems of 25 countries. The index is based on the following construction; see
Figure 2.8. Although it is called a 'pension index', it considers the entire
retirement system of each country. Figure 2.9 summarizes the results for the 25
countries surveyed.
Figure 2.8: The Melbourne Mercer Global Pension Index (MMGPI [2015]).

Figure 2.9: Summary for the 25 countries in the Melbourne Mercer Global Pension
Index as of 2015 (adapted from MMGPI [2015]).
• Increasing the retirement age. For countries with a high unemployment rate this
is not a feasible alternative.
• Keeping the pay-as-you-go systems and reducing the contribution to the pension
funds.
The asset allocation of pension fund assets differs significantly between countries.
The exposure to growth assets (including equities and property) ranges from less
than 10 percent in India, Korea and Singapore to about 70 percent in Australia,
South Africa, the UK, the US and Switzerland (GlobalPensionIndex [2015]). The more
growth assets are included in the asset allocation, the larger the risks: there were
significant declines in the value of assets in 2010 and 2011, reflecting the
consequences of the global financial crisis of 2007 and 2008. However, since that
time there has been a steady recovery in the level of pension assets in each country
surveyed as equity markets have recovered (GlobalPensionIndex [2015]).
The expansion of investment strategies means, for example, applying factor
investing; all pros and cons of the last sections also apply to the pension system
case. The third possibility is to make some illiquid asset classes accessible to
pension funds; private equity, insurance-linked investments and securitized loans
are the typical examples given.
Example
Asset managers can become more important financial actors by driving the raising of
capital and the capital deployment required to meet the demands of growing
urbanization and cross-border trade. The world's urban population is expected to
increase by 75 percent between 2010 and 2050, from 3.6 billion to 6.3 billion. The
urban profile in the East will see many more 'megacities' (cities with a population
in excess of 10 million) emerging. Today's 23 megacities will be augmented by a
further 14 by 2025, of which 12 will be in emerging markets.
This will create significant pressure on infrastructure. According to the OECD, USD
40 trillion needs to be spent on global infrastructure through 2030 to keep pace
with the growth of the global economy. Some policymakers appear to have taken the
problem on board: in Europe - after considerable debate - the European Long Term
Investment Funds (ELTIF) initiative was finally created in 2013, helping European
asset managers to invest in infrastructure. But infrastructure investments will
disproportionately target emerging markets, and emerging markets' asset managers
have recognized this and have already started to focus on it.
The authors use data from eVestment and limit their analysis to US long-only equity
products, which can be considered to be among the most efficient markets. In the
approximate period 1999 to 2011, one-quarter of these products were recommended
annually by investment consultants and the rest were not. The much larger number of
non-recommended products compared to the recommended ones remains stable across the
different years studied.
Regarding the first question, the authors find that consultants' recommendations are
partly driven by past fund performance, but also by other soft factors such as
service quality and investment quality factors (Jenkinson et al. [2014]): to be
recommended, it is not sufficient to have a strong return history. The authors then
analyze whether the size of the fees charged has an impact on the recommendation
rate; if this were the case, conflicts of interest would be suspected. The analysis
shows that this is not the case. Fees are very similar for recommended and
non-recommended products, independent of the size of the products and their styles
(growth, value, small- and mid-cap). The fees are in line with the fees in Section
2.7.4.3 - that is to say, close to 70 bps for larger products.
The answer to the third question attracted a lot of public attention. The authors
construct equal- and value-weighted portfolio returns of recommended and
non-recommended products. Using the returns of these portfolios, they estimate
one-factor (CAPM), three-factor (FF) and four-factor (FFC) alphas as well as excess
returns over portfolios of selected benchmarks.

For the equally weighted portfolios, the returns of the recommended products were
significantly lower than those of the non-recommended ones, on the order of 1
percent in magnitude, independent of the factor model chosen (see Figure 2.10). For
value-weighted portfolios, different factor models lead to different returns for the
two alternatives: 'Value-weighted returns and alphas are consistently lower,
suggesting that smaller products perform relatively better' (Jenkinson et al.
[2014]). Summarizing the evidence: investment consultants are not able to
consistently add value by selecting superior investment products.
worse. However, after adjusting for different sizes, the explanation turns out to be
wrong.
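A sketch of the kind of alpha estimation described above; the returns and the factor
are simulated here, whereas the study uses the recommended and non-recommended
product portfolios together with standard factor data:

```python
import numpy as np

# One-factor (CAPM) alpha of a product portfolio via OLS regression.
rng = np.random.default_rng(2)

t = 156                                   # monthly observations (1999-2011)
mkt_excess = rng.normal(0.005, 0.04, t)   # market excess return (simulated)
port_excess = 0.0008 + 0.95 * mkt_excess + rng.normal(0.0, 0.01, t)

X = np.column_stack([np.ones(t), mkt_excess])
coef, *_ = np.linalg.lstsq(X, port_excess, rcond=None)
alpha, beta = coef
print(f"CAPM alpha: {alpha:.4%} per month, beta: {beta:.2f}")
# Three- and four-factor alphas (FF, FFC) add size, value and momentum
# factors as further columns of X; Newey-West standard errors would be
# used for the t-statistics reported in Figure 2.10.
```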
Figure 2.10: The table shows the performance of portfolios of actively managed US
equity products that experience a net increase (decrease) in the number of
recommendations in the twelve- or twenty-four-month period following the
recommendation change. Performance is measured using raw returns; returns in excess
of a benchmark chosen to match the product style and market capitalization; and
one-, three-, and four-factor alphas (corresponding to the CAPM, the Fama-French
three-factor model, and the Fama-French-Carhart model). Excess returns and alphas
are expressed in percent per year. All reported figures are gross of fees. The first
part of the table shows the results for equally weighted portfolios of products,
whereas the second part shows the same statistics for portfolios of products
weighted using total net assets at the end of the previous year. t-statistics based
on standard errors - robust to conditional heteroscedasticity and serial correlation
of up to two lags, as in Newey and West (1987) - are reported in parentheses. ***,
**, * mean statistically significant at the 1, 5, and 10 percent levels,
respectively. The benchmarks for the investment products are the corresponding
Russell indices: large cap growth products are benchmarked against the Russell 1000
Growth, small cap value products against the Russell 2000 Value, etc. (Jenkinson et
al. [2014]).
These results raise several questions. First, why do pension funds use - on a
rational basis - investment consultants that add no value? The argument that
consultants act as insurance against being sued is simply not justifiable. Second,
it is difficult to understand why investment consultants are virtually unregulated
in most jurisdictions.
An investment decision today has to meet many more regulatory standards than in the
past. Regulation defines restrictions and rules for decision-making, but it never
sets an AM firm's goals.
Individual regulations can have strategic or operational implications for AM. UCITS,
PRIIPS, EMIR and MiFID II have high operational impacts.1 PRIIPS and MAD II have a
low strategic impact; MiFID II, the Volcker Rule or Dodd-Frank Act, and UCITS have a
high strategic importance. The ability of international banks and large AMs after
the GFC to comply quickly and to integrate the regulatory program into their
strategic planning resulted in a competitive advantage over smaller institutions.
The know-how of the international institutions enables them to participate actively
and efficiently in the technological change. They are almost invulnerable, despite
the many heavy fines imposed on them for the many scandals in recent years.
Example: Impact of regulation on the Swiss banking sector and asset management

Regulatory burden together with broken business models impacts the financial
industry. It is estimated that of the approximately 300 Swiss banks in 2014, about
one-third will stop operating as an independent brand. A KPMG study from 2013 (KPMG
[2013]) summarizes:
1 PRIIPs are the Packaged Retail Investment and Insurance-based investment Products
documents, and UCITS is the Undertakings for Collective Investment in Transferable
Securities Directive for collective investments in the European Union. Obligations
for central clearing and reporting (EMIR, Dodd-Frank) and higher capital
requirements for non-centrally cleared contracts (CRR), as well as the obligation to
trade on exchanges or electronic trading platforms, are addressed by the revision of
MiFID, the so-called Markets in Financial Instruments Regulation (MiFIR). US T+2
means the realization of a T+2 settlement cycle in the US financial markets for
trades in cash products and unit investment trusts (UITs). FIDLEG is part of the new
Swiss financial architecture, which should be equivalent to MiFID II of the euro
zone. In 2013, following the LIBOR and EURIBOR market-rigging scandals, the EU
Commission published a legislative proposal for a new regulation on benchmarks
(Benchmark Regulation). The Asia Derivative Reform mainly focuses on the regulation
of OTC derivatives and should therefore be compared with EMIR and the Dodd-Frank
Act. The Market Abuse Directive (MAD) in 2005 and its update MAD II resulted in an
EU-wide market abuse regime and a framework for establishing a proper flow of
information to the market. BCBS considers principles of risk data aggregation and
reporting by the Basel Committee on Banking Supervision. The Comprehensive Capital
Analysis and Review (CCAR) is a regulatory framework introduced by the Federal
Reserve in order to assess, regulate, and supervise large banks and financial
institutions. EU FTT means the EU Financial Transaction Tax. IRS 871(m) are
regulations of the IRS about dividend equivalent payment withholding rules for
equity derivatives. CRS are the Common Reporting Standards of the OECD for the
automatic exchange of bank account information.
• A total of 23 percent of Swiss banks faced losses in 2012, all of them with AuM of
less than CHF 25 billion.

• Banks that were not profitable in 2012 were mostly not profitable in previous
years either.

• The dispersion between successful banks (large and small ones) and non-performing
banks (small ones) is increasing.

• The performance of small banks is much more volatile than that of larger ones.

• A total of 53 percent of the banks reported negative net new money (NNM).
Small asset managers, many of them firms with fewer than 5 employees, faced a cost
and knowledge problem after the GFC due to the regulatory and legal changes. They
lacked legal and compliance know-how, and it was not profitable for them to hire
specialists in these fields. Similarly, they could not invest in new, scalable
technologies for accounting, strategy construction, performance calculation and
attribution, etc. Both factors led to platform-as-a-service (PaaS) innovations,
where the different services are outsourced and bought by connecting via API
technology.
Many of the regulatory initiatives launched in recent years are related to asset
management and trading. We consider the eurozone. The Alternative Investment Fund
Managers Directive (AIFMD) mainly acts in the hedge fund sector, whereas UCITS is
key for the fund industry. EMIR regulates the OTC derivative markets, and the PRIIPS
initiative is responsible for the key information for retail investors in the
eurozone. MiFID II provides harmonized regulation for investment services across the
member states of the EU, with one of the main objectives being to increase
competition and consumer protection in investment services. In the US, the
Dodd-Frank Act is the counterpart of many European initiatives.
Regulatory initiatives place greater demands on asset managers and their service
providers. They force changes in the areas of customer protection, agreements with
service providers, disclosure of regulatory and investor information, distribution channels,
trade transparency, and compliance and risk management functions (PwC [2015]).
2.4.1 MiFID II
The MiFID II Directive implements the 2009 G20 Pittsburgh Summit Agreement in the
euro area and applies to all non-EU financial intermediaries offering investment
products in the eurozone. It requires the adoption of 32 legal acts by the European
Commission, 47 regulatory standards, 14 performance standards and 10 packages of
measures.2

2 Similar remarks also apply to other regulatory initiatives such as the Dodd-Frank
Act in the US. Its implementation requires the creation of 398 new rules governing
financial activities, disclosures and processes, the conduct of 67 studies, and the
issuing of 22 periodic reports. The law itself consists of 2'300 pages, without
counting the final implementation documents.

MiFID II's main objectives include:
• The creation of a robust framework for all financial market players and financial
instruments.

• Improving the supervision of the various market segments and market practices, in
particular OTC financial instruments.
Investor protection is based on four topics. First, inducements: the need to
disclose the independent versus non-independent status of advice and the prohibition
for discretionary managers and independent advisers to be involved in inducements.
Second, product governance means that the manufacturer's product approval process
has to include the target market definition, which has to be taken into account by
the distributors and tracked by the asset managers. Third, suitability and
appropriateness requires all investment firms operating in EU countries to provide
clients with adequate information for assessing the suitability and appropriateness
of their products and services, and to comply with best execution obligations.
Finally, client information requires that enhanced information is shared with
clients, regarding both content and method, in particular the costs and charges for
services or advice.
• Execution only: Investors decide themselves and investment firms only execute
orders.

• Advisory: Investors and investment firm staff interact. While relationship
managers or specialists advise the investor, the investment decision is finally made
or approved by the investors themselves. Advisory was the traditional
intermediation channel before the financial crisis of 2007.
Figure 2.11: Client segmentation and intermediation segmentation as per MiFID II.
• Mandate: The investor delegates the investment decision in a mandate. The mandate
contract reflects the investor's preferences. The portfolio manager chooses
investments within the contracted limits. Many banks and asset managers motivated
their clients to switch from the advisory to the mandate channel after the GFC. The
main reasons for this are lower business conduct risk and better opportunities for
automation; these reduce production costs and enhance economies of scale. Since the
active portfolio managers are benchmarked against the CIO's TAA mandates, they face
the same problems as actively managed funds - most of them will turn out to be
zero-alpha funds; see Section 4.6.6.3. This will motivate many customers to move
back to the advisory or execution-only channel.
Client segmentation. Investment firms must define written policies and procedures
according to the following categorization:
Wealth as the sole variable for the classification of customers is no longer applicable. Customers can opt up or down, i.e. they can choose a more or less stringent protection category. Suitability and appropriateness requirements are defined in each
cell of the 3×3 segmentation matrix (Figure 2.11). Client suitability addresses the
following six points:
1. Information on clients
5. Investment objective
These six points reflect the parameters that define the optimization problem of a rational economic investor. To determine the preferences of an investor, one needs general information about the investor (4.1) and specific risk attitudes (6), which both enter into the objective function (5). The optimization of the objective function leading to the optimal investment rule is carried out under various restrictions: the budget restriction (4) and restrictions of admissible securities due to their complexity or the experience of the investor (3). Tax issues, legal constraints, and compliance issues also enter into the restriction set and require information to be provided to the client (4.3). These six points are therefore sufficient for the investor to determine his or her optimal investment strategy.
Client product suitability consists of requirements that ensure that the product
is suitable:
4. Disclaimer
These requirements become less demanding the more experienced the client is. Suitability in advisory services requires qualified staff and an appropriate incentive structure in the asset management firm.
A monitoring process follows the product over its life cycle and compares the risk and return properties with the initially defined client profile. If necessary, this process sends warning or action messages to the client and/or advisor. A CIO view typically consists of several inputs, such as a quantitative model, a macro research view, and a market view. Smaller institutions do not have the resources to provide all these inputs. They then buy the CIO view from another bank.
Figure 2.12: An investment process. The three channels from left to right are the client - advisor channel, the investment office, and the producers of the assets or portfolios (trading and asset management).
New trends in technology allow the process outlined in Figure 2.12 to be reshaped. In extremis, there will be no need for an investor to disclose his or her investment preferences since the data already exist in the virtual world. If, furthermore, the investment views are formed in a fully automated manner using publicly available data, then the functions both of the advisor and of the CIO become superfluous. Digital money managers entice with low barriers to entry. Their performance is impressive. Their greatest weakness is understanding the customer.
2.4.3 Robos
Selma looks young and trendy. Her white-blond hair is formed into a casual bob hairstyle. The glasses are tinted, the lipstick is bright red. Selma keeps smiling and winking at you. She is immediately on familiar terms with everyone. 'Hi! I'm Selma. Let's have a quick chat about your finances,' she writes. Selma is warm - but not a person of flesh and blood. She is the digital investment assistant of Selma Finance, a Robo Advisor. Robo Advisors have come to challenge traditional asset management and reinvent it with technical assistance. The idea is to take the complexity out of classic financial services in a playful way. Since the appearance of the iPhone in 2008, little by little, all areas of everyday life are being digitized, driven by technological progress and the belief that computers not only perform tasks faster and cheaper, but also better with the help of artificial intelligence. This section is based on Gerbl (2019).
2.4.3.1 Markets
The digitization of wealth management started in the USA and Great Britain after the GFC: a retail customer in the UK practically no longer gets investment advice, since cost and compliance pressure after the GFC forced banks to change their business model. A niche opened up that was quickly filled by financial service providers. Over 100 Robo Advisors are now active in the USA. Companies such as Vanguard, Charles Schwab, Betterment and Wealthfront dominate the market. ETF giant Vanguard's Robo alone is responsible for USD 120 billion in investment money. In total, Robos manage more than 800 billion dollars. The Robos are forecast to manage around USD 2.2 trillion in 2023; one-third of BlackRock's 2018 AuM.
In Switzerland or Germany, investors receive closer care. Every regional bank picks up the customer and covers him with products, with technology supporting the relationship manager (RM). Hence, a hybrid model applies so far. This evidently scales much less than the digital world in Anglo-Saxon or Scandinavian countries. But proponents of the hybrid model base their business approach on the assumption that wealth management is not bought, but sold. This does not usually happen with Robos, so there are no huge inflows in Switzerland or Germany so far. Currently, 200 million francs are being managed for end customers on Robo platforms in Switzerland (Gerbl (2019)). Is this cultural evidence strong enough to outweigh the advantages of the Robos?
Trading is also offered by some firms. In any case, the information structure used to form the portfolios follows the EMH as an anchor (say, mean-variance optimization using historical estimation of the input variables), with individual views and preferences as well as collective market participants' views as an overlay. They therefore do not engage in more expensive active management with its doubtful performance track record. So far there is not much intelligence in the Robos. But they follow the investment strategy of the customer, and rebalancing is therefore a service which Robos offer.
Clients are offered visualizations to change the weights in the portfolio construction following their preferences. Anecdotal evidence suggests that every second customer adopts such pattern strategies. Such patterns cannot be considered passive investment any longer. In fact, more and more Robos are offering such active components for wealthier and more experienced investors. They can playfully simulate different strategies and compare them by backtesting. The programme decides whether the wishes are fulfilled or not.
Costs are an important component of whether or not it pays to use Robos. In comparison to traditional asset management mandates with costs usually well above one percent, Robos with flat fees of 0.68 percent are a cheaper alternative. In the USA, the average cost of Robos is less than 0.4 percent. How profitable are Robos? Agnesens (2019) states that 'Since the beginning of 2000, Robo strategies have yielded up to two percent better returns than mixed funds every year.' She compared strategy funds and different Robo-advisory model portfolios, see Figure 2.13.
The differences increase with increasing risk, which is due to increasing costs for strategy funds facing increased risk. In Agnesens' comparison the Robos before costs are even slightly ahead of the strategy funds.
What are the main criticisms of Robos? First, it is claimed that many investors do not understand Robos, i.e. the Robos are primarily concerned with the investment side and less with the client. Second, it is claimed that Robos don't know their customers well enough and don't explain to them what's going on in the markets and with their investments. There is no deeper understanding of the customer and his needs. A Robo Advisor is far from the personal advisor you ideally know for many years. If this holds true, this can become a problem in turbulent markets. But one can also argue that knowing customers better was not of any help in the financial crisis.
Figure 2.13: Strategy funds versus Robo Advisory strategies after costs (annual return in percent versus annual volatility in percent). The blue dots represent strategy funds, the red symbols Robo Advisors. The red circles are model portfolios with up to 25% equity, the diamonds between 25% and 50%, and the squares portfolios with more than fifty percent equity (Agnesens [2019]).
Institutional investors such as pension funds must decide whether to delegate the TAA to external portfolio managers in the form of a mandate or whether they will manage the assets within the fund. Furthermore, the benchmark and the definition of risk-based areas for tactical asset allocation have to be determined. It must also be decided whether the reporting, administration, and risk controlling functions of the investment portfolios should also be outsourced. In case of outsourcing, requests for proposal are used. The entire investment decision outsourcing process is conducted with the involvement of external consultants. Goyal and Wahal (2008) estimate that 82 percent of US public pension funds use pension consultants.
We discuss in Section 2.3.2.4 that the extensive use of investment consultants is by no means free of conflicts of interest, both for the performance of the delegated investments and for the selected asset managers. Critics, for example, often accuse them of being drivers of new investment strategies which turn out to be more complex (hence more difficult to handle and understand, and also more expensive) than those actually used, while it is not clear whether they lead to better performance.
The other steps in the process, as illustrated and described in the last figure, are evident.
Example
The Financial Stability Board (FSB) stated in 2013: One of the key lessons from the crisis was that reputational risk was severely underestimated; hence, there is more focus on business conduct and the suitability of products, e.g., the type of products sold and to whom they are sold. As the crisis showed, consumer products such as residential mortgage loans could become a source of financial instability. The FSB considers, among others, the following elements of a sound risk culture:
• Tone from the top: The board of directors and senior managers set the institution's core values and risk culture, and their behaviour must reflect the values being espoused.
• Incentives: Financial and non-financial incentives should support the core values and risk culture at all levels of the financial institution.
Conduct risk is a real source of risk for investment firms: fines worldwide amounted to more than USD 100 billion for the period 2009-2014. These fines and the new regulatory requirements raise serious profitability concerns for investment firms and banks (see Figure 8). But there is more than just financial cost at play for the intermediaries: a loss of trust in large asset managers and banks can prove disastrous, in particular if new entrants without any reputational damage can offer better services thanks to FinTech. Figure 2.15 shows the evolution of the fines imposed by the British regulatory authorities (Left Panel) and the global value of fines. One sees that it took about three years after the GFC to charge the fines to the banks, insurance companies, and asset managers. The global figures now exceed USD 230 bn since the start of the GFC. The horizontal lines in the histogram show how large the individual fines were. One sees, for example, that there was a fine in 2014 of more than USD 15 bn for a single institution.
In the US, enforcement statistics from the Securities and Exchange Commission (SEC) show an increase in enforcement actions in the category investment advisor/investment company of roughly 50% following the GFC. Compared to the pre-crisis figures of 76 and 97 cases per year, respectively, 2011-2014 returned respective figures of 130 and 147 cases.
Figure 2.15: Left Panel: Table of fines imposed in the UK (FSA and FCA web pages). Right Panel: Global value of fines (FT research, June 2015).
Patton et al. (2013) show that disclosure requirements for hedge funds are not sufficient to protect investors. The SEC, for example, requires US-based hedge funds managing over USD 1.5 billion to provide quarterly reports on their performance, trading positions, and counterparties. The rules for smaller hedge funds are less detailed. Instead, one has to care seriously about the quality of the information disclosed. Are these voluntary disclosures by hedge funds reliable guides to their past performance?
Vintage analysis refers to the process of monitoring groups and comparing performance across past groups. These comparisons allow deviations from past performance to be detected. The authors find that in successive vintages of these databases, older performance records (as far back as 15 years) of hedge funds are routinely revised: nearly 40 percent of the 18,382 hedge funds in the sample have revised their previous returns by at least 0.01 percent at least once, and over 15 percent of funds have revised a previous monthly return by at least 1 percent. These are very substantial changes, given the average monthly return in the sample period is 0.64 percent.
Less than 8 percent of the revisions are attributable to data entry errors. About 25 percent of the changes were based on differences between estimated values at the reporting dates for illiquid investments and true prices at later dates. Such revisions can be reasonably expected. In total, 25 percent (50%) of the revisions relate to returns that are less than three months old (more than 12 months old). They find that negative revisions are more common, and larger when they do occur, than positive ones. They conclude that on average initially provided returns signal a better performance compared to the final, revised performance. These signals can therefore mislead potential investors. Moreover, the dangerous revision patterns are significantly more likely for funds-of-funds and hedge funds in the emerging-markets style than for other hedge funds.
Can any predictive content be gained from knowing that a fund has revised its history of returns? Comparing the out-of-sample performance of revising and non-revising funds, Patton et al. (2013) find that non-revising funds significantly outperform revising funds by around 25 basis points a month.
• show on an ad hoc basis when a portfolio is more than the sum of the parts - that
is, more return and less risk;
Table 2.3: Average annual returns and standard deviations of the asset classes and growth of capital after 88 years. The calculation logic is $71'239 = 100\,(1 + 0.0775)^{88}$.
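As a quick plausibility check of the compounding logic, the end value can be reproduced in a few lines of Python; this is an illustrative sketch, with 7.75% taken as the rate implied by the quoted end value.

```python
# Illustrative sketch: reproduce the growth-of-capital logic of Table 2.3.
# The 7.75% annual return is the rate implied by the quoted end value.
initial_capital = 100.0
annual_return = 0.0775
years = 88

final_capital = initial_capital * (1 + annual_return) ** years
print(f"Final capital after {years} years: CHF {final_capital:,.0f}")  # ~71,239
```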
Figure 2.16 shows the distribution of return and risk, measured by the standard deviation, over 88 years of investments.
In the long run, equity had in most economies higher returns and risks than its bond counterparts. We discuss below why, nevertheless, the advice to invest only in stocks if the investor has a long-term horizon is not an optimal strategy. Furthermore, a small difference in the average return creates a large difference in wealth accumulation: the compounding effect. Finally, gold has in this long period a large risk component but only a small average return. This first analysis allows us to consider diversification next.
Figure 2.16: The distribution of return and risk, measured by the standard deviation,
over 88 years of investments. The square marks represent equity, the diamonds bonds,
the triangle is cash, and the circle is gold (data from Kunz [2014]).
The four strategies follow a heuristic approach: the weights are not optimally chosen using a statistical model but are fixed based on heuristics (experience). We form four portfolio strategies - called conservative, balanced, dynamic, and growth - see Table 2.4.
Strategy                                Conservative  Balanced  Dynamic  Growth
Equity                                  25%           50%       75%      100%
  CH                                    10%           20%       30%      40%
  Rest of world total (six countries)*  15%           30%       45%      60%
  Rest of world per country             2.5%          5%        7.5%     10%
Bonds                                   75%           50%       25%      0%
  CH                                    66%           44%       22%      0%
  Rest of world total (six countries)*  9%            6%        3%       0%
  Rest of world per country             1.5%          1%        0.5%     0%
Table 2.4: Investment weights in four investment strategies (data from Kunz [2014]). *Investment in G, F, I, J, USA, UK.
Using data from Figure 2.16 for the different asset classes, we get the returns in Table 2.5.
Table 2.5: Average annual return, risk, and wealth growth for the four investment strategies.
Figure 2.17 shows that a combination of the risk and return figures of basic asset classes can lead to a portfolio from which more return can be expected for the same risk, or less risk for the same return. The green marks for the investment strategies form a virtual boundary line. In fact, the Markowitz model is an example where there is an efficient frontier such that, within this model approach, no portfolio construction can offer more return and lower risk than any portfolio on the efficient frontier.
Figure 2.17: Distribution of return and risk, measured by the standard deviation, over
88 years of investments. The square marks represent equity, the diamonds bonds, the
triangle is cash, and the circle is gold. The dots represent the four investment strategies
- conservative, balanced, dynamic, and growth (data from Kunz [2014]).
Consider the first question. Often employees own many stocks of their employer directly or indirectly in their pension scheme. Such stock concentration can be disastrous. Enron employees, for example, had over 60% of their retirement assets in company stock. They faced heavy losses when Enron went bankrupt. Diversification reduces such idiosyncratic risk.
For two assets with weights $\varphi_1, \varphi_2$ and volatilities $\sigma_1, \sigma_2$, portfolio variance reads
$$\sigma_p^2 = \varphi_1^2\sigma_1^2 + \varphi_2^2\sigma_2^2 + 2\varphi_1\varphi_2\,\rho\,\sigma_1\sigma_2$$
with $\rho$ the correlation between the two assets. Portfolio risk becomes additive only if the assets are not correlated. A negative correlation value reduces portfolio risk, which motivates the search for negatively correlated risks. If correlation is $-1$, the portfolio variance becomes a complete square and can be eliminated completely in the two risky asset case by solving $\sigma_p^2 = 0$ w.r.t. the strategy. If correlation is $+1$, which is typical for many asset classes when markets are under stress, portfolio risk is maximal.
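A minimal sketch of this two-asset logic, with illustrative volatilities: for $\rho = -1$ the variance is a complete square, and the weight $\varphi_1 = \sigma_2/(\sigma_1 + \sigma_2)$ eliminates portfolio risk entirely.

```python
import numpy as np

# Two-asset portfolio volatility as a function of the correlation rho.
# The volatilities are assumptions for illustration.
s1, s2 = 0.20, 0.30

def portfolio_vol(w1, rho):
    w2 = 1.0 - w1
    var = w1**2 * s1**2 + w2**2 * s2**2 + 2.0 * w1 * w2 * rho * s1 * s2
    return np.sqrt(var)

w1_star = s2 / (s1 + s2)   # risk-eliminating weight when rho = -1
for rho in (-1.0, 0.0, 1.0):
    print(f"rho = {rho:+.0f}: vol at w1 = {w1_star:.2f} is {portfolio_vol(w1_star, rho):.4f}")
# Output: zero volatility for rho = -1, maximal volatility for rho = +1.
```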
Figure 2.18: Pair-wise correlations over time for different asset classes (Goldman Sachs [2011]).
Elton and Gruber (1977) show that the individual risk of stocks could be reduced from 49 percent to 20 percent by considering 20 stocks per market. Adding another 980 stocks only reduces risk further to 19.2 percent. The effect of adding more and more assets has a diminishing impact on risk diversification.
Proposition 4. Assume N uncorrelated asset returns and equally weighted (EW) investment, that is, $\varphi_k = 1/N$ for all assets. Increasing the number of assets N reduces portfolio risk $\sigma_p^2$ arbitrarily and monotonically.
The EW assumption is not necessary but facilitates the proof. To eliminate portfolio risk completely in a portfolio with uncorrelated returns, one only has to increase the number of assets in the portfolio. The proof reads:
$$\sigma_p^2 = \mathrm{var}\Big(\sum_{j=1}^{N} \frac{1}{N} R_j\Big) = \frac{1}{N^2}\sum_{j=1}^{N} \mathrm{var}(R_j) \le \frac{Nc}{N^2} = \frac{c}{N}$$
with c the largest variance. If assets are correlated to each other, which removes an unrealistic assumption in the last proposition, then portfolio risk can no longer be diversified away completely. The proof is only slightly more complicated than the former one and leads to
$$\sigma_p^2 = \frac{\overline{\mathrm{var}}}{N} + \Big(1 - \frac{1}{N}\Big)\overline{\mathrm{cov}}\,,$$
with $\overline{\mathrm{var}}$ the average variance and $\overline{\mathrm{cov}}$ the average covariance of the asset returns.
Hence, covariances prove more important than single asset variances in determining the portfolio variance. Taking the derivative of the portfolio variance w.r.t. the number of assets N, the sensitivity becomes proportional to $-1/N^2$. Adding a further asset to N = 4 assets reduces portfolio risk by 1/25; adding another asset to N = 9 assets, the reduction is only 1/100. Therefore, reducing portfolio risk by adding new assets becomes less and less effective the larger the portfolio is.
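The decomposition $\sigma_p^2 = \overline{\mathrm{var}}/N + (1 - 1/N)\,\overline{\mathrm{cov}}$ can be visualized with a short sketch; the average variance and covariance are assumed values.

```python
# Sketch of the diminishing diversification effect for an EW portfolio:
# sigma_p^2 = avg_var / N + (1 - 1/N) * avg_cov (assumed averages).
avg_var, avg_cov = 0.04, 0.01

for n in (1, 4, 10, 100, 1000):
    var_p = avg_var / n + (1.0 - 1.0 / n) * avg_cov
    print(f"N = {n:5d}: portfolio variance = {var_p:.4f}")
# The variance converges to avg_cov = 0.01: the systematic part of
# portfolio risk that cannot be diversified away.
```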
We now show that the two-asset intuition does not carry over to three or more assets. Consider an investor who wants to increase the return on investment by selling volatility and correlation of two stocks S1 and S2. He sells the risk that any of the two stocks breaches a barrier level in a specified time period. The price for this sold volatility and correlation risk is transformed into a fixed coupon which the investor receives. The sold option is a down-and-in put option since the barrier level is typically lower than the strike of the option, and the option has a value different from zero only if the barrier is breached ('in'). Barrier reverse convertibles are a wrapper for such a payoff. An investor gets at maturity his invested amount plus the coupon if there was no breach, or the coupon plus the lowest stock value at maturity in case of a breach. The higher the probability of a breach, the higher the coupon to the investor.
Suppose that both stocks can move up and down with the same probability. If the correlation is +1, the chance of a barrier breach is 50% - either both move up, or both move down and breach. If it is −1, the probability is 1, since one stock has to go down and breach the barrier. If they are uncorrelated, the probability is 75%, since of the four possible states only the state with both stocks up avoids a breach. Hence, for two assets, the more negatively correlated the assets are, the higher the risk of breaching the barrier and therefore the higher the coupon.
Consider the same investment with 3 stocks. The intuition of the two-asset case does not generalize. Given 3 assets there are 3 pairwise correlations. That all three correlations equal −1, which would lead to the highest coupon, is not possible. If two correlations are −1, then the third one has to be +1. This shows that the 2-asset logic does not extend to the three-asset case.
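This feasibility constraint is equivalent to the requirement that a correlation matrix be positive semi-definite. A small sketch checks which three-asset correlation patterns are admissible:

```python
import numpy as np

# Check whether a 3x3 correlation matrix is positive semi-definite (PSD),
# i.e. whether the pairwise correlation pattern is feasible at all.
def is_feasible(rho12, rho13, rho23):
    m = np.array([[1.0, rho12, rho13],
                  [rho12, 1.0, rho23],
                  [rho13, rho23, 1.0]])
    return bool(np.all(np.linalg.eigvalsh(m) >= -1e-12))

print(is_feasible(-1.0, -1.0, -1.0))  # False: all three at -1 is impossible
print(is_feasible(-1.0, -1.0, +1.0))  # True: two at -1 force the third to +1
```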
A portfolio context is used since building a risk model on, say, 10'000 individual assets would mean considering 10'000 models. Therefore, a risk model is built for all assets.
Traditionally, risk is defined as the variance of returns. Most risk models in asset management are based on linear multi-factor return models. These models are simple, clear and tractable. The hope is to capture the dependency structure between the many assets by considering a much smaller number of factors. Factors should be independent of one another. If we have N assets, the dimension of the covariance matrix N(N − 1)/2 is reduced to K + N(K + 2) if there are K factors. Formally, for asset i out of N assets, a generic linear model reads
$$R_i = \alpha_i + \sum_{k=1}^{K} \beta_{ik} F_k + \epsilon_i \qquad (2.2)$$
with $D^2$ the diagonal idiosyncratic covariance matrix with the variances of the idiosyncratic risks $\epsilon$ as entries; for normalized factors, the factor covariance matrix $C_F$ equals the identity matrix I. The (N × K) matrix $\beta$ is the loadings matrix. The dynamics (2.2) implies for the asset covariance matrix
$$C = \beta C_F \beta' + D^2\,.$$
Consider four assets whose correlation matrix indicates that the first and second assets, as well as the third and fourth assets, are driven by the same risk factor, while the other correlations are of the same order of magnitude. Instead of considering (4 × 3)/2 = 6 correlations, one would start with a two-factor model.
The linear factor model for the assets transforms into the same functional form for portfolios. Let $\varphi_j$ be the portfolio weights (long or short) which add up to 1. Portfolio risk reads
$$\sigma_p^2 = \langle \varphi, C\varphi\rangle = \langle \varphi, (\beta C_F \beta' + D^2)\,\varphi\rangle\,.$$
Therefore, a risk model specification means to fix the factor covariance matrix $C_F$, the factor exposures $\beta$, and the residual risks $D^2$.
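A minimal sketch of such a risk model specification, with illustrative loadings, factor covariance, and residual risks:

```python
import numpy as np

# Two-factor linear risk model: C = beta @ C_F @ beta.T + D2.
# All numbers are illustrative assumptions.
beta = np.array([[1.0, 0.0],
                 [0.9, 0.1],
                 [0.1, 1.0],
                 [0.0, 1.1]])                 # (N x K) loadings matrix
C_F = np.array([[0.040, 0.004],
                [0.004, 0.020]])              # (K x K) factor covariance
D2 = np.diag([0.010, 0.012, 0.008, 0.015])    # diagonal idiosyncratic variances

C = beta @ C_F @ beta.T + D2                  # implied (N x N) asset covariance

phi = np.array([0.25, 0.25, 0.25, 0.25])      # portfolio weights
print(f"portfolio volatility: {np.sqrt(phi @ C @ phi):.2%}")
```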
• First, factors or betas are not specified. Then a statistical factor model is used, such as an Asset Pricing Theory (APT) model. A model provider is Sungard. Principal Component Analysis (PCA) is used for the estimation. Statistical factor models are the best in-sample performing ones by construction. The resulting factors are difficult to interpret and they can vary strongly. The models are not meaningful in wealth management, where portfolio risk has to be explained, but they are used in trading thanks to their precision for short time horizons, which circumvents the instability problem.
• Second, factors are dened and betas are estimated by a time-series regression.
This set-up is used by UBS, Blackrock, swissQuant, Quantec, R-Squared.
• Third, betas are defined and factors are estimated using a cross-sectional regression. Providers of this model are Barra, Axioma, Bloomberg.
The second and third models both lose information, i.e. estimation error enters the risk model either in the stock betas or in the factor returns and covariances. In the second method, the estimation errors in the betas are diversified away on the portfolio level if N is large. This is not true for the third model: estimation risk on the portfolio and individual asset level are the same. Both methods assume that the variables in the estimation are observable.
The time-series model (type 2) can only be used when the stock betas are stable over time. But style investing (factor investing) assumes that betas are not stable. In risk models where style factors enter, a hybrid approach is necessary - one part for the stable model (second model) and one part for the style part (third model).
In crisis periods, correlations increase towards one, which means that assets move in the same direction; we summarize this as: diversification disappears. This observation holds for individual stocks, country equity markets, global equity industries, hedge funds, currencies, and international bond markets. Basically, correlation seems to align in the left tail of the risk distribution over all assets. Hence, using full-sample correlations does not account for this tail behaviour and is misleading. Prudent investors therefore use additional risk figures such as downside risk measures and scenario analyses. Chua et al. (2009) documented significant undesirable correlation asymmetries for a broad range of asset classes: correlations increase on the downside and significantly decrease on the upside. This is exactly the opposite of what investors want: not all assets moving downwards in a crisis, but all moving upwards in a boom. If diversification fails in a crisis, diversified portfolios may have greater exposure to loss than more concentrated portfolios. Leibowitz and Bova (2009) showed that during the GFC a diversified portfolio underperformed a simple 60% US stocks/40% US bonds portfolio by 9 percentage points.
There are different ways of measuring correlation in the tails. Longin and Solnik (2001) and Chua et al. (2009) used double conditioning, i.e. they isolate months during which both assets moved (up or down) by at least a given percentage. Page and Panariello (2018) condition only on a single asset:
$$\rho(\theta) = \begin{cases} \rho(x, y \mid x > \theta), & \theta > 0 \\ \rho(x, y \mid x < \theta), & \theta < 0 \end{cases}$$
where x, y represent the two assets and θ is the return threshold which partitions the data. This anti-symmetric single-asset conditioning measures differences in tail correlations based on which market drove the selloff.
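A sketch of this conditional correlation on simulated data (bivariate normal with unconditional correlation 0.6; assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=10_000).T

def conditional_corr(x, y, theta):
    # Condition on the single asset x exceeding (or falling below) theta.
    mask = x > theta if theta > 0 else x < theta
    return np.corrcoef(x[mask], y[mask])[0, 1]

for theta in (-1.5, -0.5, 0.5, 1.5):
    print(f"theta = {theta:+.1f}: rho(theta) = {conditional_corr(x, y, theta):+.3f}")
# Under normality the conditional correlations drop symmetrically in both
# tails; empirical data instead show the asymmetry of Figure 2.19.
```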
Figure 2.19: US equity correlation with international equity using monthly data Jan
1970 to Jun 2017. Shown are conditional correlations by percentile based on US stock
returns between US stocks (MSCI US Total Return Index) and non-US stocks (MSCI
EAFE Total Return Index). The dotted line shows the correlation prole that we would
expect if both markets were normally distributed. Empirical conditional correlations are
adjusted by the data-augmentation methodology. Page and Panariello (2018).
The authors repeat the analysis for hedge fund strategies and for risk factors, all with similar results of asymmetry.
These facts have several implications. First, if a portfolio manager has a proven track record of forecasting market and asset movements within a certain confidence, then he should pick stocks on the upside and buy a protective put for the expected downside. These assumptions are in most cases not valid: either market movements come as a surprise or stock picking capabilities do not exist. Then one approach is, first, to take risk management seriously, i.e. to analyze the tail behaviour if markets boom or are under stress, and, second, to trade with discipline within the given risk governance framework.
The following diversification measures apply to long-only, non-leveraged portfolios. Tasche's diversification index is
$$TA(\varphi) = \frac{\sqrt{\langle \varphi, C\varphi\rangle}}{\langle \varphi, D\rangle}\,, \qquad (2.5)$$
where D is the vector of volatilities. The numerator is equal to the portfolio risk term in the Markowitz model (4.1). The most diversified portfolio (MDP) minimizes the diversification index of Tasche, see Choueifaty and Coignard (2008). The diversification ratio is defined by
$$DR(\varphi) = \frac{1}{TA(\varphi)}\,, \qquad (2.6)$$
i.e. the ratio of the weighted average of volatilities divided by the portfolio volatility. This ratio is at least one, and equal to one only if all wealth is invested in a single asset. Given a set of constraints M, the MDP is the portfolio which maximizes the diversification ratio under the set of constraints. If the expected returns of the assets are proportional to their volatilities, expected returns replace the numerator $\langle \varphi, D\rangle$ in DR. Then maximizing DR is the same as maximizing the Sharpe ratio of the portfolio, and the MDP is the tangency portfolio.
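A short sketch of the diversification ratio for an equally weighted and a single-asset portfolio (volatilities and a uniform correlation assumed for illustration):

```python
import numpy as np

vols = np.array([0.25, 0.22, 0.14, 0.30])   # assumed asset volatilities
rho = 0.5                                   # uniform correlation (assumed)
C = rho * np.outer(vols, vols)
np.fill_diagonal(C, vols**2)

def diversification_ratio(phi):
    # DR = weighted average volatility / portfolio volatility, see (2.6)
    return (phi @ vols) / np.sqrt(phi @ C @ phi)

print(f"EW portfolio: DR = {diversification_ratio(np.full(4, 0.25)):.3f}")             # > 1
print(f"single asset: DR = {diversification_ratio(np.array([1., 0., 0., 0.])):.3f}")   # = 1
```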
The Sharpe ratio is defined as
$$SR(R) = \frac{E(R)^+}{\sigma(R)} \ge 0 \qquad (2.7)$$
with R a general return (absolute, relative, net, gross) and $A^+ = \max(A, 0)$.
Often the Sharpe ratio is not constrained to be positive. But the ratio is not very meaningful for negative values, since for a fixed negative return, the higher the risk, the higher the Sharpe ratio. Assuming log-normal returns, the square-root scaling rule implies that the Sharpe ratio scales with $\sqrt{T}$ for an increasing time horizon, while the market price of risk (MPR) is time-scale invariant. While conceptually simple, there are many different interpretations and calculation methods for the Sharpe ratio: should one use linear or log returns, how does one scale the Sharpe ratio properly from one time horizon to another, and what are the industry standards in the calculation of the ratio? The widely used square-root scaling rule only holds in the IID case, see the Section Risk Scaling. For non-IID returns the situation is more complex and Lo (2003) is the reference to follow.
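A sketch of the square-root scaling rule for IID returns, with assumed monthly figures:

```python
import numpy as np

# Square-root time scaling of the Sharpe ratio under IID returns.
mu_monthly, sigma_monthly = 0.005, 0.04   # assumed monthly excess return and vol

sr_monthly = mu_monthly / sigma_monthly
# Means scale with T, volatility with sqrt(T), so the SR scales with sqrt(T):
sr_annual = sr_monthly * np.sqrt(12)
print(f"monthly SR: {sr_monthly:.3f}, annualized SR: {sr_annual:.3f}")
# For serially correlated returns this rule requires a correction; see Lo (2003).
```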
Weight concentration is maximal if all weights but one are zero. Risk concentration is minimal if the portfolio weights are equally weighted. The Herfindahl index, which is similar to the Gini index, is defined by
$$\text{Herfindahl Index} = \sum_{k=1}^{N} \varphi_k^2\,. \qquad (2.8)$$
It takes the value 1 in the case of maximum concentration and $1/N = N \cdot (1/N)^2$ in the EW portfolio case.
A further diversification measure is the entropy of the portfolio weights,
$$S(\varphi) = -\sum_{k=1}^{N} \varphi_k \ln \varphi_k\,. \qquad (2.9)$$
To understand entropy measurement, consider two dice - one symmetric and the other distorted. The outcome for the symmetric one is more uncertain than for the other die. Shannon axiomatized this notion of uncertainty in the 1940s in the context of information theory. He proved that the above function $S(\varphi)$ is the only one which satisfies his eight axioms describing uncertainty.
In finance, entropy measures how close different probability laws are to each other. The prior and the posterior distribution in the Black-Litterman model are an example. The space of probability laws is just a set and not a vector space. It is not trivial to find a reasonable measuring stick for the nearness of, say, two normal distributions, one with mean 0.1 and variance 0.2 and the other with mean 0.2 and variance 0.1. The relative entropy S(p, q), the Kullback-Leibler divergence (KLD), for two discrete distributions p and q, defined by
$$S(p, q) = -\sum_k p_k \ln\Big(\frac{q_k}{p_k}\Big)\,, \qquad (2.10)$$
measures the similarity of two probability distributions. In machine learning, KLD measures the information gain achieved if q is used instead of p. In Bayesian inference, KLD is a measure of the information gained by revising one's beliefs from the prior probability distribution q to the posterior p. It is the amount of information lost when a model distribution q is used to approximate the true data distribution p. Although KLD measures the nearness of two distributions, it is not a metric, since it is neither symmetric nor does it satisfy the triangle inequality.
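The three measures (2.8)-(2.10) in a compact sketch, with illustrative weight vectors:

```python
import numpy as np

def herfindahl(phi):
    return float(np.sum(phi**2))               # 1 for full concentration, 1/N for EW

def entropy(phi):
    return float(-np.sum(phi * np.log(phi)))   # maximal (ln N) for EW weights

def kld(p, q):
    return float(-np.sum(p * np.log(q / p)))   # relative entropy S(p, q), see (2.10)

ew = np.full(4, 0.25)
tilted = np.array([0.70, 0.10, 0.10, 0.10])
print(herfindahl(ew), herfindahl(tilted))   # 0.25 versus 0.52
print(entropy(ew), np.log(4))               # both equal ln(4) = 1.386
print(kld(tilted, ew), kld(ew, tilted))     # differ: the KLD is not symmetric
```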
Roncalli (2014) illustrates the different notions of diversification. There are 6 assets with volatilities 25%, 22%, 14%, 30%, 40%, and 30%, respectively, and the same returns.
Table 2.6: Comparison of the global minimum variance (GMV), equal risk contribution (ERC), most diversified (MDP), and equal weights (EW) portfolios. All values are percentages (Roncalli [2014]).
Since correlation is uniform except for one asset, it is 'overlooked' in the GMV allocation. Therefore, the GMV optimal portfolio picks asset 3 with the lowest volatility. The GMV portfolio is heavily concentrated. Portfolio risk measured by GMV is the smallest, which comes as no surprise.
The MDP, on the other hand, focuses on assets 5 and 6, which are the only ones that do not possess the same correlation structure as the others. Contrary to GMV, MDP is attracted by local differences in the correlation structure. The diversification index is lowest for the MDP. Considering the concentration measures of Herfindahl, the EW should be chosen if the investor wishes to have the broadest weight diversity, and the ERC if risk concentration is the appropriate diversification risk measurement for the investor.
We consider the ERC in more detail. The risk contribution of asset j to the portfolio risk is by definition the sensitivity of portfolio risk w.r.t. $\varphi_j$ times the weight $\varphi_j$. The Euler Allocation Principle states that the sum of all risk contributions equals portfolio risk:
$$R(\varphi) = \sum_j \varphi_j \frac{\partial R(\varphi)}{\partial \varphi_j} =: \sum_j RC_j(\varphi)\,. \qquad (2.12)$$
Calculating, say, the portfolio risk contributions for 1'000 positions in a portfolio directly is complicated. But using Euler's theorem, we need to calculate 1'000 sensitivities, multiply them by their positions, and sum the result, which is a much simpler task. For the volatility risk measure this means:
$$R(\varphi) = \sigma_p(\varphi) = \sum_j \varphi_j \frac{\partial R(\varphi)}{\partial \varphi_j} = \sum_j \varphi_j \frac{(C\varphi)_j}{\sqrt{\varphi' C \varphi}} \qquad (2.13)$$
where $(C\varphi)_j$ denotes the j-th component of the vector $C\varphi$. The Euler risk decomposition holds true for the volatility, VaR, and expected shortfall risk measures.
Consider four assets in a portfolio with equal weights of 25 percent. The volatilities are 30%, 20%, 40%, and 25%. The correlation structure is
$$\rho = \begin{pmatrix} 1 & & & \\ 0.8 & 1 & & \\ 0.7 & 0.9 & 1 & \\ 0.6 & 0.5 & 0.6 & 1 \end{pmatrix}.$$
The covariance matrix C is then calculated as (using the formula $C_{km} = \rho_{km}\sigma_k\sigma_m$)
$$C = \begin{pmatrix} 9\% & & & \\ 4.8\% & 4\% & & \\ 8.4\% & 7.2\% & 16\% & \\ 4.5\% & 2.5\% & 6\% & 6.25\% \end{pmatrix}.$$
The portfolio variance $\langle \varphi, C\varphi\rangle = 6.38\%$ follows. Taking the square root, the portfolio volatility of 25.25% follows. Using (2.13), the marginal risk contribution vector
$$\frac{C\varphi}{\sqrt{\varphi' C\varphi}} = \begin{pmatrix} 26.4\% \\ 18.3\% \\ 37.2\% \\ 19\% \end{pmatrix}$$
follows. Multiplying each component of this vector by the portfolio weight gives the risk contribution vector RC = (6.6%, 4.6%, 9.3%, 4.8%). Adding the components of this vector gives 25.25%, which is equal to the portfolio volatility. This verifies the Euler formula.
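The Euler decomposition of this example can be verified in a few lines (the numbers match the text up to rounding):

```python
import numpy as np

vols = np.array([0.30, 0.20, 0.40, 0.25])
rho = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.9, 0.5],
                [0.7, 0.9, 1.0, 0.6],
                [0.6, 0.5, 0.6, 1.0]])
C = rho * np.outer(vols, vols)       # C_km = rho_km * sigma_k * sigma_m
phi = np.full(4, 0.25)               # equal weights

sigma_p = np.sqrt(phi @ C @ phi)     # portfolio volatility, about 25.25%
marginal = C @ phi / sigma_p         # marginal risk contributions, see (2.13)
rc = phi * marginal                  # risk contributions per asset

print(f"portfolio volatility: {sigma_p:.2%}")
print("risk contributions:", np.round(rc, 4))   # ~ (6.6%, 4.6%, 9.3%, 4.8%)
print(f"sum of contributions: {rc.sum():.2%}")  # equals portfolio volatility
```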
Table 2.7: Asset class diversification and risk allocation. The first two columns contain the diversification using the asset class view. The third column shows the result using risk allocation. While the investment seems to be well diversified using the asset classes, the risk allocation view shows that almost 80% of the risk is due to equity. IEQ means international equities, ICB international corporate bonds, CCR corporate credit risk.
This fact is often encountered in practice: equity turns out to be the main risk factor in many portfolios. Capital diversification is then a poor concept from a risk perspective.
The asset allocation of European asset managers was in 2013 (EFAMA (2015)):
• 43% bonds;
• 33% equity;
• 16% other assets (property, private equity, structured products, hedge funds, other alternatives).
The allocation has been fairly stable in the past, except in the GFC where equities lost massive value. This average allocation differs significantly between countries. The UK, for example, had an equity allocation between 46% and 52% in the past, while in France the same class is around 20%. This difference is due to differences in the preferences of home-domiciled clients and the large differences in cross-border delegation of asset management. The ratio of AuM/GDP in the UK is 302%, which shows the importance of the UK as the leading asset management center of Europe with a strong client base outside of the UK. Comparing the allocation for investment funds and discretionary mandates, the bond allocation is 28% in investment funds and 58% in the mandates, and equities have a share of 39% in the funds and 26% in the mandates. Hence, self-deciders are less risk averse than those who delegate the investment decisions using mandates.
• Modelling volatility at a short horizon and then scaling to longer horizons can be inappropriate, since temporal aggregation should reduce volatility fluctuations, whereas scaling amplifies them.
• Returns in short-term financial models are often not predictable, but they can be predictable in longer-term models. Applying the scaling law connects the volatility in two time domains that are structurally different.
• The scaling rule does not apply if jumps occur in the returns.
• If returns are serially correlated, the square-root rule needs to be corrected (see Rab and Warnung [2011] and Diebold et al. [1997]).
Consider an investment example with the following assumptions:
• 25% of the return arises from dividends, which face a taxation rate of 30%;
• investments can be made via an investment fund (mutual fund, SICAV) with annual costs of 1.5 percent, or an index fund with annual costs of 0.5 percent.
The net returns using these figures are given in Table 2.8. Given these net returns, an investment of CHF 100 reaches after 25 years the values in Table 2.9.
Fact 8. Using a cost and tax efficient wrapper for an investment amounts to an annual return gain of 1.45% compared to an investment fund.
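A sketch of the arithmetic behind Fact 8; the 6% gross return is an assumption chosen to be consistent with the quoted 1.45% gain, and the efficient wrapper is assumed to avoid the dividend tax drag:

```python
# Net-return comparison behind Fact 8 (assumed 6% gross return).
gross_return = 0.06
dividend_tax = 0.25 * gross_return * 0.30   # 0.45% drag: 25% dividends taxed at 30%

net_fund = gross_return - dividend_tax - 0.015  # mutual fund with 1.5% annual costs
net_wrapper = gross_return - 0.005              # tax-efficient wrapper, 0.5% costs

print(f"net mutual fund return:       {net_fund:.2%}")                # 4.05%
print(f"net efficient wrapper return: {net_wrapper:.2%}")             # 5.50%
print(f"annual return gain:           {net_wrapper - net_fund:.2%}")  # 1.45%
```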
Given the zero-sum game of active investment (see the next section), the fact that only 0.6% of 2,076 actively managed US open-end, domestic equity mutual funds generate a positive alpha after costs (see Section 4.6.6.3), and the possibility to wrap many investment ideas in cheap index funds or ETFs, it becomes clear why many investors prefer passive investments.
Let λ be the fraction of wealth managed actively; the market return then decomposes as
$$R_m = \lambda R_{active} + (1 - \lambda) R_{passive}\,. \qquad (2.15)$$
Assuming that the return of the passive investment equals that of the market, (2.15) implies that the active return equals the market return independently of the fraction λ. Therefore, without any probabilistic or behavioural assumptions, before costs the three investments pay the same return:
Proposition 9 (Sharpe). Before costs, the return on the average actively managed dollar
will equal the return on the average passively managed dollar.
Because active managers bear greater costs than a passive investment:
Proposition 10 (Sharpe). After costs, the return on the average actively managed dollar
will be less than the return on the average passively managed dollar.
These statements are strong and they are based on strong assumptions. Despite its beauty, the assumptions that lead to (2.15) trivialize the problem. Suppose all investors are active ones - who is on the other side of the trades? Returns are not independent of the demand and supply side but in fact arise in market equilibrium. Demand and supply matter. Pedersen (2018) extended the Sharpe arithmetic to cases where active management can on average be more profitable than passive management in an equilibrium context. He replaced the unrealistic assumption that an active investor's gain is the loss of another active investor, which leads in the aggregate to a zero-sum game. Next, the market portfolio is not constant. It changes over time since new shares are issued and corporate actions happen: passive investors also need to trade regularly. If they have to trade at less favourable prices than the active investors do, then the logic of Sharpe is broken.
Roll pointed out that a true market portfolio is not observable since it would include every single asset. Market-weighted indices are used as an approximation. In the US, the Wilshire 4'500 Index contains 4'500 stocks of approximately 5'000 listed stocks. In Switzerland, the SPI Index contains 210 of 270 listed stocks. The global market portfolios also differ significantly depending on who is calculating them. The major contributors are debt and equity, where equity is split into global equity, EMMA equity, private equity, and small cap equity, and debt is split into government bonds, agency bonds, asset backed securities, EMMA bonds, and corporate bonds. The assumption that passive investment means being invested in the market portfolio is an approximation. Consider funds. US retail funds are different from US institutional funds and are also different from non-US funds. The one-fits-all argument of Sharpe does not consider the heterogeneity of investment wrappers across different asset classes, geographical regions, and client segmentations. Finally, the result is based on average active managers. It does not account for the heterogeneity of skill among individual managers.
The goal of active asset management is to outperform benchmarks. The manager tries to beat the benchmark within a given Tracking Error (TE) limit. In fact, proponents of active investment use argumentation following the work of Berk and Green (2004), who show that efficient markets do not contradict the existence of skilled fund managers who beat the market consistently. The concept of benchmarking, and hence relative performance, has several advantages for the portfolio manager: performance measurement is simple relative to the benchmark, benchmarking has a disciplining force acting on the asset manager, and the structuring of the investment portfolio is simplified.
Active management often has both a passive component, the long-term goals in a
benchmark portfolio, and an active component, playing the views to exploit market
opportunities (TAA). The passive portfolio stabilizes the whole investment.
ETFs, trackers and index funds are examples of passive strategies. Mutual funds, opportunistic use of derivatives, and hedge funds are examples of active strategies. While the deviation of a strategy from a benchmark, the tracking error, should be as small as possible in passive investment, the tracking error in active investment describes how far the active manager moves away from the benchmark.
Different types of benchmarks are used: either the benchmark is used to compare the performance of a fund with its peers, or the benchmark is a market index. While both methods are meaningful for active investment, in a passive investment only index benchmarking makes sense.
The main stock benchmark indices are the MSCI World Index, FTSE, S&P 500 and some other well-known stock market indices. Since bond securities do not trade on open exchanges, there is less transparency about bond prices, and the indices used for benchmarking are those created by the largest bond dealers, such as the Barclays Global Aggregate Bond Index, which tracks the largest bond issuers globally. Benchmark indices for commodities are for example provided by S&P and Goldman Sachs (S&P GSCI) or by Bloomberg (Bloomberg Commodity Index). For credit risk, the Markit iTraxx indices reflect the creditworthiness of large corporates. A provider of real estate indices is MSCI. There are four different types of income-producing real estate assets: offices, retail, industrial, and leased residential. Non-income producing assets are houses, vacation properties, or vacant commercial buildings. These different types of real estate assets lead, together with the geographical segmentation, to many different real estate indices.
Trading units and asset management firms are the suppliers of assets for investment. Mutual funds or ETFs are often offered by non-banking firms such as BlackRock. These firms issue products but also provide other services.3
The largest asset management organizations in 2017 were BlackRock with USD 6.3 trillion AuM, followed by the Vanguard Group.4 The largest fund in 2014 was the SPDR ETF on the S&P 500 managed by State Street Global Advisors with assets of USD 224 bn; see the Appendix.
The AM firms also contribute to the real economy. Firms, banks and governments use AM firms to meet their short-term funding needs and the long-term capital requirements.
3 BlackRock Solutions - the risk management division of BlackRock - was mandated by the US Treasury Department to manage the mortgage assets owned by Bear Stearns, Freddie Mac, Morgan Stanley, and other financial firms that were affected by the financial crisis in 2008. The expertise gained boosted BlackRock Solutions to become more important than the asset management part of the firm.
4 The Vanguard Group 5.1 tr USD, Charles Schwab 3.4 tr USD, UBS 3.1 tr USD, State Street 2.8 tr USD.
The AM contribution to debt financing is 23%: European asset managers held this amount of all debt securities outstanding, which also represents 33% of the value of euro-bank lending. The equity financing figures are similar. The AM industry held 29% of the market value of euro area listed firms and 42% of the free float.
From a corporate finance perspective, the valuation and market capitalization of asset management firms compared to banks and insurance companies between 2002 and 2015 is shown in Table 2.10 (McKinsey (2015)):
Table 2.10: Key figures 2015 for asset management firms, banks and insurance companies (McKinsey [2015]).
Total Assets under Management (AuM) in Europe increased by 10% in 2017 to EUR 25.2 trillion. Comparing the growth of investment funds versus discretionary mandates in Europe, both categories have increased to a similar level of EUR 13.1 (2014: 9.1) trillion in investment funds and EUR 12 (2014: 9.9) trillion in discretionary mandates (EFAMA (2018) and (2015)). The share of investment funds compared to the mandates was falling from 2007 until 2011, but it then started to increase in the last three years. While mandates represented more than 70% of the AuM in the UK, the Netherlands, Italy, and Portugal, more than 70% of all AuM in Germany, Turkey, and Romania were invested in investment funds. The dominance of either type of investment can have different causes. In the UK and the Netherlands, pension funds play an important role in asset management and they prefer to delegate the investment decisions. The pool of professionally managed assets in Europe remains centered in the UK (37% market share), France (20%), Germany (10%), Italy, the Nordic countries, and Switzerland.
The number of individuals directly employed in the industry (asset managers, analysts) is estimated in 2017 at 110'000 (2013: 90'000), with one-third in the UK. Indirect employment such as IT, marketing, legal, compliance, and administration is estimated to boost the total number of employees in the whole industry to half a million individuals.
• Per annum, global AuM growth is 5%. The main driver was market performance. Typically, the net AuM flows are between 0% and 2% per annum.
• The growth of AuM is 13.1% in Europe, 13.5% in North America and 226% in emerging markets, which is largely due to the money market boom in China.
• The absolute value of profits increased in Europe by 5%, by 29% in North America, and by 79% in the emerging markets.
• Profit margins, as the difference between net revenue margin and operating cost margin, are 13.3 bps in Europe, 12.5 bps in North America, and 20.6 bps in emerging markets. The observed revenue decline in Europe is due to the shift from active to passive investments, the shift to institutional clients, and the decrease in management fees. The revenue margin in the emerging markets is only slightly lower in 2014 compared to 2007 (down to 68.1 bps from 70.6 bps), but the increase in operating cost margin from 33.8 bps to 47.4 bps in 2014 is significant.
• The absolute revenues in some emerging markets such as China, South Korea, and Taiwan range between USD 3.7 bn and USD 10.1 bn. They are almost at par with the revenues in Japan, Germany, France, and Canada (all around USD 10 bn). The revenue pools of the UK (USD 21.2 bn) and the US (USD 150.8 bn) are still leading the global league table.
• The cost margins in Europe are stable between 21 bps and 23 bps. The cost margin splits into sales and marketing (around 5 bps), fund management (around 8 bps), middle/back office (around 3.5 bps), and IT/support (around 6 bps). There is an increasing cost trend for IT/support, and decreasing costs for sales and marketing and middle/back office.
• In emerging markets, the CAGR for institutional customers is 13% compared to 11% for retirement/DC.
By considering the above facts, one should take into account the particular circumstances in the years after the GFC, such as the decreasing interest rate level and the stock market boom, which were the main factors in the success of the asset management industry in this period.
Table 2.11 illustrates the global distribution of AuM by product and its dynamics in
the last decade.
Table 2.11: Global distribution of AuM by product and its dynamics in the last decade in trillion USD. Alternatives include hedge, private-equity, real-estate, infrastructure, and commodity funds. Active solutions include equity specialties (foreign, global, emerging markets, small and mid caps, and sector) and fixed-income specialties (credit, emerging markets, global, high yield, and convertibles). LDIs (liability-driven investments) include absolute-return, target-date, global-asset-allocation, flexible, income, and volatility funds. Active core includes active domestic large-cap equity, active government fixed-income, money market, and traditional balanced and structured products (Valores Capital Partners [2014]). The figure for 2016 and the projection are from PwC (2018).
The table indicates that the growth rate of passive investments is larger than for active solutions. McKinsey (2015) states for the period 2008-2014 that cumulated flows are 36% for passive fixed income and 22% for passive equity. Standard active management is decreasing for some asset classes and strategies: active equity strategies lost 20% on a cumulated flow basis, while active fixed income gained 52%. A further observation is that active management of less liquid asset classes, or with more complex strategies, is increasing: an increase of 49% cumulated flows for active balanced multi asset and of 23% for alternatives. The global figures vary strongly across regions and countries. Swiss and British customers adopted the use of passive investments much faster than, for example, Spanish, French, or Italian investors. Figure 2.20 shows the distribution of global investable assets by region and by type of investor.
Regulation imposes a great deal of complexity on the whole business of asset management and banking. On the other side of the fence, there is a so-called shadow banking sector with much less regulatory oversight. Although the expression 'shadow bank' makes no sense at all - either an institution has a banking license or not - there is an incentive for banks to consider outsourcing their asset management units to this 'shadow banking' sector.
Figure 2.20: Global investable assets by region in trillions of USD (Brown Brothers Harriman [2013]).
Forward-looking estimates by PwC (2014, 2018) for the period 2014-2020 suggest that actively managed funds will grow at a CAGR of 5.4 percent and mandates at 5.7 percent (PwC [2014]). The growth driver for actively managed funds is the growing global middle-class client base. Mandate growth factors are institutional investors (pension funds and SWFs) and HNWIs, see Table 2.12. Furthermore, the ratio active:passive was 7:1 in 2012 and is estimated to fall to 3:1 by 2020. By the end of 2014, the AuM in actively managed funds were distributed as follows: 60% in the Americas, 32% in Europe, and 12% in Asia. Compared to 2010, there is a relative stagnation or decrease in Europe.
Table 2.12: Actively managed funds, mandates, and alternative investment (PwC [2014]).
The formation of four regional blocs in AM - South Asia, North Asia, Latin America, and Europe - creates opportunities, costs, and risks. These blocs develop regulatory and trade linkages with each other based on reciprocity - AM firms can distribute their products in other blocs. The US, given the actual trends, will stay apart since it prefers to adhere to its regulatory model. But integration will increase not only between these blocs but also within blocs. There will be, for example, a strong regulatory integration inside the South Asia bloc. The ASEAN platform between Singapore, Thailand, and Malaysia will be extended to include Indonesia, the Philippines, and Vietnam. All these countries possess a large, wealthy middle class of potential AM service investors. The global fund structure UCITS continues to gain attraction worldwide, and reciprocity between emerging markets and Europe will be based on the European AIFMD model for alternative funds. By 2013, more than 70 memoranda of understanding for AIFMD had been signed.
The traditional AM hubs London, New York, and Frankfurt will continue to dominate the AM industry. But new centers will emerge due to the global shift in asset holdings. There will be a balance between global and local platforms. Whether a global or a local platform is pushed depends on many factors: time-to-market, regulatory and tax complexity, behavior and social norms in the jurisdiction, and the education level all matter. AM firms recruit local teams in the key emerging markets - the people factor. The education of these local individuals started originally in the global centers but will diffuse more and more to the new centers in the emerging markets. Due to the positive brand identities that tech firms have, they can integrate part of the business layer into their infrastructure layer and offer AM services under tech firm brands instead of more traditional banking or AM company brands ('branding reversal'). Finally, alternative asset managers on the one hand offer new products - asset managers move into the space banks left vacant - and on the other hand try to make their alternative funds mainstream. New products include primary lending, secondary debt market trading, primary securitizations, and off-balance-sheet financing.
• Agency business model. Asset managers are not the asset owners; they act on a best-effort basis for their clients and the performance is attributed to their clients.
• Low balance sheet risk. Since asset managers do not provide loans, do not act as counterparties in derivatives, financing, or securities transactions, and seldom borrow money (leverage), their balance sheet does not face the risks of a bank's balance sheet.
• Protection of client assets. Asset managers are regulated, and in mandated asset management the client assets are held separately from the asset management firm's assets.
From a risk perspective, asset management is a fee business with conduct, business, and operational risk as the main risk sources.
Trading, by contrast, is a market, counterparty, and liquidity risk business which needs a strong balance sheet of the intermediary. Trading is a mixture of a fee business (agency trading) and a risk-taking business (principal and proprietary trading). Agency trading is a fee business based on client flow. Clients place their orders and the trading unit executes the orders on behalf of the client's account. For example, a stock order is routed by the trader to the stock exchange where the trade is matched. The bank receives a fee for this service. Principal trading already requires active market risk or counterparty risk taking by the bank, since the bank's balance sheet is affected by the profits and losses from trading. Principal trading is still based on clients' orders, but it requires the traders to take some trading positions in their market-making function or in order to meet future liabilities in issued structured products. This is a key difference to agency trading. Proprietary trading is not based on client flow at all. Proprietary traders implement trading ideas without any reference to a client activity. This type of trading puts the bank's capital at risk. New regulations limit proprietary trading by investment banks, such as the Volcker Rule in the US and 'ring-fencing' in the UK.
AM firms wrap the underlying assets into collective investment schemes ('funds'), while the trading unit of a bank offers issuance and market making for cash products, derivatives, and structured products. Despite their differences, trading and asset management are linked. Portfolio managers in the asset management function execute their trades via the trading unit or a broker. The market making of ETFs and listed fund trading takes place in the trading unit. Cash products are used by the asset management function in the construction of collective schemes, and asset managers use derivative overlays in their portfolios to manage risk and return characteristics.
The size of investments is very large for institutional asset management (IAM) and smaller for wealth management (WM). The risk management for IAM is comprehensive and of the same quality as that used by, say, banks for their own purposes. In WM, risk management is often less sophisticated. Fees are typically lower for IAM than for WM. While IAM is highly regulated, the regulation of WM was much weaker in the past. This changed after the GFC, where MiFID II, Know-Your-Client, product information sheets, etc. heavily increased the WM regulatory setup. Finally, the loyalty of IAM clients is decreasing while WM clients are more loyal. It will be interesting to observe how the loyalty of WM clients changes if technology makes investments not only more tailor-made but also more open-platform oriented and therefore less strongly linked to the home institution of the WM clients.
The 1920s saw the creation in Boston of the first open-end mutual fund - the Massachusetts Investors' Trust. By 1951 more than 100 mutual funds existed and 150 more were added in the following twenty years. The challenging 1970s - oil crisis - were marked by a number of innovations. Wells Fargo offered a privately placed, equally weighted S&P 500 index fund in 1971. This fund was unsuccessful, and Wells created a successful value-weighted fund in 1973. It required huge efforts - tax and regulatory compliance, building up stable operations, and educating potential investors. Bruce Bent established the first money market fund in the US in 1971, so that investors had access to high money market yields in a period when bank interest rates were regulated. In 1975, John Bogle created a mutual fund firm - Vanguard. It launched the first retail index fund based on the S&P 500 Index in 1976. In 1993, Nathan Most developed an ETF based on the S&P 500 Index. The following table summarizes the worldwide market figures of investment funds without funds of funds. The fund industry is not free of scandals.
Table 2.13: MM means Money Markets, GP Guaranteed and Protection, RE Real Estate, and IF Investment Funds. Data are end of Q4 2018. Source: EFAMA, Investment Company Institute (ICI), International Investment Funds Association (IIFA). Statistics from 47 countries are included in this report.
In 2003, for example, illegal late trading and market timing practices were uncovered in hedge fund and mutual fund companies. Late trading means that trading is executed after the exchanges are closed. Traders could buy mutual funds when markets were up at the previous day's lower closing price and sell at the purchase date's closing price for a guaranteed profit.
There are different types of funds: mutual funds, index funds, ETFs, hedge funds, and alternative investments. We note some broad characteristics:
• Index funds seek to match the fund's performance to a specific market index, such as the S&P 500, before fees and expenses.
• Mutual funds are actively managed and try to outperform market indexes. They
are bought and sold at the current day's closing price - the NAV (net asset value).
• ETFs are traded in real time at the current market price and may cost more or less than their NAV.
NAV is a company's total assets minus its total liabilities. If an investment company's assets are worth USD 100 and it has liabilities of USD 10, the company's NAV is USD 90. Since assets and liabilities change daily, NAV also changes daily. Mutual funds generally must calculate their NAV at least once every business day. An investment company calculates the NAV of a single share by dividing its NAV by the number of outstanding shares.5
5 We assume that at the close of trading a mutual fund held USD 10.5 mn in securities, USD 2 mn of cash, and USD 0.5 mn of liabilities. With 1 million shares outstanding, the NAV is USD 12 per share.
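A minimal Python sketch of this per-share arithmetic, using the footnote's figures (the variable names are illustrative):

securities = 10.5e6    # market value of the securities, USD
cash = 2.0e6           # cash holdings, USD
liabilities = 0.5e6    # liabilities, USD
shares_outstanding = 1.0e6

nav_total = securities + cash - liabilities     # fund-level NAV: USD 12 mn
nav_per_share = nav_total / shares_outstanding  # USD 12 per share
print(nav_per_share)   # 12.0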
Funds can be open- or closed-end. Open-end funds must buy back fund shares at the end of every business day at the NAV; see Table 2.14. Prices of shares traded during the day are expressed in NAV. Total investment varies based on share purchases, share redemptions, and fluctuations in market valuation. There is no limit on the number of shares that can be issued. Closed-end funds issue shares only once. The shares are listed and traded on a stock exchange: an investor cannot give back his or her shares to the fund but must sell them to another investor in the market. The prices of traded shares can differ from the NAV - either higher (premium case) or lower (discount case). The vast majority of funds are of the open-end style.
The legal environment is crucial for the development of the fund industry. About three-quarters of all cross-border funds in Europe are sold in Luxembourg. Luxembourg offers favorable framework conditions for holdings/holding companies, investment funds, and asset-management companies. These companies are partially or completely tax-exempt; typically, profits can be distributed tax free. For private equity funds, two-thirds have the US state of Delaware as their domicile. For hedge funds, one-third are in the Caymans and one-quarter in Delaware. As of Q3 2013, 48 percent of mutual funds had their domicile in the US, 9 percent in Luxembourg, and around 6 percent each in Brazil, France, and Australia.
Definition 12. A mutual fund is a company that pools money from many investors and invests the money in stocks, bonds, short-term money-market instruments, other securities or assets, or some combination of these investments. The combined holdings the mutual fund owns are its portfolio. Each share represents an investor's proportionate ownership of the fund's holdings and the income those holdings generate.
In Europe, mutual funds are regulated under the UCITS regime and mutual fund equivalents are called SICAVs. When we refer below to mutual funds, we always have US mutual funds in mind. Some characteristics of mutual funds are that investors purchase mutual fund shares from the fund and not via a stock exchange, that investors can sell their shares at any time, that they pay the NAV plus any shareholder fees for mutual fund shares, that mutual funds create and sell new shares if there is new demand, and finally, that the investment portfolios are managed by separate entities (investment advisers) registered with the SEC. Mutual funds are non-listed public companies that neither pay taxes nor have employees.
• Affordability. The basic unit of a fund requires only little money from the investors and gives access to assets.
• Liquidity. Mutual fund investors can redeem their shares at any time at the current NAV plus any fees and charges assessed on redemption.
• Investment strategy. The investor can choose between active and passive investment, can have access to rule-based strategies, etc. But he cannot choose a guaranteed payoff as for structured products. Hence, investors in funds believe that the fund manager's skills generate the performance.
• Price uncertainty. Pricing follows the NAV methodology, which the fund might
calculate hours after the placement of an order.
The Investment Company Institute and US Census Bureau (2015) state that a total of 43.3% of US households, with a median income of USD 85,000, own mutual funds. The median mutual fund holding is USD 103,000 and the median of household financial assets is USD 200,000. 86% own equity funds, 33% hybrids, 45% bond funds, and 55% money-market funds. Only 36% invest in global or international equity funds. The primary financial goal (74%) for mutual fund investment is retirement.
Cross-border distribution has been most successful within the European UCITS format. This is not only true for Europe: UCITS dominate global fund distribution in more than 50 local markets (Europe, Asia, the Middle East, and Latin America). This kind of global fund distribution is the preferred business model in terms of economies of scale and competitiveness. In 2016, around 80,000 registrations for cross-border UCITS funds existed. The average fund is registered in eight countries. Furthermore, UCITS are not required to distribute all income annually.
UCITS do not need to accept redemptions more than twice a month. Although the two previous points hold in general, many funds offer - for example - the option to distribute income annually or make redemptions possible on a daily basis. UCITS sponsors must comply with the EU guidelines on compensation for key personnel: the remuneration directive.
Both UCITS funds and mutual funds were originally quite restrictive in their investment guidelines. Then UCITS (similar remarks apply to mutual funds) were allowed to use derivatives extensively. Using derivatives means, among other things, leveraging portfolios or creating synthetic short positions - UCITS are not allowed to sell physical assets short. The strategies of these funds - referred to as 'newCITS' - are similar to hedge fund strategies, and they showed strong growth to USD 294 billion in 2013 according to Strategic Insight (2013).
But there are also differences between US mutual funds and European UCITS on a more fundamental level. US clients invest in existing funds while European investors are regularly offered new funds. That is, the number of US mutual funds has been decreasing in the last decade while European funds have shown a strong increase in numbers; see Table 2.15. The stability of the US fund industry is due to the influence of US retirement plans (defined contribution), which do not change investment options often. The tendency to innovate permanently in Europe leads to funds which are on average around six times smaller than their US counterparts.
                              2003      2013
US
  Number of funds            8,125     7,707
  Total assets (USD tr)        7.4      15.0
  Assets per fund (USD mn)     911     1,949
Europe
  Number of funds           28,541    34,743
  Total assets (USD tr)        4.7       9.4
  Assets per fund (USD mn)     164       270
Asia
  Number of funds           11,641    18,375
  Total assets (USD tr)        1.4       3.4
  Assets per fund (USD mn)     116       183

Table 2.15: Number of funds, average fund size, and assets by region (Investment Company Institute [2010, 2014] and Pozen and Hamacher [2015]).
While the NAV is theoretically simple, the process of implementing the calculation is not, since one has to accurately record all securities transactions, consider corporate actions, and determine the liabilities, for example. Digitization offers an opportunity to overcome present NAV calculation problems. If, say, the NAV can be calculated in real time, why should fund shares not be listed on a stock exchange?
Mutual funds as companies pay out almost all of their income - dividends and realized capital gains - to shareholders every year and pass on all their tax duties to investors. Hence, mutual funds do not pay corporate taxes. Therefore, the income of mutual funds is taxed only once, while the income of 'ordinary' companies is taxed twice.6
6 Mutual funds make two types of taxable distributions to shareholders: ordinary dividends and capital gains. The Internal Revenue Service (IRS) defines rules that prevent ordinary firms from transforming themselves into mutual funds to save taxes: one rule demands, for example, that mutual funds have only a limited ownership of voting securities and that funds must distribute almost all of their earnings.
Figure 2.21: The organization of a mutual fund (Adapted from ICI Fact Book [2006]).
There are tax-exempt and taxable fund types. The former invest in securities backed by municipal authorities and state governments; income from both security types is exempt from federal income tax. Which fund to choose is only a question of the after-tax yield. Tax-exempt funds make sense for investors who face a high tax bracket. In all other cases, taxable funds show a better after-tax yield. Fund sponsors typically offer a retail and an institutional investor series of MM funds.
Bond Funds
There are many types of bond funds. Bond funds can be tax-exempt or taxable and invest in US or global bonds. In each possible category different factors matter: the creditworthiness of the bonds, the maturity of the bonds, the segmentation of global bonds into emerging market bonds and general global bonds, and the classification of bonds according to different economic sectors or specific topics. Finally, alternative bond funds use techniques from hedge funds to shape the risk and return profile.
Stock Funds
For stock funds the difference between tax-exempt and taxable does not exist, since most of their income comes from price appreciation and income from dividends is very low. Categories are US versus global funds, sectors, regions, style, etc. As for bond funds, a 3 × 3 style box from Morningstar exists, with size as one dimension and style as the other.
The SEC (2008) defines the following components of mutual fund fees: (i) fees paid by the fund out of fund assets to cover the costs of marketing and selling fund shares ... (ii) 'distribution fees', including fees that compensate brokers and others who sell fund shares and that pay for advertising, the printing and mailing of prospectuses to new investors, ... (iii) 'shareholder service fees' - fees paid to persons who respond to investor inquiries and who provide investors with information about their investments.
The expense ratio is the fund's total annual operating expenses, including management fees, distribution (12b-1) fees, and other expenses. All fees are expressed as a percentage of average net assets. Other fees include fees related to the selling and purchasing of funds: the back-end sales load is a sales charge investors pay when they redeem mutual funds. The front-end sales load is the similar fee when funds are bought; it is generally used by the fund to compensate brokers. Purchase and redemption fees are not the same as the back- and front-end sales loads; they are both paid to the fund. The SEC generally limits redemption fees to 2 percent.
Class-A shares, for example, charge a front-end load and have low 12b-1 (distribution) fees. They are beneficial for long-run investors. In Europe, the type of share class can define the client segmentation, specify the investment amount, and specify the investment strategy. For example:
• N-class: only for clients who possess a mandate contract or an investment contract with the bank.
The performance over the periods 1, . . . , T is measured as
\[
P_{\%} = \frac{NAV_T \times f_1 \times \cdots \times f_T}{NAV_0} \times 100, \qquad (2.16)
\]
with the adjustment factor
\[
f = \frac{NAV_{ex} + BA}{NAV_{ex}},
\]
with BA the gross payout - that is to say, the gross amount of the earnings- and capital-gain payout per unit share to the investors - and NAV_{ex} the NAV after the payout.
Example
Consider a NAV at year-end 2005 of CHF 500 million, 2006 earnings of CHF 10 million, and a capital-gain payout of CHF 14 million. The NAV after payments is CHF 490 million and the NAV at the end of 2006 is CHF 515 million. The adjustment factor is
\[
f = \frac{490 + 10 + 14}{490} = 1.04898.
\]
This gives the performance for 2006:
\[
P = \frac{515 \times 1.04898}{500} - 1 = 8.045\%.
\]
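The example's arithmetic in a short Python sketch (figures in CHF million, taken from the example above):

nav_start = 500.0      # NAV at year-end 2005
nav_ex = 490.0         # NAV after the payouts
payout = 10.0 + 14.0   # earnings plus capital-gain payout (BA)
nav_end = 515.0        # NAV at year-end 2006

f = (nav_ex + payout) / nav_ex        # adjustment factor: ~1.04898
perf = nav_end * f / nav_start - 1.0  # ~0.08045, i.e. 8.045%
print(f, perf)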
There are several reasons why it is important to measure the performance of a fund correctly: to select the best fund, to check whether the fund managers do what they promise, and to check whether the fund manager added value.
The performance formula (2.16) can be rewritten in the effective return form
\[
(1 + P)\,NAV_0 = NAV_T \times f_1 \times \cdots \times f_T = NAV_T \prod_{k=1}^{T} \left(1 + \frac{BA_k}{NAV_{ex,k}}\right). \qquad (2.17)
\]
If the gross payouts are zero in all periods, then the performance reads
\[
(1 + P)\,NAV_0 = NAV_T
\]
with P the simple effective return. Contrarily, assume that in each period a constant fraction g = BA/NAV_{ex} is paid out. Then,
\[
(1 + P)\,NAV_0 = NAV_T\,(1 + g)^T.
\]
Since (1 + g)^T is larger than one, with the same effective return P, the fund without any payouts achieves a larger final value NAV_T than the fund with payouts.
Example
The return calculation for funds can be misleading. Consider the following reported
annual returns: 5%, 10%, −10%, 25%, 5%. The arithmetic mean is 7%. The geometric
mean is 6.41%. How much would an investor earn after 5 years if he or she starts with
USD 100?
100 × 1.05 × 1.1 × 0.9 × 1.25 × 1.05 = USD 136.4.
If the fund reports the arithmetic mean, the investor would expect 100 × 1.07^5 = USD 140.26. Using the geometric mean of 6.41%, the true value of USD 136.4 follows. Although it is tempting to report the higher arithmetic mean, such a report would be misleading. Some jurisdictions require funds to report returns in the correct geometric way.
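A small Python check of the two means (figures from the example):

returns = [0.05, 0.10, -0.10, 0.25, 0.05]

arithmetic = sum(returns) / len(returns)   # 0.07

terminal = 100.0
for r in returns:
    terminal *= 1.0 + r                    # USD ~136.43

geometric = (terminal / 100.0) ** (1.0 / len(returns)) - 1.0  # ~0.0641

# Reporting the arithmetic mean suggests 100 * 1.07**5 = USD 140.26,
# which overstates the true terminal value.
print(arithmetic, geometric, terminal)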
Both schemes are supervised by the Luxembourg financial sector regulator. A main reason for Luxembourg's attractiveness is taxation. Both SICAV and SICAF investment funds domiciled in Luxembourg are exempt from corporate income tax, capital gains tax, and withholding tax. They are only liable for a subscription tax at a rate of 0.05 percent on the fund's net assets. Also, favorable terms apply with regard to withholding tax. Total UCITS funds' AuM grew from EUR 3.4 trillion at the end of 2001 to EUR 5.8 trillion by 2010, with a value of EUR 6.8 trillion at the end of 2014. Roughly 85 percent of the European investment fund sector's assets are managed within the UCITS framework. On average, 10 percent of European households invest directly in funds: Germany, 16%; Italy, 11%; Austria, 11%; France, 10%; Spain, 7%; and the UK, 6%.
There have been five framework initiatives - UCITS I (1985) to UCITS V (2016). Goals of UCITS IV:
• Increase investor protection by the use of key investor information (KID). The KID replaces the simplified prospectus.
• Increase market efficiency by reducing the waiting period for fund distribution abroad to 10 days.
The Madoff fraud case and the default of Lehman Brothers highlighted some weaknesses in, and a lack of harmonization of, depositary duties and liabilities across different EU countries, leading to UCITS V. It considers the following issues. First, it defines which entities are eligible as depositaries and establishes that they are subject to capital adequacy requirements, ongoing supervision, prudential regulation, and some other requirements. Second, client money is segregated from the depositary's own funds. Third, the depositary is confronted with several criteria regarding the holding of assets. Fourth, remuneration is considered: a substantial proportion of remuneration, for example at least 50 percent of variable remuneration, shall consist of units in the UCITS funds and be deferred over a period that is appropriate in view of the holding period. Fifth, sanctions shall generally be made public and pecuniary sanctions for legal and natural persons are defined. Finally, measures are imposed to encourage whistle-blowing.
The evidence on mutual fund performance indicates not only that these 115 mutual funds were on average not able to predict security prices well enough to outperform a buy-the-market-and-hold policy, but also that there is very little evidence that any individual fund was able to do significantly better than that which we expected from mere random chance.
A growth analysis of the top ten global asset managers over the past five years confirms this trend. Vanguard, with its emphasis on passive products, is the strongest-growing AM, followed by BlackRock with its passive products forming the iShares family. Both index funds and ETFs aim at replicating the performance of their benchmark indices as closely as possible. Issuers and exchanges set forth the diversification opportunities they provide - like mutual funds - to all types of investors at a lower cost than mutual funds, but also highlight their tax efficiency, transparency, and low management fees. Although actively managed ETFs were launched around twenty years ago, their importance remains negligible. One major reason is that actively managed ETFs lose their cost advantage compared to mutual funds. As of June 2012, about 1,200 ETFs existed in the US, including only about 50 that were actively managed.
Example Core-satellite
2.8.1.1 Weighting
Various methods are used for determining the weight of individual members in the index.
Within the same category of members there can be subcategories.
• Market capitalization weighting: The members are weighted proportional to their market capitalization, counting only the shares taken into free float. This is the most common form of weighting for public indices and the rule for indices such as S&P, FTSE, MSCI, and SMI.
• Equal weighting 2 (currency weighting): The CHF weight assigned to each asset is the same; i.e., S_i w_i is the same for each asset. This means that if CHF 500 is to be invested in a basket of 10 assets, the amount bought of each asset would be CHF 50.
• Share weighting: The members are weighted proportional to the total number of tradable units issued; i.e., w_i depends on the number of shares outstanding for the equity asset class.
• Attribute weighting: The members are weighted according to their ranking score in the selection process. If our ranking is based on ethical and environmental criteria, and asset Y has a score of 75 and asset X of 25, then the weight ratio between asset Y and asset X will be Weight Y / Weight X = 3.
Free float is the portion of total shares held for investment purposes, as opposed to shares held for strategic purposes, i.e., for control. Some indices are quoted using different weighting schemes, e.g., MSCI. However, the main quoted value uses the market capitalization weighting method.
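The weighting schemes above can be illustrated with a small Python sketch; all prices, share counts, and scores below are made-up numbers, not data from the text:

prices = [100.0, 50.0, 25.0]   # asset prices S_i in CHF
shares_out = [1e6, 4e6, 2e6]   # tradable units issued (share weighting)
scores = [75.0, 25.0, 50.0]    # ranking scores (attribute weighting)
budget = 500.0                 # CHF to invest

# Equal weighting 2 (currency weighting): the same CHF amount per asset.
currency_amounts = [budget / len(prices)] * len(prices)

# Share weighting: proportional to the number of units issued.
total_units = sum(shares_out)
share_weights = [s / total_units for s in shares_out]

# Attribute weighting: proportional to the ranking score.
total_score = sum(scores)
attribute_weights = [s / total_score for s in scores]

print(currency_amounts, share_weights, attribute_weights)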
Remark:
The difference between the asset weighting scheme and the weight of an asset in the index is as follows. For a price-weighted index, w_1 = w_2 for asset 1 and asset 2. However, if S_1/S_2 = 3, the weight of asset 1 in the index will be 3 times larger than the weight of asset 2.
2.8.1.2 Divisor
The divisor is a crucial part of the index calculation. At initiation it is used for normalizing the index value. For instance, the initial SMI divisor in June 1998 was chosen as a value which normalized the index to 1,500. However, the main role of the divisor is to remove the unwanted effects of corporate actions and index member changes on the index value. It ensures continuity in the index value in the sense that changes in the index should only stem from investor sentiment and not originate from 'synthetic' changes. Corporate actions which need to be accounted for by changing the divisor value depend on the weighting scheme used for the index.
An example is the effect of a stock split for:
• Market capitalization weighting: The price of the stock will be reduced, but the number of free-floating shares will increase. These two effects are offsetting and no change has to be made to the divisor.
• Equal weighting 1 (price weighting): The stock price reduction will have an effect, but the number of free-floating shares has no impact on such a weighting. Therefore, the divisor has to be changed to a lower value in order to avoid a discontinuity in the index value.
How the dividends are handled in the index calculation determines the return type of the index. There are three versions of how dividends can be incorporated into the index value calculations:
• Price return index: No consideration is given to the dividend amounts paid out by the assets. The day-to-day change in the index value reflects the change in the asset prices.
• Total return index: The full amount of the dividend payments is reflected in the index value. This is done by adding the dividend amount on the ex-dividend date to the asset price. Thus, the index value 'acts' as if all the dividend payments were reinvested in the index.
• Total return index after tax: The dividend amount used in the index calculation is the after-tax amount, i.e., the net cash amount. In contrast, in the total return index case the gross dividend amount is used.
In addition, if the index constituents have a wide geographical span, there are other issues that need to be taken into consideration. Some of the rules that need to be defined are: the index value quotation currency, the source of currency rates, the index opening and closing hours, and the treatment of assets registered on multiple exchanges. For most major indices the quotation is real time and the currency rate used is also real time. The opening hour for the constructed index starts with the opening of the exchange of any index member, and the closing occurs when no index member exchange is open. A global index, with constituents from Japan to the USA, would be 'open' most hours of the day.
One must distinguish between the theoretical index and a strategy that replicates
the theoretical index using securities. The theoretical index is not an investable asset or
security. If we set \varphi_{i,t} for the weight of asset i in the index at time t, with R_{i,t} the gross return of the asset over the period t-1 to t, the index value I_t satisfies the dynamics
\[
I_t = I_{t-1} \left( \sum_{k=1}^{N} \varphi_{k,t} R_{k,t} \right), \qquad I_0 = 100. \qquad (2.18)
\]
The value of the index tomorrow is equal to the present value times the return of each stock generated until tomorrow, weighted by the asset weights. The index fund F_t aims to replicate (2.18) by investing in the stocks. At each date t the fund holds a number n_{k,t} of stocks k, and F_t is equal to the sum over all stocks of these numbers times their prices P_{k,t}. The difference between the values F_t and I_t is the tracking error; the accuracy of the replication is often measured with the volatility of the tracking error.
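As an illustration of the dynamics (2.18), the following Python sketch computes an index path for N = 2 assets over three periods; the weights and gross returns are made-up numbers:

weights = [[0.6, 0.4], [0.6, 0.4], [0.5, 0.5]]              # phi_{k,t}
gross_returns = [[1.02, 0.99], [0.98, 1.03], [1.01, 1.01]]  # R_{k,t}

index = [100.0]   # I_0 = 100
for phi, R in zip(weights, gross_returns):
    index.append(index[-1] * sum(p * r for p, r in zip(phi, R)))

print(index)      # [100.0, 100.8, 100.8, 101.808]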
Example
The tracking error (TE) can be calculated directly or indirectly. Consider a time series of returns for a portfolio and its benchmark (market portfolio). The indirect method uses the following replication of the tracking error: the TE is equal to buying the portfolio and selling the benchmark. We can use the general variance formula for two random variables, choosing the weights \varphi_1 = +1 and \varphi_2 = -1:
\[
\sigma^2 = \sigma_P^2 + \sigma_B^2 - 2\,\mathrm{cov}(P, B).
\]
The TE is equal to \sigma. In the example's data, the covariance of the two time series is 0.011 percent. Dividing by the volatilities of the two time series, the correlation factor \rho = 0.89 follows. This gives the TE per period; scaling it with the square-root law, the annualized TE of 0.92% follows.
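Both methods can be illustrated in Python; the return series below are hypothetical, since the example's data are not reproduced here, and statistics.covariance requires Python 3.10 or later:

import statistics

portfolio = [0.010, 0.005, -0.004, 0.012, 0.002, 0.007]
benchmark = [0.009, 0.006, -0.002, 0.010, 0.003, 0.006]

# Direct method: TE is the standard deviation of the active returns.
active = [p - b for p, b in zip(portfolio, benchmark)]
te_direct = statistics.stdev(active)

# Indirect method: long the portfolio, short the benchmark, i.e.
# sigma^2 = var(P) + var(B) - 2 cov(P, B).
var_p = statistics.variance(portfolio)
var_b = statistics.variance(benchmark)
cov_pb = statistics.covariance(portfolio, benchmark)
te_indirect = (var_p + var_b - 2.0 * cov_pb) ** 0.5

print(te_direct, te_indirect)  # the two methods agree
# Annualization by the square-root law: multiply by sqrt(periods per year).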
For capital weighting (CW), the weight of stock k at time t is
\[
\varphi_{k,t} = \frac{M_{k,t} P_{k,t}}{\sum_{j=1}^{N} M_{j,t} P_{j,t}}, \qquad (2.19)
\]
with M the number of outstanding shares. The numerator is the market capitalization of stock k and the denominator is the market capitalization of the index. The weights \varphi change as prices and the numbers of outstanding shares change. If the outstanding shares are constant over time, the same holds true for the numbers of shares n_k that are needed to construct the fund. This is one of the main reasons why CW is often used: the constancy of the shares implies low trading costs. This reason and the simplicity of the CW approach have made it the favorite index construction method.
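A sketch of the capital weights (2.19) and of the replication argument; the prices, share counts, and fund size below are made-up numbers:

shares_out = [1000.0, 500.0]   # M_k, assumed constant over time
prices_t0 = [100.0, 40.0]
prices_t1 = [110.0, 36.0]
fund_value = 1_000_000.0       # USD invested at t0

def cap_weights(M, P):
    caps = [m * p for m, p in zip(M, P)]   # market capitalizations
    total = sum(caps)
    return [c / total for c in caps]

w0 = cap_weights(shares_out, prices_t0)
n0 = [w * fund_value / p for w, p in zip(w0, prices_t0)]  # stocks held at t0

fund_value_t1 = sum(n * p for n, p in zip(n0, prices_t1))
w1 = cap_weights(shares_out, prices_t1)
n1 = [w * fund_value_t1 / p for w, p in zip(w1, prices_t1)]

print(n0, n1)  # identical holdings: prices moved, but no trades are needed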
Alternative weighting schemes - smart beta approaches - weight the indices not by their capital weights but either by other weights which should measure the economic size of companies better (fundamental indexation) or by risk-based indexation. Most often, investors will use a mixture of CW and alternative schemes. A first requirement for such a mix is that the two approaches show a low correlation. Fundamental indexation serves the purpose of generating alpha to dominate the CW approach, while risk-based constructions focus on diversification.
Examples of risk-weighted allocations are EW, MV, MDP, and ERC. Roncalli (2014) compares the different methods for the Euro Stoxx 50 index using data from December 31, 1992, to September 28, 2012. He computes the empirical covariance matrix using daily returns and a one-year rolling window; rebalancing takes place on the first trading date of each month and all risk-based indices are computed daily as a price index; see Table 2.17.
                          CW       EW       MV      MDP      ERC
Expected return p.a.     4.47     6.92     7.36    10.15     8.13
Volatility              22.86    23.05    17.57    20.12    21.13
Sharpe ratio             0.05     0.16     0.23     0.34     0.23
Information ratio          -      0.56     0.19     0.42     0.62
Max. drawdown          -66.88   -61.67   -56.04   -50.21   -56.85

Table 2.17: Statistics for the different index constructions of the Euro Stoxx 50. CW is capital weighting, EW is equal weighting, MV is mean-variance optimal, MDP is most diversified portfolio, and ERC is equal risk contribution (Roncalli [2014]).
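A minimal sketch of an equal risk contribution (ERC) construction for a toy three-asset covariance matrix; the damped fixed-point iteration below is one simple way to compute ERC weights and is not necessarily the algorithm used by index providers or by Roncalli (2014):

import numpy as np

cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

w = np.ones(3) / 3.0
for _ in range(200):
    marginal = cov @ w          # marginal risks (Sigma w)_i
    w = np.sqrt(w / marginal)   # damped fixed-point step towards ERC
    w /= w.sum()

contrib = w * (cov @ w)         # risk contributions w_i (Sigma w)_i
print(np.round(w, 3))                        # ~[0.473 0.290 0.236]
print(np.round(contrib / contrib.sum(), 3))  # ~[0.333 0.333 0.333]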
2.8.4 ETFs
Exchange traded funds (ETFs) are a mixture of open- and closed-end funds; the main source for this section is Deville (2007). They are hybrid instruments which combine the advantages of both fund types. Mutual funds must buy back their units for cash, with the disadvantage that investors can only trade once a day at the NAV computed after the close. Furthermore, the trustee needs to keep a fraction of the portfolio invested in cash to meet possible redemption outflows. Closed-end funds avoid this cash problem. But since it is not possible to create or redeem fund shares, there is no way to react to changes in demand for the shares of such funds: if there are strong shifts in demand, price reactions follow, such as significant premiums or discounts with respect to their NAV.
ETFs trade on the stock market, and shares can be created or redeemed directly from the fund due to the in-kind creation and redemption process.
The in-kind process idea is due to Nathan Most. ETFs are organized like commodity warehouse receipts, where the physicals are delivered and stored and only the receipts are traded, although holders of the receipt can take delivery. This 'in-kind' - securities are traded for securities - creation and redemption principle has been extended from commodities to stock baskets; see Figure 2.22.
The figure illustrates the dual structure of the ETF trading process with a primary market open to institutional investors (authorized participants, APs) for the creation and redemption of ETF shares directly from the fund. The ETF shares are traded on a secondary market. The performance earned by an investor who creates new shares and redeems them later is equal to the index return less fees, even if the composition of the index has changed in the meantime. Only authorized participants can create new shares of specified minimal amounts (creation units). They deposit the respective stock basket plus an amount of cash into the fund and receive the corresponding number of shares in return. ETF shares are not individually redeemable. Investors who want to redeem are offered the portfolio of stocks that make up the underlying index plus a cash amount in return for creation units.
Since ETFs are negotiated on two markets - the primary and the secondary market - they have two prices: the NAV of the shares in the primary market and their market price in the secondary market. These two prices may deviate from each other if there is pressure to sell or buy. The 'in-kind' creation and redemption helps market makers to absorb such liquidity shocks on the secondary market, either by redeeming outstanding shares or by creating new ones. It also ensures that departures between the two prices are not too large, since authorized participants in the primary market could arbitrage any sizable differences between the ETF and the underlying index component stocks. If the secondary market price is below the NAV, APs can buy cheap ETFs in the secondary market, take on a short position in the underlying index stocks, and then ask the fund manager to redeem the ETFs for the stock basket before closing the short position at a profit. Since ETF fund managers do not need to sell any stocks on the exchange to meet redemptions, they can fully invest their portfolio, and creations do not cause any additional costly trading within the fund. Finally, in the US, 'in-kind' operations are a nontaxable event.
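The arbitrage argument can be made concrete with a stylized computation; all figures below are hypothetical:

nav_per_share = 100.0    # value of the underlying basket per ETF share
market_price = 99.2      # ETF price in the secondary market (discount)
creation_unit = 50_000   # ETF shares per creation/redemption unit

# The AP buys the cheap ETF shares, shorts the index basket, redeems
# the ETF shares in kind for the basket, and closes the short with it.
cost_etf = creation_unit * market_price
proceeds_short = creation_unit * nav_per_share
profit = proceeds_short - cost_etf   # USD 40,000 before transaction costs
print(profit)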
Most ETFs track an index and are passively managed.
Figure 2.22: Primary and secondary ETF market structure where the 'in-kind' process for the creation and redemption of ETF shares is shown. Market makers and institutional investors can deposit the stock basket underlying an index with the fund trustee and receive fund shares in return. These created shares can be traded on an exchange as simple stocks or later redeemed for the stock basket then making up the underlying index. Market makers purchase the basket of securities that replicates the ETF index and deliver it to the ETF sponsor. In exchange, each market maker receives ETF creation units (50,000 shares or multiples thereof). The transaction between the market maker and the ETF sponsor takes place in the primary market. Investors who buy and sell the ETF then trade in the secondary market through brokers on exchanges. (Adapted from Deville [2007] and Ramaswamy [2011].)
ETFs generally provide diversification, low expense ratios, and the tax efficiency of index funds, while still maintaining all the features of ordinary stock, such as limit orders, short selling, and options. ETFs can be used as a long-term investment for asset allocation purposes and also to implement market-timing investment strategies. All of these features rely on the specific 'in-kind' creation and redemption principle described above. ETFs are constructed by index providers, exchanges, or index fund managers (the originators).
The costs of an ETF have two components: transaction costs and the total expense ratio (TER). Transaction costs are divided into explicit and implicit costs. Explicit transaction costs include fees, charges, and taxes for the settlement by the bank and the exchange. Implicit costs are bid-ask spreads and costs incurred due to adverse market movements. ETFs can be constructed by direct replication (physical) or by using synthetic replication (swap-based).
Trends in ETF investment arise from regulation and investor demand. From a regulatory perspective, there have been barriers for active managers due to the Retail Distribution Review (RDR) in the UK and MiFID II in the euro zone. Growth in passive strategies will also be driven by cost transparency and the search for cheap investments. New uses for ETFs will emerge as well: institutions will use them to get access to specific asset class or geographic exposures, and retail investors will invest in ETFs as a lower-cost alternative to mutual funds and UCITS funds. Finally, trends in recent years are to construct ETFs not on a CW basis but on a risk-weighted one using risk parity methods, and to focus on risk factors instead of asset classes as underlying instruments.
This approach minimizes the tracking error for the ETF investor and enables more underlyings to be accessed. The basket of securities used as collateral is typically not related to the basket delivered to the swap counterparty which mimics the index. Why should an investment bank, as swap counterparty, enter into such a contract? See the next example.
Example
Suppose the ETF sponsor delivers a basket of two liquid securities S1 and S2 to the swap counterparty; the missing S3-asset is the tracking error source. The swap counterparty (the investment bank (IB)) delivers to the ETF sponsor seven securities, C1, ..., C7, as collateral. These assets are in the inventory of the IB due either to its market-making activities or to the issuance of derivatives, i.e., business that is not related to ETFs. When these securities C_i are less liquid, they have to be funded either in unsecured markets or in repo markets with deep haircuts: the IB has, for example, to post securities worth 120% to obtain funding of 100% at a given date. By transferring these securities to the ETF sponsor, the IB may benefit from reduced warehousing costs for these assets. Part of these cost savings may then be passed on to the ETF investors through a lower total expense ratio for the fund holdings. The cost savings accruing to the investment banking activities can be directly linked to the quality of the collateral assets transferred to the ETF sponsor. A second possible benefit for the IB is lower regulatory and internal economic capital requirements, since the regulatory charge for the less liquid securities C_i is larger than for the more liquid securities S1 and S2 in the basket delivered by the ETF sponsor. Summarizing, a synthetic swap has a positive impact on the security inventory costs of the IB due to non-ETF business and on regulatory capital or internal economic risk capital charges.
The drawbacks of synthetic swaps are counterparty risk and documentation require-
ments (International Swaps and Derivatives Association [ISDA]).
Bond ETFs typically face huge demand when stock markets are weak, such as when recessions occur. An asset rotation from stocks to bonds is often observed in such cases. Figure 2.24 shows bond inflows of USD 800 billion and equity redemptions in long-only equities (LO equities) after the GFC. In recent years an opposite rotation began due to close-to-zero interest rates.
Figure 2.24: Bond inflows and equity redemptions (BoA Merrill Lynch Global Investment Strategy, EPFR Global [2013]).
Commodity ETFs invest in oil, precious metals, agricultural products, etc. The idea of a gold ETF was conceptualized in India in 2002. At the end of 2012, the SPDR Gold Shares ETF was the second-largest ETF. Rydex Investments launched the first currency ETF in 2005. These funds are total return products where the investor gets access to the FX spot change, local institutional interest rates, and a collateral yield.
Actively managed ETFs have been offered in the United States since 2008. Initially, they grew faster than index ETFs did in their first three years. But the growth rate was not sustainable: the number of actively managed ETFs has not grown for several years. Many academic studies question the value of active ETF management, since such funds face the same skill-and-luck issue as mutual funds and much higher costs than static ETFs.
Example
Consider a LETF with positive leverage factor 2 (bullish leverage). We follow Dobi
and Avellaneda (2012). There are three time periods 0, 1, 2 in the example (see Table
2.18). The index value of the ETF starts at 100, loses 10%, and then gains 10%.
Table 2.18: Data for the leveraged ETF example. t_{k,-} denotes the time t_k before the adjustment of the TRS and t_{k,+} the time after the adjustment of the TRS.
The initial AuM is USD 1,000 at day 0, and the AuM is USD 800 at day 1 due to the 10% index drop on day 1:
\[
\text{USD } 800 = 1{,}000\,(1 - 2 \times 0.1).
\]
This implies a required TRS exposure of 2 × 800 = USD 1,600. The notional value of the TRS from day 0 has become, at day 1, USD 1,800 = 2,000 × (1 - 0.1). This is the exposure before adjustment. Since the exposure needed at day 1 is USD 1,600, the swap counterparty must sell (short the synthetic stock) USD 200 = 1,800 - 1,600 of TRS. Doing the same calculation for day 2, the AuM is USD 960 and the exposure needed is USD 1,920. Similarly, on day 2 the swap counterparty must buy a TRS amount of USD 160 = 1,920 - 1,760, where USD 1,760 = 1,600 × (1 + 0.1) is the exposure before adjustment.
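The rebalancing arithmetic can be reproduced with a few lines of Python (figures from the example):

leverage = 2.0
aum = 1000.0               # USD at day 0
exposure = leverage * aum  # TRS notional at day 0: USD 2,000

for index_return in [-0.10, +0.10]:
    aum *= 1.0 + leverage * index_return  # fund NAV after the index move
    exposure *= 1.0 + index_return        # TRS notional before adjustment
    target = leverage * aum               # exposure needed
    adjustment = target - exposure        # < 0: sell TRS, > 0: buy TRS
    exposure = target
    print(aum, adjustment)
# Day 1: AuM 800, sell USD 200; day 2: AuM 960, buy USD 160.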
Example
We consider the compounding problem for a LETF. Fix an index and a 2x LETF, both beginning at 100. Assume that the index first rises 10% to 110 and then drops back to 100, a drop of 9.09%. The LETF will first rise 20% and then drop 18.18% = 2 × 9.09%. But 18.18% of 120 is 21.82. Therefore, while the index is back at 100, the LETF is at 98.18, which implies a loss of 1.82%. Such losses always occur for a LETF when the underlying index value changes direction. The more frequent such directional changes are - hence it is a volatility effect - the more pronounced the losses.
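A short Python sketch reproducing the compounding loss:

index_moves = [0.10, -1.0 / 11.0]   # 100 -> 110 -> 100

index_value = 100.0
letf_value = 100.0
for r in index_moves:
    index_value *= 1.0 + r
    letf_value *= 1.0 + 2.0 * r     # the 2x LETF doubles each period return

print(index_value, letf_value)      # 100.0 and ~98.18: a 1.82% loss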
These examples illustrate that a LETF always rebalances in the same direction as the underlying index, regardless of whether the LETF is a bullish one (positive leverage) or a bearish one (negative leverage). The fund always buys high and sells low in order to maintain a constant leverage factor. A similar result holds for inverse LETFs.
The trend of decreasing fees continues, but for index funds a bottom level seems to be close. Table 2.19 also considers ETF fees.
                    Equity   Bonds
Mutual funds (*)     0.74%   0.61%
Index funds (*)      0.12%   0.11%
ETFs (**, ])         0.49%   0.25%
ETF core (**, +)     0.09%   0.09%

Table 2.19: Fees p.a. in 2013 ((*) Investment Company Institute, Lipper; (**) DB Tracker; (]) Barclays; (+) BlackRock).
Figure 2.25: Expense ratios of actively managed funds (upper lines) and index funds (lower lines) - bps p.a. (Investment Company Institute and Lipper [2014]).
AIs are often defined as investments in asset classes other than stocks, bonds, commodities, currencies, and cash. These investments can be illiquid. We only consider insurance-linked securities in the sequel. It is estimated that alternative investments will reach USD 13 trillion by 2020, up from USD 6.9 trillion in 2014. One expects that more and more investors will be able to access AIs as regulators begin to allow them access to specific regulated vehicles such as alternative UCITS funds in Europe and alternative mutual funds in the US.
This section is based on LGT (2014). Insurance-linked investments are based on the events of life insurers and of non-life insurers, such as insurers against natural catastrophes. The main products are insurance-linked securities (ILS, such as CAT bonds) and collateralized reinsurance investments (CRI). The global size of this relatively young market is USD 200 bn as of 2014. Regulation plays a significant role in the use of alternatives. The creditworthiness of insurance and reinsurance companies requires a large capital basis from a regulatory perspective for the catastrophe cases. To reduce the capital charge under Solvency II, the catastrophe part of the risks is transferred to the capital markets using ILS and CRI.
2.9.0.1 ILS
Insurance buyers such as primary insurers, reinsurers, governments, and corporates enter into a contract with a special purpose vehicle (SPV). They pay a premium to the SPV and receive insurance cover in return. The SPV finances the insurance cover with the principal paid by investors. The principal is returned at the end of the contract if no event has occurred. The investor receives, in addition to the principal payback, the premium and a collateral yield.
An example is the catastrophe or CAT bond 'Muteki'. The Muteki SPV provided the insurance buyer Munich Re with protection against Japanese earthquake losses. Central to ILS investing is the description of the events. The description has to be transparent, unambiguous, measurable, verifiable, and comprehensive. The parametrization in Muteki is carried out using parameters from the 1,000 observatories located in Japan that use seismographs. 'Ground acceleration' is used to calculate the value of the CAT bond index. This determines whether a payout from the investors to the insurance protection buyers is due.7
Figure 2.26 shows the peak ground velocities measured during the 11 March, 2011 earthquake. The star indicates the epicenter; the regions with the highest ground velocities also experienced the related tsunami.
The insurance industry lost an estimated USD 30-35 billion. The ground acceleration data became available on 25 March, 2011. Multiplying the ground velocity chart by the weight-per-station chart of Munich Re implied an index level for the CAT bond of 1,815 points. This index level led to a full payout from the investors to the insurance buyer, since the trigger level - that is to say, the level of the index at which a payout starts to be positive - of 984 was exceeded and also because the exhaustion level of 1,420 points was breached. Hence, investors in this CAT bond suffered a 100 percent loss.
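The payout mechanics can be sketched in Python as follows, assuming a payout that is linear between the trigger and exhaustion levels (the exact Muteki payout schedule is an assumption here; only the two levels are quoted above):

trigger = 984.0
exhaustion = 1420.0

def principal_loss(index_level):
    """Fraction of the principal paid out to the protection buyer."""
    if index_level <= trigger:
        return 0.0
    if index_level >= exhaustion:
        return 1.0
    return (index_level - trigger) / (exhaustion - trigger)

print(principal_loss(900))    # 0.0: no payout
print(principal_loss(1200))   # ~0.50: partial loss of principal
print(principal_loss(1815))   # 1.0: the 2011 event level, full loss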
2.9.0.2 CRI
In collateralized reinsurance investments (CRIs), the same insurance protection buyers as for ILS buy insurance cover from an SPV in exchange for a premium. The SPV hands over the premium and collateral yield to the investor. The investor pays, in cases where he receives proof of loss, the loss payment to the SPV. Between the investor and the insurance buyer a letter of credit is set up to guarantee the potential loss payment. Table 2.20 summarizes the ILS and CRI product specifications. The ILS bond pays out if an event is realized and the triggers are met. For the CRI, if an event is realized and the triggers are met, the investor makes a loss payment.
ILS and CRI comprise 13 percent and 18 percent, respectively, of total reinsurance investments. The remainder are traditional uncollateralized reinsurance investments. The cumulative issuance volume of CAT bonds and ILS, which started in 1995, reached USD 20 bn
7 The exposure of Munich Re in Japan is not uniformly spread over the whole country. The insurer therefore weights the signals of the measuring stations such that the payout of the CAT bond matches the potential losses of Munich Re from claims incurred due to the event.
Figure 2.26: Ground velocities measured by Japan's 1,000 seismological observatories during the earthquake of 11 March, 2011, which also caused a huge tsunami and almost 20,000 fatalities (Kyoshin [2011]).
8 The main intermediaries or service providers to the catastrophe bond and insurance-linked securitization market in 2014 were Aon Benfield Securities, Swiss Re Capital Markets, GC Securities, Goldman Sachs, and Deutsche Bank Securities.
Table 2.20: Comparison between ILS and CRI investments (LGT [2014]).
Figure 2.27: Average expected coupon and average expected loss of CAT bonds and ILS
issuance by year (artemis.com [2015]).
Table 2.21: Correlation matrix for dierent asset classes. Monthly data in USD from 31
Dec 2003 until 30 Nov 2014 (LGT [2014], Barclays Capital, Citigroup Index, Bloomberg).
of a nation.
buys shares for the investors but is heavily involved in the management of the company.9 Fifth, PM transactions are characterized by significant access to capital and to networks with strong expertise.
The evolution of PM can be roughly classified into three periods. In the era 1970-1990, private markets meant the emergence of leveraged buyouts, focused on the US and on the retail, chemical, and manufacturing sectors. Such leveraged buyouts (LBOs) mean buying a company using a combination of equity and debt, where the company's cash flow is used to repay the borrowed money. Debt is used since its cost of capital is lower than that of equity: interest payments reduce the corporate income tax liability while dividend payments based on equity do not. The use of leveraged buyouts led to several defaults of firms whose debt ratios were too high. This led banks to require lower debt-to-equity ratios. In the period 1990-2010, private equity became broader in the industries invested in (healthcare, education) and PM became a global activity. In the last period, starting after the GFC, PM became broader with three pillars: debt, real estate, and infrastructure.
The low interest rate environment made private markets attractive for investors such as pension funds, which did not invest in these markets before the GFC. A study by Towers Watson in 2017 highlighted that 94% of current PM investors will increase or maintain their private market allocations in the longer term. The AuM in PM steadily increased from USD 1.5 tr in 2006 to more than USD 4 tr in 2017. Dry powder - capital committed but not yet invested, held in highly liquid securities - did not increase in the same period but dropped from around 40 percent before the GFC to values between 30 and 35 percent in recent years. If deal activity falls and dry powder accumulates, a risky situation can emerge when investors add pressure on the PM firm to deploy that capital, i.e., to do transactions it might not otherwise do.
A second observation relates PM and public markets over the last 25 years. First, the number of publicly listed firms dropped from 7,322 in 1996 to 3,671 in 2016 (Credit Suisse, Doldge et al. [2016]), and second, private firms stay private longer or even forever. Facebook, for example, was founded in 2004 and had its IPO only in 2012.
Valuations of shares in PM and public markets were both at historic highs in 2018. The S&P 500 index increased by a factor of almost 2.5 in the previous 6 years, and EV/EBITDA multiples in PM also increased by around 40 percent in the same period, to 14x for large caps and 12x for small and mid caps (sources: S&P and Partners Group [2018]).
Figure 2.28 shows that operational value creation drives performance more than financial engineering, which is the opposite of the leveraged buyout period.
A further significant tendency of investors is to abstain from excessive diversification
9 The ten largest PE firms in 2017 according to PEI Media are The Blackstone Group, Kohlberg Kravis Roberts, The Carlyle Group, TPG Capital, Warburg Pincus, Advent International Corporation, Apollo Global Management, EnCap Investments, Neuberger Berman, and CVC Capital Partners.
and to seek high-conviction portfolios instead. Excessive diversification was a result of a lack of transparency, whereas high conviction is the result of experience and the successful selection of investments. Typically, in the past, institutional investors spread their PM investments among hundreds of assets. Today, the most successful PM firms spread their investments across only several dozen assets.
Comparing returns in PM with public markets after the GFC, PM roughly have a 3 to 4 percent higher average return than their public counterparts in equity, debt, real estate, and infrastructure investments; considering maximum drawdowns in the period 2000-2015, the figures for PM are between 20 and 30 percent lower in the above four classes compared to the public counterparts.
Some major players in PM have started to offer part of their PM offering to wealthy private clients or affluent clients. This requires transforming some of the PM offerings into public ones. Since many investors became familiar with PM in recent years, they have increased their allocations and invest globally. This requires PM firms to consider portfolio construction techniques on a more sophisticated level than in the past.
HFs often have a limited number of wealthy investors. If a HF restricts the number of investors, it is not a registered investment company. It is then, in the US, exempt from most parts of the Investment Company Act of 1940 (the 40-Act). Most HFs in the US have a limited-partnership structure. The limitation of the number of investors automatically increases the minimum investment amount to USD 1 million or more. Many HFs do not allow investors to redeem their money immediately. The reason is the short positions of the funds: to reduce this risk, HFs need to post margin. If short positions increase, HFs need to add more and more margin and would eventually face liquidity problems if investors redeem their money at the same time. Mutual funds are not allowed to earn non-linear fees, while most HFs charge a flat management fee plus a performance fee (the 2/20 rule: 2% management fee, 20% performance fee). The business of running a hedge fund has become more expensive due to the increased regulatory burden. KPMG (2013) outlines the following figures for average set-up costs: USD 700,000 for a small fund manager, USD 6 million for a medium-sized one, and USD 14 million for the largest. In all, KPMG estimated hedge funds had spent USD 3 billion meeting compliance costs associated with new regulation since 2008 - equating to, roughly, a 10 percent increase in their annual operating costs (KPMG [2013]).
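To make the 2/20 rule concrete, the following stylized Python computation applies it to hypothetical figures; actual funds differ, e.g., in high-water marks, hurdle rates, and whether the performance fee is charged on gains net of the management fee (assumed here):

aum_start = 100e6     # USD
gross_return = 0.12   # gross fund return for the year

management_fee = 0.02 * aum_start
gains = aum_start * gross_return
performance_fee = 0.20 * max(gains - management_fee, 0.0)

net_to_investors = aum_start + gains - management_fee - performance_fee
print(management_fee, performance_fee, net_to_investors)
# USD 2 mn management fee, USD 2 mn performance fee, USD 108 mn to investors.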
HFs can face losses due to their construction or the market structure even in cases where there are no specific market events. As Khandani and Lo (2007) state, quantitative HFs faced a perfect financial storm in August 2007 in a normal market environment. The Global Alpha Fund, managed by Goldman Sachs Asset Management, lost 30 percent in a few days although it claimed to be designed for low-volatility and low-correlation strategies. The HF received an injection of USD 3 billion to stabilize it.
10 The main sources are the hedge fund review of Getmansky, Lee, and Lo (2015) and Ang (2013).
11 FATCA, the Foreign Account Tax Compliance Act, is a US extraterritorial regime of hedge fund regulation. It requires all non-US hedge funds to report information on their US clients. Europe's Alternative Investment Fund Managers Directive (AIFMD) requires information from any fund manager, independent of where they are based, if they sell to an EU-based investor.
The largest HFs in 2014, 2017, and 2019 are shown in Table 2.22. Total HF size in 2014 was USD 2.85 trillion versus USD 2.6 trillion in 2013. The average growth in HF assets from 1990 to 2012 was roughly 14 percent per year. The decrease in AuM after the GFC was fully recovered six years later. The losses incurred during the GFC were around 19 percent, which is only around half the losses of some major stock market indices. In the period 2009 to 2012, HF performance was lower than the S&P 500, ranging between 4.8 percent and 9.8 percent on an annual basis.
Hedge fund                               USD bn 2014   USD bn 2017   USD bn 2019
Bridgewater Associates (USA)                   87.1         122.2         124.7
AQR Capital Management (USA)                   29.9          69.9          62
J.P. Morgan Asset Management (USA)             59.0          45.0          47.7
Renaissance Technologies (USA)                 24.0          42.0         110
Two Sigma Investments/Advisers (USA)           17.5          38.9          51
D.E. Shaw (USA)                                22.2          34.7          62
Millennium Management (USA)                    21.0          33.9          39
Man Group, London (UK)                         28.3          33.9          62
Och-Ziff Capital Management (USA)              36.1          33.5          32
Winton Capital Management (UK)                 24.7          32.0          22.1
Elliott Management Corporation (USA)           23.3          31.3          35

Table 2.22: The largest hedge funds by AuM in 2014, 2017, and 2019.
The decreases in AuM during the GFC and the European debt crisis - from USD 2.1 tr to 1.5 tr - show that investors allocate money pro-cyclically to HFs, similar to mutual funds or ETFs. The following facts regarding the largest HFs are from Milnes (2014) (the number after the hedge fund's name is its ranking in the list of the world's largest HFs as of 2014).
• Bridgewater Associates (1). There was a relatively poor performance of the three flagship funds in 2012 and 2013 of 3.5%, 5.25%, and 4.62%. The performance over ten years is 8.6%, 11.8%, and 7.7%.
• J.P. Morgan Asset Management (2). J.P. Morgan bought the global multi-strategy firm Highbridge Capital Management in 2004 for USD 1.3 billion. Highbridge's assets have since multiplied by nearly 400 percent to USD 29 billion.
• Brevan Howard Capital Management (3). This HF maintains both solid returns and asset growth - which is the exception for a HF. The flagship is a global macro-focused HF (USD 27 bn AuM) which - since its launch in 2003 - has never lost money on an annual basis.
• Och-Ziff Capital Management (4) offers publicly traded hedge funds in the US with far greater disclosure than other HFs. Its popularity is mainly due to Daniel Och's conservative investing style.
• BlueCrest Capital (5) was a spin-off from a derivatives trading desk at J.P. Morgan in 2000. It has grown rapidly and is one of the biggest algorithmic hedge fund firms. Its reputation was boosted in 2008 when it made large profits while most other HFs faced losses.
• AQR Capital Management (7), co-founded by Cliff Asness, gives retail investors access to hedge fund strategies. Asness is also well known for his critique of the unnecessarily high fees charged by most HFs and for his scientific contributions.
• Man Group (9) was founded in 1783 by James Man as a barrel-making firm. It has 225 years of trading experience and 25 years in the HF industry. In recent years, its flagship fund AHL struggled with its performance.
• Winton Capital Management (13) has its roots in the quant fund AHL (founded in 1987 and bought by Man Group in 1989). David Harding - like many in the quantitative trading field, with a math or physics education - was also a pioneer in the commodity trading adviser (CTA) field. Winton is the biggest managed futures firm in the world.
The largest loss a HF has suffered was the USD 6 billion loss of Amaranth in 2006. This loss, around 65 percent of the fund's assets, was possible due to extensive leverage and a wrongheaded bet on natural gas futures.
2.11.2.1 HF Strategies
An important selling argument for HFs is that their returns only weakly correlate with traditional markets. Starting in 2000, the correlation between the MSCI World and the broad DJ CS Hedge Fund Index (HF Index) changed on a two-year rolling basis: correlation was 0.16 in the years 2000-2007 and jumped to 0.8 in 2007-2009, since a significant number of HF managers started in 2007 to invest traditionally in stocks and commodities. Many HFs use strategies similar to factor investing. The main differences are the transparency of the latter, the implementation of the factors as indices, and the construction of a cross-asset offering of factors. These advantages make it attractive for investors to switch from the more opaque and often more expensive HF to a factor portfolio.
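As a sketch, the rolling correlation just described can be computed as follows; the two monthly return series here are synthetic placeholders standing in for the MSCI World and the HF index, not the actual data.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    dates = pd.date_range("2000-01-31", periods=240, freq="M")

    # Synthetic monthly returns standing in for MSCI World and a HF index.
    equity = pd.Series(rng.normal(0.005, 0.04, len(dates)), index=dates)
    hf = pd.Series(0.3 * equity.values + rng.normal(0.003, 0.02, len(dates)),
                   index=dates)

    # Two-year (24-month) rolling correlation, as used in the text.
    rolling_corr = equity.rolling(window=24).corr(hf)
    print(rolling_corr.dropna().tail())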
Figure 2.29: Development of the managed futures industry. Data are from the Barclay CTA Index (Gmür [2015]).
The figure shows the strong inflow in 2009 after the GFC, when managed futures were successful and other investments in HFs faced heavy losses. The last four years show stagnation in the growth of AuM. Many events in the recent past made trend following difficult: the euro sovereign debt crisis, Greece, the China crisis of 2015, etc. The zig-zag behaviour of markets due to such events is the natural enemy of trend models, since trend reversal signals come 'too late'. The largest player as of end-2017, with around USD 32 bn, is Winton Capital, followed by Man AHL and Two Sigma Investments. Geographically, the London area dominates, followed by the US and Switzerland. In the last two decades there has been a significant shift from the US to London and other European countries.
12 The abbreviation CTA means Commodity Trading Advisor; CTAs are heavily regulated in the US by the NFA/CFTC. Typically traded instruments are futures (and options) on equities, equity indices, commodities, and fixed income, as well as spot, forwards, futures, and options in the FX asset class.
Are HF fees justified? Titman and Tiu (2011) document that, on average, HFs in the lowest R2 quartile charge 12 basis points more in management fees and 385 basis points more in incentive fees compared to hedge funds in the highest quartile. Feng et al. (2013) find that management fees act similarly to a call option at maturity, and that HF managers can therefore increase the value of this option by increasing the volatility of their investments. For CTAs one observes that very professional investors prefer to set the fixed management fee to zero and instead share even more than 20% of the performance fee.
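To make the option analogy concrete, here is a minimal sketch (with hypothetical parameter values) of a fee with a high-water mark: the fee payoff resembles a call option on the fund's value, and a simple Monte Carlo experiment shows its expected value rising with return volatility.

    import numpy as np

    def hwm_fee(nav_end, high_water_mark, incentive_rate=0.20):
        """Period-end fee: a call-option-like payoff on the fund's value."""
        return incentive_rate * max(nav_end - high_water_mark, 0.0)

    # Expected fee increases with volatility (convex, option-like payoff).
    rng = np.random.default_rng(1)
    nav0 = hwm = 100.0
    for sigma in (0.05, 0.15, 0.30):
        nav_end = nav0 * np.exp(rng.normal(0.0, sigma, 100_000))
        fees = 0.20 * np.maximum(nav_end - hwm, 0.0)
        print(f"vol {sigma:.0%}: expected fee {fees.mean():.2f}")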
Fees are particularly opaque for double-layer funds of funds; see Brown et al. (2004). They find that individual funds dominate funds of funds in terms of net-of-fee returns and Sharpe ratios. The performance fee drives the compensation of HF managers and owners: top hedge fund managers can earn billions of USD in one year, which exceeds the salaries of blue-chip CEOs by a factor of 10 to 30.
The fee discussion continues to damage the reputation of HFs. The California Public Employees' Retirement System (CalPERS) decided in 2014 to divest itself of its entire USD 4 billion portfolio of HFs.
Hedge funds often use leverage to boost returns. Since leverage increases both returns and risks, it is most relevant for low-volatility strategies. Besides return volatility, illiquidity is another risk source for leveraged investments, since the loans are linked to margin calls. This can force HFs to shut down in a crisis, when the HF is unable to cover large margin calls. Ang et al. (2011) find: '... hedge fund leverage decreased prior to the start of the financial crisis in 2007 and was at its lowest in early 2009 when the leverage of investment banks was at its highest.'
Leverage is not constant over time. Cao et al. (2013) find that HFs are able to adjust their market exposure to changing market conditions. HFs also restrict investors' redemptions; for example:
• New investors are often forced into a one-year 'lockup' period during which they cannot withdraw their funds.
Such restrictions protect against fire-sale liquidations causing extreme losses for the HF's remaining investors. The discretionary right to impose withdrawal gates can be very costly for investors if losses accumulate during the period when withdrawing is not possible; see Ang and Bollen (2010). Several studies document a positive empirical relationship between fund flows and recent performance: HF investors seek positive returns and flee from negative returns (Goetzmann et al. [2003], Baquero and Verbeek [2009], and Getmansky et al. [2015]). The relationship between fund flows and investment performance is often non-linear.13
• Survivorship bias and selection bias, i.e. there is a stronger reporting incentive if returns are positive. This bias increases the average fund's return by between 0.16% and 3%; see Ackermann et al. [1999], Liang [2000], and Amin and Kat [2003].
13 See Aragon, Liang, and Park (2013); Goetzmann et al. (2003), Baquero and Verbeek (2009), Teo (2011), and Aragon and Qian (2010) report on some non-linear relations.
• Backfill bias. The primary motivation for disclosing return data is marketing. HFs start to report after they have been successful: they fill in their positive past returns - the 'backfill bias'. Fung and Hsieh (2000) estimate a backfill bias of 1.4 percent p.a. for the Lipper TASS database (1994-1998). Malkiel and Saha (2005) estimate that the return of HFs that backfill is twice the return figure of those not backfilling.
Backfilling means that part of the left tail of the loss return distribution is missing in HF databases. Since large, well-known HFs do not need to engage in marketing by reporting to commercial databases, part of the right-hand return tail is also missing. We recall the findings of Patton et al. (2013) in Section 2.4.5 about the revision of previously reported returns.
Given these biases, why do databases not correct them in a transparent and standardized way when publishing their data? Figure 2.30 shows the impact: if one corrects for survivorship and backfill biases, annualized returns halve.
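A minimal sketch of how such corrections are typically applied in research, assuming one has a fund-month panel that already includes dead funds (which addresses survivorship); dropping each fund's first reported year is a common rough backfill adjustment. Column names are hypothetical.

    import pandas as pd

    def bias_corrected_mean(panel: pd.DataFrame, backfill_months: int = 12) -> float:
        """panel: one row per fund-month with columns ['fund_id', 'date', 'ret'].

        Survivorship: the panel must retain funds that stopped reporting.
        Backfill: drop each fund's first `backfill_months` observations,
        since those are typically filled in retroactively after the fund
        decided to start reporting.
        """
        panel = panel.sort_values(["fund_id", "date"])
        obs_rank = panel.groupby("fund_id").cumcount()
        return panel.loc[obs_rank >= backfill_months, "ret"].mean()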
Figure 2.30: Summary statistics for cross-sectionally averaged returns from the Lipper TASS database from January 1996 through December 2014. The last value - Box p-value - represents the p-value of the Ljung-Box Q-statistic with three reported lags (Getmansky et al. [2015]).
We consider entries into and exits from HF databases. From January 1996 to December 2006, more than twice as many new funds entered the Lipper TASS database each year as exited, despite the high attrition rates. This process reversed in the GFC period. After the peak number of new HFs in 2007-2008, the attrition rate jumped to 21 percent and the average return was at its lowest at −18.4 percent.
The survival rates of hedge funds have been estimated by several authors; see Horst and Verbeek (2007). Summarizing, 30-50 percent of all HFs disappear within 30 months of entry and 5 percent of all HFs last more than 10 years. These rates differ significantly across styles; see Getmansky et al. (2004).
Asness (2014) plots the realized alpha of hedge funds over rolling periods of 36 months. He takes the monthly returns over cash, subtracts 37 percent of the S&P 500 excess return - 0.37 being the full-period, long-term beta - and looks at the annualized average of this realized alpha (see Figure 2.31).
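A sketch of this realized-alpha construction; the inputs are assumed to be monthly excess-return series over cash, and the beta of 0.37 is the full-period value quoted above.

    import pandas as pd

    def rolling_realized_alpha(hf_excess: pd.Series, spx_excess: pd.Series,
                               beta: float = 0.37, window: int = 36) -> pd.Series:
        """Annualized rolling mean of the monthly realized alpha
        alpha_t = hf_excess_t - beta * spx_excess_t."""
        alpha = hf_excess - beta * spx_excess
        return alpha.rolling(window).mean() * 12  # annualize the monthly mean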
We observe a decreasing alpha over time, which ends up negative in the recent past. Recent years seem to have been particular. Unlike for mutual funds, a number of studies document positive risk-adjusted returns in the HF industry before the GFC. Ibbotson et al. (2011) report positive alphas in every year of the period 1995-2009. While the alphas of the HF industry have been decreasing steadily over the last two decades, the correlation with broad stock market indices shows the opposite evolution.
14 Convertible Arbitrage, Dedicated Short Bias, Emerging Markets, Equity Market Neutral, Event
Driven, Fixed Income Arbitrage, Global Macro, Long/Short Equity Hedge, Managed Futures, Multi-
Strategy, and Fund of Funds.
Figure 2.31: Average monthly returns (realized alpha) of the overall Credit Suisse Hedge
Fund Index and the HFRI Fund Weighted Composite Index for a rolling 36 months
(Asness [2014]).
• Agarwal and Naik (2000a), Chen (2007), and Bares et al. (2003) find performance persistence over short periods.
• Brown et al. (1999) and Edwards and Caglayan (2001) find no evidence of performance persistence.
• Fung et al. (2008) find a positive alpha-path dependency. Given that a fund has a positive alpha, the probability that the fund will again show a positive alpha in the next period is 28 percent. The probability for a non-alpha fund is only half this value. The year-by-year alpha-transition probability for a positive-alpha fund is always higher than that of a non-alpha fund.
We consider the performance of the CTAs Winton and Chesapeake. Starting with USD 1 invested in October 1997 and held until early 2013 (Quantica [2015]), the first CTA pays out around USD 9 and the second one around USD 18. Both CTAs had positive returns until the GFC. Then Chesapeake's volatility started to increase and the positive past trend became essentially a flat one. This behaviour is typical of other CTAs too. For Winton, returns hardly suffered during and after the GFC.
Figure 2.32: Monthly return distribution for Fairfield Sentry (line) and S&P 500 (dots) returns (Ang [2013]).
The reason is risk: Winton takes much less risk than Chesapeake. Why can a CTA strategy work? Empirical evidence for the equity index market shows that skewness and the Sharpe ratio are highly positively related: investors are compensated with excess returns for assuming excess skewness rather than excess volatility. Trend-following strategies offer positive risk premia with positively skewed returns. Market participants often believe that hedge funds make excessive use of short strategies. This is not the case for CTAs - around 80% of the investments are long-only strategies and 20% use short strategies.
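A stylized time-series momentum sketch in the spirit of a CTA trend model (an illustration only, not any specific fund's model): go long after a positive trailing 12-month return and short after a negative one.

    import numpy as np
    import pandas as pd

    def trend_following_returns(prices: pd.Series, lookback: int = 12) -> pd.Series:
        """Monthly trend strategy: position is the sign of the trailing
        `lookback`-month return, applied to the following month's return
        (unit leverage, no transaction costs)."""
        monthly_ret = prices.pct_change()
        signal = np.sign(prices.pct_change(lookback)).shift(1)  # trade next month
        return (signal * monthly_ret).dropna()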
Figure 2.33 shows the attribution of the profit and loss to the different asset classes over the last decade. During the GFC, CTAs did not produce positive returns through huge short positions in equity markets but through long positions in the fixed income trend model: the decreasing rates in this period were a constant source of positive returns.
Figure 2.33: Annual sector attribution of the profit and loss for the Quantica CTA (Quantica [2015]).
The study of Aragon and Martin (2012) gives evidence that HFs successfully use derivatives to profit from private information about stock fundamentals. Cao et al. (2013) find that HF managers increase (decrease) their portfolios' market exposure when equity market liquidity is high (low), and that liquidity timing is most pronounced when market liquidity is very low.
They consider equity long/short, emerging markets, equity market neutral, event driven, and global macro strategies. The main results are that the majority of funds are still zero-alpha funds (ranging from 41% to 97% across strategies), similar to mutual funds. But there is a higher proportion of positive-alpha funds compared to mutual funds (0%-45%), and the proportion of negative-alpha funds ranges between 2.5% and 18.6%. The most skilled funds follow emerging market strategies, followed by global macro and equity long/short. The proportion of skilled or unskilled funds differs across market stress periods. But there is no uniform decline of skilled funds over the period from 1992 to 2006 as there is for mutual funds. This is some evidence that successful mutual fund asset managers moved to HFs and/or that markets are less efficient for HF strategies than for mutual fund ones.
Figure 2.34: Monthly correlations of the average returns of funds for the 10 main Lipper
TASS hedge fund categories in the Lipper TASS database from January 1996 through
December 2014. Correlations are color-coded with the highest correlations in blue, in-
termediate correlations in yellow, and the lowest correlations in red (Getmansky et al.
[2015]).
Getmansky et al. (2015) use a factor model based on PCA to gain more insight into possible correlations. The size of the eigenvalues indicates that 79% of the strategies' return variation is captured by the first principal components.
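A sketch of such a PCA-based commonality measure (here on synthetic style returns driven by one common factor): the eigenvalues of the correlation matrix give the share of total variation attributable to each principal component.

    import numpy as np

    rng = np.random.default_rng(2)
    # Synthetic monthly returns for 10 hedge fund styles with one common factor.
    common = rng.normal(0.0, 0.02, (228, 1))
    styles = 0.8 * common + rng.normal(0.0, 0.01, (228, 10))

    corr = np.corrcoef(styles, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    explained = eigvals / eigvals.sum()
    print("variance share of the first component:", round(explained[0], 2))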
The heterogeneity and commonality among HF styles is shown in Figure 2.35. Dedicated Short Bias underperformed all other categories. Multi-Strategy hedge funds outperformed Funds of Funds. Managed Futures funds' returns appear roughly IID and Gaussian. The returns of the average Convertible Arbitrage fund are auto-correlated and have fat tails. The styles Long/Short Equity, Event Driven, and Emerging Markets have high correlations with the S&P 500 total return index, between 0.64 and 0.74. Return volatility of the average Emerging Markets fund is three times greater than that of the average Fixed Income Arbitrage fund.
Figure 2.35: Summary statistics for the returns of the average fund in each Lipper TASS style category and summary statistics for the corresponding CS-DJ Hedge Fund Index from January 1996 through December 2014. Sharpe and Sortino ratios are adjusted for the three-month US Treasury bill rate. The 'All Single Manager Funds' category includes the funds in all 10 main Lipper TASS categories and any other single-manager funds present in the database (relatively few) while excluding funds of funds (Getmansky et al. [2015]).
The CTA Quantica shows a low correlation with the traditional asset classes and also with the global hedge fund index: between 10 and 15% correlation to the S&P 500, USD government bonds 3-5y, and the GSCI commodity index, 24% to the HFRX Global Hedge Fund Index, and 68% to the Newedge CTA index. The large correlation with the CTA index indicates that many CTAs use similar models - broadly diversified trend-following models. Although CTAs show a persistent upward drift in the long run (see Figure 2.36), they may well suffer temporary heavy losses.
Figure 2.36: Drawdown periods for the S&P 500 total return, the GS commodity total return index, and the Barclays US Managed Futures index BTOP 50. Data are from Dec 1986 to Mar 2013 (Bloomberg).
It follows that the CTA index shows much less severe drawdowns than an equity or a commodity index. The main reason is discipline in investment, which has two components. First, CTAs are fully rule-based: if a stop-loss trigger is breached, losses are realized. Second, CTA allocations are risk-based, where again the risk attribution is carried out mechanically. CTAs therefore follow the investment advice of David Ricardo, written in The Great Metropolis (1838): 'Cut short your losses, and let your profits run on.'
But digital disruption has a much broader meaning than efficiency. Innovation and the new entrants, FinTechs and the Tech Giants, can disrupt the ownership of the FIs both on the production and on the customer side. FinTech innovation is driven from an end-customer perspective, and customers will follow this business model. Hence FIs have to adopt this view too and leave their bank-centric approach. Since FIs can integrate or copy the solutions of the many FinTechs, the FinTechs are not a real threat. But the few Tech Giants are: they already have a broad customer base, a technological advantage, and almost unlimited resources. The Revised Payment Service Directive of the European Union, effective 2018, is an example. It has disruption potential since it is an important step towards an open finance system, that is, a system where the end-customers choose their best products and services from a platform and the FIs deliver their services and products to the platform. Compared to the traditional business model, in an open finance economy the link between end-customer and FI is broken and the FIs compete with each other on the platform. Clearly, in an open economy the business becomes much less profitable unless the FI owns the platform. The above directive is a regulatory driver for Tech Giants to offer their superior data analytics capabilities to end-customers.
Besides breaking the customer-FI link, new entrants can act disruptively by changing the topology of the financial architecture (blockchain, cryptocurrencies, platforms). This means redefining the connections between market participants and reallocating ownership rights in the architecture. Their broad assumption is that the action space of FIs in the value chains can be largely reduced, sometimes even completely replaced. In fact, technology is able to replace monopolistic or oligopolistic ownership of key centralized FI functions by decentralized solutions based on game theory (aka blockchain). This defines the infrastructure channel of digital disruption in the financial industry.
The internet revolution of the 90s revolutionized the flow of information by sending information quickly and free of charge to many individuals. This is an efficient way to copy and distribute information. But financial intermediation is based on asset values and their distribution based on contracts. To revolutionize the existing generation and flow of values, the internet solution approach is useless: copying a USD 10 bill for payment purposes is worthless. In a digital value flow, someone has to validate that the payer owns USD 10, that he has not promised it to anyone else, and that the millions of payments in the system are synchronized to prevent fraudulent actions. Formally, transaction feasibility, transaction legitimization, and transaction consensus have to be assured for each transaction at each date. In the current financial world, banks, central banks, and exchanges offer and own these functions: they provide a payment system and they validate, as third parties, the transactions (consensus). Bitcoin, based on mutually distributed ledger technology, proved that completely digitized payment systems and currencies are possible, where code and mathematics replace all functions of the FIs in fiat money banking. Whether the thousands of cryptocurrencies survive is not clear. Each currency needs to reflect an economic value and not just a belief of investors; they need to be competitive (transaction fees, speed, security), ecologically sound (energy consumption), and address the monetary perspective (a stiff supply side).
The attack on the end-customer/FI link is different in nature. The iPhone method of integration makes it possible to integrate all FI activities in a single device. Furthermore, the methods to express and analyze customer needs will in the very near future replace the capabilities and quality of any relationship advisor. This sets the stage for an open finance system: end-customers want a single access point to an intelligent platform, where they can decide in a user-friendly way. The FIs, if they do not own the platform, are reduced to product and service providers and to running the accounts in the background. The quality of the digital services will ultimately create a time- and location-independent emotional relationship with the end-customers. At this stage there will definitively be no further need for human FI interaction. Some FIs have proven able to adapt quickly to a new environment, and they use their powerful resources to act as shapers.
The financial crisis of 2008 can be considered a starting point for digital disruption in the financial sector. Of the 248 European FinTechs surveyed in the Roland Berger (2016) study, 15 were founded before 2008 and the rest after the financial crisis. Three triggers accumulated in this period: the iPhone made it possible to empower the end client, FIs had to spend many of their resources on meeting the regulatory avalanche, and FIs had to increase profitability by lowering costs. The McKinsey (2015) survey of a sample of more than 120,000 FinTech start-ups states:
• Target clients: 62% of the start-ups target private customers, 28% SMEs, and the rest large enterprises.
• Function: most start-ups work in the area of payment services (43%), followed by loans (24%), investments (18%), and deposits (15%).
Even FinTechs consider Tech Giants to be more dangerous for FIs than they are themselves (Roland Berger [2016]). The Big Four - Amazon, Apple, Google, and Facebook - are examples of Tech Giant entrants in the financial sector. Our Western-centric view is meant to simplify the discussion: for each of the Big Four there is a comparable and equally successful Chinese counterpart.15 While the Big Four are less agile than FinTechs, their almost unlimited resources, their strong client base, and their technological advantage make them a real threat for FIs.
Although it has long been speculated that the Big Four will enter the banking business on a large scale, so far this has not happened. Google has had a banking license for Europe since 2011; Facebook requested one, but nothing has happened so far. One can speculate about the reasons: more profitable alternatives, too heavy regulatory costs, or business risk such as the AdWords Business Credit program, which was discontinued. Facebook could offer banking services to its 1.5 billion users who live in countries with unstable political, financial, social, and legal systems. Apple, which is active with Apple Pay, could do a lot more. It is meaningful to ask what disruption could mean here. One scenario is that the Big Four become full FIs. But they could also prefer to take over the end-customer interface due to their superior technology and data analytics methods. This latter model fits well into the so-called open finance paradigm, where end-customers are self-decision makers; they are connected to a platform or a cloud where data analytics methods provide decision-making services, for portfolio management for example; the data of the customers at different FIs are aggregated on the platform; and the best FI is selected to deliver products and services once an end-customer has made a decision. Hence, FIs become pure product providers and lose the interface to their clients. Since 2018, the Revised Payment Service Directive (PSD2) of the European Union points in this direction and puts core banking functions under stress. PSD2 obliges banks which are active in payments to reveal customer data to third parties if the customer wishes so. Banks could then lose a main part of their value chain, since the Tech Giants could use their excellent analytics to provide services to the end-clients. Banks will defend their value chain; their main weapon is the existing payment infrastructure
15 JD.com, Renren, Baidu, Sina, Tencent, and Alibaba are counterparts to the Big Four. Tencent, for example, started with a market cap of USD 210 million at its IPO in 2004, which has since grown to USD 233 billion according to Bloomberg. The user base of its application WeChat grew from 50 million in 2011 to over 800 million in 2016.
which they built up, such as IBAN and SWIFT, and they will pass these costs on to the new entrants. Besides the infrastructure channel discussed above, two further channels of digital disruption are:
• Efficiency channel.
• Customer-centricity channel.
Disruptive efficiency is the classical view of the financial intermediaries: 'All banks are looking at ways to cut costs and also generate more revenues' (Ermotti [2016]). Digital efficiency has a different meaning than past automation-based efficiency, which meant digitalizing the information workflow in a value chain to reduce human activity and to reach scalability: doing the same at lower costs, with fewer errors, and at scale. Disruption means redesigning the workflows and eliminating humans to a degree not seen before, using bots, avatars, or smart contracts. Smart contracts are not only digitized legal documents - say trade confirmations - but they also contain code which makes it possible for the documents to manage themselves over their life cycle. Platforms are a second example of disruption. While platforms have always changed the connectivity of their participants, present platforms possess two new features: not only numerical information flows but any form of information gained from structured and unstructured data, and platforms use AI to analyze, manage, control, and direct the information flow on the platform.
How will assets be protected in a digital world where cyber criminals, governments, and bank defaults define the risk sources? The FinTechs do not have the reputation and size to protect accounts, insurance contracts, and security deposits. The Tech Giants' reputation is decreasing, in particular in the US and Europe. In the FinTech study conducted by Roland Berger (2016), the European FinTechs mentioned customer trust in the financial intermediaries as the only success factor for financial intermediaries: their protection function has worked for decades. So far there is no strong alternative to FIs regarding the safekeeping of money and financial assets.
The WEF 2015 document The Future of Financial Services (FFS) summarizes and extends the discussion. The paper identifies 11 clusters of innovation in six functions of financial services; see Figure 2.37.
The approach of considering six independent intermediary functions and identifying the eleven clusters within these functions is a silo business view. The clusters can be grouped into six themes that cut across traditional functions:
• Niche, specialised products: new entrants with deep specialisations are creating highly targeted products and services, increasing competition in these areas and creating pressure for the traditional end-to-end financial services model to unbundle.
Two years later, in 2017, the WEF working group reconsidered and updated the FFS paper in a status report. Some expected trends had materialized in the two-year period, while for others the expectations were revised due to lack of demand, technological immaturity, or regulatory considerations. The main findings are:
• Fintechs have seized the initiative, defining the direction, shape, and pace of innovation across almost every subsector of financial services.
• Fintechs have reshaped customer expectations, setting new and higher bars for user experience.
Figure 2.37: The six functions (payments, market provisioning, investment management, capital raising, deposits and lending, and insurance) and the 11 innovation clusters (new market platforms, smarter & faster machines, cashless world, emerging payment rails, insurance disaggregation, connected insurance, alternative lending, shifting customer preferences, crowdfunding, process externalization, empowered investors) (The Future of Financial Services [2015]).
• Failure: customer willingness to switch away from incumbents has been overestimated.
• Fintechs have struggled to create new infrastructure and establish new financial services ecosystems.
We close this section with people's sentiments about digital disruption. On a broad scale, two-thirds of the 400 US CEOs surveyed by KPMG in 2016 believed that the next three years will be more critical for the business performance of their companies than the past 50 years. Additionally, Grossman (2016) states in a CEO survey, based on Russell Reynolds Associates, that there are only three industry sectors - Health Care, Asset Management, and Industries - where less than 50% of the CEOs expect massive or moderate digital disruption. For Media, Consumer Financial Services, and Telecom, more than 60% expect such a scenario. 34 percent of the 4,000 Chief Information Officer respondents surveyed in more than 50 countries note that digital disruption is already a reality in their companies, and a further 28% say that this will happen in the next 1 to 2 years (Harvey Nash [2015]). Those responsible for information expect a much stronger disruption for the services industry, due to the lack of physical components, than for the processing, pharmaceutical, or energy sectors.
It is a scientific fact that the financial literacy of the population is at a low level. Hence, any link to end-customers which is based not on a rational but on an emotional paradigm is likely to win the battle for the end-customer connection. Although in principle an emotional link can be formed through human interaction with the end-customer, this approach is not scalable: a client advisor in Europe has between 100 and 400 clients to serve. Therefore a digital link is the more promising solution. But how can communication between a human and a piece of software generate an emotional basis? The software needs to care and to perform. This means understanding the customer's needs in his life-cycle context and giving meaningful advice. AI is pointing in this direction. If this is possible, why should end-customers mind that they do not communicate with a human?
• Product management.
• Solution providers.
The front office consists of the distribution channel and the investment process. In this part of the chain the investor's preferences, risk capacity, and the type of investment delegation (execution-only, mandate, or advisory) are defined. All communication with end clients is made via this channel - new solutions, performance, risk reporting, etc. The investment process, headed by the CIO, starts with the investment view applied to the admissible investment universe. The view is then implemented by the portfolio managers, where different procedures can be followed. More precisely, the investment process has the following sub-processes for mandate clients:
The middle office is responsible for reporting and for controlling the client portfolio with respect to suitability, appropriateness, performance, and risk, and it also constructs the eligible client portfolio. The back office is responsible for the execution and settlement of trades.
Product management defines an eligible, suitable, and appropriate offering for the investor. It is also responsible for overall governance, such as market access and regulatory requirements. The product management strategy tries to understand where the market is headed, how this compares with current products, client segments served, and the firm's capabilities, and how competitors price their services in different channels. Product managers anticipate the people, process, and technology requirements for the product. They also assess gaps versus current capabilities and propose countermeasures. A main function is the new-product-approval (NPA) process office. This office guarantees both an optimal time-to-market and an effective implementation of new products. Finally, product management also oversees out- or insourcing opportunities in the business value chain. The solution providers in the investment process supply the building blocks for implementing the portfolios, including funds, cash products, ETFs, and derivatives.
The infrastructure layer naturally develops, maintains, and optimizes the IT infrastructure for the several functions of the business layer. The technology officer oversees the developments in technology and data management and considers the out- or insourcing opportunities along the infrastructure value chain.
To deal with digital disruption, many leading companies are looking at their businesses and operations anew, taking something of a 'blank sheet of paper' view of the world. Many outsource important parts of their back offices (NAV calculations, onboarding, investor statements, etc.), largely as a reaction to investor pressure following the scandals; see Section 2.4.5. According to PwC's recently released Alternative Administration Survey, 75 percent of alternative fund managers currently outsource some portion of their back office to administrators, and 90 percent of hedge funds do so. While the initial experience has been mixed in many respects, it has helped firms to rethink their business from scratch.
Externalization of processes is a key strategy for FIs in the digital world. FFS classifies different innovations in process externalization:
• Cloud computing improves connectivity with and within institutions. This allows for simpler data sharing, lowers implementation costs, streamlines the maintenance of processes, and enables real-time processing.
• Platforms, real-time databases, and expert systems leverage automation for the users and the solution providers.
• Capability sharing between institutions frees them from building up all possible capabilities themselves and allows the integration of different legal and technical standards.
• External service providers give small and medium-size asset managers access to sophisticated capabilities that were previously unattainable due to lack of scale. This gives them access to top-tier processes, and smaller players become able to compete with large incumbents.
• Cross-border offerings become profitable, with well-controlled conduct and regulatory risk, thanks to the platforms. But externalization could also amplify the risks of non-compliant activities and unclear liabilities when centralized externalization providers fail. Automation also increases the speed at which financial institutions implement regulatory changes; therefore, regulators will receive faster and more consistent inputs from financial institutions.
• Since more capabilities, technologies, and processes are externalized, asset management firms become more dependent on third parties and lose negotiating power and continuity.
If compliance services such as FundApps establish themselves in the future, they could ensure consistent compliance across financial institutions, make the dissemination of regulatory changes in disclosure regimes faster, and reduce the compliance burden faced by the industry (FFS).
Figure 2.39: Left panel: ESG classification of Nasdaq. Right panel: scoring of Vodafone using the Refinitiv scoring system.
16 Sources: Andersson et al. (2016), Global Sustainable Investment Alliance (2017), Van Duuren et al.
(2016), Hong and Kacperczyk (2009).
analysis data starting from the GFC, thereby avoiding the production of noisy and non-robust results that do not reflect how ESG is used nowadays. They use the ESG metrics for each company provided by the Amundi ESG Research department, which are not publicly available; the scoring system depends on the data of four external providers. The data are cleaned, normalized, and checked by data analysts.
They consider five investment universes covered by the MSCI indices North America, EMU, Europe ex EMU, Japan, and World, and three quarterly rebalanced strategies from Jan 2010 to Dec 2017: active management, passive management (optimized index portfolios), and factor investing. Standardization means eliminating geographical or sector biases. For each stock i, its corresponding ICB industry sector code is denoted by I(i) and its score at time t by S(i, t). The Z-score is defined as

Z(i, t) = (S(i, t) − S̄(i, t)) / σ(i, t),

where the bar value S̄(i, t) is the average score of the sector of stock i and σ(i, t) the corresponding sector standard deviation.
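A minimal pandas sketch of this sector standardization (the column names are hypothetical):

    import pandas as pd

    def sector_zscore(scores: pd.DataFrame) -> pd.Series:
        """scores: one row per stock at a fixed date t, with columns
        ['sector', 'score']. Returns Z(i, t) = (score - sector mean) / sector std."""
        by_sector = scores.groupby("sector")["score"]
        return (scores["score"] - by_sector.transform("mean")) / by_sector.transform("std")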
Figure 2.40: Annualized return of ESG-sorted portfolios. Sorted portfolios following Fama and French (1992) are constructed: stocks are ranked quarterly with respect to their score, forming five quintiles, with the equally weighted portfolio Q1 corresponding to the 20 percent best-ranked stocks (Roncalli et al. [2018]).
A main result is that the impact of ESG is highly dependent on the time period, the investment universe, and the strategy. There is no evidence of a consistent reward for ESG integration in stock prices between 2010 and 2013, although in each period there is variability between the five regions; see Figure 2.40. But for 2014 to 2017 most indicators are positive. In North America, buying the best-in-class stocks and selling the worst-in-class generated an annualized excess return of 3.3 percent, and 6.4 percent in the eurozone. For the relative impacts of the three factors E, S, and G we refer to the paper.
Institutional investors who prefer to implement ESG passively can do so by optimizing the tracking error between the benchmark portfolio and a non-ESG-based SAA. It follows that improving the normalized ESG score implies accepting an increase in tracking error: being an ESG investor requires taking on tracking-error risk. This integration of ESG in passive management reduced performance between 2010 and 2013 but improved annualized returns between 2014 and 2017.
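A sketch of this trade-off under simplifying assumptions (a synthetic covariance matrix, stylized ESG scores, long-only weights, and a linear portfolio ESG score): minimizing tracking error to an equally weighted benchmark subject to a minimum ESG score shows the tracking error rising with the ESG target.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    n = 8
    A = rng.normal(size=(n, n))
    cov = A @ A.T / n + 0.01 * np.eye(n)   # synthetic covariance matrix
    bench = np.full(n, 1.0 / n)            # equally weighted benchmark
    esg = np.linspace(-1.0, 1.0, n)        # stylized normalized ESG scores

    def te_squared(w):
        d = w - bench
        return d @ cov @ d

    for target in (0.0, 0.3, 0.6):         # required portfolio ESG score
        res = minimize(
            te_squared, bench, method="SLSQP", bounds=[(0.0, 1.0)] * n,
            constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0},
                         {"type": "ineq", "fun": lambda w, t=target: esg @ w - t}],
        )
        print(f"ESG target {target:.1f}: tracking error {np.sqrt(res.fun):.4f}")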
The authors characterize the asset pricing implications in order to better identify and understand the drivers of performance. Figure 2.41 shows four possible hypotheses for the relation between the ESG score and return or risk, respectively. Configuration (b) is not observed in North America and the eurozone, whatever the score used; instead, a skewed-risk market configuration is observed.
Table 2.23: Asset values in bn USD. Source: Global Sustainable Investment Alliance.
Sustainable investments extend across the range of asset classes. The majority, 51 percent of the assets, was allocated to public equities, followed by fixed income with 36 percent. Real estate/property and private equity/venture capital each held 3 percent of global sustainable investing assets.
Data show that humankind is facing climate change which will largely be irreversible. Global temperature anomalies of the recent past compared to the 1951-1980 baseline show that the last years were the warmest ones; see Figure 2.42.
Energy demand will further increase due to population growth, progressing industrialization, and increasing wealth. Without any countermeasures, this will increase man-made CO2 emissions. But keeping in line with the 2-degree goal requires a drastic reduction of CO2 emissions: for most countries a 30 percent reduction suffices, but 50 percent would be better.
As stated above, we do not focus on governmental law or laissez-faire, but we consider cases where it pays economically to reduce CO2 emissions and to invest in energy efficiency. The results depend on the following facts:
Figure 2.42: 2015 was the warmest year in the NASA/NOAA temperature record, which
starts in 1880. It has since been superseded by the following years (NASA/NOAA; 20
January 2016).
• There is enough clean energy, i.e. energy which can be used to replace CO2-emitting energy.
• There exist financial market solutions which match investors' demand for sustainable investments with the demand for energy project finance.
Figure 2.43 shows the impact of potential climate change on different dimensions of humanity and the ecosystem.
Figure 2.43: Impact of potential climate change. Sources: Stern Review, IPCC, 4th
Assessment Report, Climate Change 2007, WWF and Credit Suisse 2011.
The increase in CO2 over the last 100 years has led to a measurable change in climate. The research data show that climate change relates less to risk and more to uncertainty: there is a lack of knowledge about the speed, the irreversibility, possible feedback effects, and some hidden non-linearities. The worst-case scenarios forecast a drop in GDP of between 5 and 20 percent. Estimates by Credit Suisse and other institutions state that investment flows of between USD 700 and 2,000 billion per annum are required over the next decade to limit warming to 2 degrees Celsius. This would amount to around 2 percent of global GDP per annum. The majority of the required capital investment is concentrated in low-carbon energy, energy efficiency, and low-carbon transport infrastructure. Low-carbon energy is primarily linked to investment in renewables and electricity infrastructure such as grids, transmission, and storage. The opportunity is concentrated in China, the US, and the EU27, which together represent nearly 60 percent of the mitigation cost. Figure 2.44 shows the distribution of the investments necessary to achieve the 2-degree pathway; the matrix has the dimensions geographical location and area of investment. The authors state different types of barriers affecting the current decarbonization efforts. Since regulatory mechanisms which price the externalities of carbon emissions do not yet exist, technical and financial barriers persist: the economics of low-carbon projects are often less attractive than those of their high-carbon alternatives. Structural barriers include network effects (consumers will not buy electric cars unless there are workable and available charging solutions, but private investors hesitate to build a charging network unless there is sufficient demand), agency problems (under existing structures, the party making the low-carbon investment is often not the one which will benefit from the savings), and the status quo bias (a strong bias towards maintaining the status quo instead of making changes).
• Political risk. Countries such as Algeria, Libya, and Saudi Arabia, given their natural oil resources, would seem to have no interest in a solar energy project. But this is not the case; in fact, Saudi Arabia owns leading solar energy institutes. Furthermore, a solar energy project would benefit employment and job creation in these countries to a far larger extent, and due to the excess solar energy these states would be able to create new farming land.
• Why should these countries produce energy for Europe given their own needs? This point also led to heavy debates in Europe and to a standstill of the project.
• Another risk factor is political instability. The events starting in 2011 demonstrate that this risk exists and that without the protection of an army the project cannot be sustained. But which army should guarantee the functioning of the technology?
Figure 2.44: Annual investment required to achieve the 2 degrees Celsius pathway is USD 700 bn. Sources: Credit Suisse/WWF analysis (2011) based on McKinsey's Climate Desk tool.
• A further political risk is that the middle and northern parts of Europe would depend on a single point of entry in the Mediterranean region.
• Technology and financing risk. Natural damage risks and energy losses in transport are not material. But the need to construct a new powerful energy infrastructure in Europe raises delicate financing issues.
Besides the DESERTEC example, other examples show the risks of large-scale environmental projects. The Lisbon Strategy, adopted in 2000, largely failed on its three pillars, where the environmental pillar recognized that economic growth must be decoupled from the use of natural resources. The overly complex structure with multiple goals and actions, an unclear division of responsibilities and tasks, and a lack of political engagement from the member states led to its failure, and the GFC was then the final blow to the strategy. At the Spring Summit 2010, EU leaders endorsed the European Commission's proposal for a Europe 2020 strategy. This new strategy puts knowledge, innovation, and green growth at the heart of the EU's blueprint for competitiveness and proposes tighter monitoring of national reform programmes, one of the greatest weaknesses of the Lisbon Strategy.
Another example is water pollution in Switzerland in the 1960s. Through incentives defined and financial support provided by the Swiss government, a new industry emerged (sewage treatment plants) and the treatment of farming land changed. After some decades, water from Swiss lakes and rivers is often potable. In the US, acid rain led to the implementation of the Clean Air Act, which also solved that problem.
Under such a cap-and-trade scheme, each entity receives permits to emit some portion of the region's total amount. If an organization emits less than its allotment, it can sell or trade its unused permits to other businesses that have exceeded their limits. Entities can trade permits directly with each other, through brokers, or in organized markets.
The first green bond was issued in 2007 by the European Investment Bank (EIB) and the World Bank. For more details see www.climatebonds.net, from which the following data are taken. In November 2013 the first corporate green bond was issued by a Swedish company. Tesla Energy issued the first solar ABS in November 2013. The biggest ABS issuer is Fannie Mae; ABS include solar ABS, green MBS, green RMBS, green CMBS, and other types. Green bond issuance in 2018 reached USD 167.3 bn, with over USD 500 bn currently outstanding.
Using debt capital markets to fund climate solutions
The majority of the green bonds issued are green 'use of proceeds' or asset-linked bonds. The following products fall into the green bond category (taken from www.climatebonds.net):
• 'Use of proceeds' bond. Proceeds from these bonds are earmarked for green projects. The same credit rating applies as for the issuer's other bonds. Barclays Green Bonds are an example.
• 'Use of proceeds' revenue bond or ABS. Earmarked for the financing of green projects; revenue streams of the issuer through fees, taxes, etc. are collateral for the debt. The Hawaii State ABS, backed by a fee on the electricity bills of the state utilities, is an example.
• Project bond. These are ring-fenced for the specific underlying green project. Recourse is only to the project's assets and balance sheet. An example is the Invenergy Wind Farm bond, which is backed by the Invenergy Campo Palomas wind farm.
• Covered bond. These are earmarked for eligible projects included in the covered pool. Recourse is to the issuer and to the collateral pool. The Berlin Hyp green Pfandbrief is an example.
• Loan. A loan is not a security. Loans are earmarked for eligible projects, with full recourse to the borrower in the case of unsecured loans and to the collateral in the case of covered loans. Examples are MEP Werke, Ivanhoe Cambridge, and Natixis Assurances (DUO).
The benefits for issuers outweigh the additional costs of green bonds compared to non-green bonds, which arise because issuers must track, monitor, and report on the use of proceeds. The benefits for the issuers are reputation, branding, and the build-up of know-how about environmental investments.
Green bonds are priced flat to ordinary bonds of the same issuer, i.e. they rank pari passu with vanilla issuance. As an outlook, investors with USD 45 trillion of assets under management have made public 'commitments' to climate and responsible investment. This is around 50 percent of all AuM.
• Energy solution provider. A corporation provides the technology to realize the energy cost gains. The energy solution should lead to a substantial reduction in energy costs.
As an overview, the following figures hold as rough rules when buildings are made energy-efficient. The data are from Siemens (2015).
'Measure and visualize' means that a firm makes its energy consumption transparent at well-chosen locations within the firm. Elevators are often used, since most people in an elevator search for a fixed point to focus on; the entrance lobby is also well suited. Several studies have by now reported that simple transparency or monitoring, without any other actions, leads to an energy reduction of about 10 percent. It seems that such transparency changes the behavior of some employees, leading to this reduction.
When it comes to financing the project, a major requirement is that the project also makes economic sense; that is, we require Gain > 0. The gain is the sum of the investment costs I and the savings of energy costs over time. The saving of energy costs has four risk sources:
• Energy volume risk, i.e. the amount c_t of saved energy is

c_t = c̄ + dc_t,

with c̄ the expected amount of saved energy once the project is finished and dc_t the risk of deviation from that expectation.
• Energy price risk, i.e. the price p_t of saved energy (oil, electricity, or a mixture of them) is

p_t = p̄ + dp_t,

with p̄ the forward/futures price and dp_t the deviation risk from the forward prices.
• The last risk is the counterparty risk of the energy solution user - here the city administration. Depending on the type of financing of the project, the counterparty risk matters for the investors or not; see below for details. We write default risk in the form u = 1 − dk, with 1 for not defaulting and dk the expected default rate.
The gain of the project can be written symbolically - i.e. without using summation and discounting notation, but focusing on the different parts of the gain function - as follows:

Gain =  Ī,            expected investment costs;
        dI,           investment risk;
        c̄ × p̄,        estimated savings (costs and volume);
        dc × p̄,       volume risk;
        c̄ × dp,       energy price risk;
        dc × dp,      cross risk;
        dk × c × p,   default risk.
This defines the risk profile for the city without any structuring of risk. The next question therefore is: who bears which risk? Professional technology providers keep the investment and volume risk due to their experience and their large project portfolios: variations in these two factors are absorbed within a large project portfolio. Consider an investor. The investor is willing to pay the expected investment costs Ī in exchange for participating in the future energy savings. That is, the city and the investor share the future energy savings: the city participates with c̄ × p̄ × a and the investor with c̄ × p̄ × (1 − a). This defines the performance contract. The function a defines the future participation as a function of time. Since the investment has to be paid back to the investor, he will participate more strongly at the beginning than the city; otherwise, the payback time increases. In this set-up the whole investment is risk-free for the city. The only risk which is not attributed is the default risk of the city: either it is passed on to the investor and compensated, or the bank keeps this risk. This type is a structured product solution.
Other possible solutions are:
Before we consider some of these solutions, we provide an example for the structured product. Assume a project with a payback time of 4 years; then the amount of saved energy is c̄ = 25%. Assume that the project costs 100 in some currency, that a increases linearly from 10 to 40 percent (so that 1 − a decreases linearly from 90 to 60 percent), that energy price risk is ±2 percent per annum, that the default risk of the city is 10 bps, that the fees for structuring the deal are 1 percent per annum, and that interest rates are flat at 2 percent.
Then:
• After 5 years the investment amount is amortized, i.e. the years 6-8 generate return for the investor.
• The return for the investor is 6.3 percent in the case of constant energy prices, 5.3 percent if energy prices fall by 2 percent each year, and 7.1 percent in the monotonically increasing case. This return has to be corrected for a possible default of the city: if the investor does not want to take this default risk, the returns are lowered by the credit risk costs for the city. (A numerical sketch of these cash flows follows below.)
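The sketch below reconstructs the cash-flow mechanics under the stated assumptions; structuring fees, discounting, and the city's credit costs are ignored here, so the gross numbers it prints differ from the net returns quoted above.

    import numpy as np

    investment = 100.0
    years = 8
    savings_volume = 25.0                  # c-bar: 25% of project cost per year
    a = np.linspace(0.10, 0.40, years)     # city's share, 10% -> 40% linearly
    investor_share = 1.0 - a               # investor's share, 90% -> 60%

    for drift, label in ((0.00, "flat prices"), (-0.02, "falling"), (0.02, "rising")):
        prices = (1.0 + drift) ** np.arange(years)     # energy price path
        cf = investor_share * savings_volume * prices  # investor's yearly cash flow
        payback_year = int(np.searchsorted(np.cumsum(cf), investment)) + 1

        # Gross IRR of (-investment, cf_1, ..., cf_8) by bisection on the NPV.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            r = 0.5 * (lo + hi)
            npv = -investment + sum(c / (1 + r) ** (t + 1) for t, c in enumerate(cf))
            lo, hi = (r, hi) if npv > 0 else (lo, r)
        print(f"{label}: amortized in year {payback_year}, gross IRR ~ {r:.1%}")

With flat prices the investor's cumulative cash flows cross 100 in year 5, consistent with the amortization statement above.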
Finally, if an investor wishes to get rid of energy price risk, the structuring delivers him fixed energy prices or prices kept within a bandwidth.
Of the other financing possibilities we only mention the green bond. This bond is issued by the city as an ordinary bond. The difference to such a bond is the coupon payment: the value of the coupon each year is determined by the price of the saved energy amount, i.e. it is a coupon derived from the underlying value 'energy price × saved energy volume'.
Clearly, such a construction requires strong legal and documentation work for and between the different parties. Furthermore, moral hazard issues exist: the energy solution provider can charge an excessive price I for the investment to cover possible price risk dI, or it can predict biased low saved-energy amounts to reduce its energy volume risk. To avoid such potential disincentives, a simple solution is to let the energy firm itself invest in the project, i.e. to take a part of the investor's stake. This reduces the moral hazard related both to the investment amount and to the expected energy volume savings, since systematic deviations reduce the return on investment.
Figure 2.45: Comparing global GDP growth (percent, annual, real) in the Great Recession and the Great Depression for the US and developed non-US countries (Bacchetta and van Wincoop [2013]).
There was basically no difference during the Great Recession between GDP growth in the US and that in the G20 states, which represent the main worldwide economy without the US. But in the Great Depression, the decline in US GDP growth did not spread with comparable intensity to the rest of the world. This indicates that while the Great Recession can be called a global crisis, the Great Depression was more local in nature. The authors show that the Great Recession was, in historical terms, the first global recession. The first question is: how could the crisis spread from the US financial sector to the US real sector? The second question is: why did the Great Recession spread almost instantaneously from the US economy to the global economy - how did the recession become a global one?
As many authors have shown, the financial crisis was part of the so-called boom-bust cycle of the real economy. Of particular importance are real-estate boom-bust cycles. Reinhart and Rogoff (2008) illustrate the following pattern. Let T be the date of a banking crisis, and consider the growth rate of the real-estate asset class some years before and after this date. One typically observes that prices increase before T and fall after, or shortly before, the banking crisis. In this sense, a financial crisis is part of a boom-bust cycle. The surprising aspect of the most recent crisis was not that it happened, but that such a crisis could be strong enough to destabilize the financial system of a developed economy (here, the US).
Given this US view, how could the recession become a global one? The standard channel for explaining global linkages is trade. But the US is not a very open economy, and imports to the US are - for many countries - relatively small. There is no empirical evidence of a link between openness in terms of trade and a decline in growth. Hence, the macroeconomic trade channel fails to explain how the recession spread globally. Another possible channel is the financial channel: that is to say, the decline in asset prices and real-estate prices and changes to the credit supply channelled into the real economies outside the US. But this hypothesis is not supported by empirical evidence either. While real-estate prices dropped in, say, Spain and Ireland, they did not in Germany or Switzerland. While Switzerland has a much stronger financial link to the US than most European countries do, the European countries were much more affected by the Great Recession. While some countries faced a decline in credit supply, others did not. Although policy makers have often used the expression 'credit crunch', firms participating in surveys about the period have indicated that - during the Great Recession - lower demand was more important to them than reduced credit supply. Summarizing, standard macroeconomic models cannot explain the global recession.
Bacchetta and van Wincoop (2013) argue that there must have been other drivers that caused the global recession. They argue that it was not the globalization of the economy, as considered above, but rather the globalization of how individuals form expectations that was responsible for the recession spreading worldwide. This argument is, of course, linked to questions of information technology, information transmission, and information quality in worldwide terms. In contrast with the past, information today spreads around the world almost in real time, it is more difficult to control information distribution, and mainstream information is mostly costless to the consumer. Therefore, one can argue that - given a financial crisis and its related information flow - individuals around the world had access to similar information sets upon which to form their expectations. The authors claim that panic, by consumers and firms throughout the world, led to declines in aggregate demand in most countries. Such panic must have a systemic component to have a worldwide impact. They therefore assume that such panic is rational or self-fulfilling:
• Agents first expect low future income due to the information available and the uncertainty at play at the beginning of the financial crisis.
• This leads to low future production and income, which matches the agents' expectations as outlined in the first step.
Chapter 3
Fundamentals Theory
3.1 Returns and Performance Attribution
Returns are key in asset management for the calculation of risk and performance. The calculation of returns is not as straightforward as one might guess. One needs to calculate returns for arbitrarily complicated cash flow profiles where cash can be injected or withdrawn at different dates. Different assets possess different time scales for return calculations, varying from intraday to months for illiquid assets. Returns often need to be aggregated for risk calculations to reduce the dimensionality, and risk models are needed to value expected returns. Finally, the return for an investor can be the result of several money managers, i.e. returns should be decomposable to account for the different contributors.
Why do we work with returns and not with prices? Price time series are statistically hard to handle: the mean value has little meaning if prices grow exponentially. One therefore works with a scale-free quantity: returns. Why does one work with log-returns? The simple return over a period, $\frac{S_t - S_{t-1}}{S_{t-1}} \in [-1, \infty)$, is not useful if one tries to model returns assuming a normal distribution, since simple returns range from $-1$ (total loss) to $+\infty$ while the normal distribution ranges over the reals. Furthermore, the 10-day gross return is the product of ten one-day gross returns and not the sum. But the product of normal random variables is not normal. One therefore prefers to work with log-returns, where the product in return aggregation is replaced by a sum, and the sum of normally distributed log-returns is again normal.
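A minimal numerical sketch of this point (the prices and variable names are illustrative, not from the text): gross simple returns aggregate multiplicatively, while log-returns simply add.

```python
import numpy as np

# Hypothetical daily prices (illustrative values only).
S = np.array([100.0, 102.0, 99.0, 101.0, 105.0])

simple = S[1:] / S[:-1] - 1          # simple returns, in [-1, inf)
log_r  = np.diff(np.log(S))          # log-returns, on the whole real line

# Multi-period aggregation: gross simple returns multiply,
# log-returns add.
gross_total = np.prod(1 + simple)
log_total   = np.sum(log_r)

assert np.isclose(gross_total, S[-1] / S[0])
assert np.isclose(np.exp(log_total), S[-1] / S[0])
```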
$$u(c_t) \geq u(c_T)$$
with the utility function u. To make the investor indifferent, the consumption good at time T must be larger than at time t, i.e.
$$c_T = c_t + \Delta = (1+R)c_t$$
with $\Delta$ the interest and R the interest rate to compensate for impatience.
The function which weights CFs at different dates T > t is the discount function D(t, T). Discounting restores additivity, which makes it possible to add CFs at different dates; any two complicated cash flow profiles can then be compared for investment purposes. Discounting is the necessary ingredient such that the price of a product can be written as a probability- and time-weighted sum of future cash flows. Given the CF additivity, they can be mapped to a single point: the present value (PV) or the future value (FV). It is irrelevant which date is chosen for mapping all CFs when comparing two investment opportunities.
The discount function has the form D(t, T) = D(T − t), i.e. homogeneity of time or the irrelevance of the vista time. The inverse operation of discounting is compounding, $D(t,T)D(t,T)^{-1} = 1$. If the discount factor is 1, interest rates are zero; if it is larger than one, interest rates are negative.
Consider CHF 1 at time T and two scenarios. First, discount the CHF back directly to t. Second, discount it first back to a time s, t < s < T, and then from s to t. There is no risk. The value at t of the Swiss franc should be independent of the chosen discounting path; else, buying low and selling high generates a money machine (arbitrage in a risk-free environment). Formally,
$$D(t,s)D(s,T) = D(t,T), \qquad D(t,t) = 1. \qquad (3.1)$$
Cauchy proved that the exponential function is the unique continuous function which satisfies (3.1):
$$D(t,T) = e^{-a(T-t)}, \qquad a > 0.$$
This motivates exponential discounting. The parameter a has the dimension of inverse time, and calculating the growth rate of the discount factor, $\frac{\partial D / \partial T}{D} = -a$, identifies a with the interest rate R.
The discount function D(t, T) for different maturity dates T defines the spot rate term structure, which we write $\{D(t,T)\} := \{D(t,T),\ T \geq 0\}$. Assume that there exists interest rate risk. Then equation (3.1) makes no sense, since D(s, T) is a random variable. To restore the identity, we have to fix the rate between s and T at time t, i.e. with the discount factor D(t, s, T) we again have D(t, s)D(t, s, T) = D(t, T). This defines the forward rate term structure $\{D(t, s, T)\}$. Given one term structure, the other follows by no arbitrage.
• The par swap rate curve is the vector of spot-starting swap rates for all maturities.
Table 3.1: To obtain the discount factors from the swap rates we use (3.5). To get the spot rates from the discount factors we use $R(0,T) = \left(\frac{1}{D(0,T)}\right)^{1/T} - 1$, and the forward rates are calculated as $1 + F(0,S,T) = \frac{D(0,S)}{D(0,T)}$ for consecutive annual maturities. The day-count factor reads act/360/100 = 365/36'000 = 0.0101388.
The absence of arbitrage implies that there exists exactly one discount factor for each currency and for each maturity; else one can build a money machine. But there are many different interest rate, profit and loss, and performance calculations. The reasons are:
• Do we use market rates for discounting, or synthetic rates from an asset management perspective, such as the yield-to-maturity (YtM), to value and compare different investments?
• The calendar and day-count conventions differ: the number of days within a year varies for different countries, exchanges and products.
Examples
Compounding
Hence, $FV_{nd} \geq FV_{ns}$. The formulae can be generalized to the case with sub-annual periods and where R is not constant. The limiting future value is achieved for instantaneous interest rates, which results in the exponential compounding formula as the limit of how fast capital can grow.
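As a short hedged illustration of this hierarchy (the numbers PV, R and n are assumptions for the example, not from the text), the following sketch compares simple, discrete and continuous compounding:

```python
import math

PV, R, n = 100.0, 0.03, 10   # illustrative present value, rate, years

fv_simple     = PV * (1 + R * n)          # simple interest
fv_discrete   = PV * (1 + R) ** n         # annual compounding
fv_continuous = PV * math.exp(R * n)      # instantaneous compounding (upper bound)

assert fv_simple <= fv_discrete <= fv_continuous
print(fv_simple, fv_discrete, fv_continuous)
```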
Remarks:
• Simple discounting is used for LIBOR rates and products with maturity less than a year, discrete compounding for bonds, and continuous compounding for derivatives or Treasury bills.
The discount function is a simple function of the interest rate. But the interest rate itself is a complicated function of a risk-free rate, the creditworthiness of counterparties, liquidity in the markets, etc. The construction of the discount function is the key object in financial engineering.
Let p(t, T) be the price of a zero-coupon bond (ZCB) at time t paying USD 100 at maturity T if there is no default. Apart from counterparty risk, a ZCB is the same as a discount factor. ZCBs are the simplest interest rate products. More complex products such as coupon-paying bonds can be written as linear combinations of ZCBs. Consider a coupon bond with a yield R, i.e. R is the rate such that the PV of the bond's cash flows equals its present price. The slope of the price-yield graph is negative, since a bond issued today will have a lower price tomorrow if interest rates increase (opportunity loss). The relation is non-linear since
$$p(t,T) = D(t,T) \times 1 = \frac{1}{(1+R(t,T))^{T-t}} \times 1.$$
The effective simple rate $R_{e,s}$ is the gross return needed to reach the FV from the PV:
$$(1 + R_{e,s})PV := FV.$$
Consider $R_{e,s}$ for an n-year investment in a stock S (where $PV = S_0$, $FV = S_n$):
$$1 + R_{e,s} = \frac{S_n}{S_0} := \frac{S_n}{S_{n-1}}\frac{S_{n-1}}{S_{n-2}}\cdots\frac{S_1}{S_0} = \prod_{k=1}^{n}(1+R_{k,k-1})$$
where $R_{k,k-1}$ is the sub-period return. The effective simple gross return is equal to the product of the period returns. The compounded effective rate $R_{e,d}$ follows by taking the n-th root in the above formula. If compounding is continuous, the effective return is equal to the arithmetic sum of period returns, since the log of a product is a sum. This is one reason why continuous compounding is preferred.
Bond 1 has more attractive future CFs but bond 2 is cheaper. Which one should be preferred? If maturity increased, bond 1 should become more profitable; the opposite holds if the price of bond 2 becomes cheaper compared to bond 1. The yield-to-maturity (YtM) y is a decision criterion which assumes that products are kept until maturity. The YtM y solves by definition the equation:
$$\text{Price} = \sum_{j=1}^{n} \frac{c}{(1+y)^j} + \frac{N}{(1+y)^n}.$$
The bond with the higher y is the preferred one. This equation is easily solved numerically. The YtM, which has a flat term structure, is the most important example of a Money-Weighted Rate of Return (MWR), see below.
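Since the YtM equation has no closed form, a minimal numerical sketch is given below. The bond data (price 98, coupon 5, notional 100, five years) are assumptions for the example; bisection works because the price is decreasing in y.

```python
def bond_price(y, coupon, notional, n):
    """PV of a bullet bond with annual coupon and maturity n years."""
    return sum(coupon / (1 + y) ** j for j in range(1, n + 1)) + notional / (1 + y) ** n

def ytm(price, coupon, notional, n, lo=-0.99, hi=10.0, tol=1e-10):
    """Solve bond_price(y) = price by bisection (price is decreasing in y)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bond_price(mid, coupon, notional, n) > price:
            lo = mid      # model price too high -> yield must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative bond: price 98, coupon 5, notional 100, 5 years.
print(round(ytm(98.0, 5.0, 100.0, 5), 6))
```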
Originally, IRS were introduced for interest arbitrage reasons. Consider two firms A and B. A has a high creditworthiness, B a low one. Both firms can borrow at a fixed or floating rate given in Table 3.2.

             A        B                 Difference
Fixed        5%       6%                1%
Floating     LIBOR    LIBOR + 0.75%     0.75%

Table 3.2: Borrowing rates of the two firms.

The two firms can both benefit if they enter into an IRS, since the difference in fixed rate borrowing differs from the floating rate one by 0.25%. Both parties can realize and divide this amount using an IRS. To lock in the profit, each party borrows where it has a comparative advantage: A borrows fixed and B floating. A agrees to pay B the floating rate LIBOR plus 0.75 percent, and B agrees to pay A fixed 5.9 percent. A gets floating rate funding at LIBOR minus 0.15 percent, and B gets an advantage in fixed funding of 0.1 percent.
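A small sanity check of this comparative advantage arithmetic (rates from Table 3.2 and the swap just described; the variable names are ours):

```python
# Swap: A pays B LIBOR + 0.75%, B pays A fixed 5.9%.
fix_A, fix_B = 0.05, 0.06
spread_B = 0.0075            # B's floating spread over LIBOR

# A borrows fixed at 5%, receives 5.9% fixed, pays LIBOR + 0.75%.
# Net floating funding cost of A, as a spread over LIBOR:
a_spread = fix_A + spread_B - 0.059        # = -0.0015 -> LIBOR - 0.15%

# B borrows at LIBOR + 0.75% (cancelled by A's payment), pays 5.9% fixed.
b_fixed = 0.059                            # vs 6% direct -> 0.10% advantage

total_gain = (fix_B - fix_A) - spread_B    # 0.25% to be shared
assert abs((-a_spread) + (fix_B - b_fixed) - total_gain) < 1e-12
```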
The first swap was designed in 1981 between the World Bank and IBM, see Figure 3.1. IBM received DM from its funding program and used this money to finance projects in the US. Therefore, IBM needed to exchange USD for DM periodically to service the coupon payments. Since the USD strengthened against the DM in that period, IBM made currency gains. To realize these gains, IBM needed to get rid of its DM liabilities. The World Bank borrowed in the capital markets and lent to developing countries for project finance. The costs of the loans were the same as the financing costs of the World Bank in the markets. US interest rates were at 17 percent in this period, while in Germany and Switzerland they were 12% and 8%, respectively. The World Bank preferred raising all funds in the lower interest rate currencies. But it was constrained in borrowing in these countries and also had to use USD. It searched for a party which owed DM and wanted to exchange them against USD. An investment banker at Salomon Brothers realized that a currency swap would
1 The expression vanilla is used for basic or simple derivatives. Call and put options are examples. More complicated products are called exotic.
2 For the fixed payments the day-count convention of bond markets, 30/360, is used, and act/360 is used for the floating leg. '30/360' means that each month has 30 days and each year has 360 days. The convention 'act' means that the actual calendar dates are summed. The reset frequency is the frequency of floating payments.
[Figure 3.1: Diagram of the swap: the bond market, the World Bank, and clients with existing DM & CHF loans; flows of USD and of DM & CHF coupons, notionals, and loan repayments.]
solve the problems of both parties: IBM could change its DM liabilities into USD, and the World Bank could buy DM at favourable rates. The World Bank lent IBM the notional amounts and coupons denominated in DM and received notional and coupons in USD in exchange. Such a direct swap without involvement of a bank's balance sheet is a back-to-back swap.
In the 80s and 90s banks started to enter into own-name transactions. The swap counterparties discussed their desired risk and return profile directly with the bank as intermediary. Entering in between the two swap parties, the bank faced counterparty risk. One also started to develop standardized documentation which allowed customized transactions to be processed effectively: the ISDA agreements. The third period was characterized by the beginning of market making. Banks started to trade swaps with several counterparties. Market and counterparty risk increased due to these wider activities, and large investments in risk management followed. Market risk was often compensated with transactions in other markets.
since at initiation no cash flows are exchanged. Fixed payments $s_{0,T}(0)$ are made annually,3 floating ones quarterly.4 Figure 3.2 shows the replication of a swap by a par fixed bond and a floating rate note (FRN). We prove that the PV of an FRN must be worth par at each quarterly LIBOR reset date. Since the initial value of a swap is zero, the initial value of the fixed leg must then also be worth par.
Solving for the swap rate and using $\mathrm{PV(Float)} = (1 - p(0,T))N$ we get
$$s_{0,T} = \frac{1 - p(0,T)}{A_{0,T}(0)} = \frac{\text{PV Floating}}{\text{Annuity}} \qquad (3.3)$$
where $A_{0,T}(0) = \sum_{j=1}^{T} p(0,j)$ is the present value of an annuity and p(0, t) is the price of a zero-coupon bond; $A_{0,T}(0)$ is also called the level of the swap.
Proposition 13. The PV of a floating rate note is equal to the notional.
Since D(0, 0) = 1, the PV of the floating leg equals the notional (see Table 3.3).
Table 3.3: Valuation of an FRN. The forward rates are calculated using simple compounding:
$$F(0,S,T) = \frac{\frac{1+T\times R(0,T)}{1+S\times R(0,S)} - 1}{T - S}.$$
The FRN cash flows are derived from $\mathrm{CF}(T) = 10\,000 \times \frac{F(0,S,T)}{2}$, and the PVs follow from $\mathrm{PV}(\mathrm{CF}(T)) = \frac{\mathrm{CF}(T)}{1 + T\times R(0,T)}$.
We close with some OTC market figures. Figure 3.3 shows notional and gross amounts in OTC markets. Notional amounts are USD 600 trillion, which is 8 times worldwide GDP. The gross amount is more than a factor 10 smaller. The markets cover OTC foreign exchange, interest rate, equity, commodity and credit derivatives.
3 We assume that the difference between two consecutive dates is constant.
4 They are equal to act/360 times the 3m LIBOR rate at the beginning of the quarter. This is called setting in advance and paying in arrears.
Figure 3.2: Graphical representation of a payer swap replication (payer means the party which pays the fixed rate and obtains the floating one). Dotted lines represent floating cash flows. Replication is obtained by virtually adding and subtracting notional amounts at the beginning and at maturity of the swap. We assume for simplicity the same periodicity for the floating and fixed leg. The figure shows an important property of risk structuring: to obtain the cash flow profile of a new product, one can add new products to an existing profile vertically.
Figure 3.3: OTC market figures; left panel: notional amounts, right panel: gross market values. The statistics at the country level are based on data reported every six months by dealers in 12 jurisdictions (Australia, Canada, France, Germany, Italy, Japan, the Netherlands, Spain, Sweden, Switzerland, the United Kingdom and the United States) plus data reported every three years by dealers in more than 30 additional jurisdictions. (Source: BIS, 2018)
The gross positive market value is the sum of the replacement values of all contracts that are in a current gain position to the reporter at current market prices, and similarly for the gross negative market value. The gross market value is the sum of the two absolute values. Gross means that there is no netting or offsetting. Gross market values supply information about the potential scale of market risk in derivatives transactions, and they are a measure of comparable economic significance across markets and products.
• At time 12m the client has to pay CHF $-10(1+L(\text{6m},\text{6m}))$ Mio. without an FRA. But the client would like to pay CHF $-10(1+F(0,\text{6m},\text{6m}))$ Mio.
• An FRA contract pays/receives an amount A in 6m, worth $A(1+L(\text{6m},\text{6m}))$ in 12m, such that A balances the payments in 12m between the unwanted risky payment without an FRA and the wanted fixed payment. A solves in 12m the equation:
$$\underbrace{A(1+L(\text{6m},\text{6m}))}_{\text{Balance}}\ \underbrace{-\,10(1+L(\text{6m},\text{6m}))}_{\text{Without FRA}} = \underbrace{-10(1+F(0,\text{6m},\text{6m}))}_{\text{Desired Payment}}.$$
$$s_{0,T} = \sum_{j=1}^{T} w_j\, L(0, T_{j-1}, T_j), \qquad w_j = \frac{p(0,j)}{A_{0,T}(0)}.$$
The sum over all weights $w_j$ equals 1. This shows that an IRS is a weighted sum of FRAs.
$$D(0,1) = \frac{1}{1 + s_{0,1}\,\alpha_{0,1}}.$$
To obtain D(0, 2) we consider a 2y swap with par swap rate $s_{0,2}$. From the par swap condition,
$$D(0,T) = \frac{1 - s_{0,T}(0)\sum_{i=1}^{T-1}\alpha_{i-1,i}\,D(0,i)}{1 + s_{0,T}\,\alpha_{T-1,T}}. \qquad (3.5)$$
Table 3.1 shows how different rates are derived from the given swap rates. Using these rates we price a 5y swap with a notional of 50 Mio. in a given currency. Table 3.4 summarizes the floating leg pricing. The PV of the floating leg, see Proposition 13, gives
$$s = -\frac{\mathrm{PV}_{\text{Floating}}(0)}{\mathrm{PV}_{\text{Fix at }1\%}(0)} = 5.806\%.$$
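A minimal bootstrapping sketch of (3.5), assuming annual periods (all $\alpha = 1$) and illustrative par swap rates (not the rates of Table 3.1); the check at the end verifies that the par rate is recovered via (3.3):

```python
# Bootstrapping discount factors from par swap rates via (3.5),
# with annual periods and assumed (illustrative) par rates.
par = {1: 0.0450, 2: 0.0500, 3: 0.0550, 4: 0.0575, 5: 0.0600}

D = {}
for T in range(1, 6):
    s = par[T]
    annuity = sum(D[i] for i in range(1, T))   # empty sum = 0 for T = 1
    D[T] = (1 - s * annuity) / (1 + s)         # equation (3.5)

# Sanity check with (3.3): s = (1 - D(0,T)) / sum_j D(0,j).
for T in range(1, 6):
    level = sum(D[j] for j in range(1, T + 1))
    assert abs((1 - D[T]) / level - par[T]) < 1e-12
```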
So far, we assumed that the necessary input rates exist. What if there are holes, i.e. maturities where no observable instrument exists? Then we have to interpolate. Such a construction should satisfy several requirements:
Floating Leg        1y          2y          3y          4y          5y
Rates               4.5615%     5.5518%     6.5750%     6.2827%     6.3128%
Cash flows          -2'280'743  -2'775'921  -3'287'486  -3'141'361  -3'156'409
PV of cash flows    -2'181'246  -2'516'291  -2'799'551  -2'517'843  -2'380'228

Fix Leg 1%          1y          2y          3y          4y          5y
Fix 1%              1%          1%          1%          1%          1%
Cash flows          500'000     500'000     500'000     500'000     500'000
PV of cash flows    478'188     453'235     425'789     400'757     377'047

Table 3.4: Floating leg pricing. Up to 1y spot rates are used; for longer maturities forward rates apply. Lower panel: pricing with a fixed 1% rate.
• Stability. The constructed term structures should be stable when switching from one structure to another. Switching from a meaningful discount curve to a forward curve should also provide a meaningful forward curve.
Table 3.5: CHF interest rates as of July 2019. Note that all rates are negative. SARON (Swiss Average Rate Overnight) is an overnight interest rate average referencing the Swiss franc interbank repo market. The data in the table are blended: if several possibilities exist to construct the table, the most convenient instruments are used to fill out the table. Source: Swiss National Bank.
That this unknown rate matches the 4 given ones is equivalent to a linear system:
$$Mx = y, \quad x = (a,b,c,d)', \quad y = (4\%,\ 4.5\%,\ 5\%,\ 5.3\%)', \quad M = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 8 & 4 & 2 & 1 \\ 27 & 9 & 3 & 1 \\ 64 & 16 & 4 & 1 \end{pmatrix}$$
where the matrix M has the powers of the time index as entries. Using the inverse matrix $M^{-1}$ implies for $x = (-0.00033, 0.002, 0.00133, 0.037)$ the rate
$$R(0, 2.5) = -0.00033(2.5)^3 + 0.002(2.5)^2 + 0.00133(2.5) + 0.037 = 4.762\%.$$
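The same computation as a short sketch (numpy assumed available):

```python
import numpy as np

# Cubic polynomial through the four given spot rates R(0,1..4).
M = np.array([[ 1,  1, 1, 1],
              [ 8,  4, 2, 1],
              [27,  9, 3, 1],
              [64, 16, 4, 1]], dtype=float)
y = np.array([0.04, 0.045, 0.05, 0.053])

a, b, c, d = np.linalg.solve(M, y)     # coefficients of a t^3 + b t^2 + c t + d
t = 2.5
R_25 = a * t**3 + b * t**2 + c * t + d
print(round(R_25, 4))                   # about 0.0476, i.e. the 4.762% above
```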
The value or wealth process measures positions times prices (we often neglect the superscript $\psi$):
$$V^{\psi}(t) = \psi_0 S_0(t) + \sum_{j=1}^{N}\psi_j S_j(t) =: \langle\psi(t), S(t)\rangle, \qquad (3.6)$$
with $\psi_0$ the amount invested in the risk-free asset, $\psi_j$ the number of units of the risky security j held in a period $[t, t+1)$, and $\langle\psi, S\rangle$ the scalar product. The vector $\psi(t)$ is a portfolio or a strategy. Dividing (3.6) by the portfolio value leads to the normalized portfolio weights
$$\phi_j(t) = \frac{\psi_j(t)S_j(t)}{V(t)}.$$
If all positions are positive, i.e. a long-only portfolio, and there is no leverage, then the normalized weights are probabilities and add up to one.
In the first term on the RHS, a change in portfolio value between two dates is due to external money added or withdrawn. Self-financing rules out such strategies: $(\Delta\psi_t)S_t = 0$. We summarize some immediate facts:
2. The return of a portfolio is equal to the weighted sum of the portfolio constituents' returns:
$$R^{\phi} = \sum_{j=1}^{N}\phi_j R_j =: \langle\phi, R\rangle. \qquad (3.8)$$
$$V(t) = V(0) + \sum_{s=1}^{t}\sum_{j=1}^{N}\psi_j(s)\Delta S_j(s). \qquad (3.9)$$
The last fact states that the portfolio value at a future date is given by the sum of all portfolio profits and losses over time. Each intermediate P&L is determined by the investment decision at the beginning of the period times the random P&L in the period. The simple return of a portfolio is invariant to the size of the portfolio: scaling the portfolio value by a factor, the factor cancels out in the return calculation. Hence, without loss of generality we set V(0) = 1.
The proposition implies for the growth rate of wealth $R^{\phi}_{[0,t]}$ from 0 to t:
$$1 + R^{\phi}_{[0,t]} := \frac{V_t}{V_0} = \frac{V_t}{V_{t-1}}\frac{V_{t-1}}{V_{t-2}}\cdots\frac{V_1}{V_0} \qquad (3.10)$$
$$= (1+R^{\phi}(t))(1+R^{\phi}(t-1))\cdots(1+R^{\phi}(1))$$
$$= (1+\langle\phi(t),R(t)\rangle)(1+\langle\phi(t-1),R(t-1)\rangle)\cdots(1+\langle\phi(1),R(1)\rangle),$$
i.e.
$$V_t = V_0\prod_{s=1}^{t}(1+\langle\phi(s),R(s)\rangle).$$
Wealth growth follows a geometric rate and not an arithmetic one.
Definition 16. $\psi(t)$ is a buy-and-hold (BH) or static portfolio if $\psi(t) = \psi(0)$ for all $t \geq 0$. $\psi(t)$ is a constant rebalanced (RB) portfolio if $\psi_j(t-)S_j(t) = c_j$ for all positions j and all t, with $c_j$ given. $t-$ denotes a prior time arbitrarily close to t where the asset value of the period is realized and the portfolio weight $\psi_j(t-1)$ chosen at $t-1$ is changed to $\psi_j(t)$ such that the position value equals the predefined value $c_j$.
$$V_0 = \phi_0 S_0 + \psi_0 B_0 = 0.6V_0 + 0.4V_0.$$
To achieve the weights, the investor has to buy $\phi_0 = \frac{V_0}{S_0}\times 0.6$ units of asset S at time 0, and similarly for asset B. After one time step the absolute portfolio value before rebalancing reads:
$$V_1 = \phi_0 S_1 + \psi_0 B_1 \neq 0.6V_1 + 0.4V_1,$$
where a change in portfolio value is entirely due to changes in asset values and not to changes in the positions (self-financing investment strategy). Then the required weights are restored by rebalancing. It follows that the weight of the asset whose price increased is reduced, and vice versa for the other asset.
6 Literature for this section: Hallerbach (2014), Blitz (2015), Hayley (2015), White (2015), Pal and
Wong (2013) and Quian (2014)
consider this type of rebalancing unless otherwise stated. The proportion of capital in stock k just before rebalancing is given by
$$\psi_k(t+1)- = \frac{\psi_k(t)(1+R_k(t+1))}{\sum_{j=1}^{N}\psi_j(t)(1+R_j(t+1))};$$
the weights $\psi_k(t+1)-$ are the drifted weights. In a buy-and-hold portfolio, the portfolio weights equal the drifted weights at each date.
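A minimal sketch of this drifted-weight update; the 60/40 weights and the one-period returns are assumptions for the example:

```python
import numpy as np

def drifted_weights(w, r):
    """Weights just before rebalancing, after one period of returns r."""
    g = w * (1 + r)
    return g / g.sum()

w = np.array([0.6, 0.4])          # illustrative 60/40 portfolio
r = np.array([0.10, 0.02])        # assumed one-period asset returns

print(drifted_weights(w, r))      # the stock weight drifts above 60%
# Rebalancing to constant weights then sells the winner and buys the loser.
```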
We show below that rebalancing strategies take volatility into account. Buy-and-hold strategies and market weighted strategies do not consider volatility. The volatility drag expresses the difference between expected geometric and arithmetic returns in terms of volatility. Since wealth growth is geometric and rebalancing takes volatility into account, such strategies are expected to outperform non-volatility-based strategies.
We write GM for the geometric mean and AM for the arithmetic mean over T periods:
$$\mathrm{GM} = \left(\prod_{k=1}^{T}(1+R_k)\right)^{1/T} - 1. \qquad (3.11)$$
Taking logarithms,
$$\log(1+\mathrm{GM}) = \frac{1}{T}\sum_{k=1}^{T}\log(1+R_k)$$
with $R_k$ the return of the portfolio between time $k-1$ and k. Writing $\mu$ for the expected mean return of the portfolio we get:
$$E(\log(1+\mathrm{GM})) = \frac{1}{T}\sum_{k=1}^{T}E(\log(1+R_k)) = \frac{1}{T}\sum_{k=1}^{T}E\big(\log(1+\mu+R_k-\mu)\big)$$
$$= \log(1+\mu) + \frac{1}{T}\sum_{k=1}^{T}\left(E\left(\frac{R_k-\mu}{1+\mu}\right) - E\left(\frac{(R_k-\mu)^2}{2(1+\mu)^2}\right)\right) + o(\mu)$$
$$= \log(1+\mu) + 0 - \frac{\sigma^2}{2(1+\mu)^2} + o(\mu).$$
If $\mu$ is small, $\log(1+\mu) = \mu + o(\mu)$; using the Neumann series in the portfolio volatility term and approximating the log in GM implies the volatility drag equation:
$$E(\mathrm{GM}) = \mu - \frac{\sigma^2}{2} + o(\mu) = E(\mathrm{AM}) - \frac{\sigma^2}{2} + o(\mu). \qquad (3.12)$$
The equation also holds in the risk-free case and for individual assets. The volatility drag defines strategies to harvest volatility. Hence, strategies which take volatility into account, such as rebalancing strategies, are expected to outperform pure buy-and-hold or equal market weighted strategies.
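A simulation sketch of (3.12) under assumed i.i.d. normal returns (μ = 6% and σ = 20% are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, T = 0.06, 0.20, 10_000      # illustrative parameters

R = rng.normal(mu, sigma, size=T)      # i.i.d. one-period returns
AM = R.mean()
GM = np.expm1(np.mean(np.log1p(R)))    # geometric mean via logs, eq. (3.11)

print(round(AM, 4), round(GM, 4), round(mu - sigma**2 / 2, 4))
# GM is close to AM - sigma^2 / 2: the volatility drag of (3.12).
```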
Figure 3.4: Rebalancing example for the SMI, MSCI World UCITS ETF (MXWO), JPM Global Aggregate Bond Index (SZG2TR), an equal-weighted index of 1,600 hedge funds (JGAGGUSD), the FTSE NAREIT All Equity REITs Index (BCOMTR), the gold dollar price (XAU Curncy) and the S&P 500 Index (SPX).
The top left panel shows the index or price evolution. The dot-com and GFC crises are visible. The bottom left panel shows the rebalancing strategy: basically, winners are sold and losers are bought. This panel is a mirror image of the price chart. The panels on the right hand side show the performance of different investment strategies. On the top right, the rebalanced strategy and the equal weighted buy-and-hold strategy are shown. Both strategies fail to provide protection if markets are under stress, although the rebalancing strategy suffers from a lower shortfall. But it also cuts the upside potential, which leads to overall underperformance. The red line assumes transaction costs of 10 bps per rebalancing.
In the lower right panel, the rebalanced-to-EW strategy is compared with the inverse volatility strategy IV and two momentum strategies. In the IV strategy, the rebalancing update is adjusted by the past volatility of the indices - the more volatile an index was, the less weight it will have in the next period (negative leverage). With this strategy the large market stress periods are neutralized, but the strategy also annihilates the growth potential. In the momentum approach, strategies are updated according to whether a strategy belonged to the winners or losers over the past month. More precisely, the average last-month return of all strategies is calculated. Each strategy is compared to this average: if the performance is higher (lower) than the average, the strategy is a winner (loser), and the rebalancing weight is updated by adding/subtracting a constant number, respectively. The strategy is a long-only strategy, which is atypical for momentum strategies, which are implemented as long-short portfolios (buy the winners, sell the losers). The momentum strategy shows a boost and a crash before and during the GFC. These two effects are typically reinforced in a long-short set-up.
Consider a single risky asset S and a risk-free bond that pays 10 percent each period in a two-period binomial model. The stock starts with a value of 1 and can go up or down in each period with the same probability of 50 percent (see the data in Figure 3.5). If an up state is realized, the stock value doubles; otherwise the stock loses half of its value.
Using these assumptions, wealth projections for the buy-and-hold strategy follow at once. The value in the node 'up - up', that is, 2.884, follows from
$$0.6 \times 2^2 + 0.4 \times 1.1^2 = 2.4 + 0.484 = 2.884.$$
The payoffs after period 2 show that rebalancing adds more value to the sideways paths but less value to the extremes (up - up or down - down) compared to the buy-and-hold strategy. This transforms the linear payoff of buy-and-hold - that is, a payoff which is a linear function of the stock value - in a non-linear way. Precisely, consider a European call option with a strike of 3.676 at time 2 and a European put option with a strike of 0.466. The option prices at date 0 and date 1 follow from no-arbitrage pricing.
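A short sketch reproducing the two-period comparison of Figure 3.5 (u = 2, d = 0.5, a 10% bond and 60/40 weights, as in the text; the code structure is ours):

```python
from itertools import product

u, d, rf = 2.0, 0.5, 0.10          # stock doubles or halves; bond pays 10%
w_s, w_b = 0.6, 0.4                # 60/40 weights

for path in product([u, d], repeat=2):
    # Buy-and-hold: keep the initial units of stock and bond for two periods.
    s, b = path[0] * path[1], (1 + rf) ** 2
    v_bh = w_s * s + w_b * b
    # Rebalanced: reset to 60/40 at the end of each period.
    v_rb = 1.0
    for m in path:
        v_rb *= w_s * m + w_b * (1 + rf)
    print(path, round(v_bh, 3), round(v_rb, 3))
# up-up gives BH 2.884 > RB, while the sideways paths favor RB,
# matching the payoffs in Figure 3.5.
```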
Figure 3.5: Rebalancing as a short volatility strategy in a binomial tree model. Left are the risky asset's dynamics, in the middle are the wealth values if a buy-and-hold strategy (60/40) is used, and right are the wealth levels for a strategy rebalancing to fixed (60/40) weights. Note that up-and-down is the same as down-and-up; therefore, there are two paths for the stock value after period 2, both with the result of 1 (Ang [2013]).
• A rebalancing strategy.
• A short call + short put + long bond + long buy-and-hold strategy. The first two positions are the short volatility strategy.
The two strategies have identical payoffs in all states; therefore they are identical. This shows that a short volatility strategy, financed by bonds and the buy-and-hold strategy, is the same as a rebalancing strategy.
Since rebalancing means being short volatility, the investor automatically earns the volatility risk premium. The short volatility strategy makes the payoff in the center of the probability distribution larger at the cost of the extreme payoffs. Short volatility or rebalancing underperforms buy-and-hold strategies if markets are either booming or crashing, but it performs well if markets show time reversals.
The short-term weight is the myopic investment demand. The opportunistic weight is the hedging demand against changing investment opportunity sets. This means that besides the liquid asset's risk there are other risk sources. They are described with a state variable Y and can be correlated to the risky asset's risk source. Examples are inflation or deflation risk, house price risk, divorce risk or unemployment risk. This general rule follows from the 'Principle of Optimality' of R. Bellman, see Section 4.2.1.
We write (3.13) more explicitly in the case of a single risky asset for the fraction $\phi(t)$ of wealth invested in this asset:
$$\phi(t) = \frac{\alpha_t - r_t}{\sigma_t^2}\times \mathrm{RRA}^{-1} + (1 - \mathrm{RRA}^{-1})\,\Delta Y \times \mathrm{RIRA}^{-1} \qquad (3.14)$$
where:
• The Market Price of Risk (MPR): $\mathrm{MPR} = \frac{\alpha_t - r_t}{\sigma_t^2}$.
• $\mathrm{RRA}^{-1}$ is the inverse relative risk aversion - the investor's risk tolerance - where $\mathrm{RRA} = -\frac{u''(c)c}{u'(c)}$ with c consumption.
• $\mathrm{RIRA}^{-1}$ is the inverse relative Y-risk aversion.
If the investment opportunity set is constant, the state variable Y is zero and also $\Delta Y = 0$; then optimal investment is equal to myopic investment. The myopic component $\mathrm{MPR}\times\mathrm{RRA}^{-1}$ is equal to the optimal solution of a one-period model; it maximizes the Sharpe ratio and it is mean-variance efficient. The opportunistic weight represents the desire to hedge against future opportunity changes.
We comment on the optimal strategy formula (3.14). First, the optimal investment strategy is time-varying; buy-and-hold is not optimal from a dynamic investment point of view. But there is nothing in the optimal formula which states that it is optimal to rebalance to constant weights.
Third, which of the two components in (3.13) is more important? In the extreme cases where returns are not predictable, or stochastic opportunities are not changing over time, or the investor has a logarithmic utility function, the long-term hedging demand is zero. But in other, less extreme cases the literature is ambiguous about the relative strength. The result depends on the size of the opportunistic weight, which is driven by two factors: predictability and investment opportunity. The closer asset returns are to being unpredictable and/or the less stochastic opportunity set variations matter, the less important is the opportunistic component.
Fourth, the MPR keeps its form for many assets, but the division by the variance is replaced by a multiplication with the inverse covariance matrix $C^{-1}$:
$$\phi = \mathrm{RRA}^{-1}\,C^{-1}(\alpha - r).$$
Comparing this with the solution of the Markowitz problem with no risk-free asset (4.3), $\phi = \frac{1}{\theta}C^{-1}\mu$, shows that the first component of the optimal investment strategy (3.13) defines a mean-variance efficient portfolio. This rationalizes the Sharpe ratio and the Markowitz model in multi-period investing. Similarly, in the opportunistic weight the covariance between the liquid assets and the state variable Y enters.
Fifth, the inverse relative risk aversion measures the curvature of the utility function as a function of wealth: for logarithmic utility, $\mathrm{RRA}^{-1} = 1$. The more risk averse an investor is, the smaller $\mathrm{RRA}^{-1}$ and the more is optimally invested in the risk-free
asset. The notion of relative risk aversion raises two delicate issues. First, there is a calibration result by Rabin (2000) showing that expected-utility theory is an utterly implausible explanation for appreciable risk aversion over modest stakes. Second, the measurement of RRA is, in itself, a delicate matter.
Sixth, the opportunistic weight consists of three different terms. First, if the investor becomes more risk averse, $\mathrm{RRA}^{-1}$ decreases, and the myopic component in the optimal portfolio becomes less important. Second, the aversion to innovation risk sources. Third, a hedging demand against innovation risk. This is proportional to $\mathrm{cov}(\tilde{R}, R_I)$ in (3.13) - that is to say, the hedging demand follows from the correlation pattern of the innovation's portfolio return with the overall portfolio return. Investors will increase their holding of the risky asset given by the first term if it covaries negatively with state variables that matter in the investor's value function. A bond is such a hedge against falling interest rates.
Seventh, if liabilities matter, such as in goal-based investment, then in both expressions in (3.14) functions of time differences $f(T-t)$ enter, where T is the realization time of a liability. These functions take into account the 'way to go' effect. It is, for example, optimal to take more risk (given a positive drift and an actual funding level) if there are five years left to finance a liability than if only one month is left until its maturity.
Logarithmic utility facilitates calculations but is behaviorally specific. Log investors always act optimally myopically (one-period view), independent of the dynamic context. Their demand for hedging long-term risks is zero. To understand why, note that a log investor maximizes log returns. Assuming normality of the returns, the log return over a long time horizon is equal to the sum of one-step returns. The long-term return is therefore maximized if the sum over the one-period returns is maximized, which is the same as maximizing each one-period return.
To see how the optimal investment formula can fail in reality, consider the Great Financial Crisis (GFC). Pick an investor with a relative risk aversion of 2, a normal market return of 6% in stocks, a risk-free rate of 2% and a volatility of 18%. The investor assumes that returns are IID; he is a myopic investor. From the optimal portfolio formula (3.14):
$$\phi = \frac{0.06 - 0.02}{2\times 0.18^2} \approx 0.6.$$
That means the investor holds 60% in equities and 40% in a risk-less asset. In the GFC, volatility (both realized and implied) increased to levels around 70%. The optimal myopic formula then implies $\phi \approx 0.04$, i.e. a 4% equity position, or a reduction by 93% from the pre-crisis investment. But stock market participation was not reduced by 93%. Since the average investor holds the market, he did not show the same behavior as our theoretical investor.
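A minimal sketch of the myopic component with the numbers above (relative risk aversion 2; the function name is ours):

```python
def myopic_weight(alpha, r, sigma, rra):
    """Myopic part of (3.14): excess return over variance, times risk tolerance."""
    return (alpha - r) / (rra * sigma**2)

# Pre-crisis: 6% equity return, 2% risk-free, 18% volatility, RRA = 2.
print(round(myopic_weight(0.06, 0.02, 0.18, 2.0), 2))   # about 0.6
# GFC: volatility around 70% -> about 0.04, a 93% cut in the equity weight.
print(round(myopic_weight(0.06, 0.02, 0.70, 2.0), 2))
```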
We compare three well-known strategies with the myopic part of (3.14):
• Buy falling stocks, sell rising ones (constant-mix 60/40 rebalancing strategies) [contrary to the myopic part of (3.14)].
• Sell falling stocks, buy rising ones (portfolio insurance strategies) [in line with the myopic part of (3.14)].
We follow Perold and Sharpe (1988) and Dangel et al. (2015). They consider buy-and-hold, constant mix (say 60/40 strategies), constant-proportion portfolio insurance and option-based portfolio insurance. We start with the first two strategies with a risky asset S and a risk-free asset B. In the payoff diagrams, the value of the assets is a function of the value of the stock; in the exposure diagrams, the relation between dollars invested in stocks and total assets is plotted.
The payoff diagram for the 60/40 rule is a straight line with a slope of 0.6; the maximum loss is 60% of the initial investment and the upside is unlimited, see Figure 3.6. The exposure diagram is also a straight line in the space where the value of the assets is related to the stock position. For a buy-and-hold strategy, the slope is 1 and the line intersects the x-axis of the value of the assets at USD 40. If the portfolio value is less than USD 40, the demand to invest in the stock is zero; for the constant mix strategy there is always a demand.
If there is no volatility in the market, stocks either rise or fall forever. Then the buy-and-hold payoff always dominates the constant mix portfolio. But with volatile markets, the success of the strategy depends on the paths of asset prices, see volatility drag and volatility harvesting. A constant mix portfolio tends to be a superior strategy if markets show reversal behavior instead of trends.
This shows that the performance of rebalancing depends on the investment environment: different economic and financial market periods lead to different results. Ang (2013) compares the period 1926-1940 with the period 1990-2011. He compares buy-and-hold investments in US equities and US Treasury bonds, and pure investments in the two asset classes, with the rebalanced (60/40) investment portfolio in the two assets. The countercyclical behavior of rebalancing smooths the individual asset returns. It led to much lower losses after the stock market crash in 1929, but compared to the static strategy it was not able to follow the strong stock markets before the crash. The rebalancing strategy also leads to much less volatile performance than the pure stock or bond strategy.
Figure 3.6: Payoff and exposure diagrams for constant mix and buy-and-hold strategies (adapted from Perold and Sharpe [1988]). The left panels show the payoff diagram for the 60/40 buy-and-hold strategy and the exposure diagrams for the 60/40 strategy, once buy-and-hold and once dynamic, that is assuming a constant mix. The upper right panel shows the superiority of the buy-and-hold strategy when there are only trends, and the lower diagram shows that the constant mix strategy can dominate the buy-and-hold one if there is volatility, depending on the stock price path, which is represented by the thickness of the asset value line.
With a constant mix strategy, investors invest in risky assets even in market stress situations. In practice, however, there is a strong demand for portfolio insurance, since investors have a considerable downside-risk aversion. Therefore, a rebalancing method 'opposite' to the constant mix is required: selling stocks as they fall.
Returning to the three alternatives, the payoffs of the strategies are linear, concave or convex. The last strategy is called convex since the payoff function increases at an increasing rate if the stock values increase. Hence, rebalancing has an impact on the payoff. Concave strategies, such as the constant mix strategies, are the mirror image of convex strategies such as portfolio insurance. The buyer of one strategy is also the seller of the other one.
Summarizing, buying stocks as they fall leads to concave payoffs. These are good strategies in markets with no clear trend, since the principle 'buy low, sell high' applies. In markets under stress, losses are aggravated since more and more assets are bought.
The convex payoff of portfolio insurance strategies limits the losses in stressed markets while keeping the upside intact. But if markets oscillate, their performance is poor.
• Stop-loss strategies. The investor sets a minimum wealth target or floor that must be exceeded by the portfolio value at the investment horizon. This strategy is simple, but once the loss is triggered the portfolio will no longer be invested in the risky asset, and hence participation in a recovery of the risky asset is not possible.
• In the option-based approach one buys a protective put option. While simple, this strategy has several drawbacks. First, it acts against many investors' behavior: one should buy portfolio insurance when it is cheap - when stock markets boom. Second, buying an at-the-money option is expensive compared to the expected risky asset return, and since one has to roll the strategy, the costs multiply. Therefore, such option-based strategies are often used in long-short combinations (buying an out-of-the-money put and selling an out-of-the-money call).
We start with a classic example with two assets. Asset 1 earns a −50% return in all odd periods and a 100% return in all even periods. Asset 2 is a risk-free asset whose return is always 0%. Table 3.6 shows different portfolio returns for a buy-and-hold (BH) portfolio and a portfolio rebalanced (RB) to equal weights in each period. Initial wealth is USD 1.
Investing the dollar BH in either of the two assets leads to zero growth of wealth. Rebalancing to equal weights in each period leads to a portfolio growth of 0.75 × 1.5 = 1.125. Systematic rebalancing is capable of capturing profit from volatility even when the underlying assets experience zero growth. If we extend the model to many periods, a sequence
7 Recently, Schied et al. (2018) showed that also in continuous time a dynamics-independent master equation is possible.
Period        BH 1   BH 2   RB 1+2   RB 1    RB 2
0             1      1      1        0.5     0.5
1-            1      0.5    0.75     0.25    0.5
1+            1      0.5    0.75     0.375   0.375
2             1      1      1.125    0.756   0.375
Total Return  0%     0%     12.5%    +51.5%  -25%

Table 3.6: Buy-and-hold versus equal-weight rebalanced portfolios. 1- denotes the time before rebalancing and 1+ the portfolio and position values after rebalancing.
of alternating return products 0.75 × 1.5 = 1.125 determines the excess return of RB relative to BH. The order of the return products does not matter, only the number of such matching pairs. If we can form N pairs, the growth is boosted as $(1.125)^N$.
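A short simulation of Table 3.6 (the function and variable names are ours):

```python
# Asset 1 alternates -50% / +100%; asset 2 is cash at 0%.
def simulate(n_periods, rebalance):
    w1 = w2 = 0.5                      # equal-weight start, wealth = 1
    for t in range(1, n_periods + 1):
        r1 = -0.5 if t % 2 == 1 else 1.0
        w1 = w1 * (1 + r1)
        if rebalance:                  # reset to equal weights
            total = w1 + w2
            w1 = w2 = total / 2
    return w1 + w2

print(simulate(2, rebalance=False))    # 1.0  : buy-and-hold, zero growth
print(simulate(2, rebalance=True))     # 1.125: one matched pair harvests 12.5%
print(simulate(10, rebalance=True))    # (1.125)**5: growth compounds per pair
```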
To formalize the discussion, consider a binomial tree with two risky asset prices $S_1, S_2$. Then
$$X(t) = \log\frac{S_1(t)}{S_2(t)}$$
is a measure of relative price.8 We set $\Delta X(t) := \pm\sigma$, where $\sigma^2$ is the instantaneous variance of relative prices. For a strategy $\phi = (\phi_1, \phi_2)$,
$$W^{\phi}(t) := \frac{V^{\phi}(t)}{S_2(t)/S_2(0)} = \frac{S_1(t) + S_2(t)}{S_1(0) + S_2(0)}\,\frac{S_2(0)}{S_2(t)}$$
is the value of the portfolio $\phi$ relative to asset 2. It satisfies the dynamics (using the same arguments as for (3.20)):
$$\frac{W^{\phi}(t+1)}{W^{\phi}(t)} = 1 + \phi_1(t)\left(e^{\Delta X(t)} - 1\right) =: A(t).$$
Iterating this equation implies
$$W^{\phi}(t) = \prod_{s=0}^{t-1} A(s).$$
Assume that $\phi$ is constant. Then a volatility-matching pair of up and down moves generates a growth factor which is larger than 1, i.e. $A(s)A(s-1) > 1$. It is maximal for the equal-weighted portfolio $\phi = 0.5$. The recursive form of wealth implies that the wealth growth of the constant-weighted portfolio dominates the benchmark growth rate of asset 2 if the number of matching pairs dominates the moves in the price paths which do not match. In a perfect zig-zag price path all moves match; in a monotone increasing or decreasing path there is no matching at all, and volatility harvesting is a loser strategy.
8 The log follows from the standard representation of up and down moves in the binomial tree.
We use these ideas to develop the formal set-up of SPT, starting with some notation. The market weights $\mu_j(t)$, where $X_j(t)$ is the market capitalization at time t of asset j, read:
$$\mu_j(t) := \frac{X_j(t)}{\sum_{k=1}^{N} X_k(t)}. \qquad (3.16)$$
Investing in each period according to the market weights, the investment portfolio is the market portfolio $V^{\mu}$. The temporal update of the market weights, if only asset returns lead to capital changes, reads
$$\mu_j(t+1) = \frac{\mu_j(t)(1+R_j(t+1))}{\sum_{k=1}^{N}\mu_k(t)(1+R_k(t+1))} \qquad (3.17)$$
and:
$$V^{\mu}(t) = \frac{\sum_j X_j(t)}{\sum_j X_j(0)}. \qquad (3.18)$$
Let $V^{\mu}$ be the market portfolio value and $V^{\phi}$ any other portfolio value. We define the relative portfolio
$$V^{\phi/\mu}(t) := \frac{V^{\phi}(t)}{V^{\mu}(t)}. \qquad (3.19)$$
The time evolution of the relative portfolio depends only on the market weights for all $t > 0$:
$$\frac{V^{\phi/\mu}(t+1)}{V^{\phi/\mu}(t)} = \sum_{k=1}^{N}\phi_k(t)\,\frac{\mu_k(t+1)}{\mu_k(t)}. \qquad (3.20)$$
The expression $\gamma^{\phi/\mu}$ is always non-negative by Jensen's inequality, and it is strictly positive if $Y := \log\frac{\mu_k(t+1)}{\mu_k(t)}$ is not constant, i.e. if there is temporal volatility, which we assume to hold true.9 We write $\Gamma^{\phi/\mu}(t) = \sum_{s=0}^{t}\gamma^{\phi/\mu}(s)$ for the cumulated excess growth rate. Since $\gamma^{\phi/\mu} > 0$, $\Gamma \to \infty$ as time goes to infinity; Gamma is called the energy term.
The second transformation for the log return in (3.21) is to rewrite the first term using relative entropy:
$$\sum_{k=1}^{N}\phi_k(t)\log\frac{\mu_k(t+1)}{\mu_k(t)} = \sum_{k=1}^{N}\phi_k(t)\log\frac{\mu_k(t+1)}{\phi_k(t)} + \sum_{k=1}^{N}\phi_k(t)\log\frac{\phi_k(t)}{\mu_k(t)}.$$
With the relative entropy notation $S(p,q) = \sum_j p_j\log(p_j/q_j)$ for two probability distributions we get:
$$\sum_{k=1}^{N}\phi_k(t)\log\frac{\mu_k(t+1)}{\mu_k(t)} = S(\phi(t), \mu(t)) - S(\phi(t), \mu(t+1)).$$
Summarizing, the relative log return of any strategy can be decomposed into:
$$\Delta\log V^{\phi/\mu}(t) = \gamma^{\phi/\mu}(t) + S(\phi(t), \mu(t)) - S(\phi(t), \mu(t+1)). \qquad (3.22)$$
If rebalancing takes place to constant weights, then $\phi$ is constant, the entropy differences telescope, and the decomposition reads:
$$\log V^{\phi/\mu}(t) = \Gamma^{\phi/\mu}(t) + S(\phi, \mu(0)) - S(\phi, \mu(t)). \qquad (3.23)$$
Figure 3.7 shows the decomposition of a log portfolio value, rebalanced to constant weights, into its energy and entropy components. Gamma measures the amount of market volatility captured by the portfolio - the number of matched pairs in the introductory examples. The relative entropy term measures how much the relative performance deviates from Gamma. This term depends only on the initial and current positions of the market weight vector, i.e. on how the change in the capital distribution affects the performance of the portfolio. There is no volatility effect. The fluctuations of the log return are dominated in the short run by the entropy part, and long-term growth comes from the cumulated excess growth rate.
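A numerical sketch of the decomposition (3.22); the market weights and returns are assumptions, and since (3.21) is not reproduced here, γ is recovered from the relative return (3.20) and the definition of the excess growth rate:

```python
import numpy as np

def rel_entropy(p, q):
    """S(p, q) = sum_j p_j log(p_j / q_j)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
mu0 = np.array([0.5, 0.3, 0.2])                    # market weights at t (assumed)
R = rng.normal(0.0, 0.1, size=3)                   # one-period asset returns
mu1 = mu0 * (1 + R) / np.sum(mu0 * (1 + R))        # drifted market weights, (3.17)

phi = np.array([1/3, 1/3, 1/3])                    # constant-weight strategy

d_log_v = np.log(np.sum(phi * mu1 / mu0))          # relative log return, from (3.20)
gamma = d_log_v - np.sum(phi * np.log(mu1 / mu0))  # excess growth rate ("energy")

# Energy-entropy decomposition (3.22):
lhs = d_log_v
rhs = gamma + rel_entropy(phi, mu0) - rel_entropy(phi, mu1)
assert abs(lhs - rhs) < 1e-12
```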
For constant-weighted portfolios the following theorem characterizes log return growth.
Theorem 18. Consider a constant-weighted strategy $\phi$. Assume that the market weights $\mu(t)$ for all t are elements of a compact set K and that $\Gamma(t)$ increases to infinity as t tends to infinity. Then the portfolio value $V^{\phi/\mu}(t)$ also tends to infinity.
The statement is a pathwise one and free of any stochastic modeling assumptions. Long-term outperformance follows whenever the two path properties are satisfied. The validity of these two conditions can be evaluated by a portfolio manager at each date. Pal and Wong extend the discussion to non-constant rebalancing strategies.
9 $\gamma^{\phi/\mu} = \log E_{\phi/\mu}(e^{Y - E_{\phi/\mu}(Y)}) \geq 0$ follows by using the definition of Y and of the log. A Taylor approximation shows that Gamma is proportional to an excess growth rate.
Figure 3.7: Decomposition of a constant-weight rebalanced portfolio (log V) into the energy and entropy paths.
with b the benchmark portfolio weights. Figure 3.8 shows how this return difference can be split into three different rectangles for each j:
$$\mathrm{ARR}_j = 1 + 2 + 3 = \underbrace{(\phi_j - b_j)R_j^b}_{=:A} + \underbrace{(R_j^V - R_j^b)b_j}_{=:S} + \underbrace{(\phi_j - b_j)(R_j^V - R_j^b)}_{=:I}. \qquad (3.25)$$
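A minimal sketch of the decomposition (3.25) for two segments; all numbers are assumptions for the example:

```python
import numpy as np

phi = np.array([0.7, 0.3])      # portfolio weights
b   = np.array([0.5, 0.5])      # benchmark weights
Rv  = np.array([0.08, 0.01])    # portfolio segment returns
Rb  = np.array([0.06, 0.02])    # benchmark segment returns

A = (phi - b) * Rb              # allocation effect
S = (Rv - Rb) * b               # selection effect
I = (phi - b) * (Rv - Rb)       # interaction effect

arr = phi @ Rv - b @ Rb         # active (arithmetic relative) return
assert abs(arr - (A + S + I).sum()) < 1e-12
```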
Figure 3.9 shows the performance attribution tree for the MSCI World ESG Quality Index. The total return $R_T$ can be written in the form $R_T = R_T - R_b + R_b = \mathrm{ARR} + R_b$. Since fees are not available, the total return is a gross return. The figure shows that the ARR has several levels.
Figure 3.8: Arithmetic return decomposition. Source: Adapted from Marty (2015)
The ARR is first decomposed into asset classes. Then the asset class equity is further decomposed into three types: sector and geographical diversification G, the selection part S, and a part which invests in a portfolio of factor risk premia. In the return attribution, return numbers add up in the hierarchy, but non-linear risk figures do not add up.
Given the return attribution, how do we calculate returns? This is trivial if no cash inflows or outflows need to be considered. Two methods to calculate investment returns are distinguished: the time-weighted rate of return (TWR) and the money-weighted rate of return (MWR).
We refer to Marty (2015) for a detailed discussion. The TWR measures the return of an investment where in- or outflows do not affect the return of the investment. It reflects the return due to the asset manager's decisions taken in the past. As an example, start with USD 100 in period one, where another USD 200 is added at the beginning of period two, and the portfolio value at the end of period two is USD 300. The net gain of the portfolio is zero, but calculating the linear return results in a 200 percent return if one does not take intermediate cash flows into account. The TWR controls for these cash flows in return calculations. The MWR reflects the return from an investor's perspective: in- and out-flows as well as the profit and loss matter in this perspective. The MWR method is based on the no-arbitrage principle. Both the MWR and the TWR can be applied on an arbitrary investment period.
Figure 3.9: Performance attribution tree (total return, asset classes, risk premia) for the MSCI World ESG Quality Index, where the information written in red is added by the author (adapted from MSCI [2016]).
The TWR $R^{\mathrm{TWR}}_{0,T}$ of an investment starting at 0 and ending at T, with $T-1$ time points in between (not necessarily equidistant), is defined by:
$$1 + R^{\mathrm{TWR}}_{0,T} = \prod_{i=0}^{T-1}(1 + R_{i,i+1}) = \prod_{i=0}^{T-1}\left(1 + \frac{V_{i+1} - V_i}{V_i}\right) = \prod_{i=0}^{T-1}\frac{V_{i+1}}{V_i}. \qquad (3.26)$$
Proposition 19. 1. Adding or subtracting any cash flow $c_{\hat{t}}$ at any time $\hat{t}$ does not change the TWR.
2. If $\phi_i(j) = \lambda_i\,\phi_{i-1}(j)$ for all assets j and all time points i, then the TWR equals the return of the final portfolio value relative to its initial value. Hence, all intermediate returns cancel in (3.26).
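A sketch of the TWR for the text's 100/200/300 example; the intermediate value of 110 just before the inflow is an assumption added so that the sub-period returns are well defined:

```python
# (value just before the date, cash inflow at the date) per date
V = [(100.0, None), (110.0, 200.0), (300.0, None)]

twr = 1.0
prev = V[0][0]
for value, inflow in V[1:]:
    twr *= value / prev                 # sub-period gross return
    prev = value + (inflow or 0.0)      # restart base after the cash flow

print(round(twr - 1, 4))
# The 200 inflow does not enter the return; only market moves do.
```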
The TWR method is used by most index providers, since cash in- or outflows do not impact the return of the index. To prove the first property, fix a time $\hat{t}$ and let $c_{\hat{t}}$ be an arbitrary cash flow. The relevant terms in the TWR with this additional cash flow are the two sub-period factors around $\hat{t}$. Assuming that $\hat{V}_{\hat{t}} = V_{\hat{t}} + c_{\hat{t}}$, i.e. the additional cash flow is added, and inserting this into the last expression implies
$$\frac{V_{\hat{t}+1}}{V_{\hat{t}-1}},$$
which is the same result as simplifying the two terms in the TWR without any additional cash flows.
In the MWR, cash flows $c_j$ are reinvested at the internal rate of return (IRR), i.e. $R^{MWR}$ solves:
$$PV(C, R^{MWR}) = \sum_{j=1}^{T} D(0, j; R^{MWR})\,c_j \qquad (3.27)$$
where the discount factor D depends explicitly on $R^{MWR}$. Since $R^{MWR}$ enters the denominator of the discount factor, (3.27) is solved numerically. Using the first-order approximation $D \sim \frac{1}{1+R}$ transforms (3.27) into a linear equation for R - the so-called Dietz return (with AIC the average invested capital):
$$R^{\mathrm{Dietz}} = \frac{P\&L}{\mathrm{AIC}} := \frac{S_T - S_0 - \sum_{j=1}^{T-1}c_j}{S_0 + \frac{1}{2}\sum_{j=1}^{T-1}c_j}. \qquad (3.28)$$
This approximation implies simple compounding and assumes that the CFs are realized in the middle of the respective periods.
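A one-line sketch of the Dietz return (3.28), applied to the 100/200/300 example from above:

```python
def dietz_return(s0, sT, flows):
    """Simple Dietz return (3.28): flows assumed to occur mid-period."""
    pnl = sT - s0 - sum(flows)
    aic = s0 + 0.5 * sum(flows)
    return pnl / aic

# Start 100, inflow 200, end value 300 -> zero gain.
print(dietz_return(100.0, 300.0, [200.0]))   # 0.0, unlike the naive 200% figure
```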
$$R_0 = \langle\phi, R\rangle, \qquad \sum_i \phi_i = 1, \qquad (3.29)$$
where the first part represents the levered portfolio and the last term represents the borrowing costs for the leveraged position, which is an investment in the borrowed asset B. Note that this term is negative. In relative terms,
$$1 = \lambda(\phi_1 + \phi_2) + (1-\lambda)\phi_3,$$
which shows that for $\lambda = 1$ we are back in the unlevered case, and
$$\frac{\lambda}{2\lambda - 1} = \phi_1 + \phi_2.$$
If there is no leverage, $\phi_3 = 0$ or $\lambda = 1$. Calculating the excess return relative to a risk-free rate $R_f$ and to the borrowing rate $R_B$ we get:
The excess return relative to the borrowing rate scales linearly in the leverage ratio. But for the excess return relative to the risk-free rate, if $R_B > R_f$, increasing the leverage ratio reduces the gains in the original portfolio.
Anderson et al. (2014) call the first two terms on the right-hand side the magnified source terms due to leveraging. How important is the covariance correction in the last term? To quantify it we need to consider the volatility drag. Formula (3.33) summarizes that the expected return of a leveraged portfolio also contains a covariance reduction term between the random leverage ratio and the excess return. Summarizing, in a multi-period investment there are three factors which matter:
• Transaction costs.
• The covariance correction.
• The variance (volatility) drag.
Anderson et al. (2014) consider these three factors in a 60/40 target volatility investment with US equity and US Treasury bonds. They consider monthly returns from Jan 1929 to Dec 2012. The target volatility is set equal to the fixed 11.59% realized volatility in the observation period. Since volatility is not known ex ante, the leverage ratio is a random variable. The borrowing for the leverage is done at the 3m Eurodollar deposit rate, and trading costs are proportional to the traded volume.
The authors find that the magnified source return in equation (3.33) dominates all other components. But this portfolio is not realizable. The gross return of the source portfolio, i.e. the 60/40 target (gross of trading costs) and the risk parity portfolio with 3m Eurodollar financing (net of trading costs), is 5.75% in the period. The magnified source term contributes 9.72%. This implies that 3.97% is due to the leverage and excess borrowing return. The total levered arithmetic return is 6.84%. The difference to 9.72% consists of the covariance correction of −1.84% and the trading costs of −1.04%. Finally, the variance drag is −0.4%, which implies a total geometric levered return of 6.37%. Summarizing, the three effects - transaction costs, covariance correction and variance drag - reduced the positive leverage return impact of 3.97% by 82% to 0.69% (3.97 − 1.84 − 1.04 − 0.4 = 0.69%).
• The investor can buy the following contract - a call option $C_0$ at time 0 with payoffs $C_T$ at time T.
• There is a riskless instrument B with price 1 today which pays 1.1 at time T, independently of whether the stock rises or falls.
How much is the investor willing to pay at time 0 for the derivative C? This defines the pricing problem.
We show that there is a unique, fair answer to this question in complete markets. We start with the motivations of a seller (writer, or trader of a bank) and of a buyer of the derivative.
The writer of the derivative would like to obtain a price from the buyer at time 0 such that he can buy a portfolio $V_0$ at 0 which will have a value $V_T$ at time T that is always worth at least the liability value of the derivative $C_T$ at time T, i.e.
$$V_T \geq C_T \quad \text{in all states.}$$
The price at time 0 should be high enough that the writer can pay the liability at time T using the price change of the portfolio $V_0$ up to $V_T$ without additional money10 and using the three instruments S, B, C only. The buyer of the derivative does not want to pay a price at 0 for the derivative such that the writer can buy a portfolio V at time 0 which is worth more than the derivative value at time T:
$$V_T \leq C_T \quad \text{in all states}$$
is the buyer's intention. The price, if it exists, where both motivations are met,
$$V_T = C_T \quad \text{in all states } \omega,$$
is called the fair price of the replication portfolio (we skip the state variable $\omega$). There are no restrictions on the portfolio positions, i.e. we can be long or short any instrument.
What is a state? It represents everything that is relevant for the value of the firm, including firm-specific variables such as earnings and leverage, industry-specific variables such as product demand and input prices, and macroeconomic variables such as interest rates and exchange rates. The state includes everything that we are not going to model explicitly. Sometimes it includes the stock's price, so that, even though we are ultimately interested in deriving the stock price as a function of more primitive variables, the distinction between the state and the price becomes blurred.
$$\phi_2 \times 120 + \phi_1 \times 1.1 = 20$$
$$\phi_2 \times 80 + \phi_1 \times 1.1 = 0. \qquad (3.34)$$
The problem with a risky asset is expressed in terms of linear algebra, where no entry is risky. Whether or not a (unique) replication exists is therefore reduced to the question of when a linear system has no, many, or a single solution. By solving the system we get $\phi_2 = 0.5$ and $\phi_1 = -40/1.1 = -36.36$.
How do we get the fair price C(0)? To answer this we calculate the portfolio value at time 0 using the above strategy:
$$V_0 = 0.5 \times 100 - 36.36 \times 1 = 13.64.$$
This is the fair derivative price, i.e. $V_0 = C_0 = 13.64$. Indeed, we apply the Law of One Price, which is a weaker formulation of the no-arbitrage principle:
Definition 21 (Law of One Price). Consider a perfect market. Two assets with identical cash flows must trade at the same price, or: if the replication price of an option exists, then this price is unique.
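A minimal sketch solving the replication system (3.34); the initial stock price of 100 is implied by the fair price of 13.64:

```python
import numpy as np

# One-period replication: stock 120 / 80, bond pays 1.1, call payoffs 20 / 0.
A = np.array([[120.0, 1.1],
              [ 80.0, 1.1]])
payoff = np.array([20.0, 0.0])

phi_stock, phi_bond = np.linalg.solve(A, payoff)
fair_price = phi_stock * 100.0 + phi_bond * 1.0

print(round(phi_stock, 4), round(phi_bond, 4))   # 0.5 and about -36.36
print(round(fair_price, 2))                      # 13.64
```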
One often states the law of one price as follows: if we have three payoffs x, y, z at a given date with
$$z = x + y,$$
then the prices $p(\cdot)$ at any date of these equal payoffs also agree:
$$p(z) = p(x) + p(y).$$
11 In practice one often uses the word hedging both for replication and for the case with hedging risk.
Since $V_T = C_T$ we must have $V_0 = C_0$. If a different price prevails, the writer can make riskless profits in a risky environment. For $V_0 < C_0$, the writer invests the difference in the riskless asset. If $C_0 < V_0$, the writer buys the derivative from the investor and sells it at the fair price. Again, the difference is locked in and invested in the riskless asset. The law of one price is the most important special case of no arbitrage.
How does the Law of one Price ts into the no arbitrage relation (3.1) in a risk less
environment, i.e. D(t, s)D(s, T ) = D(t, T ), D(t, t) = 1? Take USD 1 at time T . Dis-
counting this dollar back to time t using two dierent paths has to give the same value
at time t by the Law of one price. This exactly what (3.1) states.
The probability P assumes that the risky asset goes up with 90 percent probability
and down with 10 percent. These probabilities are derived from historical data using
econometric methods. They do not matter explicitly in the pricing of derivatives.
The fair price in a complete market is independent of the individual beliefs of the market
participants or of real (historic) probabilities. This is a major reason for the success of
derivative pricing, since it liberates buyers and sellers from estimating these probabilities
- in a multi-period set-up this would amount to guessing the drift of the risky asset price
processes. Given this observation, people are tempted to state that beliefs are of no
importance at all. This is not true. Suppose that the common belief is that Google's
stock price will rise by 10% in one week. Then the belief does not enter a derivative
contract on Google, but it clearly affects the level of the stock. Therefore, beliefs matter
in derivative pricing by affecting the underlying's price level.12
In the replication approach a portfolio of bonds and stocks was set up to replicate the
derivative payoff. In the hedging approach one considers an unknown amount of the
stock and the derivative. This portfolio is then specified by requiring that the portfolio
has the same value in all states of the world as the riskless bond. Therefore, using the
option and the stock one replicates the bond. A portfolio with this property is a
hedge position. The same value for φ2 follows as under the replication approach. One
could equally take the last combination - the derivative and the bond - as a portfolio
and replicate the stock.
What happens if the two asset payoffs are linearly dependent (redundant assets)?
Then the replication system has no solution. Similarly, if there is only one asset, the
option cannot be replicated.
We change our market in the initial example as follows: the time-T values of the risky
asset are 80 and 105, and the derivative pays 10 in the upper state and 0 in the lower one.
12 As a game one should set up the above market and ask friends to report the price that they
would pay for the call option. For each announced price you can state an investment strategy which gives
you riskless profits, unless a friend announces the fair price. The strategy follows the above recipe:
if the announced price is higher than the fair price, use the fair fraction for hedging and invest the
difference in the risk free asset. People will be astonished to see how you (the writer) of the option
make riskless profits in a risky environment - a kind of magic in option pricing happens.
Forming the replication portfolio and solving the equations we get A = 0.4, B = −29.1
and V0 = 10.9. This price makes no sense: why should anybody pay 10.9 for a contract
which pays 10 or 0 at time T? The replication portfolio was set up correctly. Therefore
something must be wrong in the market structure. We show that the market is not free
of arbitrage. To see this, we write the risky asset price moves using the up ('u') and
down ('d') notation, i.e. 120 = 100u and 80 = 100d.
Proposition 22. Arbitrage is not possible in the above one-period market if and only if
u > 1 + r > d (3.36)
holds.
To explain the result, we note that u > d. Suppose 1 + r > u > d. Then the riskless
investment dominates the risky one in all possible states tomorrow. Therefore,
go short the risky asset and long the riskless one. In the case u > d > 1 + r a similar
argument applies. For the example at the beginning of the section, u = 1.2 > 1 + r =
1.1 > d = 0.8, so there is no arbitrage; in the modified market u = 1.05 < 1 + r = 1.1,
so (3.36) is violated and arbitrage exists. The above proposition gives us a simple
criterion to check whether a binomial model is free of arbitrage.
We now consider risk neutral pricing. We have seen that historical probabilities or
beliefs about the risky asset price dynamics do not matter for fair option pricing in the
replication approach. But there is a pricing approach where probabilities matter. These
probabilities are different from the empirical or subjective ones. We define
q := (R − d)/(u − d) , R = 1 + r . (3.37)
Proposition 23. Suppose that there are no arbitrage possibilities. Then q is a probability,
the so-called risk neutral probability.
To prove this we show 0 ≤ q ≤ 1. Since R − d > 0 and u − d > 0, we have q > 0.
Assume q > 1. This is equivalent to R − d > u − d, i.e. R > u. This contradicts the
assumption of no arbitrage.
If we calculate q in the original setup we get q = 0.75. In the variant with an arbitrage
opportunity we get q = (1.1 − 0.8)/(1.05 − 0.8) = 1.2 > 1. The RNP has the form of a
Sharpe ratio:

q = (return relative to the risk free rate) / volatility .

By the definition of q, E^Q[ST] = q S0 u + (1 − q) S0 d = R S0. Dividing by R,

E^Q[ST / R] = S0 .

In an arbitrage free market, the expected value of the discounted risky asset under the risk
neutral probability equals today's discounted asset value (note that S0/1 is the discounted
value in 0). The First Fundamental Theorem of Finance states that the converse also
holds: the existence of a martingale measure implies no arbitrage. Since martingales have
no drift, the expected value of the discounted price process is constant.
We relate the replication approach to the risk neutral one. The solution of the general
replication equations

φ2 S0 u + φ1 R = Cu
φ2 S0 d + φ1 R = Cd (3.38)

is

φ2 = (Cu − Cd)/(S0(u − d)) =: ∆ , φ1 = (u Cd − d Cu)/(R(u − d)) .
φ2, the Delta, measures the price sensitivity of the derivative given a price change of the
underlying risky asset. For a call, φ1 is negative. We have at time 0:
V0 = φ2 S0 + φ1 B0 = ∆S0 + φ1 B0 . (3.39)
From
V0 = C0 = ∆S0 + Cash
we get
L := ∆S0/C0 = 1 − Cash/C0 ,
the leverage ratio L.
L > 1 represents a loan (Cash is negative) and L < 1 the lending case. For L = 5, the
cost of the option buys an exposure to the underlying which is 5 times larger: we borrow
4/5 of the exposure and invest 1/5 - the option cost - from our own money.
• The discounted derivative process is a martingale under the risk neutral probability.
If markets are incomplete, these statements hold true with the exception that Q is not
unique or, equivalently, the price of the derivative is not uniquely determined using no
arbitrage.
Consider two assets S and X and a derivative C(S). Using X as numeraire, the discounted
option C/X and the discounted risky asset S/X are both martingales. A numeraire is by
definition a positive random variable; X = 1/(1 + r)^T is a deterministic numeraire.
Indeed, C/X or the replicating portfolio V/X is a martingale if and only if S/X is a
martingale:

C0/X0 = E^{QX}[CT/XT] if and only if S0/X0 = E^{QX}[ST/XT] . (3.41)
The probability QX depends on the choice of the numeraire:

qX = (S0/X0 − S^d_T/X^d_T) / (S^u_T/X^u_T − S^d_T/X^d_T) .

If we set X equal to a risk free asset, qX becomes the well-known q = (R − d)/(u − d).
We summarize. First, the advantage of relative pricing w.r.t. X is that the probability
QX is independent of the derivative and of the replicating portfolio, and the pricing
equation (3.41) holds for all derivatives C. Second, both the derivative and the risky
asset, relative to the numeraire, are martingales under the measure related to the
numeraire. Third, we could choose S as numeraire instead of X. This leads to a new
measure QS such that X/S and C/S are martingales under this new measure: the choice
of a numeraire does not alter the price of the derivative. One can choose the most
convenient numeraire for the calculations. Fourth, if we considered absolute pricing
(pricing using a general equilibrium model) instead of relative pricing, the martingale
measure for a derivative would depend on the specific derivative payoff VT. This is a
main reason why relative no arbitrage pricing is used in practice much more often than
a fully fledged general equilibrium model. We summarize:
Theorem 25. • Under no arbitrage pricing the pricing formula holds for all types
of payoffs or derivatives.
• The risk neutral probability in the linear pricing formula depends only on the assets'
characteristics (numeraire and other assets).
• In absolute pricing (general equilibrium) the probability entering the linear pric-
ing formula depends on the assets and on the payoff/derivative.
Definition 26. Consider a one-period model with S = s > 1 states at time T, N − 1
risky assets S and a riskless asset B. The price of asset j at time T in state k := ωk is
given by S^j(k). The payoff matrix P is defined by

$$P = \begin{pmatrix} B^1(1) & S^2(1) & \cdots & S^N(1) \\ \vdots & \vdots & \ddots & \vdots \\ B^1(s) & S^2(s) & \cdots & S^N(s) \end{pmatrix} . \qquad (3.42)$$

The matrix P has dimension S × N. The payoff or portfolio value X at time T is
X = Pφ . (3.43)
Definition 27. A payoff X is attainable given P if a portfolio φ exists such that X = Pφ.
The portfolio φ is called a replication portfolio. The space of attainable payoffs, the asset
span, is denoted ⟨S⟩ ⊂ R^S.
Definition 28. A market is complete if every payoff X ∈ R^S is attainable, i.e. if for
each X there exists a portfolio φ with
X = Pφ .
Definition 29. Consider a market with payoff matrix P and asset price vector S0. An
arbitrage is a portfolio φ = (φ1, . . . , φN)' such that
• the portfolio costs nothing today, S0'φ = 0,
• Pφ ≥ 0,
• and there exists at least a single state k̃ where (Pφ)(k̃) > 0 holds.
With an arbitrage, a portfolio of zero cost today ends up with no loss tomorrow in
all states and with the chance of a profit in at least one state. Consider a single risky asset
with prices S+ and S− and the risk free asset price B. In a risky environment, S− < B < S+
is the market structure leading to the absence of arbitrage. If B < S− < S+, you borrow
a large amount by selling the risk free asset and invest the whole amount in the risky one:
15 A+ is the left-inverse, A+A = I. P = AA+ and Q = A+A are orthogonal projection operators with
the properties P A = AQ = A, A+P = QA+ = A+. P is the orthogonal projector onto the range of A,
and I − P = I − AA+ is the orthogonal projector onto the kernel of A*.
$$P\phi = \begin{pmatrix} B(1)\phi_1 + S(1)\phi_2 \\ B(2)\phi_1 + S(2)\phi_2 \end{pmatrix} > 0 \quad \text{in both states.} \qquad (3.45)$$
When is a market free of arbitrage? Using state prices ψ, the First Fundamental
Theorem of Finance (FFTF) answers the question: a market is free of arbitrage if and
only if there exists a strictly positive state price vector ψ with

P'ψ = S0 . (3.46)

Componentwise,

S0^i = Σ_{j=1}^S ψj P_i(j) .

For the riskless asset,

B(0) =: B0 = Σ_{j=1}^S ψj B^1(j) = Σ_{j=1}^S ψj · 1 = Σ_{j=1}^S ψj =: ψ0 ,

since the riskless asset pays 1 in all possible states. We define the probabilities with
values in (0, 1)

qi := ψi/ψ0 , ∀i .
The riskless asset can be rewritten as

B0/ψ0 = Σ_{j=1}^S qj B^1(j) = E^Q(B^1) = E^Q(1) = 1 .

Therefore, ψ0 is the discount on a riskless borrowing. If r is the riskless annual interest
rate, we write

B0 = ψ0 = 1/(1 + r)^T .
This implies for all other risky assets:

S0^i = Σ_{j=1}^S ψj S^i(j) = (1/(1 + r)^T) Σ_{j=1}^S qj S^i(j) = E^Q[S^i/(1 + r)^T] = E^Q[M S^i] (3.47)
with M the stochastic discount factor (SDF). This factor is deterministic in this model
and does not depend on consumption, as it does in more general models. We will
consider the SDF in more detail below. This means that in the pure finance set-up, with a
given market structure and no arbitrage, the fundamental asset pricing equation (3.47) follows.
The sum of the qi's is equal to 1, qi > 0, and

S0^i = (1/(1 + r)^T) Σ_{j=1}^S qj S^i(j) = E^Q[S^{i,*}] ,

where S^{i,*} is the discounted asset price. The q's are then the risk neutral probabilities
(RNP). This shows the equivalence of risk neutral probabilities and state prices:

ψi = qi/(1 + r)^T . (3.48)
State prices are the prices of the Arrow-Debreu securities e(j), j = 1, . . . , S, where the
security e(m) pays 1 CHF if state m is realized and zero otherwise. Since the payoff
matrix of the Arrow-Debreu securities is the identity matrix, the FFTF implies

p(e(m)) = ψm = qm/(1 + r)^T , m = 1, . . . , S .
Corollary 31. State price densities are the prices of the Arrow-Debreu securities. State
price densities and risk neutral probabilities are equivalent.
Each payoff can be written as a linear combination of the Arrow-Debreu securities.
In practice one often prefers to work with risk neutral probabilities instead of state
prices. We state the FFTF using risk neutral probabilities.
The expected discounted return of the risky asset under a RNP is zero: the discounted
asset has no trend under this probability. If we consider several investment periods, the
risky asset condition for a RNP reads

S*_t = E^Q_t[S*_{t+1}]

with E^Q_t the conditional expectation at time t given the information set at this date. Inter-
preting conditional expectation as a best guess, the condition states that under a RNP the
best guess of future discounted prices is today's price. Such price processes are called
martingales. The absence of arbitrage does not imply that the risk neutral probability
is unique. The Second Fundamental Theorem of Finance considers this question.
where S*_j(t) = Sj(t)/B(t) are the discounted asset prices. A derivative is a contract
signed at t = 0 which leads to a state contingent non-negative reward XT(ω) at T.
How do we price X fairly?
Definition 34. X0 is a fair price for the contingent claim XT if the enlarged market
(Bt, S*_1(t), . . . , S*_N(t), X*_t), t = 0, T,
is free of arbitrage.
Then the fair price C0 of an attainable call option in an arbitrage free market is

C0 = E^Q[(S − K)*_+] = (1/(1 + r)) E^Q[(S − K)_+] .
3.4 Application
3.4.1 TAA Construction, Forwards and Futures
We consider the construction of the TAA for a Swiss intermediary (Source: ZKB [2013]).
Figure 3.10 shows the inputs in the TAA construction: for each asset class the figure
lists the region, currency, benchmark index, benchmark weight, the ETF implementation
and the ETF weight (for example: N-America, USD, stocks, MSCI North America USD
NR, 8.50, iShares MSCI North America, 8.50); both weight columns total 100.00.
Figure 3.10: Inputs in a TAA. The table shows the different asset classes, their volatility
adjusted benchmark, the implementation when using ETFs, and the weights and durations
of the benchmark and ETF portfolio. Source: ZKB (2013)
The asset classes are cash, bonds,
stocks, hedge funds, commodities, gold and real estate CH. Bonds are split into three time
buckets - 1-5y, 5-10y and more than ten years - and into government bonds, corporate
investment grade (IG) and high yield (HYCB) bonds. The list of ETFs represents a pos-
sible implementation of the benchmarks. The weights of the portfolio are the result of
an optimization such as a mean-variance optimization. The weights of the different asset
classes are volatility weighted (the ISV notation), which will be explained below.
The positions in the TAA are replicated using instruments which are more liquid and
cheaper than ETFs: futures, forwards and swaps, which we call REP instruments, see
Table 3.7. Equity, foreign bonds and commodities are replicated using futures, currencies
using forwards, and swaps are used for CHF bonds. This means that asset class Aj is
written as a linear combination of the REP instruments. Say the 11% allocation of Swiss
stocks is split into 9% SMI futures and 2% SMIM futures. The splitting of the individual
asset classes into the REP instruments is done by minimizing the tracking error of the
REP instruments towards the benchmark. Hence, a given REP instrument can contribute
to several asset
classes. The allocation of the REP instruments is shown in the column 'Allocation 100%'.
Hedge funds and real estate cannot be attributed to the liquid REP instruments; their
allocation weights are therefore zero.
Instrument Allocation 100% Allocation Index Instrument Allocation 100% Allocation Index
Liquidity CHF 4.5% 4.5% AUS 10 YR Bond Future 0.6% 1.2%
SMI Fut 10.7% 20.7% Natural Gas Future 0.3% 0.5%
SMIM Fut 1.2% 2.3% Crude Oil Future 0.2% 0.4%
FTSE Fut 4.1% 7.9% Brent Oil Future 0.4% 0.8%
Euro-Stoxx 50 Fut 6.2% 12% Live Cattle Future 0.2% 0.3%
S&P 500 E-mini Fut 8.4% 16.4% Wheat Future 0.2% 0.4%
TSX 60 Fut 0.7% 1.4% Corn Future 0.2% 0.4%
SPI 200 Fut 1.8% 3.5% Soybean Future 0.4% 0.8%
TOPIX Fut 4.2% 8.2% Sugar Future 0.2% 0.3%
Hang Seng Fut 0.6% 1.2% Aluminum Future 0.2% 0.4%
MSCI Singapore Fut 0.4% 0.7% Copper Future 0.4% 0.8%
MSCI EM Fut 4.8% 9.4% Gold Future 3.8% 7.3%
Swap CHF 3 YR 24.1% 46.8% EUR/CHF Fw 11.5% 22.4%
Swap CHF 7 YR 6.2% 12.1% GBP/CHF Fw 5.1% 10%
Swap CHF 10 YR 1.7% 3.2% USD/CHF Fw 18.7% 36.2%
Euro-Schatz Fut 1.6% 3.2% CAD/CHF Fw 1.8% 3.5%
Euro-Bobl Fut 2.4% 4.7% AUD/CHF Fw 2.9% 5.6%
Euro-Bund Fut 1.3% 2.6% JPY/CHF Fw 4.2% 8.2%
Short Gilt Fut 0.6% 1.2% HKD/CHF Fw 0.6% 1.2%
Long Gilt Fut 0.5% 0.9% SGD/CHF Fw 0.4% 0.7%
US 2 YR Note Fut 1.2% 2.3% Total Cash 4.5% 4.5%
US 5 YR Note Fut 2.4% 4.7% Total Futures 95.5% 185.5%
US 30 YR Note Fut 1.8% 3.4% Total Forwards 45.2% 87.8%
Can 10 YR Bond Fut 1.1% 2.1% Volatility 60d 4.1% 8%
Mini JGB 10 YR Bond Fut 0% 0% Investment Degree 100% 194.2%
AUS 3 YR Bond Fut 0.5% 0.9%
Table 3.7: Futures, forwards and swaps in the TAA replication. The 'Allocation 100%' column is
scaled up to the 'Allocation Index' column, which takes into account that the total volatility of
the TAA should equal 8%; see the text for explanations.
The next step is to consider volatility. For each instrument, calculate the daily re-
turns over one year. Then, at each date, form the allocation-weighted sum of the instru-
ment returns. This gives the return time series of the allocation index. The volatility of
this index is the standard deviation of these returns multiplied by the square root of the
number of return days within one year (square root rule). This gives a volatility of 4.12%.
Given the target volatility of 8%, the investment degree of 194.2% follows, see the
'Allocation Index' column. Such a model TAA is an input for the CIO, who makes
pairwise bets, each of a given dollar value, at inception.
Consider an index of forwards, futures and swaps replicating a TAA. The index value
It at time t is updated as follows from the index value at a prior time s < t:

$$I_t = \Big(1 + \sum_k \phi^k_s \frac{FX^k_t}{FX^k_s} R^k_{\mathrm{Fut},t} + \sum_l \phi^l_s\, \mathrm{Swap}^l_{s,t} + \sum_m \phi^m_s\, \mathrm{Forw}^m_{s,t}\Big) I_s = G_{s,t} I_s ,$$
where R^k_{Fut,t} is the simple futures return; the FX component only matters for the
futures since the swaps and the forwards are denominated in CHF. φ is the allocation
vector arising from an optimization problem. FX^k_t is the exchange rate at time t of
the currency of future k against the Swiss franc, where 1 currency unit is equivalent to
FX^k_t Swiss francs. The value of future k is calculated as the price of future k at
time s in local currency multiplied by the contract unit of the future. An oil future
with contract unit 1,000 and a local currency price of USD 108 has the value USD 108,000.
Swap^l_{s,t} is the value of swap l at time s with final date t at the fair value interest rate
and with nominal CHF 1.00. The value of the currency forward Forw^m_{s,t} at time t in
Swiss francs is given by the fair value forward rate fixed at time s with a nominal value
of CHF 1.00. The maturity of the forward corresponds to the next planned roll date.
The above formula shows that the index evolves by chaining the one-step adjustments:
It = ∏_{k=1}^{t} G_{k−1,k} · I0 .
Hence, the gross return R^g_{0,t} := It/I0 is given by the product of the one-step adjust-
ments. The gross return can also be written as a product of one-step gross returns:

R^g_{0,t} = It/I0 = (It/It−1)(It−1/It−2) · · · (I1/I0) = ∏_{k=1}^{t} R^g_{k−1,k} .

Therefore,

R^g_{0,t} = ∏_{k=1}^{t} R^g_{k−1,k} = ∏_{k=1}^{t} G_{k−1,k} .
Forwards can be used to hedge risks. Consider a German-based firm which wants to
buy goods in the US. The firm could buy the goods at a future date at the then prevailing
spot rate USDEUR S(t), which is risky. To avoid this risk the firm can enter today into a
forward contract. The forward price F(t, T) is fixed such that no cash flows exist at the
spot date t; the PV of a forward is zero. The delivery price K is set equal to F at t, i.e.
K = F(t, T). The equality K = F(t, T) does not hold any longer after t. If at maturity
S(T) = K, the German buyer of the contract faces neither losses nor gains. If S(T) > K,
the German firm makes a profit since USD can be bought at the price K, which is cheaper
than the spot price. Summarizing, a forward contract V has the following value for the
buyer at maturity:
V(T) = N · (S(T) − K) ,
with N the notional amount. The payoff is a linear function of S(T), contrary to
options, where the payoff is non-linear: the price of a forward does not depend on
(spot) volatility since the probability of making a gain or a loss at maturity is sym-
metric. No arbitrage leads to a unique forward price. Taking the risk neutral expectation
V(t) = N · e^{−r(T−t)} E^Q[S(T) − K] and using E^Q(S(T)) = e^{r(T−t)} S(t) - the
discounted price process is a martingale - we get

V(t) = N · (S(t) − e^{−r(T−t)} K) .

Since at contract initiation V(t) = 0, the initial forward F(t, T) = K has to be chosen
as follows:
Proposition 36. The unique arbitrage free forward price F(t, T) given a risk free rate
r is, under continuous compounding,

F(t, T) = e^{r(T−t)} S(t) .

This price only holds for forwards where no dividends, no costs of storage, no interest
rate differential costs and no convenience yield apply. The growth rate of the forward
price F is equal to r. If r ≠ 0 then F(t, T) > S(t) or F(t, T) < S(t), with F(T, T) = S(T).
For simple compounding, using the approximation e^x ∼ 1 + x,

F(t, T) = S(t)(1 + r(T − t)) .
We generalize to forwards on stocks paying dividends with a continuous rate d, com-
modities with storage cost rate s, bonds with coupon payment rate c, FX transactions with
different foreign and domestic interest rates i (interest rate differential) and commodities
with a convenience yield y. These extensions are captured by the net cost-of-carry yield
q:
q = r + s − y − d − c ± i .
Proposition 37. The unique arbitrage free forward price F(t, T) given q is, under
continuous compounding,

F(t, T) = e^{q(T−t)} S(t) .
If q > 0, the costs of holding the underlying are larger than the benefits it yields. This hap-
pens if storage costs are very high for a commodity forward. The buyer of the forward
therefore compensates the seller for these costs.
We next consider the valuation of a forward at intermediate dates. We recall that the
forward contract has value 0 at the initiation time t. For an intermediate date s, t < s < T,
the value V(s) of the forward contract is the spot value less the discounted delivery price,
which implies
V(s) = S(s) − D(s, T)F(t, T) .
Using the no arbitrage relation S(s) = F(s, T)D(s, T),
V(s) = D(s, T)(F(s, T) − F(t, T)) .
Setting s = t or s = T shows that the known initial and final values follow.
Consider a stock with S(0) = 25 CHF and a 6m forward contract. The 6m interest
rate is r = 7.12% p.a. Using simple compounding we get
F(0, 0.5) = 25 · (1 + 0.0712 · 0.5) = 25.89 CHF .
After 3m the stock price is S(0.25) = 23 CHF and 3m interest rates are
r = 8.08%. The forward price of a new contract with the same maturity reads
F(0.25, 0.5) = 23 · (1 + 0.0808 · 0.25) = 23.46 CHF .
The difference between the forward and the spot price is called the basis b: b(t) =
F(t, T) − S(t). When the underlying S of the futures market and of the cash market
are identical, the basis converges to zero at the maturity date. Basis risk arises if there
is either a mismatch of the underlying asset or a mismatch of maturity. Consider a forward
F(t, T) = e^{q(T−t)} S(t). If q is known at time 0, we set h = e^{−q(T−t)}. Then the payoff
at time t of h forwards,
h(F(t, T) − F(0, T)) = S(t) − S(0)e^{qt} ,
is the same as for a forward entered at time 0 with maturity t. Then there is no basis risk.
Forwards offer full flexibility to the two parties involved. But forwards also possess
some drawbacks: each seller needs to find a buyer, and both parties face counterparty
default risk. These two drawbacks are eliminated by using futures instead of forwards. Fu-
tures are traded at a futures exchange. Each futures exchange has a clearinghouse. A
clearinghouse is a well-capitalized financial institution. It acts as an intermediary be-
tween the two parties. The house guarantees contract performance to both parties. The
two parties have an obligation to the clearinghouse and no longer to each other. To re-
duce the default risk of the clearinghouse, the buyer and seller must deposit funds with
their broker - the margins. The margin must be in an eligible form such as cash or specified
securities. Since the initial margin is typically only a single-digit percentage of the goods
represented in the futures contract, potential losses can be much higher than the margin
deposit. To counteract this risk, the potentially large gains or losses in futures contracts
are not left to grow over time but are realized on a daily basis. Futures positions
are settled daily by marking-to-market of the contracts. Besides the initial margin, the
maintenance margin reflects the necessary minimum amount on the margin account, and
the variation margin is payable if there is a shortfall on the margin account.
Futures contracts are standardized w.r.t. the delivery date T, the underlying
value and the quantity to deliver. The two parties only have to agree on the delivery
price K and the number of contracts.
Since a future's initial price is zero, buying a future is equivalent to buying the underlying
financed by borrowing (leveraging). Futures allow for the same exposure as the
underlying but at lower costs - lower fees and smaller bid-ask spreads. Since the futures
price can vary heavily during the lifetime of the contract, enough liquidity for potential
margin calls is required. At the Chicago Mercantile Exchange the minimum amount is 250
thousand US dollars. The largest futures exchanges are CME, CBOT and Eurex.
We consider an example:
• Monday: The investor buys USDEUR futures with a notional amount of EUR
1'250'000. The underlying value is 0.7 USD per EUR and the maturity is 1 year.
• Tuesday: The price of the underlying falls to 0.5 USD per EUR: 0.2 USD/EUR ×
EUR 1'250'000 = USD 250'000 are taken away from the margin account.
• Wednesday: The price of the underlying rises to 0.8 USD per EUR: 0.3 USD/EUR ×
EUR 1'250'000 = USD 375'000 are credited to the investor's margin account.
In a Dax future the notional amount is equal to the Dax index value times 25 Euro.
For a Dax at 3'900 points the notional amount of one futures contract is 97'500 Euro. The
initial margin for a Dax future is 30'850 Euro. With 100'000 Euro on the margin account
one can enter into at most 2 futures contracts. The maturity of Dax futures is typically
3 months. The tick size of the future is 0.5 Dax index points. Since the value of a single
Dax point is 25 Euro, the value of a tick is 12.50 Euro. The exchange fee is 50 Euro cents
per contract.
Proposition 39 (Valuation of Futures). If there is no interest rate risk, no default risk of
the counterparties, and no arbitrage holds, then the valuation of futures is the same as the
valuation of the corresponding forward contracts.
Table 3.8 proves the proposition, where r is the fixed one-period interest rate.
Table 3.8: Equivalence of forwards and futures for deterministic interest rates
If we price futures in a market without any frictions, the pricing of futures is given by the
cost-of-carry model, i.e. no arbitrage is the driver. If C represents the expected cost-of-carry,
i.e. the costs which are necessary to carry the good from t to the delivery date T,
the no arbitrage relation reads
F(t, T) = S(t) + C . (3.53)
We illustrate (3.53) for S an equity index, D the value of the dividends before maturity
and r the annualized financing rate or money market yield. The fair futures price reads

F(t, T) = S(t)(1 + r (T − t)/360) − D . (3.54)

Next, let B(t, T) be the market price of a bond including accrued interest (dirty
price). The fair futures price is given by the cost-of-carry relation, where the interest
costs are the interest opportunity costs and the coupon payments are those up to the
expiration of the futures contract. Let c be the annualized coupon rate and A the days
of accrued interest; then

F(t, T) = B(t, T)(1 + r (T − t)/360) − c B(t, T) (T − t + A)/360 .
This formula assumes that the bond can be bought and delivered at any date. But this
need not be true. Consider US Treasury bond futures, which are traded on the Chicago
Board of Trade (CBOT). These futures have quarterly expiration dates. The size of one
futures contract is USD 100'000 face value of an eligible Treasury bond having
at least 15 years to maturity and which is not callable for at least 15 years. Therefore,
B(t, T) in the second bond expression in the last formula is replaced by a bond tradeable
for the short seller - the cheapest-to-deliver bond Bcd(t, T). Since different bonds have
different characteristics, standardization is lost at this stage. To give the short seller
flexibility in choosing which bond is actually delivered, the Treasury bond selected
by the short seller for delivery is price adjusted by a delivery factor f such that the bond
reflects a standardized 8 percent coupon rate:

F(t, T) = ( B(t, T)(1 + r (T − t)/360) − c Bcd(t, T)(T − t + A)/360 ) / f . (3.55)
Delta hedging is used to hedge the risk of futures. Consider gold with spot price 400
in some currency, net cost-of-carry 6% p.a. and time-to-maturity one year, i.e. F(t, T) =
400 e^{0.06} = 425. In a static hedge an investor is short futures and long spot. Table
3.9 summarizes the profit and loss for two spot price scenarios (N = 1).

             Scenario 1              Scenario 2
             today   tomorrow        today   tomorrow
Spot          400      600            400      200
Future        425      637            425      212
1:1 Hedge     -25      -37            -25      -12
P&L                    -12                      12

Table 3.9: Profit and loss of the 1:1 hedge for two spot price scenarios.

A gain or loss follows, which is not what we expect in a hedged position. The reason is
that spot and futures only move 1:1 if the cost-of-carry is zero.
The futures Delta relates spot and futures price changes:

∆^{Fut}_{Spot} = Change Spot / Change Futures .

If x is the change in the futures price and τ = T − t, no arbitrage between spot and
futures prices implies

∆^{Fut}_{Spot} = e^{−qτ} x / x = e^{−qτ} , (3.56)
i.e. the Delta is determined by the cost-of-carry. Hence, a Delta hedged portfolio
W is long spot and short ∆ = e^{−qτ} futures,
W = S − e^{−qτ} F ,
so that changes of W vanish to first order.
3.4.2 Interest Rate Parities
Two parity relations link interest rates and exchange rates:
• Covered interest rate parity (CIP): the return of a domestic risk free investment equals the
return of a foreign risk free investment if the FX risk is hedged using a forward
contract.
• Uncovered interest rate parity (UIP): the interest rate differential between two countries is
compensated by the expected FX changes.
We consider the covered parity for the Japanese yen (JPY) and the Brazilian real (BRL).
If JPY are exchanged against BRL there is no guarantee that the BRL does not devalue.
Using an FX forward we eliminate this risk. We assume:
• Interest rates are RJPY = 1% p.a. for yen and RBRL = 10% p.a. for real, and the
spot rate is S(t) = 0.025 BRL per JPY.
• The investor holds JPY 1'000. He changes the JPY into BRL at the spot rate,
which gives BRL 25, and invests them for one year at 10%.
At T, the investor changes the BRL 27.50 into JPY at the spot rate S(T), which is not
known at 0. This strategy is risky. To obtain a risk free FX strategy he replaces the un-
known spot rate S(T) by the known forward price F(0, T). The forward price is deter-
mined by the following no arbitrage argument. We write Rd for the nominal interest rate
in the domestic currency JPY and Rf for the interest rate in the foreign currency BRL.
Table 3.10 illustrates the strategy where borrowing is in the foreign currency. At 0:
• The investor borrows BRL for one year. He exchanges the BRL at the spot rate S(0)
into JPY and invests the JPY for 1y.
At T, to avoid arbitrage, the amount of foreign currency received must equal the amount
of foreign currency paid back. This implies the Covered Interest Rate Parity
Theorem (CIP):

F(t, T) = S(t) (1 + Rd)/(1 + Rf) , (3.57)

with Rd − Rf the interest rate differential. CIP states that the difference between the
domestic and the foreign interest rate determines the forward price.
What is the difference between the uncovered (UIP) and the covered parity? UIP
replaces the forward price in the CIP relation by the expected spot price, i.e. F(t, T) by
Et[S(T)]:

UIP: Et[S(T)] = S(t) (1 + Rd)/(1 + Rf) . (3.58)

Which view enters the expectation? Carry trades are bets that the formation of
expectations differs from the forward rate view:

Et[S(T)] ≠ F(t, T) = S(t) (1 + Rd)/(1 + Rf) . (3.59)
Consider a Swiss investor who needs JPY in 30 days. He buys the 30d JPYCHF forward,
which fixes the exchange rate in CHF for the date in 30 days. This is a covered position,
i.e. there is no FX risk. A different strategy is to exchange the CHF into JPY at the spot
rate S(t) and to invest the amount in the Japanese money market for 30d. This leads to
the CIP. Finally, a third strategy is to invest the CHF amount at home and to exchange
it into JPY in 30d. This investment is not covered; FX risk is only zero if the realized
30d spot rate equals the forward price. If the forward is lower than indicated by the CIP,
one borrows money in the foreign currency, exchanges it into the domestic currency at
the spot price and lends in the domestic currency.
Given the uncovered interest rate parity (UIP), arbitrage implies that the change of
an FX rate equals the nominal interest rate differential between the two currencies.
Hence monetary policy (fixing interest rates) and exchange rates are dependent: the
so-called Trilemma or Impossible Trinity holds. A country cannot simultaneously
choose all three policies: 1) a fixed exchange rate (exchange rate stability), 2) open capital
markets (financial integration) and 3) monetary policy autonomy. It can pick two; the
third follows by no arbitrage. If a country chooses open capital markets, uncovered
interest parity must hold: arbitrage equalizes expected returns at home and abroad; the
domestic interest rate must equal the foreign interest rate plus the expected appreciation
of the foreign currency. If a country chooses open capital markets and fixed exchange
rates, domestic interest rates have to equal the base-country interest rate, ruling out
monetary policy autonomy. If a country chooses open capital markets and wishes to set
domestic interest rates at levels suitable to domestic conditions, then exchange rates can
no longer be fixed.
Since 2014, the CIP between the USD and other major currencies is broken. Borio et
al. (2016) analyze the reasons for this fact. We show how one can exploit this arbitrage
opportunity. Consider a firm which denominates its income and balance sheet in CHF -
a currency for which the CIP with the USD is broken in the period since 2014. Although
CHF interest rates are negative up to several years of maturity, the firm cannot profit
directly from this fact since client rates are floored: deposits pay zero interest and loan
rates are shifted upwards. The broken CIP makes it possible for the firm to participate
in the negative interest rate environment: the firm asks for the loan in USD; together
with a USDCHF swap, the FX risk is hedged and participation in the negative CHF
interest rates follows.
Figure 3.11: Left panel: if a nation adopts position a, it maintains a fixed
exchange rate and allows free capital flows, the consequence of which is the loss of
monetary sovereignty. Sweden, for example, decided to control the interest rate
and to allow free international capital flows, and accepts that the exchange rate follows,
i.e. it cannot be controlled. Source: Wikipedia. Right panel: monetary policy selections
for four countries. Source: J.P. Danthine, Swiss Finance Institute (2011).
With market rates as of June 12, 2017, the mechanics are as follows:
• At t = 3m, the USD are bought back at the 3m forward rate USDCHF 0.9670,
which implies the pay-back USD amount.
The strategy can be rolled over until the CIP is eventually restored.
For a portfolio that invests in different countries, the portfolio value is affected by asset
price changes, interest-bearing income and by P&L from exchange rates. The investment
in each country is a composition of exposures to asset markets and to exchange rates. A
currency overlay modifies the currency positions such that the FX risk becomes acceptable.
We restrict ourselves to linear overlays, i.e. forwards.
                    US           D            UK           J
Bond                a11          a12          a13          a14
Equity              a21          a22          a23          a24
Cash                a31
Forward USDEUR      F1           -F1          0            0
Forward USDGBP      F2           0            -F2          0
Forward USDJPY      F3           0            0            -F3
Forward EURGBP      0            F4           -F4          0
Forward EURJPY      0            F5           0            -F5
Forward GBPJPY      0            0            F6           -F6
Overlay exposures   F1+F2+F3     -F1+F4+F5    -F2-F4+F6    -F3-F5-F6
Table 3.11: The structure of an international portfolio investing in four countries with two asset
classes.
We write aij for the exposure to asset class i of country (= currency) j, and F rep-
resents the respective forward position of a contract on a given currency. That is, in
a given currency j the currency exposure is equal to the sum of the asset exposures of the
investor in currency j plus the overlay position consisting of all forward contracts F
in that country or currency. For each forward contract, a minus sign indicates selling
and a plus sign indicates buying. The goal of the investor is to find the optimal asset
exposure weights a and the optimal forward contracts F. This optimization, using for
example a mean-variance framework, is done under several restrictions. Besides the
usual transaction cost constraints, overlay position constraints are of interest. Let L be
the total overlay limit allowed on a portfolio and Lm the total overlay of a portfolio,
equal to (1/2) Σ_j |Σ_i Fij| with Fij the forward position of contract i on currency j.
If L = 1, then the total currency exposure can deviate from the total asset exposure by
up to 100% of the portfolio. If Lm = 0, then the portfolio is unhedged and forward
contracts are not allowed to shift exposure from less-performing to better-performing
currencies, which would improve the risk-return profile of the portfolio.
Entering into forward contracts incurs the cost of carry, i.e. the interest rate differen-
tial. For an investment in any country j, the total return Rj is given by

Rj = aj Rja + cj Rjc + vj ij ,

where aj, cj and vj are respectively the asset exposure, the currency exposure and the
overlay position in country j, and the other variables are the expected asset return, the
expected currency return and the expected interest rate of country j. Since the overlay
position is defined as the difference between currency exposure and asset exposure,
vj = cj − aj, substitution gives

Rj = aj(Rja − ij) + cj(Rjc + ij) .

Hence, the portfolio total return is the sum of the adjusted returns weighted by asset
exposure and currency exposure, respectively. Therefore, the overlay positions are not
explicitly required to calculate the total return of a portfolio. For more details
see Chatsanga and Parkes (2017).
3.4.3 Call-Put-Parity
Consider an extended arbitrage free market with a call and a put option. There exists a
RNP Q such that

E^Q[S1/(1 + r)] = S0 , E^Q[C1/(1 + r)] = C0 , E^Q[P1/(1 + r)] = P0 .

From the definition of the call and the put follows (S − K) = (S − K)+ − (K − S)+ and
therefore

E^Q[S − K] = E^Q[S] − K = E^Q[C1] − E^Q[P1] .

Inserting the expressions above implies the put-call parity

C0 − P0 = S0 − K/(1 + r) .

The parity is useful since, once the price of say a call is known, the corresponding put
price follows: a call is a put and a put is a call. The parity holds for more general
markets too.
As an example, solving P'ψ = S0 gives the state price vector ψ = (1/4, 1)'. Since all
state prices are positive, the market is free of arbitrage. The existence of a unique solution
is an exception, since there are 3 equations and 2 unknowns: typically, the payoffs of
three non-redundant securities are conflicting.
Consider a market with a riskless asset with zero interest rate and a risky asset with
3 states:

$$P = \begin{pmatrix} 1 & 180 \\ 1 & 150 \\ 1 & 120 \end{pmatrix} , \quad S_0 = (1, 150)' .$$

This market is incomplete. Solving P'ψ = S0 with ψj > 0, the set of state prices is
parametrized by
ψ = {(a, 1 − 2a, a) , a ∈ (0, 1/2)} .
This incomplete market is free of arbitrage within the given parametrization set. Given
the incompleteness, there exist claims X ∉ ⟨S⟩, i.e. X ≠ Pφ for every portfolio φ.
Consider a call option with strike 150 and payoff (30, 0, 0). This call option is not
attainable. Since X − Pφ is not zero in all states, hedging risk exists.
No arbitrage alone does not lead to a unique price in this case. A second criterion is
needed to enforce uniqueness. There are many possible criteria. One is that the market
chooses the single RNP Q which is used for pricing. The derivative price is then fixed by
mapping the parametrized theoretical prices to observed market prices. This approach
is used in interest rate modelling ('inverting the yield curve').
The two equations (3.60) and (3.61) should determine the three-dimensional state price
vector. The solution of the two equations, which are two planes, is in general a line of
arbitrage free prices - and not a point as in a complete market. The state price vector is
not unique. Despite the incompleteness, the no arbitrage condition is the same as in the
well-known binomial model: there is no arbitrage if and only if
d < 1 + r < u .
The line is bounded by the requirement that state prices are positive. Each vector on
the line segment used to price derivatives leads to arbitrage free prices. Solving the two
equations, the boundary points of the line segment follow. For m ≥ 1 + r,

ψ1 = (1 + r − d)/((1 + r)(u − d)) , ψ2 = 0 , ψ3 = (u − 1 − r)/((1 + r)(u − d))

and

ψ1 = 0 , ψ2 = (1 + r − d)/((1 + r)(m − d)) , ψ3 = (m − 1 − r)/((1 + r)(m − d)) .

A similar corner solution holds for m < 1 + r.
The boundary values do not lead to arbitrage free prices since some components of
the state price densities are zero. If m → d or m → u, the trinomial model collapses to
the binomial one with the state prices

ψ1 = q/(1 + r) , ψ2 = (1 − q)/(1 + r) , ψ3 = 0 .
Sk = S0 (1 + u)^{Nk} (1 + d)^{k−Nk}

with Nk the random number of upward moves in k time steps. The value of a portfolio
Vt changes according to

∆Vt = φt ∆Bt + ψt ∆St ,

where ∆Vt = Vt − Vt−1, φt is the amount of CHF invested in the riskless asset at time t
and ψt the number of shares held at time t. The number of shares ψt has to be known
before t, that is at time t − 1. This property of sequences of random variables (stochastic
processes) is called predictability. We only consider self-financing strategies, see Section
3.1.5. If the strategy is self-financing, the portfolio value reads

Vt = V0 + Σ_{j=0}^t φj ∆Xj .
The final portfolio value is equal to the initial value plus the cumulative gains and losses
from the price changes of the asset X over time, weighted by the investment strategy. If
we recall that replication means Vt = Ct in all states and at all time points, the above
equation transforms into

Ct = C0 + Σ_{j=0}^t φj ∆Xj .

φj is the replication strategy which, given the initial option price C0, generates the ran-
dom option claims Ct. The martingale representation theorem states when such
a strategy exists. The notion of an arbitrage strategy carries over from the one-period
case.
The set of all observable outcomes ωj is the sample space Ω. To understand the informa-
tion dynamics, suppose that the first move was up. Then four paths are still possible
after this step; the others are impossible. After, say, a down move in the second step,
only two paths remain possible. After the last price move, a single realized path is left.
This allows us to introduce possible events. For 8 observable outcomes, the power set
A = 2^Ω with 2^8 = 256 elements defines all possible events. A filtration is an increasing
family of event sets:

Ft ⊂ Ft+1 , Ft ∈ A ∀t .
[Figure 3.12 shows, for the times 0, 1, 2, 3, how the set of observable states shrinks from
all of {ω1, . . . , ω8} to the events A1 = {ω1, . . . , ω4} or A2 = {ω5, . . . , ω8} after the first
move, and finally to a single realized state such as ω4.]
Figure 3.12: Illustration of the information and filtration structure for the three-period
CRR model.
Intuitively, increasing time means that the information resolution increases (there are more
sets). At t = 0, F0 = {∅, A}: everything is possible, i.e. all future information is random.16
At t = 1, we define the sets

A1 = {ω1, ω2, ω3, ω4} , A2 = {ω5, ω6, ω7, ω8} .

A1 (A2) is the set of all events where the first price move is 'up' ('down'). We set

F1 = {∅, A, A1, A2} .

This assures that F0 ⊂ F1. F3 is the power set of all eight observable states. The
information sets are generated by the evolution of the asset prices only. This is the
standard information structure set-up in asset and derivatives pricing.
The FFTF carries over to the CRR model. The theorem is based on the notion of
RNP or equivalent martingale measures.
Definition 41. Consider a price process under a probability P. A probability Q is
equivalent to P, written Q ∼ P, if they have the same impossible sets. A probability
Q ∼ P is a risk neutral probability if the discounted price process S̃ := S/N is a Q-
martingale with N > 0 the numeraire, i.e.

S̃t = E^Q[S̃s | Ft] =: E^Q_t[S̃s] for all s ≥ t.
16 The inclusion of the empty set guarantees that the set F0 is closed under countable intersection and
complement set formation.
Equivalence means that P(state k) > 0 for all states implies Q(state k) > 0 for all
states, and vice versa. Then, as in the static case, the CRR model is free of arbitrage if
and only if a RNP Q exists.
The price ratio St+1/St takes only the values (1 + u) or (1 + d). Since Q is strictly
positive, both values are attained with positive probability. This implies d < R = 1 + r < u,
since otherwise the equation E^Q[St+1/St] = 1 + r cannot hold. The last inequality is the
no arbitrage condition of the one-period model; if it is violated, arbitrage is possible.
We construct the measure Q:
Proposition 42. Assume 0 < d < 1 + r < u. The following statements are equivalent:
1. S̃k is a Q-martingale.
2. Q[Xk+1 = 1 + u] = q = (R − d)/(u − d) and Q[Xk+1 = 1 + d] = 1 − q = (u − R)/(u − d).
The risk neutral probability is unique, i.e. the CRR market is complete. If the
underlying instrument pays a dividend yield δ ≥ 1, the risk neutral probability and the
no arbitrage condition are

u > R/δ > d , q = (R/δ − d)/(u − d) .
We show how to price a European call option in the CRR model. Such a contract pays
at maturity C(ST) = max(ST − K, 0), with K the strike value. The following proposition
is proven in the Appendix.
Proposition 43. The arbitrage free price of a call option in the n-period CRR model is

$$C(S, t) = \frac{1}{R^n} \sum_{k=0}^{n} \binom{n}{k} q^k (1 - q)^{n-k} \max(S_0 u^k d^{n-k} - K, 0) . \qquad (3.62)$$

17 Due to the max operator, a separation leads to an adjustment of the summation range.
it represents a loan. The difference to the one-period model is the more complicated
factors in front of S and K; they are probabilities. In words, the formula states that the
price of a call (or put) option in the CRR model at a date t with maturity T is given by

C(S, t) = Σ_{paths} path probability × no. of paths × payoff at end node T , (3.63)

where the path probability equals q^{ku}(1 − q)^{kd} with q the risk neutral probability, ku
the number of 'up' moves on the given path from t to T and kd the number of 'down'
moves, and 'no. of paths' is the number of paths connecting the node at time t with the
end node at T.
We compare the accuracy of the binomial CRR model with observed option prices.
Consider a call on ABB Ltd. with strike CHF 31 and expiration June 20, 2008. The bid
and ask prices were CHF 0.33 and 0.34, respectively, and the actual ABB share price
was CHF 29.9. The figures are calculated using (3.63).
The first step is to calibrate the tree parameters u, d, r to the real world data. If R is
the annual gross rate, r the rate per tree period, n the number of periods in the tree and
τ = T − t the time to maturity, we have the relationship (1 + r)^n = R^τ. The number
of periods in the CRR model is n = 11 and the time to maturity is τ = 0.917. This
implies the riskless rate per period 1 + r = R^{τ/n} = 1.00327. The annualized volatility
follows from the daily volatility σ1d by the square root rule,

σ1y = √days · σ1d = √250 · σ1d = 0.2882 ,

where we assumed that there are 250 trading days in a year. This gives u = e^{σ√(τ/n)} =
1.087 and d = 1/u = 0.92. These values imply the risk neutral probability q = 0.499.
The table shows the pricing result.
The sum of the path weights over all end nodes is 1 and the sum of the payoffs over
all nodes, i.e. over the last column, is CHF 3.41205. Discounting this value back to time
zero gives 3.1383. Using the contract ratio 1:10 gives the theoretical price of CHF 0.31,
compared to the actual bid-ask prices of 0.33-0.34.
We relate the discrete and continuous time variables. Consider a continuous time
model for the risky asset where the mean and variance of the asset ratio St+dt/St are
given by

E(St+dt/St) = e^{r dt} , var(St+dt/St) = e^{2r dt}(e^{σ² dt} − 1)

with σ the volatility of the continuous time price process of the risky asset. Using a
Taylor approximation we get

E[dS/S] = r dt ,
Node  S0 u^k d^{11−k}  max(ST − K, 0)  No. of paths  q^k(1 − q)^{11−k}  S.P.  Sum payoff
11 74.541 43.541 1 0.0004 0.0004 0.020
10 63.114 32.114 11 0.0004 0.0052 0.168
9 53.440 22.440 55 0.0004 0.0264 0.593
8 45.248 14.248 165 0.0004 0.0796 1.134
7 38.312 7.312 330 0.0004 0.1600 1.170
6 32.440 1.440 462 0.0004 0.2250 0.324
5 27.467 - 462 0.0004 0.2260 0.000
4 23.257 - 330 0.0004 0.1622 0.000
3 19.692 - 165 0.0004 0.0814 0.000
2 16.673 - 55 0.0004 0.0272 0.000
1 14.118 - 11 0.0004 0.0054 0.000
0 11.954 - 1 0.0005 0.0005 0.000
Sum 1 3.41205
Table 3.12: Valuation of the call option in the 11-period model with ABB as underlying.
'S.P.' means 'sum of path weights'.
i.e. the expected risky asset growth rate equals the risk free rate. In the one-period
model, the mean and variance of the same asset ratio are

E(St+1/St) = qu + (1 − q)d

and

var(St+1/St) = qu² + (1 − q)d² − (qu + (1 − q)d)² = q(1 − q)(u − d)² .

Equating the moments in the two models we get

qu + (1 − q)d = e^{r dt} , q(1 − q)(u − d)² = e^{2r dt}(e^{σ² dt} − 1) .

Solving these equations gives

u = 1/d = 1 + σ√dt + (σ²/2) dt + . . . .

The first terms agree with the power series expansion of u = e^{σ√dt} - this justifies the
CRR parametrization. We do not derive the Black and Scholes model from scratch
using no arbitrage - this would take us too far afield - and therefore we state the Black
and Scholes model as a limit of the CRR model.
We have to consider a two-fold limit: the discrete time spacing and the discrete states
both become continuous. We fix the time T of the continuous model. We have to make
sure in the limit procedure that (i) the value of one dollar in [0, T] is the same in the
CRR model and in the Black and Scholes model and (ii) that the CRR price process St
converges towards a continuous price process which is log-normally distributed. This is
the assumed distribution in Black and Scholes; assuming a different distribution, different
continuous time models follow. For (i), set the period rate to rm = RT/m; then

lim_{m→∞} (1 + rm)^m = lim_{m→∞} (1 + RT/m)^m = e^{RT} = (1 + r)^T .

Hence, Bm converges towards the same value as in the T-maturity continuous model.
The risk neutral probability per period is

p̂m := (rm − dm)/(um − dm) .

The parametrization implies um dm = 1, i.e. um = 1/dm. The definitions guarantee that
the sum of the log Xi is asymptotically normally distributed, with a mean and variance
that reduce to the same ones as in the Black and Scholes model. Formally: the random
variable ln(S̃T/S̃0) converges for m → ∞ in distribution to a normally distributed random
variable with mean −σ²T/2 and variance σ²T.
We apply this result to price call and put options in the Black and Scholes model.
Consider a 6m call with strike K = 100 on a stock with price 90 which trades at CHF 2.
This positive price cannot be due to interest rates alone, since investing CHF 90 for 6m
gives 90 · e^{r(T−t)} = 90.90 < 100 CHF. The reason for a price of 2 is that the
underlying random variable S has the potential to grow above the strike value within the
next 6m. The returns of the underlying are log-normally distributed, i.e.

ST ∼ St LN(µ, σT) = LN(ln(St) + µ, σT) .

Since we know the distribution, we can price the call using the no arbitrage principle.
The price is given by
C(S, K, r, σ, t, T) = e^{−r(T−t)} E^Q[max(ST − K, 0)] . (3.65)
How do we find the risk neutral probability? No arbitrage implies that the discounted
price process S is a martingale with the risk free asset as numeraire. This means that the
expected value of S has to grow like the riskless asset - otherwise the drifts are not the
same. But if the drift of S and the drift r of the riskless asset are not the same, their
ratio - S/riskless asset - cannot be driftless. Summarizing, we must have at T

E^Q[ST] = St e^{r(T−t)} . (3.66)

But the expectation of a log-normally distributed random variable with mean µ and
volatility σT is given by
E[ST] = St exp(µ + σT²/2) . (3.67)

Comparing (3.66) and (3.67) gives the risk neutral drift µ = r(T − t) − σT²/2. The
volatility σT from t to maturity is determined from the annualized volatility σ by the
square root rule

σT = σ √(T − t) .

Summarizing,

ln(ST/St) ∼ N( r(T − t) − σ²(T − t)/2 , σ√(T − t) ) (3.69)

or

ST/St ∼ LN( r(T − t) − σ²(T − t)/2 , σ√(T − t) ) .
All these expressions enter d1 and d2 in the Black and Scholes formula. How can we
calculate the probability that we exercise the call? The option is exercised if ST > K.
For the continuous return rS = ln(ST/St) this reads rS > ln(K/St). Standardizing rS
with the mean and volatility in (3.69), a calculation shows that the probability of exer-
cising the option equals Φ(d2), while Φ(d1) is the Delta of the call.
As an illustration of the Delta, suppose a trader is long 10 call option contracts with a
Delta of 0.5 and 70 shares of the stock per option contract, and that he is short 200 Nestle
stocks. The position Delta is
−200 + 0.5 × 10 × 70 = +150 .
Gamma Γ states how much the Delta of an option changes when the price of the stock
moves. Theta Θ, or time decay, is an estimate of how much the theoretical value of an
option decreases when 1 day passes. Thetas for same-parameter calls and puts are not
equal. The difference depends on the cost-of-carry for the underlying stock. When the
dividend yield is less than the interest rate (positive cost-of-carry), Theta is higher for
the call than for the put. The difference between the extrinsic value of the option with
more days to expiration and the option with fewer days to expiration is due to Theta.
Long options have negative Theta and short options have positive Theta.
We consider Delta and Gamma hedging for the portfolio V:
• Short 1'000 calls, time-to-maturity (TtM) 90 days, strike 60, volatility 30%, risk-
less rate 8%. The currency is irrelevant.
• The fair option price using Black and Scholes is 4.14452 with Delta 0.581957. We
therefore receive a premium of 4'144.52 by selling the options.
• To hedge the position we buy 581.96 stocks at the price 60. For this we borrow
(cash)
581.96 × 60 − 4'144.52 = 34'917.39 − 4'144.52 = 30'772.88 .
The portfolio value today is zero. We consider the portfolio value after 1 day, i.e. the TtM
is 89 days. In the scenario 'unchanged' the underlying value remains at 60. Using Black
and Scholes, the option is worth 4.11833, i.e. Theta acts. This lower option liability
value is partly offset by the increased cash liability:
Value        unchanged       up            down
Underlying   34'917.39       35'499.35     34'335.44
Cash        -30'779.62      -30'779.62    -30'779.62
Option       -4'118.33       -4'721.50     -3'559.08
Sum              19.44           -1.77         -3.26
Table 3.13: Value of the portfolio V after 1 day for different scenarios.
This shows that the Delta hedge is effective for small changes in the underlying value.
Can we additionally hedge the Gamma? Since one option is used for the Delta hedge,
we need a second option to also achieve Gamma neutrality. The data of this option are:
• All other parameters are the same as for the first option, see Table 3.14.
Delta and Gamma neutrality means choosing a number x of stocks and a number z of
options 2 such that
∆V = x − 1'000 ∆Opt1 + z ∆Opt2 = 0
ΓV = −1'000 ΓOpt1 + z ΓOpt2 = 0 ,
with solution
x = 300.58 , z = 900.76 .
To be Delta and Gamma neutral we are long the underlying, long option 2 and short
cash. Table 3.15 compares the hedge effectiveness of Delta and of Delta & Gamma
hedging.
Vega is an estimate of how much the theoretical value of an option changes when the
volatility changes by 1 percentage point. Option prices and volatility are in a 1:1 relation
in the Black and Scholes model; one can quote an option in CHF or in volatility points.
Vega is highest for ATM options. Rho ρ is an estimate of how much the theoretical value
of an option changes when interest rates move by 1.00 percent.
The sensitivities are linked by the Black and Scholes pricing equation:

(σ²S²/2) Γ + rS∆ − rC = −Θ . (3.70)

Inserting the partial derivatives of the call price into the sensitivities shows that this is a
partial differential equation for the unknown call price. This equation follows from the
assumed market structure and the assumption of no arbitrage - no further economic
assumptions are needed. Adding the specific option contract as a terminal condition, the
solution C of the equation is the Black and Scholes formula for the option under consi-
deration - a second method to price options besides calculating expected values under a RNP.
We finally consider the creation of an option trading book in a liquid market. Consider
the liquid stock Lafargeholcim (LH). We start with a short position of 1000 calls on LH
with price 7.232 CHF (Step 1). The option price is theoretically calculated. If the LH
stock moves, up to first order ∂C/∂S =: ∆, a loss of CHF 587 on the derivative position
follows, see Table 3.16.
Step 2: To reduce the Delta risk, we buy 620 LH stocks at the price 80. Different
possibilities exist to generate P&L. First (Step 3), one sells the options at a slightly
higher price than their value. This gives a P&L of CHF 268. Second, price movements
as described above lead to P&L (Step 4, where LH gains 1). Step 5 describes how
volatility movements generate P&L. We assume that the portfolio V is Delta neutral.
Volatility is 20%. If volatility increases by 1 volatility point, the bank loses 304 CHF.
If the trader hedges the Vega exposure he needs to trade in different options. Step 6
shows that if he trades in a second option, the Vega of the position is reduced but Delta
moves away from zero. Hence, both Greeks can be controlled.
Warrant prices can be checked against a calculated reference volatility curve, e.g. the
Eurex curve: volatilities of the warrants are larger than the corresponding reference
values, and vice versa.
These products are not related in any sense to structured finance products such as
MBS or CDOs. The latter are based on pooling and slicing the risk of illiquid assets.
Consider markets which are disrupted unpredictably by certain events, where investors
want to choose an investment in response to the event. Such investments must hence be
deployed fast; they do not capture any diversification needs but are bets that, after the
market disruption, markets will drift back to normal levels.
There are different causes for these events - macroeconomic shocks, policy interventions,
the breakdown of investment strategies, or firm-specific events (for example, Lehman
Brothers). While some events are isolated and affect only single corporates, events at the
political or market level often lead to broader investment opportunities. Policy inter-
ventions can trigger market reactions that in turn can lead to new policy interventions.
The Swiss National Bank's announcement, in January 2015, that it would remove the
EURCHF cap and introduce negative interest rates had an effect on Swiss stock markets,
EURCHF rates, and fixed-income markets.
Such events can impact financial markets for a short period of time (a flash crash),
a medium time period (the GFC), or a long time (the Japanese real-estate shock of the
1990s). Making a bet when markets are under stress is simpler than in normal times.
SP are in some sense an opposite investment vehicle to funds since most of them do not
rely on the discretionary power of an asset manager; instead, the final payoff is promised
ex ante to the investor. The issuer has to generate the final payoff from the initial
investment amount in any market circumstances: trading, structuring, pricing, and
hedging are key disciplines for SP.
The replication of the payoff of an SP with cash products and vanilla options is central
to the pricing and hedging of the SP. The price of the SP is equal to the sum of the
prices of the building blocks; the no-arbitrage paradigm applies. The hedge corresponds
to the position of the dealer of the bank, which must generate the promised payoff of the
SP. Theoretically equivalent replications can differ in practice if components have
different liquidity or if taxation differs. The buyer of an SP faces only claims but no
obligations, unlike in a swap contract for example. The only counterparty for the investor
is the issuer, whose creditworthiness enters the pricing of the SP.
• (a) the customer is exposed to a range of outcomes in respect of the return of initial
capital invested;
• (b) the return of initial capital invested at the end of the investment period is linked
by a pre-set formula to the performance of an index, a combination of indices, a
'basket' of selected stocks (typically from an index or indices), or other factor or
combination of factors;
• (c) if the performance in (b) is within specified limits, repayment of initial capital
invested occurs but if not, the customer could lose some or all of the initial capital
invested.' Source: FSA Handbook.

Table 3.18: Mutual funds vs. structured products. COSI are structured products with
minimal issuer risk thanks to collateralization via the SIX exchange. Triparty Collateral
Management (TCM) serves the same purpose.
SP should not be confused with structured finance products such as MBS, CDO, or
CLN. The latter arise as products by pooling illiquid assets, whereas SP are defined on
liquid assets.
• Suppose that the annual interest rate is 2%. The PV of the guaranteed CHF
100 in 5 years is

90 = (1 − 5 × 0.02) × 100 CHF

using linear compounding. If the issuer invests today CHF 90 in a zero-bond, then
the capital guarantee promise in 5 years can be satisfied - if the issuer does not
default.
Therefore, the investment product SP V_t consists of a zero bond with price p(t, T) at
time t and maturity T and a participation product whose price depends on the price of
the underlying asset S_t. In the simplest variant, the value of the product V_T at maturity
T is determined as the product of the face value and the participation in the underlying
asset's price return:

V_T = N × ( 1 + max(0, b (S_T − S_0)/S_0) )

with N the face value and b the participation rate. Rewriting, we get

V_T = N + (bN/S_0) max(0, S_T − S_0) . (3.71)
We note that a '+' sign in a payoff value is a long position and a '−' sign a short position.
The payoff formula (3.71) is written from the buyer's perspective.
Equation (3.71) shows that the payoff of the CP at maturity equals an investment in
a zero bond and a long position in European call options C(S, K, T) with strike K = S_0.
The number of options is equal to the participation rate times the face value divided by
the initial price. (3.71) is a replication of the payoff at maturity fixed in the contract.
No arbitrage implies for the fair value of the contract V_0
V_0 = p(0, T) + (bN/S_0) C(0, S, K) (3.72)

with p(0, T) the zero bond price and C(0, S, K) the arbitrage-free option price, i.e.
C(0, S, K) = E^Q[D(0, T) max(S_T − S_0, 0)].
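A minimal pricing sketch of (3.72) follows. The inputs (5 years, 2 percent rates, 20 percent volatility, face value 100) are illustrative assumptions; the code computes the zero-bond leg, the ATM call, and the participation rate b that makes the note worth par.

```python
# Capital-protected note: zero bond + (b*N/S0) ATM calls, priced at par.
from math import log, sqrt, exp, erf

def norm_cdf(x): return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

N, S0, T, r, sigma = 100.0, 100.0, 5.0, 0.02, 0.20
p0T  = exp(-r * T)                    # zero-bond price p(0, T) per unit face
call = bs_call(S0, S0, T, r, sigma)   # European call with strike K = S0

# Fair participation: p(0,T)*N + (b*N/S0)*call = N  (note issued at par)
b = (N - p0T * N) / (N / S0 * call)
print(f"zero bond {p0T:.4f}, ATM call {call:.2f}, participation b = {b:.1%}")
```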
Consider an investor with the following preferences:
• He believes that the UBS stock is likely to rise over the next year.
• He believes that the stock will not rise strongly. He also prefers a partial capital
protection if the UBS stock falls. He is in turn willing to give up the upside potential
of the stock.
hits the barrier, the capital protection is knocked out and the payback at maturity is
the UBS stock value at this date plus the 10 percent coupon. A BRC delivers a higher
coupon than the stock dividend, plus a contingent capital protection. Contrary to CP,
the investor faces the market risk of UBS breaching the barrier. The investor gives up
the stock's upside: the coupon is the maximum return possible, which is higher than the
UBS dividend yield. Consider the replication of the BRC. The BRC payoff at maturity
is replicated with two products:
• a short down & in put (DIP), i.e. the investor sells a DIP - a barrier option on
UBS. This money is used to generate the coupon value.
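A minimal payoff sketch of a BRC at maturity, under illustrative assumptions (nominal 100, the 10 percent coupon mentioned above, initial fixing S_0 = 100, barrier at 70):

```python
# BRC payoff: the coupon is paid in any case; the nominal is repaid unless
# the barrier was hit during the product's life and the stock ends below S0
# (then the knocked-in put is exercised and the stock value is delivered).
def brc_payoff(S_T, barrier_hit, N=100.0, coupon=0.10, S0=100.0):
    payoff = coupon * N
    if barrier_hit and S_T < S0:
        payoff += N * S_T / S0      # investor bears the stock loss
    else:
        payoff += N                 # full capital repayment
    return payoff

print(brc_payoff(120.0, barrier_hit=False))  # 110.0: upside capped at the coupon
print(brc_payoff(60.0,  barrier_hit=True))   # 70.0:  stock value plus coupon
```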
Contrary to the problem of classifying innovations in general, successful classification
schemes exist for retail structured products (RSP). One of them is the Swiss Derivative
Map of the Swiss Derivative Association. With minor adaptations this map is also used
by the European Structured Product Association. The Swiss map defines main categories:
• Capital protection.
• Participation, i.e. products which globally have a linear payoff profile. 'Globally'
means that for some bounded region in the underlying value the payoff can be
non-linear.
• Reference Entity Products. In addition to the credit risk of the issuer, redemption
is subject to the solvency (non-occurrence of a credit event) of the reference entity.
[Figure: payoff diagram of a discount certificate - profit/loss against the underlying value, with the profit capped at the cap level.]
products: the price of the DC is by no arbitrage equal to the price of the replicating payoff.
Since the payoff is non-linear, options are needed for replication and a model such as
Black and Scholes is used to price the options. The replication portfolio is long a LEPO
(Low Exercise Price Option) and short a call with strike K = 250. LEPOs are European-
type call options with a very low strike of K = 0.01 CHF. The current value of a LEPO is
equal to the current price of the underlying share compounded at the risk-free interest
rate, less the accumulated value of dividends and the strike price. Since K is close to
zero, the price sensitivity (Delta) of the LEPO is close to one, which implies that the
price of the LEPO is well approximated by the price of the underlying minus the PV of
the dividends. Hence, the DC payoff is graphically equivalent to a straight line (LEPO)
plus a short call payoff.
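A small sketch of this replication, ignoring dividends and approximating the LEPO by the stock (its 0.01 CHF strike is negligible):

```python
# Discount certificate = long LEPO + short call with strike at the cap.
def dc_payoff(S_T, cap=250.0, lepo_strike=0.01):
    lepo = max(S_T - lepo_strike, 0.0)      # ~ S_T for realistic stock prices
    short_call = -max(S_T - cap, 0.0)
    return lepo + short_call                # = min(S_T, cap) up to the tiny strike

for S_T in (200.0, 250.0, 300.0):
    print(S_T, round(dc_payoff(S_T), 2))    # payoff is capped at 250
```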
3.4.11 Political Events: Swiss National Bank (SNB), ECB, and SP Investment
The SNB announced, on 15 January 2015, the removal of the euro cap and the intro-
duction of negative CHF short-term interest rates. This decision caused the SMI to lose
about 15 percent of its value within 1-2 days, and the FX rate EUR/CHF dropped
from 1.2 to near parity. Similar changes occurred for USD/CHF. Swiss stocks of
export-oriented companies or companies with a high cost base in Swiss francs were most
affected. The drop in stock prices led to a sudden and large increase in Swiss stock
market volatility. Swiss interest rates became negative for maturities of up to thirteen
years.
It was also known at the time that the ECB would make public its stance on quan-
titative easing (QE) one week later. The market participants' consensus was that Mario
Draghi - president of the ECB - would announce a QE program. The events in Switzer-
land, which came as a surprise, and the ECB QE measures subsequently announced
paved the way for the following investment opportunities:
1. A Swiss investor could invest in high quality or high dividend paying EUR shares at
a discount of 15 percent. EUR shares were expected to rise due to the forthcoming
ECB announcement.
2. All Swiss stocks faced heavy losses, independently of their market capitalization
and of their exposure to the Swiss franc.
3. The increase in volatility made BRCs with very low barriers feasible.
4. The strengthening of the Swiss franc versus the US dollar, and the negative CHF
interest rates, led to a USD/CHF FX swap opportunity that only qualified investors
could benefit from.
5. The negative interest rates in CHF and rates of almost zero in the eurozone made
investments in newly issued bonds very unattractive. Conversely, the low credit risk
of corporates brought about by the ECB's decision offered opportunities to invest
in the credit risk premia of large European corporates via structured products.
This investment had two main risk sources. If it was denominated in euros, the
EUR/CHF risk held, and one faced the market risk of the large European companies whose
shares comprised the basket. Most investors classified the FX risk as acceptable since
a significant further strengthening of the Swiss franc against the euro would meet with
counter-measures from the SNB. More specifically, a tracker on a basket of fourteen Eu-
ropean stocks was issued. The issuance price was fixed at EUR 98.75. As of 1 April
2015 the product was trading at EUR 111.10 (mid-price) - equivalent to a performance
of 12.51 percent pro rata. Similar products were launched by all the large issuers.
Other issuers launched a tracker on Swiss stocks, putting into a basket all large Swiss
stocks that had only little exposure to the Swiss franc but which also faced a heavy
price correction after the SNB announcement in January. Again, the input of each issu-
ing bank's research unit in identifying these firms was key. The underlying investment
idea for this product can be seen as a typical application of behavioral finance: an over-
reaction of market participants to events is expected to vanish over time.
The risk in this investment was twofold. First, one did not know whether the SNB
would consider further measures, such as lowering interest rates further, which would
have led to a second drop in the value of Swiss equity shares. Second, international
investors with euros or US dollars as their reference currency could realize profits since
the drop in Swiss share values - around 15 percent - was more than offset by the gain
from the currency, which lost around 20 percent in 'value'; roughly, an institutional
investor could earn 5 percent by selling Swiss stocks. Since large investors exploit such
opportunities rapidly, it became clear three days after the SNB's decision was announced
that the avalanche of selling orders from international investors was over.
Investors and private bankers searched for cash alternatives with a 100 percent capital
guarantee. The negative CHF interest rates made this impossible: if 1 Swiss franc today
is worth less than 1 Swiss franc will be worth tomorrow, one has to invest more than 100
percent today to get a 100 percent capital guarantee in the future.
Low-barrier BRCs - say, with a barrier at 39 percent - could be issued with a coupon
of 1 to 2 percent, depending on the issuer's creditworthiness and risk appetite, for a ma-
turity of one to two years. The S&P 500, Eurostoxx 50, SMI, NIKKEI 225, and other
broadly diversified stock indices were used in combination as underlying values for the
BRCs. The low fixed coupon of 1 to 2 percent takes into account that the product is
considered as a cash alternative with a zero percent, or even a negative, return. See the
last section for more details about BRCs.
The reason for using quanto AUD is the higher AUD interest rates compared to JPY
interest rates. Higher interest rates lead to higher participation, and the participation
in the quanto product was 130 percent. The risk of the investment lay in whether
Abenomics would work as expected, and possibly in the FX rate AUD/CHF. The economic
program in Japan worked out well and the redemption rate lay at 198 percent after two
years. This redemption contains a loss of 16.35 percent due to the weakness of the
Australian dollar against the Swiss franc.
To invest in a negative basis product, the issuer of a structured product locks in the
negative basis for an investor by forming a portfolio of bonds and credit derivatives of
those firms with a negative basis. For each day on which the negative basis exists a cash
flow follows, which defines the participation of the investor. When the negative basis
vanishes, the product is terminated.
Corporate Credit basis in May 2003 (bps) Credit basis in November 2008 (bps)
Merrill Lynch 47 -217
General Motors -32 -504
IBM 22 -64
J.P. Morgan Chase 22 -150
Table 3.19: Credit basis for a sample of corporates in 2003 and their negative basis in
the most recent GFC.
Example
Investing in the negative credit basis of General Motors (see Table 3.19) leads to a
return, on an annual basis, of 5.04 percent if the basis remains constant for one year.
If the product has a leverage of 3, the gross return is 15.12 percent. To obtain the net
return, one has to deduct the financing costs of the leverage.
Structured products with this idea in mind were offered in spring 2009 to qualified
investors. The products offered an annual fixed coupon of around 12 percent and partic-
ipation in the negative basis. The high coupons were possible as some issuers leveraged
investors' capital. This could only be offered by those few issuers in the most recent GFC
that were cash rich, typically AAA-rated banks. The products paid one coupon and were
then terminated after 14 months since the negative basis approached its normal value.
The product value led to a performance of around 70 percent for a 14-month investment
period. Was this formidable performance, seen ex ante, a free lunch - that is to say,
a risk-less investment? No. If the financial system had fallen apart, investors would
have lost all the invested capital. But the investors basically only needed to answer the
following question: will the financial system and the real economy return to normality? If
yes, the investment was reduced to the AAA issuer risk of the structured product.
Many lessons can be drawn from these products. A very turbulent time for markets
can offer extraordinary investment opportunities. The valuation of these opportunities
by investors must follow different patterns than in times of normal markets: there is,
for example, no history and no extensive back-testing, and hence an impossibility of
calculating any risk and return figures. But there is a lot of uncertainty. Making an
investment decision when uncertainty is the main market characteristic is an entirely
different proposition to doing so when markets are normal and the usual risk machinery
can be used to support decision-making with a range of forward-looking risk and return
figures. If uncertainty matters, investors who are cold-blooded, courageous, or gamblers,
and analytically strong, will invest, while others will prefer to keep their money in a safe
haven.
A credit linked note (CLN) is a structured product. Its payoff profile corresponds to
a bond's payoff in many respects. A CLN pays - similarly to a bond - a regular coupon.
The size of the coupon and the amount of the nominal value repaid at maturity both
depend on the creditworthiness of a third party, the so-called reference entity (the issuer
of the comparable bond). This is also similar to the situation for bonds. But the size
of the CLN coupon derives from credit derivative markets. Hence, if the credit basis is
positive, a larger CLN coupon follows compared to the bond coupon of the same ref-
erence entity. CLNs are typically more liquid than their corresponding bonds since credit
derivative markets are liquid, while many bonds, even from large corporates, often suffer
from illiquidity. CLNs are flexible in their design of interest payments, maturities, and
currencies. CLNs also possess tax advantages compared to bonds; in fact, the after-tax
return for bonds that were bought at a price above 100 percent is often negative in this
negative interest rate environment. The investor in a CLN faces two sources of credit
risk: the reference entity risk, as for bonds, and the issuer risk of the structured product.
As an example, Glencore issued a new bond in Swiss francs with a coupon of 1.25 percent.
Due to the positive basis, the coupon of the CLN was 1.70 percent. Another product,
with Arcelor Mittal as the reference entity, implied a CLN effective yield in EUR that
was 1.02 percent higher than that of the bond.
Let us consider a more detailed example. Consider the reference entity Citigroup
Inc. The bond in CHF matures in April 2021 and its price is 102.5 with a coupon of
2.75 percent. The bond spread is 57 bps, which leads to a yield to maturity of −0.18
percent - an investor should sell the bond. The CLN has a spread of 75 bps - which
proves the positive basis - and an issuance price of 100. The coupon of the CLN is then
−0.71 percent, which leads to a yield to maturity of 0.57 percent if funding is subtracted.
Therefore, selling the bond and buying the CLN generates an additional return of 75 bps.
3.5 Collateral
3.5.1 Prime Finance
Prime Finance is an important trading activity which is frequently used by asset man-
agement firms. Prime Finance has different aspects:
• Lending and borrowing of securities, the Securities Lending and Borrowing business (SLB).
The general motivation for repos is the borrowing or lending of cash. In securities lend-
ing, the purpose is to temporarily obtain the security for other purposes, such as covering
short positions or for use in complex financial structures. Securities are generally lent
out for a fee. Securities lending trades are governed by different types of legal agreements
than repos.
Prime finance business changed heavily after the GFC and is still transforming. Sev-
eral rationales motivate prime finance activities and their transformation. A first rationale
is collateralized banking. The repo business can be considered as secured banking where
collateral serves as creditor protection for non-retail investors. Creditors are banks, in-
surance companies, governments, firms, asset managers, or pension funds. Widely
collateralized markets are, for example, fixed income repo, equity finance, exchange traded
securities, OTC derivatives, securities lending, bank loans, and asset backed securities. An
important property of collateral is its eligibility, i.e. the extent to which collateral can be
converted into an economic value if the counterparty defaults. Liquidity, quality in terms
of embedded credit risk, and the possibility to settle the collateral define the collateral eli-
gibility. Cash is the most used collateral, followed by government bonds and large-cap shares.
For traders, repos are used to finance long positions, obtain access to cheaper funding
costs for other speculative investments, and cover short positions in securities. A second
rationale is cost reduction in the custody of securities, where lending and borrowing secu-
rities generates earnings which lower these costs. Third, to cover short positions one has
to borrow securities. Short positions can be the result of market making, the hedging of
derivative positions, or part of an investment strategy. Finally, regulatory requirements
lead to lower risk-weighted assets in the regulatory capital charge if one switches from
unsecured to secured transactions.
• At time 1: redemption of the loan and interest rate payments to the buyer, and
reassignment of the security from the buyer to the seller.
The purchase price at time 0 equals the market value (dirty price) of the underlying security
minus a haircut. The haircut provides a restricted protection against falling
security prices. The payback price equals the purchase price plus an agreed interest pay-
ment (repo rate), which depends upon the quality of the security. If the security loses
value, a margin call follows. Using a repo, the seller obtains favourable rates compared
to an unsecured loan and the buyer receives collateral.
Almost any security may be employed in a repo. But highly liquid securities are
preferred because they can be easily secured in the open market where the buyer has
created a short position in the repo security through a reverse repo and market sale.
Treasury and government bills, corporate and Treasury/government bonds, and stocks may
all be used as collateral in a repo transaction. Coupons which are paid while the repo
buyer owns the securities are passed on to the repo seller, although the ownership of the
collateral rests with the buyer during the repo agreement. There are three types of repo
maturities: overnight, term (i.e. with a specified date), and open repo.
The most important forms of repo transactions are specified delivery and tri-party.
The first form requires the delivery of a prespecified bond at the onset and at maturity
of the contractual period. Tri-party is essentially a basket form of transaction and allows
for a wider range of instruments in the basket or pool. The tri-party agent acts as an
intermediary between the two parties to the repo and is responsible for the administration
of the transaction, marking to market, and substitution of collateral. The largest tri-party
agents are Clearstream and JP Morgan Chase.
A reverse repo is the same repurchase agreement from the buyer's viewpoint, not the
seller's. The term reverse repo is used to describe a short position in a debt instrument
where the buyer in the repo transaction immediately sells the security provided by the
seller on the open market.
Example:
While investors trade bonds on a stand-alone basis, trading desks use repo jointly with
bond trading. Buying a bond is completed immediately by selling the bond in a repo,
i.e. one finances the bond. We consider a US Treasury bond with the following dates:
at T the trader buys the bond for the price B(T) from a counterparty A. At T+1 the
repo transaction starts to finance the bond. To achieve this,
• the repo desk delivers the bond for 1 day - i.e. the period of the repo transaction is
overnight, from T_1 to T_2 - for a price B(T_1^Repo) to the repo counterparty, and
• the repo desk agrees to buy the bond back at T_2 for the price B(T_1^Repo) plus
the repo interest.
Using the data - notional 100 Mio. USD, coupon 4 percent, T = Oct 2 for trading the
bond, settlement Oct 3 - the clean price of the bond is 100'078'125 USD (= 100-02+
in US Treasury notation), and adding the accrued interest of (3/183) × 0.04/2 gives the
settlement price 100'110'911 USD: the bond accrues interest since Sept 30, and the half
year has 183 days. The repo rate r equals 3.4 percent, the cash rate is 3.5 percent. Since
the bond settles Oct 3, the repo desk finances the bond. The bond price changes from
Oct 2 to Oct 3 to (100-05). Therefore, the value of the position in dirty prices increased to

100'189'036 = (1 + (5/32)/100 + (3/183) × 0.04/2) × 100 Mio. USD .
At Oct 3 the following payments/transactions are made:
• Bonds are received with value USD 100'110'911 and exchanged for a secured loan
of USD 100'189'036 with the repo counterparty.
• The repo counterparty hands back the lent bond and obtains the repo rate interest.
• The bond is sold by the repo desk to the buyer. The price equals the clean price
of Oct 3 with Oct 4 settlement plus accrued interest. If the bond increased to
100-08, we have

100'293'715 = (1 + (8/32)/100 + (4/183) × 0.04/2) × 100 Mio. USD .
• Change in bond price: 100'293'715 − 100'110'911 = +182'804 USD.
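The settlement arithmetic can be checked with the sketch below; the small differences to the figures above stem from rounded intermediate prices.

```python
# US Treasury repo financing example: dirty prices, price change, repo interest.
NOTIONAL = 100_000_000
COUPON, REPO_RATE = 0.04, 0.034

def dirty(clean_32nds, accrued_days):
    clean = clean_32nds / 32 / 100                 # price quoted in 32nds of par
    accrued = accrued_days / 183 * COUPON / 2      # 183-day semi-annual period
    return NOTIONAL * (1 + clean + accrued)

settle_oct3 = dirty(2.5, 3)   # 100-02+ traded Oct 2, settles Oct 3 -> ~100'110'911
loan_oct3   = dirty(5.0, 3)   # repo loan against the 100-05 price  -> ~100'189'036
settle_oct4 = dirty(8.0, 4)   # 100-08 traded Oct 3, settles Oct 4  -> ~100'293'715

print(f"bond price change: {settle_oct4 - settle_oct3:,.0f}")          # ~ +182'804
print(f"overnight repo interest: {loan_oct3 * REPO_RATE / 360:,.0f}")  # ~ 9'462
```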
Contrary to the SLB business, a repo is always of the type cash against security. Both
transaction types face the same market risk but settlement risk can be different.
Eurex, one of the world's largest exchanges for futures and options trading, also
offers platforms for bond trading and for repo (Eurex Repo). The platform is open to all
financial institutions. The Eurex Repo platform is a TriParty platform with integrated
trading and settlement functionalities. This means that a third party to the buyer and
seller is responsible for administration and operations. The largest providers of TriParty
repo programs are Clearstream and JP Morgan Chase. The Eurex platform integrates
trading, settlement, and legal documentation. Participants at Eurex Repo can choose
from a broad menu of repo transactions. An advantage of the Eurex Repo platform is
that the securities which are received as collateral can be used immediately for a new
repo transaction. This allows banks to raise cash if they need to do so. The Eurex market
consists of four links for the participants in CHF repos.
As an example, consider a bond trader (seller) who wishes to borrow CHF 20 Mio.
for one week to finance an investment in Swiss Government Bonds of CHF 18 Mio. nominal
with a 3 percent coupon. A repo buyer offers a repo rate of 2 percent. The seller accepts the
rate and delivers CHF 18 Mio. nominal against CHF 20 Mio. cash; the same day, the
buyer pays CHF 20 Mio. in exchange for the CHF 18 Mio. of bonds. After one week
the buyer gives back the bonds to the seller, and the seller pays back the loan plus accrued
interest:

20'000'000 × (0.02 × 7)/360 = 7'777.8 CHF .
3.6 The Efficient Market Hypothesis (EMH)

Malkiel (2003): Revolutions often spawn counter-revolutions and the efficient market
hypothesis [EMH] in finance is no exception. The intellectual dominance of the efficient-
market revolution has more been challenged by economists who stress psychological and
behavioural elements of stock-price determination and by econometricians who argue that
stock returns are, to a considerable extent, predictable.
Lo (2007): The efficient market[s] hypothesis (EMH) maintains that market prices fully
reflect all available information. [...] It is disarmingly simple to state, has far-reaching
consequences ..., and yet is surprisingly resilient to empirical proof or refutation. Even
after several decades of research and literally thousands of published studies, economists
have not yet reached a consensus about whether markets - particularly financial markets
- are, in fact, efficient.
Asness and Liew (2015): The concept of market efficiency has been confused with every-
thing from the reason that you should hold stocks for the long run to predictions that stock
returns should be normally distributed to even simply a belief in free enterprise.
Shiller (2014): [If markets are efficient] there is never a good time or bad time to
enter the market [...]
Definition 46. A financial market is efficient when market prices reflect all available
information about value.
All available information includes past prices, public information, and private infor-
mation. These different information sets F_t lead to different EMHs (see below). The
statement 'reflecting all available information' is not defined. If a company announces
that it expects earnings to double, do stock prices double, triple, or fall? In the sense of
Jensen, reflecting all available information means that trading based on the information
set does not lead to an economic profit. An asset pricing model is needed to make precise
what reflecting all information means in the EMH. Efficiency testing means to test whether
the properties of expected returns implied by the model of market equilibrium are observed
in actual returns. This is referred to as the joint hypothesis problem (Fama [1970]):
• Pillar 1: Do prices reflect all available information - that is, are markets efficient?
Prices can only change if new information arrives (the information content).
• Pillar 2: Developing and testing asset pricing models - the price formation mecha-
nism (the asset pricing model).
See Section ?? for a discussion of information sets and their evolution over time.
19 This section is based on Fama (1965, 1970, 1991), Cochrane (2011, 2013), Malkiel (2003), Asness
(2014), Lo (2007), Nieuwerburgh and Koijen (2007), and Shiller (2014).
The standard asset pricing equilibrium model of the 1960s assumed that equilibrium
expected returns are constant: E(R_{t+1}|F_{M,t}) = constant. If the EMH (3.74) holds, then
E(R_{t+1}|F_t) = constant follows. To test the EMH, the regression of the future returns R_{t+1}
on the known information F_t should have a zero slope. If this is not the case, the market
equilibrium model could be wrong, or the definition of F_{M,t} overlooks information in price
setting (F_{M,t} and F_t are not equal), or both channels could be flawed.
Remarks
• The EMH does not hold if there are market frictions (trading costs, costs of obtaining
information). In the US, reliable information about firms can be obtained relatively
cheaply and trading securities is cheap too. For these reasons, US security markets
are thought to be relatively efficient.
• Grossman and Stiglitz (1980) show that perfect market efficiency is internally in-
consistent.
• The EMH does not assume rationality of investors. But to operationalize the
EMH one often assumes rationality; Fama proposes such an operational form.
• The EMH is applicable to all asset classes. If the EMH holds true, then prices react
quickly to the disclosure of information.
Why is the EMH important for AM? Fama's work on market efficiency (1965, 1970)
triggered passive investing, with the first index fund launched in 1971. In efficient markets
buying and selling securities is a game of chance rather than one of skill. Active management
is a zero-sum game. If the EMH holds, the variation of the performance of active
managers around the average is driven by luck alone. Many studies found little or no
correlation between strong performers in one period and those in the next one, see Figure
3.15.

Figure 3.15: Performance ranking of the top 20 equity funds in the US in the 1970s and
in the following decade. The average annual rate of return was 19 percent compared
to 10.4 percent for all funds. In the following decade, the former top 20 funds had an
average rate of return of 11.1 percent compared to 11.7 percent for all funds (Malkiel
[2003]).
Suppose that one is able to pick in advance those managers who outperform others.
As per the EMH, investors would give them all their money; no-one would select those
managers doomed to underperform. But who would be on the other side of the outper-
formers' trades? This process would be self-defeating.
The same conclusion also holds for technical analysis, the study of past stock prices
to predict future prices, and fundamental analysis, the analysis of financial company
information to select undervalued stocks. If the EMH holds, both approaches are useless
in predicting asset prices. The value of financial analysts is not in predicting asset values
but in analysing incoming information fast, such that the information is rapidly reflected
in the asset prices. In this sense analysts support the EMH. Fama (1970) defines three
different forms of market efficiency, that is, different sets F. In the weak-form
EMH, F is all available price information at a given date. Hence, future returns cannot
be predicted from past returns or any other market-based indicator. This precludes
technical analysis from being profitable. In the semi-strong EMH, F is all available
public information at a given date, i.e. financial reports, economic forecasts, company
announcements, etc. matter. Technical and fundamental analyses are not profitable in
this case. This is the form of the EMH which is usually assumed in the literature. In the
strong-form EMH, F is all available public and private information at a given date.
This extreme form serves mainly as a limiting case.
Example
A well-known story tells of a finance professor and a student who come across a
hundred dollar bill lying on the ground. As the student stops to pick it up, the professor
says, 'Don't bother - if it were really a hundred dollar bill, it wouldn't be there.' This
story illustrates well what financial economists usually mean when they say markets are
efficient. But suppose that the student assumes that nobody so far has tested whether the
bill is indeed real, everyone assuming instead that someone else checked the bill's validity.
Then no efforts were made to generate the information needed to value the bill. But if
nobody bore the costs of generating that information, then F_t is the empty set and
the EMH cannot hold. This shows that a reasonable assumption about human behavior
can lead to a violation of the EMH.
Example
A firm announces a new drug that could cure a virulent form of cancer. Figure 3.16
shows possible reactions of the price paths. The solid path is the EMH path:
prices jump to the new equilibrium value instantaneously and in an unbiased fashion.
The dotted line represents a path where market participants overreact and the dashed
one where they underreact. The dash-dotted line is a strong signal for insider trading,
front running, or any other form of illegal trading.
3.6.1 Predictability
If the EMH holds, returns follow a random walk:

R_t = m + ε_t .

If the sequence (ε_t) is IID with mean zero, variance σ², and zero covariance cov(ε_t, ε_{t−1}) = 0,
then R_t is a random walk with drift m.
Figure 3.16: Possible price reactions as a function of the day relative to the announcement
of a new drug.
With returns given by R_t = (S_t − S_{t−1})/S_{t−1}, the random walk equation implies

(S_{t+1} − S_t)/S_t = (S_t − S_{t−1})/S_{t−1} + ε_t

and after some algebra

S_t = S_0 ∏_{s=1}^{t} ( S_1/S_0 + ∑_{k=1}^{s} ε_k ) .
While in a random walk returns are a zero-sum game, prices are by no means driftless.
Returns are then not predictable, since conditioning on the information set F_t has no value.
When returns are not predictable, prices follow a martingale:
Definition 48. Assume^21

E^Q[S_{t+1} | F_t] = S_t , ∀t . (3.78)

20 Measurability is a well-defined mathematical notion and it is not equivalent to the above verbal
description.
21 The expected value of S exists.
Hence, the expected price of a non-predictable process is constant. The price process has
no drift, else the average value would not be constant. But the price itself can vary. If
returns are martingales, then the operational form of the EMH (3.75) holds true.
The return R_{t+1} in period t to t+1 of a stock is equal to the capital gain plus a
dividend yield D, i.e.

R_{t+1} = (S_{t+1} − S_t)/S_t + D_{t+1}/S_t . (3.80)

Rewriting this equation, S_t = (1/(1 + R_{t+1})) (S_{t+1} + D_{t+1}). Solving this linear difference
equation implies for k periods

S_t = ∑_{j=1}^{k} ( ∏_{m=1}^{j} 1/(1 + R_{t+m}) ) D_{t+j} + ( ∏_{m=1}^{k} 1/(1 + R_{t+m}) ) S_{t+k} . (3.81)
It is common to assume that asset prices grow at a lower rate than the return, so the
second term tends to zero for k → ∞. We get:
Theorem 49. If the asset price growth is lower than the asset returns, the price S_t is
equal to the discounted future dividends, i.e.

S_t = ∑_{j=1}^{∞} ( ∏_{m=1}^{j} 1/(1 + R_{t+m}) ) D_{t+j} . (3.82)
Since there is no randomness, the future dividends are known. We extend this formula
by adding risk and considering the EMH. Consider the operational form of the EMH
(3.75) and take conditional expectations in (3.80):

S_t = E[D_{t+1}|F_t] / E[R_{t+1}|F_t] .

Hence, capital gains do not matter for asset pricing. If dividends are martingales and
returns are random walks, then the famous pricing formula follows where asset prices are
equal to the ratio of the constant expected dividend and expected return; see the next
example.
If expected dividends and returns are constant, the above valuation equation reads

S_t = D/R = constant . (3.83)

But empirical evidence shows that expected returns and dividends are both not constant
over time. Therefore, (3.83) is too naive. It implies that the volatilities of the growth
rates are the same:

volatility(dR_t/R_t) = volatility(dD_t/D_t) .

But the return volatility is around 16% while the dividend volatility is only about 7%.
Therefore something else must be time varying. Furthermore, the return volatility is
itself time varying. Monthly market return volatility fluctuated between values of 20% and
more in market stress periods (Great Depression, Great Financial Crisis) and 2% in the
60s and mid-90s of the last century; see the next section.
Since the pricing formula has the same structure with and without risk, the formula
of the last theorem carries over to the case with risk, providing the ex-ante version:
Theorem 50. If the expected asset price growth is lower than the expected asset returns,
the price S_t is equal to the expected discounted future dividends, i.e.

S_t = ∑_{j=1}^{∞} ( ∏_{m=1}^{j} 1/(1 + E[R_{t+m}|F_t]) ) E[D_{t+j}|F_t] . (3.84)
We show that a small amount of skill makes a huge difference for wealth growth in a
gamble - the same observation holds if one considers skills in active asset management.
Consider an investor with initial capital W_0 playing the following dice game: she invests
in each period 1 unit of her capital. The outcome of the strategy in each period is +1
with probability p or −1 with probability q = 1 − p. She does not change her strategy
over time. The outcomes form an IID sequence (X_k) of random variables. Her wealth
after n plays is W_0 + ∑_{k=1}^{n} X_k.
What is the probability that she attains a final wealth level W_f > W_0? To derive the
wealth dynamics equation, the first step is to define disjoint sets of events which allow to
calculate probabilities. We set

A_{W_0,n} = { W_0 + ∑_{k=1}^{n} X_k = W_f , 0 < W_0 + ∑_{k=1}^{m} X_k < W_f for m < n }

for the set where she reaches the desired wealth level for the first time after n plays
without being bankrupt before. Since the sets (A_{W_0,n})_n are disjoint, the probability
p̃(W_0, W_f) that the investor reaches the desired wealth level W_f sometime is given by

p̃(W_0, W_f) = P( ∪_{n=1}^{∞} A_{W_0,n} ) = ∑_{n=1}^{∞} P(A_{W_0,n}) .
The recursion p̃(W_0, W_f) = p p̃(W_0 + 1, W_f) + q p̃(W_0 − 1, W_f) captures the game
logic. This is a linear difference equation. A solution is found by inserting a guess; with
r = q/p it is

p̃(W_0, W_f) = (r^{W_0} − 1)/(r^{W_f} − 1) , if p ≠ q ;
p̃(W_0, W_f) = W_0 / W_f , if p = q . (3.87)
If the game is fair (a martingale), then the probability of reaching a wealth level 50
percent higher than the starting value of 100 units is 66%. If the investor's strategy has a
small skill component such that q = 0.49 and p = 0.51, then the probability of reaching the
desired level is 98%.
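Both numbers follow directly from (3.87) with r = q/p, as this small sketch shows:

```python
# Probability of reaching W_f starting from W_0 before going bankrupt.
def reach_prob(p, W0, Wf):
    q = 1.0 - p
    if p == q:
        return W0 / Wf            # fair game: linear in the wealth levels
    r = q / p
    return (r**W0 - 1.0) / (r**Wf - 1.0)

print(f"fair  p=0.50: {reach_prob(0.50, 100, 150):.0%}")   # ~67%
print(f"skill p=0.51: {reach_prob(0.51, 100, 150):.0%}")   # ~98%
```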
Consider the regression

x_{t+1} = a + b x_t + ε_{t+1} (3.88)

with a, b constants and (ε_{t+1}) a sequence of IID normal random variables with mean 0 and
variance σ². The variable x_t can be the return itself or a market price variable such as
the price-dividend ratio. The regression (3.88) becomes a random walk if b = 0, or if a = 0,
b = 1 and x_t = R_t. For the latter choice, the random walk regression implies

R_{t+1} = R_0 + ∑_{j=1}^{t+1} ε_j , E_t(R_{t+1}) = R_t , σ²(R_t) = t σ² .
This shows that R is a martingale and that the variance increases over time. R_0 = 0
is a reasonable assumption for short-term returns, and it implies that discounted price
processes are martingales too. Therefore, discounted prices are martingales if the returns
are martingales with zero expected return.
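A short simulation sketch illustrates the point: for IID, mean-zero returns, the slope of a regression of R_{t+1} on R_t is statistically indistinguishable from zero.

```python
# Regress simulated IID returns on their own lag; the slope b is ~0.
import random

random.seed(0)
sigma = 0.16
R = [random.gauss(0.0, sigma) for _ in range(5000)]

x, y = R[:-1], R[1:]
mx, my = sum(x) / len(x), sum(y) / len(y)
b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
print(f"slope b = {b:.4f}")   # close to zero: no predictability
```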
Consider this regression for US stocks and T bills using annual data; see Table 3.20.
Table 3.20: Regression of returns on lagged returns, annual data 1927-2008. t(b) is the
t-statistic value and σ(E_t(R_{t+1})) represents the standard deviation of the fitted value
b R_t (Cochrane [2013]).
The result shows that stock returns are almost not predictable while T bill returns are.
A value of b = 0.04 for stocks means that if returns increase by 10% this year, the
expectation is that they will increase by 0.4% next year. Also, the R² is tiny and the
t-statistic is below its standard threshold value of 2. For the T bill returns the story
is different - high interest rates last year imply that the rates this year will again be
high with a high probability. Can this foreseeability of T bills be exploited by a trader?
Suppose first that stocks were highly predictable. Then one could borrow today and
invest in the stock market. But this logic does not work for T bills since borrowing would
mean to pay the same high rate as one receives. To exploit T bill predictability the
investor has to change his behavior - save more and consume less today - which is totally
different from the stock case. This is a main reason why one considers excess returns R^e
- the return on stocks minus the return on bonds - in forecasting, with R^b the benchmark
return. By analysing the excess return one separates the different motivations 'to consume
less and to save' from the willingness to bear risk. Table 3.20 shows that considering excess
returns we are back, for T bills, in the almost non-predictable stock case. Lo and MacKin-
lay (1999) find that short-run serial correlations are not zero and that the existence of
'too many' successive moves in the same direction enables them to reject the hypothesis
that stock prices behave as random walks. There is some momentum in short-run stock
prices. Even if the stock market is not a perfect random walk, its statistical and eco-
nomic significance have to be distinguished. The statistical dependencies are very small
and difficult to transform into excess returns. Considering transaction costs, for example,
will annihilate the small advantage due to the momentum structure (see Lesmond et al.
[2001]).
We consider longer time horizons and use market prices or yields to forecast returns,
following Cochrane (2005). Following the dividend/price (D/P) discussion of the last section,
we consider the return-forecasting regressions of Cochrane (2013) in Table 3.21. The
regression equation reads

R^e_{t→t+k} = a + b D_t/S_t + ε_{t+k} (3.92)

with R^e the excess return defined as the CRSP^22 value-weighted return less the three-month
Treasury bill return. The return-forecasting coefficient estimate b is large and it grows
with the time horizon:

Horizon   b      t(b)    R²     σ(E_t(R^e_{t+1}))   σ(E_t(R^e_{t+1}))/E(R^e_{t+1})
1 year    3.8    (2.6)   0.09   5.46                0.76
5 years   20.6   (3.4)   0.28   29.3                0.62

Table 3.21: Return-forecasting regressions (Cochrane [2013]).

Hence, high dividend yields D/S (low prices) mean high subsequent returns, and vice
versa. The R² of 0.28 is large when we compare it with the R² of predicting stock returns
on, say, a weekly basis, which are seen to be not predictable. Therefore, excess returns are
predictable by D/P ratios. Fama and French (1988) document that 25 to 40 percent of
the variation in long-holding-period returns can be predicted in terms of a negative
correlation with past returns. Behaviorists attribute this 'forecastability' to stock market
price 'overreaction': investors face periods of optimism and pessimism which cause
deviations from the fundamental asset values (DeBondt and Thaler (1995)).
The above tests are not stable. First, the point estimate of the return-forecasting
coefficient and its associated t-statistic vary significantly if different sample periods are
considered. Second, the definition used for 'dividends' impacts the results. Consider

E_t(R^e_{t+1}) = a + b D_t/S_t . (3.93)

Since the dividend/price ratio varies over time between 1 and 7 percent, return predictability
is the same as saying that expected returns vary over time. Using b = 3.8 and a variation
of D/P by 6 percentage points turns into a long-term variation of expected returns of
3.8 × 6 = 22.8 percentage points, which is too high given that the long-term average
expected return is 7 percentage points.
When we analyze the regression of dividend growth, where D_{t+k}/D_t replaces the return in
(3.92), Cochrane (2013) states: Returns, which should not be predictable, are predictable
[see Table 3.21]. Dividend growth, which should be predictable, is not predictable.
This contradicts the traditional view that expected returns are constant and that if
prices fall then future dividends should also decline: dividends have to be predictable
since they have to approach the low price levels. The above observation states that
on average we observe a different pattern. To deepen the discussion, we consider the
multi-period fundamental asset pricing equation (3.84),

S_t = E_t [ ∑_{j=1}^{∞} ( ∏_{k=1}^{j} 1/R_{t+k} ) D_{t+j} ] . (3.94)
Using log-variables (lower-case symbols) changes products into sums, and from the one-
period version of (3.94) we get

s_t − d_t ≈ E_t ∑_{j=1}^{∞} ρ^{j−1} (∆d_{t+j} − r_{t+j}) . (3.96)
Rearranging, it follows that long-run return uncertainty comes from cash-flow uncer-
tainty (changes in dividends and D/P ratios). The more persistent r and ∆d are, the
stronger is their effect on the D/P ratio, since more terms in the summation matter. If
dividend growth and returns are not predictable, their conditional expectations are con-
stant over time, and then the D/P ratio is constant - which is not observed. This extension
to many periods for the D/P ratio also holds for the variance equation (3.98), where the
discounted summation enters the return and dividend growth variables. As in the one-
period model, the long-run return and long-run dividend growth regression coefficients
must add up to one. Regressing the long-term return and dividend growth, Cochrane
(2013) states:
Return forecasts - time-varying discount rates - explain virtually all the variance of
market dividend yields, and dividend growth forecasts or bubbles - prices that keep rising
forever - explain essentially none of the variance of price.
This changes the traditional view of the EMH. Traditionally, expected returns were
assumed to be constant (asset pricing model) and stocks were martingales with zero drift
(random walks). In this reasoning, low D/P ratios happen when people expect declines
in dividend growth, and variations in D/P are due entirely to cash-flow news (dividend
predictability). The above result states that the opposite is true: the variance of D/P
is due to return news and not to cash-flow news.
Predictability is also related to the volatility of prices. Shiller states that if prices are
expected discounted dividends, then prices should vary less than their expected variables.
But prices vary wildly more than they should, even if we knew future dividends per-
fectly. This is the excess volatility of stock returns pointed out by Shiller.
We claim that return predictability and excess volatility have the same cause. To
obtain an equation for the variance we first write regressions of returns and dividend
growth on d_t − p_t with b_r, b_d the respective coefficients. Plugging the regressions into
(3.95) we get

1 = b_r − b_d , 0 = ε_{t+1,r} − ε_{t+1,d} , (3.97)

where the ε are the residuals of the two regressions. Therefore, the expected return can be
higher if the expected dividend is higher or the initial price is lower. The only way the
unexpected return can be higher is if the unexpected dividend is higher, since the initial
price cannot be unexpected. Since a regression coefficient is a covariance divided by a
variance, 1 = b_r − b_d reads

var(d_t − p_t) = cov(r_{t+1}, d_t − p_t) − cov(∆d_{t+1}, d_t − p_t) . (3.98)

This shows that D/P ratios can only vary if they forecast dividend growth or forecast
returns in the regressions. Since the difference between the two coefficients must be one
by (3.97), if one coefficient is small in the regression then the other one has to be large.
Short-run return dynamics are often modelled by an ARMA(p, q) process,

R_t = ∑_{k=1}^{p} a_k R_{t−k} + ∑_{k=1}^{q} b_k ε_{t−k} + ε_t + c ,

where the IID error terms are ε_t ∼ N(0, σ²). The variance of the error terms is then often
modelled using a Generalised Autoregressive Conditional Heteroskedasticity (GARCH)
model by Bollerslev (1986). The literature documents patterns of persistence which vary
with the asset classes and the markets under consideration. While such patterns are found
on a daily, weekly, monthly, or even annual basis for stocks and bonds, the time
periods are much shorter for FX markets, which we consider below.
Goyal and Jegadeesh (2018) make the two strategies comparable by correcting the
net long/short position. Since more stocks earned positive returns than negative returns
during the sample period, the time-series strategy's long positions are bigger than its short
positions. The average long and short positions are $1.24 and $0.76, respectively. Therefore,
the time-series-constructed portfolio earned returns for simply being net long during a
bullish period. The authors therefore add to the cross-sectional strategy a time-varying
investment in the market equal to the dollar value of the difference between the long
and short sides of the time-series strategy each month. Doing this exercise for NYSE-
quoted stocks, the adjusted cross-sectional strategies show an annual return of 9.4 per-
cent, similar to the 9.3 percent found when using time-series strategies. Therefore, the
literature's claim that time-series return predictability methods dominate cross-sectional
ones is erroneous.
The EMH combines the prices, probabilities, and preferences of investors in a form such
that no profits are possible by trading on the available information, since any profit is
already captured in present prices.
Given the three parts of the EMH - prices, probabilities, and preferences - and the strin-
gent martingale property which combines them, two critical points are immediate: be-
haviour and technology. From a behavioural perspective it is not convincing that these
highly behaviour-sensitive parts of the EMH are tied together by a single mathematical
property in which behavioural facets do not matter: whether investors are greedy or
fear market crashes affects neither how they perceive the odds in price formation (the
probabilities) nor how they value the outcomes of their decisions. The adaptive
EMH of Lo adapts the original EMH and makes it context-dependent and dynamic. The
adaptive EMH becomes a statement dependent on the environment of the economy and
the markets and on the behavior of market participants.
Denition 51 (Lo (2004)). Prices reect as much information as dictated by the combi-
nation of environmental conditions and the number and nature of species [types of agents]
in the economy.
The behavior of the agents is considered to follow evolutionary principles. The dy-
namics of how market participants interact, and therefore the price dynamics of the
assets, is driven by evolutionary principles, which are better suited to describe the
market dynamics than the equilibrium concept in the EMH.
Lo (2004) states the following implications of the adaptive EMH. The risk and reward
relation is not stable over time, since the population of agents and how they interact are
time varying; similarly, the institutional and regulatory set-ups are not constant
over time. A second implication is that temporary arbitrage opportunities are possible,
and therefore the EMH critique of Grossman and Stiglitz (1980) does not apply to the
adaptive EMH. The possibility of temporary arbitrage opportunities also shapes the per-
formance of active investment strategies, which under the EMH are useless. As an example,
one considers the rolling monthly first-order autocorrelation coefficient of the S&P Com-
posite Index returns from January 1871 to 2003. By the EMH, the coefficient should
be zero. The empirical plot shows instead that the coefficient is typically positive, with
periods in which its values cluster. Finally, innovation is the key to
survival. The EMH states that certain levels of expected returns can be achieved simply
by bearing a sufficient degree of risk. Since in the adaptive form the risk/reward relation
is not constant, adaptation to changing market conditions is the main
source of a stable risk/return reward.
Technology matters because it allows market participants to separate valuable information
from noise by permanently processing the extremely large number of signals from the
markets, the news, and other communication types.
The discussion so far considered non-specific individuals. The examples of Warren Buf-
fett and the Renaissance Medallion Fund show that particular skills and expertise allow in-
dividuals to generate excess returns as if the markets were inefficient, while for many
other investors the same markets are efficient. Buffett and the Renaissance Medallion
Fund use their skills to predict future returns in very different forms. Buffett's goal is to
understand a specific firm in detail on an idiosyncratic level and then to embed the firm-
specific investment view in a sector and macro context. The Medallion Fund is a quant
fund founded by the mathematician Jim Simons. The fund, which was set up in 1998 for
the employees of Renaissance, generated in the period 1998 to 2016 an annual return of
80%, with 1999 the only year with a loss, of around 4 percent. The fund generated in this
period more than USD 55 billion in profit, which is more profitable by several billions of
USD than the next best funds. Even more notably, the invested AuM have been smaller
than those of their competitors.
The fund has always been very secretive about the methodology used. One knows that
Simons hired top scientists from the computer industry, notably from IBM, and Ph.D.s
in mathematics or physics from the top universities. The model which they constructed
is based on signal detection. This means that their powerful IT system processes
all types of signals which are generated in the world. Signals include not only simple ones,
such as realized price changes, but also signals detected in speeches and documents.
The success of their model is given by their power to separate
noise from valuable information and then to translate it into trades. The model itself
is not a single strategy encoded by a quantitative model; many different strategies
are integrated into one system.
The general pricing equation reads

S_t = E_t (M_{t,t+1} X_{t+1}) (3.99)

with X_{t+1} the asset's payoff, M the stochastic discount factor (SDF), and M_{t,t} = 1.
Therefore, MS is a martingale. There are different expressions for M; the exact nature of
the SDF depends on the nature of the asset pricing model. Specifying the asset pricing
model specifies M and its name - SDF in general equilibrium, where intertemporal
marginal rates of substitution describe M, or equivalent martingale measure in derivative
pricing models.
Second, since S_t is known at time t, equation (3.99) reads in terms of the gross return -
payoff divided by price -

E_t (M_{t,t+1} R^g_{t+1}) = 1 . (3.101)

Note that also the excess return R^e = R − R^f, with R^f the risk-free return, is orthogonal
to the SDF: 0 = E_t(M_{t+1} R^e_{t+1}). Expanding the expectation of the product, the
expected asset return is expressed with its covariance with the SDF. Using the correlation
notation, this implies for asset j:
Hence, asset prices are equal to an expected discounted cash flow plus a risk premium.
Idiosyncratic risk is by definition the part that is not correlated with the SDF and hence
does not generate any premium. What can be said about the sign of the covariance in
(3.105)? Since the SDF is an indicator of bad times while assets pay off well in good times,
the covariance between them is typically negative. This generates a risk premium and
allows risky assets to pay more than the interest rate.
Setting X equal to the stock price S and writing S̃_t = S_t/M_t, (3.106) can be restated
for S̃. Investors expect positive gross asset returns. The asset price dynamics is not a
martingale under the empirical probability. If the asset price dynamics were a fair coin
toss, then returns would not be predictable. Contrarily, to generate risk premia, asset
prices have to be predictable in the statistical sense.
Example
This orthogonality equation states that the forward price is given by the orthogonal pro-
jection:

f_{t,T} = E_t(S_T) + cov_t(M_T, S_T) R^f .
The forward price is therefore equal to the expected future spot price at time T plus a
risk premium.
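A two-state numerical sketch (illustrative numbers, not from the text) confirms this: the forward price defined by E_t[M_T (S_T − f)] = 0 equals the expected spot plus the covariance term.

```python
# Two-state check of f = E(S_T) + cov(M, S_T) * Rf.
probs = [0.5, 0.5]
M   = [1.10, 0.86]     # SDF is high in the bad state, low in the good one
S_T = [90.0, 110.0]    # the stock pays off badly exactly when M is high

E = lambda x: sum(p * v for p, v in zip(probs, x))
Rf  = 1.0 / E(M)                                        # gross risk-free return
cov = E([m * s for m, s in zip(M, S_T)]) - E(M) * E(S_T)

f = E([m * s for m, s in zip(M, S_T)]) / E(M)           # solves E[M*(S_T - f)] = 0
print(f, E(S_T) + cov * Rf)   # both ~98.78 < E(S_T) = 100: negative covariance
```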
M_t = b R_t + ε_t    (3.108)

with R any portfolio of N assets and b a vector of weights. Here geometry enters into play; we introduce the geometry in the next section. We estimate the optimal value of b. Think of M and R as vectors. Then the optimal value of b is given by the shortest distance of M to R. But this is the perpendicular of M on R, i.e. the orthogonal projection. The noise term is then perpendicular to bR. In other words, regressions are nothing but projections in a suitable space. This geometric notion is made precise in the next section.
We assume that the random variables M, R and ε are square-integrable. The space of returns is an infinite dimensional complete normed vector space H (a Hilbert space). Although of infinite dimension, the geometric intuitions of the Hilbert space R^3 can be applied. In a Hilbert space, the notions of a basis of vectors, orthogonality, projection and least-squares distance common in R^3 are well-defined. The norm is induced by the scalar product for random variables x, y:

⟨x, y⟩ := E(xy) ,  ‖x‖ = E(x^2)^{1/2} .
R = b_1 F_1 + b_2 F_2 + ε ,

where for simplicity we omitted the time index and we consider the regression of a return R on two random variables F, called factors. We switch to this example since it represents the prototype problem in AM. Let R ∈ R^3 and the factor space F be 2-dimensional. We distinguish (i) the factor space is a vector space, and (ii) the factor space is a plane which does not intersect the origin (an affine hyperplane), see Figure 3.17. The second case is generic in our context since the factor space is generated by random variables plus a constant: the risk-free return.
A map P is an orthogonal projection of a real vector space onto a subspace if it is linear, P^2 = P (projecting a projection does not alter the result), and P′ = P. An orthogonal projection in Hilbert space of a vector X on a vector Y reads

P_Y(X) = (⟨X, Y⟩ / ⟨Y, Y⟩) Y =: E(X|Y) ,    (3.110)
Figure 3.17: Left panel - projection on the factor space which is a vector space. Right panel - projection where the factor space is an affine space, i.e. translated away from the zero vector.
The full spanning property holds:

P_F(R) + P_{F^⊥}(R) = R ,  P_F + P_{F^⊥} = I .    (3.111)

Then, for orthogonal factors,

P_F(R) = Σ_{i=1}^{2} (⟨R, F_i⟩ / ⟨F_i, F_i⟩) F_i .    (3.112)

Let F̃_i be linearly independent vectors which are not orthogonal. Set F̃_1 = F_1. To get the second vector, project F_2 on the orthogonal complement of the first one:

F̃_2 = P_{(F̃_1)^⊥}(F_2) = F_2 − P_{F̃_1}(F_2) = F_2 − (⟨F̃_1, F_2⟩ / ⟨F̃_1, F̃_1⟩) F̃_1 .

This vector is orthogonal to the first one, and the construction is continued for the next vector by projecting the third vector on the orthogonal complement of the first two orthogonal vectors.
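A short numerical sketch of this Gram-Schmidt step (an addition for illustration; the factor draws are assumptions), with the inner product ⟨x, y⟩ = E(xy) replaced by a sample average:

```python
# Gram-Schmidt step with the inner product <x,y> := E(xy).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
F1 = rng.normal(size=n)
F2 = 0.6 * F1 + rng.normal(size=n)   # correlated, hence not orthogonal to F1

inner = lambda x, y: np.mean(x * y)  # sample analogue of E(xy)

Ft1 = F1
Ft2 = F2 - inner(Ft1, F2) / inner(Ft1, Ft1) * Ft1  # project F2 off Ft1

print(inner(Ft1, Ft2))  # ~ 0: the constructed vector is orthogonal to the first
```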
If the factor space is an affine space translated by a constant vector a, the projection reads

P_F(R) = a + Σ_{i=1}^{2} (⟨R − a, F_i⟩ / ⟨F_i, F_i⟩) F_i .    (3.113)

Written with covariances,

P_F(R) = a + Σ_{i=1}^{2} (cov(R − a, F_i) / σ²(F_i)) F_i .    (3.114)
Defining cov(x, y)/σ²(y) =: β_{x,y}, standard formulae such as the CAPM follow. Let F = R_M be the single market return factor; then, for a = 0 and omitting the time index,

P_F(R − R_f) = (cov(R − R_f, R_M) / σ²(R_M)) R_M = (cov(R, R_M) / σ²(R_M)) R_M ,    (3.116)

since R_f is a constant.
Consider the matrix form of a linear regression,

y = Xβ + ε ,

with ε, y ∈ R^n, β ∈ R^{K+1} and X ∈ R^{n×(K+1)}. The factor associated with β_1 is the constant 1, i.e. the first column of the matrix X has entry 1 in each cell. Using the above formalism we get:
Proposition 52. Given the above matrix linear regression, the least-squares fit ŷ = P_X(y) = X β̂ is given by

P_X(y) = X (X′X)^{−1} X′ y ,  β̂ = (X′X)^{−1} X′ y .    (3.117)
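A quick numerical check of Proposition 52 (a sketch; the data are simulated assumptions): the hat matrix P_X is idempotent and X β̂ reproduces P_X y.

```python
# Verify P_X = X(X'X)^{-1}X' is a projection and X beta_hat = P_X y.
import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # first column all 1
y = X @ rng.normal(size=K + 1) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(P @ P, P))            # P^2 = P: projecting twice changes nothing
print(np.allclose(P @ y, X @ beta_hat)) # fitted values are the projection of y
```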
24 For example, using ⟨F_j, F_i⟩ = δ_{ij} ⟨F_i, F_i⟩ for orthogonal factors,

P_F(P_F(R)) = P_F( Σ_{i=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) F_i ) = Σ_{i=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) P_F(F_i)
= Σ_{i=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) Σ_{j=1}^{2} (⟨F_j, F_i⟩/⟨F_i, F_i⟩) F_j = Σ_{i=1}^{2} Σ_{j=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) δ_{ij} F_j
= Σ_{i=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) F_i = P_F(R) .
The SDF can be decomposed in an orthonormal basis (e_j) of the Hilbert space:

M = Σ_{j=0}^{∞} a_j e_j ,

where the sum over |a_j|² is finite. This is the analogue of finite dimensional vectors, except that we need a notion of convergence due to the infinite dimensionality of the Hilbert space. A similar decomposition applies to the factors, and the factors span the SDF if M can be replicated exactly by the factors. If the factors span only a subspace of the total space, then the factor representation of the SDF has an error.
with

β := C_F^{−1} cov(F, R)    (3.120)

the vector of multiple regression betas of the return R on the factors F and C_F the covariance matrix of the F's.
The elements F_j are the risk factors and λ is the factor risk premium. If λ > 0, then an investor is compensated for holding extra risk by a higher expected return when risk is measured with the beta w.r.t. F. The coefficient β is the coefficient of an orthogonal projection of the return R on the space generated by the factors F plus a constant. The risky asset's risk premium is proportional to the covariance between its returns and the SDF (its systematic risk). In the CAPM for example, the market return replaces the SDF. Factors can be abstract random variables, portfolio returns, excess returns or dollar-neutral returns. The model is exact, because there is no error term in (3.119). One can always take factors to have zero means, unit variances and be mutually uncorrelated by using

F̂ := C_D^{−1} (F − E(F)) ,    (3.121)

with C_D the Cholesky factor of the covariance matrix C_F.
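A sketch of this normalization (an illustration added here; it assumes the Cholesky reading C_F = C_D C_D′ of (3.121), and the covariance matrix below is made up):

```python
# Whitening factors: F_hat = C_D^{-1}(F - E(F)) has zero mean, unit variances
# and zero correlations when C_D is the Cholesky factor of C_F.
import numpy as np

C_F = np.array([[1.0, 0.3], [0.3, 2.0]])   # assumed factor covariance matrix
C_D = np.linalg.cholesky(C_F)

rng = np.random.default_rng(3)
F = rng.multivariate_normal(mean=[0.1, -0.2], cov=C_F, size=500_000)

F_hat = np.linalg.solve(C_D, (F - F.mean(axis=0)).T).T
print(np.cov(F_hat.T))                      # ~ identity matrix
```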
When is a factor pricing model exact? By the Riesz theorem, see below, the price of an asset q(S_t) is given by a scalar product ⟨M_{t+1}, S_{t+1}⟩ of the future asset price and the SDF. If the SDF is an element of the space spanned by the factors, then the pricing of the asset can be done using the factors with arbitrary precision: beta pricing and factor pricing models are then equivalent, see Proposition 63. In other words, the quality of the factors used to replicate the SDF determines the quality of the representation of expected returns by their betas.
We consider the Riesz Representation Theorem. It states that any continuous linear functional on a Hilbert space can be represented by a scalar product.

Theorem 55. (Riesz) Let H be a Hilbert space and p : H → R a continuous linear map. There exists a vector r* ∈ H, the Riesz kernel, such that

p(x) = ⟨r*, x⟩

for all x ∈ H.
To apply it in asset pricing, let X be the future payoff of an asset and p(X) the present price of the payoff. This pricing functional maps future payoffs, elements of H, into current prices, the reals. It is natural to assume that the pricing function p is linear: for no-arbitrage reasons we impose value-additivity. Furthermore, p should be continuous: small payoff variations have small price effects. That is, p is a continuous linear functional on the space of future payoffs. The Riesz theorem states that there exists a random variable M such that

p(X) = ⟨M, X⟩ = E(M X) .

Hence, by the Riesz Theorem, to price any payoff it suffices to use M - the stochastic discount factor (SDF) - and to apply the expectation. The theorem does not tell how to construct M.

25 To prove this consider a one-factor model F = R in (3.119). If there is a risk-free asset, then R_0 = R_f.
They face the same endowment (salary) and only differ in their impatience: the time discount rates b_1 and b_2 are different, and hence the time value of money is different. The only asset to invest in the financial market is a risk-free bond B, which they can exchange, i.e. there is no money.

An optimal policy fixes the optimal consumption levels at the two dates and the investment amount in the bond at the first date. These optimizations determine optimal consumption c_i(B) and investment φ_i(B) for each investor. The policies depend on the yet exogenously given bond price B. Inserting these strategies in the market clearing condition fixes the endogenous price B = e^{−R_f} of the bond, i.e. the risk-free interest rate R_f follows from the interaction of the investors. Let φ_k(B) be the number of bonds investor k buys and keeps at time 0. Market clearing means φ_1 + φ_2 = 0: what investor 1 sells (buys), investor 2 must buy (sell). Inserting the individual optimal investment strategy functions fixes the equilibrium risk-free interest rate

R_f = 2 (1 − b_1 b_2) / (b_1 + b_2 + 2 b_1 b_2) .
All quantities which enter symmetrically in the optimization, such as endowment, have to cancel in the equilibrium expressions. The time value of money is driven by impatience. If

26 We follow Cochrane (2005), Back (2010), Campbell and Viceira (2002), Cochrane (2011), Culp and Cochrane (2003), Merton (1971, 1973), Martellini and Milhau (2015), Schaefer (2015) and Shiller (2013).
impatience is zero, the risk-free rate is zero. Other limit or sensitivity cases follow at once. We derive the solution formally by allowing for more heterogeneity. The log preferences are

u_i = ln c^i_0 + b_i ln c^i_1 ,

and the budget constraints read

c^i_0 − e^i_0 = −φ^i / (1 + R_f) ,  c^i_1 − e^i_1 = φ^i ,

with R_f the yet unspecified risk-free rate. We introduce the Lagrangian L:

L_i(c^i, φ^i, λ^i) = u_i − λ^i_0 (c^i_0 − e^i_0 + φ^i/(1 + R_f)) − λ^i_1 (c^i_1 − e^i_1 − φ^i) .
The FOCs read:

∂L_i/∂c^i_j = 0  ⟹  c^i_0 = 1/λ^i_0 ,  c^i_1 = b_i/λ^i_1 ,
∂L_i/∂φ^i = 0  ⟹  λ^i_0 = λ^i_1 (1 + R_f) ,
∂L_i/∂λ^i_j = 0 .
Solving the FOCs together with the budget constraints yields

c^i_0 = e^i_0/(1 + b_i) + e^i_1/((1 + R_f)(1 + b_i)) = PV(e^i)/(1 + b_i) ,
c^i_1 = b_i (e^i_0 (1 + R_f) + e^i_1)/(1 + b_i) ,
φ^i = (b_i e^i_0 (1 + R_f) − e^i_1)/(1 + b_i) ,
λ^i_0 = (1 + R_f)(1 + b_i)/(e^i_0 + R_f e^i_0 + e^i_1) ,
λ^i_1 = (1 + b_i)/(e^i_0 + R_f e^i_0 + e^i_1) .
Assuming that the endowment is the same for both agents, endowment cancels in the last expressions and the above equilibrium rate follows. If risk enters the model, the FOCs become equations with expected values, but the same logic applies.
We derive the fundamental asset pricing equation in the context with risk. Assuming separable preferences, a rational investor derives expected utility from two-period consumption at the present date t and a future date t + 1,

U(c_t, c_{t+1}) = u(c_t) + b E_t(u(c_{t+1})) ,

with b the time preference rate. He chooses investment to maximize expected utility, where consumption is assumed to be already optimally chosen. There is only a single risky asset S and two budget constraints at times t and t+1 (with e the endowment):

c_t − e_t = −φ_t S_t ,
c_{t+1} − e_{t+1} = φ_t X_{t+1} .

Introducing the Lagrangian, the FOCs imply the Fundamental Asset Pricing Equation (3.122) for asset S at time t:

S_t = E_t(M_{t+1} X_{t+1}) ,  M_{t+1} = b u′(c_{t+1})/u′(c_t) .    (3.122)
The SDF relationship between asset prices and consumption states that investments proposed by asset managers should protect investors' optimal consumption in the short and long run. This sound theoretical model has drawbacks. First, investments derived from consumption data often underperform. Second, the assumption and knowledge of a single utility function is unrealistic. Data science is a feasible and powerful alternative.
The ratio of marginal utilities in the SDF reflects that investors value money more when they need it in bad times than in good times. Marginal utility can therefore be seen as an index of bad times, and the SDF, as a substitution measure between present and future consumption, is an index of growth in different times. The price changes of S in the fundamental pricing equation (3.122) can have three causes: the probability p, the discount factor M or the payoff X. There is strong evidence that expected return variation over time and across assets dominates, and that asset valuations move far more on news affecting the discount factor than on news about expected cash flows, that is, the payoff X.
For a risk-free bond with payoff S_T = 1 at maturity, the gross risk-free return is

1 + R_f = S_T / S_0 = 1/S_0 = 1/E(M) .
Assuming a constant relative risk aversion utility function u(c) = c^{1−γ}, 0 < γ < 1, the SDF reads

M = b (c_{t+1}/c_t)^{−γ} = b e^{−γ ln(c_{t+1}/c_t)} ∼ b (1 − γ Δc_{t+1})

up to first order, where Δc_{t+1} = ln(c_{t+1}/c_t). Expanding again up to first order:

1 + R_f = 1/E(M) ∼ (1/b)(1 + γ E_t(Δc_{t+1})) .
Hence interest rates are higher if people are impatient (low b) or if expected consumption growth is high. Since high consumption growth means people get richer in the future, one has to offer a high risk-free rate such that they consume less now and save.
Asking how much R_f varies over time is the same as asking how much one must offer individuals to postpone consumption. This variation is governed by the risk aversion parameter γ. Expanding the risk-free rate relation up to second order:

1 + R_f ∼ (1/b)(1 + γ E_t(Δc_{t+1}) − (γ²/2) σ_t²(Δc_{t+1})) .

Therefore, higher consumption growth volatility lowers interest rates, which motivates investors to save more in uncertain times.
Using the beta representation of the risk premium,

E(R^e_i) = β_i λ .    (3.124)

If assets covary positively with consumption growth, or equivalently negatively with the SDF, then they must pay a higher average return. High expected returns are equivalent to low asset prices. From a risk perspective, the above equations state that average returns are high if the beta on the SDF or on consumption growth Δc is large. This is the above 'bad times - low consumption growth - high SDF - high returns or low asset prices' story.
Using the fundamental equation (3.122) with a risk-free rate and the approximation for the SDF we get:

S_t = E_t(M_{t+1} X_{t+1}) ∼ E_t(X_{t+1})/R_f − γ cov(X_{t+1}, Δc_{t+1}) .    (3.126)

Again, the price is higher if the asset payoff is a good hedge against consumption growth (negative correlation).
scalar product which in our set-up is induced by an expected value: the inner product is ⟨x, y⟩ := E(xy). The main source for this section is LeRoy and Werner (2000).
All random variables defined on the asset span ⟨S⟩ ⊂ R^S also span a Hilbert space. Therefore, the Riesz Representation Theorem applies to functionals on the asset span too. The expectations functional and the payoff pricing functional are of particular interest. Pricing functionals p are linear functionals ⟨S⟩ → R. The extension of p to the whole asset space R^S is the valuation functional. If markets are free of arbitrage, p is strictly positive. If x ∈ ⟨S⟩ is an Arrow-Debreu security, p = ψ is a state price. Hence, p is a linear combination of the basis state prices. If markets are complete, the unique representation ψ(x) = ⟨ψ, x⟩ holds for the valuation and pricing functional. Formally:
Definition 56. The expectations functional E maps every payoff x ∈ ⟨S⟩ into its expectation E(x). The payoff pricing functional p maps every payoff x ∈ ⟨S⟩ into its price p(x).

By the Riesz Representation, for both functionals there exist unique vectors k*, M* such that

E(x) = E(k* x)

and

p(x) = E(M* x) .
The construction of the different kernels is straightforward. For the pricing kernel, consider the two-dimensional set ⟨S⟩ = span{(1, 1)} ⊂ R², with (1/4, 3/4) the probabilities defining the expectation in the inner product and p(s) := 2 s_1 for s = (s_1, s_2) ∈ ⟨S⟩. Since (1, 1) is a basis of the span, the Riesz kernel has to be a multiple a (1, 1) of the basis vector with a ∈ R:

p(1, 1) = 2 × 1 = 2 = E(r* (1, 1)′) = (a/4) × 1 + (3a/4) × 1 = a ,

i.e. a = 2 and the kernel reads r* = (2, 2). To calculate the expectation kernel, let x^1, ..., x^m be m payoffs with S components for the states with probabilities p_s, s = 1, ..., S. Then E(x^j) = Σ_s p_s x^j_s and E(k* x^j) = Σ_s p_s k*_s x^j_s. Since the expectation kernel is in the asset span, k* = Σ_v a_v x^v: the kernel can be spanned by the payoffs with unknown coefficients. But then

Σ_s p_s x^j_s = Σ_v a_v Σ_s p_s x^j_s x^v_s

defines a linear system for the a's. Solving this system and using k* = Σ_v a_v x^v provides the expectation kernel. If there are three states with equal probability and two payoffs x^1 = (1, 1, 0) and x^2 = (0, 1, 1), then the expectation kernel reads k* = (a_1, a_1 + a_2, a_2), and from 2/3 = E(k* x^j), j = 1, 2, the linear system

2/3 = (1/3) a_1 + (1/3)(a_1 + a_2) ,  2/3 = (1/3)(a_1 + a_2) + (1/3) a_2

follows. Solving the system, k* = (2/3, 4/3, 2/3).
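The same linear system can be solved numerically (a sketch added here, reproducing the three-state example above):

```python
# Expectation kernel k* = a1 x1 + a2 x2 from E(x_j) = E(k* x_j), j = 1, 2.
import numpy as np

p = np.array([1/3, 1/3, 1/3])          # equal state probabilities
X = np.array([[1.0, 1.0, 0.0],         # payoff x1
              [0.0, 1.0, 1.0]])        # payoff x2

# A[i, j] = E(x_i x_j); right-hand side is E(x_1), E(x_2) = (2/3, 2/3).
A = np.array([[np.sum(p * X[i] * X[j]) for j in range(2)] for i in range(2)])
rhs = X @ p
a = np.linalg.solve(A, rhs)

print(a, a @ X)                        # a = (2/3, 2/3), k* = (2/3, 4/3, 2/3)
```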
Theorem 57. 1. If the risk-free payoff is in the asset span, then the expectations kernel is risk-free and equal to one in every state.
2. If the risk-free payoff is not in the asset span, then the expectations kernel is the orthogonal projection of the risk-free payoff on the asset span.
3. The pricing kernel is unique regardless of whether markets are complete or incomplete.
4. Let ψ_1, ..., ψ_S be the state prices of the S states and p_s the corresponding probabilities of the states. Then M* is the orthogonal projection of the vector ψ/p on the asset span.
Definition 58. The mean-variance frontier is the set M which consists of all payoffs x ∈ ⟨S⟩ such that there exists no other payoff x′ in the asset span with the same expected value and the same price but a smaller variance.
Hence, a payoff is a mean-variance frontier payoff iff it lies in the span of the expectations kernel and the pricing kernel. Since a return is defined as payoff divided by price, and the price is given by the valuation functional, we have

R^{M*} := M*/p(M*) = M*/E[(M*)²] ,  R^{k*} := k*/p(k*) = k*/E(M* k*) .
Proposition 60. Assume that the pricing and expectation kernels are not collinear.
a) The set of frontier returns is given by the line spanned by the two frontier returns R^{k*} and R^{M*}: for λ a number,

R_λ = R^{k*} + λ (R^{M*} − R^{k*})

is a frontier return.
c) If the risk-free payoff is in the asset span, then the risk-free return is the minimum-variance frontier return. If the risk-free payoff is not in the asset span, then

λ_0 := − cov(R^{k*}, R^{M*} − R^{k*}) / var(R^{M*} − R^{k*})

defines the minimum-variance frontier return R_{λ_0}.
d) Given any frontier return R_λ which is different from the minimum-variance frontier return, there exists a zero-covariance frontier return R_{λ^C}, i.e. cov(R_λ, R_{λ^C}) = 0.
Using this proposition, we can recover beta pricing models. Let R_j be the return of an asset j. Then

R_j = P_E R_j + ε_j

defines the projection on the space E spanned by the two kernels; the epsilon term is orthogonal to this space. Since this space is generated by the expectation and pricing kernel, epsilon is orthogonal to these two kernels and hence has zero expectation and price. This implies that the projected return P_E R_j is a frontier return. We span this return in a new basis, R_λ and the zero-covariance return, i.e. for some parameter β_j,

P_E R_j = (1 − β_j) R_{λ^C} + β_j R_λ .

Taking expectations and the covariance w.r.t. R_λ, which is uncorrelated with the zero-covariance return and the epsilon, implies that the beta coefficient is the ordinary regression coefficient of R_j on R_λ. If the risk-free payoff is in the asset span, the beta pricing equation

E(R_j) = R_f + β_j (E(R_λ) − R_f)

follows. Since the market return in the CAPM turns out to be also a frontier return, R_λ can be replaced by the market return. Hence, the SML of the CAPM is a special case of beta pricing. The analysis holds not only for a single asset but also for a portfolio.
The beta pricing with one factor R_λ is generalized in the above geometric set-up in a straightforward way. The span E is replaced by a span F of K normalized factors f_j, i.e. their expected value is zero, and the risk-free asset x_f. Projecting an arbitrary payoff x_j on the new span space and switching from prices to returns, the usual representation

R_j = E(R_j) + Σ_{k=1}^{K} β_{jk} f_k + ε_j    (3.128)

follows, with the betas the factor loadings. As in the proof of the beta pricing representation, if the pricing kernel and the risk-free asset are elements of F, then the exact factor pricing equation

E(R_j) = R_f + Σ_k β_{jk} λ_k

holds with λ_k = −E(R^{M*} f_k) R_f.
So far we did not consider any equilibrium economy analysis in this representation set-up. To do so, consider a two-period economy where agents derive utility from consumption of a single good, the utility function is a smooth function, individuals are strictly risk averse, there are K factors f_j, and the expected error epsilon in (3.128) conditional on the factors is zero.

Theorem 61. Under the above assumptions, if the risk-free asset, the factors, and agents' endowments at date 0 lie in the asset span and if the aggregate date 0 endowment lies in the factor set, then exact factor pricing holds in any equilibrium in which the consumption allocation is interior.
We finally relate this representation to the case where the SDF M is linearly related to factor returns, such as for the CAPM

M_{t+1} = a + b R_{M,t+1} ,    (3.129)

if the parameters a and b are appropriately chosen. For the mean-variance model,

M_{t+1} = a + b R_{mv,t+1} ,

where R_{mv,t+1} is any mean-variance efficient return. Again, given any R_{mv,t+1} and a risk-free rate, we find an SDF that prices all assets, and vice versa. This shows that the CAPM and the Markowitz model are approximations to the general equilibrium pricing kernel or SDF: the ratio of marginal utilities of consumption at different dates is approximated by affine functions in the market and mean-variance return, respectively.
It is worth expressing the relationship between factor models and beta representations in general, since the expression of a risk premium given in (3.124) is of limited practical use because it involves the unobservable SDF. The idea is to start with investable factors and then derive the beta representation which is equivalent to the SDF approach. The equivalence between factor models and beta pricing models is given in the next proposition.
Proposition 63. A scalar a and a vector b exist such that M = a + b′F prices all assets if and only if a scalar κ and a vector λ exist such that the expected return of each asset j is given by

E(R_j) = κ + λ′ β_j ,    (3.131)

where

λ = − cov(M, F)/E(M) ,  κ = 1/E(M) − 1 .

The K × 1 vector β_j is the vector of multivariate regression coefficients of the return of asset j on the risk factor vector F.
The vector λ is called the vector of factor risk premia. The constant κ is the same for all assets and is equal to the risk-free rate if such a rate exists. We mentioned above that factors often are given neither as payoffs nor as returns, while the fundamental pricing equation is expressed using payoffs. It is possible to replace a given set of pricing factors by a set of payoffs that carries the same information: 'mimicking' means that the new SDF is chosen as close as possible to match the payoff. Summarizing, there is no loss of generality from searching for pricing factors among payoffs.
Cochrane (2013) distinguishes between pricing factors and priced factors. Consider M = a + b′F and the factor risk premia λ of Proposition 63. The coefficient b in the SDF is the multivariate regression coefficient of the SDF on the factors. Each component of the factor risk premia is proportional to the univariate beta of the SDF with respect to the corresponding factor. A non-zero b for a given factor means that the factor adds value in pricing the assets given all other factors - a pricing factor. If the component of the factor risk premia is non-zero, then the factor is rewarded - a priced factor. The two concepts are not equivalent except in the case where all factors are independent.
But there is bad news. Factors are related to consumption data entering the SDF. While multi-factor models try to identify variables that are good indicators of bad vs good times - such as market return, price/earnings ratios, the level of interest rates, or the value of housing - the performance of these models often varies over time. The overall difficulty is that the construction of the SDF from empirical risk factors is more an art than a science. There is no constructive method that explains which risk factors approximate the SDF in all possible future events reasonably well.
So far we did not consider how to choose risk factors for investment. We discuss some theoretical recommendations for the choice of risk factors. First, factors should explain common time variation in returns. Second, assuming that there exists a risk-free rate r_f and M = a + b′F, the definition of the SDF implies for any asset k with return r_k:

b′ cov(r_k, F) = 1 − E(r_k)/(1 + r_f) .

For all assets earning a different expected return than the risk-free rate, the vector of covariances between the risk factors and the asset's return must be non-zero. Regressing the returns on the candidate pricing factors, all assets should have a statistically significant loading on at least one factor. This choice recommendation is model independent.
The next recommendation is based on the APT model. APT not only requires that factors explain common variation in returns; the theory suggests that these factors should also explain the time variation in individual returns. This ensures that the payoff, and hence the price, of an asset can be approximated as the payoff of a portfolio of factors. Therefore, the idiosyncratic terms should be as small as possible. Performing a PCA yields the largest eigenvalues and hence the main factors, as the sketch below illustrates.
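The following sketch is an addition; the return panel is simulated with two hidden common factors, so the two largest eigenvalues of the return covariance matrix stand out.

```python
# PCA factor extraction: leading eigenvectors of the return covariance matrix.
import numpy as np

rng = np.random.default_rng(4)
T, N = 2_000, 50
common = rng.normal(size=(T, 2))                 # two assumed common factors
loadings = rng.normal(size=(2, N))
returns = common @ loadings + 0.5 * rng.normal(size=(T, N))

C = np.cov(returns.T)
eigval, eigvec = np.linalg.eigh(C)               # eigenvalues in ascending order
factors = returns @ eigvec[:, -2:]               # scores on the top 2 components

print(eigval[-4:])   # the two largest eigenvalues dominate the remaining ones
```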
• there are sufficiently many securities available to diversify away any idiosyncratic risk: in a large and diversified portfolio the idiosyncratic risk contributions should be negligible due to the law of large numbers - investors holding such a portfolio require compensation only for the systematic part.
APT assumes neither an economic equilibrium nor the existence of risk factors driving the opportunity set for investments. While the CAPM and ICAPM represent the SDF in terms of an affine combination of factors, APT decomposes returns into factors. The CAPM explains the risk premia; APT leaves the risk premia unspecified.

Assume that there are K factors F_k with a non-singular covariance matrix C_F and N ≫ K returns R_1, ..., R_N. Projecting the returns orthogonally on the set generated by the factors plus a constant gives

R_i = E(R_i) + Σ_{k=1}^{K} β_{ik} F̃_k + ε_i ,    (3.132)

with F̃_k = F_k − E(F_k) the centred factors and idiosyncratic risks ε_i satisfying E(ε_j) = cov(F̃_k, ε_j) = 0 and E(ε_j ε_k) = 0 for all j ≠ k. The restriction that the residuals should be uncorrelated across assets implies

C = β′ C_F β + C_ε ,    (3.133)

where C_ε is a diagonal matrix whose non-zero elements are the variances of the idiosyncratic risks, C_F is the factor covariance matrix and β is a K × N matrix of betas.
Definition 65. The returns in equation (3.132) have a factor structure with the factors F_1, ..., F_K if all residuals are uncorrelated.
To understand APT, assume first that idiosyncratic risks are zero in (3.132). We can derive an exact beta pricing model starting from the fundamental asset pricing equation E(M R_i) = 1. Writing the expectation of the product as the product of single expectations plus the covariance term, inserting (3.132) for the return and rearranging implies the beta pricing equation (3.131) of Proposition 63:

E(R_j) = κ + λ′ β_j .    (3.134)

With non-zero idiosyncratic risks,

E(R_j) = κ + λ′ β_j − E(M ε_j)/E(M) ,    (3.135)

with an additional pricing error term. The idea is that E(M ε_j) → 0 if we increase the number of uncorrelated assets, see Proposition 4. The analysis requires a precise mathematical modelling under the assumption that no arbitrage holds. The APT theorem states that if there are enough assets, then the beta pricing equation is approximately true for most assets.
Example

Consider two assets with two different factor loadings but the same factor F. What relationship holds between their expected returns if there is no arbitrage? Let φ be the weight of the first asset in a portfolio and 1 − φ the weight of the second one. The portfolio return (we set for simplicity the idiosyncratic risk to zero) is linear in the loadings, and no arbitrage implies

μ_R = μ_0 + β λ .    (3.136)
First, the market for real estate is often larger in valuation than the entire stock market. In Switzerland, the value of real estate in 2014 was about 4 to 5 times larger than the value of all companies listed on the SIX exchange. Second, pure real estate risk is illiquid. The annual turnover of privately owned real estate is in the low single-digit percentage range. Table 3.22 illustrates the illiquidity using data from the state of Zurich in 2011.

The median holding period of private persons' homes is 25 years. Hence, the construction of a repeated sales index, which would be a transaction-based price index, is not meaningful. Third, one cannot short property. Fourth, historically, real estate risk is the most prominent and frequent driver of financial crises. Fifth, friction costs for direct real estate transactions are high. Sixth, each property is unique.
What do we mean by real estate risk? Figure 3.19 provides an overview of investments and consumption in the real estate asset class.

Figure 3.19: Different uses of the real estate asset class (extension of Zürcher Kantonalbank (2015)).
A constant quality price index by the US Census Bureau is compared with the Case and Shiller 'Repeat Sales' price index (Case and Shiller [1987, 1989, 1990]) in Figure 3.20.

Figure 3.20: Two indices of US home prices divided by the Consumer Price Index (CPI-U), both scaled to 1987=100. Monthly observations in the period 1987-2013 are considered (Shiller [2014]).
A hedonic model regresses the price S of object j between dates t and t+1 on K object characteristics z and a time dummy D:

S_j^{t,t+1} = β_0 + Σ_{k=1}^{K} β_k z_{k,j} + δ^{t+1} D_j^{t+1} + ε_j^{t,t+1} ,    (3.137)
27 See Fisher, Geltner, and Webb (1994), Hansen (2009), Silver (2018) and Shimizu et al. (2010).
with β the estimated weights of the characteristics. More general models account for time-varying betas over longer time periods. Hedonic models contain between 20 and 30 different characteristics for private property.
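A minimal hedonic-regression sketch in the spirit of (3.137) (an addition; the transaction data, characteristics and the index path are all simulated assumptions):

```python
# Hedonic regression: prices explained by characteristics z and time dummies D.
# The estimated dummy coefficients trace out the price index.
import numpy as np

rng = np.random.default_rng(5)
n, K, T = 5_000, 4, 8
z = rng.normal(size=(n, K))                 # object characteristics
t = rng.integers(0, T, size=n)              # sale period of each transaction
D = np.eye(T)[t]                            # time dummies
delta_true = np.linspace(0.0, 0.3, T)       # hypothetical index path

logS = 1.0 + z @ rng.normal(size=K) + D @ delta_true + 0.1 * rng.normal(size=n)

X = np.column_stack([np.ones(n), z, D[:, 1:]])   # drop one dummy (base period)
coef, *_ = np.linalg.lstsq(X, logS, rcond=None)
print(coef[-(T - 1):])                      # estimated index relative to period 0
```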
Figure 3.20 shows that both indices are smooth over time: for real estate, price momentum dominates price volatility. The boom in house prices after 2000 is visible in the Case-Shiller index but not in the Census Constant Quality Index. The reason is that new homes are built where it is possible and profitable to build them. This is often not the case in the expensive areas of a city. Therefore, the constant quality index level through time is more accurately determined by simple construction costs if, as in the US, there is a huge reservoir of cheap land.
Figure 3.21, left panel, shows that in the mid-1990s house prices in Zurich and London started to grow at different rates. This is in line with London becoming the world's major financial center. In the GFC, the greater vulnerability of the Halifax index is visible, while during the whole GFC house prices in Zurich never fell.

The right panel shows forwards on the Halifax index at different time periods in the GFC period. In May 2007 the forecast was still for increasing house prices: market participants failed to identify the GFC. During the GFC, forward levels of the index corrected sharply in each month. The culmination point was October 2008, where the forward levels were predicted too low but the turning point of the index was identified almost perfectly.
The EMH requires that markets are free of frictions. But in housing markets there are many sources of friction. Figure 3.22 shows friction sources for different types of real-estate investments in Switzerland. 'Direct' means that investors buy houses, 'indirect' means to invest in stocks that are related to housing or in investment funds, and 'derivative' refers to property derivatives defined on property indices.

Figure 3.21: Left panel: the Halifax Greater London price index and the Zurich price index (ZWEX) (ZKB and Lloyds Banking Group). Right panel: Halifax Greater London price index and forwards on the index (Syz and Vanini (2008)).
in London. The markets started in 2005 in the US, where again OTC products dominated. In 2006 the CME launched futures, with very limited success, for residential investment based on the S&P/Case-Shiller Index. The most common transactions are swaps. Derivative instruments allow investors to gain exposure to the real estate asset class without having to buy or sell properties, by replacing the real property with the performance of a real estate return index. The most popular instruments are swaps and total return swaps, while options are much less established.
Consider the case of derivatives on the residential, hedonic, real-transactions property index ZWEX of the Zurich area. In 2006 warrants, calls and puts on the ZWEX were issued to allow investors to protect home owners' capital against falling future house prices (index mortgages, i.e. an ordinary mortgage plus a put on the real estate index) and to offer leveraged investments at the same time. Consider a home owner who seeks protection from falling house prices at the end of his 5-year fixed mortgage contract. He buys a put option on the ZWEX, see Salvi et al. (2008). The put option should finance possible forced amortizations at maturity of the mortgage if the ZWEX falls. Figure 3.23 shows the impact on capital protection under different price scenarios.
Figure 3.22: Frictions for investment in real-estate markets in Switzerland. Lex Koller is a federal law which restricts the purchase of property by foreigners (Syz and Vanini [2008]).
A second example of property derivatives are property swaps, i.e. OTC contracts, see Geltner and Miller. Assume that a small firm BUY wants to invest in real estate without facing the high costs and illiquidity of a direct investment. The firm SELL is over-invested in real estate and wants to sell real estate market risk exposure. Neither party intends to buy or sell the objects they are actually invested in: this circumvents large transaction costs and keeps the regular income stream from the physical objects.

Figure 3.23: Effectiveness of the put option hedge for a 5-year mortgage under three different real estate price scenarios (Syz and Vanini (2008)).

A NCREIF Appreciation Swap ('Swap') allows BUY to swap a fixed return for the NPI appreciation return, i.e. the return of the property index. SELL takes the short position of BUY: he pays the floating, quarterly NPI appreciation return and receives from BUY quarterly the fixed return. Netting of the payments occurs quarterly and notional amounts are not exchanged.
We price the Swap using a replication portfolio and no arbitrage. We assume that it is possible to replicate the NPI return with a portfolio of assets, that there are no frictions, and that short-selling is possible. Although these assumptions are violated in practice, the pricing defines a benchmark which can be compared to the second, equilibrium pricing. The assumptions allow us to construct a risk-less hedge using the replicating portfolio and the swap. We consider two periods spanning the dates t, t+1, t+2, with I_t the value level of the NPI, E[y] the expected income of the NPI with y the same random income in each period, and S the unknown fixed leg / spread of the swap.

The t+1 and t+2 parts of the hedge are risk-less, see Table 3.23. Setting the NPV of the hedge equal to zero - this means excluding arbitrage - implies for the fixed leg

S = R_f − E[y] .    (3.138)

The fixed spread S is independent of the NPI level value; only the borrowing costs of the investor BUY as well as the expected income stream matter. If we consider a total return swap, i.e. all proceeds from the index are also exchanged, then the expected income
                 t                t+1                          t+2
Short Index      I_t              −g_{t+1} I_t − E[y] I_t      −g_{t+2} I_t − E[y] I_t − I_t
Risk-less ZCB    −R_f I_t         0                            I_t
Long Swap        0                g_{t+1} I_t − S I_t          g_{t+2} I_t − S I_t
Hedge            (1 − R_f) I_t    −(S + E[y]) I_t              −(S + E[y]) I_t

Table 3.23: Risk-less hedge of the Swap, long position of BUY. ZCB means zero-coupon bond, D denotes discounting with the risk-free rate and g is the growth rate of the NPI.
E[y] is also part of the index value, and S = R_f follows using the same replication approach.
Define the index risk premium

F_I = E[R_I] − R_f

and decompose E[y] = E[R_I] − E[g] with E[g] the real estate appreciation rate. Then the fixed no-arbitrage spread

S = E[g] − F_I

is equal to the expected index appreciation rate minus the risk premium. A no-arbitrage argument is, however, not really admissible since it is not possible to short I_t; we assume linear pricing rules in equilibrium instead. BUY expects the net return - which consists of the NPI appreciation return E^{BUY}(g), minus S, plus R_f received from the covering bond position - to be not smaller than the swap risk premium:

E^{BUY}[g] − S + R_f ≥ R_f + F_I .

SELL also considers his overall net return. It consists of S plus the expected return on his real estate portfolio E^{SELL}(R_S), which should be as close as possible to the return of the NPI, minus the NPI appreciation return E^{SELL}(g). Since by assumption E[y] is constant and the NPI swap obligation is covered by the bond portfolio, the net risk exposure is zero. Summarizing, SELL's requirement is:
S_t = E_t( Σ_{j=1}^{∞} D_{t+j}/(1 + R)^j ) ,    (3.139)
with R the internal rate of return on expected dividends: for two stocks with the same expected dividends but different prices, the stock with the lower price has to have a higher expected return. Merton's (1973) multi-factor inter-temporal CAPM (ICAPM) generalizes the CAPM to the case of several factors, assuming:

• Investors care about the risk factors market return R_M and so-called innovations Y.
In the Markowitz model, the investment opportunity set consists of all efficient and inefficient portfolios. If the investment opportunity set changes over time, then variables Y other than the market returns drive returns. Working without these factors trivializes human behavior and needs: using the market return only, for example, all investors are jobless since no labor income exists. The possible change of the investment opportunity set is more important for longer-term investment horizons than for shorter ones, since the deviations from a static opportunity set can become larger over longer time horizons. The solution of the ICAPM model generalizes (3.124) to

E_t(R^e_i) = Θ cov_t(R_i, R_M) + Ω cov_t(R_i, Y) ,    (3.140)

where Θ is the average relative risk aversion of all investors and Ω is the average aversion to innovation risk. The mean excess returns are driven by covariances with the market portfolio and with each innovation risk factor. The geometric intuition of this beta pricing model is the same as in the case with fixed opportunity sets. The first term in (3.140) is mean-variance efficient, but the total portfolio is no longer mean-variance efficient. Economically, the average investor is willing to give up some mean-variance efficiency for a portfolio that better hedges innovation risk. The mutual fund theorem of the Markowitz model generalizes to a K+2 fund theorem if there are K innovation risk sources. Investors will split their wealth between the tangency portfolio and K portfolios for innovation risk.
3.8 Applications

3.8.1 Low Volatility Strategies

In many empirical studies, low-beta stocks outperform high-beta stocks, and volatility negatively predicts equity returns (negative leverage effect), see Haugen and Heins (1975), Ang et al. (2006), Baker et al. (2011), Frazzini and Pedersen (2014), Schneider et al. (2016). This means that high beta (risk) is not rewarded as it should be according to the asset pricing equations. These observations define the beta and volatility low-risk anomalies.
There are different ways to rationalize these anomalies by enlarging the models which lead to them. Schneider et al. (2016) show that taking equity return skewness into consideration rationalizes these anomalies. They generalize the CAPM, which serves as an approximation, and allow for higher moments of the return distribution. This leads to skew-adjusted betas. They use the creditworthiness of the firms as the source for skewness in returns: the higher a firm's credit risk, the more the CAPM overestimates the firm's market risk, because it ignores the impact of skewness on asset prices (Schneider et al. (2016)). Returns benchmarked against the CAPM then appear to be too low since the CAPM fails to capture the skewness effect. Formally, starting with (3.124), defining the regression coefficient β_i = cov(M, R_i)/var(M) and the variable λ = −var(M)/E(M), we get the equation equivalent to (3.122)

E_t(R^e_i) = − (cov(M, R_i)/σ(M)) (σ(M)/E(M)) .    (3.141)
Schneider (2015), Kraus and Litzenberger (1976) and Harvey and Siddique (2000) define the risk premium as the difference between the expected value of a derivative X under the historical probability P and under the risk-neutral probability Q:

Risk Premium = E_t^P(X_T) − E_t^Q(X_T) .    (3.142)

The two probabilities P, Q can be related to each other by the state price density L:28

L = dQ/dP ,  E^P(L) = 1 .    (3.143)
To illustrate the technique, consider two states with probabilities P = (1/2, 1/2) and Q = (1/3, 2/3). Then in state 1, L_1 = (1/3)/(1/2) = 2/3, and in state 2, L_2 = (2/3)/(1/2) = 4/3. Therefore,

E^P(X) = p_1 X_1 + p_2 X_2 = (X_1 + X_2)/2 = E^Q[X/L] = q_1 X_1/L_1 + q_2 X_2/L_2 .
Using M = L in (3.141) and the risk premium for the market return we get:

E_t(R^e_i) = (cov(L, R_i)/cov(L, R_M)) E_t(R^e_M) .    (3.144)

28 The Radon-Nikodym derivative L (mathematics), the state price density (economics), the likelihood ratio (econometrics).
The expected return on asset i is proportional to the expected excess return on the market, scaled by the asset's covariation ratio with the pricing kernel - the true beta. Since L is not observable, the authors approximate L(R) := E^P(L|R) by a power series in R.29 Using a linear and a quadratic approximation of L in (3.144) changes the true beta into a CAPM beta (linear case) or a skew-adjusted beta in the quadratic case.

'... a firm's market risk also explicitly depends on how its stock reacts to extreme market situations ... and whether its reaction is disproportionally strong or weak compared to the market itself. A firm that performs comparably well ... in such extreme market situations, has a skew-adjusted beta that is lower relative to its CAPM beta. ... investors require comparably lower expected equity returns for firms that are less co-skewed with the market.' Schneider et al. (2016)
To incorporate time-varying skewness in stock returns, the authors consider corporate credit risk using the Merton (1974) model. In this model, equity value at the maturity date is a European call option on the firm value with strike equal to debt (which is a zero-coupon bond). For firms with high credit risk, the increased probability of default is reflected in a strongly negative skew of the return distribution. The forward value of equity is then given by the expected value of the call option discounted with the SDF M = L under P. Together with the call option value, this forward value defines firm i's excess equity return R^e_i. The expected gross return is given by (3.144), with the linear or quadratic approximation replacing the SDF. For the linear CAPM, the betas increase with credit risk, i.e. the asset volatility or the leverage, and with the firm's correlation to the market. Comparing this beta with the skew-adjusted one, the latter is in general larger. The difference increases with credit risk: the firm becomes more and more an 'idiosyncratic risk factor', and hence less connected to the market, the stronger the skew is. In this sense the CAPM approximation overestimates expected equity returns, i.e. the return anomaly.
Schneider et al. (2016) apply their model implications to low-risk anomalies, the so-called Betting-Against-Beta (BAB) strategy, see Frazzini and Pedersen (2014). BAB is based on the empirical observation that stocks with low CAPM betas outperform high-beta stocks. Hence, investors believing in BAB go long a portfolio of low-beta stocks and short a portfolio of high-beta stocks. To reach an overall zero beta, the strategy takes a larger long than short position. The strategy is financed with riskless borrowing. Frazzini and Pedersen (2014) document that the BAB strategy produces significant profits across a variety of asset markets. Using empirical evidence from 20 international stock markets, Treasury bond markets, credit markets, and futures markets, Frazzini and Pedersen (2014) ask:
• How can an unconstrained arbitrageur exploit this effect, i.e., how do you bet against beta?

29 To achieve this, L is written as an infinite series. The coefficients in the series depend on P, Q, i.e. the price dynamics of the assets, and the risk aversion of the investor. Geometrically, the representation of L is equivalent to orthogonal projections of L on the space generated by the powers of R.
• What is the magnitude of this characteristic relative to the size, value, and momentum effects?

They find that for all asset classes, alphas and Sharpe ratios almost monotonically decline in beta. Alphas decrease from low-beta to high-beta portfolios for US equities, international equities, treasuries, credit indices by maturity, commodities and foreign exchange rates. Constructing the BAB factors within 20 stock markets, they find for the US a Sharpe ratio of 0.78 between 1926 and March 2012, which is twice as much as the value effect and still 40% larger than momentum. The results for international assets are similar. They also report that BAB returns are consistent across countries, time, within deciles sorted by size, and within deciles sorted by idiosyncratic risk, and are robust to a number of specifications. Hence, coincidence or data mining are unlikely explanations.
The BAB strategy is rationalized in the model of Schneider et al. (2016) as follows. The CAPM betas increase, for fixed credit risk (fixed volatilities and leverage), with the firm's correlation to the market: buy stocks with low and sell stocks with high correlation to the market. The alpha of this strategy, the excess expected return relative to market covariance risk, is given by the firm's expected return compensation for skewness. These typically positive alphas increase with increasing credit risk. Summarizing, the BAB returns can be directly related to the return skewness induced by credit risk.

The relative impact of both explanations can vary over time. During the tech bubble of 1999-2000, for example, cheap value stocks - which typically are cheaper because they are riskier - were cheaper because investors were making errors.
The two explanations behave differently when a strategy becomes known. In the rational model, the value strategy still works, but at a level consistent with the equilibrium demand and supply side. The equilibrium property conserves both the expected return and the risk of the strategy.

In the behavioural explanation, the risk source is not systematically linked to the return in equilibrium. There is no systematic demand and supply as in the equilibrium model to guarantee that the risk premium will not go away. It is therefore difficult to be convinced that the risk remains stable over time.
Asness (2015) compares these two different views using historical data and the Sharpe ratio. If a strategy impacts the risk premia as it becomes more common, the Sharpe ratio is expected to fall, either because excess return diminishes or because risk increases. With regard to the returns, one could argue that if the value strategy becomes more popular, then the 'value spread' between the long and short sides of the strategy gets smaller. This spread measures how cheap the long portfolio is versus the short portfolio. If more and more investors invest in this strategy, then both sides face a price movement - the long side is bid up and the short side is bid down. This reduces the value spread.

The author uses the FF approach for value factor construction. He calculates the ratio of the book-to-price ratio (BE/ME) of the cheapest one-third over the BE/ME of the most expensive one-third of large stocks. Clearly, cheaper stocks always have a higher BE/ME than the expensive stocks. But the point is to compare how the ratio of large-cheap over large-expensive changes over time as an approximation of the attractiveness of the value strategy. Considering 60 years of data, the ratio is very stable, with a 60-year median value of 4. There is no downward or upward trend. The only two periods during which the ratio grew significantly - reaching a value of 10 - correspond to the dot-com bubble and the oil crisis of 1973. This measurement shows little evidence that the simple value strategy was arbitraged away in the last 60 years.
To analyze the risk dimension, the annualized, rolling, 60-month realized volatility of the value strategy for the last 56 years is considered. Again, the dot-com bubble is the strongest outlier, followed by the GFC and the '73 oil crisis. There is again little evidence that the volatility of the strategy is steadily rising or falling. The attractiveness of a strategy is best measured by the in- and outflows of investment in the strategy. Increasing inflows should, on a longer time scale, increase the return of a strategy, and the opposite holds if large outflows occur. This was not observed in the above return analysis.
• Conservative investors are advised to hold more bonds relative to stocks than aggressive investors. This contrasts with the constant bond-stock ratio in the tangency portfolio of the CAPM. This is the asset allocation puzzle.

• The judgement of risk may be different for long-term and short-term investors. Cash, which is considered risk-free for the short term, becomes risky in the longer term since it must be reinvested at an uncertain level of real interest rates.
If the interest rate return is IID, then the optimal strategy is the myopic one, i.e. the second (hedging) term is zero. Assume returns are not IID. If the investor becomes more risk averse, RRA^{−1} tends to zero. A conservative investor will not invest in the risky asset to capture its short-term risk premium but will rather fully hedge the future risk of the risky asset. Hence, short-term market funds are not a risk-less asset for a long-term investor. Campbell and Viceira (2002) show that the risk-less asset is in this case an inflation-indexed perpetuity or consol. Note that for all results an individual investor's viewpoint is considered, but not an equilibrium. Hence, possible equilibrium feedback effects on the asset prices and returns are missing.
Predictable asset returns lead to a hedging demand. If equity is predictable, there will be an inter-temporal hedging demand for stocks. Campbell and Viceira (2002) consider a model where long-term investors face a time-varying opportunity set due to changing interest rates or changing equity risk premia. A striking result is that a conservative investor will hold stocks even if the expected excess return of the stock is negative. How is he compensated for doing so? We first assume that the covariance between risky asset returns at two consecutive future dates is negative. This captures that equity returns are mean-reverting: an unexpectedly high return today reduces expected returns in the future. This describes how the investment opportunities related to equity vary over time. If the average expected return is positive, the investor will typically be long stocks. Given a negative correlation, for stocks with a high return today the future return will be low, and hence the investment opportunity set deteriorates. The conservative investor wants to hedge this deterioration. Stocks are just one asset that delivers increasing wealth when investment opportunities are poor. Figure 3.24 illustrates, for a conservative investor, three alternative portfolio rules.
Figure 3.24: Portfolio allocation to stocks for a long-term investor, a myopic investor,
and for a CIO choosing the TAA (Campbell and Viceira [2002]).
The horizontal line represents the optimal investment rule if the expected excess stock return is constant and equal to the unconditional average expected excess stock return. The TAA is the optimal strategy for an investor who observes, in each period, the conditional expected stock return. The myopic strategy and the TAA cross at the point at which the conditional and unconditional returns are the same. The TAA investor is a myopic investor with a one-period horizon. The SAA line represents the optimal investment of a long-term investor. There is a positive demand for stocks even if the expected return is negative. This reveals that the whole discussion in this section can be seen as describing the structure of strategic asset allocation (SAA); in fact, Formula (3.13) can be transformed accordingly.

The long-term investor should hold long-term, inflation-indexed bonds and increase the average allocation to equities in response to the mean-reverting stock returns (time-varying investment opportunities). Empirical tests suggest that the response to changing investment opportunities occurs with a higher frequency for stocks than for the interest rate risk factor. Therefore, this long-term weight or SAA should be periodically reviewed and the weights should be reset.
1. Liability profile - the degree to which the investor must service short-term obligations, such as upcoming payments to beneficiaries.

2. Investment beliefs - whether the institution believes long-term investing can produce superior returns.

3. Risk appetite - the ability and willingness of the institution to accept potentially sizable losses.

4. Decision-making structure - the ability of the investment team and trustees to execute a long-term investment strategy.

Comparing this with the optimal investment formula (3.14): point 3 is captured by risk aversion, point 2 defines the asset universe selection of the model, and point 1 is part of the utility function.
The WEF (2011) report considers the question of who the long-term investors are. It builds the following five categories: family offices with USD 1.2 trillion AuM, endowments or foundations with USD 1.3 trillion AuM, SWFs with USD 3.1 trillion AuM, DB pension funds with USD 11 trillion AuM and life insurers with USD 11 trillion AuM. Matching these different types of investors to the four constraints listed above leads to the following long-term investment table (source for the table is WEF (2011) and the many sources cited therein):
The following model portfolio construction of Ang et al. (2018) provides a practitioner's approach to long- and short-term investment in asset classes. Their model portfolios are parametrized by the investor's preferences, such as risk tolerance and the selection of the asset universe. Their construction combines three portfolios: a performance benchmark reflecting the investor's risk appetite, a strategic model portfolio constructed relative to the benchmark which reflects long-term views on the market, and finally a tactical model portfolio mimicking short-term views.
Table 3.24: 'Decision' represents the decision-making structure, D the average duration and 'Estimated' the estimated allocation to illiquid investments (WEF [2011]).
The benchmark portfolio φ_B is a fixed equity-bond portfolio, say 80/20. The chosen fraction mimics the risk tolerance of the investor. Such benchmarks can be implemented at low costs, and the performance of more complicated portfolios can be measured against them without difficulty. The strategic portfolio φ_S is the solution of a mean-variance optimization problem relative to φ_B, where both the risk aversion and the covariance matrix are long-term parameters. Several constraints are used, such as equalizing the equity component of the strategic and the benchmark portfolio, long-only, full investment and many more. The short-term or tactical portfolio φ_T also solves a mean-variance problem, where short-term expected returns and covariance matrix parameters enter. The two main constraints are ⟨φ_T, e⟩ = 0, i.e. it is a zero-dollar long-short portfolio which shapes the strategic allocation, and ⟨φ_T, C_S φ_T⟩ = 1, i.e. the short-term risk aversion follows from this constraint. This short-term portfolio is weighted by market signals, implying the so-called long-short combined portfolio φ_C = ⟨w, φ_T⟩, where the weights w_i add up to one. Adding φ_C + φ_S defines the target portfolio φ*. Finally, the model portfolio φ_M is the portfolio which minimizes the variance ⟨φ − φ*, C_S (φ − φ*)⟩ subject to the full-investment constraint and linear constraints for the asset classes.
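The final minimization step can be sketched as follows (an addition, not the authors' implementation; the covariance matrix, the target weights and the long-only bound are assumed inputs):

```python
# Model portfolio: minimize tracking variance to the target phi_star
# subject to full investment and a long-only bound (one possible linear constraint).
import numpy as np
from scipy.optimize import minimize

C_S = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.02]])        # assumed short-term covariance matrix
phi_star = np.array([0.55, 0.25, 0.20])     # assumed target phi* = phi_C + phi_S

objective = lambda phi: (phi - phi_star) @ C_S @ (phi - phi_star)
cons = ({'type': 'eq', 'fun': lambda phi: phi.sum() - 1.0},)  # full investment
bounds = [(0.0, 1.0)] * 3

res = minimize(objective, x0=np.ones(3) / 3, bounds=bounds, constraints=cons)
print(res.x)                                 # model portfolio phi_M
```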
The authors use liquid ETFs to implement the approach. Besides the broad equity and bond ETFs which enter the benchmark portfolio, they use ETFs for the styles momentum, minimum volatility, value, quality and size. These more complicated indices are added in the strategic portfolio and the tactical portfolio. Figure 3.25 illustrates their model portfolio construction.
The figure shows that the starting point is to choose a performance benchmark reflecting the preferences, i.e. blending the MSCI USA Index with the MSCI USA Minimum Volatility Index. Active risk of 250 bps relative to the performance benchmark is added in the strategic model portfolio. For the model portfolio capturing short-term tactical views on the chosen styles, the factors are re-weighted relative to the strategic portfolio. This adds an additional average of 110 bps of risk; the full amount of active risk relative to the performance benchmark is approximately 300 bps. The model is tested using data from Jan 2000 to Jun 2017. Note that not all styles existed for the whole period, i.e. the index values are then theoretically calculated. The final model portfolio generated an annual return of 8.9%, outperforming the performance benchmark by 3.4% per year. The outperformance is attributed to two sources.
Figure 3.25: Tactical U.S. equity model portfolio construction process. White: MSCI USA Index; red: MSCI USA Minimum Volatility Index; blue: MSCI USA Momentum Index; green: MSCI Risk Weighted Index; light blue: MSCI Value Index (Ang et al. (2018)).
First, the strategic portfolio tilts towards factors which possess inherent and persistent risk premia. Second, the short-term indicators have some ability to predict factor returns; these time-varying active positions versus the strategic benchmark generate excess return.

Comparing this approach with the general optimal decision-making formula, the two components of long term and short term are present. The way they enter the final model portfolio is a multi-stage process which consists of several plausible particular optimizations. Whether the whole approach is optimal at all is not considered. Furthermore, since one-period decisions are made in each model type, risks are not distributed over time in an optimal way but split into a kind of static long-term part and a varying short-term allocation.
interval.
Are equities less risky than bonds in the long run? Siegel states (Siegel [1994]):
It is widely known that stock returns, on average, exceed bonds in the long run. But it is little known that in the long run, the risks in stocks are less than those found in bonds or even bills! [...] But as the horizon increases, the range of stock returns narrows far more quickly than for fixed-income assets [...] Stocks, in contrast to bonds or bills, have never offered investors a negative real holding period return yield over 20 years or more. Although it might appear riskier to hold stocks than bonds, precisely the opposite is true: the safest long-term investment has clearly been stocks, not bonds.
Using the standard deviation, Siegel advises long-term investors to buy and hold equities due to the reduced risk of stock returns at long horizons. But such a risk reduction only holds if stock returns are mean reverting, i.e. if returns are not IID. We showed, however, that a long-term buy-and-hold strategy is not optimal: the optimal strategy is a strategic market timing strategy with a mixture of myopic and hedging demand components. Hence, following Siegel's advice is not optimal, and the converse also holds: an optimal long-term investment strategy does not produce Siegel's suggested portfolio weights.
The herding of pension funds. Pension funds consider, by their very definition, an infinite time horizon in their investments since each year there are new entrants to the pension scheme. As long-term investors, one would expect pension funds to focus on their long-term investment strategies. They should therefore behave differently than typical short-term asset-only managers. But there is a different investment motivation which may counteract long-term investment behavior: the fear of underperforming relative to their peer group, which defines such funds' incentive to herd.
Such herding may be stronger for institutional investors than for private investors. First, there is more trade transparency between institutional investors than between private investors. Second, the trading signals that reach institutional investors are more correlated and hence increase the likelihood of eliciting similar reactions. Finally, because of the size of the investments, institutional herding is more likely to result in stronger price impacts than the herding of private investors. Therefore, adopting a position outside the herd has a stronger return impact for an institutional investor than for private clients.
Blake et al. (2015) study the investment behavior of pension funds in the UK, analyzing on an asset-class level to what extent herding occurs. Their data set covers UK private sector and public sector defined-benefit (DB) pension funds' monthly asset allocations over the past 25 years. They present information on the funds' total portfolios and asset class holdings, and are also able to decompose changes in portfolio weights into valuation effects and flow effects.
The authors also find that pension funds mechanically rebalance their short-term portfolios if restrictions in their mandates are breached. They therefore, on average, buy in falling markets on a monthly basis and sell in rising markets. This is suboptimal given the optimal investment rule (3.140). Therefore, pension funds' investments fail to move asset prices toward their fundamental values, and hence do not stabilize financial markets. The market exposure of the average pension fund and the peer-group benchmark returns match very closely the returns on the relevant external asset-class market index. This is evidence that pension fund managers herd around the average fund manager: they could simply invest in the index without paying any investment fees.
As a final result, the pension funds studied captured a negative liquidity premium, contrary to the expectation that these long-term investors should be able to provide liquidity to the markets and earn a risk premium in return.
Chapter 4
Portfolio Construction
4.1 Steps in Portfolio Construction
So far, we did not consider the logic of portfolio construction but used different portfolios in examples on an ad hoc basis. Several steps define portfolio construction:
• Allocation of assets: How much wealth (weight) do we invest at each date in the specific assets?
The grouping of the assets or asset selection can be done on different levels:
• Single assets
• Risk factors
The implementation of the asset allocation can be done using different liquid assets:
• Cash products such as stocks and bonds
• Options
Further implementation issues are liquidity, tax and compliance (eligibility, suitability
and appropriateness).
We assume that investors use the expected utility criterion as a rule of choice: the higher the expected utility of an investment, the more such an investment is preferred. Like any mathematical model, expected utility theory is an abstraction and simplification of reality. There exists a large academic literature reporting systematic violations of the predictions of expected utility theory in investors' empirical behavior. A prominent alternative is prospect theory by Kahneman and Tversky (1979), which is also an optimization problem but enriches models such as Markowitz's with typical observed behaviors. But most investment theories used in practice are still based on expected utility theory.
The theory assumes that investors form correct beliefs and that they choose optimal actions or decisions. The beliefs define the probabilistic set-up for the dynamics of future returns. One optimal action is the choice of the portfolio weights over time. The optimal decision is based on the investor's preferences, which are represented by her utility function. Optimization requires maximizing expected utility subject to constraints such as the budget constraint. Decision problems in terms of mathematical optimization have been an active field of research for decades.
If investors face situations where risks (probabilities) are not known, uncertainty dominates. Then it makes little sense to rely on optimal investment theory; heuristic reasoning is used instead, see Section 4.2.4. We further assume that investors are impatient: they prefer 1 CHF today to 1 CHF tomorrow.
The mean-variance model was the first model in portfolio optimization based on the return-risk trade-off. Markowitz stated in 1952: The investor should consider expected return a desirable thing and variance of return an undesirable thing. Three methods are common to operationalize this principle:
1. The investor chooses a portfolio φ to maximize the expected return where volatility cannot exceed a predefined level σ, or
2. volatility is minimized such that the expected return cannot be lower than a predefined level r, or
3. a risk-aversion weighted trade-off between expected return and variance is maximized.
All three formulations are equivalent. We formalize the ideas. Consider N risky assets with a return vector R in a single period. The expected returns are µ = E(R) and the covariance matrix of the returns is C = E((R − µ)(R − µ)′). The objective is to maximize the quadratic utility function which reflects the trade-off between reward and risk:1
u(R) = φ′R − (θ/2) φ′(R − µ)(R − µ)′φ .
Taking expectations,2
EP(u(R)) = φ′µ − (θ/2) φ′Cφ .
Optimization means finding a portfolio φ which maximizes the above expected utility, i.e.
max_φ EP(u(R)) = max_φ ( φ′µ − (θ/2) φ′Cφ ) . (4.1)
1 We always assume that the utility functions are continuously differentiable.
2 The factor 1/2 cancels a factor of 2 which arises when differentiating the quadratic term to calculate the optimal portfolios.
The unconstrained first-order condition yields
φ∗ = (1/θ) C⁻¹µ . (4.2)
The matrix C⁻¹ is the information matrix. The elegance of this formula, the simplest one in the Markowitz model, cannot be overstated: just plug in the information matrix and the expected returns and you get an optimal portfolio. But both inputs are not observable and hence must be estimated. What is the best way to do this? This question led to half a century of academic research and to frustration among practitioners using this model. We consider the reasons in detail below. Suppose that there is only one risky asset and a risk-free asset with return µf. Then the above optimal rule reads:
φ∗ = (1/θ) (µ − µf)/σ² . (4.3)
The fraction (µ − µf)/σ² is the market price of risk. It is proportional to the Sharpe ratio.
An investor with zero risk aversion puts all the money in the asset with the largest expected return. If risk aversion is not zero, then since risk is always positive, the higher the risk, the lower the optimal level of expected utility. Formula (4.2) states that the optimal amount invested in each asset is given by a mix of the expected returns of all assets, with the information matrix doing the mixing. What is the intuition for how the information matrix acts? Does it favour diversification? Again, we consider this below.
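The rule (4.2) is easy to check numerically. Below is a minimal sketch, with illustrative values for µ, C and θ that are not taken from the text:

import numpy as np

# Assumed illustrative inputs: two risky assets
mu = np.array([0.05, 0.08])            # expected returns
C = np.array([[0.04, 0.01],
              [0.01, 0.09]])           # covariance matrix
theta = 3.0                            # risk aversion

# Unconstrained mean-variance optimum (4.2): phi* = C^{-1} mu / theta
phi_star = np.linalg.solve(C, mu) / theta

# Expected utility at the optimum: phi'mu - theta/2 * phi'C phi
eu = phi_star @ mu - 0.5 * theta * phi_star @ C @ phi_star
print(phi_star, eu)   # note: without constraints the weights need not sum to one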
The success of mean-variance optimization is mathematically due to the success of quadratic programming (QP): the problems are easy to solve and powerful mathematical software is available. Since mean-variance optimization does not imply diversification, the meaningfulness of an allocation depends on the chosen constraints. Fortunately, many practical problems with constraints can be rewritten as QP problems. A quadratic program (QP) is an optimization of a quadratic objective function subject to linear inequality constraints:
φ∗ = arg max_φ ( φ′µ − (θ/2) φ′Cφ ) , V φ ≤ Z (4.4)
where C is an N × N matrix and V, Z are a constraint matrix and vector, respectively. The constraints Vφ ≤ Z allow for equality constraints (budget or full-investment constraint), inequality constraints or band constraints a ≤ φ ≤ b (asset class bounds in a TAA). QP problems are solved using active set, gradient projection and interior point methods.
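As an illustration of (4.4), the following sketch solves a small constrained mean-variance problem with a general-purpose solver (SLSQP) rather than a dedicated QP code; all inputs are assumed for illustration:

import numpy as np
from scipy.optimize import minimize

# Assumed illustrative data, not taken from the text
mu = np.array([0.05, 0.08, 0.06])
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.06]])
theta = 3.0

def neg_utility(phi):
    # negative of the QP objective in (4.4)
    return -(phi @ mu - 0.5 * theta * phi @ C @ phi)

constraints = [{"type": "eq", "fun": lambda phi: phi.sum() - 1.0}]  # full investment
bounds = [(0.0, 1.0)] * 3                                           # long-only band constraints

res = minimize(neg_utility, x0=np.ones(3) / 3, bounds=bounds,
               constraints=constraints, method="SLSQP")
print(res.x)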
Several practical variations of the Markowitz problem are in fact QPs; see Perrin and Roncalli (2019) for details. The first one is to consider a general benchmark b. The difference e = φ − b between an actively managed portfolio φ and the benchmark b defines the active bets of the investor, and µ(φ, b) = ⟨φ − b, µ⟩ is the expected excess return or expected tracking error. The tracking error volatility TE is by definition the volatility of this difference:
TE = σ(φ, b) = σ(e) = √(⟨φ − b, C(φ − b)⟩) . (4.7)
Minimizing the tracking error volatility while maximizing the expected excess return (or the alpha) can be written as the QP
φ∗ = arg max_φ ( φ′µ̃ − (θ/2) φ′Cφ ) , (4.8)
where µ̃ = µ + θCb is the regularized vector of expected returns; see below for regularization.
Second, consider index sampling, i.e. replicating an index portfolio b with a smaller number of assets than the index contains. The goal is to minimize tracking error volatility under constraints such as the full-investment and long-only constraints, which are linear. The extra constraint ensuring that the number of assets is smaller than the index size can be written as
Σj χ{φj > 0} ≤ s ,
with χ the indicator function and s the desired number of assets. Although the indicator function is non-linear, the resulting constraint formulation is linear and hence the whole problem remains of the QP type. Other relevant models consider a turnover constraint, i.e. the amount of assets sold and bought in an optimization is limited, or transaction cost constraints, where the expected portfolio return net of bid-ask trading costs enters; these are also QP problems.
An insightful investor doubts that the probability law is known. He could therefore consider the investment situation where different probabilities matter in the portfolio choice problem; then uncertainty matters besides risk. Formally, let P be a set of admissible probabilities. The optimization becomes
min_{P∈P} max_φ EP(u(R)) = min_{P∈P} max_φ ( φ′µ − (θ/2) φ′Cφ ) . (4.9)
The investor assumes that out of all possible probabilities (who defines this set?) the worst one is chosen by a second player called 'nature'. This defines a robust optimization problem. The solution will be more conservative than the original one. If one asset is risk-free, this asset will attract a large part of the invested money. Although theoretically sound, robust investments in this sense are hardly used since the wealth allocation is often too conservative and it is difficult to single out the set of admissible probabilities. We do not consider this approach any further.
• The constraints define the admissible set A(ξ). Examples are the full investment constraint, the budget constraint, the maximum and minimum amounts for each asset class, a turnover constraint or a downside risk bound.
4.2.1.1 Examples
Consider a single-period investment problem where the investor derives utility u(W1) from final wealth W1. The investor chooses a portfolio φ ∈ Rⁿ of n assets to maximize E(u(W1)) under the two budget constraints at times 0 and 1: Σj φj Sj,0 = W0, with Sj,0 the price of asset j, and W1 = Σj φj Sj,1. The first-order condition (FOC) for optimality reads:
E(u′(W1)(Ri − Rj)) = 0 , (4.10)
for all asset pairs i, j. This equation has several implications. First, Ri − Rj means that a long-short combination is optimal. Second, the FOC also holds if one asset is the risk-free asset. Third, geometrically the condition states that the excess return vector and marginal utility are orthogonal to each other. Fourth, assume that the investor is risk averse, u′′ < 0. Then it is never optimal to fully invest in the risk-free asset. By contradiction, assume that the investor puts all his initial wealth in the risk-free asset. Then final wealth W1 is non-random, u′(W1) is deterministic and can be taken outside the expected value in (4.10). But then, unless all risky returns are the same, the FOC cannot be satisfied.
The utility function defines risk preferences. Consider an investor who is given the choice between a lottery that pays off either 50 or 100 with equal probability and a lottery with a guaranteed payoff of 75: the bet has the same expected value as the guaranteed payoff. A risk-neutral investor is indifferent between the two lotteries; a risk-averse investor prefers the guaranteed payoff.
Figure 4.1 shows the payoffs and utilities for the risk-averse and the risk-neutral investor. For the risk-averse investor, the expected value of the bet also lies on a straight line, but its utility value (yellow dot) is strictly lower than the utility of the guaranteed payoff (red dot). A risk-averse investor needs an extra compensation 'red minus yellow dot' to become indifferent.
reasonable strategy. But every constraint has an economic price: the shadow price. The larger this price, the lower the constrained optimum compared to the unconstrained one. Furthermore, adding many ad hoc constraints makes it difficult to explain whether a portfolio is optimal due to the investor's preferences or due to the many constraints. Often in wealth management several dozen constraints are imposed - constraints expressing the client's preferences ('not investing in hedge funds'), compliance constraints ('Chinese bonds are excluded') or CIO-related constraints ('the weight of Swiss equity is between 20 and 40% for a specific investor').
We show the loss of utility in restricted optimization. The optimal value of an unrestricted optimization problem is never worse than the value of a restricted problem. Consider the minimization of the parabola u(x, y) = x² + y². The minimum is achieved at the vector (0, 0) and the optimal value is u(0, 0) = 0. We insert the restriction x + y = r > 0. This means that x and y are positioned on a line. The optimal values are x = y = r/2 and u(r/2, r/2) = r²/2, which is larger than the optimal unrestricted value. The Lagrange multiplier λ associated with the constraint x + y = r has the value λ = r (the shadow price). Since the unrestricted optimum is at the origin, the larger we choose r, that is, the more distant the line is from the origin, the more value is lost. This is exactly the statement of the shadow price.
Fact 66. Optimal dynamic investment allows distributing investment risk not only in the cross-section (single-period models) but also over time.
Despite the meaningfulness of multi-period models, most investment models used in practice are static ones. There are three main reasons. First, in the past technology was not able to solve dynamic problems in time, i.e. machines were not fast enough. Second, most asset managers are well educated in static models, but knowledge about dynamic models is sparse. Third, static models are already flawed by parameter uncertainty (estimation risk); the intertemporal set-up adds additional uncertainty.
Optimal dynamic investment is able to take into account changing future investment opportunities in an optimal way. Static models have no foresight to react today to what could happen in future periods. Changing investment opportunities are key for long-term investors such as pension funds.
Consider the case where you have to drive from New York to Boston. Using a repeated static model (forward induction), you decide at each crossroad, given the traffic situation, which direction to follow next. Using this strategy you will never arrive in Boston. Dynamic optimality means that you start with the end in mind: you work backwards starting in Boston. At each crossroad in the backward approach, you calculate whether it is best to turn left or right, knowing that all decisions which follow are optimal.
The origin of the GOP (growth optimal portfolio) is attributed to Kelly (1956), which also leads to the Kelly criterion in investment. Kelly was not interested in investment but wrote his work with gambling and information theory in mind. The Kelly strategy, i.e. a GOP, is an optimal strategy such that with probability one the strategy accumulates more wealth in the long run than any other strategy. The expression 'with probability one' is key: deleting it leads to wrong statements and decisions concerning the GOP.
To motivate the GOP, consider a binary gamble, see Rotando and Thorp (1993). Let W0 be initial wealth, Bk the bet at trial k, p the probability of winning and q the probability of losing the bet. Then,
E(Wn) = W0 + Σ_{k=1}^{n} (p − q) E(Bk) .
If the game's expectation is positive, p > q, then maximizing E(Wn) is the same as maximizing E(Bk) at each trial. Therefore, it is optimal to bet all resources in each trial - B1 = W0 is the starting bet. The ruin probability of such a strategy is 1 − pⁿ, i.e. bankruptcy occurs quickly almost surely. Conversely, minimizing the ruin probability also minimizes the expected return. The GOP is an intermediate strategy between these two over-aggressive and over-timid strategies.
Consider the strategy of investing a fixed fraction c of present wealth in the next bet, i.e. Bk = cW_{k−1}. If s is the number of successful bets and f the number of failures in n bets, then
Wn = W0 (1 + c)^s (1 − c)^f .
If 0 < c < 1, ruin is impossible. Using the compounding identity
Wn = W0 e^{n log (Wn/W0)^{1/n}} ,
the exponential growth rate per trial is
log (Wn/W0)^{1/n} = (s/n) log(1 + c) + (f/n) log(1 − c) .
Setting G(c) equal to the expected value of this growth rate, we get
G(c) = p log(1 + c) + q log(1 − c) .
3. The fastest time to reach a target wealth level starting from any level W is given
asymptotically by a strategy which maximizes expected log-wealth utility.
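For the binary gamble, G(c) = p log(1 + c) + q log(1 − c) is maximized at the Kelly fraction c∗ = p − q (set G′(c) = 0 and use p + q = 1). A short sketch with an assumed win probability confirms this:

import numpy as np

p, q = 0.6, 0.4  # assumed win/loss probabilities (illustrative), p > q

def G(c):
    # expected log-growth per trial when betting the fraction c of wealth
    return p * np.log(1 + c) + q * np.log(1 - c)

c_grid = np.linspace(0.0, 0.99, 9901)
c_num = c_grid[np.argmax(G(c_grid))]
print(c_num, p - q)   # numerical maximizer vs. Kelly fraction c* = p - q = 0.2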
Rotando and Thorp apply the GOP to S&P investing using data from 1926-1984. First, they calculate the probability of a return below the T-bill return. This probability decreases from 38% for n = 2 years to 21% after ten years and to 8% after 30 years. The optimal fixed fraction to invest is 117%, i.e. it is optimal to borrow 17% of existing wealth each year. This suggests that the GOP needs long-term investment horizons and that the optimal strategy is leveraged. Summarizing, a GOP has the theoretical advantage of the maximum rate of growth of wealth, but it turns out to be too risky in practice.
These results triggered many discussions about the usefulness of the GOP. A main critique was formulated by Samuelson in the 1960s. He states that if one is not willing to accept a single bet, then one rationally will never accept a sequence of such bets: if the ruin probability is not acceptable for the first year's investment given c∗, I will never accept 30 bets of this type. Samuelson rejects the implied non-transitivity of preferences. Thorp answered that the limit GOP respects transitivity.
risk-free rate r. Then, even with a Sharpe ratio of 0.5, it would take almost 30 years to beat the risk-free bond with 90% probability.
Summarizing, GOPs are too risky: no money manager can survive offering such a strategy if he is hit, say, twice in the first 5 years of his mandate. The impatience of investors rules out any long-term investment strategy which focuses on maximum return growth without controlling the possible finite-horizon shortfall risks. But controlling for shortfall risks in a mathematical way brings us back to a return-risk framework. A different approach is to mix the mathematics of the GOP with business experience by selecting for the GOP those stocks which are not expected to have shortfall risks. W. Buffett seems to apply an investment approach of this form.
One reason for the use of heuristics arises if one distinguishes between risk and uncertainty. According to Knight (1921), risk refers to situations of perfect knowledge about the probabilities of all outcomes for all alternatives. This makes it possible to calculate optimal choices. Uncertainty refers to situations in which the probability distributions are unknown or unknowable - that is to say, risk cannot be calculated at all. Situations of known risk are relatively rare. Savage (1954) argues that applying standard statistical theory to decisions in large, uncertain worlds would be utterly ridiculous because there is no way of knowing all the alternatives, consequences, and probabilities. Using optimal solutions in a world with uncertainty just adds non-controllable model risk. Understanding when people use statistical models in decision-making and when they prefer heuristics requires studying how the human brain functions, see Camerer et al. [2005] and Glimcher and Fehr [2013].
Ellsberg (1961) invented the following experiment to reveal the distinction between risk and uncertainty. An individual considers the draw of a ball from one of two urns:
• Urn A has 100 balls, 50 red and 50 black.
• Urn B has 100 balls, with an unknown mix of red and black.
First, the subjects are offered a choice between the following two bets:
• USD 1 if the ball drawn from urn A is red and nothing if it is black.
• USD 1 if the ball drawn from urn B is red and nothing if it is black.
Second, the same subjects are offered a choice between the following two bets:
• USD 1 if the ball drawn from urn A is black and nothing if it is red.
• USD 1 if the ball drawn from urn B is black and nothing if it is red.
In both cases, the first bet is generally preferred in experiments. That is, individuals believe in the first case that the number of red balls in urn B is less than 50%, and in the second case the same individuals assume that the number of black balls in urn B is also smaller than 50%. These probability assessments are inconsistent. Ellsberg's interpretation was that individuals are averse to the ambiguity regarding the odds for the ambiguous urn B. They therefore prefer to bet on events with known odds. Consequently, they rank bets on the unambiguous urn A higher than the risk-equivalent bets on B.
Caballero (2010) and Caballero and Krishnamurthy (2008) consider the behavior of investors in the following flight-to-quality episodes:
• 1970 - The default of Penn Central Railroad's prime-rated commercial paper caught the market by surprise.
• 1987 - The speed of the stock market's decline led investors to question their models.
They find that investors re-evaluated their models, behaved conservatively or even disengaged from risky activities. These reactions cannot be captured by an increased risk aversion towards macroeconomic phenomena. The reaction of investors in an uncertain environment is fundamentally different from that in a risky situation with a known environment.
In spring 2015, uncertainty about the future of Greece in the EU increased. Four different scenarios were considered:
• A: Status quo. Greece and the EU institutions agree on a new reform agenda such that Greece receives the remaining financial support of EUR 7.2 billion from the second bailout package.
• B: Default with subsequent agreement between the EU and Greece. There is no agreement under A; Greece fails to repay loans and there is a bank run in Greece. The ECB takes measures to protect the European banking sector.
• C: Default without a subsequent agreement.
• D: Grexit - that is, Greece leaves the eurozone. Greece stops all payments and the ECB abandons its emergency liquidity assistance. Similar conclusions hold for the Greek banking sector as under C. Greece needs to create a new currency since the country cannot print euros.
The evaluation of the four alternatives is related to uncertainty and not to risk: the probability of each scenario is not known, there are no historical data with which to estimate the probabilities, and the scenarios have dependencies, but of a fundamental cause-effect type which cannot be captured by the statistical correlation measure. This shows that valuable management judgment is tied to situations of uncertainty.
The use of 'uncertainty' and 'risk' does not follow clear standards and conventions in practice. A volatility index such as the VIX is sometimes called a measure of uncertainty: if volatility increases, one often states that uncertainty increases. Strictly speaking this makes no sense since the VIX is a calculated index of risk. Hence, risk increases or decreases, but this has a priori no relation to uncertainty. A similar logic is the investors' statement that if uncertainty increases, markets often become more volatile and equity markets fall (the negative leverage effect), or that if uncertainty increases, then credit spreads of corporates or governments should widen. Again, risk and uncertainty are used interchangeably. The year 2016 provides an example of why one should not mix risk and uncertainty. In 2016 many events happened where it was impossible to calculate risk - Brexit, the election of Trump, increasing geopolitical tensions in the Middle East, political instability in major countries such as Brazil and Turkey. There were, for example, no data to assess the risk of the Trump election. If large uncertainty meant large risk, then heavy market reactions should have followed. But most asset classes ended the year with positive returns; there was almost no market reaction to the events. Furthermore, plotting an uncertainty index such as policyuncertainty.com against credit spreads measured in USD shows that uncertainty increased in 2016 while the spreads fell.
The 60/40 portfolio turns out to be not diversified enough when markets are distressed or booming. The dot-com bubble and the financial crisis of 2008 revealed that different asset classes moved in the same direction and behaved as if they were all of the same type, although capital diversification was maintained: risk weights are not the same as dollar weights.
Deutsche Bank (2012) reports the following risk contributions, using volatility risk measurement, for 60/40 portfolios of the S&P 500 and US 10y government bonds. The long-term risk contribution by asset class, 1956 to 2012, was 79/21 percent - quite different from the 60/40 capital diversification. The risk contribution of US government bonds in extreme market periods varied between 53% in 1998 and 7% in 1973.
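Such risk contributions follow from the Euler decomposition σp = Σi wi (Cw)i / σp. A sketch with assumed stock/bond parameters (not the Deutsche Bank inputs) illustrates how a 60/40 dollar split can become roughly a 90/10 risk split:

import numpy as np

w = np.array([0.60, 0.40])        # 60/40 capital weights
vol = np.array([0.15, 0.05])      # assumed annual volatilities (illustrative)
rho = 0.2                         # assumed stock-bond correlation
C = np.outer(vol, vol) * np.array([[1, rho], [rho, 1]])

sigma_p = np.sqrt(w @ C @ w)
rc = w * (C @ w) / sigma_p        # Euler risk contributions, sum to sigma_p
print(rc / sigma_p)               # percentage risk contributions, approx. 92%/8%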
The left panel in Figure 4.2 illustrates the strong positive correlation between equities and a balanced portfolio: world-wide equity portfolios are compared to a balanced equity and bond portfolio. The linear relationship between the two returns with low variability indicates that a single global equity portfolio is as good as a balanced equity-bond portfolio. The performance and risk of traditional balanced portfolios are mostly driven by the equity quota. The R² is 95%, i.e. 95% of the risk is explained by equity risk. Hence, asset classes consist of a bundle of 'risk factors', where the same risk factors can belong to several asset classes. This extends to all assets in the case of systemic liquidity events: the monthly dollar returns between the classic asset classes and alternative classes show rather low correlations between 2000 and 2007, but the correlations increase sharply during the GFC and remain elevated as the sovereign debt crisis follows in 2011. This failure of alternatives to diversify during the GFC led to critique of the diversification concept based on asset classes per se, see Figure 4.2. In the middle panel, commodities and hedge funds are added to the balanced portfolio. While the variability increases, one still sees that equity risk factors are driving the returns; the allocation of risk is only slightly improved. Still 90% of the risk is explained by the equity risk factor. Finally, if one replaces equities by bonds in the right panel, a cloud-type scatter plot follows. This indicates that equity and not bond risk factors are the return drivers.
The time-varying correlation in Figure 2.18 shows that the correlation between stocks and bonds varies over time. Historically, periods of rising inflation and heightened sovereign risk have driven stock and bond correlations sharply positive.
Figure 4.2: Left panel: monthly return of world equities vs monthly return of a balanced portfolio (world equities: 50%, world bonds: 50%), Bloomberg: 12/1998-3/2013. Middle panel: monthly return of world equities vs monthly return of a balanced portfolio (world equities: 40%, world bonds: 40%, commodities: 10%, global hedge funds: 10%); commodities database: DJUBSTR, hedge funds database: HFRXG. Right panel: monthly return of world bonds vs monthly return of a balanced portfolio (world equities: 50%, world bonds: 50%), Bloomberg: 12/1998-3/2013, local data.
In contrast, correlations often turned negative when inflation and sovereign risk were at low levels.
Summarizing, for the 60/40 asset allocation based on asset classes, correlations between asset classes are time-varying, not risk-stable and difficult to forecast. Risk weights are not the same as dollar weights. Asset classes do not seem to be the right level for risk aggregation.
The maximization is done subject to the dynamic budget constraint for the wealth dynamics Wt. Wealth growth is driven by the price evolution of a single risky asset S, a risk-free asset B and the consumption rate at each date. The risky asset S follows a geometric Brownian motion with constant drift µ and volatility σ, and the risk-free asset grows at the rate r. Inserting this information provides us with the dynamic budget constraint
dW = (φµW + (1 − φ)rW − c) dt + σφW dB
with B the standard Brownian motion. The optimality principle of Bellman, starting in t0 for a period t0 + dt, reads:
V(t0, W0) = max_{c,φ} E( ∫_{t0}^{t0+dt} u(t, c, W) dt + V(t0 + dt, W0 + dW) ) . (4.12)
Hence, the value at t0 is equal to the sum of the optimal utility over the short time dt plus the value reached at t0 + dt, i.e. all decisions are optimal after t0 + dt. Expanding the future value in a Taylor series and using the dynamics of the assets transforms the above equation into a non-linear partial differential equation for the value function V. The solution of this equation implies the following optimal strategies:
V(W) = α∗ W^a , c∗ = W (aα∗)^{1/(a−1)} , φ∗ = (µ − r)/σ² · 1/(1 − a) , (4.13)
where α∗ is the explicit solution of an algebraic equation involving the preference and growth rate parameters. The optimal investment in the risky asset φ∗ is equal to the market price of risk (MPR) (µ − r)/σ² times the relative risk aversion coefficient 1/(1 − a). The MPR is itself proportional to the Sharpe ratio (which is also the solution of the Markowitz problem). This validates the claim that the Markowitz solution also holds in a dynamic context unless the investment opportunity set changes over time, see Section ??. Optimal consumption is proportional to the wealth level, which is reasonable. There are many extensions of the basic Merton model - many assets, adding income, allowing for a bequest motive, adding linear investment constraints. However, analytical tractability is lost in most extensions.
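A quick numerical illustration of the investment rule in (4.13), with assumed parameters:

# Merton's constant-mix rule (4.13): phi* = (mu - r) / (sigma^2 * (1 - a))
mu, r, sigma = 0.07, 0.02, 0.20   # assumed drift, risk-free rate, volatility
a = -1.0                          # power-utility exponent; relative risk aversion 1 - a = 2

phi_star = (mu - r) / (sigma**2 * (1 - a))
print(phi_star)                   # 0.625: invest 62.5% of wealth in the risky asset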
Assume that risk is needed to finance the goals. Goal-based investment (GBI) means finding a strategy φ(t) which maximizes the probability
P(WT ≥ G) (4.14)
of reaching the goal G with terminal wealth WT. To this objective function one adds the asset dynamics, the initial wealth level and additional constraints. Assume that there are N risky assets which are all coupled by a time-varying but deterministic covariance matrix C and where each asset has a time-varying expected return µ(t). There is a riskless asset with a time-varying deterministic short-term rate r(t). The asset dynamics defines the wealth dynamics dWt starting at W0. The optimal policy, using the Bellman principle, is derived by Browne (1999):
φ^S(t) = ( C⁻¹(t)Θ(t) / √(∫_t^T Θ′(s)Θ(s) ds) ) · ( φ(N⁻¹(z(t))) / z(t) ) · Wt (4.15)
with the discount factor D(t, T) = e^{−∫_t^T r(s) ds}, φ the density function of a standard normal distribution, N the associated cumulative distribution function, Θ = C⁻¹(µ − re) the market price of risk (MPR), e an N-dimensional unit vector, and z(t) = Wt/(G D(t, T)) the percentage of the discounted goal reached at time t.
• The investor or asset manager observes at each date t the wealth Wt and then chooses the investment for the next (infinitesimal) period according to the optimal formula. The problem can be discretized in order to obtain real investment periods.
• At each date the deterministic expected means and covariances enter. These functions can be determined by the CIO office or the advisory function using an SAA and TAA approach. Besides the actual values, the values for the remaining lifetime also matter. Changing these forecast values at time t therefore implies a reshaping of the optimal investment policy at this date. Given the simplicity of the optimal formula, the investment universe can be set up with a large number of different assets, ensuring diversification of wealth growth.
• Suppose that all assets lose value from the beginning for some time. If wealth has dropped enough in value, there is not enough time left for the wealth level to beat the goal. Then the investor has to borrow, to inject additional money or to reduce the size of the goal. Browne shows in an example that for T = 10y the wealth has to drop by more than 62% in the first year before borrowing becomes necessary. If there is only one month left, the investor must borrow unless wealth has already covered 88% of the distance to the investment goal.
The approach can be generalized to include income and consumption streams, beating a
benchmark portfolio and controlling for downside risk, see Browne (1999).
Definition 68. 1. If a portfolio offers a larger expected return than another portfolio for the same risk, then the latter portfolio is strictly dominated by the first one.
2. Portfolios that are not strictly dominated are called mean-variance efficient. The set of these portfolios forms the efficient frontier.
3. The portfolio φm at the point D is the global minimum variance (GMV) portfolio.
The lines 1, 2b and the line between D and B are efficient frontiers.
4.3. PORTFOLIO CONSTRUCTION EXAMPLES 311
Figure 4.3: Portfolio frontiers in the two-asset case. The portfolio opportunity set is a
hyperbola in the portfolio coordinates expected return and standard deviation.
1. There are N risky assets and no risk-free asset. The prices of all assets are exogenously given.
2. There is a single time period. Hence risks cannot be distributed over time but only in the cross-section.
5. Assets are infinitely divisible. Without this assumption, we have to rely on integer programming, which makes sense and which today is feasible.
8. The vectors e, µ are linearly independent. If they are dependent, then the optimization problem does not have a unique solution.
312 CHAPTER 4. PORTFOLIO CONSTRUCTION
9. All first and second moments of the random variables exist, i.e. the mean and covariance are well-defined.
We define the auxiliary variables a = ⟨µ, C⁻¹µ⟩, b = ⟨e, C⁻¹e⟩, c = ⟨e, C⁻¹µ⟩, ∆ = ab − c² and
A = ( a c ; c b ) .
Proposition 69. Consider N risky assets and the above assumptions. Then the Markowitz problem
min_{φ∈Rⁿ} (1/2) ⟨φ, Cφ⟩ (M) (4.16)
s.t. ⟨e, φ⟩ = 1 , ⟨µ, φ⟩ = r ,
has the solution φ_MV = r φ∗1 + φ∗2 with
( φ∗1 ; φ∗2 ) = A⁻¹ ( C⁻¹µ ; C⁻¹e ) . (4.18)
The portfolio weights are linear in the expected portfolio return r. Inserting φ_MV into the variance implies the optimal minimum portfolio variance σp²-hyperbola:
σp²(r) = ⟨φ_MV, Cφ_MV⟩ = (1/∆) ( r²b − 2rc + a ) . (4.19)
Diversification in the mean-variance model means that adding more assets causes the efficient frontier to widen: for the same risk, a higher expected return follows (see Figure 4.4).
The Markowitz model fails to be stable in the following sense. Consider a GMV portfolio with two assets; the optimal portfolio then only depends on the covariance and not on the returns. Suppose that both assets have a volatility of 20 percent and a full positive correlation of 1. Then the optimal weights are 50 percent in each asset. Suppose next that asset 1 has only 19.9% volatility, all other numbers unchanged. Then 100 percent is invested in this asset and zero in the second one.
Example
Consider three assets with expected returns (20%, 30%, 40%) and covariance matrix
C = ( 0.10 0.08 0.09 ; 0.08 0.15 0.07 ; 0.09 0.07 0.25 ) .
Figure 4.4: Different efficient frontiers for different numbers of assets. It follows that adding new assets allows for a higher expected return at a given risk level (measured by the portfolio standard deviation). The portfolio with the lowest standard deviation is the global minimum variance (GMV) portfolio (Ang [2012]).
We assume that the investor requires a minimum return of r = 30%. He could fully invest in asset 2 to achieve this return goal. But the optimization shows that he can reach this target with lower risk; the short numerical check below verifies this.
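A short numerical check of this example, solving the two-constraint problem (M) via its Lagrange conditions (the printed numbers follow directly from the stated inputs):

import numpy as np

mu = np.array([0.20, 0.30, 0.40])
C = np.array([[0.10, 0.08, 0.09],
              [0.08, 0.15, 0.07],
              [0.09, 0.07, 0.25]])
e = np.ones(3)
r = 0.30

Ci_mu, Ci_e = np.linalg.solve(C, mu), np.linalg.solve(C, e)
a, b, c = mu @ Ci_mu, e @ Ci_e, e @ Ci_mu
lam = (b * r - c) / (a * b - c**2)   # Lagrange multipliers of (M)
gam = (a - c * r) / (a * b - c**2)
phi = lam * Ci_mu + gam * Ci_e

print(phi.round(3))                     # approx. [0.283, 0.434, 0.283]
print(np.sqrt(phi @ C @ phi).round(3))  # approx. 0.328 < sqrt(0.15) = 0.387 for asset 2 alone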
Example
Consider two assets with expected returns (1, 0.9) and covariance matrix
C = ( 0.10 −0.10 ; −0.10 0.15 ) .
Asset 1 seems more attractive than asset 2: it has the higher expected return and the lower risk. Naively, one would invest fully in the first asset. But the negative correlation makes an investment in asset 2 necessary to obtain an optimal allocation. The expected return constraint is set equal to r = 0.96. We consider four strategies:
• φ1 = (1, 0), full investment in asset 1.
• φ2 = (1/2, 1/2), an equal distribution.
• φ3 = (5/9, 4/9), the optimal Markowitz strategy without the expected return constraint.
• φ∗MV = (0.6, 0.4), the optimal Markowitz solution with the expected return constraint.
The following expected portfolio returns and variances hold for the different strategies:
Strategy   µ       σP²
φ1         1       0.1
φ2         0.95    0.0125
φ3         0.955   0.011
φ∗MV       0.96    0.012
φ1 satisfies the expected return condition, but its risk is much larger than in all other strategies - a lack of diversification. The risk of φ3 is minimal, but its return is smaller than required. To generate the required return and keep risk minimal, 40 percent has to be invested in the seemingly unattractive asset. This is the Markowitz phenomenon: to reduce the variance as much as possible, a combination of negatively correlated assets should be chosen.
4.3. PORTFOLIO CONSTRUCTION EXAMPLES 315
Figure 4.5 shows that portfolios on the efficient frontier are under-diversified, and that this becomes more pronounced for higher risk levels. Furthermore, the steep vertical changes in the asset allocation indicate that the allocations are not robust: small changes in covariance data lead to large changes in the asset allocations. Does a Markowitz portfolio provide reasonable diversification for portfolios over time? The answer, see Figure 4.10, is again no: one observes under-diversification and non-stability of the asset allocation.
Figure 4.5: Efficient allocations for 21 different portfolios. The first portfolio is the GMV portfolio; moving to the right, optimal portfolios on the efficient frontier follow. Data 1991-2016, monthly data, long-only portfolio constraint.
Proposition 70. Any minimum variance portfolio can be written as a convex combina-
tion of two distinct minimum variance portfolios.
10 Sometimes called the 'two fund theorem' or 'separation theorem'.
Formally, if φ∗MV(r) is any optimal minimum variance portfolio, then for any two other distinct optimal minimum variance portfolios φ∗1(r), φ∗2(r) there exists a function ν(r) such that
φ∗MV(r) = ν φ∗1(r) + (1 − ν) φ∗2(r) . (4.20)
The entire mean-variance frontier can be generated from just two distinct portfolios. This holds since the efficient frontier is a one-dimensional affine subspace of Rⁿ. The Mutual Fund Theorem allows investors to generate an optimal portfolio by searching for cheaper or more liquid portfolios and investing in them in the prescribed way. This theorem led to the growth of the mutual fund and ETF industry. The Mutual Fund Theorem also holds for some dynamic models, such as the Merton model of the last section. But if there are risk sources for assets which cannot be hedged, then more than two funds are needed to construct an optimal investment strategy. In general, the structure of the investor's preferences and the structure of the asset markets both determine whether a mutual fund theorem is valid.
With a risk-free asset, the efficient frontier becomes a straight line which has at least one point in common with the risky-asset efficient frontier - the case where the portfolio is fully invested in risky assets. The portfolio where the two frontiers intersect is the tangency portfolio T (see Figure 4.6, left panel). Natural candidates for the mutual fund theorem are the tangency portfolio and the risk-less-asset investment. In the right panel of Figure 4.6, different portfolios on the efficient frontier are shown. Investors can add cash to become more conservative or borrow cash for an aggressive investment. The portfolios on the Capital Market Line (CML) depend on the investor's preference θ in (4.1): the higher the risk aversion, the closer the point on the CML is to the risk-free investment. Ang (2012) estimates an aggregate risk aversion parameter value as follows. He calculates the optimal minimum variance portfolio using USA, JPN, GBR, DEU, and FRA risky assets only. Then he adds a risk-free asset and searches for the point on the CML that delivers the highest utility. This point implies a risk aversion of θ = 3. The optimal portfolio with a risk-free asset lies in the region of Figure 4.6 where the aggressive investor is shown: the investor is long all risky assets and short the risk-free asset. But in reality, only half of investors put their money in the stock market; the remainder keep their money risk-free. In some European countries stock market participation is lower than 10 percent. This is the non-participation puzzle of mean-variance investing.
µp = Rf + ((µT − Rf)/σT) σp
Figure 4.6: Mean-variance model with a risk-free asset. Left panel - straight line ecient
frontier (CML), which is tangential to the ecient frontier when there are risky assets
only. The tangency point T is the tangency portfolio where investment in the risk-free
asset is zero. Right panel - investors' preferences on the ecient frontier. Moving from
the tangency portfolio to the right, the investor starts borrowing money to invest in the
risky assets. The investor is short cash in this region to nance the borrowing amount.
with µT, σT the expected return and standard deviation of the tangency portfolio, respectively. The slope of the CML is the Sharpe ratio SR = (µT − Rf)/σT. This is the price of one unit of risk for an efficient portfolio.
To gain some idea about stress periods, Table 4.1 reports data about periods when the Swiss stock market was under stress. Besides the maximum drawdown, the periods during which prices were falling and rebounding are shown. The last two periods represent the global financial crisis and the dot-com bubble, respectively. On average it takes longer for markets to recover than to drop; a second observation is the severity of the maximum drawdowns. This illustrates that large drawdowns also matter in optimal portfolio choice.
Table 4.1: Periods involving large drawdowns in Swiss equity markets. The drawdown is the measurement of the decline from a historical peak. The maximum drawdown (MDD) up to time T is the maximum of the drawdown over the overall time period considered; yfp means years with falling prices, yrp years with rising prices, and Av. average (Kunz [2014]).
where µ is the portfolio return, σ the volatility of the portfolio return, and k(a) a tabulated function of the confidence level 1 − a. Hence, under normality, VaR is proportional to volatility. This translates into the optimization problem: mean-variance is equivalent to mean-VaR after rescaling the volatility.
Figure 4.7 shows an efficient frontier and several VaR constraints, i.e. the problem is to maximize the expected return under the constraint
P(R ≤ x) ≤ a . (4.22)
These VaR(a) constraints define straight lines under normality. The impact on the optimal portfolio choice is as follows. One starts with a benchmark loss capacity of, say, x = −3%: the probability a of the loss is given, while x is the corresponding dollar VaR amount.
This loss capacity defines the straight blue line. The intersection between this line and the mean-variance frontier selects the optimal mean-VaR portfolio. If the loss capacity increases, the line moves parallel to the right, implying higher possible optimal risks and returns. The same effect follows if, for a fixed loss capacity, the confidence level is lowered - more risk and return becomes optimal.
Figure 4.7: Mean-drawdown optimal portfolio. The straight lines represent the VaR constraints. If the loss capacity increases, the VaR lines move to the right, indicating higher risk and returns in the optimal portfolio. The same result follows if the confidence level is lowered.
We conclude with two VaR calculations. Consider a position with value USD 1 million. Assuming normality of returns, the goal is to calculate the one-day VaR at the 95 percent level. The estimated daily mean is 0.3 percent and the volatility is 3 percent. With k(5%) = 1.6449 it follows that
VaR = (0.003 + 1.6449 · 0.03) · 1,000,000 ≈ USD 52,347 .
Therefore, on average in 1 out of 20 days the loss is larger than the calculated VaR of USD 52,347.
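The same arithmetic as a short script (note that the text's convention adds the daily mean to the quantile term; the usual convention VaR = (kσ − µ)·V would subtract it):

# One-day 95% VaR for a USD 1 million position under normal returns
value, mu_d, sigma_d, k = 1_000_000, 0.003, 0.03, 1.6449

var_95 = (mu_d + k * sigma_d) * value
print(round(var_95))   # about USD 52,347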
To be more realistic for AM, we calculate the VaR for a EUR investor with the following portfolio: there are three equity risk sources (DAX, DJ, Novartis), two FX risks, USDEUR (spot 1.05) and CHFEUR (spot 0.8), and US interest rate risk for the bond, i.e. 6 risk factors. The goal is to calculate the weekly EUR VaR at the 95% level.
We first need the variance and covariance information, then the calculation of the exposure in EUR and the allocation of the EUR exposure to the risk factors using the market data in the following table.
Table 4.4: EUR exposure and allocation of the exposure to the risk factors.
The portfolio variance σp² is given by σp² = ⟨X, CX⟩, where X is the EUR exposure vector allocated to the risk factors and Cij = σiσjρij. Calculating these matrix products gives σp² = 16,008,040,032 on an annual basis; to obtain the result on a weekly basis, the variance is divided by 52. The critical value at the 95% level is k95% = 1.644853. This implies the 1-week EUR VaR using
VaRα = kα √T σp = kα √T √(X′CX) ,
where the drift is set to zero. Differentiating with respect to the exposures,
∂VaR/∂X = kα √T · CX/√(X′CX) = kα² T · CX/VaR .
Since the VaR is homogeneous of degree one in X, Euler's theorem yields the decomposition
VaR = Σj Xj ∂VaR/∂Xj = kα √T Σj Xj (CX)j/√(X′CX) = Σj VaRj . (4.23)
Applying this to the portfolio, the contribution of the US Treasury bond is negative: due to its negative correlations with the other factors, it reduces the VaR by 6 percent. The largest VaR contribution comes from the DAX risk factor with 31 percent, although its exposure is only 10.5 percent. The contribution of USDEUR to the VaR is 19 percent, whereas its factor exposure of 36 percent is the largest one.
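The decomposition (4.23) in code; the exposures and covariances below are assumed for illustration and are not the Table 4.4 data:

import numpy as np

# Assumed illustrative inputs: EUR exposures X and annual factor covariance C
X = np.array([100_000.0, 50_000.0, 80_000.0])
C = np.array([[0.040,  0.010, -0.004],
              [0.010,  0.090,  0.002],
              [-0.004, 0.002,  0.003]])
k, T = 1.644853, 1.0 / 52.0            # 95% quantile, one-week horizon

var_total = k * np.sqrt(T) * np.sqrt(X @ C @ X)
var_j = k * np.sqrt(T) * X * (C @ X) / np.sqrt(X @ C @ X)  # component VaRs (4.23)

# the component VaRs sum to the total; negatively correlated factors get negative shares
print(round(var_total), var_j.round(0), np.isclose(var_j.sum(), var_total))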
φ = φGM V + φX .
To introduce the SAA, we use the unconditional long-term (equilibrium) mean of the returns. Adding and subtracting the long-term mean µ̃ in the second component, the solution can be written after some algebra in the form:13
φ = φGMV + φS + φT . (4.24)
The second and the third components are the SAA and the TAA components, respectively. The sum of the three components is an efficient portfolio.
Each SAA component φj,S is proportional to µ̃j − µ̃k for k ≠ j. If the long-term forecasts of all assets are the same, the SAA component is zero. If the long-term forecasts differ, the holdings are shifted to the asset with the higher equilibrium return. The size of the pairwise bets depends on the relative risk aversion θ and the covariance C, which enter
12 We have φX = (1/θ) C⁻¹ ( µ − (⟨e, C⁻¹µ⟩/⟨e, C⁻¹e⟩) e ) .
13 φS = (1/θ) C⁻¹(µ̃e′ − eµ̃′)C⁻¹e / ⟨e, C⁻¹e⟩ and φT = (1/θ) C⁻¹((µ − µ̃)e′ − e(µ − µ̃)′)C⁻¹e / ⟨e, C⁻¹e⟩ .
φS. The sum of the GMV and the strategic portfolio is called the benchmark portfolio in the asset management industry and the strategic mix portfolio in investment theory.
Each TAA component φj,T is proportional to the deviations (µj − µ̃j) − (µk − µ̃k) for k ≠ j. Hence, there are again pairwise bets between the assets, there are no bets against the same asset, and the bets are of an excess return type with the SAA as benchmark. For N assets, there are N(N − 1)/2 bets. As in the SAA case, the bets are weighted by the covariance matrix and the relative risk aversion.
Proposition 71. Consider the active risk and return optimization in (4.25) with the full investment constraint. The efficient frontier consists of straight lines in the (σ(ψ, b), µ(ψ, b))-space. Inserting further linear constraints, the efficient frontier becomes a non-degenerate hyperbola.
• Mean-variance (MV), equal weights (EW), global minimum variance (GMV) and equal risk contribution (ERC) are four of the strategies.
• Risk parity (RP). The optimal portfolio weights are chosen proportional to inverse volatility. This approach mimics the negative leverage effect in the markets - if asset prices fall, volatility rises. This strategy ignores the correlation structure.
Table 4.5: Risk and return figures for the different investment strategies (Ang [2012] and own calculations).
The mean-variance portfolio is the strategy with the worst performance: choosing market weights, diversity weights, or EW leads to higher returns and lower risk. A reason for the outperformance of the GMV portfolio is the tendency of low-volatility assets to have higher returns than high-volatility assets.
The Markowitz model is the most widely used model in portfolio allocation. There are two main reasons for this. First, its economic assumption about the risk and return trade-off is simple and convincing. Second, it defines a quadratic optimization problem (QP): ⟨µ, φ⟩ − (θ/2)⟨φ, Cφ⟩ is maximized under a set of linear constraints Aφ ≤ b, with A a matrix and b a vector. In its simplest form, the QP problem is even analytically solvable. Adding more constraints, the problem is approached numerically, where decades of research in this direction provide efficient algorithms. Summarizing, portfolio optimization with a benchmark, the tracking-error problem, the Black-Litterman problem with views, index sampling, turnover constraints and the case of linear and quadratic transaction costs are all QPs! This specific mathematical form is therefore a success factor for mean-variance portfolio allocation.
Having explained why mean-variance analysis is successful, we consider its general properties:
• Portfolio theory in general and the mean-variance approach in particular are assumed to be related to diversification. But what does this really mean?
We start with the diversification issue and recall that the optimal investment in the basic Markowitz model is proportional to C⁻¹µ: it is the information matrix, not the covariance matrix, which mixes the expected returns into the optimal allocation. But what can be said about the information matrix? Stevens (1998) derives an expression for the information matrix in the Markowitz model. Using general matrix inversion, the OLS regression of Rt,i on the returns of all other assets Rt,−i plus a noise term, normally distributed with mean zero and variance σi², reads:
Rt,i = αi + ⟨βi, Rt,−i⟩ + εt,i .
Proposition 73 (Bourgeron et al. (2018)). Consider the standard Markowitz model (4.1). Then
φ∗i = φ∗i,0 + ω(φ∗i,0 − φ∗i,h) ,
where φ∗i,0 is the optimal portfolio under the assumption of zero correlations, φ∗i,h is the optimal portfolio of the hedging strategies, and ω is the leverage, defined as the ratio between the idiosyncratic variance and the tracking error variance.
If the tracking error is small, a large leverage follows. This characterization shows that MV diversification means leveraging a hedge portfolio: the MV optimal portfolio is an aggressive portfolio which selects a few bets!
To control these incentives, one uses constraints. The simplest one is full investment, Σi φi = 1. Solving (4.1) with such a constraint amounts to considering a Lagrangian function and then calculating the first-order condition (FOC). Further constraints can be added. Real optimizers in asset and wealth management can consider up to hundreds of constraints. This destroys analytical tractability and in some sense leads optimization ad absurdum: if you know what you want by imposing many constraints, why don't you simply state the investment policy? Furthermore, each constraint has an economic price, the shadow price, which reduces unconstrained utility. It should be made transparent to the investor what the economic price of his own constraints is, such as favoring a particular stock, and what the price of the constraints induced by the AM firm is, such as the bandwidths of the SAA and TAA. From an efficient frontier perspective, adding constraints transforms the hyperbola into piecewise straight lines and piecewise hyperbolas. Globally, the constrained frontier lies below the efficient frontier and is shifted towards more risk.
We now consider the second challenge: how to complement the Markowitz optimization such that, for example, solutions are smooth, i.e. varying the inputs slightly leads to smooth changes of the allocation, rebalancing is smooth, and turnover costs are controlled. Reconsidering the optimal portfolio φ = (1/θ)C⁻¹µ, the two inputs C and µ need to be estimated. Estimation error in the covariance matrix is amplified in the information matrix. Since covariance matrices are large, say 1000 × 1000, and µ is also an estimate, the solution of the optimality condition is only an approximation to the unique solution that would obtain if the inputs were known. We discuss this issue below. The returns also need to be estimated, which is no easier than for the covariance matrix - myriads of factors and factor models all try to explain or predict returns.
The main strategy is to add an additional term to the quadratic utility function (Tikhonov regularization):
⟨µ, φ⟩ − (θ/2)⟨φ, Cφ⟩ − c ||Γφ − φ0||₂²
with c > 0, Γ a matrix, φ0 an initial portfolio and ||·||₂ the Euclidean norm. The parameter c controls the importance of the regularization term. Such terms are commonly added to promote sparsity or to reduce sensitivity to outliers. There are many different ways in which regularizations can be implemented.
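For Γ = I (ridge), the penalized objective has the closed-form first-order condition (θC + 2cI)φ = µ + 2cφ0, which the following sketch implements with assumed inputs:

import numpy as np

# Assumed illustrative inputs
mu = np.array([0.05, 0.08, 0.06])
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.06]])
theta, c = 3.0, 0.5
phi0 = np.ones(3) / 3                  # initial (target) portfolio

# FOC of mu'phi - theta/2 phi'C phi - c ||phi - phi0||_2^2 with Gamma = I
phi_ridge = np.linalg.solve(theta * C + 2 * c * np.eye(3), mu + 2 * c * phi0)

# c -> 0 recovers the unregularized optimum; a large c pins phi to phi0
print(phi_ridge)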
The de-noising techniques for the covariance matrix are not sufficient to obtain stability of the solution. Figure 4.9 provides the intuition. Each covariance matrix can be diagonalized, where the eigenvalues are all positive real numbers in the diagonal matrix. Ordering the eigenvalues by size, we show below that the largest eigenvalues of C account for portfolio risk. The smallest eigenvalues, however, matter for the information matrix C⁻¹, which is the proportionality constant of the optimal investment rule: the noisy eigenvalues drive optimal investment. Regularization techniques handle this small-eigenvalue problem. But this is not sufficient to obtain a meaningful optimal asset allocation since, as stated above, the Markowitz model makes bets on the long-short portfolio of expected return minus the beta hedge. These factors are distributed over the whole range of eigenvalues. Therefore, considering the largest eigenvalues and treating the smallest ones using regularization leaves out all intermediate eigenvalues, which also impact the stability and smoothness of the optimal allocation. Hence, more than de-noising of the covariance matrix is needed.
Figure 4.8: Ridge solution for a portfolio with and without a target. The figures show how the ridge solution provides smooth portfolio components (denoted by xj) as a function of the control parameter c. (Source: Roncalli (2018))
• Compare two constrained models. Is one allocation better than the other because of a better model or because of the chosen constraints? Constraints are ad hoc, discretionary decisions that impact a model's performance in a complicated way.
Figure 4.9: Eigenvalues of a covariance matrix C, ordered by size (schematic).
$\phi_1 = 10^{100}$ and $\phi_2 = 1 - 10^{100}$ are two solutions which we do not like. Adding the penalty $\|\phi\|_1 := |\phi_1| + |\phi_2|$, an optimal solution is the sparse solution $\phi_1 = 1,\ \phi_2 = 0$. For different choices of $c$ and $\Gamma$, different well-known regularization approaches follow. If $\Gamma$ is set equal to the identity matrix, ridge regularization follows. Next, denote by $\hat C$ the unbiased empirical covariance matrix and by $F$ an estimator which is biased but converges more quickly than the empirical covariance; set $\hat C(\nu) := \nu \hat C + (1 - \nu)F$ and let $\nu^*$ be the minimizer of $E(\|\hat C(\nu) - C\|^2)$. If we set $c = \frac{1-\nu^*}{\nu^*}$ and $\Gamma$ equal to the Cholesky decomposition of $F$, the Ledoit-Wolf covariance shrinkage method follows; see Section 4.4.2 for a discussion. We discuss regularization in different sections below.
For further reading, in addition to BL, we cite Walters (2014), Satchell and Scowcroft (2000), Brand (2010), Meucci (2010), Idzorek (2006), Herold (2003), and He and Litterman (1999).
• The equilibrium market portfolio serves as a starting prior for the estimation of
asset returns.
For non-linear views, consider entropy pooling. In the construction of BL, the first step is to define the reference model. Assume that the returns satisfy $R \sim N(\mu, C)$, where both mean and covariance are unknown. Since the goal of BL is to model expected returns, we start with a model for the mean: $\mu \sim N(\pi, C_\pi)$. Hence, $\mu = \pi + \varepsilon$ with $\varepsilon \sim N(0, C_\pi)$. The covariance $C_R$ of the returns about the estimate $\pi$ is - given that the return noise and $\varepsilon$ are not correlated - given by
$$C_R = C + C_\pi\,. \qquad (4.26)$$
Therefore, the reference BL model is $R \sim N(\pi, C_R)$. The mean $\pi$ represents the best guess for $\mu$, and the covariance $C_\pi$ measures the uncertainty of the guess. How do we fix $\pi$, the prior estimate of returns, that is to say, the returns before we consider views?
Using the CAPM means that all investors have a mean-variance utility function. Without any investment constraints, the optimal strategy $\phi$ maximizes the expected utility given in (4.1),
$$E(u) = \phi'\pi - \frac{\theta}{2}\,\phi' C\phi\,,$$
where we have replaced the expected returns by the unknown expected return estimate $\pi$. The solution gives us the optimal strategy $\phi$ as a function of the return and covariance: $\phi = \frac{1}{\theta}C^{-1}\pi$.
Given the equilibrium strategy $\phi$ in the CAPM, we immediately get the excess return estimate
$$\pi = \theta C\phi\,. \qquad (4.27)$$
How do we fix the risk aversion parameter? Multiplying (4.27) by the market portfolio $\phi$ implies
$$R_M - R_f = \theta\,\sigma_M^2 \qquad (4.28)$$
with $R_M$ the total return of the market portfolio. In other words, the risk aversion parameter is equal to the market price of risk. Using (4.28) in (4.27), the CAPM specifies in equilibrium the prior estimate of returns $\pi$.
We consider next the insertion of views, where we follow Walters (2014). A view is a statement on the market. Views can exist in an absolute or a relative form. A portfolio manager can, for example, believe that the fifth asset class will outperform the fourth one. BL assumes that views
• are fully invested (the sum of weights is zero for relative views or one for absolute views), and
• are mutually independent and uncorrelated.
More precisely, an investor with $k$ views on $N$ assets uses the following matrices:
• The $k \times N$ matrix $P$, where each row contains the asset weights of one view.
• The $k \times 1$ vector $Q$ of the returns for each view. That is, $P\pi = Q$ expresses the views.
• The $k \times k$ diagonal matrix $\Omega$ of the covariance of the views, with $\omega_{nn}$ the matrix entries. The matrix is diagonal as the views are required to be independent and uncorrelated. The entries $1/\omega_{nn}$ of the inverse matrix are known as the confidence in the investor's views.
The conditional distribution of the mean and variance can be represented in the view space as
$$P(\text{View}\,|\,\text{Prior}) \sim N(Q, \Omega)\,.$$
Since the matrix $P$ is in general not invertible, this expression cannot be written in a useful way in the asset space. But using Bayes' theorem, a posterior distribution of the returns that blends the above prior and conditional distributions follows. Since the asset returns and views are normally distributed, the posterior is also normally distributed. It is given by the Black-Litterman master formula for the mean returns $\pi_{BL}$ and the covariance $C_{BL}$:
$$\pi_{BL} = \left(C_\pi^{-1} + P'\Omega^{-1}P\right)^{-1}\left(C_\pi^{-1}\pi + P'\Omega^{-1}Q\right) \qquad (4.29)$$
$$C_{BL} = C + \bar C_\pi\,, \qquad \bar C_\pi = \left(C_\pi^{-1} + P'\Omega^{-1}P\right)^{-1}\,,$$
where $\bar C_\pi$ is the posterior uncertainty about the mean.
The parameters $\Omega$ and $C$ are not observable and must be fixed additionally. $C$ is typically replaced by the estimated covariance matrix $\hat C$. There are several ways of specifying $\Omega$: one can assume that the variance of the views is proportional to the variance of the asset returns, one can use a confidence interval, or one can use the variance of residuals if a factor model is used. We refer to Walters (2014) for details. How do we estimate the variance of the mean $\pi$ - that is, how do we fix $C_\pi$? BL assume the proportionality
$$C_\pi = \tau C \qquad (4.30)$$
with $\tau$ the proportionality constant. The uncertainty level $\tau$ can be chosen proportional to the inverse investment period $1/T$. The longer the investment horizon, the less uncertainty exists about the market mean; the higher the value of $\tau$, the less weight is attached to the CAPM prior. Summarizing, the prior return distribution is a normally distributed random variable with the mean given in (4.27) and variance $(1+\tau)C$. With these choices, the Black-Litterman master formula for the mean returns and the covariance follows from (4.29) by replacing $C_\pi$ with $\tau C$.
Example
Consider four assets and two views. The investor believes that asset 1 will outperform asset 3 by 2 percent with confidence $\omega_{11}$, and that asset 2 will return 3 percent with confidence $\omega_{22}$. The investor has no other views. Mapping these views into the above-defined matrices implies
$$P = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad Q = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad \Omega = \begin{pmatrix} \omega_{11} & 0 \\ 0 & \omega_{22} \end{pmatrix}. \qquad (4.32)$$
• Use reverse optimization to compute the CAPM equilibrium returns for the assets.
• Blend the CAPM equilibrium returns with the views using the Black-Litterman model; a numerical sketch follows below.
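A minimal numerical sketch of these two steps for the four-asset example follows; the covariance matrix, market weights, risk aversion $\theta$, uncertainty $\tau$, and view confidences are illustrative assumptions, not values from the text.

```python
import numpy as np

# Black-Litterman sketch for the 4-asset, 2-view example (illustrative inputs).
theta, tau = 2.5, 0.05
C = np.array([[0.040, 0.012, 0.020, 0.010],
              [0.012, 0.030, 0.015, 0.008],
              [0.020, 0.015, 0.035, 0.012],
              [0.010, 0.008, 0.012, 0.025]])
phi_mkt = np.array([0.40, 0.25, 0.20, 0.15])   # assumed market weights

# Step 1 - reverse optimization (4.27): equilibrium prior returns.
pi = theta * C @ phi_mkt

# Step 2 - views (4.32): asset 1 beats asset 3 by 2%, asset 2 returns 3%.
P = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0,  0.0, 0.0]])
Q = np.array([0.02, 0.03])
Omega = np.diag([0.001, 0.001])                # view confidences (assumed)

# Master formula (4.29) with C_pi = tau * C, see (4.30).
C_pi_inv = np.linalg.inv(tau * C)
Omega_inv = np.linalg.inv(Omega)
A = np.linalg.inv(C_pi_inv + P.T @ Omega_inv @ P)   # posterior uncertainty
pi_BL = A @ (C_pi_inv @ pi + P.T @ Omega_inv @ Q)
C_BL = C + A

print(np.round(pi, 4), np.round(pi_BL, 4))
```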
These steps only define one part of the investment process of a CIO. In general, the CIO receives information from different sources in the investment process: a macroeconomic view from research analysts, market information, chartist information, and valuation information. Assume that one output of this information is to 'overweight Swiss stocks - underweight European stocks'.
This defines a pair-wise bet. All bets of this type form the tactical asset allocation (TAA). Several questions follow:
A How strong is the bet - that is to say, how much should the two stock positions deviate from the actual level 'overweight Swiss stocks - underweight European stocks'?
B Is the bet consistent with the other bets and exposures of the TAA?
C What is the time horizon of the bet?
D How confident is the CIO and his or her team about the bet?
E Is the bet implementable, and what is the precision of such an implementation measured by the tracking error?
F Will there be a stop-loss or profit-taking mechanism once the bet has been implemented?
The approach to question A is often based on the output of a formal model. That is to say, a risk budgeting model, a BL model, or a mean-variance optimization model proposes to increase Swiss stocks by 5 percent and to reduce the European stock exposure by 5 percent. It is then common practice that this proposal is corrected by the CIO, either because it creates too much turnover for the portfolio managers or because he or she considers such a change to be too strong.
Question B is - among other things - a consistency question since, on the one hand, the $\pm 5$ percent change in equities also changes the FX exposure of the whole TAA and, on the other hand, there could be a CHF-EUR bet following from the many information sources. Typically - question C - bets are made for one month. This is the standard time after which the CIO and his or her team review the TAA.
Question D is the information risk issue. Information risk is different from statistical risk. The most well-known statistical risk measure in the industry is the tracking error, which measures the volatility of alpha over a period of time. The risk sources are market, counterparty, and liquidity risk of the assets. Bernstein (1999) defines information risk as the quality of the information advantage of a decision-maker under uncertainty.
Reconsider the above Swiss stock-European stock bet. This view must be driven by our information set, as well as by the proprietary process of analyzing the information and data. To evaluate information risks, Lee and Lam (2001) propose a set of questions. These questions suggest that some information risk may be quantified with a good deal of precision, while in most cases precise measurement of information risks seems impossible, and well-informed judgement may be necessary. This may result in a final statement on the decision-maker's confidence of adding alpha. If, say, the confidence is 50 percent, we are not confident at all about the bet. A standard approach to measuring the performance of bets is the hit rate (HR).
A hit rate of 60 percent means that we add alpha in 60 percent of the months in which we make an active bet. The confidence in adding alpha can be interpreted as the expected value of the hit rate. Information risk is then quantified by the expected hit rates of our investment views.
Example
We follow Lee and Lam (2001). They assume that alpha is normally distributed around its mean value. Then there is a unique one-to-one mapping between the hit rate HR and the information ratio IR. To derive this relation, the alpha of an asset follows a normal distribution, $\alpha_i \sim N(\bar\alpha, TE^2)$, with $\bar\alpha$ the arithmetic average alpha and $TE$ the tracking error. The hit rate is the probability of a positive alpha. Changing variables with $y = \frac{\alpha_i - \bar\alpha}{TE}$:
$$HR = \frac{1}{\sqrt{2\pi}}\int_{-\frac{\bar\alpha}{TE}}^{\infty} e^{-\frac{1}{2}y^2}\, dy\,.$$
Defining the information ratio $IR = \frac{\bar\alpha}{TE}$, we get
$$HR = \int_{-IR}^{\infty} f(y)\, dy = 1 - \Phi(-IR)\,, \qquad (4.33)$$
with $f$ the standard normal density function, $\Phi$ the standard normal distribution function, and $IR$ the information ratio. Once the expected alpha and expected tracking error, and therefore the expected information ratio, are stated, the complete ex ante distribution of alpha is specified. The hit rate is the area to the right of 0% alpha. Using the square-root law, the information risks, confidence levels, and information ratios in Table 4.6 follow; a code sketch of (4.33) is given after the table.
Table 4.6: Information risks, confidence levels, and information ratios (Lee and Lam [2001]).
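A one-line implementation of (4.33) is sketched below, assuming scipy is available; the IR values are illustrative, and the square-root law is used to scale an annual IR to a monthly one.

```python
from scipy.stats import norm

def hit_rate(ir: float) -> float:
    """Hit rate implied by an information ratio, eq. (4.33)."""
    return 1.0 - norm.cdf(-ir)          # equals norm.cdf(ir)

for ir_annual in (0.25, 0.50, 1.00):    # illustrative annualized IRs
    ir_monthly = ir_annual / 12 ** 0.5  # square-root law
    print(ir_annual, round(hit_rate(ir_monthly), 3))
```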
The first property derives from the mentioned problems with quadratic optimization. The second one reflects the difficulty of forecasting expected returns. Although only risk is explicit, returns are implicit, and the approach therefore does not a priori lead to very conservative portfolios.
• Define and solve the risk-budgeting problem. This implies the investment strategy.
1. The risk of the sum of two portfolios is smaller than the sum of their risks.
2. The risk of a leveraged portfolio is equal to the leveraged risk of the original portfolio.
3. Adding a cash amount to a portfolio reduces the risk of the portfolio by the cash amount.
Let $B_1$ and $B_2$ be two risk budgets in USD. For a strategy $\phi = (\phi_1, \phi_2)$, the risk budgeting problem is defined by the two constraints which equate the two risk contributions $RC_1$ and $RC_2$ to the risk budgets - that is to say, the strategy is chosen such that the following equations hold:
$$RC_k(\phi) = B_k\,, \quad k = 1, 2\,. \qquad (4.34)$$
Summing the left-hand sides of (4.34) is, by the Euler principle, equal to total portfolio risk. The sum on the right-hand side is the total risk budget. Problem (4.34) is often recast in a relative form. If $b_k = cB_k$ is the percentage of the sum of total risk budgets, (4.34) reads
$$RC_k(\phi) = b_k\, R(\phi)\,, \quad \sum_k b_k = 1\,, \qquad (4.35)$$
with $R(\phi)$ the total portfolio risk. The goal is to find the strategies $\phi$ which solve (4.34) or (4.35). This is in general a complex numerical problem. But introducing the beta $\beta_k$ of asset $k$, the solution can be written as
$$\phi_k = \frac{b_k\, \beta_k^{-1}}{\sum_j b_j\, \beta_j^{-1}}\,. \qquad (4.36)$$
The weight allocated to component $k$ is thus inversely proportional to the beta. This equation is only implicit since the beta depends on the portfolio $\phi$. The next theorem summarizes some explicitly solvable cases.
Theorem 74. Consider the risk budgeting program (4.35) for $N$ assets with the volatility risk measure.
1. If all assets are perfectly correlated, the RB portfolio is
$$\phi_k = \frac{b_k\, \sigma_k^{-1}}{\sum_j b_j\, \sigma_j^{-1}}\,. \qquad (4.38)$$
3. If correlation is minimal among all assets, i.e. $\rho = -\frac{1}{N-1}$, the ERC portfolio follows:
$$\phi_k = \frac{\sigma_k^{-1}}{\sum_j \sigma_j^{-1}}\,. \qquad (4.39)$$
Case 1 implies, for example, that the higher the volatility of a component, the lower is its weight in the RB portfolio. For the equal risk contribution (ERC) model, where all risk budgets $b_k$ are set equal to $1/N$, Maillard et al. (2008) show that the volatility of the ERC portfolio is located between the volatility of the minimum variance portfolio (MVP) and the volatility of the equally capital-weighted (EW) portfolio:
$$\sigma_{MVP} \leq \sigma_{ERC} \leq \sigma_{EW}\,.$$
The ERC portfolio is equal to the MV portfolio if (i) the correlation is constant and (ii) the correlation value attains its lowest possible value. The ERC is equal to the EW portfolio if all volatilities are identical.
Definition 75. The ERC approach is also called the risk parity (RP) approach.
Although closed-form analytical solutions for risk budgeting problems are possible only in some particular cases, there is a simplified heuristic allocation mechanism, inspired by the allocation (4.36):
$$\phi_k = L \times \frac{\mathrm{Risk}_k^{-m}}{\sum_j \mathrm{Risk}_j^{-m}} \qquad (4.43)$$
with Risk any risk measure, $L$ the portfolio leverage, which is needed if one defines ex ante a risk level for the portfolio (risk-targeting approach), and $m$ a positive number. If $m = 0$, the portfolio is equally weighted. For increasing $m$, the portfolio allocation becomes more and more concentrated on the assets with the lowest individual risk. For example, the GMV portfolio follows if all correlations are set equal to zero and $m = 2$, and the ERC portfolio follows if all correlations are constant and $m = 1$.
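The heuristic (4.43) is a few lines of code; the volatilities below are illustrative assumptions.

```python
import numpy as np

def heuristic_weights(risk, m=1.0, leverage=1.0):
    """Heuristic risk-based weights (4.43): w_k proportional to Risk_k^(-m)."""
    w = np.asarray(risk, dtype=float) ** (-m)
    return leverage * w / w.sum()

vols = [0.02, 0.12, 0.08]              # illustrative asset volatilities
print(heuristic_weights(vols, m=0))    # equal weights
print(heuristic_weights(vols, m=1))    # inverse-vol (ERC under constant rho)
print(heuristic_weights(vols, m=2))    # GMV under zero correlation
```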
Teiletche (2014) illustrates some properties of the above four portfolios using Kenneth French's US industry indices, 1973-2014; see Figure 4.10.
Figure 4.10: Risk-weighting solutions for EW, GMV, MD, and RP (ERC) portfolios using sector indices from Kenneth French. The variance-covariance matrix is based on five years of rolling data (Teiletche [2014]).
Figure 4.10 indicates that GMV has a preference for lower-volatility sectors (e.g., utilities or consumer non-durables), MD prefers low correlation (e.g., utilities or energy), EW is not sensitive at all to risk measures, and RP (ERC) is mixed. RP and EW show similarly regular asset allocation patterns, while the GMV and MD asset allocation patterns are much less regular. The latter react much more to changing economic circumstances and are therefore more defensive.
Maillard et al. (2009) compare the ERC portfolio with the 1/N and MVP portfolios for a representative set of the major asset classes from Jan 1995 to Dec 2008.15 The ERC portfolio has the best Sharpe ratio and average returns. The Sharpe ratio of the 1/N portfolio (0.27) is largely dominated by MVP (0.49) and ERC (0.67). MVP and ERC differ in their balance between risk and concentration. The ERC portfolios are much less concentrated than their MVP counterparts, and their turnover is also much lower. The lack of diversification in the MVP portfolios can be seen by comparing the maximum drawdown values: the value for MVP is −45% compared to −22% for the ERC portfolio.
When we restrict the risk measurement to volatilities, the heuristic approach (4.43)
15 The asset class representatives are: S&P 500, Russell 2000, DJ Euro Stoxx 50, FTSE 100, Topix, MSCI Latin America, MSCI Emerging Markets Europe, MSCI AC Asia ex Japan, JP Morgan Global Govt Bond Euro, JP Morgan Govt Bond US, ML US High Yield Master II, JP Morgan EMBI Diversified, S&P GSCI.
takes the following generic component-wise form (Jurczenko and Teiletche [2015]):
$$\phi_k = k\,\sigma_k^{-1}\,, \qquad k = \frac{1}{\sum_j \sigma_j^{-1}}\,. \qquad (4.44)$$
So, (4.44) is the heuristic model (4.43) with $m = 1$ and no leverage ($L = 1$). If we use a volatility-target constraint $\sigma_T$ for the risk-based portfolio, we get
$$k = \frac{\sigma_T}{N \times \text{Concentration}} = \frac{\sigma_T}{N\, C(\rho)} \qquad (4.45)$$
with $\rho$ the average pair-wise correlation coefficient of the assets and $C(\rho)$ the concentration measure16
$$C(\rho) = \sqrt{N^{-1}\left(1 + (N-1)\rho\right)}\,. \qquad (4.46)$$
The concentration measure varies from 0, when the average pair-wise correlation reaches its lowest value, to +1, when the average correlation is +1. Hence, $k$ increases when the diversification benefits are important - that is, when the concentration measure decreases. In this case, each constituent's weight needs to be increased to reach the desired volatility target: the risk-based portfolio may even become leveraged. Risk-based investing often faces the criticism that it cannot allow for views. This is not true; see Jurczenko and Teiletche (2015) and Roncalli (2014).
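Equations (4.44)-(4.46) are easy to combine in code; the volatilities, average correlation, and volatility target below are illustrative assumptions.

```python
import numpy as np

# Volatility-targeted inverse-vol weights, eqs. (4.44)-(4.46).
def vol_target_weights(sigma, rho_bar, sigma_target):
    sigma = np.asarray(sigma, dtype=float)
    N = len(sigma)
    C_rho = np.sqrt((1 + (N - 1) * rho_bar) / N)   # concentration (4.46)
    k = sigma_target / (N * C_rho)                  # scaling factor (4.45)
    return k / sigma                                # weights (4.44)

w = vol_target_weights([0.05, 0.10, 0.20], rho_bar=0.3, sigma_target=0.10)
print(w, w.sum())   # a sum above 1 means the portfolio is leveraged
```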
16 To prove this formula, write $\Lambda_\sigma$ for the diagonal matrix with the vector of volatilities $\sigma$ on its diagonal, $\rho$ for the correlation matrix of returns, and $e$ for the vector of ones. The covariance matrix can be written in the form $C = \Lambda_\sigma \rho \Lambda_\sigma$, which implies for the variance of the portfolio $\phi = k\sigma^{-1}$:
$$k^2\,\langle \sigma^{-1}, \Lambda_\sigma \rho \Lambda_\sigma\, \sigma^{-1}\rangle = k^2\,\langle e, \rho e\rangle = k^2 N^2 C(\rho)^2\,.$$
Setting this equal to $\sigma_T^2$ implies (4.45).
4.4 Estimation: The Covariance Matrix
The estimation problem of the covariance matrix faces estimation risk: the true parameters in the models are not known, and one has to estimate these parameters given only a finite data set. Whichever statistical approach we choose, there is a risk that the estimated parameters differ from the unknown, true parameter values. There are different methods to estimate the covariance matrix; they can be classified along several dimensions.
To quantify this, let $R(j)$ be the rate of return in the past month $j$. The average return of $n$ observations, assuming IID returns, has itself a mean $\bar R$ and a standard deviation $\sigma/\sqrt{n}$. These are the true values. For an assumed annual return of 12%, the true monthly return is $R_{1m} = 1\%$. For an annual standard deviation of $\sigma = 5\%$, the monthly estimate $\sigma_{1m} = 5\%/\sqrt{12} = 1.44\%$ follows. This estimate is larger than the mean itself, i.e. not meaningful. Using $n = 60$ (five years of data), the standard deviation estimate becomes $0.05/\sqrt{60} = 0.645\%$, which is not significantly smaller than the mean. If we would like to have a standard deviation estimate of, say, $1/10$ of the mean, the equation $0.05/\sqrt{n} = 0.001$ implies $n = 2{,}500$. This corresponds to a time series of more than 208 years (2,500/12).
The plug-in approach inserts the sample estimates into the optimal rule:
$$\phi_{MV} = \frac{1}{\theta}\,\hat C^{-1,S}\,\hat\mu^S\,. \qquad (4.47)$$
Assuming that the plugged-in parameters are the true ones leads to zero estimation risk. But this is not an optimal approach.18 One has to define a procedure outside of the investment optimization program which fixes the values of the parameters.
Bouchaud and Potters (2009) illustrate this. They consider the Markowitz model without the full investment constraint. The optimal policy, if we assume the true correlation matrix $\rho$ is known, is
$$\phi_{MV} = r\, \frac{\rho^{-1}\mu}{\langle \mu, \rho^{-1}\mu\rangle} \qquad (4.48)$$
with $r$ the target expected return. The true minimal risk is then
$$\sigma^2_{MV} = \langle \phi_{MV}, \rho\,\phi_{MV}\rangle = \frac{r^2}{\langle \mu, \rho^{-1}\mu\rangle}\,. \qquad (4.49)$$
We compare this optimal case with the in-sample and out-of-sample risks. The in-sample estimate uses the known empirical correlation matrix $\hat\rho^S$ of the corresponding period; the out-of-sample estimate uses the empirical correlation matrix $\tilde\rho^S$ observed in the next period. The portfolio risks read
$$\sigma^2_{MV,in} = \langle \hat\phi_{MV}, \hat\rho^S \hat\phi_{MV}\rangle\,, \qquad \sigma^2_{MV,out} = \langle \hat\phi_{MV}, \tilde\rho^S \hat\phi_{MV}\rangle\,, \qquad (4.50)$$
with $\hat\phi_{MV}$ the strategy (4.48) evaluated at $\hat\rho^S$. One obtains:
18 Tu and Zhou (2003), Kan and Zhou (2011), Zellner and Chetty (1965), Pastor and Stambaugh (2000).
$$\sigma^2_{MV,in} \leq \sigma^2_{MV} \leq \sigma^2_{MV,out}\,. \qquad (4.51)$$
How far away are the in-sample and out-of-sample risks from the true risk? Pafka and Kondor (2004) show that for IID returns and large portfolios:
$$\sigma^2_{MV,in} = \sigma^2_{MV}\sqrt{1-q} = \sigma^2_{MV,out}\,(1-q)\,, \qquad q = \frac{N}{T}\,. \qquad (4.52)$$
The in-sample risk is a factor $\sqrt{1-q}$ smaller than the true risk, while the out-of-sample risk is larger than the true risk by the factor $1/\sqrt{1-q}$. This quantifies the data snooping effect.
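A quick numeric illustration of (4.52), with an illustrative choice of N and T:

```python
# Bias of in- and out-of-sample risk under (4.52), q = N/T.
N, T = 250, 1000
q = N / T
print((1 - q) ** 0.5)    # in-sample variance understates true risk: ~0.87
print((1 - q) ** -0.5)   # out-of-sample variance overstates it: ~1.15
```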
But estimation risk in the mean matters too. Ang (2014) estimates the original mean-variance frontier using data from January 1970 to December 2011. The mean of US equity returns is 10.3 percent. Ang changes the mean to 13.0 percent; such a change is within two standard error bounds. The minimum variance portfolios for a desired portfolio return of 12 percent are given in Table 4.7. The change caused the US position to move from -9 percent to 41 percent, and the UK position to move from 48 percent to approximately 5 percent.
Table 4.7: MV portfolios for two different expected equity returns (Ang [2014]).
Returning to the estimation of the covariance matrix, the main question is how to reduce the number of degrees of freedom for the estimation. The fully agnostic view of assuming equal weights to be optimal, $\phi = 1/N$, is the other extreme to using sample estimates. Assuming EW means avoiding the need to estimate any input parameter - neither variances, correlations, nor returns. DeMiguel et al. (2009) compare 14 optimized portfolio approaches across 7 datasets with the 1/N EW investment. Surprisingly, 1/N is difficult to beat by the 14 optimal portfolios. They empirically compare the Sharpe ratios, analytically derive the critical estimation window length for the mean-variance strategy to outperform 1/N, and use simulations to extend the models to classes of models designed to control estimation risk. The findings are:
• Empirically, none of the 14 portfolio models consistently dominates 1/N across all data sets in terms of Sharpe ratio and turnover.
• Using US stock data, for 25 assets in the portfolio the critical estimation window is around 3,000 months of data. This figure doubles for twice as many assets in the portfolio.
• Models which control for estimation risk also need very long data series to outperform 1/N.19
These results contradict the common view that heuristics are less successful than statistical optimization models. Ignoring part of the ambiguous information - insufficient historical data for estimating model input parameters - is what makes the 1/N heuristic robust for the unknown future. But as we discuss below, there are now convincing alternatives that beat this agnostic case.
The 1/N model can also be used to get insights about which estimation risk is more severe - return or covariance risk? We follow Rohner (2014). Assume that $\mu$ and $C$ are known for 10 assets such that 1/10 is invested optimally in each asset. To see the impact if either of the two parameters is not known, simulate multivariate normal returns, use rolling window estimation with an integration period of 100, and calculate 300 sample means and estimated covariances. Use these estimates to calculate the optimal portfolio weights for each date; see Figure 4.11.
Figure 4.11: Significance of return and covariance estimation risk. Source: Rohner [2014].
Even if the distribution of returns is known, estimated portfolio weights can deviate largely from their theoretical optimal values. It follows that estimation error in expected returns has a larger impact on the optimal portfolio weights than dependency estimation errors. Therefore, GMV portfolios, which do not depend on estimated returns, are more robust.
19 They consider Bayesian portfolios, portfolios with moment restrictions, portfolios with short-sale
constraints and combinations of optimal portfolios.
Having argued against the agnostic method and the $N^2$-parameter sample estimate of the covariance matrix, the next step is to consider low-dimensional methods or methods which grow linearly with N.20 The linear shrinkage approach and the factor model approach are examples of low-dimensional models, and we compare these approaches with the order-N approach of Ledoit and Wolf (2018), a non-linear shrinkage approach.
Let
$$R_k \approx \mu + \sum_{j=1}^K \beta_{kj}\, v_j$$
be an approximation with $W = (v_1, \ldots, v_K)$, where $(v_j)$ is an orthonormal basis of a $K$-dimensional subspace. Then $W'W = I$ due to the orthonormality of the vectors $(v_j)$. Finding the best linear fit means solving the least squares optimization
$$\min_{\mu,\, W,\, \beta_k,\, W'W = I}\ \sum_{k=1}^N \|R_k - (\mu + W\beta_k)\|^2\,.$$
Optimizing over $\mu$ implies
$$\mu^* = \mu_N\,,$$
i.e. the sample mean follows. Optimizing over $\beta_k$ implies (orthonormality of the $v$'s):
$$\beta_k = W'(R_k - \mu_N)\,.$$
Inserting this into the objective function shows that the problem is equivalent to
$$\max_{W,\, W'W = I}\ (N-1)\,\mathrm{tr}(W' C^S W)$$
20 Ledoit and Wolf (2003, 2004a,b), Kan and Zhou (2007), Brandt et al. (2009), DeMiguel et al. (2009,
2013), Frahm and Memmel (2010), and Tu and Zhou (2011).
with $C^S$ the sample covariance and tr the trace of a matrix. The matrix under the trace can be diagonalized since the spectral theorem of linear algebra applies:
Proposition 76. Let $C$ be a symmetric, positive definite, real matrix of dimension $N \times N$. There exists a diagonal matrix $\Lambda$ and a matrix $W$ such that
$$W'CW = \Lambda\,. \qquad (4.53)$$
The diagonal elements of $\Lambda$ are real-valued and positive (the eigenvalues $\lambda_1, \ldots, \lambda_N$). The eigenvalues solve the polynomial equation $\det(C - \lambda I) = 0$ with $I$ the identity matrix. Given any eigenvalue $\lambda_k$, the solution of the linear equation $Cv_k = \lambda_k v_k$ is called an eigenvector $v_k$. The eigenvectors form an orthonormal basis and $W = (v_1, \ldots, v_N)$.
Hence, the PCA is given by finding the largest eigenvalues and the corresponding eigenvectors. The restriction to the largest eigenvalues means 'de-noising' the covariance matrix. A covariance matrix $C$ of dimension $N \times N$ does not tell us how much the unobservable risk drivers of the $N$ assets add to the total portfolio variance. Transforming the matrix using PCA allows us to derive how important the risk factors are in explaining portfolio risk. Consider Figure 4.12, where the left panel shows the closing values of the Dow and the S&P 500 index. The two series are heavily dependent: a data point of the Dow corresponds to an S&P closing price such that the pair is close to the diagonal (think about the bifurcation for low closing prices). The dependence can be offset if we rotate the coordinate system. In the new coordinate system, data points have almost no variance in the $y_2$ direction but only in the $y_1$ direction. Therefore, the $y_1$-direction factor explains most of the portfolio variance. De-noising then means neglecting the $y_2$ risk contribution. PCA does this.
The eigenvalues measure the variance contributions of the factors in (2.2). Factors with low eigenvalues add only little to the portfolio risk and are therefore dropped - the de-noising of the covariance matrix. But the eigenvalues that are important from a risk perspective are the least important ones from a portfolio optimization perspective, where $C^{-1}$ matters; see (4.3), $\phi = \frac{1}{\theta}C^{-1}\mu$. The eigenvalues of the information matrix are the reciprocal values $1/\lambda_k$ of the eigenvalues $\lambda_k$. This trade-off between risk and investment is one reason why portfolio managers often do not use portfolio optimization methods. Furthermore, the small eigenvalues, whose inverses are needed for optimal portfolios, are not robust - a small change of the values heavily changes the portfolio. Therefore, regularization techniques are used; a de-noising sketch follows after Figure 4.12.
Figure 4.12: Closing values for the S&P 500 and Dow Jones Index in 2006. The red
coordinate systems denote the rotation applied in PCA.
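The de-noising idea can be sketched in a few lines; keeping the $k$ largest eigenvalues and replacing the noisy bulk by its average is one simple convention (an illustrative assumption, not the only choice used in practice), and the data are simulated.

```python
import numpy as np

# De-noising sketch: keep the k largest eigenvalues of a sample covariance
# matrix and replace the remaining (noisy) ones by their average.
def denoise(C, k):
    lam, W = np.linalg.eigh(C)           # ascending eigenvalues, W orthonormal
    lam, W = lam[::-1], W[:, ::-1]       # sort descending
    lam[k:] = lam[k:].mean()             # flatten the noisy bulk
    return (W * lam) @ W.T               # rebuild W diag(lam) W'

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))           # T=500 observations of N=50 assets
C = np.cov(X, rowvar=False)
C_dn = denoise(C, k=5)
```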
As a small worked example, consider a symmetric $2 \times 2$ matrix $M$ whose characteristic polynomial yields the two eigenvalues $\lambda = 3$ and $\lambda = 2$. Matrix $M$ is therefore positive definite and satisfies all the mathematical properties of a covariance matrix. The information matrix $M^{-1}$ has the inverse eigenvalues $1/3$ and $1/2$ on its diagonal, which shows the reversal of the ranking order. Solving the two linear systems $Mv_k = \lambda_k v_k$ yields the corresponding eigenvectors.
As an application, consider the linear factor model (2.2) with $N$ assets and $K$ risk factors $F$. How does one estimate the model? Let $R = (R_1, \ldots, R_T)$ be the $N \times T$ return matrix and assume that $K < N$. Following Kempthorne (Factor Models, MIT Lecture Notes, Fall 2013), the model is estimated by the PCA-based steps listed after Table 4.8.
PCA of C [%]:
                            Factor 1   Factor 2   Factor 3
Asset 1                        65        -72        -22
Asset 2                        70         69        -20
Asset 3                        30         -2         95
EV (eigenvalue)                 8        0.8        0.3
Cumulated σp-contribution      88         97        100

Table 4.8: PCA analysis of a covariance matrix. Note that the eigenvalues of $C^{-1}$ are 12, 119, 380 for the factors 1, 2, 3, i.e. the ordering relation is inverted compared to the covariance matrix. The first factor in the covariance matrix is a market factor since all components in the eigenvector are positive. It has the largest eigenvalue and contributes 88 percent to the portfolio's volatility.
• $\hat C = \frac{1}{T}X^* X^{*\prime}$: sample covariance of the demeaned data $X^*$.
• PCA: $\hat C = \hat W \hat\Lambda \hat W'$.
• $\hat\alpha_0 = \bar x$: the sample mean.
• $\hat\beta_0 = \hat W_m\, \hat\Lambda_m^{1/2}$, where the subindex $m$ indicates the submatrix of the first $m$ columns.
• $\hat D_0 = \mathrm{diag}(\hat C) - \mathrm{diag}(\hat\beta_0 \hat\beta_0')$: the idiosyncratic variances.
• $\hat C_0 = \hat\beta_0 \hat\beta_0' + \hat D_0$: the structured covariance estimate.
• Step 3: Adjustment.
A code sketch of these steps is given below.
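A minimal implementation of the steps above (ignoring the final adjustment step); the simulated data and the choice m = 2 are illustrative assumptions.

```python
import numpy as np

# Statistical factor model via PCA, following the steps above;
# m is the number of retained factors.
def pca_factor_cov(X: np.ndarray, m: int) -> np.ndarray:
    Xc = X - X.mean(axis=0)                    # alpha_hat = sample mean
    C = Xc.T @ Xc / len(X)                     # sample covariance
    lam, W = np.linalg.eigh(C)
    lam, W = lam[::-1], W[:, ::-1]             # descending order
    beta = W[:, :m] * np.sqrt(lam[:m])         # loadings of first m factors
    D = np.diag(np.diag(C - beta @ beta.T))    # idiosyncratic variances
    return beta @ beta.T + D                   # structured covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                 # T=1000 returns, N=4 indices
C0 = pca_factor_cov(X, m=2)                    # two-factor model as below
```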
Consider the S&P 500, SMI, Eurostoxx 50, and Nikkei 225 indices from Apr 1995 to Apr 2015. Calculating the correlation matrix on a weekly basis using closing prices gives
$$\rho = \begin{pmatrix} 1 & & & \\ 0.80 & 1 & & \\ 0.82 & 0.88 & 1 & \\ 0.67 & 0.56 & 0.58 & 1 \end{pmatrix}.$$
The data indicate that the correlation between the European and American markets is stronger than between the Japanese market and the European or American ones. We therefore set up a two-linear-factor model.
The matrix $\beta$ follows from maximum likelihood estimation:
$$\beta = \begin{pmatrix} -0.015 & 0.21 & 0.29 & 0.35 \\ 0.91 & 0.93 & 0.96 & 0.76 \end{pmatrix}.$$
The portfolio is long-only in one factor, the market factor by definition, and long/short in the second factor; here it is short the S&P 500 and long the other three indices.
Given a PCA analysis of a covariance matrix - how noisy are the estimated eigenvalues? Random Matrix Theory (RMT) studies the eigenvalues and eigenvectors of large-dimensional matrices whose entries are sampled according to known probability densities. Basically, if the eigenvalue distribution of a covariance matrix is close to that of a matrix with completely random entries, then randomness dominates in the covariance matrix. A main feature of RMT is universality: the asymptotic behavior of random matrices is often independent of the distribution of the entries. A second one is that the limiting distribution takes non-zero values only on a bounded interval, displaying sharp edges. Sharp edges indicate that eigenvalues outside of the asymptotic range are non-random.
Consider the sample covariance matrix $C^S = \frac{1}{T}RR'$, where $R$ is an $N \times T$ matrix whose rows are the time series of the returns, one row for each stock. We assume that returns are normalized by their standard deviation such that their variance is 1. Suppose that the entries of $R$ are IID random variables with mean zero and variance $\sigma^2$ (e.g. normally distributed); then $R$ is a random matrix. Using PCA, the hope is to find a low-dimensional structure in the distribution corresponding to large eigenvalues of $C$. How close are the spectral properties of $C^S$ and $C$? If $N$ is fixed and $T \to \infty$, the law of large numbers guarantees $E[C^S] = C$. But $N$ is often of the order of $T$ or even larger. In this case it is not clear whether $C^S$ converges towards $C$.
Figure 4.13: Simulation for N = 1000 stocks and T = 500, i.e. daily data for two years.
The red line is the theoretical eigenvalue distribution of Marcenko and Pastur. Source:
Gatheral [2008].
The random matrix corresponds to the null hypothesis that the set of stocks considered is strictly independent and that the correlation matrix is the identity matrix. Any deviation from this structure in the empirical correlation matrix suggests the presence of true information. All eigenvalues which fall within the theoretical spectrum of eigenvalues are noisy and should not be considered in portfolio optimization.
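A sketch of this null model follows; to keep the sample matrix full-rank the sketch uses N < T (the figure's N = 1000, T = 500 would add a point mass of zero eigenvalues), and the Marcenko-Pastur edges $\lambda_\pm = (1 \pm \sqrt{q})^2$ hold for unit-variance entries.

```python
import numpy as np

# Null model: eigenvalue spectrum of a pure-noise sample covariance matrix
# versus the Marcenko-Pastur edges, q = N/T (here q = 0.25, full rank).
N, T = 250, 1000
q = N / T
R = np.random.default_rng(1).standard_normal((N, T))
C_S = R @ R.T / T
lam = np.linalg.eigvalsh(C_S)

lam_minus, lam_plus = (1 - q ** 0.5) ** 2, (1 + q ** 0.5) ** 2
print(lam.min(), lam.max())   # empirical edges, close to the theory
print(lam_minus, lam_plus)    # 0.25 and 2.25
```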
How are the PCA eigenvalues related to the eigenvalues of the random matrix covariance? The answer is given by the Marcenko-Pastur theorem: in the limit $N, T \to \infty$ with $q = N/T$ fixed, the eigenvalue distribution of $C^S$ converges to a density $\rho$ supported on $[\lambda_-, \lambda_+]$ with edges $\lambda_\pm = \sigma^2(1 \pm \sqrt{q})^2$. Here the random matrix $C^S$ is a random Wishart matrix, $\lambda_\pm$ are the theoretical minimum and maximum eigenvalues of the random correlation matrix, and $\rho$ is the Marcenko-Pastur density.22
The proof of the theorem is based on the following moment expansion and combinatorics:
$$\frac{1}{N}\,E\!\left[\mathrm{tr}\!\left((R'R)^k\right)\right] = \int_{\lambda_-}^{\lambda_+} \lambda^k \, d\rho(\lambda)\,.$$
What can be said about the distribution of the largest eigenvalue? Can we find the cut-off eigenvalue separating noisy eigenvalues from those carrying true information? What can be said about the eigenvalue distribution if $N > T$, and can the IID assumption in RMT be relaxed? The answer to the first question is given by the Tracy-Widom law: the probability distribution of the largest eigenvalue can be expressed analytically in the case of normally distributed random variables. We refer to the literature for details.
Figure 4.14 compares the case of a random identity matrix with the eigenvalue distribution of a risk model (blue histogram). The higher frequency for large eigenvalues indicates that the largest eigenvalues in the risk model, which determine the risk, are not driven by noise, in contrast to the identity matrix assumption. In other words, the risk model is able to capture true risk information. A similar conclusion follows for the small eigenvalues, which dominate in optimal portfolio construction. For the intermediate eigenvalues there is virtually no difference to the pure noise case. These are the factors which matter in the Markowitz model in the long-short portfolio of the expected return minus the beta hedge and which lead to unstable optimal allocations.
22 $\rho(\lambda)$ is defined as the limit of $\rho_N(\lambda) := \frac{1}{N}\sum_{j=1}^N \delta(\lambda - \lambda_j)$ with $\delta$ the Dirac delta function.
Figure 4.14: Simulation for N = 1000 stocks and T = 500, i.e. daily data for two years. The blue histogram corresponds to the eigenvalue frequencies of a risk model.
Ledoit and Wolf (2004) extended Stein's shrinkage approach to the covariance matrix. To this end, we use the Frobenius norm for an $N \times N$ matrix $A$:
$$\|A\|_F := \sqrt{\mathrm{tr}(A'A)/N}\,.$$
Linear shrinkage combines a highly structured estimator - say the identity matrix $I$, representing the 1/N approach - with the unstructured sample covariance matrix $C^S$ and its $N^2$-growth in the form
$$\hat C = a_1 I + a_2 C^S\,.$$
The shrinkage weights $a_j$ are constants. To find the optimal weights, one minimizes the expected Frobenius distance to the true covariance matrix.
Proposition 78. The above optimization problem has the unique solution
$$\hat C^* = a^*\mu I + (1 - a^*)\, C^S \qquad (4.57)$$
where
$$a^* = \frac{E\big[\|C^S - C\|_F^2\big]}{E\big[\|C^S - \mu I\|_F^2\big]}\,, \qquad \mu = \mathrm{tr}(C)/N\,.$$
The optimal solution can be interpreted as shrinking the sample covariance matrix towards the shrinkage target $\mu I$ with (shrinkage) intensity $a^*$. Ledoit and Wolf (2003) state: 'The beauty of the principle is that by properly combining two extreme estimators one can obtain a compromise estimator that performs better than either extreme.'
To illustrate the shrinkage idea, consider an $N \times N$ covariance matrix where $\sigma_{ij}$ is the non-observable true covariance for $i \neq j$ and $\mathrm{cov}_{ij}$ is the sample covariance. The squared deviation of the shrunk estimate from the true value reads $\left((1-a)\,\mathrm{cov}_{ij} - \sigma_{ij}\right)^2$, which is a loss measure. Since the sample covariances are random, we take the expected loss function; minimization is a routine quadratic optimization with the optimal shrinkage intensity
$$a = \frac{\sum_{j>i} \mathrm{var}(\mathrm{cov}_{ij})}{\sum_{j>i} \left(\mathrm{var}(\mathrm{cov}_{ij}) + \sigma_{ij}^2\right)}\,.$$
In terms of the spectral decomposition, linear shrinkage keeps the eigenvectors of the sample covariance matrix but replaces the eigenvalues by the convex combination
$$\hat\lambda_i = a^*\mu + (1 - a^*)\lambda_i$$
where $\lambda_i$ is an eigenvalue of the sample covariance matrix. Non-linear shrinkage then uses different shrinkage intensities for different sample eigenvalues. Since there are $N$ eigenvalues, the method is of order $N$. As in the case of linear shrinkage, one defines a similar optimization problem, which leads to an infeasible estimator; one therefore also relies on asymptotic analysis. But since there are $N$ parameters in the limit $N, T \to \infty$, the number of parameters explodes. To perform the analytics one has to use the machinery of random matrix theory. We refer to the literature.
We review two extensions of the set-up: the extension to dynamic models and the extension to factor models. The dynamic models allow us to drop the IID assumption for the $T$ observations. In order not to run into the curse of dimensionality, instead of multivariate GARCH models, Ledoit and Wolf suggest using a version of the dynamic conditional correlation (DCC) model of Engle (2002) based on correlation targeting. They define a GARCH(1,1)-type model for the conditional covariance of devolatized returns; see the literature for details.
The second extension is to use factor models to estimate the covariance matrix of a large universe of asset returns. Setting up a factor model, the approach is to use shrinkage estimation for the residual covariance matrix of a general factor model. The factor model can be static, i.e. the intercepts and the factor loadings are time-invariant, the conditional covariance matrix of the vector of factors is time-invariant, and the conditional covariance matrix of the vector of errors is time-invariant. Dynamic factor models are then given by assuming all static components to be dynamic except the intercept, since the authors found that in this context of portfolio selection the conditional factor models do not work better.
The test goal is the estimation of the GMV portfolio without any short-sales restrictions. They consider 11 portfolios, but we restrict attention to the following cases:
• 1/N portfolio.
• Sam: the sample covariance matrix.
• Lin: linear shrinkage.
• Non-Lin: non-linear shrinkage.
Table 4.9 presents the results. Since the true GMV portfolio minimizes the standard deviation, the out-of-sample standard deviation is the primary performance criterion.
Table 4.9: Performance measures for various estimators of the GMV portfolio. AV, average; SD, standard deviation; SR, Sharpe ratio; FF, Fama-French three-factor model. All measures are based on 10,080 daily out-of-sample returns in excess of the risk-free rate. In the rows SD, the lowest number appears in bold. In the columns Lin and Non-Lin, significant out-performance of one of the two portfolios over the other in terms of SD is denoted by asterisks: *, **, and *** indicate significance at the 10%, 5%, and 1% level, respectively (Ledoit and Wolf [2018]).
Non-Lin has the lowest standard deviation among the rotation-equivariant portfolios. For $N = 250$ and $500$, the Sharpe ratio gains are 0.08 and 0.06, or in relative terms 15% and 12%, respectively. If one forms the factor portfolio Non-Lin-Sharpe, then it outperforms FF, which outperforms SF (numbers not displayed). Summing up, Non-Lin dominates all other rotation-equivariant portfolios in terms of the standard deviation and additionally dominates Lin in terms of the Sharpe ratio. Considering the summary statistics of portfolio weights over time, the most dispersed weights among the rotation-equivariant portfolios are found for Sam. The three shrinkage methods generally have the least dispersed weights. The authors provide robustness tests, tests with transaction costs, and tests where individual stocks are replaced by the Ken French portfolios.
Table 4.10 presents the results for the case where dynamic and factor models are used.

                     N=100                    N=1000
                 AV     SD     IR         AV     SD     IR
EW              16.55  21.33  0.78       17.55  20.30  0.87
NL              14.76  14.16  1.04       15.00   8.75  1.71
DCC-NL          14.95  14.13  1.06       14.82   7.95  1.86
EFM1            15.37  16.50  0.93       16.33  12.78  1.28
EFM5            15.22  15.49  0.98       15.94  11.39  1.40
AFM1-NL         14.79  14.16  1.04       15.00   8.75  1.72
AFM5-NL         14.78  14.17  1.04       14.90   8.75  1.70
AFM1-DCC-NL     14.69  14.02  1.05       15.76   7.84  2.01
AFM5-DCC-NL     14.58  14.09  1.04       15.28   7.91  1.93
Table 4.10: Annualized performance measures (in percent) for various estimators of the Markowitz portfolio with momentum signal. AV = average; SD = standard deviation; IR = information ratio. AV and SD are computed from the 10,080 out-of-sample returns and then scaled to one year; IR is the ratio AV/SD. EFM means Exact Factor Model, AFM Approximate Factor Model; the number after EFM and AFM is the number of Fama-French factors considered. DCC denotes the dynamic model and NL non-linear shrinkage. *** denotes significance at the 0.01 level (Ledoit and Wolf [2018]).
The return signal is given by momentum, i.e. the geometric average of the previous 252 returns on the stock, excluding the most recent 21 returns. The vector of these averages defines the expected return signal $\mu$. The first result is that all models consistently outperform the 1/N model. Second, approximate factor models consistently outperform the exact factor models. Third, DCC-NL outperforms the other structure-free models and the exact factor models, and AFM-DCC-NL consistently outperforms DCC-NL for large portfolio sizes. For the one-factor AFM-DCC-NL with N = 1000 the outperformance is statistically significant. In a nutshell, dynamic models dominate static ones, 1/N becomes a dominated strategy in this fine-tuned non-linear shrinkage approach, and dynamics plus one factor does better than using more factors. This indicates that instead of searching for a large number of factors, sound dynamics in the estimation of the covariance matrix leads to better-performing results.
4.5 Factor Models
There are three approaches to selecting factors. The first one uses theory. The classic is the CAPM, where the market portfolio return is the only factor determining expected returns. Merton (1973) extended the theory to the inter-temporal context. In this model, any state variable that predicts future investment opportunities - such as the term premium, volatility premium, default premium, or inflation - defines an additional factor.
Statistical factor selection is a second approach, with the arbitrage pricing theory (APT) of Ross as the classic model. Finally, identifying factors based on firm characteristics, with the famous three-factor model of Fama and French (1993), defines the empirical approach to factor selection.
Grouping of risk premia by asset class and style:
  Equities: EQ Dividends, EQ Merger Arb, EQ Global Vol (Diversified), EQ Mean Reversion
  Interest Rates: IR Carry Diversified, IR Muni/Libor, IR Vol
  Credit: CR Carry HY vs. IG
  Currencies: FX Global Carry, FX Vol Basket, FX Vol Single
  Commodities: CO Carry (Curve), CO Vol Diversified, CO Vol Single
The premia range from those covered by the global market portfolio over those only weakly covered to those whose driving factors are orthogonal to it.
Stocks with a higher score led to a larger return than those with a lower score. This empirical feature is called EQ Quality. If one believes that this historical return pattern will continue to hold on average in the future, one can invest in such a strategy. A long-short EQ implementation removes directional risks. There are institutional investors which do not want to or cannot invest in long-short vehicles. But investing long-only in a risk premium is not market neutral: market neutrality is lost, and the correlations between risk premia and with traditional asset classes move significantly away from a weak correlation structure. A long-short strategy, however, is not free of risk either; see the momentum crash below. Factor investing has emerged as the new paradigm among sophisticated institutional investors. A large body of literature suggests that shorting is difficult to implement. Therefore, institutional investors often prefer long-only approaches since they are also less exposed to liquidity risk, have greater capacity, and do not require the use of leverage or derivatives. Producers offer the risk premia products as fully transparent indices. Different wrappers are used for risk premia investments - UCITS funds, ETFs, or structured notes.
Figure: Construction of an ARP quality strategy. Stocks are ranked by their quality score (Q-score); the 20% of stocks with the highest scores form the long position and the 20% with the lowest scores form the short position. The difference of the historical returns of the two legs defines the return of the ARP strategy.
Momentum strategies historically exhibited a beta to the market of −0.125 and an annualized Sharpe ratio of 0.82. Momentum is documented to be pervasive for equities, currencies, commodities, and futures. The maximum monthly momentum return was 26.1%, and the worst five monthly returns were −79%, −60%, −46%, −44%, and −42%. Intuitively, the premium is positive if the winners' return is larger than the losers' one. In a momentum crash, past winners are future losers and vice versa - one is wrong in both the long and the short leg of the investment. This happened in fast market rebounds:
• In June 1932 the market bottomed. In the period July-August 1932, the market
rose by 82 percent. Over these two months, losers outperformed winners by 206
percent.
• In March 2009 the US equity market bottomed. In the following two months, the market was up by 29 percent, while losers outperformed winners by 149 percent. Firms in the loser portfolio had fallen by 90 percent or more (such as Citigroup, Bank of America, Ford, GM). In contrast, the winner portfolio was composed of defensive or countercyclical firms like AutoZone.
The rationale is simple. Suppose markets are crashing. Losers already lost in value before the crash, and during the crash they become extremely cheap if one believes that they will not default. Since investors are convinced that markets will recover, the demand for the losers exceeds that for the winners, which leads to the winner-loser reversal.
Byun and Jeon (2018) suggested adapting the momentum strategy in order to reduce the impact of momentum crashes. They propose to observe past returns for 12
Figure 4.17: Stocks are screened based on their past return over the last J = 3 months (J = 6, 12 months are also used). This screening identifies the past winners and losers and defines the formation period. After this identification, no action is taken for one month; the reason is to filter out possible erratic price fluctuations in the selection of past winners and losers. Finally, in the holding period the selected stocks are held for K = 3 months, where again longer holding periods are possible. Afterwards the positions are closed. This procedure is repeated monthly, leading to an overlapping roll-over portfolio allocation.
months but invest for only one month, with the decision criterion for going long and short being the cumulated past 52-week return. The authors expect that the 52-week high subsumes the predictive power of the past 12-month return, and investing for only one month adapts to the often observed fast momentum reversals. The mechanism is that, as the market rebounds, investor demand increases for stocks that are far from their 52-week highs; this bias makes the 52-week high negatively related to future returns. The authors show that during crash periods, stocks far from their 52-week highs outperform stocks near their 52-week highs.
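A pandas sketch of the J/K screening of Figure 4.17 follows; the `prices` DataFrame (monthly closes, dates by tickers) and the 20% quantile cut are illustrative assumptions.

```python
import pandas as pd

# J-month formation, one-month skip, winner-minus-loser momentum sketch.
def momentum_signal(prices: pd.DataFrame, J: int = 3) -> pd.DataFrame:
    # Past J-month return, lagged one month (the skip month).
    return prices.pct_change(J).shift(1)

def winner_minus_loser(prices: pd.DataFrame, J: int = 3, q: float = 0.2) -> pd.Series:
    sig = momentum_signal(prices, J)
    fwd = prices.pct_change().shift(-1)           # next month's return
    ranks = sig.rank(axis=1, pct=True)            # cross-sectional percentile
    long_leg = fwd[ranks >= 1 - q].mean(axis=1)   # past winners
    short_leg = fwd[ranks <= q].mean(axis=1)      # past losers
    return long_leg - short_leg                   # monthly W-L return
```

Holding for K > 1 months then corresponds to averaging K such overlapping sub-portfolios, as described in the figure caption.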
Figure 4.19 shows the return of investing $1 from 1956 until 2015 in a market factor and in the styles size, value, and momentum.23
In this long-term view three observations are immediate: simple grouping of assets can lead to significant outperformance of the market return for long periods, but there
23 Value effect: low price-to-book (P/B) stocks (value stocks) typically outperform high P/B stocks (growth stocks). Size effect: smaller stocks typically outperform larger stocks.
Figure 4.18: Long-only momentum strategies. Left panel: momentum strategies 1947-2007. Right panel: momentum strategies during the GFC (Daniel and Moskowitz [2012]).
Figure 4.19: Investment return of $1 in 1956-2014 in the market, market plus value, market plus size, and market plus momentum factor (Ken French's website).
are also short time periods where factor investments can crash. This is most dramatically seen in the momentum crash during the GFC. Finally, the factors do not seem to be independent - momentum and the market crash and boom in parallel during the GFC and the following decade.
Factors should:
• exhibit significant premiums which are expected to persist in the future [Persistence];
• be uncorrelated among themselves and with asset classes in good times, and negatively correlated in bad times [Independence].
The notion of 'good' and 'bad' times is made precise in economic theory by the stochastic discount factor (SDF).
The financial industry defines factor investing similarly to the professors' report. Deutsche Bank [2015] adds to the above requirements:
• Accessible - risk factors must be accessible at a level of cost that is sufficiently low to avoid dilution of the return.
• Fully transparent - strategies are fully systematic and work within well-defined rules.
Factor investing means alternative strategies defined on liquid assets, not the creation of new, illiquid asset classes. Transparency has changed radically in the last decade. In the past, an investment bank's offering of a momentum strategy was basically a black box for the investor. Today, each factor is constructed as an index with comprehensive documentation about the index mechanics, risks, and governance. Hedge funds often use factor investing strategies too, but often they are not transparent.
• First identify the key objectives of the portfolio and the preferences of the investor.
• Start with a long list of potential risk factors and select a core portfolio made up of the most attractive risk factors.
• Finalize the list of selected risk factors and construct a portfolio using a risk-parity methodology.
Figure 4.20, upper panel, shows the cross-asset risk factor list of DB and some key figures. Risk and return properties of the different risk factors differ. Therefore, if one invests in a portfolio with a target volatility to control downside risk, leverage is needed, since otherwise combining a low-vol 2% interest rate risk premium with a 12%-vol equity premium makes no sense. Table 4.11 shows monthly correlations. The lower triangular matrix correlations are calculated for turbulent markets; those for normal markets are shown in the upper triangular matrix.
The equally weighted portfolio of risk factors (ARP) has an annualized correlation of 4% in normal markets and 5% in stressed ones. The correlation with traditional asset classes is also low. Correlations between the different asset classes are, however, much larger. In this sense the risk factors are almost mutually uncorrelated compared to asset classes.
Figure 4.20: Upper panel: risk factor list of DB London. Risk factors are grouped according to their asset class base (equities, interest rates, credit, currencies, commodities) and the five styles used by practitioners. Lower panel: average annualized volatilities, returns, and Sharpe ratios for the risk factors (DB [2015]).
This lower correlation is due to the use of long and short positions: short positions give factors the appearance of lower correlations. We discuss in the next section that it is impossible to produce more efficient portfolios, in sample, by expressing exposures as factors instead of assets, as long as the investable units are the same.
Low beta portfolios - that is to say, portfolios of risk factors that have low correlation to equities and bonds in normal market periods and negative correlation to equities in turbulent markets - are of particular importance since they promise to resist a joint market downturn. Suitable risk factors are value and momentum risk factors for all asset classes, low beta risk factors, quality, and US muni curves vs. Libor. The correlation of this portfolio to equities is −1.6% and to bonds 7.6%. In turbulent markets, the correlation to equities is −37.5% and to bonds 8.8%. The Sharpe ratio is very high and the maximum drawdown is low, at −5.6%; see Table 4.12.
A deeper analysis of the correlation structure reveals that the risk factors can be clustered into three broad groups.
Table 4.11: The correlation in the top-left cell is the average correlation of the equally-weighted portfolio of factors (ARP) with all DB risk premia. PE means Private Equity. In the lower triangular matrix the correlations are calculated for turbulent markets; those for normal markets appear in the upper triangular matrix (DB [2015]).
Table 4.12: Summary statistics for the low beta portfolio (DB [2015]).
Following DB (2015), these groups are:
• High beta, higher information ratio factors. These factors exhibit high information
ratios but also contain some equity market risk.
• Low beta, stable correlation factors. Factors with moderate correlation levels which
are typically stable.
• Negative beta, lower information ratio factors. Factors that exhibit negative corre-
lations to equity markets.
This observation leads to timed factor portfolio investments; see the literature for details. We conclude this section by comparing a low-volatility portfolio of risk premia of JP Morgan - the 7.5% target volatility index - with the MSCI World; see Figure 4.21.
Figure 4.21: Top panel: the return of the JP Morgan risk premia index and the MSCI World, 2006-2016. The middle panel shows the cumulative returns of the two indices for three stress events. The bottom panel shows the monthly returns of the JPM index; for the three stress events - GFC, EU debt crisis, Q1 2016 - the returns of the MSCI are also shown (JPM [2016]).
The top panel shows that investing worldwide in a diversified way did not provide any positive return over the ten-year investment period if the concept of asset diversification is used. The JPM index, by contrast, showed an impressive performance. In more detail, the slope of the risk premia performance is not constant over the ten-year period: after the GFC 2008 until the end of 2012, returns were largest with very low risk. Then, for about one and a half years, there was a standstill period, followed by a positive return period with larger risks - the return chart is more zigzagged than in previous years. If we compare the performance of the JP Morgan index with the MSCI in the three stress periods - GFC, EU debt crisis, and Q1 2016 - we observe that the risk premia index did well compared to the MSCI in the GFC and the EU debt crisis: the construction mechanics of being uncorrelated with traditional asset classes in general and negatively correlated in market stress situations worked. In the Q1 2016 event, things are more complicated. While the same can be said for January and February 2016, the March data show that the risk premia index largely underperformed the MSCI. The reason, from an asset class perspective, was a sharp and fast rebound of stock markets after ECB president Draghi's speech. This rebound was too fast for the risk premia index's quarterly rebalancing frequency. Furthermore, Draghi's speech also affected credit risk premia in a way which is the exception rather than the rule: the credit spread tightening was more pronounced for the iTraxx Europe Main index than for the Crossover index of the same family. This means that risk factors collecting the credit risk premia generated negative returns since they were wrong in both the long and the short risk premia portfolios. A similar remark applies to interest rate risk premia.
How many risk factors are there? Harvey et al. (2015) use 313 published works and selected working papers and catalogue 316 risk factors, and Hou, Xue, and Zhang (2017) report 447 factors in their study. It is clear that not hundreds of factors will be rewarded; most are anomalies. We show in the section on backtesting that an appropriate use of statistical methods rules out most of them. Hou et al. (2017), for example, find that two-thirds of the 447 factors are insignificant at the 5 percent level using the usual critical t-value of two, and 85 percent become insignificant if a critical value of three is used.
Table 4.13 shows, for premia indices and for individual premia, that most premia fail to deliver the promised performance. Essentially, the indices show zero performance over the last three years. What are the reasons for this underperformance compared to the promising values of the premia providers in the previous section? First, backtesting is not used correctly; see the section on backtesting. Second, global stock markets closed out their worst year since the financial crisis. The equity market was weighed down by concerns about a slowing global economy, tightening monetary policy, and mounting geopolitical tensions (the trade war between the US and China, Brexit). The year on the stock market was marked by zigzag movements that followed one another quickly. This was an expression of uncertainty, which was exacerbated by the constant tweets of the US President: one day he made threats in the trade war, and the next he spoke again of great achievements. The long-short factor portfolios could not follow this rapid change; investors were often positioned on the wrong side.
Table 4.13: Performance of risk premia year-to-date (December 12, 2018) and for the last three years. In the upper part, the performances of risk premia indices are shown. Below, I selected the best and worst performing individual risk premia for the six asset classes on the three-year basis. (Source: HFR Database (2018)).
• Interval error: monthly estimates on the large past sample deviate from the annual forecast in the future period.
• Small-sample error: estimates on the large past sample can differ from estimates on the smaller future sample.
• Stationarity can differ between the large past and the smaller future samples.
• Reducing the dimensionality of the set of assets to a smaller set of factors reduces noise more effectively than reducing dimensionality to a smaller set of assets.
In all four cases the predictions for the future investment could be more reliable for factors than for assets. If this is the case, then factor allocation dominates asset allocation.
Before we consider these issues, we comment on the widely documented fact that the pairwise correlations among risk factors are often lower than those among asset classes. Does this imply that risk factors are superior to asset classes? Sources are Idzorek and Kowara (2013) and Martellini and Milhau (2015). Idzorek and Kowara (2013) first provide an answer in an idealized world where the number of risk factors equals the number of asset classes and unconstrained mean-variance optimization is considered. The same dimensionality of asset classes and risk factors implies a one-to-one relationship, and then, with no surprise, returns are the same.
The authors then consider a real-world example. They focus on liquid US asset classes and risk factors. The number of risk factors (eight) is not equal to the number of asset classes (seven). The data set consists of monthly data from January 1979 to December 2011. They first confirm that the average pairwise correlation for risk factors (0.06) is smaller than for asset classes (0.38). Besides the long-short construction of factors, another main reason is that the market
portfolio is part of the asset classes but not of the risk factors. The authors then consider two different time horizons to derive the optimal allocations: the full time series and, in the second case, January 2002 to December 2011.
The risk factor weights define a lower-dimensional space than the asset class weights, since there are more constraints for the long-short risk factors. This lower dimensionality seems to favour the asset classes. But it is in fact not possible to state which opportunity set is larger, since the exposure to risk factors can be −100% compared to asset classes, which are long only. Summarizing, the opportunity sets are complex, high-dimensional spaces and it is not possible to determine in general which set is larger. Since one efficient frontier dominates another only if the same type of opportunity set is used, both frontiers are subject to the same constraints, and the results are shown in the same return units as the inputs, it is not clear which optimal allocation - assets or factors - dominates.
Figure 4.22: Optimal asset classes versus optimal risk factors. Left panel: long time series. Right panel: short time series. The US asset classes are large value stocks, large growth stocks, small value stocks, small growth stocks, Treasuries, mortgage-backed assets, credit, and cash. The risk factors are market, size, value, mortgage spread, term spread, credit spread, and cash (Idzorek and Kowara [2013]).
The results indicate that by cherry-picking a particular historical time period, almost any desired result can be found. This illustrates that there is nothing obvious about the superiority of asset allocation based on risk factors, and the conclusion does not depend on the fact that historical data are used (Idzorek and Kowara (2013)).
The interval error arises if analysts' assumption that the square-root rule applies between one-month past estimates and longer periodicities is not true, that is, if lagged auto-correlations, which the evidence shows to be non-zero, are assumed to be zero. The standard deviation of the cumulative continuous return of x over N periods, R_{t,N} = Σ_{n=0}^{N−1} R_{t+n}, reads

    σ(R_{t,N}) = σ(R_{t,1}) √( N + 2 Σ_{m=1}^{N−1} (N − m) ρ_{t,t+m} ) .   (4.58)
If all auto-correlations are zero, the square-root rule follows. A similar formula holds for the correlation between the cumulative returns over N periods and the one-period case. Again, the longer-interval correlations will differ from shorter-interval correlations due to the auto-correlations. The error due to non-zero lagged correlations, the interval error IE, is defined as the absolute difference between the parameter estimate R_1 using the full-sample, one-month returns and the estimate from the full-sample, three-year rolling returns R_R, scaled by the full-sample standard deviation of three-year returns:

    IE = |R_1 − R_R| / σ_R .

An IE of 0.3 means that the parameter value estimated from monthly returns is 0.3 standardized units away from the parameter value estimated from three-year returns, expressed in monthly units. We refer to Cocoma et al. (2017) for the definitions of the small-sample error (SSE) and the independent-sample error (ISE).
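As an illustration of (4.58), the following minimal Python sketch (all inputs hypothetical) compares the autocorrelation-adjusted N-period volatility with the square-root rule:

    import numpy as np

    sigma_1, N = 0.02, 12                        # monthly volatility, annual horizon
    rho = 0.1 * 0.5 ** np.arange(1, N)           # hypothetical decaying lag-m autocorrelations

    # Equation (4.58): sigma(R_{t,N}) = sigma(R_{t,1}) * sqrt(N + 2 * sum (N - m) rho_m)
    scale = np.sqrt(N + 2 * sum((N - m) * rho[m - 1] for m in range(1, N)))
    print(sigma_1 * scale, sigma_1 * np.sqrt(N))  # adjusted volatility vs square-root rule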
The authors distinguish between two types of assets - asset classes and industry groupings - and three types of factors: fundamental factors, security attributes, and statistical factors derived from principal components analysis. We refer to their paper for the construction of the factors, the asset selection, and the various tests.
They summarize that no evidence was found that factors produce more stable results for IE, SSE, and ISE than assets across varying frequencies. On the contrary, they found evidence of the opposite, on average. The same conclusion follows when comparing the complexity reduction of the asset set to factors versus the reduction to a smaller asset set: no evidence was found that factors are meaningfully more effective than assets at noise reduction.
where α is the intercept, β_{i,M} the slope or regression coefficient, and ε_t the standard normal error term satisfying (2.2). Beta measures the unit change in the stock's excess return for every unit change in the market excess return. The intercept indicates the performance of the stock that is not related to the market and that a portfolio manager attributes to her skills.
Example
Consider the linear regression between a European equity fund's returns (dependent variable) and the EUROSTOXX 50 index (independent variable). For 20 observation dates, statistical analysis implies the estimates β = 1.18 and SEE = 0.147 with 18 = 20 − 2 degrees of freedom. The Student's t-value at the 0.05 significance level with 18 degrees of freedom is 2.101. This implies the confidence interval 1.18 ± (0.147) × (2.101) = [0.87, 1.49]. There is only a 5 percent chance that β is either less than 0.87 or greater than 1.49; with 95% confidence, the fund is at least 87 percent as volatile as the EUROSTOXX 50, but no more than 149 percent as volatile, based on our sample.
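A minimal sketch of this interval computation in Python, using the numbers of the example (scipy assumed to be available):

    from scipy.stats import t

    beta_hat, see, dof = 1.18, 0.147, 18     # estimates from the example
    t_crit = t.ppf(0.975, dof)               # two-sided 5% level -> 2.101
    lower, upper = beta_hat - t_crit * see, beta_hat + t_crit * see
    print(f"t_crit = {t_crit:.3f}, CI = [{lower:.2f}, {upper:.2f}]")   # ~[0.87, 1.49]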
The risk premium of asset i is E(R_i) − R_f and the market portfolio risk factor is F = E(R_M) − R_f. The CAPM states that some assets have higher average returns than others, but it is not about predicting returns: an asset has a higher expected return because of a large beta, and not the other way around. Furthermore, projection theory implies that beta is the projection coefficient:

    β_{i,M} = cov(R_i, R_M) / σ²(R_M) .   (4.61)
Summarizing, the time series regression (4.59) fixes the β which enters the CAPM model (4.60), and the CAPM predicts that alpha should be zero.
The linear relation (4.60) in the CAPM between the excess return of an asset and the market excess return follows from the following assumptions:
• All investors have the same beliefs about the future security values.
• Investors can borrow and lend at the risk-free rate, short any asset, and hold any fraction of an asset.
• There is a risk-free asset in zero net supply. Since markets clear in equilibrium, total supply has to equal total demand. Given the net supply of the risk-free asset, we combine the investors' portfolios to get a market portfolio. This will imply that the optimal risky portfolio for each investor is the same.
• All information is accessible to all investors at the same time - there is no insider information.
• Markets are perfect: there are no frictions such as transaction costs or lending or borrowing costs, no taxes, etc.
• For each asset i, the linear relationship between risk and return (the security market line [SML]) (4.60) holds:

    E(R_i) = R_f + β_{i,M} (E(R_M) − R_f) ,

with the beta given in (4.61) measuring the risk between asset i and the market portfolio M.
The SML implies that beta measures how systematic risk is rewarded in the CAPM; no idiosyncratic risk enters the SML. There is no reward, via a high expected rate of return, for taking on risk that can be diversified away. A higher beta value does not imply a higher variance, but a higher expected return.
26 If an asset i is uncorrelated with the market, its beta is zero although the volatility of the asset may be arbitrarily large.
27 β = 1 implies E(R_i) = E(R_M), β = 0 implies E(R_i) = R_f, and β < 0 implies E(R_i) < R_f.
The behavioural assumption that all investors consider the same mean-standard deviation chart implies that all hold a mean-variance efficient portfolio. By the mutual fund theorem, each minimum variance portfolio is a combination of a riskless asset and a fixed risky asset portfolio. Therefore, all investors invest in all risky assets in the same proportions. Since demand equals supply in the asset market equilibrium, all investors must hold the market portfolio, which in turn is mean-variance efficient. Therefore, no investor needs to perform a mean-variance analysis but can just invest in the market portfolio.
The linearity of (4.62) implies that the portfolio beta is the sum of the asset betas multiplied by the portfolio weights. In the CAPM, all optimal portfolios are a combination of the risk-free portfolio and the market portfolio. Tobin's separation states how individually tailored portfolios can be constructed: first, the portfolio manager constructs the risk-free and market portfolios; then, the investment advisor determines the client's risk profile, which fixes the optimal allocation between risk-free and risky investments.
Inserting cov(R_k, R_M) = ρ(k, M) σ_k σ_M in (4.62) implies for the Sharpe ratio

    SR_k := (µ_k − R_f) / σ_k = ρ(k, M) (µ_M − R_f) / σ_M .   (4.63)
The Sharpe ratio of asset k is equal to the slope of the CML times the correlation coefficient. Comparing SML and CML, see Figure 4.23: all portfolios lie on the SML but only efficient portfolios lie on the CML. Finally, the SML plots rewards versus systematic risk while the CML plots rewards versus total risk.
Consider three risky assets, the market portfolio, and a risk-free asset given by the data in Table 4.17 (taken from Kwok (2010)). The CML implies, at the standard deviation levels of 10 percent and 20 percent, respectively, expected returns of 13 percent and 16 percent. Therefore portfolio 1 is efficient,
28 A portfolio lies on both the SML and CML if the correlation between the portfolio return and the market portfolio is 1.
Figure 4.23: Left panel - capital market line in the Markowitz model. Right panel - security market line in the CAPM model. As an exercise, assume that the borrowing and lending rates are different and draw the CML for these two rates.
but the other two portfolios are not. Portfolio 1 is perfectly correlated with the market portfolio, but the other two portfolios have non-zero idiosyncratic risk. Since portfolio 2 has a correlation closer to one, it lies closer to the CML. The expected rates of return of the portfolios for the given values of beta, calculated with the SML, agree with the expected returns in the table; for portfolio 1,

    µ = µ_f + (µ_M − µ_f) β = 10% + 6% × 0.5 = 13% .

    Portfolio           σ      ρ with R_M   β     µ
    1                   10%    1            0.5   13%
    2                   20%    0.9          0.9   15.4%
    3                   20%    0.5          0.5   13%
    Market portfolio    20%    1            1     16%
    Risk-free asset     0%     0            0     10%
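A short sketch reproducing the µ column of the table from the SML:

    rf, mu_m = 0.10, 0.16                    # risk-free rate and market return from the table
    betas = {"Portfolio 1": 0.5, "Portfolio 2": 0.9, "Portfolio 3": 0.5}
    for name, beta in betas.items():
        mu = rf + beta * (mu_m - rf)         # SML: mu = mu_f + (mu_M - mu_f) * beta
        print(f"{name}: mu = {mu:.1%}")      # 13.0%, 15.4%, 13.0%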
Jensen's alpha

    α_k := µ_k − R_f − β_k (µ_M − R_f)   (4.64)

is a performance measurement between the realized and theoretical returns of the CAPM. Since alpha is a return, it can be used for the compensation of portfolio managers. While the Sharpe ratio can be illustrated in the return-volatility space, Jensen's alpha is shown in the return-beta space. Jensen's alpha measures how far above the SML the asset's performance is. It does not consider the systematic risk that an investment took on to earn the alpha. The Treynor ratio (TR) adjusts for this systematic risk taken:

    TR_k := (µ_k − R_f) / β_k .
The TR equals the slope of the SML for the actively managed portfolio. If the CAPM holds, then the Treynor ratio is the same for all securities. Both the Jensen and Treynor measurements do not adjust for idiosyncratic risk in the portfolio.
The appraisal ratio (AR) or information ratio (IR) divides the excess return over the benchmark by the tracking error (TE). Values of the IR around 0.5 are considered good, while a value greater than 1 is extraordinary. The IR generalizes the Sharpe ratio since it replaces the risk-free rate by a passive benchmark.
The beta of A is equal to its market portfolio correlation times its volatility divided by the market volatility - that is, 0.9 × 15%/20% = 0.675. The Sharpe ratio for A is SR = (12% − 4%)/15% = 0.53. Jensen's alpha for portfolio A reads 12% − 4% − 0.675 × (15% − 4%) = 0.575%, and the Treynor ratio for A is given by (12% − 4%)/0.675 = 0.119. The IR and the TE follow in the same way. We finally get:
It follows that portfolio C is the best portfolio. We summarize the relevance of the different performance measurements:
• Beta is relevant if the individual risk contribution of a security to the portfolio risk is considered.
• TE is relevant for risk budgeting issues and risk control of the portfolio manager relative to a benchmark.
• The Sharpe ratio is relevant if return compensation relative to total portfolio risk is considered.
• Jensen's alpha is the maximum amount one should pay an active manager.
• The Treynor measurement should be used when one adds an actively managed portfolio to a passive portfolio that already contains many actively managed ones.
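A minimal sketch of these measures, using the numbers for portfolio A from the example above (µ_A = 12%, σ_A = 15%, ρ = 0.9, µ_M = 15%, σ_M = 20%, R_f = 4%):

    mu_a, sig_a, rho = 0.12, 0.15, 0.9       # portfolio A from the example
    mu_m, sig_m, rf = 0.15, 0.20, 0.04       # market and risk-free rate

    beta = rho * sig_a / sig_m                       # 0.675
    sharpe = (mu_a - rf) / sig_a                     # 0.53
    jensen_alpha = mu_a - rf - beta * (mu_m - rf)    # 0.00575 = 0.575%
    treynor = (mu_a - rf) / beta                     # 0.119
    print(beta, sharpe, jensen_alpha, treynor)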
Warnings: if return distributions are not normal - they show fatter tails, higher peaks, or skewness - then the use of these ratios can be problematic, since moments higher than the second contribute to risk. Furthermore, the IR depends on the chosen time period and benchmark index. Finally, the chosen benchmark index affects all benchmark-based ratios: managers benchmarked against the S&P 500 Index can show a lower IR than the same managers benchmarked against a different index.
Standard assumptions for testing the CAPM are rational expectations - in particular, realized returns are a proxy for expected theoretical returns - and that the holding period of assets is known, typically one month. The CAPM equation to be tested raises several questions: Are betas stable measures of systematic risk? Are the expected returns linearly related to the betas (Q1)? Is beta the only systematic risk measure (Q2)? Does the expected return of the market portfolio exceed the expected return of assets uncorrelated to the market (Q3)? Finally, do assets uncorrelated to the market portfolio earn the risk-free rate (Q4)? There are two linear tests of the CAPM equation: first, the returns of different assets are regressed on the betas (cross-section, Q2, Q3); second, the CAPM equation for each individual asset is regressed over time (time series, Q4).
The cross-sectional regression is used to test the CAPM equation over a period of T years. Since expected returns are not measurable, the CAPM equation is tested for average annual realized returns. The temporal individual asset test using time-series regression tests the CAPM on a number of fixed sub-periods up to time T: the excess asset return is regressed on the excess market return in each sub-period.
Using the time series regression equation to estimate alpha, beta, and epsilon, it follows that the estimates β̂ of beta are volatile both for stocks and for sectors; see Figure 4.24. Since the CAPM is only interesting for portfolios where beta is the significant risk measure, an application to single securities does not make sense.
We consider tests of the CAPM, restricting ourselves to three key papers. The beta instability led to CAPM tests for portfolios only. The first one is the paper of Black, Jensen, and Scholes (1972).
Figure 4.24: Beta estimates for AT&T (left panel) and the oil industry (right panel)
(Papanikolaou [2005]).
Consider the cross-section where the factors F are non-traded portfolios and a risk-free rate exists. Using the time series regression, estimates of factor risk premia and pricing errors can be obtained. But in the cross-section, the estimation is simplified using two-pass regressions: first, betas are estimated from the time-series regressions,
29 Consider a period of, say, 50 years, i.e. 600 months. Use, say, 60 months to estimate the beta for each stock (pass one). Rank the securities by the estimated betas and form ten portfolios. Recalculate the betas for the next five years, and so on, which defines a rolling regression. We then have monthly returns for the time period minus five years for each portfolio. Calculate mean portfolio returns and estimate the beta coefficient for each of the 10 portfolios. This provides beta estimates for the portfolios. Do pass two for the portfolios, i.e. regress the portfolio means against the portfolio betas, that is, estimate the ex-post SML.
and then a cross-sectional regression of average returns on the betas follows; that is, the estimated betas are the explanatory variables in the second step. The pricing errors are given by the cross-sectional residuals α̃. The estimates of the cross-section can be obtained by OLS or, since the cross-sectional residuals are correlated, by GLS, which yields more efficient estimates. The betas in the second-pass cross-sectional regression are time series estimates, which leads to the problem of errors-in-variables. Shanken (1992) showed how to correct the standard errors of the risk premium and pricing error estimates. The predictions of the CAPM are that alpha is zero, that lambda equals the market premium, and that any other variables are zero. Typically, alpha is estimated to be positive, lambda is positive but smaller than the market premium, and other factors are not rejected.
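A minimal two-pass sketch on simulated data (all numbers hypothetical, not the cited studies' data): pass one estimates betas from the time series, pass two regresses average returns on those betas:

    import numpy as np

    rng = np.random.default_rng(0)
    T, N = 240, 25                            # months and number of portfolios
    f = rng.normal(0.005, 0.04, T)            # market excess return
    true_beta = rng.uniform(0.5, 1.5, N)
    R = np.outer(f, true_beta) + rng.normal(0, 0.02, (T, N))

    betas = np.polyfit(f, R, 1)[0]            # pass one: time-series betas

    X = np.column_stack([np.ones(N), betas])  # pass two: mean returns on betas
    alpha, lam = np.linalg.lstsq(X, R.mean(axis=0), rcond=None)[0]
    print(alpha, lam)                         # CAPM predicts alpha ~ 0, lambda ~ E[f]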
So far, all results assume a small number of assets, and the estimators are consistent in the time horizon T. If the number of assets increases for a fixed T, the errors-in-variables problem also leads to biased and inconsistent coefficient estimates. Shanken (1992b) derives an estimator that is N-consistent. Finally, Gagliardini et al. (2011) explore the properties of these estimators when both T, N → ∞.
Suppose that the R² is large in the cross-sectional CAPM equation (4.60). The CAPM then explains the cross-section of average returns successfully and the alpha in the cross-section is small. This can be the case even if the R² of the time series regression (4.59) is low. The main goal of the CAPM is to see whether high average returns in the cross-section are associated with high values of the factors.
Summarizing, the findings are that excess returns on high-beta stocks are low, that excess returns are high for small stocks, and that value stocks have high returns despite low betas, while momentum stocks have high returns and low betas. The CAPM does not explain why, in the past, firms with high B/M ratios outperformed firms with low B/M ratios (value premium), or why stocks with high returns during the previous year continue to outperform those with low past returns (momentum premium). Despite these findings, the CAPM is used for determining the appropriate compensation for risk, serves as a benchmark model for other models, and is elegantly simple and intuitive.
The conditional CAPM works as follows. Consider two stocks. Suppose that the times of recessions and expansions are not of equal length in an economy, that the market risk premia differ, and that the two stocks have different betas in the different periods. The CAPM then observes only the average beta of each stock over both periods. Assume that this average beta is 1 for both stocks. The CAPM will then predict the same excess return for the two stocks. But in reality, due to their heterogeneity, the two stocks will show different returns in the two economic periods. One stock can, for example, earn a higher return than explained by the CAPM since its risk exposure increases in recessions, when bearing risk is painful, and decreases in expansions. Such a stock is therefore riskier than the CAPM suggests, and the CAPM would detect an abnormally high return, suggesting this is a good investment. The conditional CAPM corrects this since the return comes from bearing the extra risk of undesirable beta changes.
Lewellen and Nagel (2006) did not question the fact that betas vary considerably over time. But they provide evidence that betas do not vary enough over time to explain large unconditional pricing errors. As a result, the performance of the conditional CAPM is similarly poor as that of the unconditional model: it is unlikely that the conditional CAPM can explain asset-pricing characteristics like book-to-market and momentum. These statistical criticisms are not unique to the CAPM; most asset pricing models are rejected in tests with power.
While the CAPM has a theoretical foundation, the FF model is an ad hoc model introduced to better fit empirical data. The three-factor model is routinely included in empirical research.
We follow Kenneth French's web site for the FF factor construction. The factors are constructed using the six value-weighted portfolios formed on size and book-to-market.
• SMB (small minus big) is the average return on the three small portfolios minus the average return on the three big portfolios,

    SMB = 1/3 (Small Value + Small Neutral + Small Growth) − 1/3 (Big Value + Big Neutral + Big Growth) .

• HML (high minus low) is the average return on the two value portfolios minus the average return on the two growth portfolios,

    HML = 1/2 (Small Value + Big Value) − 1/2 (Small Growth + Big Growth) .
• Whether a stock belongs to, say, Small Value depends on its ranking: Small Value contains all stocks whose market value is smaller than the median market value of, say, the NYSE and whose book-to-market ratio is larger than the 70th-percentile book-to-market ratio of NYSE stocks; Small Growth analogously requires a book-to-market ratio below the 30th percentile.
• SMB for July of year t to June of t + 1 includes all NYSE, AMEX, and NASDAQ stocks for which market equity data exist for December of t − 1 and June of t, and (positive) book equity data for t − 1.
Why should one include factors which cannot explain average returns? The CAPM worked until stocks were grouped by their book-to-market ratio (value), but it still works when stocks are grouped according to their size. If FF were only to consider factors which explain average returns, they could have left size out. But size is important for reducing return variance.
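A minimal sketch of the two formulas above, with hypothetical monthly returns for the six portfolios:

    # Hypothetical monthly returns of the six value-weighted size/B-M portfolios
    small_value, small_neutral, small_growth = 0.012, 0.010, 0.008
    big_value, big_neutral, big_growth = 0.009, 0.008, 0.007

    smb = (small_value + small_neutral + small_growth) / 3 \
        - (big_value + big_neutral + big_growth) / 3
    hml = (small_value + big_value) / 2 - (small_growth + big_growth) / 2
    print(f"SMB = {smb:.4f}, HML = {hml:.4f}")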
Suppose the CAPM holds exactly,

    E(R_k) = β_k E(R_M) ,

where we set the risk-free rate to zero. Include an additional industry portfolio in the regression, i.e.

    R_{t,k} = α_k + β_{k,M} R_{t,M} + β_{k,I} R_{t,I} + ε_{t,k} .

The regression generically leads to a coefficient β_{k,I} > 0, and taking expectations gives E(R_k) = β_{k,M} E(R_M) + β_{k,I} E(R_I). This additional industry portfolio return term contradicts the CAPM being perfect. To resolve the puzzle, one uses a nested projection approach: first project the industry portfolio on the market return,

    R*_{t,I} := R_{t,I} − β_{I,M} R_{t,M} .

This is equivalent to beta-hedging the portfolio. The expected value of the new return is zero if the CAPM is right. Run a regression on this orthogonality-adjusted CAPM: this improves the R² and the t-statistics and reduces the volatility of the residual, while the mean of the CAPM is unchanged.
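A minimal simulated sketch of the beta-hedging step (all data and coefficients hypothetical):

    import numpy as np

    rng = np.random.default_rng(1)
    T = 600
    r_m = rng.normal(0.005, 0.04, T)                      # market return
    r_i = 0.8 * r_m + rng.normal(0, 0.02, T)              # industry return
    r_k = 1.1 * r_m + 0.3 * r_i + rng.normal(0, 0.02, T)  # stock return

    beta_im = np.polyfit(r_m, r_i, 1)[0]     # project industry on market
    r_i_star = r_i - beta_im * r_m           # beta-hedged industry return

    X = np.column_stack([np.ones(T), r_m, r_i_star])
    print(np.linalg.lstsq(X, r_k, rcond=None)[0])   # intercept, market beta, hedged-industry loading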
The FF model increases the R² of the portfolios from 78 percent to 93 percent in the FF portfolios. Roncalli (2013) states that the improvement in the R² is not uniform:
• The difference in R² between the FF model and the CAPM is between 18 percent and 23 percent in the period 1995-1999.
• The difference then decreases and is around 11 percent during the GFC.
• In the period starting after the GFC and running until 2013, the difference is 7 percent.
• SMB and HML explain the variation of returns across stocks; the market factor explains why stock returns are on average higher than the risk-free rate.
Are the FF factors global or country specific? Griffin (2002) concludes that the FF model exhibits its best performance on a country-specific basis. This view is largely accepted. While FF originally performed regressions on portfolios of stocks, Huij and Verbeek (2009) and Cazalet and Roncalli (2014) provide evidence that mutual fund returns are more reliable than stock returns, since transaction costs, trade impact, and trading restrictions have less impact.
Figure 4.25 illustrates the performance of the different FF factors and of the momentum factor since 1991. The size factor only generates low returns; this is the reason why most risk premia providers do not offer size risk premia. Cyclicality is common to most risk factors: some factors show persistent excess risk-adjusted returns over long time periods, but over shorter horizons they show cyclical behavior with underperformance. Ang (2013) argues that the premia exist to reward long-horizon investors for bearing that risk.
31 The figure shows periods with momentum crashes. Heavy monthly losses occurred during the Great Depression; the risk factor faced losses of up to 50 percent in one month. The risk factor performed much better in the post-WWII period until the burst of the dot-com bubble: in this period, investing, say, USD 100 in 1945 led to a payback of USD 3,500 around 50 years later. The average monthly return over the whole period is 0.67 percent.
Figure 4.25: Left panel - FF annual factor performance in the period 1991-2014, starting each year in January and ending in December; Mkt is the market return, RF the risk-free return, and WML the momentum factor. Right panel - monthly returns of the momentum risk factor, 1927-2014 (Kenneth French's web site).
FF (1993) tested their model in the period 1963-1991. They rejected the assertion that all intercepts from the regression of excess stock returns on the excess market return, SMB, and HML are zero. The FF model performed better than any single-factor model and failed only slightly, due to the low-B/M portfolios: their return was too low and the return on the big-size portfolio was too high, i.e. the size effect was missing in the lowest-B/M quintile.
    M_t / B_t = Σ_{j=1}^{∞} E_t[ (Y_{t+j} − ΔB_{t+j}) / (1+R)^j ] / B_t ,   (4.69)
with M the current market cap, Y total equity earnings, ΔB the change in total book value in the period, and R the internal rate of return of the expected dividends. Equation (4.69) follows from the fundamental pricing equation; see Equation (3.139). Equation (4.69) implies that the B/M value is an imperfect proxy for expected returns: the market cap M also responds to forecasts of earnings and investment (expected growth in book value), which define the two new factors. The regression (4.67) reads (neglecting time indices)

    R_i − R_f = β_{i,M} (R_M − R_f) + Σ_{k ∈ {SMB, HML, RMW, CMA}} β_{i,k} R_k + α_i + ε_i ,   (4.70)

with R_RMW the profitability risk factor (difference between robust and weak profitability) and R_CMA the investment risk factor (difference between low- and high-investment firms). This is again an exact factor model by definition. The explicit construction of the risk factors is a long/short combination similar to (37); see Fama and French (2015).
Fama and French (2015) rst analyze the factor pattern in average returns following
the construction of the three-factor model:
Figure 4.26: Return estimates for the 5x5 size and B/M sorts. Size is shown on the
vertical and B/M on the horizontal. OP are the earnings factor portfolios and Inv the
investment factor portfolios. Returns are calculated on a monthly basis in excess to the
one-month US treasury bill rate returns. Data start in July 1963 and end in December
2013, thus covering 606 months (Fama and French, 2015).
Panel A in Figure 4.26 shows that average returns typically fall from small to big stocks - the size effect. There is only one outlier - the low portfolio. In every row, the average return increases with B/M - the value effect. It also follows that the value effect is stronger among small stocks. In Panel B, the B/M sort is replaced by operating profitability (OP), as defined in Fama and French (2015). Patterns are similar to the size-B/M sort in Panel A: for every size quintile, extremely high rather than extremely low operating profitability is associated with a higher average return. In Panel C, the average return on the portfolio in the lowest investment quintile dominates the return in the highest quintile. Furthermore, the size effect exists in the lowest four quintiles of the investment factor.
The authors perform an analysis to isolate the effect of the factors on average return. The main results are:
• Persistent average return patterns exist for the factors HML, CMA, RMW, and SMB.
• The model explains between 71 percent and 94 percent of the cross-section variance of expected returns for HML, CMA, RMW, and SMB.
• HML (value) becomes a redundant factor: its high average return can be completely generated by the other four factors, in particular by RMW and CMA.
• Small stock portfolios with negative exposure to RMW and CMA are problematic: negative CMA exposures are in line with evidence that small firms invest a lot, whereas negative exposure to RMW is not in line with a low profitability.
Why did Fama and French not include momentum? Asness et al. (2015) state that momentum and value are best viewed together, as a system, and not stand-alone. Therefore, it is not a surprise that value becomes redundant in the five-factor model, where momentum is not considered. The authors then redo the estimation of the five-factor model, where they also find that HML can be reconstructed and is better explained by a combination of RMW and CMA; but the other direction is not true - CMA cannot be explained, for example, by HML and RMW. The authors then add momentum, which is negatively correlated to value: value then becomes statistically significant in explaining returns.
quarterly or even semi-annually. Second, some factors are pro-cyclical with the business cycle while others are historically defensive or not related to the business cycle. Value, growth, momentum, size, and liquidity are pro-cyclical; factors exploiting volatility, yield, and quality are defensive or of low volatility. This suggests that there should be discretionary control over which factors are included in the investment portfolio. Given the periodicity of the cyclical behavior, such control should take place on an annual or even bi-annual basis.
The portfolios are based on factor scores. They rank each individual factor f at each date t and normalize it from zero (worst) to one (best) to obtain a factor score s_{f,i,t} for each security i. Assuming N securities and F factors, they write S_t for the F × N factor score matrix. To build the aggregate score a_t, the factor scores are weighted with time-dependent weights φ_t, i.e. a_t = φ_t' S_t. The essential part of the model is the choice of the weights φ. As a benchmark, they use the naive strategy that equally weights the factor scores over time. Setting the weights equal to the sign function of past factor returns provides a first timing rule, which does not take into account the interaction effects of the other factors. To include interaction among different factors, a one-month momentum strategy that invests in the optimal weight combination of the most recent month is created; see the paper for the details of how this is done. The final timing strategy is to use the information in the covariance matrix, i.e. a minimum-risk optimization. To estimate the large-dimensional covariance matrix, the shrinkage estimator of Ledoit and Wolf (2003) is used. Figure 4.27 shows the results for the different strategies.
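A minimal sketch of the score-weighting step a_t = φ_t' S_t, with hypothetical dimensions and data (not the authors' implementation):

    import numpy as np

    rng = np.random.default_rng(2)
    F, N = 4, 10                              # factors and securities (hypothetical)
    S_t = rng.uniform(0, 1, (F, N))           # factor scores in [0, 1]
    past_factor_returns = rng.normal(0, 0.02, F)

    phi_naive = np.ones(F) / F                # benchmark: equal weights
    phi_sign = np.sign(past_factor_returns)   # first timing rule: sign of past factor returns

    print(phi_naive @ S_t)                    # aggregate score per security, naive weights
    print(phi_sign @ S_t)                     # aggregate score per security, sign weights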
Figure 4.27: The cumulative logarithmic excess return in US dollars above the one-month Treasury bill rate of the value-weighted market portfolio in the US (MKT US, dashed line), and the optimized one-month momentum strategies 3 FF, 5 FF, and 5 FF including momentum (FFC), together with the excess return of the Opt strategy over the MKT strategy for transaction costs of 0, 5, 10, and 15 bps (bottom). The analyzed period is from July 1963 to June 2018. Source: Leippold et al. [2019].
implementable investment strategy. A client portfolio can differ from the model portfolio due to frictions such as tax constraints, for example. On average, advisor portfolios contain 17 direct holdings, with mutual funds and ETFs the main instrument types. Advisor portfolios are grouped into five classes, from conservative portfolios (<30% equities) to aggressive portfolios (>80% equities). This classification is comparable to the balanced, growth, etc. classification which is common in Europe. Most advisor portfolios have an equity weight of 50% to 65%.
A hierarchical factor model is used to compare the many different types of portfolios. The first factor level consists of the macro factors: economic growth (which is mostly accessed through equities), real rates, inflation, credit, emerging markets, and commodities. Each macro factor is proxied by a representative portfolio; economic growth is, for example, modelled as a weighted basket of various equity indices from around the world. Advisor portfolios are dominated by exposure to economic growth: for 88% of advisor portfolios, economic growth risk accounts for 74.7% of portfolio volatility. On average, rates and credit exposures explain 66.9% and 21.8%, respectively, of fixed income variation. But the U.S. Bloomberg Barclays Aggregate Bond Index is 104.3% rates and -4.9% credit - investors are relatively short rates and long credit. Furthermore, advisor models are consistently short duration to safeguard against the potential of unexpectedly rising interest rates. The second level consists of style factors, which allow comparison within specific asset classes. For equities, they investigate the exposure to value, momentum, small size, and low volatility strategies. Advisors do not have meaningful style factor exposures in equities except for small size stocks. Table 4.18 summarizes the statistics.
Table 4.18: Statistics for advisor portfolios. The total number of BlackRock portfolios, collected between October 2017 and September 2018, is 9'940 as of September 30, 2018. Ex-ante average annual volatilities as of 9/30/2018. The benchmark for the Conservative cohort is 11% S&P 500, 4% MSCI All Country World ex US, and 85% Bloomberg Barclays U.S. Universal Index; for the other cohorts the weights of the three indices vary. Fees are in bps. (Lawler et al. (2018)).
The average number of individual equity holdings is 3.5 and the median is 2.5. Figure 4.28 shows the breakdown of macro factors across the different cohorts, with an increasing exposure to equity for more aggressive advisors and an overall insignificant exposure to style factors.
4.6 Backtests
Backtests are historical simulations of quantitative investment strategies: they compute the P&L the strategy would have generated had it been run over the given time period. The performance is expressed using performance measures such as the Sharpe ratio. Backtests often look very promising. But many practitioners fear that once they invest in a backtested strategy, the backtest performance evaporates. This fear is justified if statistics are not used appropriately, as is too often the case.
Figure 4.28: Decomposing macro factor exposure; Other Macro means mostly exposure to equity. As of 9/30/2018. Source: Lawler et al. (2019).
An investment strategist believes that the following mathematical proposition of Fermat regarding prime numbers provides meaningful investment signals:
Proposition 80. For any prime number p, the division of 2^{p−1} by p always leads to a remainder of 1.
Dividing 2^{13−1} by 13, for example, gives 315 with a remainder of 1. This holds for all prime numbers. But the converse is not true: if the division of 2^{p−1} by p leads to a remainder of 1, it does not follow that p is a prime number. The converse is, however, 'almost true': there are very few numbers that satisfy the division property and are not prime. In the first 10,000 numbers there are only seven such numbers, one of them being 1105.
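A minimal sketch verifying the proposition and searching for non-primes that pass the test (base-2 pseudoprimes) over a small range:

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    print(pow(2, 13 - 1, 13))    # remainder of 2**(13-1) divided by 13 -> 1

    # Odd non-primes passing Fermat's test: base-2 pseudoprimes
    pseudo = [n for n in range(3, 2000, 2)
              if pow(2, n - 1, n) == 1 and not is_prime(n)]
    print(pseudo)                # includes 341, 561, 1105, ...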
He relates this prime number property to stock market performance as follows: select those stocks where one of these seven numbers is embedded in the CUSIP identifier. Given the aforementioned seven numbers, there is only one CUSIP code that contains such a number: CUSIP 03 110510. This CUSIP represents the stock Ametek, which had exhibited, by the time of Lo's writing, extraordinary performance: a Sharpe ratio of 0.86, a Jensen alpha of 5.15, a monthly return of 0.017, and so on.
There is no reason why the link 'Prime Number Theorem - CUSIP Stock Selection' should work in general; the return driven by this relationship is pure luck. Here, the highly significant performance is a product of chance, not of a meaningful signal.
32 A CUSIP is a nine-character alphanumeric code that identifies a North American financial security for the purposes of facilitating clearing and settlement.
Consider an order statistics example. Assume that there are N = 100 IID securities with normally distributed annual returns with a mean of 10 percent and a standard deviation of 20 percent. The probability that the return of security k exceeds 50 percent is then 2.3 percent; it is unlikely that security k will show this strong return. But if we ask for the winner return - that is to say, the probability that the maximum return will exceed 50 percent - the probability is 90 percent.
But this winning question does not tell us anything about the nature of the winning stock, since the returns are IID distributed. Nothing can be inferred about the future return if one knows at a given date which stock is the winner. Choosing today the past winner and predicting that it will also be the future winner is data snooping; the prediction is only related to luck.
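A minimal sketch of the two probabilities (scipy assumed to be available):

    from scipy.stats import norm

    p_single = 1 - norm.cdf(0.50, loc=0.10, scale=0.20)   # P(return of security k > 50%)
    p_max = 1 - (1 - p_single) ** 100                     # P(max of 100 IID returns > 50%)
    print(f"single: {p_single:.1%}, max of 100: {p_max:.1%}")   # ~2.3% and ~90%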
4.6.2 Overfitting
Researchers in investment algorithms often publish in-sample results without stating the number of trials. Not reporting the number of all trials increases the probability of overfitting: the published investment algorithm fails to fit additional data or to predict future observations reliably. There is a risk that a strategy with a high in-sample Sharpe ratio but a zero out-of-sample Sharpe ratio is reported. Consider an investment algorithm for stock investment where 1000 paths are simulated: if one selects and publishes the best performing path, then all investors using this algorithm will be disappointed.
The following example is from Bailey et al. (2014). Consider an IID sequence of normal returns with mean µ and volatility σ. The annualized Sharpe ratio can be computed as (Lo (2002))

    SR = (µ/σ) √T ,

where T is the number of returns per year. The true values of the drift and volatility are not known. Hence they are estimated, leading to an estimated annualized Sharpe ratio SR̂. Lo proves that this estimate converges asymptotically, for large y (the number of years used to estimate the Sharpe ratio), to

    SR̂ → N( SR, (1 + SR²/(2T)) / y ) .

For µ = 0 and y = 1 this reduces to SR̂ → N(0, 1).
The key quantity is the expected maximum of the estimated Sharpe ratios across N independent trials; see Figure 4.29.
Figure 4.29: Overfitting of backtests for µ = 0 and y = 1 (expected maximum Sharpe ratio versus the number of trials N) and the minimum expected backtest length.
The results carry over to y ≠ 1 by scaling the above result. Again, the more independent configurations a researcher tries, the more likely overfitting becomes. Hence, increasing N requires a higher acceptance threshold for the backtested result to be trusted. By increasing the sample size y, the above overfit problem can be at least partially mitigated. This means that a minimum backtest length can be calculated such that one does not select an in-sample strategy whose Sharpe ratio equals the expected maximum but whose expected out-of-sample Sharpe ratio is zero; see Figure 4.29.
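A Monte Carlo sketch of this effect: under the null (µ = 0, y = 1), each estimated Sharpe ratio is approximately N(0, 1), and the expected maximum across N trials grows with N despite zero skill:

    import numpy as np

    rng = np.random.default_rng(3)
    for n_trials in (10, 100, 1000):
        max_sr = rng.normal(0, 1, (5000, n_trials)).max(axis=1)
        print(n_trials, round(max_sr.mean(), 2))   # expected maximum in-sample Sharpe ratio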
This trade-off implies, for say 6 years of data at hand, that no more than 100 independent model configurations should be tried; otherwise, strategies are almost surely produced with positive Sharpe ratios in-sample but zero ones out-of-sample. The authors state: A researcher that does not report the number of trials N used to identify the selected backtest configuration makes it impossible to assess the risk of overfitting.
The Financial Math Organization presents some questions and answers on overfitting in a blog related to the paper of Bailey et al. (2014). We focus on some of the questions and answers.
• Why do so many quantitative investments fail? ... some of the most successful investment funds in history apply rigorous mathematical models (..., Winton, Citadel, ...). Many of them are closed to outside investors, and the public rarely hears about them. This void is often filled by pseudo-mathematical investments, which apply mathematical tools improperly as a marketing strategy. One of the most widely misunderstood experimental techniques is historical simulation, or backtesting.
• Is it true that every backtest is intrinsically flawed? Not at all. ... The purpose of our research is to highlight how easily backtest results can be manipulated, ...
• Can the 'hold-out' method, i.e., reserving a testing set to validate the model discovered in the training set, prevent overfitting? Unfortunately, this method cannot prevent overfitting. ... Perhaps the most important reason for hold-out's failure is that this method does not control for the number of trials attempted. If we apply the hold-out method enough times (say 20 times for a 95% confidence level), it is expected that we will obtain a false negative (i.e., the test fails to discard an overfit strategy). ...
• Are you saying that Technical Analysis is a form of charlatanism? No. Technical analysis tools rely on a variety of filters that make them prone to overfitting. We are simply stating that technical analysts and their investors should be particularly aware of the risks of overfitting. When the probability of backtest overfitting is correctly monitored, technical analyses may provide valuable insights to investors.
    SR = t / √T .   (4.72)

Therefore, for a fixed time horizon, an increasing SR implies an increasing t-ratio, which implies a higher significance level, and vice versa for the other direction. This is equivalent to a lower p-value for a single strategy test:

    p_S = P(|R| > t) = P(|R| > SR √T) .   (4.73)
Assuming a distribution for the returns, a distribution for the t-statistic and hence for the Sharpe ratio follows. Summarizing, if the SR is the right measure of performance, (4.72) states that it is one-to-one related to the t-statistic. Returning to the test of whether a specific trading strategy is profitable, and assuming normality and that the strategy is not profitable (the null hypothesis), the chance of making an error of the first kind is 5 percent: deciding to reject, meaning to implement a strategy which would lose money. Since the hypothesis is null, the rejection was false - a false discovery happened. What is the appropriate p-level if multiple tests are used? A practitioners' rule is to apply ad hoc discounts in backtesting: discount the Sharpe ratios of the single tests by 30% or even 50%. While easy to implement, this approach lacks any justification.
36 To prove this, let p_M be the p-value for the multiple test, defined as

    p_M = P( max_{i=1,...,N} |R_i| > t ) = 1 − (1 − p_S)^N .   (4.74)

Using the standard t = 2 value for a single test, or equivalently p_S = 5%, implies p_M = 99% for N = 100. The search for a strategy which is at least as profitable as the observed strategy largely reduces the statistical significance of the single test. In this sense, p_M is seen as the adjusted p-value which takes data mining into account. Equating the two p-values, the adjusted or haircut Sharpe ratio follows, which is smaller than SR (since p_M > p_S).
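A minimal sketch of the adjustment (4.74) and of an implied haircut t-value; the Bonferroni-style back-out in the last step is our own illustrative assumption, not the authors' exact procedure:

    from scipy.stats import norm

    p_single, N = 0.05, 100
    p_multiple = 1 - (1 - p_single) ** N       # (4.74): ~0.99 for N = 100
    print(f"p_M = {p_multiple:.2f}")

    # Illustrative haircut: the t-value whose Bonferroni-adjusted p-value equals 5%
    t_adj = norm.ppf(1 - p_single / (2 * N))
    print(f"haircut t-value ~ {t_adj:.2f}")    # ~3.5 instead of 2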
Consider a hedge fund manager using Commodity Trading Advisor (CTA) strategies; that is, he relates detected changes of trend in the securities to changes in the exposure. Different parameters define the change detection, such as the length of the time series used to calculate the moving averages, thresholds to enter and to exit, and, from a risk management perspective, stop-loss trigger points. Given that many assets are tested, the number of combinations runs into the millions or even billions. Suppose that each strategy is individually tested, say by calculating the Sharpe ratio for each trial and testing its significance at the 95% level. Given the large number of individual tests, multiple testing raises the concern that an increasing number of them will be positive purely due to chance. That is, a large fraction of the individual tests that are ex post positive will be false discoveries, i.e. due to chance. If the false discovery rate is 100%, the significance of all individual tests is completely uninformative.
The conservative FWER rule was improved by Holm (1979) and Benjamini and Hochberg (1995). They proposed to allow non-performing strategies as long as there are enough performing ones; in doing this, we gain power to detect the skillful managers. But how many non-performing strategies are we willing to accept? We fix the rate of false discoveries (FDR) at 20% - we are willing to accept that out of five discovered strategies, one is non-profitable. Assume that 2 out of the 100 strategies add value while the others destroy wealth. Benjamini and Hochberg found an upper bound: even if all 100 strategies are null, we will get our 20 percent by adjusting the threshold, assuming the test statistics are normally distributed. If some strategies are profitable, then we get a better rate than 20 percent.
How do we find the threshold which gives the chosen FDR? The theory is rather involved, but an algorithm is used to derive the correct threshold. At a t-value of 2, we expect 100 × 0.05 = 5 significant variables by pure chance. Starting with a t-value of 2, we observe, say, seven significant variables: two skillful managers, while five have no skill. The ratio 5/7 = 71% is much higher than the 20% accepted FDR. The algorithm then raises the threshold above 2 such that the ratio of expected false to observed significant variables becomes 20%. The resulting number s of observed variables - such that the expected number of false discoveries divided by s equals the FDR rate - means that if we know that there are 2 performing strategies among the 100, then by controlling the FDR at 20%, the test has the power to discover the performing strategies. In variable selection terms, we are willing to add estimation noise to our model (a variable which is not important) as long as we also add relevant information (include more relevant variables).
If the tests are dependent, then p_M depends on the joint distribution of all N single test statistics. To limit the occurrence of incorrectly discovered profitable strategies - false rejections of the null hypothesis occur more likely than in a single test - two methods are used: the control of the family-wise error rate (FWER) and the control of the false discovery rate (FDR). Both methods define type I errors in multiple testing, thus generalizing type I error probabilities for single tests. Summarizing,
FDR conceptualizes the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of discoveries, i.e. rejected null hypotheses, that are false (incorrect rejections). Formally, we denote by R the number of rejections, by N the number of tested hypotheses, and by N_{0,r} the number of false discoveries among the rejections.
FDR considers the proportion of false rejections and is based on the false discovery proportion (FDP), the proportion of type I errors, defined by

    FDP = N_{0,r}/R   if R > 0,   and   FDP = 0   if R = 0.   (4.76)
FDR measures the expected proportion of false discoveries among all discoveries, i.e. FDR = E[FDP]. Given the type I error definitions, p-value adjustments control for data mining. Based on the adjusted p-values, the corresponding t-ratios are transformed into Sharpe ratios. There are different methods to adjust the p-values; two methods for the FWER are:

Bonferroni's method:

    p^{Bonf}_{(i)} = min( N p_{(i)}, 1 ) .

Holm's method:

    p^{Holm}_{(i)} = min( max_{j ≤ i} (N − j + 1) p_{(j)}, 1 ) .

For the FDR, the method of Benjamini, Hochberg, and Yekutieli (BHY) reads

    p^{BHY}_{(i)} = p_{(N)}   if i = N,

and, if i ≤ N − 1,

    p^{BHY}_{(i)} = min( p^{BHY}_{(i+1)}, (N c(N)/i) p_{(i)} ) ,

with the normalization constant c(N) = Σ_{k=1}^{N} 1/k and where the algorithm works through the ordered p-values from the largest downward.
For the p-values of Table 4.19 below, for example,

    p^{BHY}_{(7)} = min( p^{BHY}_{(8)}, (8 × 2.72 / 7) × 0.16758 ) = min(0.5485, 0.5209) = 0.5209 ,
    Fund                   Ret     Vol    SR     √T     t-stat   t-value   p-value
    Energy                 -19.58  16.16  -1.21  1.41   -1.71    0.95637   0.08726
    Diversified Dividend     6.70   3.87   1.73  1.41    2.45    0.99266   0.01468
    Multi-Asset Income       1.58   3.70   0.43  1.41    0.60    0.72575   0.54850
    Global RE Income         5.14   2.14   2.40  1.41    3.40    0.99966   0.00068
    Low Vol Equity Yield     8.03   5.38   1.49  1.41    2.11    0.98257   0.03486
    Low Volatility Yield     7.77   5.37   1.45  1.41    2.05    0.97982   0.04036
    Real Estate              9.20   9.37   0.98  1.41    1.39    0.91621   0.16758
    Dividend Income          9.25   4.37   2.12  1.41    2.99    0.99861   0.00278

Table 4.19: Eight investment funds from Invesco. Data from January 2015 to December 2016. (Engesser (2018)).
and the other adjusted p-values follow in the same way. Doing the calculation, we observe that all p-values increase except the highest one, and that only two of them, p^{BHY}_{(2)} = 0.0302 and p^{BHY}_{(1)} = 0.0148, are statistically significant, compared to the five significant strategies in Table 4.19 before correcting the p-values.
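A minimal sketch of the BHY recursion applied to the eight p-values of Table 4.19, reproducing the two significant adjusted values:

    import numpy as np

    p = np.sort([0.08726, 0.01468, 0.54850, 0.00068, 0.03486,
                 0.04036, 0.16758, 0.00278])      # p-values of Table 4.19, ascending
    N = len(p)
    c = sum(1.0 / k for k in range(1, N + 1))     # c(8) ~ 2.72

    adj = p.copy()
    for i in range(N - 2, -1, -1):                # work downward from p_(N)
        adj[i] = min(adj[i + 1], N * c / (i + 1) * p[i])
    print(adj.round(4))                           # smallest two: 0.0148 and 0.0302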
The next example considers the FWER adjustment for the momentum strategy following the construction of Kenneth French. He considers all stocks on the NYSE and NASDAQ, where six portfolios are formed according to market cap (small, big) and historical returns (high, medium, and low). We consider data from July 1963 to December 2012, i.e. 594 monthly returns. The null hypothesis is that returns are not different from zero. Calculating first the performance of the strategy without any adjustments using the Sharpe ratio, we get

    SR_p.a. = (µ/σ) √12 = (0.7/4.29) √12 = 0.57 .
Calculating the p-value using (4.73) with t = (0.7/4.29) × √594 ≈ 3.98 gives p_S ≈ 0.007 percent: the strategy appears highly significant before any multiple-testing adjustment.
First, in the multiple testing problem, FDR-controlled p-values replace the single-test p-values. Second, there must be a huge number of putative papers that did not find any significant explanation for the cross-section of expected returns. These papers were never published and hence their information content never entered the traditional statistical setup. There are two reasons for these non-publications: one does not make an academic career in finance by publishing non-results, and it is also difficult to publish a replication of a successful argument. There is a bias toward publishing papers that establish new factors. Third, Lewellen et al. (2010) show, using cross-sectional R-squared and pricing errors to judge the success of new factors, that the explanatory power of many documented factors is spurious. The Fama-French 25 size-B/M portfolios in their three-factor model explain more than 90% (75%) of the time-series variation in the portfolios' returns (cross-sectional variation in their average returns). Any new factor added to this model which is correlated with size and value but not with the residuals will produce a large cross-sectional R-squared.
Harvey et al. (2015) apply the false discovery proportion (FDP) and the false discovery rate (FDR). The authors derive the following results. Between 1980 and 1991, only one factor was discovered per year, growing to around five factors per year in the period 1991-2003. In the last nine years, the annual factor discovery rate has increased sharply to around 18: 164 factors were discovered in the last nine years, doubling the 84 factors discovered cumulatively in the past. They calculate t-ratios for each of the 316 factors discovered, including those in working papers. The vast majority of t-ratios exceed the 1.96 benchmark, and the non-significant factors typically belong to papers that propose a number of factors.
The authors first apply their method to the case in which all tests of factor cross-sectional returns are published. This (false) assumption defines a lower bound for the true t-ratio benchmark. They obtain three benchmark t-ratios, two of which we describe:
• Factor-related sorting results in cross-sectional return patterns that are not explained by standard risk factors. The t-ratio for the intercept of the long/short strategy returns regressed on common risk factors is usually reported.
• Factor loadings as explanatory variables. They are related to the cross-section of expected returns after controlling for standard risk factors. Individual stocks or stylized portfolios (for example, the FF 25 portfolios) are used as dependent variables. The t-ratio for the factor risk premium is taken as the t-ratio for the factor.
They transform the calculated t-ratios into p-values for all three methods. Then these p-values are transformed back into t-ratios, assuming that the standard normal distribution accurately approximates the t-distribution; see Figure 4.30.
Figure 4.30 presents the benchmark t-ratios for the three different methods. Using Bonferroni, the benchmark t-ratio starts at 1.96, increases to 3.78 by 2012, and will reach 4.00 in 2032. The corresponding p-value for a t-ratio of 3.78 is 0.02 percent, which is much lower than the starting level of 5 percent. Since Bonferroni detects fewer discoveries than Holm, the t-ratios of the latter are lower. The BHY t-ratio benchmarks are not
Figure 4.30: The green solid curve shows the historical cumulative number of factors discovered, excluding those from working papers. Forecasts (dotted green line) are based on a linear extrapolation. The dark crosses mark selected factors proposed by the literature: MRT (market beta; Fama and MacBeth [1973]), EP (earnings-price ratio; Basu [1983]), SMB and HML (size and book-to-market; Fama and French [1992]), MOM (momentum; Carhart [1997]), LIQ (liquidity; Pastor and Stambaugh [2003]), DEF (default likelihood; Vassalou and Xing [2004]), IVOL (idiosyncratic volatility; Ang, Hodrick, Xing, and Zhang [2006]), DCG (durable consumption goods; Yogo [2006]), SRV and LRV (short-run and long-run volatility; Adrian and Rosenberg [2008]), and CVOL (consumption volatility; Boguth and Kuehn [2012]). T-ratios over 4.9 are truncated at 4.9 (Harvey et al. [2015]).
monotonic but fluctuate before the year 2000 and stabilize at 3.39 after 2010. Figure 4.30 also shows the t-ratios of a few prominent factors, which is the main result of this section.
The authors extend the analysis by testing, for example, for robustness and by assuming correlation between the factors; the above results do not change notably. The analysis suggests that a newly discovered factor today should have a t-ratio that exceeds 3.0, which corresponds to a p-value of 0.27 percent. The authors argue that the value of 3.0 should not be applied uniformly: for factors derived from first principles, the value should be lower.
Harvey et al. (2015) conclude that many of the factors discovered in the field of finance are likely false discoveries: of the 296 published significant factors, 158 would be considered false discoveries under Bonferroni, 142 under Holm, 132 under BHY (1%), and 80 under BHY (5%). In addition, the idea that there are so many factors is inconsistent with principal component analysis, where perhaps five 'statistical' common factors drive the time-series variation in equity returns (Ahn, Horenstein and Wang (2012)).
4.6.5 p-Hacking
In general, p-hacking means pushing down the p-value to create significance. For
example, testing multiple hypotheses increases the likelihood of false results: the null
hypothesis is rejected although it is correct, while the true p-value is larger and not
significant. Chordia, Goyal and Saretto (2017) show that the published performance of
investment strategies is doubtful, since the manner in which they are evaluated does not
align with research quality standards. First, there is a publication bias, since only
significant strategies are reported - only they have a viable path to publication. Second,
data snooping leads to a number of false rejections of the null. Finally, a number of
data choices, test procedures, and samples may be tried until a significant result is
discovered, and only the significant result is reported. All this is referred to as p-hacking.
They use all accounting variables in the Compustat database and basic market variables
in the CRSP database. They construct all possible trading signals based on the
Compustat data items satisfying minimal requirements. The signals consist of all types
of levels and growth rates and ratios of two levels or growth rates, i.e.

\frac{x_1 - x_2}{x_3}

and all possible permutations. This leads to a total of approximately 2.1 million signals
in 1972-2015. It is clear that most of these signals are economically meaningless
combinations of items, but this large sample accounts for existing and yet-to-be-studied
trading strategies. Using this sample, they ask whether they can put a bound on the
magnitude of p-hacking and, after accounting for p-hacking, how likely a truly abnormal
trading strategy is.
The authors use the FDP to control the proportion of false discoveries, since the trading
strategies are not independent of each other (cross-correlation in stock returns) and the
FDP delivers statistical cutoffs that rely on the cross-correlations present in the data.
They calculate measures of risk-adjusted performance for each strategy by first
constructing a long-short portfolio based on the top and bottom decile of each signal's
distribution, computing portfolio alphas using the Fama and French (2015) five-factor
model augmented with the Carhart (1997) momentum factor, and calculating the Fama
and MacBeth (1973) (FM) coefficient for each signal.
Imposing a tolerance of 5 percent FDP and the same significance level, the critical
value for the alpha t-statistic is 3.79 (for FM it is 3.12). These numbers are comparable
to those of Harvey et al. (2015). At these thresholds, 2.76 percent of strategies have
significant alphas and 10.80 percent have significant FM coefficients.37
Using single hypothesis testing (SHT) with a t-statistic higher than 1.96 rejects the
null hypothesis in about 30 percent of the cases for both alpha and FM t-statistics. The
majority of the discoveries (rejections of the null of no predictability) based on SHT are
likely false, since SHT does not account for the very large number of strategies that are
never made public.
The authors add economic reasoning to these so far purely statistical considerations to
gain more robust conclusions. They impose consistency between performance measures
obtained by portfolio sorts (alpha) and those derived from FM regressions. Eliminating
strategies that have a statistically significant t-value for alpha but an insignificant one
for FM, or vice versa, reduces the number of successful strategies to 806 under MHT and
to 33,881 under SHT.
The second restriction is an economic hurdle based on the Sharpe ratio: they eliminate
strategies that do not have a Sharpe ratio higher than that of the value-weighted market
portfolio. Imposing the two hurdles leaves 17 strategies that are both statistically and
economically significant under MHT and 801 under SHT. The likelihood of a researcher
finding a truly abnormal trading strategy thus tends to zero.
Surprisingly, the 17 surviving strategies fail to have any economic meaning - their
sorting makes no economic sense. The authors conclude that market efficiency is as
strong as ever. A different conclusion is that while accounting- and economics-based
sorting is meaningless, this could be different for sorting based on financial market
signals such as implied versus realized volatility, credit basis trades or carry trades.
37 The larger critical values for FM than for the alphas are due to the longer tails of the former.
As a simple example, consider a trade whose profit, when successful, is the positive part
of a normal return with volatility σ. Its expected value is

E(R) = \sqrt{\frac{2}{\pi}}\,\sigma \approx 0.8 \times \sigma \equiv 80\%\ \text{percentile}.
Since risk scales with the square root of the number of trades, the risk of n trades equals
\sqrt{n}\,\sigma. Consider two portfolio managers: one is always successful; the other is
successful in x percent of all trades. Both trade n times. The information ratio (IR),
the measure of a manager's generated value, measures the excess return of the active
strategy over risk:
IR = \frac{\text{Excess Return of Active Strategy over Benchmark}}{\text{Tracking Error (Active Risk)}} , \qquad (4.77)
where the tracking error is the standard deviation of the active return. For the investor
with 100% success rate, we get
IR = \frac{\sqrt{2/\pi}\, n\sigma}{\sqrt{n}\,\sigma} = \sqrt{\frac{2n}{\pi}} .
The trader with a success rate of x percent faces a loss in 1 − x percent of the trades,
leading to a net profit of x − (1 − x) = 2x − 1 per trade. Hence, after n trades

E_x(R) = (2x - 1)\, n\sigma \sqrt{\frac{2}{\pi}} , \qquad IR_x = (2x - 1)\sqrt{\frac{2n}{\pi}} . \qquad (4.78)
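A minimal numerical sketch of (4.78) in Python (the function name and the example
values are ours, not from the text); note that σ cancels in the ratio:

    import numpy as np

    def information_ratio(x, n):
        """IR of a manager with success rate x over n trades, as in (4.78)."""
        return (2 * x - 1) * np.sqrt(2 * n / np.pi)

    # An always-successful manager (x = 1) versus a 55%-hit-ratio manager,
    # both trading 100 times per year.
    print(information_ratio(1.0, 100))   # ~7.98
    print(information_ratio(0.55, 100))  # ~0.80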
For a fixed success rate x, an increasing trading frequency n increases the information
ratio. But raising the trading frequency brings diminishing returns due to the
square-root scaling in n.38
38 E(R) = \frac{2}{\sqrt{2\pi\sigma^2}} \int_0^\infty x\, e^{-\frac{x^2}{2\sigma^2}}\, dx = \sigma\sqrt{\frac{2}{\pi}} .
Percentile   IR
    90       1.0
    75       0.5
    50       0.0
    25      −0.5
    10      −1.0
The skill versus trading frequency (breadth) trade-off reads qualitatively, see (4.78),

x \sim \frac{IR}{\sqrt{n}} , \qquad (4.79)

and is of different severity for different asset classes. Many investors in interest rate
risk trade on a monthly or quarterly basis since they are exposed to fundamental economic
variables; they cannot increase their trading frequency arbitrarily. To achieve a high IR
they need to be very successful. But if markets are efficient, this is not possible. One
expects to observe more skill within (global) asset managers which can exploit
inefficiencies between different markets. It is easier to increase the IR by increasing
the trading frequency, but this increases trading costs. Besides the naive approach of
trading more often, other methods are to enlarge the set of eligible assets for the asset
managers or to expand the risk dimension by allowing investment strategies which
generate separate risk premia.
Following this first example, we add some structure to the discussion. Skill has different
meanings. In its basic form, a measure of skill is a hit ratio: it accounts for playing a
game well. This is not a statistical measure. The information coefficient (IC) is such a
statistical measure of skill: it correlates forecast residual return with ex-post residual
return. The information ratio relates skill, say the IC, directly to capital market theory
such as the CAPM, i.e. by assuming specific IC properties and a specific investor
decision process.
Like the alpha, the IR has an ex-post and an ex-ante interpretation. Ex post it measures
an achievement: the ratio of (annualized) residual return to (annualized) residual risk.
Such a realized IR is often negative, and in a return regression it is related to the
t-statistic one obtains for the alpha: roughly, the IR equals the alpha's t-statistic
divided by the square root of the number of observation years. The ex-ante IR measures
opportunities, given by the expected level of annual residual return per unit of annual
residual risk.
Proposition 84. Consider mean-variance portfolio optimization where the optimal active
weights φ_A maximize the utility function μ_A − λσ_A^2 with the expected active return
and active return variance. If the residual stock returns are uncorrelated and if no budget
constraint is imposed, then:

IR \sim IC \cdot \sqrt{BR} = \text{Skill} \times \text{Frequency} , \qquad (4.80)

where IC is the information coefficient of the manager and BR - the strategy breadth -
is the number of independent forecasts of exceptional returns made per year.
The IC measures the correlation between actual realized and predicted returns and
provides a measure of a manager's forecasting ability. Equation (4.80) states that
investors have to play often (high BR) and play well (high IC) to win a high IR. The
fundamental law (4.80) is additive in the squared information ratios. Formula (4.77)
shows the same intuition: 2x − 1 represents IC and \sqrt{n} represents BR. The
derivation of (4.80) depends on several assumptions, see Buckle (2005) for a review.
Roughly, on the behavioral side, the portfolio manager knows the metric of skill and
optimizes it according to a model, say the CAPM. Regarding securities, the same skill
level applies to all asset choices and the sources of information are independent -
forecasts are unbiased and residual returns have zero expected value. Furthermore, the
information coefficient is a small number, and the impact of estimation error in
investment information on out-of-sample optimized investment performance is not
considered. Some consequences following Grinold (1999) are:
IR = \frac{\text{Portfolio Alpha}}{\text{Portfolio Residual Risk}} = \frac{\alpha_p}{\omega_p} , \qquad (4.81)

with ω_p the portfolio residual risk, i.e. risk orthogonal to the systematic return.
Equivalently,

\alpha_p = IR \cdot \omega_p . \qquad (4.82)

The objective of an active mean-variance asset manager is to maximize

E(u) = \alpha_p - \frac{\theta}{2}\,\omega_p^2 . \qquad (4.83)
Replacing the alpha by the IR using (4.82) implies the optimal level of residual risk:

\omega_p^* = \frac{IR}{\theta} . \qquad (4.84)
Using the fundamental law,

\omega_p^* = \frac{IR}{\theta} = \frac{IC\sqrt{BR}}{\theta} . \qquad (4.85)
The breadth allows for diversification among the active bets, and skill increases the
possibility of success, so that the overall level of aggressiveness ω_p^* can increase.
A manager wants to forecast the direction of the market each quarter. The market
direction takes only two values - up and down - i.e. the random variable x(t) = ±1 with
mean zero and standard deviation 1. The forecast y(t) of the manager takes the same
values and has the same mean and standard deviation as x(t). The information
coefficient IC is given by the covariance of x and y. If the manager makes N bets and is
correct N_1 times (x = y) and wrong N − N_1 times (x = −y), then

IC = \frac{1}{N}\big(N_1 - (N - N_1)\big) = \frac{2N_1}{N} - 1 . \qquad (4.86)
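A short sketch of (4.86) and (4.80) in Python (the function names and example
figures are ours); the last line reproduces the 3.29 example discussed below:

    import numpy as np

    def ic_binary(n_correct, n_bets):
        """Information coefficient for binary direction bets, equation (4.86)."""
        return (n_correct - (n_bets - n_correct)) / n_bets

    def ir_fundamental_law(ic, breadth):
        """Fundamental law of active management, equation (4.80)."""
        return ic * np.sqrt(breadth)

    ic = ic_binary(14, 25)                       # right 14 times out of 25: IC = 0.12
    print(ir_fundamental_law(ic, 4))             # four quarterly bets per year
    print(ir_fundamental_law(0.03, 12 * 1000))   # monthly IC 0.03 on 1,000 stocks: ~3.29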
The fundamental law of active management has been generalized. One reason is that the
IR given by (4.80) seems to overestimate the IR which a portfolio manager can reach.
Assume a forecast signal with an average monthly IC of 0.03 and a stock universe of
1,000 names. Then the expected annualized IR from (4.80) is 3.29 - beyond what the
best portfolio managers realize. Ding (2010) generalizes the law by considering
time-series dynamics and cross-sectional properties. He shows that cross-sectional ICs
are different from time-series ICs and that IC volatility over time is much more
important for a portfolio IR than breadth: playing a little better has a stronger impact
on the IR than playing a little more often. He proves

IR = \frac{IC}{\sqrt{1 - IC^2}}\,\sqrt{BR} , \qquad (4.87)
Forecasts are refined by conditioning the expected return on the raw forecast signal g:

E(R|g) = E(R) + \frac{\mathrm{cov}(R, g)}{\mathrm{var}(g)}\,\big(g - E(g)\big) . \qquad (4.88)
The covariance term is the IC. This equation relates forecasts that differ from their
expected levels. The refined forecast is then defined as the difference between E(R|g)
and the naive forecast E(R), the consensus expected return; the naive forecast is the
informationless forecast and leads to the benchmark holdings. The forecast formula has
the same structure as the CAPM or any other single-factor model. This is not a surprise
but follows from linear regression analysis.
We analyze how skillful fund managers are. Scaillet et al. (2013) use the FDR to control
for false discoveries, i.e. mutual funds that exhibit significant alphas by luck alone.
They estimate the proportions of unskilled, zero-alpha, and skilled funds in the
population. A fund is unskilled if the return from stock picking is smaller than the costs
(the alpha is negative net of trading costs and expenses), a zero-alpha fund if the
difference is zero, and a skilled fund otherwise (the alpha is strictly positive).
Consider the distribution functions of the three groups: unskilled, zero-alpha, and
skilled funds. Plotting them as functions of the t-statistic, we obtain three density
functions with the zero-alpha group in the middle, see Figure 4.31. Neighbouring
densities overlap: unskilled with zero-alpha, and zero-alpha with skilled. Consider the
latter region of overlap. If a zero-alpha fund shows a high t-value, this high t-value is
driven by luck. Therefore, in the cross-sectional distribution of all funds, some funds
with high t-values are genuinely skilled and others are merely lucky.
Figure 4.31: Intuition about luck and skill for the three groups of mutual funds:
unskilled, zero-alpha and skilled (Scaillet et al. [2013]).
Of course, it is not possible to observe the true alpha of each fund. The inference for
the three skill groups is carried out as follows. First, for each fund, the alpha and its
standard deviation are estimated; the ratio of the two estimates defines the t-statistic.
Choosing a significance level, the t-estimate lies within or outside the implied
threshold, and estimates outside are labelled significant. The FDR measures the
proportion of lucky funds among the funds with significant estimated alphas. The data
set consists of monthly returns of 2,076 actively managed US open-end domestic equity
mutual funds that existed at any time between 1975 and 2006 (inclusive).
Of the funds, 75.4 percent are zero-alpha, 24.0 percent are unskilled, and 0.6 percent
are skilled. Unskilled funds under-perform for long time periods. Aggressive growth
funds have the highest proportion of skilled managers, while none of the growth and
income funds exhibit skill. During the period 1990-2006, the proportion of skilled funds
decreases from 14.4 to 0.6 percent, while the proportion of unskilled funds increases
from 9.2 percent to 24.0 percent. Although the number of actively managed funds
increases over this period, skilled managers have become exceptionally rare. This is also
reflected in a decreasing overall alpha, reaching -1% in 2016, see Figure 4.32. These
facts seem to be a good motivation for passive investments.
What could be the reasons for these facts, given that the education level of the average
asset manager increased during these two decades? After the peak in 1993, when the
alpha started to decline, the internet was launched and the cost of information started
to decrease over time; markets therefore became more and more efficient. In other
words, luck became more important than skill over time. But luck is not persistent,
which leads to an overall decreasing alpha of the industry. The authors test whether
funds lose their outperformance skills due to their increasing size. They treat each
five-year fund record as a separate 'fund' and find that the proportion of skilled funds
equals 2.4 percent, implying that a small number of managers have 'hot hands' over
short time periods.
Figure 4.32: Proportion of unskilled and skilled funds (Panel A) and total number of
mutual funds in the US versus average alpha (Scaillet et al. [2013]).
Skilled funds are concentrated in the extreme right tail of the estimated alpha
distribution, which suggests a way to detect them. If in a year the tests indicate higher
proportions of lucky, zero-alpha funds in the right tail, the goal is to eliminate these
false discoveries by moving further into the extreme tail. Carrying out this control each
year, they find a significant annual alpha of 1.45 percent. They also find that all
outperforming funds waste, through operational inefficiencies, the entire created surplus.
The authors re-examine the relation between fund performance and turnover, expense
ratio, and size. For each characteristic, the proportion of zero-alpha funds is around 75
percent. The proportion of unskilled funds is qualitatively larger for funds with high
turnover - many unskilled funds trade on noise to pretend that they are skilled. The size
of the fund has a bipolar effect: for large funds, both the proportion of unskilled and of
skilled funds is larger than for smaller funds.
What about European funds? Scaillet (2015) considers 939 open-end funds between
2001 and 2006. The main findings are, first, that the proportion of zero-alpha funds is
72.2 percent, the proportion of skilled funds is 1.8 percent, and the proportion of
unskilled funds is 26 percent. Second, skilled funds have low betas with respect to the
MSCI Europe; some skilled funds are known to play bonds and depart from their pure
equity mandates.
Leippold and Ruegg (2018) extend these analyses in three directions. First, they do not
consider equity markets only but also take into account a multi-risk-factor analysis for
fixed income mutual funds; the risk factors are changes in the level, slope, and
curvature of the local yield curve, together with a credit spread. Second, they compare
value-weighted returns of active against index mutual funds within the same investment
category. This allows them to avoid choosing multi-factor benchmarks, and they can
compare two investable alternatives where the corresponding friction costs and
restrictions are included in both. They use 30 different investment categories across
asset classes. Finally, they distinguish between retail and institutional funds and change
the statistical methods of the last section.
We consider the last point in more detail. The studies of Scaillet et al. (2010) and
Fama and French (2010) state or assume that autocorrelation is of minor importance.
Leippold and Ruegg test for autocorrelation in mutual fund returns using a
distribution-free test. They find that already in the first three lags, serial dependence is
present for 20 percent of single mutual funds and 30 percent of mutual fund portfolios.
This evidence calls for controlling temporal dependence in the analysis of alphas of
single mutual funds and portfolios of mutual funds against different benchmark models.
They suggest block-bootstrapping the alpha of a strategy against its benchmark returns,
see Ledoit and Wolf (2008, 2011). This improves inference accuracy for dependent
time-series data, and the bootstrapped t-statistics and p-values are then the inputs to
the multiple hypothesis frameworks, see Romano and Wolf (2005a). Since the authors
test whether single active or index funds significantly outperform the theoretical
multi-factor models, there are many hypotheses and thus they use the FDR. For
portfolios of mutual funds there are only a few hypotheses and they use the FWER.
Table 4.21 summarizes some findings, which are comparable to those of the former
section. The results show the differences between retail and institutional funds, for
example
                        Equity                                 Fixed Income
              US    Glob.   EU    Jap   Asia   Aver    USD    CHF    EUR    GBP   Aver
Retail
 Active
  Zero alpha  55.1  39.6   66.2  67.9  83.0   62.3    38.9   71.0   77.8   83.3  67.8
  Skilled      0.0   0.0    3.0   5.7   0.0    1.7    23.3    3.3   22.2   16.7  16.4
  Unskilled   44.9  60.4   30.8  26.4  17.0   35.9    37.8   25.7    0.0    0.0  15.9
 Index
  Zero alpha  61.9  30.1   76.5  73.7 100.0   68.4    41.6   93.3   71.6  100.0  76.6
  Skilled      0.0   0.0    3.6   5.9   0.0    1.9    29.2    6.7   28.4    0.0  16.1
  Unskilled   38.1  69.9   19.9  20.4   0.0   29.7    29.2    0.0    0.0    0.0   7.3
Instit.
 Active
  Zero alpha  69.3  53.5   78.5  88.2  97.4   77.4    38.5   77.4   60.1   82.7  64.7
  Skilled      0.0   0.0    8.2   9.4   0.0    3.5    40.7   22.6   39.9   17.3  30.1
  Unskilled   30.7  46.5   13.3   2.4   2.6   19.1    20.8    0.0    0.0    0.0   5.2
 Index
  Zero alpha  66.9  55.7   91.9  92.5  90.6   79.5    57.5   71.4   56.5   95.0  70.1
  Skilled      0.0   0.0    6.8   0.0   0.0    1.4    21.3   26.2   43.5    0.0  22.7
  Unskilled   33.1  44.3    1.4   7.5   9.4   19.1    21.3    2.4    0.0    5.0   7.2
Table 4.21: For equity, the five-factor benchmark model uses the regional models from
the Fama and French homepage for MKT, SMB, HML and WML, and the AQR
homepage for BAB. For fixed income, the benchmark model uses four factors: 'shift',
'twist', 'butterfly' and the spread of BBB to AAA credit from MSCI. The Morningstar
database from Dec 1991 to Dec 2016 includes 61,269 funds (Source: Leippold and Ruegg
[2018]).
the percentage of skilled active institutional funds of 3.5 percent compared to 1.4
percent and 1.9 percent for skilled single mutual funds. Among the active and index
equity mutual funds, only managers in Europe and Japan show skill. For fixed income
funds, the number of zero-alpha funds is lower; the highest skill is observed in the US
and euro markets.
Figure 4.33 presents the hall of fame of successful investors who outperformed the S&P
500 for more than 10 years. The only persistent quantitatively managed investment,
from Renaissance, is based on strict secrecy about the methods used and on hiring top
scientists from the natural and computer sciences to design the algorithms. Only one
money manager from the alternative investment group is listed in the hall of fame.
Furthermore, it is notable that the macro investors dominate those fundamental
investors who cannot be grouped into the Buffett/Graham school. Finally, the
appearance of Lord Keynes shows that it was possible to successfully outperform the US
markets in days when technology was in its infancy, relying instead on a deep
understanding of the macro economy.
Figure 4.33: Hall of Fame of investors (gurufocus, Hens and FuW [2014]).
Chapter 5
Asset Management Innovation
The process in Figure 5.1 can be split into two steps. First, raw data are transformed
into model variables such as averages, aggregates or conditioned versions of the raw
data; the raw data are complex, huge, structured and unstructured. The second step
generates outputs using algorithms. Pre-processing the data so that they can be used by
the algorithms requires much more time than running the algorithms afterwards. In
recent years, many tools and software packages were designed to master the complexity
of data pre-processing: the data are not only available in different formats, they are
also incomplete, have different integrity properties, vary in structure over time and are
only partially digitized. Thanks to the innovation in data pre-processing, analytics using
algorithms are in the focus.
1 The sources in this section are Lin (2015), Roncalli (2014), McKinsey Global Institute (2011, 2013),
Varian (2013), Hastie et al. (2009), Harvey et al. (2014), Novy-Marx (2014), Bruder et al. (2011), Freire
(2015), Fastrich et al. (2015), Zou (2006), DeMiguel et al. (2009), Belloni et al. (2012), Burges (1998),
Smola and Schölkopf (2004), Jaakkola (2006).
2 1 peta means 10^15, i.e. 1,000 trillions.
[Figure: internal and external data - structured, semi-structured and unstructured, more
than 1 peta in size - feed analytics (prediction, visualization, clustering, learning
algorithms), which serve the business goals: develop, retain and acquire customers,
optimize pricing, improve products, marketing.]
Figure 5.1: Definition of big data, adapted from Roncalli [2014]. The economic value of
the big data process starts at the end, with a clear business perspective.
Such business perspectives are to:
1. develop customers,
2. retain customers,
3. acquire customers,
4. digitize all documentation work such as trade confirmations, legal contracts and
workflow documentation (Legal Tech and Reg Tech).
The market for big data rose from USD 7.3 bn in 2010 to USD 130 bn in 2016 and to
USD 189 bn in 2019 (wikibon.org, Forbes, IDC). The revenues of big data providers
split into hardware, software and service revenues. Large IT companies like IBM, HP or
Dell dominate in absolute revenues, but the share of big data in these companies' total
sales is still at a low single-digit percentage. Newer companies with large big data
revenues are Palantir and Pivotal.
Artificial intelligence (AI), machine learning and deep learning are different concepts,
see Figure 5.2.3

[Figure 5.2: AI as the broadest concept, containing machine learning (ML) - supervised
learning (classification), unsupervised learning (patterns), reinforcement learning - and
deep learning, overlapping with big data.]

Originally, the goal of artificial intelligence was to model the brain as an artificial
neural network, and some research still heads in this direction. But
many apply AI as a tool today to solve problems in engineering, finance, economics,
marketing, etc. Machine learning (ML) is a narrower concept. ML, a statistical theory,
extends well-known methods such as linear regression to situations where the data set is
enormous or where the linearity assumption is not suitable. While econometrics is based
on causal inference, ML is not: ML is based on prediction and categorization using
optimization. A learner or algorithm detects characteristics on a training set, such as
typical words in email spam, and applies the insight to new emails. Being a
probabilistic theory, there can be errors. In the task of classifying emails into spam and
non-spam, the word 'casino' is labelled as a spam indicator, but the word can also
appear in a non-spam email. While human learners can rely on common sense to filter
the meaning of such a word, a machine learner needs well-defined principles in order
not to reach useless conclusions. Basic is the incorporation of prior knowledge, a
hypothesis that biases the learning mechanism: the inductive bias. Evidently, there is a
trade-off between too restrictive and too broad biases.
3 Funding of AI industry start-ups rose from USD 282 million in 2011 to USD 2.4 billion in 2015
(WEF (2017)), and the number of merger and acquisition deals in AI also rose to 20 to 40 deals per annum.
If a human tells the machine's algorithm the correct answers on a training set, we
consider a supervised learning problem - the teacher's case. If the values of the
output are not known, unsupervised learning is used: the task is to find structures or
meaningful groups in the inputs. This arises in consumer behaviour, where the algorithm
tries, for example, to pool customers with similar behaviour. This section is based on
Luxburg and Schölkopf (2008), Shalev-Schwartz (2016), Hazan (2016), Bruna (2018).
5.2.1 Set-Up
X is the set of examples or instances, such as pictures of animals where the goal is to
classify them into cats and non-cats. Every x ∈ X has features such as four legs or two
ears. Y is the label space, such as the binary set {+1, −1} (cat, no cat). The data set S
consists of pairs of instances and labels,

S = \{(x_1, y_1), \ldots, (x_m, y_m)\} \subset X \times Y.

The data are randomly split into labelled training data, test data with hidden
labels, and validation data used for parameter tuning.
X, Y, S are the inputs of the statistical learning model. The output is a prediction
rule or hypothesis f : X → Y with f ∈ F, where F is a space of functions - linear
functions, polynomials, more general functions, or a set of rectangles or circles - called
the hypothesis class. Given a training set and a set F, the goal is to find the optimal
parameters θ for the function f(θ) ∈ F such that the classifier is able to classify well all
new data of the test set.
The following assumption describes the mechanism which generates the data: there
exists a joint probability function P on X × Y, each training example (x_i, y_i) is
sampled IID from P, and the label is given by some unknown function h : X → Y.
Note that P is not known - otherwise learning would be trivial. By definition, P does
not change over time. This stationarity of the unknown distribution has to be relaxed if
financial time series are forecasted. We write |F| for the cardinality of a set F, i.e. the
number of its elements. If F is the set of classifiers from X with m examples into a
yes/no classifier set, then |F| = 2^m.
Consider the example of classifying apples as sweet or sour based on the features weight
and size. Suppose first that there are only a finite number of apples. An algorithm
could memorize all labels in the training set, but this is not what we would call learning.
Assume then that the number of apples is not bounded. Without any a priori knowledge
(hypothesis) from a human, the learner might always err: without any a priori
hypotheses, learning cannot be defined.
We provide the learner with more knowledge and assume that the environment
produces the labels of the apples by applying an unknown function h : X → Y, h ∈ F:
there is a functional relationship between the apples' features and their sweet or sour
taste. But this is still too little information, since F is too big: it can contain any
polynomial function, any trigonometric function, any geometric figure classifier or any
stochastic process, for example. We therefore restrict the set F of classifiers to
axis-aligned rectangles - a simple type of a priori knowledge. We furthermore assume
that the largest rectangle has size 200 g times 100 mm. This turns the learning problem
into a finite one: there are only a finite number of possible rectangles as classifiers. The
prediction rule f is f(x) = 1 if x is an element of the interior of a rectangle and
f(x) = −1 else. The learner knows F but not h. Figure 5.3 illustrates the construction.
The size |F| of the set of rectangles is bounded but still a large number: a grid of m
times n points contains

\binom{m}{2}\binom{n}{2} = \frac{1}{4}\, m(m-1)\, n(n-1)

rectangles. Hence, with m = 200 and n = 100,

|F| = \binom{200}{2}\binom{100}{2} \le 2 \times 10^8 .
Figure 5.3: Left panel: optimal rectangle classifier; blue denotes sour and red sweet
apples. Right panel: no optimal rectangle classifier exists. The optimal region (yellow)
is a complex domain which classifies the shown apples correctly but will hardly classify
additional apples correctly. This corresponds to overclassification (similar to
overfitting): the shown optimal algorithm will generalize poorly to not-yet-classified
apples.
In the right panel of Figure 5.3, realizability does not hold. Realizability - the
assumption that a perfect classifier h lies in F - simplifies the theory; the assumption is
waived by using so-called agnostic learning.
1. Consistent Learner. At time t, he predicts with an arbitrary function f from the
version space F_t of all functions in F consistent with the examples seen so far.
2. Halving Learner. He behaves as the consistent learner, except that he predicts the
majority vote of f(x_t) over f ∈ F_t. Hence, at time t the learner errs only if at least
half of the functions in F_t are not in F_{t+1}.
Theorem 85. The consistent learner makes at most |F| − 1 errors; the halving learner at
most \log_2 |F|.
For the halving learner, the result follows by induction from |F_{t+1}| ≤ |F_t|/2, since
for any error at least half of the functions in F_t are not in F_{t+1}. Although the
halving learner makes dramatically fewer errors, the runtime of halving grows with |F|;
therefore, the algorithm is ruled out from a computational efficiency perspective. For
200 million rectangles, the consistent learner can make at most 200 million − 1 errors,
while the halving learner makes at most 27 errors:

\log_2(200'000'000) \approx 27 .
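A toy sketch of the halving learner in Python (the hypothesis class of threshold
classifiers, the stream and the function names are our illustrative assumptions):

    def halving_learner(F, stream, h):
        """Predict the majority vote of the version space; after each revealed
        label, keep only the functions consistent with it. Assumes the true
        function h is in F (realizability). Returns the number of mistakes,
        which is at most log2 |F|."""
        version_space = list(F)
        mistakes = 0
        for x in stream:
            votes = sum(f(x) for f in version_space)
            prediction = 1 if votes >= 0 else -1
            label = h(x)
            if prediction != label:
                mistakes += 1
            # Consistency update: discard every function that erred on x.
            version_space = [f for f in version_space if f(x) == label]
        return mistakes

    # Toy class: 100 threshold classifiers on the integers 0..99.
    F = [lambda x, t=t: 1 if x >= t else -1 for t in range(100)]
    h = F[37]
    print(halving_learner(F, range(100), h))  # <= log2(100) ~ 7 mistakes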
How well does an algorithm f perform? The error function or true risk measures the
performance given a perfect classifier h and the unknown P:

R(f) = P\big(\{(x, y) : f(x) \neq h(x)\}\big) .

The identity P(A) = E(\chi_A) for a set A shows that risk is an expected value.
Therefore,

R(f) = E\big(l(f(x), y)\big)

with l the loss function: the error or risk can be equivalently expressed as the expected
loss. Since P is not known, this risk is purely theoretical. To compare the risk with the
best possible learning rule, we define the minimum risk value, the Bayes risk,

R^* = \inf_{f \in F} R(f) .

For binary classification, the classifier leading to minimum risk can be explicitly
calculated.
A perfect R(f) = 0 cannot be achieved: if the probability of an example x_2 is much
smaller than 1/m, then the probability of not seeing x_2 in the training data tends to
one. Therefore, we are satisfied if

R(f) \leq \varepsilon

with the accuracy ε chosen a priori by the human. There is a second problem arising
from the randomness of the input data: the probability that the learner observes the
same example over and over again is not zero, so R(f) ≤ ε cannot be guaranteed by any
algorithm. We allow the algorithm to fail with a chosen confidence probability δ over
the random choice of examples. Summarizing, the learner asks for training data S
containing m(ε, δ) examples. This defines Probably (with probability at least 1 − δ)
Approximately (up to accuracy ε) Correct learning - PAC learning.
Let us apply the theorem to the apple classification. |F| has almost 200 million
elements. It follows - thanks to the logarithm - that about 14'500 training examples are
sufficient to learn with the desired precision. The theorem shows that a clear
requirement for the precision of the learning algorithm must be set first; the math then
says how many data, here apples, are needed to learn satisfactorily: with 14'500 apples
the algorithm can determine with 1% error and 99% certainty whether an apple is sweet
or sour.
One piece is missing in the discussion: how do we define a risk or error measure which
can be observed? We fill this gap in the next section.
As a first example, consider predicting a house price from the size of the house by the
linear hypothesis

\hat{y} = \theta_0 + \theta_1 x

with x the feature size, ŷ the predicted house price, θ_0 the unknown intercept and θ_1
the slope of the straight line. Suppose the algorithm predicts that you could sell the
house for
[Figure: scatter plot of house prices (100'000 to 300'000 euros) against house size, with
a fitted straight line (linear) and a parabola (quadratic).]
300'000 euros. This price is too high, as the figure shows. The algorithm could do
better by fitting a parabola, i.e. a second-order polynomial:

\hat{y} = \theta_0 + \theta_1 x + \theta_2 x^2 .

Then the house price prediction is around 250'000 euros, which is closer to true prices.
Which function fits observable house prices best? Whichever function is chosen, it
defines a supervised learning algorithm, and our choice of the function is the hypothesis.
This is a regression problem, since we want to predict a continuous-valued output.
House pricing in reality uses more than 20 features - age of the house, centrality, view,
standard of construction, distance to the next public transport station, etc. Assume
that in the house price prediction in Napoli, additional features are the number of
rooms, the age of the house and the number of floors. With many features, vector and
matrix notation simplifies the presentation and the understanding of what is going on.
With six examples and two features in the exercise below, the design matrix and its
transpose read:

X = \begin{pmatrix} 1 & 5 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 2 \\ 1 & 4 & 1 \\ 1 & 5 & 3 \\ 1 & 1 & 1 \end{pmatrix} , \qquad
X' = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 5 & 3 & 2 & 4 & 5 & 1 \\ 1 & 2 & 2 & 1 & 3 & 1 \end{pmatrix} .
The m × (n + 1) design matrix X consists of all entries x_j^{(k)} plus a first column of
ones for the parameter in front of the constant feature x_0^{(k)} = 1. This matrix
contains all features' information across all training data.
Features often differ in numerical size, as in the house pricing example: 300 square
meters of house size, 2 bedrooms, 3 km distance to the next railway station, etc. The
features need to be made comparable for two reasons. First, if an algorithm calculates
the nearness of data, then 300 − 200 and 3 − 2 are the same on a relative scale, but in
absolute terms the first difference dominates the second. An example is Netflix
recommendations, which compare the nearness or distance of your movie preferences to
those of other Netflix users. The range of all features should be normalized such that
each feature contributes approximately proportionately to the prediction. A second
reason is the much faster convergence of the gradient descent algorithm to optimal
parameter values if the data are normalized, see below.
The scaling of features can be done in different ways. Either all features are normalized
to take values in [−1, 1] by dividing through the largest number, or the features are first
de-trended by subtracting their mean value μ_i and then divided by their standard
deviation σ_i, i.e.

\tilde{x}_i = \frac{x_i - \mu_i}{\sigma_i} . \qquad (5.3)

This normalization is often used, since powerful convergence theorems of probability
theory apply.
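A minimal sketch of (5.3) in Python (the function name and the feature values are our
illustrative assumptions):

    import numpy as np

    def standardize(X):
        """Z-score normalization (5.3): subtract each feature's mean,
        then divide by its standard deviation."""
        mu = X.mean(axis=0)
        sigma = X.std(axis=0)
        return (X - mu) / sigma

    # House features: size in m^2, bedrooms, distance to station in km.
    X = np.array([[300.0, 2, 3.0],
                  [120.0, 4, 0.5],
                  [200.0, 3, 1.5]])
    print(standardize(X))  # every column now has mean 0 and std 1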
5.2.2.3 Learning
We search for an algorithm that chooses the parameter vector θ in linear regression such
that the predicted house prices deviate as little as possible from the true values;
learning here means finding the optimal parameters. Hence, ŷ − y should be small.
Since we do not want positive deviations to offset negative ones, J = (ŷ − y)^2 is
minimized. This is a quadratic function, and convex if we use the linear regression
ŷ = θ'x. The difference should be minimal over the whole training set.
The following quadratic objective function J measures the average degree of deviation;
it is also called the empirical cost, risk or error function on the training set:

J(\theta) = \frac{1}{2m}\sum_{k=1}^m \big(h_\theta(x^{(k)}) - y^{(k)}\big)^2 = \frac{1}{2m}\sum_{k=1}^m \Big(\sum_{j=0}^n x_j^{(k)}\theta_j - y^{(k)}\Big)^2 . \qquad (5.4)
Contrary to the discussion of the last section, this function is perfectly observable on
the training data: it is an empirical expression which can be calculated without
introducing any probabilistic model. In matrix notation:

J(\theta) = \frac{1}{2m}\,(X\theta - y)'(X\theta - y) . \qquad (5.5)
Note that L(f_\theta(x), y) = (f_\theta(x) − y)^2 is the loss function. If Xθ = y, the
prediction is perfect on all training data and the cost or error function J is zero. The
corresponding function on the test set with N data points is

J(\theta) = \frac{1}{2N}\sum_{k=m+1}^{m+N} L\big(f_\theta(x^{(k)}), y^{(k)}\big) . \qquad (5.6)
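A sketch of the vectorized cost (5.5) in Python, using the design matrix from the
exercise above (the target values y are our illustrative assumption):

    import numpy as np

    def cost(theta, X, y):
        """Empirical risk (5.5): J(theta) = (X theta - y)'(X theta - y) / (2m)."""
        m = len(y)
        residual = X @ theta - y
        return residual @ residual / (2 * m)

    X = np.array([[1.0, 5, 1],
                  [1.0, 3, 2],
                  [1.0, 2, 2],
                  [1.0, 4, 1],
                  [1.0, 5, 3],
                  [1.0, 1, 1]])
    y = np.array([6.0, 5.0, 4.0, 5.0, 8.0, 2.0])  # hypothetical targets
    print(cost(np.zeros(3), X, y))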
Figure 5.5 illustrates the level curves, where the cost function of two variables has
constant value; the levels are ellipses in our case. Starting from an initial point x_0,
the center is the minimum cost point x^*. The closer the level lines are, the steeper the
corresponding cost function - think of the level curves as altitude curves on a map. The
derivative perpendicular to the level curves, the gradient, is larger the closer the level
curves are. We show that the gradient descent algorithm converges faster to the optimal
value the steeper the level curves are.
Minimizing cost or error means finding the right critical point. Critical points are by
definition points where the first derivative vanishes. This is necessary for a point to be,
say, a minimum, but it is not sufficient. Figure 5.6 shows the different types of critical
points. By definition, at a critical point the first derivative f'(x) equals zero - a flat
slope of the tangent. Besides maxima and minima, saddle points are the third type of
critical point.
If the function has a global minimum or maximum, we are in the best situation, since
an algorithm to find this point can be designed. But if a saddle point arises, we are
[Figure 5.5: elliptic level curves of the cost function over two features; the solution of
Xθ = y lies at the center, and the level curves are closest in the steep surface region.]
Figure 5.6: The three types of critical points and the difficulty of finding the global
minimum if a function has several critical points.
facing problems, in particular with algorithms: in some directions the algorithm slips
away from the saddle point, and in others it bounces back after an iteration step.
Whether a critical point is, say, a global minimum depends not only locally on the
vanishing derivative but on the function as a whole. In optimization theory, so-called
convex functions, such as our empirical risk, lead to the best possible situation: a local
minimum is a global minimum. Unfortunately, in deep learning models the optimization
criterion is not convex; the optimization becomes very intricate and many questions are
left open for future research.
A second example of difficulties arises for curves with several minima. Starting on the
left in Figure 5.6, we are likely to end up in the first local minimum - the algorithm
will not find the global minimum. Starting on the right side of the graph, we encounter
the problem that the function can be very flat in a certain region; then convergence can
become very slow. So it is not enough in optimization to let the algorithm solve the
first-order condition. We also need to consider the second-order derivatives of the
function, the Hessian, in order to control for saddle points, for example.
5.2.2.4 Gradient
The gradient ∇ is a differential operator: it acts on smooth real-valued functions f of n
variables,

\nabla f(x) = \Big(\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n}\Big)' .
By definition, the gradient vector measures the changes of the function at a point in the
directions of the standard basis. The gradient is the direction of greatest change of the
function f - that is why it is so important in the gradient descent algorithm for finding
minima.
Why does the gradient point in the direction of greatest change of the function? By
definition, the gradient measures how fast the function changes with respect to the
standard basis. Let us compare this with the change of the function in an arbitrary
direction v, where v is a unit vector. The projection of the gradient at a point x on this
vector v is given by the scalar product (\nabla f(x))' v. Calculus implies

(\nabla f(x))' v = |\nabla f(x)|\,|v| \cos\alpha

with α the angle between the gradient and v. Since |v| = 1, the expression is maximal
when the cosine is one, i.e. when v points in the same direction as the gradient.
Consider the function f(x, y) = 4x^2 + y^2. The level curves are ellipses, see Figure
5.7. The gradient vector reads ∇f = (8x, 2y). The gradient is normal to the level curve
through (x, y); it points in the direction of the greatest rate of increase of f(x, y). The
gradient is a linear operator and satisfies the product and chain rules.
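A quick numerical check of this example in Python (a finite-difference sketch of our
own; the point (1, 2) and the step size are arbitrary choices):

    import numpy as np

    def f(v):
        return 4 * v[0] ** 2 + v[1] ** 2

    def grad_f(v):
        return np.array([8 * v[0], 2 * v[1]])  # analytic gradient (8x, 2y)

    # Central finite differences along the standard basis vectors.
    v, eps = np.array([1.0, 2.0]), 1e-6
    numeric = np.array([(f(v + eps * e) - f(v - eps * e)) / (2 * eps)
                        for e in np.eye(2)])
    print(grad_f(v), numeric)  # both ~ [8. 4.]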
Figure 5.7: Level curves of f(x, y) = 4x^2 + y^2 and the normal gradient vector shown
at different points.
Figure 5.8 is a snapshot of an app from the website mathinsight.org which interactively
illustrates how the gradient and a directional derivative evolve in a relief.
Given a function f and a point x, we would like the algorithm to find the minimum as
fast as possible. Hence we consider, at the point x, the direction in which the function
is steepest. Since there are many possible directions in which we could move, let v be a
unit vector. The directional derivative tells us how strongly the function changes locally
in the given direction; we then choose the direction with the largest change - we know
that this is the negative gradient. The directional derivative is

\frac{\partial f(x + \alpha v)}{\partial \alpha}\Big|_{\alpha = 0} = v' \nabla f(x) = |v|\,|\nabla f(x)| \cos\beta .
To minimize f, we therefore choose v in the direction of the negative gradient. This is
the gradient descent method. In our cost minimization problem it reads as follows: for
every parameter component θ_j in step n,

\theta_{j, n+1} = \theta_{j, n} - \alpha\, \frac{\partial J(\theta_n)}{\partial \theta_{j,n}}
Figure 5.8: The function is shown as a surface plot and as a two-dimensional level curve
plot. The red point, where the gradient (red vector) and the directional derivative
(green vector) are calculated, can be moved. The gradient is illustrated by the red
vector emanating from the red point as well as by its shadow below the surface plot.
The angle between the gradient and the directional derivative can be chosen interactively.
with the learning rate parameter α. If the derivative is large in absolute value, the
parameter is updated by a large amount: we are far away from the minimum of the cost
function, in a steep region of the function. If the derivative is zero, we are at the
optimal cost level. In optimization, a descent direction is a vector p that moves us
closer towards a local minimum. Formally, in iterate n of the calculation of a minimum,
a descent direction vector p_n is defined by

p_n' \nabla f(x_n) < 0 .

This guarantees that for small steps along the direction p_n the function f is reduced.
In the gradient descent algorithm, the descent vector is the negative gradient itself:
p_n = -\nabla f(x_n). Given a descent direction, the line search algorithm computes a
step size or learning rate α that determines how far the algorithm should move in the
descent direction, see below.
Analytically, the partial derivatives of the cost function yield for the gradient descent:

\theta_{j, n+1} = \theta_{j, n} - \alpha\, \frac{1}{m} \sum_{k=1}^m \big(f_{\theta_n}(x^{(k)}) - y^{(k)}\big)\, x_j^{(k)} , \qquad j = 0, 1, \ldots, n.
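A minimal runnable sketch of this update rule in Python (the data-generating line,
learning rate and iteration count are our illustrative assumptions):

    import numpy as np

    def gradient_descent(X, y, alpha=0.1, iterations=5000):
        """Batch gradient descent for linear regression: update all
        components of theta by the averaged gradient of the cost (5.4)."""
        m, n1 = X.shape
        theta = np.zeros(n1)
        for _ in range(iterations):
            gradient = X.T @ (X @ theta - y) / m
            theta -= alpha * gradient
        return theta

    # Noisy data generated from a known line: intercept 2, slope 3.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 200)
    y = 2 + 3 * x + 0.1 * rng.standard_normal(200)
    X = np.column_stack([np.ones_like(x), x])
    print(gradient_descent(X, y))  # ~ [2.0, 3.0]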
Gradient descent needs a choice of the learning rate α, and possibly many iterations are
needed to come close to the minimum value. In the analytic approach below - the
normal equation - we need neither a learning rate nor iterations.
How do we make sure that gradient descent is working correctly, and how is the
learning rate α chosen? A simple method is to plot the cost function against the
number of iterations for a chosen value of α. This is not a mathematical approach but
a widely used practitioner's one; we consider a mathematical approach below. If the
chart looks like the left panel of Figure 5.9, you are on the right track.
Figure 5.9: Plot of the cost function against the number of iterations. Left panel:
convergence. Right panel: overshooting when the learning parameter is, for example,
chosen too large.
After each iteration you get a new value of θ; insert this value into the cost function
and calculate the costs. If gradient descent works properly, the cost function should
decrease after every iteration. When do we stop, i.e. when do we say that the algorithm
has empirically (not mathematically!) converged? Define a small number, say 1/3000:
if the values of the cost function change by less than this number as the iteration count
increases, the algorithm is stopped.
What if the cost function grows with the number of iterations, see the right panel in
Figure 5.9? Then the chosen α was too large; the figure illustrates how overshooting
can arise, leading to non-convergence. What if it takes extremely long to reach
convergence? Then the chosen α is too small.
For different applications, the curvature of the cost function and the chosen learning
rate differ. This leads to applications with convergence after just a few dozen iterations,
up to cases where thousands and even millions of iterations are needed. Figure 5.10
illustrates level curves where gradient descent converges slowly due to a very flat
plateau. We consider more advanced algorithms to overcome such shortcomings below.
Figure 5.10: Level curves with a slowly converging gradient descent (Source: Sven
Leyffer, Argonne National Laboratory, 2016).
We now consider the mathematical analysis of whether and how fast an algorithm
converges. So far we considered the first-order derivative, which led to critical points
that can be maxima, minima or saddle points. To decide which type of critical point we
face, we have to consider the second-order approximation of the function, i.e. the
second derivative. The second derivative tells us whether a gradient step will cause as
much of an improvement as we would expect based on the gradient alone: while the
first derivative measures the slope of the tangent, the second derivative measures
curvature.
If the second derivative - the Hessian matrix - has eigenvalues of mixed signs, it is
neither positive nor negative definite, which implies that the critical point is a saddle
point and not a maximum or minimum. Note that we always assume that the order of
partial derivatives does not matter, i.e. that the Hessian matrix is symmetric. This
assumption is not critical for most applications in machine learning.
To see how the Hessian affects the gradient descent algorithm, we make a second-order
Taylor series approximation of the function f(x) around the current point x_0:

f(x) = f(x_0) + (x - x_0)' \nabla f(x_0) + \frac{1}{2}(x - x_0)' H(x_0)(x - x_0) + \text{error} .

With the learning rate α, inserting the gradient descent rule x = x_0 − α∇f(x_0) into
the Taylor series implies:

f\big(x_0 - \alpha \nabla f(x_0)\big) = f(x_0) - \alpha\, (\nabla f(x_0))' \nabla f(x_0) + \frac{1}{2}\alpha^2\, (\nabla f(x_0))' H(x_0) \nabla f(x_0) + \text{error} .
There are three terms: the original value of the function, the expected improvement due
to the slope of the function, and the correction for the curvature of the function. When
this last term is too large, the gradient descent step can actually move uphill and the
algorithm will not converge. When the curvature term is negative or zero, the Taylor
series approximation predicts that increasing the learning rate will forever decrease the
function value to be minimized. When the curvature term is positive, the optimal
learning rate or step size that minimizes the Taylor series approximation is given by
(take the derivative w.r.t. the learning rate and set the expression equal to zero)

\alpha^* = \frac{(\nabla f(x_0))' \nabla f(x_0)}{(\nabla f(x_0))' H(x_0) \nabla f(x_0)} .

If (!) the function is well approximated by the second-order polynomial, then the
Hessian or curvature determines the scale of the learning rate.
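A small numerical sketch of this step-size formula in Python, for the earlier example
f(x, y) = 4x^2 + y^2 with its constant Hessian (the evaluation point is our choice):

    import numpy as np

    H = np.array([[8.0, 0.0],
                  [0.0, 2.0]])  # Hessian of f(x, y) = 4x^2 + y^2

    def optimal_step(g, H):
        """Learning rate minimizing the second-order Taylor model along -g:
        alpha* = g'g / g'Hg (assumes positive curvature g'Hg > 0)."""
        return (g @ g) / (g @ H @ g)

    g = np.array([8.0, 4.0])   # gradient at the point (1, 2)
    print(optimal_step(g, H))  # ~0.147: larger curvature, smaller step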
We return to linear regression,

J(\theta) = \frac{1}{2m}\sum_{k=1}^m \big(f_\theta(x^{(k)}) - y^{(k)}\big)^2 \to \min! \qquad (5.7)

The first-order condition for a minimum is the system of equations

\nabla_\theta J(\theta) = 0 .
\nabla_\theta J(\theta) = \frac{1}{2m}\, \nabla_\theta \sum_{k=1}^m \big(f_\theta(x^{(k)}) - y^{(k)}\big)^2 = \frac{2}{2m} \sum_{k=1}^m \big(f_\theta(x^{(k)}) - y^{(k)}\big)\, \nabla_\theta f_\theta(x^{(k)})

and, continuing with f_\theta(x) = \theta' x,

\nabla_\theta J(\theta) = \frac{1}{m} \sum_{k=1}^m \big(\theta' x^{(k)} - y^{(k)}\big)\, x^{(k)} = 0 .
Xθ = y is the equation for the optimal theta values. But the inverse of X does not exist
in general: the number of data points m and the number of features n + 1 are not the
same, so the matrix is not square, which is a necessary condition for its inversion.
Multiplying the equation with the transpose X' from the left gives

X' X \theta = X' y ,

and

\theta^* = (X'X)^{-1} X' y
is the analytical expression for the optimal parameters. Fine - so why bother with the
gradient descent algorithm if we can just plug the values into the above formula? The
matrix (X'X)^{-1} is of dimension (n + 1) × (n + 1), and calculating such inverse
matrices is costly: the computation time grows with n^3. Hence, with 10 features the
cost is of order 1'000, but with 1'000 features it is of order 10^9, i.e. one billion. For
large-scale problems the analytical solution becomes slow, while gradient descent works
well even if the number of features is large.
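A sketch of the normal equation in Python (the synthetic data are our assumption);
solving the linear system is cheaper and numerically more stable than forming the
inverse explicitly:

    import numpy as np

    def normal_equation(X, y):
        """Closed-form least squares: solves X'X theta = X'y."""
        return np.linalg.solve(X.T @ X, X.T @ y)

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(100), rng.standard_normal((100, 2))])
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + 0.01 * rng.standard_normal(100)
    print(normal_equation(X, y))  # ~ [1.0, -2.0, 0.5]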
Consider a regression of the health status of individuals on different features. One
feature x_1 is the weight in kg and another, x_2, is the body height in meters. Naively,
one could set

\hat{y} = f_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2

and estimate θ for the health status prediction. Insight from medicine teaches us that
this choice is nonsense: it is the ratio x_1 / x_2^2, the body-mass index, which is a
meaningful indicator or feature for the health status. Think about why the square of
the body height enters the index definition.
Returning to house prices, a third-order polynomial in the size reads

\hat{y} = f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 .

Setting x_1 = x (the size), x_2 = x^2 (the squared size) and x_3 = x^3 (the cube of
the size) defines a multivariate linear regression. This is trivial, but you have to be
careful about feature scaling: suppose a house size is 100 square meters; then the square
is 10'000 and the cube is 1'000'000. The ranges of the features thus become very broad
for higher-order polynomials, and the features need to be scaled to become comparable
if you, say, apply the gradient descent method.
We next develop a classification algorithm for the binary case of tumor classification
with the two states malignant and benign. We first discuss why linear regression is not
a good idea here. Assume that the feature size is the only one used to classify tumors
into benign (0) or malignant (1) ones, see Figure 5.12.
[Figure 5.12: two panels plotting the labels benign = 0 and malignant = 1 against
tumor size, each with a fitted linear regression h(x) = θ'x and a threshold of 0.5; in the
right panel, additional large tumors flatten the regression line.]
A fitted linear regression, as shown in the left panel, looks reasonable when we choose a
threshold value of 0.5. But now assume that we have more data points of tumors with
large size. Then the estimated coefficients of the linear regression change such that the
slope becomes worse at separating the points. The newly added data carry little
information, since large tumors are most likely malignant and adding even larger ones is
not very informative for the classification problem. The figure shows that some large
tumor sizes can be wrongly classified due to the fixed threshold value and the moving
linear regression. Using a straight line misses the intuition that a kind of threshold size
would separate the data points better: a separation function which looks more like a
step function makes more sense.
Choosing a fixed threshold value with a step function to separate the points can be
meaningful if you consider electrical circuits, for example. But in social, financial or
medical applications such a zero-one function might be too severe and lead to errors.
Instead, a softer function which approximates the zero-one cutoff is used. Such
functions are called sigmoid functions.
Logistic regression is such a smoothed-out step function approach. The predictions of
logistic regression always lie between zero and one. Note that although the word
regression appears, the highly non-linear logistic regression is a classification algorithm,
applied to settings where the label y is a discrete value.
Instead of the not very useful multivariate linear regression

\hat{y} = f_\theta(x) = \theta' x

we consider a non-linear transformation which gives us a smoothed-out step function.
That is, we consider a function g for logistic regression, the logistic function:

\hat{y} = f_\theta(x) = g(\theta' x) .

For a real number x,

g(x) = \frac{1}{1 + e^{-x}}

takes values in (0, 1) and is an S-shaped approximation of the step function, see Figure
5.13.
5.13.
0,8
1
1 1 + 𝑒 −𝑥
0,6 1 + 𝑒 −3− 𝑥
0,4
0,2 1
1 + 𝑒 −3𝑥
0
-10
-9,7
-9,4
-9,1
-8,8
-8,5
-8,2
-7,9
-7,6
-7,3
1,1
8,3
-7
-6,7
-6,4
-6,1
-5,8
-5,5
-5,2
-4,9
-4,6
-4,3
-4
-3,7
-3,4
-3,1
-2,8
-2,5
-2,2
-1,9
-1,6
-1,3
-1
-0,7
-0,4
-0,1
0,2
0,5
0,8
1,4
1,7
2
2,3
2,6
2,9
3,2
3,5
3,8
4,1
4,4
4,7
5
5,3
5,6
5,9
6,2
6,5
6,8
7,1
7,4
7,7
8
8,6
8,9
9,2
9,5
9,8
Figure 5.13: Sigmoid function. Multiplying x by 3, i.e. using larger parameters θ, makes
the sigmoid function look more like a step function. Adding −3 to the exponent, i.e.
changing θ_0, does not change the shape of the sigmoid function but shifts it parallel to
the x-axis.
The sigmoid function, our hypothesis, approaches 1 asymptotically as x tends to infinity
and 0 as x tends to minus infinity. Since the outputs lie in the unit interval, they can
be interpreted as probabilities. A prominent example is the probability of default in
banks' rating systems for counterparties asking for loans. In this case the features are
financial variables such as liquidity ratios, earnings per share and investment ratios, or
qualitative variables such as the quality of the management and the competitive
strength of the firm in its sector.
The product θ'x is called the score S of the rating system, a real number. It delivers an
ordinal ranking of the counterparties of a bank: the higher the score, the lower the
creditworthiness of the client. But this is not yet a price which can be charged to the
clients, i.e. how much they have to compensate the bank for taking their credit risk on
its balance sheet. This requires turning the score into a probability of default (PD) on a
one-year horizon using the sigmoid function. A PD of, say, 3% is then mapped via a
so-called master scale into a price bucket and a rating: all PDs in the interval
[2.5%, 3.5%) are mapped into a rating BB with a price of 3.25% per annum. To arrive
at the master scale interval boundaries and the price in each interval, the parameters θ
are fitted such that the sigmoid function satisfies additional business requirements.
First, choosing the thetas - calibrating the model - should lead to PDs such that the
credit risk prices cover the effective losses over a whole portfolio of counterparties in one
year. Second, the shape of the sigmoid function should be chosen such that as many
good risks as possible accept the bank's pricing and do not take an offer from a
competitor, and vice versa, that there is no incentive for bad risks to accept the bank's
offer because it is more favourable than the competitors' offers. These trade-offs are
handled by choosing the shape and the level (i.e. the parallel shift) of the sigmoid
function appropriately. Clearly, the classification of clients into the rating classes AAA,
AA+, ..., BB-, D is a multi-classification problem and not a binary one.
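A sketch of the score-to-rating pipeline in Python. The master scale values below are
hypothetical except for the BB bucket quoted in the text; function names are ours:

    import numpy as np

    def pd_from_score(score):
        """Map a rating score theta'x to a one-year probability of default
        via the sigmoid; a higher score means a higher PD here."""
        return 1.0 / (1.0 + np.exp(-score))

    MASTER_SCALE = [          # (upper PD bound, rating, price p.a.) - hypothetical,
        (0.005, "AA", 0.006), # except the BB bucket [2.5%, 3.5%) from the text
        (0.025, "BBB", 0.018),
        (0.035, "BB", 0.0325),
        (1.0, "B", 0.08),
    ]

    def rating(score):
        pd = pd_from_score(score)
        for upper, grade, price in MASTER_SCALE:
            if pd < upper:
                return grade, price

    print(rating(-3.5))  # PD ~ 2.9% -> ('BB', 0.0325)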
For a score of θ'x = −1.4, for example,

P(y = 1 \,|\, x, \theta) = \frac{1}{1 + e^{1.4}} \approx 0.2 .

The logistic regression hypothesis thus gives an estimate of the probability that y
equals 1.
Since g(0) = 1/2, predicting y = 1 requires the argument to be larger than zero, which
means the probability is larger than 50 percent: θ'x > 0 is one side of the decision
boundary, the set θ'x = 0.
Using higher-order polynomials, more complex decision boundaries such as circles or
ellipses can be generated.
Consider a training set where every feature vector x is (n + 1)-dimensional and every
label y is either 0 or 1. We already introduced the cost and loss function in (5.7):

J(\theta) = \frac{1}{2m}\sum_{k=1}^m \big(f_\theta(x^{(k)}) - y^{(k)}\big)^2 = \frac{1}{2m}\sum_{k=1}^m \Big(\frac{1}{1 + e^{-\theta' x^{(k)}}} - y^{(k)}\Big)^2 =: \frac{1}{2m}\sum_{k=1}^m \mathrm{Loss}\big(f_\theta(x^{(k)}), y^{(k)}\big) .
This was the appropriate loss function for linear regression, but for logistic regression
the squared difference no longer has a single minimum: contrary to the linear case,
where a single global minimum exists, it can have many local optima, see Figure 5.14.
This is not a way to follow, since algorithms that search for the minimum cost - and
hence deliver the estimate of theta - will likely be trapped in local minima. We would
like a convex loss function, i.e. one with a single minimum.
Figure 5.14: Plot of the loss function $\big(\frac{1}{1+e^{-\theta' x}} - y\big)^2$. We assumed that y = 0 holds for almost all negative values of θ'x, and similarly y = 1 for almost all positive values. Around the value zero we added some errors, i.e. y = 1 for some values with θ'x < 0 and similarly for y = 0. These few errors create the erratic behaviour of the loss function around zero. Note that even for zero error the loss function is not convex!
Convex functions play an important role in many areas of mathematics and in particular in optimization problems. A strictly convex function on an open set has no more than one minimum. A real-valued function defined on an n-dimensional interval is called convex if the line segment between any two points on the graph of the function lies above or on the graph, see Figure 5.15. If the function is differentiable, then it is convex if and only if its second derivative is non-negative on its entire domain. Examples of convex functions are x, x², eˣ and |x|.
An inexact line search updates the iterate as x_{n+1} = x_n + α_n p_n and solves at each step the problem of finding the step length in the direction p_n which has the largest impact on the function f, i.e. which reduces the function as much as possible: x_n is the current best guess in step n, the vector p_n is a search direction, and the number α_n is the step length. Such inexact line searches provide an efficient way of computing an acceptable step length.
Clearly, p_k has to be a direction in which f decreases, i.e. a descent direction: p'_k∇f(x_k) < 0. Different algorithms lead to different choices of the descent direction. For the gradient descent algorithm, p_k = −∇f(x_k).
Given a descent direction, the step length α_k is assumed to satisfy the Wolfe conditions:

i) $f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k\, p_k' \nabla f(x_k)$,

ii) $-p_k' \nabla f(x_k + \alpha_k p_k) \le -c_2\, p_k' \nabla f(x_k)$,

with constants 0 < c₁ < c₂ < 1. One sets c₁ ∼ 10⁻⁴ and c₂ ∼ 0.9 for Newton or quasi-Newton methods, see the next section. Inequality i), the Armijo rule, ensures that the step length decreases f sufficiently: the function evaluated after moving in the descent direction is smaller than the function before, plus a term proportional to the directional derivative, which is negative. Inequality ii) is a curvature condition; it ensures that the slope has been reduced sufficiently.
Many problems require finding the roots of a system of non-linear equations

$$f_j(x_1, \ldots, x_n) = 0, \quad j = 1, 2, \ldots, k,$$

which can in general only be found by an iterative process which approximates the solution. The Newton-Raphson method uses a second-order approximation of f, compared to gradient descent, which uses only the first-order approximation.

The idea for a single non-linear function f of one variable is to compute the x-intercept of the tangent line, i.e. of the first-order Taylor approximation. We write x_n for the current approximation and derive x_{n+1}. The equation of the tangent line to the curve y = f(x) at x_n is

$$f(x) = f'(x_n)(x - x_n) + f(x_n).$$
The x-intercept of this line is taken as the next approximation, i.e. solving 0 = f'(x_n)(x_{n+1} − x_n) + f(x_n) for x_{n+1} gives the updating rule x_{n+1} = x_n − f(x_n)/f'(x_n). In several dimensions the derivative is replaced by the Jacobian matrix J and the update reads x_{n+1} = x_n − J(x_n)⁻¹f(x_n).
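A minimal sketch of the one-dimensional updating rule, assuming an illustrative function and starting point:

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until |f(x)| < tol."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / fprime(x)
    return x

# Example: the positive root of f(x) = x^2 - 2 is sqrt(2).
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))
```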
Improvements replace the inverse Jacobian by an approximation, since the exact calculation grows as O(N³) with the dimension of the Jacobian matrix. The asymptotics is the same as for matrix multiplication; the latter follows from the worst case where N³ scalar multiplications and (N − 1)N² additions are needed to compute the product of two N × N matrices, which leads to the claimed asymptotics.
We are interested in the method for optimization problems. Hence the root we search for is the root of the first derivative: f is replaced by f' and f' by f''. Then

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}$$

is the updating rule for the algorithm. If the function f is a positive quadratic, then one application of the updating rule already reaches the minimum. If the function is more complex than quadratic, then iterating the updating rule often converges faster to the minimum than gradient descent if the Hessian is positive definite. But if we encounter saddle points, then the Newton method can fail to converge. In the multi-dimensional Newton-Raphson algorithm the descent direction is given by p_n = −H⁻¹∇f(x_n) with H the Hessian, and the update reads

$$x_{n+1} = x_n + p_n = x_n - H^{-1}\nabla f(x_n).$$
As an application of the Newton method consider the saddle function f(x, y) = x² − y², see Figure 5.11. The Hessian and its inverse read

$$H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, \qquad H^{-1} = \begin{pmatrix} 1/2 & 0 \\ 0 & -1/2 \end{pmatrix},$$

and the gradient is ∇f = (2x, −2y)'. This implies:

$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} - \begin{pmatrix} 1/2 & 0 \\ 0 & -1/2 \end{pmatrix}\begin{pmatrix} 2x_n \\ -2y_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
The Newton method leads to the saddle point at the origin in one step. The gradient descent method will not lead to the saddle point: the gradient is zero at the saddle point, but a tiny step away from it would pull the optimization away. Clearly, the example is specific in the sense that the algorithm finds the saddle point after one step independent of where the starting point is; this is particular to a second-order polynomial. Although the Hessian is not positive definite - we have a saddle point and no minimum - this critical point can be found. This is an atypical situation: Newton's method works well and often outperforms gradient descent if the Hessian is positive definite.
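A small numerical check of the one-step jump to the saddle point; the starting point is an arbitrary illustrative choice:

```python
import numpy as np

H_inv = np.array([[0.5, 0.0], [0.0, -0.5]])   # inverse Hessian of x^2 - y^2

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p0 = np.array([3.0, -1.7])                    # arbitrary starting point
p1 = p0 - H_inv @ grad(p0)                    # one Newton step
print(p1)                                     # [0. 0.]: the saddle point
```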
So far, we have had to calculate the second derivatives of the Hessian at each stage. In quasi-Newton methods, the Hessian matrix of second derivatives is not computed; instead, it is approximated using specific updates given by gradient evaluations. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is such a method.
The search direction p_n at stage n is given by the solution of the analogue of the Newton equation,

$$\tilde H_n p_n = -\nabla f(x_n),$$

with $\tilde H_n$ the current approximation of the Hessian. A line search then minimizes

$$f(x_n + \alpha p_n)$$

over the step length α > 0. The quasi-Newton condition imposed on the update of $\tilde H_n$ is:

$$\tilde H_{n+1}(x_{n+1} - x_n) = \nabla f(x_{n+1}) - \nabla f(x_n).$$

Instead of requiring the full Hessian matrix at the point x_{n+1} to be computed, the approximation $\tilde H_{n+1}$ is used. Some remarks:
• Neural networks typically fail to define convex problems. This issue, together with the occurrence of many saddle points in the high-dimensional feature space and the computational burden, limits the use of Newton's method for training large neural networks. See deeplearningbook.org, Section 8.6 'Approximate Second-Order Methods', for an overview.

• Alternatives such as the quasi-Newton methods stated above have their own issues.

• In machine learning outside of deep learning, BFGS and variants of it are fairly common optimization algorithms; a minimal usage sketch follows this list.

• Often, even in the literature, drawbacks of algorithms are stated which in fact follow from incorrect use of the methods and are hence of limited informational content.
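As announced in the third point, here is a sketch of quasi-Newton optimization with BFGS via SciPy's standard minimize interface; the objective (a Rosenbrock-type function) and starting point are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    """Rosenbrock-type test objective."""
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

def grad_f(x):
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ])

res = minimize(f, x0=np.array([-1.2, 1.0]), jac=grad_f, method="BFGS")
print(res.x)    # converges to the minimum at [1, 1]
```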
Assume that the true values are generated by the same hypothesis f but with the unknown, true parameters θ̂. Then we can write for the vector y of true values

$$y = f(\hat\theta, x) + \epsilon$$

with the noise ε = (ε₁, ..., ε_m) a normally distributed variable with mean zero and variance E(εε') = σ²I. Noise is independent of the data set X. Hence, true data are generated by a signal f and noise.
What happens if we consider a different training data set, also with m data points? Then the empirical loss and error functions change: they depend on the features, the parameters and the training data sample. Suppose that we consider many different training data sets, each drawn at random from a large data pool, and that we calculate an average empirical error over all samples. The expectation is that this average approaches the unknown true error in some mathematical sense. This true error is assumed to be also an average, the expected value of the true loss function under an unknown probability P, i.e. E_P(L(x, y)) is the true error which we defined in the last section. Inserting y = Xθ̂ + ε with the assumed correct values θ̂ of the parameters implies⁶

$$\theta^* = \hat\theta + (X'X)^{-1}X'\epsilon.$$
The parameter estimates can be decomposed into the sum of the correct underlying parameters and an estimate based on noise alone. If each set is drawn from a large data pool independently of all other ones and we consider many independent training data sets X, then the average parameter value conditional on the specific data input X is given by

$$E_P[\theta^*|X] = \hat\theta$$

since E_P[ε|X] = 0 and the design matrix is not stochastic. Therefore, our parameter estimates are unbiased w.r.t. the unknown true parameters on average, when averaging over many training sets. In the same way, see the exercises, the conditional covariance can be calculated:
$$\mathrm{cov}(\theta^*, \theta^*|X) = \sigma^2(X'X)^{-1}. \tag{5.10}$$

Contrary to the expectation, the covariance is a function of the design matrix, and it is proportional to the unknown variance of the noise σ². We introduce the Euclidean norm ||x||² = Σ_{j=1}^n x_j² of an n-dimensional vector. Note that (Xθ − y)'(Xθ − y) = ||Xθ − y||².
Using the calculated bias and variance we obtain the mean squared error (MSE) of the parameter estimates for fixed true parameters (omitting the X dependency):
⁶ This follows from

$$\theta^* = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\hat\theta + \epsilon) = (X'X)^{-1}(X'X)\hat\theta + (X'X)^{-1}X'\epsilon = \hat\theta + (X'X)^{-1}X'\epsilon =: \hat\theta + A\epsilon.$$
We summarize:
$$E\big[\|\theta^* - \hat\theta\|^2\,\big|\,X\big] = \sigma^2\,\mathrm{tr}\big((X'X)^{-1}\big) \sim \frac{\sigma^2(n+1)}{m} \tag{5.12}$$
with n the number of features and m the number of training examples. If the number of training examples increases, the mean squared error of the parameter estimates vanishes. Conversely, for a fixed number of data points, an increasing number of features increases the mean squared error of the parameters. Although we don't know the noise variance σ² of the correct model, it only appears as a multiplicative constant. Hence, it will not affect how we should choose the features: a natural approach would be to choose them to minimize the above trace. But this makes sense only if the assumed linear relation holds true.
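A small Monte Carlo sketch of (5.12), assuming synthetic data and illustrative values for m, n and σ:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma, trials = 500, 5, 0.3, 2000
theta_hat = rng.normal(size=n + 1)                            # "true" parameters
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])    # fixed design matrix

mse = 0.0
for _ in range(trials):
    y = X @ theta_hat + sigma * rng.normal(size=m)            # fresh noise per training set
    theta_star = np.linalg.lstsq(X, y, rcond=None)[0]
    mse += np.sum((theta_star - theta_hat) ** 2)
print(mse / trials)                                           # simulated E||theta* - theta_hat||^2
print(sigma ** 2 * np.trace(np.linalg.inv(X.T @ X)))          # theoretical sigma^2 tr((X'X)^-1)
```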
Assuming y = θ̂ sin(x) + ε in one dimension for the true model and plugging this into (5.9), it follows that the optimal parameter of the linear regression θ* is no longer unbiased:

$$E_P(\theta^*|X) = (x'x)^{-1}x'\sin(x)\,\hat\theta \neq \hat\theta$$

in general. Besides variance, we then also get a bias-squared term in the mean squared error (MSE) of the parameter in equation (5.11). The next formulae make this precise. Let y = f(x) + ε with f any unknown function and ε noise with mean zero and variance σ². Let h(x) be any hypothesis used to predict y and X the IID-sampled training data set.
We saw that in the linear case the MSE is given by variance only; now, with the sine function generating the data and a polynomial function used in the regression, a bias term also follows. As in the linear case, the mean squared error (MSE) between the estimated and true values is of interest. We could follow the strategy of making the MSE as small as possible - using many data sets to reduce variance and a high-order polynomial to approximate the sine function well in order to also minimize bias. Plotting the MSE as a function of model complexity, we indeed get a decreasing error with increasing model complexity, see Figure 5.17.
Figure 5.17: Left panel: behavior of training and test error as a function of the amount of data. Right panel: bias-variance tradeoff and model complexity - decreasing and then increasing test error as a function of model complexity, and the optimal model complexity.
Hence, it looks as if, by taking say a 10th-order polynomial and many training data samples, we achieve the best MSE. This is true for the training data, but it fails to be true for the test data MSE - and it is on this set of unseen data that we want the algorithm to perform well. Why does the test data MSE increase after a certain level of complexity, see Figure 5.17?
A more complex model gives the better fit if the training data set looks complicated, since we have many more parameters to fit the complicated data structure. But by choosing many parameters for fitting the training data, we are likely to fit spurious data patterns which are common in the training data but fail to be true patterns that also show up in the test data or unseen data (overfitting). The extremely precise fit to the training data will fail to be close to the new test data. This failure of the algorithm to generalize well - the random artefacts of the training data do not reproduce on other data - leads to an increasing error for the test data if model complexity becomes too large. The property of a U-shaped test MSE as a function of model complexity is an intrinsic property of statistical machine learning models: the bias-variance tradeoff. Figure 5.18 shows the different cases when using polynomials of different complexity.
Figure 5.18: The authors perform three fits to the shown training data. The training data was generated synthetically: the feature was chosen randomly and y was a quadratic function of the feature plus noise. The center plot, assuming a quadratic function, suffers neither from overfitting nor from underfitting. Overfitting is produced by a ninth-order polynomial which perfectly matches all training data; but the wild graph produces a lot of artificial patterns inconsistent with the synthetic quadratic function. Source: Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press, 2016.
Our ultimate goal in machine learning is to minimize the expected test MSE, that is, we must choose a statistical machine learning model that simultaneously has low variance and low bias. In order to estimate the expected test MSE, we can use techniques such as cross-validation.

Summarizing, bias and variance are the two key notions in over- and underfitting. Bias is the difference between the average prediction of our model and the correct value. A model with high bias pays very little attention to the training data and oversimplifies the model; it leads to high errors on training and test data. Variance is the variability of the model prediction for a given data point, i.e. it tells us the spread of our predictions. A model with high variance pays a lot of attention to the training data and does not generalize to data which it has not seen before. As a result, such models perform very well on training data but have high error rates on test data.
We conclude with the formal decomposition of the error into bias, variance and irreducible risk for any hypothesis h and true model f. The error Err(x) or cost is by definition the expected squared difference between the predicted and the true value, i.e.

$$\mathrm{Err}(x, X) := E\big[(y - h(x))^2\,|\,X\big] = \mathrm{Bias}_X\big[h(x)\big]^2 + \mathrm{Var}_X\big[h(x)\big] + \sigma^2$$

with

$$\mathrm{Bias}\big[h(x)\big] = E(h(x)) - f(x), \qquad \mathrm{Var}\big[h(x)\big] = E\big[(h(x) - E(h(x)))^2\big].$$

Hence, the error is the sum of the squared bias, the variance and the irreducible error, which is a measure of the amount of noise in our data. Our data will have a certain amount of noise or irreducible error that cannot be removed.
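A small simulation sketch of this decomposition, in the spirit of Figure 5.18: polynomials of increasing degree are fitted to noisy sine data, and bias and variance are estimated at a test point. All sample sizes, degrees and the noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.sin
sigma, m, trials, x0 = 0.3, 30, 1000, 1.0

for degree in (1, 3, 9):
    preds = []
    for _ in range(trials):
        x = rng.uniform(-np.pi, np.pi, size=m)        # fresh training sample
        y = f(x) + sigma * rng.normal(size=m)
        coeffs = np.polyfit(x, y, degree)             # least-squares polynomial fit
        preds.append(np.polyval(coeffs, x0))          # prediction at test point x0
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x0)) ** 2
    print(f"degree {degree}: bias^2={bias2:.4f}, variance={preds.var():.4f}")
```

Low degrees show large bias and small variance, high degrees the reverse - the bias-variance tradeoff.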
Proposition 89. The minimum of the loss function L = ||y − Xθ||² is given by the OLS parameter estimates θ̂ = (X'X)⁻¹X'y. The bias is given by Bias(θ̂) = E(θ̂) − θ and the variance by Var(θ̂) = σ²(X'X)⁻¹, where σ² is estimated from the residuals, σ̂² = (y − Xθ̂)'(y − Xθ̂)/(m − n).
The OLS estimator is unbiased, but its variance can be huge if predictor variables are highly correlated or if there are many predictors (for n → m, the variance explodes). To reduce the variance one has to introduce some bias, which moves us in Figure 5.17 from the right-hand side, where unbiased OLS puts us, towards the center with an optimized trade-off.
5.2.5 Regularization
5.2.5.1 Theory
Regularization is a method to counteract overfitting. Consider the hypothesis

$$f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 + \theta_5 x^5$$

where the parameters of the fourth- and fifth-order terms are much smaller than the other ones. Clearly, these parameters add to overfitting, since they add to model complexity. We do not want to set them to zero manually, but rather penalize the algorithm if it chooses them - we do not throw away information but set a hurdle such that they get a positive weight only if they are indeed important. Regularization achieves this. It takes the cost function and modifies it such that all parameters are shrunk - those which are already small become almost negligible. We modify our cost function by introducing an extra term:
$$J(\theta) = \frac{1}{2m}\left(\sum_{k=1}^m \big(h_\theta(x^{(k)}) - y^{(k)}\big)^2 + \lambda\sum_{j=1}^n \theta_j^2\right) = \frac{1}{2m}\big(\|X\theta - y\|^2 + \lambda\|\theta\|^2\big) \tag{5.13}$$
with λ the regularization parameter. This addition induces a trade-off between the goals of fitting the training set well (first term) and keeping the parameters small (second term). Using this regularized objective function, we obtain a smoother fit and a much better hypothesis. For λ large, all parameters are penalized in the sense that they get close to zero. Hence, we have to be careful not to end up with underfitting by choosing the parameter too large.
The solution of the regularized problem reads

$$\theta^* = (X'X + \lambda I)^{-1}X'y$$

with I an (n+1) × (n+1) matrix whose first row contains zeros and whose remaining n × n block is the identity matrix (the intercept is not penalized). To check this, we write in matrix notation

$$J = \frac{1}{2}\big((X\theta - y)'(X\theta - y) + \lambda\theta' I\theta\big).$$

Taking the derivative w.r.t. θ,

$$\nabla J = X'(X\theta - y) + \lambda I\theta = 0,$$

which is solved by the stated θ*.
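A minimal sketch of this closed-form ridge solution on synthetic data, with the intercept left unpenalized as described above:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, lam = 100, 3, 5.0
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.5 * rng.normal(size=m)

I = np.eye(n + 1)
I[0, 0] = 0.0                       # leave the intercept unpenalized
theta_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_ols)
print(theta_ridge)                  # shrunk towards zero relative to OLS
```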
Considering the bias and the variance of the regularized model, it follows that we add a little bias but can reduce the variance in exchange; the parameter estimates are shrunk towards zero, and more so the larger the value of λ. To prove these claims one repeats the argumentation and calculation of the last section, which we omit. The result is

$$E_P(\theta^*|X) = \big(I - \lambda(\lambda I + X'X)^{-1}\big)\hat\theta,$$

showing the bias. Using the spectral theorem of linear algebra one can show that the matrix I − λ(λI + X'X)⁻¹ indeed leads to a shrinkage of the parameters, i.e. E_P(θ*|X) ≤ θ̂.
One can next express the variance and the MSE explicitly for the regularized problem. The variance expression reads

$$\sigma^2(\theta^*|X) = \sigma^2(\lambda I + X'X)^{-1} - \lambda\sigma^2\big((\lambda I + X'X)^{-1}\big)^2.$$
We omit the derivation of these formulae but turn instead to an explicit calculation for a one-dimensional case.
Suppose that there is a single feature which can take only the two values 1 and −1. Hence, the design matrix X reads

$$X = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \qquad X'X = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.$$

But then

$$\lambda I + X'X = \begin{pmatrix} 2+\lambda & 0 \\ 0 & 2+\lambda \end{pmatrix}$$

and the inverse of this diagonal matrix is simply

$$(\lambda I + X'X)^{-1} = \begin{pmatrix} \frac{1}{2+\lambda} & 0 \\ 0 & \frac{1}{2+\lambda} \end{pmatrix}.$$

The variance expression then reads

$$\sigma^2(\theta^*|X) = \sigma^2\begin{pmatrix} \frac{1}{2+\lambda} & 0 \\ 0 & \frac{1}{2+\lambda} \end{pmatrix} - \lambda\sigma^2\begin{pmatrix} \frac{1}{(2+\lambda)^2} & 0 \\ 0 & \frac{1}{(2+\lambda)^2} \end{pmatrix} = \frac{2\sigma^2}{(2+\lambda)^2}\,I.$$
If λ > 0, the regularized variance is always smaller than the non-regularized one. This is indeed the benefit of regularization: we can reduce a large variance at the cost of introducing a bit of bias. If noise dominates the size of the input parameters, σ² > θ̂₀² + θ̂₁², then for λ = 2, for example, the MSE is smaller than half the variance.
In high-dimensional spaces, phenomena that are absent or rare in low-dimensional spaces become generic. In regression notation, the ridge problem reads
$$L' = \min_{\beta\in\mathbb{R}^N} \sum_{i=1}^N (y_i - x_i'\beta)^2 + \lambda\sum_{j=1}^N \beta_j^2 = \min_{\beta\in\mathbb{R}^N} \|y - x\beta\|^2 + \lambda\|\beta\|^2 \tag{5.15}$$
Proposition 90. The minimum of the loss function L' is given by the parameter estimates β̂ = (x'x + λI)⁻¹(x'y), equivalently β̂ = (I + λ(x'x)⁻¹)⁻¹β_OLS, with β_OLS the beta of the ordinary least squares problem with zero penalty term.
If λ increases, the variance decreases and the bias increases. Ridge regression shrinks all coefficients of OLS by a uniform factor; it does not set any coefficients to zero. What is the optimal value for λ? A traditional approach is to choose λ such that some information criterion (AIC, for example) is smallest. An ML approach is to minimize the cross-validated sum of squared residuals (or some other measure).
The LASSO problem replaces the quadratic penalty by an L1 penalty:

$$L'' = \min_{\beta\in\mathbb{R}^N} \sum_{j=1}^N (y_j - x_j'\beta)^2 + \lambda\sum_{j=1}^N |\beta_j| = \min_{\beta\in\mathbb{R}^N} \|y - x\beta\|^2 + \lambda\|\beta\|_1. \tag{5.16}$$
The innocent-looking difference is that the penalty term uses a different distance measure - the L1 instead of the L2 norm. But the impact on the optimal betas is qualitatively and quantitatively different: increasing λ, the model parameters not only become smaller, but some originally small parameters attain the value zero. Therefore, LASSO performs a feature selection.

Consider a two-dimensional problem. The level sets of the quadratic term in the loss function are ellipses. The penalty term's level sets are circles around zero in the L2 norm and a diamond around zero in the LASSO case. The minimum is the point where the ellipses and the circle or diamond intersect. In the ridge case, this point is generically not on a coordinate axis: both beta estimates are non-zero. In the diamond case, the intersection is generically at a corner on one axis, and hence one parameter is zero.
The analytical solution of the problem L'' in the case where the x_i are orthonormal is given next:

Proposition 91. Assume that the vectors x_i are orthonormal. The minimum of the loss function L'' is given by the parameter estimates

$$\hat\beta_j = \hat\beta_j^{OLS}\,\max\Big(0,\ 1 - \frac{N\lambda}{|\hat\beta_j^{OLS}|}\Big).$$
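A minimal sketch of this soft-thresholding rule; the OLS coefficients and the penalty are illustrative values:

```python
import numpy as np

def lasso_orthonormal(beta_ols, lam, N):
    """Soft-thresholding: shrink OLS coefficients, setting small ones to zero."""
    return beta_ols * np.maximum(0.0, 1.0 - N * lam / np.abs(beta_ols))

beta_ols = np.array([2.5, -0.8, 0.1, -1.6])
print(lasso_orthonormal(beta_ols, lam=0.1, N=4))
# -> [ 2.1 -0.4  0.  -1.2]: the smallest coefficient is selected away
```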
In portfolio construction, the LASSO penalty is added to the mean-variance problem:

$$\min_{\phi\in\mathbb{R}^N} \phi'C\phi + \lambda\sum_{j=1}^N |\phi_j|\,, \quad \text{s.t.}\ e'\phi = 1\,,\ \phi'\mu \ge r. \tag{5.17}$$
Deviations from the zero vector are punished, since one superimposes a 'V'-type function on the risk function. Small values of φ are eventually reduced to zero. This results in a sparser investment vector. There are many different variants of the LASSO approach; see Fastrich et al. (2013) and Zhou (2006) for the adaptive LASSO approach to counteract some biases inherent in (5.17).
Bruder et al. (2013) compare the OLS mean-variance approach with the LASSO mean-variance one for the S&P 500, with monthly rebalancing from Jan 2000 to Dec 2011, see Table 5.1.
The LASSO approach shows a better risk-adjusted performance than the traditional one. The extreme losses are comparable in both approaches; the LASSO approach does not provide any form of tail hedge. The turnover is much smaller for the LASSO approach, which is a consequence of the sparse optimal investment vector and information matrix in the LASSO approach. The Google stock, for example, is hedged by 99 stocks in the OLS model compared to only 13 stocks in the LASSO model.
LASSO requires powerful numerical software tools. Take the MSCI World with around 10 500 stocks. Theoretical convexity of the problem is lost in most types of LASSO approaches due to the sparsity of the matrix: the curvature is almost zero, and one needs to search carefully for the true global minimum. Since the covariance matrix is high-dimensional with many zero entries, its inversion becomes delicate; one has to use advanced algorithms to produce a meaningful inverse.
5.2.6 Theory ML
5.2.6.1 Learning Finite Classes
Assume F, S are given and realizability holds. Empirical risk is defined by

$$R_{emp}(f) = \frac{1}{m}\,\big|\{k : f(x_k) \neq y_k\}\big|,$$

which counts the errors of the algorithm; it is observable, contrary to the theoretical risk (5.1), and it depends on the data set. Empirical Risk Minimization (ERM) is given by any algorithm f_ERM that minimizes empirical risk:

$$f_{ERM} \in \arg\min_{f\in F} R_{emp}(f). \tag{5.19}$$

This is the most important estimator for the unknown theoretical risk. It should be considered with care, since overfitting may lead to a very low performance of the ERM.
Theorem 92 (ERM PAC Learnable, Finite Case). Assume that F is a finite set, realizability holds, f_ERM is defined in (5.19) and

$$m \ge \frac{1}{\epsilon}\log\frac{|F|}{\delta}.$$

Then, with probability at least 1 − δ,

$$R(f_{ERM}) \le \epsilon.$$
This theorem applies to any machine learning model satisfying the assumptions; it does not restrict P nor F. We prove more general theorems below. In agnostic learning, the error is compared to the best f ∈ F: Definition 88 remains unchanged except that R_P(f) < ε is replaced by

$$R_P(f) < \min_{f'\in F} R_P(f') + \epsilon.$$
5.2.6.2 Generalization
Consider a hypothesis class F and let f_m be the classifier with smallest empirical risk R_emp(f_m). Is the true risk R(f_m) small too? Is the error still small if f_m is applied on all of X with the unknown P? The strong law of large numbers states that |R(f_m) − R_emp(f_m)| converges to zero for m → ∞ for a fixed f. The classifier f_m then generalizes well.
But there are two issues to consider. First, the rate of convergence is left unknown in the strong law of large numbers: convergence can be so slow that the number m of data points needed for a given accuracy becomes extremely large. Second, empirical risk should approximate true risk uniformly in F and P, i.e. not only for a fixed f. This defines consistency.

Why is consistency important? Consider a finite set F such that for each f there exists a sample where the difference between true and empirical risk is small. For a given sample, however, we don't know for how many of the functions in F the difference is small. Furthermore, f_m, which minimizes empirical risk, need not minimize true risk too; the difference between the two risk measures can become large. We want to rule out both cases: empirical risk needs to converge towards true risk independently of f ∈ F - this is called uniform convergence.
The inequality of Hoeffding and Chernoff addresses the speed of convergence for a given fixed function f: empirical risk is close to the actual risk,

$$P\big(|R_{emp}(f) - R(f)| \ge \epsilon\big) \le 2e^{-2m\epsilon^2}. \tag{5.20}$$
Hence, for m sufficiently large, the training error provides a good estimate of the test error. Thinking about the risk as an expected value or a mean, the bound states that the mean values of two random variables get close in probability at an exponential rate as the size of the data increases.
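A small simulation sketch of the bound (5.20) for a fixed classifier, modelling its errors as Bernoulli draws with an illustrative true risk:

```python
import numpy as np

rng = np.random.default_rng(3)
true_risk, m, eps, trials = 0.3, 200, 0.05, 20000

# Each draw of m Bernoulli(true_risk) errors yields one empirical risk.
emp_risks = rng.binomial(m, true_risk, size=trials) / m
freq = np.mean(np.abs(emp_risks - true_risk) >= eps)
bound = 2.0 * np.exp(-2.0 * m * eps ** 2)
print(f"empirical frequency {freq:.4f} <= Hoeffding bound {bound:.4f}")
```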
We consider empirical risk minimization. We would like empirical and unknown true risk to become close to each other independent of the chosen f ∈ F and the unknown probability P; then ERM is called a consistent algorithm. The difference between empirical and true risk should become simultaneously small for all functions f ∈ F. Uniformity requires control even of the worst possible function f, i.e. of⁷

$$\sup_{f\in F} |R_{emp}(f) - R(f)|.$$
7 If you are not familiar with the supremum, replace it by the maximum.
This supremum is what the uniform law of large numbers deals with. Proving uniform convergence of empirical risk also implies convergence of |R(f_m) − R(f_F)|, where f_F ∈ F is the theoretical classifier which minimizes true risk:
$$\begin{aligned} |R(f_m) - R(f_F)| &= R(f_m) - R(f_F)\\ &= R(f_m) - R_{emp}(f_m) + R_{emp}(f_m) - R_{emp}(f_F) + R_{emp}(f_F) - R(f_F)\\ &\le R(f_m) - R_{emp}(f_m) + R_{emp}(f_F) - R(f_F)\\ &\le 2\sup_{f\in F} |R(f) - R_{emp}(f)|, \end{aligned}$$

where we used in the second-last line that R_emp(f_m) − R_emp(f_F) ≤ 0 by definition of f_m being the optimum for ERM, and in the first line that R(f_m) ≥ R(f_F) by definition of the optimal theoretical risk, so that the absolute value can be dropped.
Hence, if we can prove consistency for the empirical risk expression, consistency for |R(f_m) − R(f_F)| follows. The proof for the empirical risk consistency expression sup_{f∈F} |R(f) − R_emp(f)| is done in several steps.
• Step I Ghost sample (symmetrization) The unknown true risk is compared with the empirical risk on an independent second sample of the same size:

$$P\Big(\sup_{f\in F} |R_{emp}(f) - R(f)| \ge \epsilon\Big) \le 2\,P'\Big(\sup_{f\in F} |R_{emp}(f) - R'_{emp}(f)| \ge \epsilon/2\Big),$$

where the second distribution P' refers to the IID distribution of a sample of size 2m; R_emp measures the risk on the first m samples and R'_emp on the second m samples. Using the ghost trick we can replace the unknown R(f) on the left-hand side by an empirical risk in the desired bound.
• Step II Finiteness Step I implies that the infinite set F can be replaced by a finite one, F̃, with at most 2^{2m} elements: the bound on the left-hand side requires considering an infinite set, which is reduced to a calculation using a finite set on the right-hand side.
• Step III Shattering coefficient, union bound, Hoeffding The last step is to bound the above expression by

$$P'\Big(\sup_{f\in F} |R_{emp}(f) - R'_{emp}(f)| \ge \epsilon/2\Big) \le S(F, 2m)\,e^{-m\epsilon^2/4} \tag{5.21}$$

with S the shattering coefficient, see below. The union bound trick as well as the Hoeffding inequality are used in the proof.
If the RHS of the last expression converges to zero for m to infinity, then ERM is consistent for the infinite function set F. Given the exponential function, convergence follows if the shattering coefficient does not grow too strongly.
Given uniform convergence, how should the space F be chosen such that the shattering coefficient times the exponential function in Step III converges? If we choose F to be the set of all functions, then the classifier f_m contains all Bayes classifiers, which leads to inconsistency. This follows from the so-called no-free-lunch theorem in ML, which should not be confused with the no-free-lunch theorem in the advanced theory of no-arbitrage pricing.
To prevent such a situation, we use prior information or a hypothesis to restrict F. Clearly, we do not want to reduce it to the extent that the classifier with zero error or small error (PAC, agnostic) is ruled out. The following error decomposition shows that there is an intermediate reduction, see Figure 5.19:
Figure 5.19: Error decomposition between the Bayes classifier f_Bayes, the best classifier in the class f_F, and the empirical minimizer f_n.
with $W_m^2 = \sum_i (b_i - a_i)^2$.
We rewrite (5.24) using δ := 2e^{−2mε²} as

$$P\Bigg(|R_{emp}(f) - R(f)| \ge \sqrt{\frac{\log\frac{2}{\delta}}{2m}}\,\Bigg) \le \delta. \tag{5.25}$$
This abstract result is not very useful for applications, since there is no characterization of whether the uniform law of large numbers holds for a given set F. Since our set F is high-dimensional, we face the problem: which properties of F determine uniform convergence? We start with the union bound trick.
$$P\Big(\sup_{f\in F} |R_{emp}(f) - R(f)| \ge \epsilon\Big) \le \sum_i P\big(|R_{emp}(f_i) - R(f_i)| \ge \epsilon\big) \le 2|F|\,e^{-2m\epsilon^2}. \tag{5.28}$$
⁸ Convergence is meant in the almost-everywhere sense. The inequality of DKW quantifies how fast an empirical distribution function approaches the distribution function from which the empirical samples are drawn. It generalizes the Glivenko-Cantelli Lemma of uniform convergence of empirical functions. We define the empirical distribution function:

$$F_m(x) = \frac{1}{m}\sum_{i=1}^m \chi_{\{X_i \le x\}}, \quad x \in \mathbb{R}.$$
This proves that empirical risk minimization over a finite set F is consistent with respect to F: the supremum can be taken outside of the probability. Equivalently, for each function f ∈ F we have with probability 1 − δ

$$R(f) \le R_{emp}(f) + \sqrt{\frac{\log|F| + \log\frac{2}{\delta}}{2m}}. \tag{5.29}$$

We see that uniformity increases the error bound by the factor log|F|. This finite-dimensional theory cannot be directly generalized to infinite sets F, since for |F| → ∞ the bound (5.29) becomes meaningless.
5.2.6.5 VC Theory
Can F be learned if its cardinality is infinite, i.e. does the theorem about agnostic learning for finite F generalize? We start with an example which shows that finiteness of F is a sufficient condition for learnability but not a necessary one. Hence, the size of the class F is not the measure needed to classify the complexity of ML models into learnable and non-learnable ones.
Consider the class of threshold functions on the real line: its cardinality is infinite, yet it is PAC learnable and agnostic learnable. To simplify the calculation assume realizability, i.e. there is a threshold f* which perfectly classifies the data. To find f_ERM, the algorithm selects the maximal r such that no real number x < r is assigned the value 1, i.e. r > f*. Let f̂ be the threshold chosen by our algorithm. Then there is a region [f*, f̂] with probability mass ε where f* and f̂ disagree: f̂ assigns 0 to an x in this interval while f* assigns 1. On the remaining interval (f̂, ∞), both functions agree with probability 1 − ε. Then,
Then,
|S|
Y
/ [f ∗ , fˆ]) =
P (RP (fˆ) > ) ≤ P (∀(xi , yi ) ∈ S, xi ∈ / [f ∗ , fˆ]) ≤ (1 − )|S| ≤ e−|S| .
P (xi ∈
i=1
If we choose |S| ≥ m(ε, δ) = (1/ε) log(1/δ), the probability that the error of f̂ is larger than ε can be made smaller than δ, i.e. the class is learnable. The infinite set F is described by a single parameter. We might guess that if an infinite set F can be described by a finite number of parameters, then the set is statistically learnable. This is true for the above example in higher dimensions, but it fails to be true in general.
The key step in determining which infinite sets F can be learned is based on the ghost sample trick of Vapnik and Chervonenkis: it reduces an infinite to a finite problem where the union bound trick can be applied, and where the factor valid in finite dimensions is replaced by a capacity measure which can be computed for infinite sets.
Let x₁, ..., x_m be data points and Z_m the sample of the m points (x_i, y_i). Set |F_{Z_m}| equal to the cardinality of F restricted to Z_m. Although F is infinite, |F_{Z_m}| is finite. The shattering coefficient S(F, m) of F is defined by:

$$S(F, m) = \max\{|F_{Z_m}|\,:\, x_1, \ldots, x_m \in X\}.$$
In other words, a set of m instances X_m from the input space X is shattered by a function class F if all possible 2^m labellings can be generated using functions from F. If we consider three points in the plane, i.e. m = 3, there are 2³ = 8 labellings. Using hyperplanes as the only functions in F, the points are shattered by the hyperplanes, see Figure 5.20.
Since the bound for consistency is only an upper bound, i.e. a sufficient condition, we cannot directly conclude that ERM is inconsistent if we use all functions. The condition

$$\log S(F, m)/m \to 0$$

is a necessary and sufficient condition for ERM to be consistent. On the unrestricted set of all functions, S(F, m) = 2^m and log S(F, m)/m = log 2 does not converge to zero: ERM over all functions is not consistent. Therefore, finally:
Proposition 98 (Vapnik and Chervonenkis). For any δ > 0, with probability 1 − δ any function f ∈ F satisfies

$$R(f) \le R_{emp}(f) + \sqrt{\frac{4}{m}\big(2\log S(F, 2m) - \log\delta\big)}. \tag{5.30}$$
The shattering coefficient which we used so far has the drawback that it is difficult to calculate. It turns out that a different capacity figure, the VC dimension, is better suited. To define this number: a sample Z_m of size m is shattered by the class F if the function class can realize any labelling on the given sample, i.e. |F_{Z_m}| = 2^m. The VC dimension of F is defined as the largest number m such that there exists a sample of size m which is shattered by F. If the VC dimension of F is finite, then F is learnable:
Theorem 99. F is PAC learnable if and only if the VC dimension of F is finite. Then the sample complexity m_F(ε, δ) grows at the same rate as

$$\frac{\mathrm{VC\text{-}dim}(F)}{\epsilon}\,\log\frac{1}{\delta}.$$

For agnostic learning, the first ε is replaced by its square.
The VC dimension measures the ability of a set of functions to fit available finite data. A set of functions has VC dimension h if there exist h samples that can be shattered by this set of functions, but there do not exist h + 1 samples that can be shattered. If one considers the half-spaces in R^d, then the VC dimension is d + 1; see Figure 5.20 for the case d = 2, where there exist three points that can be shattered but four points cannot be shattered.
If e_n(y), n = 1, ..., m, is a set of m linearly independent functions, then the function class

$$f(y, \theta) = \chi_{\{\sum_n \theta_n e_n(y) + a > 0\}}$$

has finite VC dimension, given by the number of free parameters, m + 1.

Linear predictors are functions

$$y(\theta) = x_0 + \langle\theta, x\rangle,$$

that is, each function is parametrized by θ. These functions form the class F. Different classifiers (hypothesis classes) of linear predictors are compositions g ∘ y(θ): in binary classification g is chosen to be the sign function, and in a regression g is the identity function.
In binary classification this gives (setting x₀ = 0)

$$f(x, \theta) = \mathrm{sign}(\langle\theta, x\rangle) = \begin{cases} +1, & \langle\theta, x\rangle \ge 0 \\ -1, & \langle\theta, x\rangle < 0 \end{cases}, \qquad \theta \in \mathbb{R}^n.$$
Each classifier forms a hyperplane that is perpendicular to the vector θ and passes through the origin. The θ vector is orthogonal to the plane; it points in the direction where ⟨θ, x⟩ increases most, see Figure 5.21. The sign does not change if we scale the x's: the linear classifier does not care about the nearness of the labels. The next theorem summarizes learnability:
Assuming m data points in the training set and realizability, the ERM classifier for half-spaces is expected to make zero errors on the training set. The ERM can be implemented by using the perceptron algorithm on half-spaces. The idea is to adjust the parameters θ incrementally in order to minimize the classifier's training error step by step. Consider the task of classifying images into 'access/no access' to a building.
Figure 5.21: A linear classifier: the vector θ is orthogonal to the decision boundary ⟨θ, x⟩ = 0; images with ⟨θ, x⟩ < 0 are labelled −1, those with ⟨θ, x⟩ > 0 are labelled +1.
If f(x, θ) = +1, the image has been classified correctly. By the perceptron update rule, the adjustment after k steps for image m reads

$$\theta^{(k+1)} = \theta^{(k)} + y_m x_m,$$

i.e.

$$y_m\langle\theta^{(k+1)}, x_m\rangle = y_m\langle\theta^{(k)}, x_m\rangle + \|x_m\|^2 \ge y_m\langle\theta^{(k)}, x_m\rangle.$$

Given a mistake at stage k, the updated value becomes more positive at k + 1, and after a certain number of updates the value becomes positive - the image, kept fixed, is classified correctly. Then the next image is considered; updating the parameters in the same way leads to a correctly classified new image after some steps. This is continued for all images. But will these updates keep the former updates stable - the convergence question of the algorithm?
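A minimal sketch of the perceptron updates on synthetic, linearly separable data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 2))
y = np.where(X @ np.array([1.0, -2.0]) > 0, 1, -1)    # separable toy labels

theta = np.zeros(2)
for _ in range(100):                     # epochs over the training set
    mistakes = 0
    for xm, ym in zip(X, y):
        if ym * (theta @ xm) <= 0:       # misclassified (or on the boundary)
            theta += ym * xm             # perceptron update rule
            mistakes += 1
    if mistakes == 0:                    # all points classified correctly
        break
print(theta)
```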
Proposition 101. Assume that for all training images m there exists a constant γ > 0 such that y_m⟨θ*, x_m⟩ ≥ γ, and that all training images have bounded norm ||x_m|| ≤ r. Then the perceptron algorithm converges in a finite number of steps k with

$$k \le \frac{r^2\|\theta^*\|^2}{\gamma^2}. \tag{5.31}$$
The number γ is called the margin, a name whose meaning will become clear below. θ* is the decision parameter for the plane ⟨θ*, x⟩ = 0. Therefore, the assumption y_m⟨θ*, x_m⟩ ≥ γ > 0 means that there exists a linear classifier in our class with finite parameter values that correctly classifies all training images. The ratio γ/||θ*|| is the smallest distance in the image space from any image to the decision boundary specified by θ*; it measures how well the two classes of images are separated by a linear boundary. This is the geometric margin γ_geom, and its inverse is a measure of how difficult the problem under consideration is: the smaller the geometric margin, the more difficult the problem, see Figure 5.21. The bound (5.31) can then be written as

$$k \le \frac{r^2}{\gamma_{geom}^2}.$$
Remarkably, the bound does not depend directly on the dimension of the images (pixels) nor on the number of training images. Nevertheless, the bound turns out to be a measure of the complexity of the problem of learning linear classifiers - the VC dimension. How well does the perceptron classify images which are not in the training set? If the two assumptions of the theorem hold true also for new images, then after k ≤ r²/γ²_geom mistakes in classifying the new images⁹, all further images will be classified correctly. In this sense, the above result generalizes to the new images.
We assumed that there exists a linear classifier that has a large geometric margin. Is it possible to find such a large-margin classifier directly? Support vector machines (SVM) provide the answer: the margin is maximized subject to correct classification of all training points,

$$\max_\theta \frac{\gamma^2}{\|\theta\|^2}\,, \quad y_k\langle\theta, x_k\rangle \ge \gamma\,,\ \forall k. \tag{5.32}$$
⁹ The algorithm does not know when it made a mistake; this detection has to be added to the model.
Figure 5.22: The optimal SVM hyperplane is shown together with a non-optimal hyperplane whose margin is smaller than in the optimal case. The two data points which are elements of the two hyperplanes belonging to the SVM-optimal one are marked by a square.
This problem is recast in a more suitable form: replacing max by min provides a quadratic objective function; we insert the usual factor 1/2; and since the result depends only on the ratio θ/γ, we set without loss of generality γ = 1. Summarizing, the problem reads

$$\min_\theta \frac{1}{2}\|\theta\|^2\,, \quad y_k\langle\theta, x_k\rangle \ge 1\,,\ \forall k. \tag{5.33}$$
This defines a quadratic optimization problem which can be generalized. Using the Lagrangian, the Kuhn-Tucker conditions are necessary and, for this convex problem, also sufficient for an optimum. If α_k is the Lagrange multiplier associated with constraint k in the optimization problem, the complementarity condition of Kuhn-Tucker,

$$\alpha_k\big(y_k\langle\theta, x_k\rangle - 1\big) = 0,$$

implies that only data points which are elements of the two hyperplanes in Figure 5.22 marked with a square can have α_k > 0, since they are the only points where the constraint holds with equality. These data points are called support vectors. For all other data points the alphas are zero. In the pixel example, the solution depends only on the subset of images which are exactly on the margin; the remaining images do not matter. Hence, the support vectors are sufficient to define the training set.
The many conditions in the Kuhn-Tucker approach, which are due to the inequality constraints, make it difficult to solve the problem. Therefore, one transforms the optimization problem from the above formulation (the primal model) to its dual form, which is easier to solve. For that, one solves in the primal model the equation ∂L/∂θ = 0, with L the Lagrangian, and substitutes this solution back into the Lagrangian; this yields the dual Lagrangian L_D, which depends only on α, y and x_k. From a statistical learning theory perspective, maximizing the margin means minimizing the VC dimension of the support vector machine: support vector machines minimize both the empirical risk and the confidence interval.
So far we have not considered the typical situation where images are difficult to classify because of labelling errors, i.e. a few images pop up in the wrong half-plane in the optimal solution. We alter the SVM optimization problem to account for these types of errors in the maximum-margin linear classifier. The simplest form is to introduce 'slack' variables: we measure the degree to which each margin constraint is violated and associate a cost for the violation in the objective function. The problem then reads

$$\min_\theta \frac{1}{2}\|\theta\|^2 + c\sum_{k=1}^n \xi_k\,, \quad y_k\langle\theta, x_k\rangle \ge 1 - \xi_k\,,\ \xi_k \ge 0\,,\ \forall k, \tag{5.34}$$

where the ξ are the slack variables. If we have to set ξ_k > 0, then the margin constraint is violated (possible misclassification) and the penalty cost occurs. Increasing the constant c, i.e. increasing the penalty cost, leads to ξ_k = 0 for all k: we are back in the original problem. For small c, many margin constraints can be violated. It is reasonable to ask whether this is indeed the trade-off we want.
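A minimal sketch of the soft-margin problem using scikit-learn's SVC, where the parameter C plays the role of the slack penalty c in (5.34); the data are synthetic:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1.5, 1.0, size=(50, 2)),
               rng.normal(+1.5, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)   # larger C penalizes slack more
print("number of support vectors:", len(clf.support_vectors_))
print("training accuracy:", clf.score(X, y))
```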
So far we assumed that the data sets can be separated linearly by a hyperplane, but often a non-linear curve is needed instead. We refer to the literature for the powerful methods which rely on the clever idea of transforming the non-linear problem into a linear one by mapping the data into a higher-dimensional space and then using the above linear theory in this space (kernel methods).
In the performance label, 'no' means that the track record of the asset manager is not outperforming the benchmark track record.
To start the construction of the tree, we choose the employment status as the first node or question, and we count how strongly the three different types of employment status allow us to attain as many pure states as possible. These are states where the asset managers under consideration either all belong to the performing class or all to the non-performing one; they are the most informative states, whereas a node in the tree which leads to 50 percent performers and 50 percent non-performers is not informative at all - we could flip a coin to obtain the same information level. Let us first construct a decision tree on an ad hoc basis and then, in a second step, analyze how the tree should be constructed optimally. Figure 5.23 shows the different steps in the construction of the tree.
The blue nodes in the figure are pure nodes and the red ones are non-pure end nodes, where either no question is left to ask or the question cannot split the node further towards a pure node. The node 1/1 after the CFA question is an example where asking the degree question is useless, since both have a master's degree.
To construct the tree optimally we use the entropy measure S = −Σ_i p_i log₂ p_i. The lower the entropy for a given question, the higher the information gain from asking the question. From the raw data, p₁ = 10/17 and p₂ = 7/17 are the probabilities of being an outperforming AM or not. The entropy of the raw data is:

$$S_{raw} = -\frac{10}{17}\log_2(10/17) - \frac{7}{17}\log_2(7/17) = 0.977,$$
Figure 5.23: Step-by-step construction of the decision tree: the root node Employment Status splits the asset managers into Self-Employed/University (3/3), a further University branch with CFA and Bachelor/Master degree questions, and Employed (5/0).
which is close to 1, the totally uninformative case. The entropy of a node with two outperforming and four non-performing managers is

$$-\frac{2}{6}\log_2(2/6) - \frac{4}{6}\log_2(4/6) = 0.918,$$

similarly uninformative as the raw data. The weighted entropy for the Employment Status split over its three sub-nodes (3/3, 2/4 and 5/0) is

$$S_{empl} = \frac{6}{17}\times 1 + \frac{6}{17}\times 0.918 + \frac{5}{17}\times 0 = 0.677.$$
Therefore, the information gain of the Employment Status split relative to the raw data is the highest at 0.3; the gain from the Academic question is 0.034 and from the CFA question 0.021. This defines the ordering of the questions in the tree and, if there were many more questions, which questions should be ruled out since they add only little entropy gain, i.e. information gain.
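A minimal sketch of this entropy and information-gain computation; the class counts follow the 10/17 vs. 7/17 example and the sub-node counts reconstructed above:

```python
import numpy as np

def entropy(counts):
    """Entropy S = -sum p_i log2 p_i of a class-count vector."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

s_raw = entropy([10, 7])                                  # 0.977
# Employment Status splits the 17 managers into sub-nodes with class
# counts (3,3), (2,4) and (5,0), as in the worked example above.
groups = [(3, 3), (2, 4), (5, 0)]
s_split = sum(sum(g) / 17 * entropy(g) for g in groups)   # 0.677
print(f"information gain: {s_raw - s_split:.3f}")         # ~0.300
```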
Definition 102. A classification tree is a decision tree in which each node has a binary decision based on X_i < a or not, for a fixed value a ∈ ℝ.
The root node contains all data (X_i, Y_i). For both model types, the prediction space is cut into disjoint subsets. Splitting from the top down cuts the prediction space into new branches until the user decides to terminate the splitting process. If there are too many splits, overfitting follows: performance will be poor when applied to new data. Pruning is the counter-measure to reduce overfitting. How does the tree decide when to make the next split? Many different algorithms are used. The general goal is that at each node the feature X_i and the threshold a are chosen to minimize the resulting diversity in the child nodes. Consider the splits w.r.t. Creditworthiness and Sector, respectively. The Gini index split first calculates the index for each sub-node and then the weighted Gini score of the split over the nodes of that split.
The weighted Gini index for the split Creditworthiness is then 1/4 ∗ 10/30 + 1/4 ∗
20/30 = 1/8 + 1/6 = 0.29. For the split Sector, the Gini Index is
The Gini score for the split on Sector is higher than the other one, so the node split will be on Sector. Intuitively, there is more diversity in the Sector split compared to the other split, where prediction is close to a random coin toss. From an information perspective, the purer a node is, the less information is needed to describe it. Hence, the entropy S defined above is another quantity with which to calculate a split. A method for continuous variables is the reduction-of-variance calculation, using the above two-step procedure of calculating the variance for each node and then the weighted average as the split's variance value.
Controlling for overfitting - in the extreme, 100% accuracy on the training set by making one leaf for each observation - is achieved either by setting constraints or by pruning. Constraints can be set on the parameters of the tree: one can fix the minimum number of observations in a node for a split, define the minimum samples for a terminal node, the maximum depth of the tree, the maximum number of terminal nodes, or the maximum number of features to consider for a split. These restrictions prevent the model from learning relations which are specific to the training data but do not generalize.
Splitting is a myopic approach: the algorithm checks locally whether a split should happen but does not take a global view, and it only stops when it reaches a constraint value. In this sense, the algorithms are greedy: myopic decision makers which do not take into account any future decisions. In the investment analogy, they act as one-period optimizers in a multi-period context. Pruning, by contrast, considers effects a few steps ahead. The implementation follows the usual backward induction logic: first generate the decision tree to a large depth and then work backwards, removing all nodes which imply negative gains.
Comparing tree-based models with linear logistic regression for classification and with linear regressions: the classic models are appropriate if the relationship between dependent and independent variables is indeed linear. But if there are non-linearities between the variables, then only tree-based models can account for them. Furthermore, tree-based models are often simpler to explain than their linear counterparts.
Often not a single model is used, but an ensemble of models, to achieve a better accuracy and model stability. Like any ML model, tree-based models suffer from the bias-variance tradeoff: small trees, for example, lead to low variance and high bias. Increasing the complexity of the model reduces the prediction error due to lower bias, but at some point high complexity starts to overfit the model, that is, variance increases. Ensemble models are a method to manage the bias-variance trade-off. Ensemble methods include bagging, boosting and stacking approaches. Boosting combines many 'weak' or high-bias models in an ensemble that has lower bias than the individual models, while bagging combines 'strong' learners in a way that reduces their variance.
Bagging reduces the variance of predictions by combining the results of multiple classifiers modelled on different sub-samples of the same data set. Starting with a training set X of size N, bagging generates M new training sets X'_i by sampling from X uniformly and with replacement. The M models are fitted using these M samples and combined by averaging the outputs for regression or by voting for classification. This averaging procedure stabilizes the single algorithms.
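A minimal bagging sketch with scikit-learn, combining bootstrapped decision trees by voting; the data are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
# 50 decision trees (the default base learner), each fitted on a bootstrap
# sample drawn from X with replacement, combined by voting.
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
print(bag.score(X, y))
```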
Typically, we don't know the probability P(c|x̃). But Bayes' Theorem tells us how to calculate this quantity using P(x̃|c) and P(c) instead:

$$c^* = \arg\max_{c\in C} \frac{P(\tilde x|c)P(c)}{P(\tilde x)}$$

and since P(x̃) is the same for all classes, the maximum does not change if we delete the denominator:

$$c^* = \arg\max_{c\in C} P(\tilde x|c)P(c). \tag{5.35}$$
The Naive Bayes Classifier assumes that the attributes are conditionally independent given the classification:

$$P(\tilde x|c) = \prod_{i=1}^n P(\tilde x_i|c). \tag{5.36}$$
This assumption is, for example, violated if we consider text classification (Natural Language Processing, NLP). Here x is a sequence of words, and say there are two classes c₁, c₂ which classify the text into 'complaints' and 'non-complaints'. That the probability of a whole sequence of words given the class 'complaint' equals the product of the probabilities of the individual words does not hold true, since the meaning of a sentence is not generated by independence between the single words.
Definition 103. The naive Bayes classifier finds the most probable class for x̃:

$$c^* = \arg\max_{c\in C} P(c)\prod_{i=1}^n P(\tilde x_i|c). \tag{5.37}$$
The prediction ŷ is whether a new client with given features will buy or not buy the offered portfolio solution. The features of the new client are x̃ = (Risk Profile = Low, Experience = Medium, Bias = Yes, Liquidity = Fair). Two classes are of interest: c₁ means that an investor will buy the portfolio and c₂ that she will not do so. To decide this question, we use the naive Bayes classifier formula and start with P(c₁) = 13/18, P(c₂) = 5/18. Table 5.2.10 summarizes the necessary conditional probabilities. Finally,

$$P(c_1)P(\tilde x|c_1) = 0.05\,, \qquad P(c_2)P(\tilde x|c_2) = 0.005.$$

The individual x̃ will most likely buy the portfolio product.
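A minimal sketch of the decision rule (5.37) for this example; since Table 5.2.10 is not reproduced here, the conditional probabilities below are hypothetical stand-ins chosen to yield scores of roughly 0.05 and 0.005:

```python
import math

priors = {"buy": 13 / 18, "not_buy": 5 / 18}   # P(c1), P(c2) from the text
cond = {  # hypothetical P(feature value | class) for Low/Medium/Yes/Fair
    "buy":     [0.5, 0.4, 0.6, 0.6],
    "not_buy": [0.3, 0.3, 0.5, 0.4],
}
scores = {c: priors[c] * math.prod(cond[c]) for c in priors}
print(scores)                                  # ~{'buy': 0.052, 'not_buy': 0.005}
print("decision:", max(scores, key=scores.get))
```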
Consider a new client x̃ (red dot). If we consider the first nearest neighbour, the client belongs to the class which does not buy the product. Considering the 3 nearest neighbours, we assign the new client the class blue, i.e. buy the product. Considering the 5 nearest neighbours, the next assignment follows, and so on. The reason to consider only an odd number of neighbours is to avoid ambiguous assignments. We denote by N_k(x̃) the neighbourhood of x̃ consisting of the k nearest instances given a distance metric d. Which metric d should one choose? If the inputs x are real numbers, the Euclidean distance is a possible metric.¹⁰
10 If inputs are binary valued, the Hamming distance is used.
If we use the Euclidean distance, the distance in income and the distance in risk aversion are of different sizes. Therefore, the attributes are normalized to take values in [0, 1]; furthermore, if attributes have different weights, the terms in the Euclidean norm are weighted respectively. Typically, the weighting function v(x, y) is chosen inversely proportional to d(x, y): the closer two points x, y are, the more weight is attributed. The classification task is to find the class c_j ∈ C such that the weighted distance in a neighbourhood is maximized, i.e.

$$c(\tilde x) = \arg\max_{c\in C} \sum_{x\in N_k(\tilde x)} v(x, \tilde x)\,d(c_j, c(x)). \tag{5.38}$$
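A minimal k-nearest-neighbour sketch with normalized attributes; the income/risk-aversion data and the new client are illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Hypothetical clients: (income, risk aversion); label 1 = buys the product.
X = np.array([[30_000, 0.2], [45_000, 0.5], [120_000, 0.9],
              [80_000, 0.7], [50_000, 0.3]])
y = np.array([0, 0, 1, 1, 0])

scaler = MinMaxScaler().fit(X)          # normalize attributes to [0, 1]
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)
print(knn.predict(scaler.transform([[60_000, 0.6]])))  # class of the new client
```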
The Manhattan distance is given by d(x, y) = Σ_j |x_j − y_j|; alternatively, the p-norms can be used, or the Chebyshev distance, given by d(x, y) = max_i |x_i − y_i|.
One way is to assume that the signal impacts the time-series volatility of the factors. This implies a new adjusted volatility σ_a which captures the additional information in the sentiment signals. The simplest relationship between the original and adjusted volatilities is a linear regression

$$\sigma_a(t) = a_0 + a_1\,\Xi(t)\,\sigma(t)$$

with Ξ the sentiment analytics signal. If there exists for each factor an adjusted volatility, then the factor θ_k = σ_{a,k}/σ_k is integrated in the risk model as follows. Setting

$$A^\theta_{ij} = \theta_i A_{ij}$$

we get

$$\sigma^2_{a,k} = \theta_k^2\sigma_k^2.$$

The modified model X = A^θ would change the volatilities as desired without changing the correlation structure.
5.2.13.1 Pre-Processing
The dataset has two parts. Part I consists of 10,000 complaints regarding three different topics - credit reporting, debt collection and mortgage - from the U.S. database from October 2018 to March 2019. Part II consists of 10,000 authorized random tweets from various users located in the U.S. Retweets, which don't have meaningful content, and tweets with less than 200 characters are excluded. Each complaint and tweet has around 199 and 46 words, respectively.
Four steps follow in the pre-processing, see Figure 5.27. First, we remove anything that doesn't provide useful information for classification, such as the 'X' strings covering anonymized information.
Figure 5.27: The text classification pipeline: data acquisition, text pre-processing, modeling, and evaluation/validation (performance assessment, feedback loop).
To whom it may concern : Can ANYONE help? I have fullled all requirements re
Claim † XXXX dd XXXX Green Tree Loan Servicing, & I am being told that I MUST
wait until further investigations are complete before the check which was issued by XXXX
XXXX can be released to me for completion of repairs to the home I live in. I have an
XXXX 15 year old daughter who is suering in this HEAT WAVE -along with XXXX
cats, a bird my wife and myself. No hotel will take us. My next step will be an attorney
& the Media. XXXX XXXX
may concern can anyone help fullled requirements re claim dd green tree loan ser-
vicing told must wait investigations complete check issued can released completion repairs
home live i year old daughter suering heat wave along cats bird wife no hotel take us my
next step attorney media
The overall accuracy is

$$acc = \frac{TP + TN}{M}.$$

The hit rate or recall is given by rec = TP/P, i.e. how many complaints are correctly classified given the total number of complaints, and the false alarm rate by FP/N. Precision, pre = TP/P', measures how many of the predicted complaints are actually complaints. The F-score is another evaluation metric; it is the harmonic mean of precision and recall. From a FI perspective, the FN rate is more important to control than the FP rate: it is better to quickly realize a false alarm than to neglect a complaint because the classifier thinks it is general text.
                      Predicted Complaint      Predicted General (Tweet)    Total
True Complaints       TP (True Positive)       FN (False Negative)          P
True Tweets           FP (False Positive)      TN (True Negative)           N
Total                 P'                       N'                           M
A late or improper response to complaints would jeopardise the customer relationship. Therefore, we use the F2 value, in which recall is weighted twice as strongly as precision:

$$F_2 = \frac{(1+\beta^2)\,\mathrm{pre}\times\mathrm{rec}}{\beta^2\times\mathrm{pre} + \mathrm{rec}}$$
where β equals 2. High accuracy rates could be attributed to chance. We use the bootstrap method with replacement to estimate the overall accuracy, the F2 value, and the classification performance of the classifiers. Here, the classifiers build their models based on a temporary training set, and the models are evaluated using the corresponding test set. We repeat this procedure a number of times with a for loop, where different bootstrapped samples are randomly generated in each iteration. The total accuracy rate and F2 score are the averages of the accuracy rates and F2 scores calculated in each iteration.
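A minimal sketch of this bootstrap evaluation with the F2 score as metric; the classifier and the synthetic data are placeholders for the text-classification setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import fbeta_score
from sklearn.naive_bayes import MultinomialNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X = np.abs(X)                     # MultinomialNB requires non-negative features
rng = np.random.default_rng(0)

scores = []
for _ in range(100):              # bootstrap iterations
    idx = rng.integers(0, len(X), size=len(X))    # sample with replacement
    oob = np.setdiff1d(np.arange(len(X)), idx)    # out-of-bag points as test set
    clf = MultinomialNB().fit(X[idx], y[idx])
    scores.append(fbeta_score(y[oob], clf.predict(X[oob]), beta=2))
print(np.mean(scores))            # bootstrapped average F2 score
```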
In an example, the confusion matrix of the naive Bayes classifier reads:

                      Predicted Complaint    Predicted General
Actual Complaint      49.54%                 3.12%
Actual General        1.87%                  45.47%
The diagonal elements show the correct predictions. There is almost no general text predicted to be a complaint (Type I error). About 6% of the complaints are not detected as such (Type II error). The business risk the firm faces due to big data analytics is that if the employees who write the answers do not recognize a Type II error communication, the customer will receive an improper response. The parametrization of the algorithm can change the ratio of Type I to Type II errors.
Figure 5.29: Performance of the four classifiers Naive Bayes, Support Vector Machine, Random Forest and Artificial Neural Network (Zin (2019)).
Gu et al. (2018) apply machine learning to return prediction for about 30,000 individual stocks over the 60-year horizon from 1957 to 2016: "Our predictor set includes 94 characteristics for each stock, interactions of each characteristic with eight aggregate time series variables, and 74 industry sector dummy variables, totalling more than 900 baseline signals. Some of our methods expand this predictor set much further by including non-linear transformations and interactions of the baseline signals."
Dimension reduction or penalization techniques are needed in the OLS case. Parameter shrinkage and variable selection, which both limit the degrees of freedom of the regression, bring the out-of-sample R-squared back to 0.09% per month. Principal component regression (PCR) and partial least squares (PLS), which reduce the dimension of the predictor set to a few linear combinations of predictors, raise the out-of-sample R-squared to 0.28% and 0.18%, respectively. Non-linear specifications further improve predictions. The authors use generalized linear models, regression trees, and neural networks. Regression trees and neural nets lead to an R² between 0.27% and 0.39%. The economic gains are considerable. An investor in the S&P 500 who uses neural network forecasts reaches an annualized out-of-sample Sharpe ratio of 0.63, an increase of 0.21 relative to the 0.42 Sharpe ratio of a buy-and-hold investor. Forming a long-short decile spread, sorted on stock return predictions from a neural network, the strategy earns an annualized out-of-sample Sharpe ratio of 2.35, compared to 0.89 for their benchmark.
We describe some of the machine learning methods used. All methods aim to minimize the mean squared prediction error (MSE). They describe the excess return of asset j by an additive prediction error model:
$$R^j_{t+1} = E(R^j_{t+1} \mid \mathcal{F}_t) + \epsilon^j_{t+1} = g(z^j_t) + \epsilon^j_{t+1} \qquad (5.39)$$
where $z^j_t$ is the vector of predictors. Since the function g depends neither on time nor on the individual stock, the estimates of the risk premia are more stable for any individual asset than under standard methods, where the cross-section is re-estimated in each period. Since g depends only on $z^j_t$, information prior to t or from other stocks is not used. ML requires careful construction of the sub-samples for testing, estimation and hyperparameter tuning in order to control model complexity for out-of-sample performance. Depending on the algorithm used, different tuning methods apply; see Gu et al. (2018) for details.
Different choices of the function g define different models. The simple linear model imposes that conditional expectations can be approximated by a linear function of the raw predictor variables, i.e. $g(z^j_t) = \langle z^j_t, \phi \rangle$ with parameter vector φ. The objective function is the ordinary mean squared error loss
$$L = \frac{1}{NT} \sum_{j,t=1}^{N,T} \left( R^j_{t+1} - \langle z^j_t, \phi \rangle \right)^2.$$
Using statistical robustness methods (the Huber loss function), this least squares objective can be tuned to account better for the more informative observations. Penalty models induce sparsity in the sense that they force small coefficients to become zero.
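A hedged sketch of such a penalized regression on synthetic data; the lasso here stands in for the penalty models described above and is not the authors' exact specification:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
Z = rng.normal(size=(5000, 900))                 # stacked predictors z_t^j
phi = np.zeros(900); phi[:10] = 0.05             # only a few signals matter
R = Z @ phi + rng.normal(scale=1.0, size=5000)   # excess returns R_{t+1}^j

# the l1 penalty shrinks coefficients and forces small ones to exactly zero
lasso = Lasso(alpha=0.01).fit(Z, R)
print("non-zero coefficients:", np.count_nonzero(lasso.coef_))
```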
If predictors are highly correlated, the above shrinkage and selection methods are not optimal: it is better to choose an average of the predictors as the sole predictor in a univariate regression. This averaging is the essence of dimension reduction, and principal components regression (PCR) and partial least squares (PLS) are two such approaches. PCR starts with a principal components analysis (PCA), which conserves the covariance structure among the predictors; the leading components, given by the highest eigenvalues, are then used in the predictive regression. PCR thus rules out coefficients by considering the covariation among the predictors before considering their goodness in predicting future returns. PLS, in contrast, performs the dimension reduction by exploiting the covariation of the predictors with the forecast target. We refer to Yeniay and Göktas for a comparison of PCR, OLS and PLS.
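The contrast between the two approaches can be sketched with scikit-learn on synthetic data (the component counts are arbitrary assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))
y = X[:, 0] - X[:, 1] + rng.normal(scale=2.0, size=1000)

# PCR: components chosen only from the predictor covariance (largest eigenvalues)
pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)

# PLS: components chosen from the covariation of the predictors with the target
pls = PLSRegression(n_components=5).fit(X, y)
print(pcr.score(X, y), pls.score(X, y))  # in-sample R^2 of each approach
```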
The generalized linear model expresses the model return forecast error as a sum of an approximation error (bias, not knowing the true model g∗), an estimation error (variance, not knowing the true parameters of the model g) and an intrinsic error term. Generalized linear means that non-linear univariate transformations of the predictors are considered. Ex post, a weakness is that this does not allow for interactions among predictors. Considering multivariate functions of predictors would generate such interactions, but the number of parameters of such a model becomes computationally intractable.
Instead, regression trees are used to incorporate multi-way predictor interactions. Formally,
$$g(z_{i,t}; \theta, K, L) = \sum_{k=1}^{K} \theta_k \, \chi_{\{z_{i,t} \in C_k(L)\}}$$
where $C_k(L)$ is one of the K partitions of the data and $\theta_k$ is the sample average of the outcomes within the partition. The formula says: given a tree, consider all paths from the root node to the end nodes (the sum over K); at each node on a given path, check whether the feature is above or below the threshold value (the indicator function); and multiply all indicator functions along the path. Based on the basic decision tree model, boosting and random forest ensemble methods are introduced in order to stabilize the results, to improve the performance and to manage the bias-variance tradeoff.
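The tree formula can be evaluated directly; the partitions C_k and leaf averages θ_k below are illustrative placeholders for a fitted tree:

```python
import numpy as np

# two features z = (size, momentum); K = 3 leaf partitions C_k(L)
partitions = [
    lambda z: z[0] <= 0.5,                   # C_1: small stocks
    lambda z: z[0] > 0.5 and z[1] <= 0.0,    # C_2: large stocks, weak momentum
    lambda z: z[0] > 0.5 and z[1] > 0.0,     # C_3: large stocks, strong momentum
]
theta = [0.02, -0.01, 0.03]                  # sample average outcome per partition

def g(z):
    # sum of theta_k * indicator{z in C_k}: exactly one indicator equals 1
    return sum(t * C(z) for t, C in zip(theta, partitions))

print(g(np.array([0.8, 0.4])))               # large, strong momentum -> 0.03
```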
Given these models, the out-of-sample R² for individual excess stock return forecasts is calculated as
$$R^2_{oos} = 1 - \frac{\sum_{(i,t) \in \tau} (R_{i,t+1} - \hat{R}_{i,t+1})^2}{\sum_{(i,t) \in \tau} R_{i,t+1}^2}$$
where τ indicates that the fit is assessed only on the testing sub-sample. The metric is used without demeaning, which is meaningful since we consider individual stocks and not broad indices; using historical averages would add a lot of noise. All models under consideration increase their monthly R² by 3 percentage points when benchmarked against the historical means. Table 5.2 shows the monthly out-of-sample stock-level prediction performance.
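As a small sketch, the metric is straightforward to compute:

```python
import numpy as np

def r2_oos(realized, predicted):
    """Out-of-sample R^2 without demeaning, as in the formula above."""
    realized, predicted = np.asarray(realized), np.asarray(predicted)
    return 1.0 - np.sum((realized - predicted) ** 2) / np.sum(realized ** 2)

print(r2_oos([0.01, -0.02, 0.03], [0.012, -0.015, 0.02]))
```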
Table 5.2: Monthly R² for the entire panel of stocks using OLS, OLS with only size, book-to-market, and momentum (OLS-3), PLS, elastic net (ENet), random forest (RF), gradient boosted regression trees (GBRT), and neural networks with one to five layers (NN1-NN5). The generalized linear model (GLM) results are not reported (poor performance). '*' indicates the use of the Huber loss instead of the l2 loss. Top 1,000 or bottom 1,000 refers to the largest or smallest 1,000 stocks by market value. (Gu et al. (2018))
The negative results for OLS reflect in-sample overfitting. Restricting OLS to three style premia or using penalization as in ENet improves the performance significantly. Regularizing the linear model via dimension reduction improves predictions even further, as in the PLS case; hence, dimension reduction dominates variable selection. Boosted trees and random forests are competitive with these methods. Neural networks are the best performing predictors overall. Their drawback is the difficulty of interpreting the results: what is the economic meaning of the different layers one to five, and why do they generate the performance? Neural networks thus fail to be interpretable, which is a serious drawback in asset management, both from a client perspective and from a regulatory one.
To assess the statistical significance of the return performances, the authors use Diebold-Mariano test statistics for pairwise comparisons of a column model versus a row model. The statistics imply that the performance differences among regularized linear models are all insignificant; that is, all OLS models, ENet, PLS and PCA produce statistically indistinguishable forecast performance. Random forests and boosted trees improve over linear models only marginally. Again, neural networks are the only models that produce large and statistically significant improvements over all linear models. When one considers which characteristics matter in the different models, a few turn out to contribute significantly to the return performance in all models: momentum on several time scales, volatility characteristics, and spreads, for example. That is, as we have seen in other parts, market-driven characteristics dominate macroeconomic or accounting-type characteristics.
Correlation matrices can be represented as complete graphs, which lack any notion of hierarchy: each investment is substitutable with any other, there is no hierarchical relationship, and all nodes are of the same importance. Small estimation errors are magnified in such a structure. Consider an investor who invests in many assets, where some assets are close substitutes for each other while others are complementary. Say stocks with similar liquidity and from the same economic sector are more substitutable than stocks with different characteristics. Such a classification of the dependence leads to a tree structure, which includes hierarchical models, and not to a symmetric complete graph where the weights between any nodes can vary freely, see Figure 5.30.
While a covariance matrix, seen as a complete graph, has N(N − 1)/2 edges connecting the N nodes, a tree has only N − 1 edges to rebalance the weights among peers at the various hierarchical levels. Furthermore, in the covariance matrix the weight distribution has no natural starting point, whereas in a tree the weights are distributed top-down, which is consistent with many asset managers' investment behaviour.
The HRP (hierarchical risk parity) algorithm is constructed in three steps. First, similar investments are grouped into clusters based on a proper distance metric; this defines the tree clustering. Second, the rows and columns of the covariance matrix are reorganized so that the largest values lie along the diagonal; this leads to a quasi-diagonalization of the clustered tree and circumvents the problems of inverting the covariance matrix. Third, the allocation is split top-down through a recursive bisection of the reordered covariance matrix.
The tree construction, step one, proceeds in several stages. The first stage generates the tree clustering out of the data. Let a T × N matrix of observations X be given, with N the number of assets and T the number of periods. The goal is to map the N column vectors into a hierarchical structure of clusters, such that allocations can flow downstream. From the correlation matrix ρ one obtains the distance matrix d with entries $d_{ij} = \sqrt{(1-\rho_{ij})/2}$, and from it the matrix $\bar d$ of Euclidean distances between the columns of d, $\bar d_{ij} = \sqrt{\sum_k (d_{ki}-d_{kj})^2}$. The pair of assets with the smallest $\bar d$ forms the first cluster u(1). Finally, the matrix $\bar d_{ij}$ is updated by appending $\bar d_{i,u(1)} = \min\{\bar d_{i,j} : j \in u(1)\}$ and dropping the clustered columns and rows j ∈ u(1); see the example for illustration:
$$\rho = \begin{pmatrix} 1 & & \\ 0.7 & 1 & \\ 0.2 & -0.2 & 1 \end{pmatrix} \rightarrow d = \begin{pmatrix} 0 & & \\ 0.3873 & 0 & \\ 0.6325 & 0.7746 & 0 \end{pmatrix} \rightarrow \bar d = \begin{pmatrix} 0 & & \\ 0.5659 & 0 & \\ 0.9747 & 1.1225 & 0 \end{pmatrix} \rightarrow u(1) = (1,2)$$
and
$$\bar d_{i,u(1)} = \begin{pmatrix} 0 \\ 0 \\ 0.9747 \end{pmatrix} \rightarrow (\bar d)_{i,h=1,\dots,4} = \begin{pmatrix} 0 & & & \\ 0.5659 & 0 & & \\ 0.9747 & 1.1225 & 0 & \\ 0 & 0 & 0.9747 & 0 \end{pmatrix}.$$
The above steps are applied recursively, so that N − 1 clusters are formed and appended until the algorithm stops when the final cluster contains all of the original items. The sequence of cluster formation can be illustrated by a dendrogram.
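Stage one can be reproduced on the worked 3×3 example with scipy; single linkage on the Euclidean column distances recovers $\bar d$ and the first cluster u(1) = (1,2):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

rho = np.array([[1.0, 0.7, 0.2],
                [0.7, 1.0, -0.2],
                [0.2, -0.2, 1.0]])
d = np.sqrt(0.5 * (1.0 - rho))         # distance d_ij = sqrt((1 - rho_ij)/2)
dbar = pdist(d, metric="euclidean")    # d-bar: distances between columns of d
link = linkage(dbar, method="single")  # recursive clustering
print(dbar)   # [0.5659, 0.9747, 1.1225]
print(link)   # first row: assets 0 and 1 merge at distance 0.5659
```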
The next stage, quasi-diagonalization, reorganizes the rows and columns of the covariance matrix so that the largest values lie along the diagonal. This operation places similar investments close to each other and dissimilar ones far apart. The algorithm used, which we do not discuss, preserves the order of the clustering.
Figure 5.31: Quasi-diagonalization of the clustered correlation matrix: the largest values lie along the diagonal. Unlike, for example, the PCA approach, HRP does not require a change of basis; it works with the original investments. (Lopez de Prado (2016)).
Stage three uses the fact that inverse-variance allocation is optimal for a diagonal covariance matrix. For the quasi-diagonal matrix of stage two, one approach to the recursive bisection is to split the allocation between adjacent subsets of the quasi-diagonal matrix in inverse proportion to their aggregated variances.
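A sketch of one bisection step, under the assumption (as in the original HRP algorithm) that a cluster's variance is computed with inverse-variance weights within the cluster:

```python
import numpy as np

def cluster_variance(cov):
    """Variance of a cluster under inverse-variance weights (HRP convention)."""
    ivp = 1.0 / np.diag(cov)
    w = ivp / ivp.sum()
    return float(w @ cov @ w)

cov = np.diag([0.04, 0.09, 0.16, 0.25])   # toy quasi-diagonal covariance
v1 = cluster_variance(cov[:2, :2])        # left cluster
v2 = cluster_variance(cov[2:, 2:])        # right cluster
alpha = 1.0 - v1 / (v1 + v2)              # weight share given to the left cluster
print(alpha)                              # the split is then applied recursively
```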
The author compares, in and out of sample, the global minimum variance portfolio (GMV), the inverse volatility portfolio construction (IVP) of risk budgeting (where correlation information is discarded) and HRP; all portfolio constructions apply long-only constraints. The simulation is done for 10 assets. The GMV allocates 92.66% to the 5 top holdings and zero to three assets. The HRP allocation lies between the highly concentrated GMV and the almost equal distribution of the IVP. The GMV and HRP portfolios have almost the same risk although GMV uses only half of the assets; an event impacting the top five assets will therefore have a more severe impact than in the HRP case. For the out-of-sample perspective, Gaussian returns are generated with mean zero and 10% standard deviation, random shocks are added to account for price jumps, and the portfolios are rebalanced monthly (every 22 observations). The simulations are repeated 10,000 times. All mean portfolio returns out of sample are essentially zero, but the variances of the out-of-sample portfolios differ heavily. The GMV variance is the highest out of sample, 72.47% greater than HRP's. Intuitively, shocks affecting a specific investment penalize the concentrated GMV, while shocks involving several correlated investments penalize the IVP, which ignores the correlation structure. HRP protects against both common and idiosyncratic shocks by balancing diversification across all investments with diversification across clusters of investments at multiple hierarchical levels.
5.3 Blockchain
Blockchain, a technology, Bitcoin, a cryptocurrency, and cryptography, the mathematics of encryption/decryption and digital signatures, are the three main pillars of a 'digital asset' world.
5.3.1 Cryptography
Cryptography is a main mathematical discipline in a digital world. It makes protection
and validation possible in a world of strangers when we want to exchange values in a
blockchain at a zero human trust level. The main goals of cryptography are:
1. Confidentiality (privacy)
2. Data integrity
3. Authentication
4. Obligation (non-repudiation)
For the first goal, encryption and decryption are used; for the other three goals, digital signatures are used.
This shows that cryptography needs invertible functions, or rather functions whose preimage is hard to compute: so-called one-way functions. These are functions where it is easy to calculate f(x) from x but hard to invert, i.e. to calculate x from f(x). Although one-way functions are believed to exist, a mathematical proof of their existence is missing. An example is the multiplication of prime numbers, i.e. f : (x, y) → f(x, y) = xy. The product is simple to calculate, but finding the prime factors of the product is difficult.
Definition 104. Private keys are denoted by pkX, with X the owner of the private key, and vkX denotes a public key.
Figure 5.32: Symmetric-key and asymmetric-key encryption and the Diffie-Hellman key exchange (Source: Wikipedia [2016]).
Rivest, Shamir and Adleman proposed a first candidate trapdoor function, the RSA system. Before we consider this algorithm, we introduce some basic number theory: modular arithmetic (MA). This maths is used to formalize the encryption and decryption algorithms as well as the generation of the keys. MA is arithmetic for integers where the addition of two numbers restarts after a certain value (the modulus). The clock is the prototype example: hour arithmetic is modulo 12, written mod 12. 3 = 15 mod 12 means that 3 − 15 = −12 is an integer multiple of 12. An equivalent definition of a = b mod n is that there exists an integer k with a = kn + b.¹¹ We summarize some calculus rules.
11 Examples are 40 = 18 mod 11, −40 = 8 mod 6, −40 = 8 mod 8. The modulo operation defines an equivalence relation on the integers; the relation satisfies reflexivity, symmetry and transitivity.
Basic for cryptography is the existence of the inverse a⁻¹ of a. If c = d mod φ(n), where φ is Euler's totient function, then $a^c = a^d \pmod n$, provided a is coprime¹² with n; this replaces the last false statement in the above properties. Euler's theorem states that
$$a^{\varphi(n)} \equiv 1 \mod n.$$
Hence the equation ax = b mod n has the solution
$$x = a^{\varphi(n)-1} b \mod n.$$
As an example, assume that each letter of the alphabet is associated with a number from 0 to 25. Encryption means
$$y = ax + b \mod n$$
with n = 26.
12 Two integers a and b are coprime if the only positive integer that divides both of them is 1, or equivalently, if their greatest common divisor is 1.
We set the key (a, b) = (9, 13); note gcd(a, 26) = 1. We encrypt the letter C, which is mapped to the number 2. Then
$$y = 9 \cdot 2 + 13 \mod 26 = 31 \mod 26 = 5.$$
Decryption gives
$$x = 3(5 - 13) \mod 26 = 2,$$
where calculating the inverse is the tedious part: since gcd(9, 26) = 1, the inverse a⁻¹ = 3 exists. The number of possible keys, 12 · 26 = 312, is very small.
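The affine cipher example is easily reproduced; pow(a, -1, n) (Python 3.8+) computes the modular inverse:

```python
def encrypt(x, a=9, b=13, n=26):
    return (a * x + b) % n

def decrypt(y, a=9, b=13, n=26):
    a_inv = pow(a, -1, n)          # modular inverse, here a^-1 = 3
    return (a_inv * (y - b)) % n

x = ord('C') - ord('A')            # letter C -> 2
y = encrypt(x)                     # 9*2 + 13 mod 26 = 5
print(y, decrypt(y))               # 5 2
```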
Definition 106. A set G is a group if there is an operation ∗ such that for two elements g₁, g₂ ∈ G also g₁ ∗ g₂ ∈ G, the operation ∗ is associative, there exists a unit element e such that g ∗ e = e ∗ g = g for all g, and for each g there exists an inverse g⁻¹ such that g ∗ g⁻¹ = g⁻¹ ∗ g = e.
Closure of addition modulo n follows, for example, by writing aᵢ = sᵢn + rᵢ and r₁ + r₂ = qn + r:
$$a_1 + a_2 = s_1 n + r_1 + s_2 n + r_2 = (s_1 + s_2 + q)n + r = r \mod n.$$
The set Z∗ₙ consists of all integers m, 1 ≤ m ≤ n, such that gcd(m, n) = 1.
Proposition 107. Zₙ is a group under addition modulo n, and Z∗ₙ is a group under multiplication modulo n.
The order of the group Z∗ₙ is given by Euler's totient function, and the group is cyclic - all group elements can be generated as powers of a single element - if and only if n = 1, 2, 4, qᵏ or 2qᵏ, where k is a positive integer and q any prime number different from 2.
As an illustration, let n = 91 and let the public key be 5; the message is C = 67. Encryption raises 67 to the fifth power, reducing modulo 91 after each multiplication:
$$67 \times 67 = 4489 = 91 \times 49 + 30.$$
The result after the first multiplication is therefore 30. This is then multiplied again by 67; the product is larger than 91, and applying the same division as above, the result is 8 (the remainder). Continuing in this way leads to the number 58 - the encryption E of C = 67 is E(67) = 58. This is the message Alice receives. Now she uses the private key number 29 and raises 58 to the 29th power, applying the same logic - after each multiplication the next multiplication is done with the remainder - for decryption D:
$$\underbrace{58 \times 58 \times \cdots \times 58}_{29 \text{ times, modulo } 91} = 67, \qquad D(58) = D(E(67)) = 67.$$
Without knowledge of the private key 29, one does not know how many times to multiply 58 by itself in the above time-consuming way, taking the remainder at each step. While encryption (multiplication) is easy, decryption is a hard-to-solve factoring-type problem.
Summarizing:
• Alice chooses prime numbers p, q, which she keeps secret, and sets n = pq.
• Alice chooses vkA such that the greatest common divisor of vkA and φ(n) is 1, i.e. the public key and the Euler number φ(n) are coprime: vkA ∈ Z∗_φ(n).
• Alice computes the inverse of vkA, the private key, satisfying pkA · vkA = 1 mod φ(n).
• Alice makes n and the public key public, keeping p, q and the private key secret.
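A sketch of the toy RSA example with p = 7, q = 13, n = 91 and, as implied by the worked example above, public exponent 5 and private key 29:

```python
p, q = 7, 13
n, phi = p * q, (p - 1) * (q - 1)   # n = 91, phi(n) = 72
vk = 5                              # public exponent, coprime with phi(n)
pk = pow(vk, -1, phi)               # private key: inverse of vk mod 72 -> 29

M = 67                              # the message C = 67
cipher = pow(M, vk, n)              # E(67) = 67^5 mod 91 = 58
plain = pow(cipher, pk, n)          # D(58) = 58^29 mod 91 = 67
print(cipher, plain)
```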
Although the example explains the basic concept, real-life algorithms are more refined.¹³ The example also did not consider in detail how keys are distributed and managed in a public key system. Diffie and Hellman invented the Secret Key Exchange (SKE).
Fix a prime p and a generator g of the cyclic group Z∗ₚ, where g and p are publicly known. Alice picks at random an element x ∈ Z∗ₚ₋₁ and Bob picks at random an element y ∈ Z∗ₚ₋₁. Alice calculates
$$a = g^x \mod p$$
and Bob calculates $b = g^y \mod p$. The keys x and y are private to Alice and Bob, respectively. Alice sends Bob a, and Bob sends Alice b. But then
$$a^y = (g^x)^y = g^{xy} = (g^y)^x = b^x \in \mathbb{Z}^*_p.$$
Hence, Alice and Bob can both calculate the same result without a prior meeting to generate a shared key. If Eve wants to calculate $a^y$ or $b^x$, she faces the problem that she knows neither x nor y. To find these numbers she has to compute the discrete logarithm, which is believed to be intractable.
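A toy run of the key exchange with deliberately tiny public parameters (p = 23, g = 5), far too small for real use:

```python
p, g = 23, 5                 # public prime and generator of Z*_p
x, y = 6, 15                 # private picks of Alice and Bob

a = pow(g, x, p)             # Alice broadcasts a = g^x mod p
b = pow(g, y, p)             # Bob broadcasts b = g^y mod p

k_alice = pow(b, x, p)       # b^x = g^(xy) mod p
k_bob = pow(a, y, p)         # a^y = g^(xy) mod p
assert k_alice == k_bob      # both arrive at the same shared secret
print(k_alice)
```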
13 There, more refined mathematical concepts are used for the multiplication and factorization problem. Instead of using multiplication defined on finite integer sets, one uses so-called elliptic curve cryptography; see Sullivan and Collabrini for an introduction.
Hash functions accelerate database lookup by detecting duplicated records in a large file. A hash function is deterministic: the same input always produces the same hash output. The term 'cryptographic' means that the hash function must satisfy certain security, authentication or privacy criteria. First, the time to compute the hash should be short for any message input. Second, reconstructing a message from its hash must be impossible unless one tries all possible combinations - and there are too many combinations. Third, changing the message by only a small amount of information should change the hash value in such a way that the new and the old hash look uncorrelated. As an example, consider the SHA-224 hash of a sentence with and without a final dot:
SHA224(The quick brown fox jumps over the lazy dog) = 0x730e109bd7a8a32b1cb9d9a09aa2325d2430587ddbc0c38bad911525
SHA224(The quick brown fox jumps over the lazy dog.) = 0x619cba8e8e05826e9b8c519c0a5c68f4fb653e8a3d8aa04bb2c8cd4c
Finally, it should be a hard problem to find two different inputs which lead to the same output - the so-called collision resistance. Summarizing, a cryptographic hash function makes it easy to verify that some input data maps to a given hash value; but if the input is unknown, it is difficult to reconstruct it from the hash value. For the proof-of-work in Bitcoin transactions, one has, for example, to compare data of arbitrary size quickly and easily and to be sure that a digitally signed message did not change. Hash functions are not one-way functions in the strict mathematical sense. Historically, popular cryptographic hash functions have had a lifetime of around ten years before they were broken.
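The avalanche example above can be reproduced with Python's hashlib:

```python
import hashlib

for msg in ("The quick brown fox jumps over the lazy dog",
            "The quick brown fox jumps over the lazy dog."):
    print(hashlib.sha224(msg.encode()).hexdigest())
# 730e109bd7a8a32b1cb9d9a09aa2325d2430587ddbc0c38bad911525
# 619cba8e8e05826e9b8c519c0a5c68f4fb653e8a3d8aa04bb2c8cd4c
```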
Protocol - the file storage problem
A client wants to store a file on a server. The file has a name F and data M, and the client wants to retrieve file F later. In a basic protocol:
• Client sends (F, M) to the server
• Client deletes M
• Client later requests F
• Server returns M
What if the server is adversarial and returns M′ instead of M? A simple solution is that the client does not delete M and compares M′ with M - but this requires enough memory to store all of M. Storing only the short hash #(M) and comparing it with #(M′) achieves the same check with minimal memory.
The RSA system allows one to implement digital signatures as follows. Alice wants to sign a document M electronically. She signs M by appending the digital signature DS(M) = f⁻¹(M), where f is Alice's trapdoor function, i.e. only Alice knows the trapdoor information. Anybody can then check the validity of the signature since f(f⁻¹(M)) = M; this also shows that the signature becomes invalid if the message M is changed. In the RSA system, DS(M) = M^{pkA} mod n. Using Alice's public key, anybody can calculate DS(M)^{vkA} mod n. If the result equals M, then the signature must have been created by Alice, who is the only one who knows pkA. Figure 5.33 illustrates the digital signature process for a hashed message.
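A toy sign-and-verify round using the numbers of the earlier RSA example (n = 91, private key 29, public key 5); real systems sign the hash of M, not M itself:

```python
n, pk_A, vk_A = 91, 29, 5     # toy key material from the RSA example

M = 67
DS = pow(M, pk_A, n)          # signature DS(M) = M^pk_A mod n
check = pow(DS, vk_A, n)      # anybody can verify with the public key
print(check == M)             # True: only the holder of pk_A could sign
```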
Figure 5.33: The process of signing a hashed document: hashing the document, signing the hashed document using the private key, broadcasting the document plus the signed hash, and decomposing the broadcast into two pieces: the hashed document and the verification of the signed hash. If the two results agree, then Alice signed the document and the document did not change during broadcasting.
In a numerical example with public key vkA = 17, the Euclidean algorithm yields the private key pkA = 89. Validation using only the public key and the known number n then proves that Alice signed the document.
In the case of Bitcoin, public keys (or addresses) correspond to the identities of Bitcoin users. A Bitcoin user can send a message or transaction from his address by signing it with his private key. In Bitcoin there is no central place which registers and identifies the users; each user registers himself by generating - as often as he wants - a new address. At first glance, this decentralized identity management gives the impression of granting users a high degree of anonymity and privacy. This impression is put into perspective when looking over time: movements are assigned to each address, these movements are visible to all participants, and behind them patterns can be identified using data analytics. Furthermore, at some point a criminal using Bitcoin for money laundering needs to leave the Bitcoin network by exchanging the Bitcoins into, say, dollars. It is there that secret services position their software to reveal the identity of the criminal. One therefore often speaks of Bitcoin as a pseudonymous system.
5.3.6 Blockchain
With the implementation of Bitcoin at the beginning of 2009, something new was created: Bitcoin enables joint accounting among participants who do not trust each other, do not know each other and do not know how many other participants are in the system. The technology that makes this possible is called blockchain and allows a new data management model. The term blockchain refers to the fact that transactions are grouped into blocks and confirmed together. The confirmation in turn attaches the block with the new transactions to a chain of previous blocks and thus incrementally builds up a transaction history. If transactions are not grouped into blocks but the decentralized infrastructure is kept, one speaks of Mutual Distributed Ledger Technology (MDLT). We do not differentiate between MDLT and blockchain in the sequel.
Definition 108. A mutual distributed ledger technology (MDLT) defines ownership (mutual), a technology (distributed servers) and the object (ledger).¹⁴
The basic functionality corresponds to the model of the replicated state machine: participants manage a quantity of data (the state) by holding a copy of the data (a replica) locally and executing operations on it that change the data. The initial state has to be
14 The records in the ledger cover ownership, transactions and the identity of assets. In order to allow for communication between agents, they need to agree (consensus) on the state and authenticity of the ledger.
the same for all participants and the operations are deterministic: any participant who applies the operations in the same order to the initial state will arrive at exactly the same end result. In such a system, consensus is reached when all participants agree on the current state of the data. In the example of Bitcoin, the data is the Bitcoin balance of the individual participants and the operations are transactions between these participants.
Two requirements are both necessary for a blockchain to make sense: decentralization must dominate a centralized architecture, and so must trust. Technological decentralization is a well-known concept. The trust requirement means that trust in a decentralized P2P network is preferred over trust in a central network with, say, third-party validators.
We consider blockchains for money transfer in more detail for the Bitcoin cryptocurrency, which is one of the few up and running blockchain applications.¹⁵ Traditional money transfer using banking services and trust is shown in Figure 5.34. We do not consider how money is generated and how money is represented, but we consider the third control structure, transaction execution. When Alice sends Bob CHF 10, they
15 Rubin's YouTube video is an excellent introduction to the basics. References for this section are Duivestein et al. (2016), Tasca (2016), Aste (2016), Rifkin (2014), Swan (2015), Peters and Panayi (2015), Davidson et al. (2016), UBS (2015), Nakamoto (2008), Franco (2014), Bliss and Steigerwald (2006), Peters et al. (2014), Zyskind et al. (2015), Berentsen and Schär (2018).
Figure 5.34: Alice paying CHF 10 to Bob using the centralized banking system. On the payment level, Alice announces her willingness to pay to her bank A. The bank checks whether Alice possesses CHF 10 and makes sure that there is no double spending. This third-party validation is repeated by the central bank, where the accounts of bank A and bank B are checked.
both use trusted third parties - banks. Alice orders her bank to transfer the money to Bob's bank. Both banks keep the accounts, i.e. the ledgers. Both banks are trusted, so Alice and Bob do not need to know each other. The banks check whether the money can be transferred - the transaction legitimization. The central bank is then a trusted third party acting between the ledgers of the two banks and running its own ledger where the banks' account balances are recorded.
3. Transaction consensus. A central party allows for efficient execution and fixes at each date, in a unique way, the distribution of money in the whole system.
These parts of transaction execution hold for all monetary systems, including cryptocurrencies. Blockchains attempt to change this classical money transfer in three respects:
We describe in principle how this works, leaving aside the many details which matter if one considers an actual implementation of a blockchain. Consider this in more detail for Alice, Bob and Eve, where Alice sends CHF 10 to Bob and Bob sends CHF 5 to Eve. Since there is no trusted third party, one has to assure, for example, that Alice is Alice, that Alice possesses the money, that she did not promise to pay the same CHF 10 to multiple recipients, and that indeed Bob is receiving the money, see Figure 5.35.
Figure 5.35: Left panel: open and distributed open ledger technology. Right panel: validation of transactions. New transactions are grouped into a new block and, after its validation - the consensus work that installs unambiguous asset ownership - the block is added to the existing blockchain. Each block is further marked with a time stamp and a digital fingerprint (a hash, 'K' in the example) of the previous block. This hash identifies a block uniquely, and the verification of the fingerprint can easily be done by any node in the network.
The central open ledger records that Alice indeed has CHF 20 in her account and that she is able to pay CHF 10 to Bob. Both transactions are recorded and linked in time order. If Alice wants to pay CHF 15 to Eve while only CHF 10 are left, the participants see in the open ledger that she does not have enough cash. In a next step, this central ledger is removed by copying it to the servers of all participants (the nodes): a distributed or decentralized open ledger architecture.
Transaction feasibility means that a payment from Alice to John, who is not directly connected to Alice, is possible: Alice broadcasts the payment instructions to her next node, Bob, who broadcasts them to his next node, and so on. There are many paths linking Alice and John, and the payment network also works if some links fail: a decentralized system is more robust than a centralized one. The drawback of the decentralized system is that there are no admission constraints - each node can broadcast any type of information - so each node needs to be able to check the validity of each transaction information.
The consensus mechanism for Bitcoin is called proof-of-work (PoW). Each miner is free to choose the set of transactions which he wants to validate. The miners solve a purely numerical problem unrelated to the block's content (mining); more precisely, they solve a cryptographic puzzle using a trial-and-error approach, indicated by 'K' in the figure. The miner who solves his problem first attaches his proof-of-work to his block and broadcasts it. All other miners can easily verify the correctness of the PoW, and the validated block is added to the blockchain. The PoW requires effort, such as the energy spent by the computers and the investment in hardware. Nakamoto (2008) argues that PoW generates a stable consensus, i.e. a single chain, if miners always take the last solved block as the parent for their next block. Each block under PoW consideration needs to reference an already validated block, to which the new block is linked after consensus is found; this reference is done using a hash value. If a miner changes a past validated block, the hash changes, which leads to inconsistencies in the blockchain. By construction, the participants always consider the longest chain which contains legitimate transactions. Therefore, to cheat, a miner would need to recalculate a whole chain afresh for validation before a single new block is validated by any other miner - which is practically infeasible. Generating a new block takes a miner only a short time. Without restricting this block generation process, validation for consensus would become impossible, since the frequency at which blocks are generated would dominate the speed of propagation in the network. Therefore, the process is slowed down such that on average a new block is mined and verified every ten minutes.
The winner is remunerated for his efforts (generation of new Bitcoins); he takes it all. A PoW authenticates the fact that resources have been spent to solve a cryptographic problem. This defines the economic incentives: the more computing power a miner invests, the higher the likelihood that he will be the first to mine the block. If Alice wants to cheat using a double-spending strategy, she first has to spend resources in order to validate the block containing her fraudulent transactions. PoW validation is a peer-to-peer type consensus mechanism since the validation can be verified as true by all miners: no trust is needed, and no node can simply claim to have found a key without having spent resources, given the ease of verifying a candidate solution.
Summarizing:
• The rules are contained in the Bitcoin protocol, an open-source cryptographic protocol.
• Transactions switch from single third-party trust to distributed ledger trust.
• Unambiguous ownership rights exist at any moment in time due to the consensus mechanism.
• The P2P complete-stranger consensus PoW is the most expensive and slowest consensus mechanism.
• Fork: assume that a miner attaches his mined block not to the last validated block but to the second-to-last one - a fork follows. Miners can then choose to attach validated blocks to the original chain or to the other branch of the fork, and there are competing versions of the ledger. Forks reduce the credibility and reliability of the blockchain. Even if, eventually, all miners agree to attach their blocks to the same chain, the occurrence of a fork is not innocuous. A fork can also occur when some miners adopt a new version of the mining software that is incompatible with the current version. Does the blockchain protocol rule out the occurrence of forks?
We close this section with an analogy, the Coin of Yap problem - a problem which the population of the Yap islands in the Western Pacific Ocean faced. The Yap produced stone money. There were five different sizes of stones, the largest requiring around 20 men to be transported. It was not possible to carry the stones from one island to the next for exchange purposes using canoes. How could the stones be used for payment if they could not be physically exchanged against the goods? The solution was to store the ownership information in the consciousness of the Yap people (the blockchain): the Yap knew who owned the different stone pieces. The stones did not need to be moved when ownership changed, since the public memory records the changes in ownership - there is a societal consensus over ownership. In case of conflict, the stronger clan won. Due to the limited size of the islands and the population, the system costs never became high enough to render the system ineffective.
• Anyone can participate in the protocol and receive, say, Bitcoin as a reward by performing the PoW-based mining operation.
• The mechanism of pouring currency into the system via proof-of-work makes it feasible for anyone (possessing sufficient hashing power) to participate.
• The ledger itself is public, readable and writeable by anyone who possesses Bitcoin.
In a permissioned blockchain, by contrast:
• Producing transactions and/or blocks can only be performed after being authorized by the other nodes.
• In the simplest form the set of nodes is static: the set of nodes implementing the protocol is fixed and determined at the onset of protocol execution.
On-chain assets such as virtual currency are transacted. Since the number of actors is smaller in permissioned blockchains, only a small number of participants need to operate, which makes such networks more scalable than permissionless ones.
Figure 5.36: Emergence of different network topologies (Celent [2015], UBS [2015]).
Since the actors in a permissioned network are not anonymous, the time-consuming and expensive PoW is not needed, and much simpler and faster consensus schemes apply. It is possible to use classical consensus algorithms from the field of distributed computing, such as Paxos or Practical Byzantine Fault Tolerance (PBFT). These protocols are based on polls in which participants vote on the next operation to be applied. This is possible because each participant knows how many votes constitute a majority and when a vote is successful. An example of a permissioned blockchain is Ripple.
The market dynamics for blockchain comprised in 2017 about 300 start-ups worldwide, with more than eighty percent of global banks running blockchain projects (WEF (2017)). 20% of global banks were expected to have a commercial blockchain product by the end of 2017 (IBM (2016)), and global investments in the technology are estimated at USD 1.5 bn (WEF (2017)). The PoW is a costly way to reach consensus: in February 2018, the energy needed to perform the PoW was similar to the total energy consumption of Romania.
An alternative is to select at random a participant on the basis of data in the system, say selected tokens which are linked to an address. The chosen address may make the next proposal for the further development of the blockchain. In such a proof-of-stake system, the probability of being allowed to make the next proposal increases with the tokens of a participant. This eliminates the need for time-consuming proof-of-work calculations, and participants with a greater interest in the continued existence of the system (as they have invested in it) make relatively frequent decisions. However, the implementation of this concept is not easy, as participants may behave strategically to increase their influence in the system, or behave incorrectly. Therefore, most proof-of-stake systems so far use a combination of proof-of-stake and proof-of-work to counter such manipulation attempts, but accordingly retain high energy consumption as a disadvantage.
5.3.8.1 Bitcoin
Consider miners who want to perform the PoW. All blocks in the Bitcoin blockchain have a short string of meaningless data, called a nonce, attached to them. The mining computers are required to search for the right meaningless string such that the block as a whole satisfies a certain arbitrary condition. Specifically, it is required that the SHA-256 hash of the block has a certain number of leading zeros. A miner selects the message M of Alice ready for validation, chooses a random number k, the nonce, and runs all information through the hash, i.e. he calculates #(M + k). If this result is larger than the threshold T, he chooses a new k and continues until #(M + k) < T. Then the miner broadcasts k, and everybody can easily check that the hash is indeed smaller than the threshold level. The nonce is a 32-bit data string. Varying the nonce alone is a trivial task, since 2³² amounts to around 4 billion possibilities, which today can be checked in a few seconds. Therefore, to increase the complexity, the transactions are grouped into a so-called Merkle tree. In a Merkle tree, data blocks are grouped in pairs and the hash of each of these blocks is stored in a parent node. The parent nodes are in turn grouped in pairs and their hashes stored one level up the tree; this continues until the root node is reached.
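A toy nonce search illustrates the principle; the leading-zero condition stands in for the real target T, and the block layout is heavily simplified:

```python
import hashlib

def mine(message: bytes, difficulty: int = 4) -> int:
    """Search a nonce k such that SHA-256(M + k) starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    k = 0
    while True:
        digest = hashlib.sha256(message + str(k).encode()).hexdigest()
        if digest.startswith(prefix):   # plays the role of #(M + k) < T
            return k
        k += 1                          # try the next nonce

nonce = mine(b"Alice pays Bob 10")
print(nonce)
```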
The SHA-256 hash function is used, whose output is a 64-digit hexadecimal string. Consider the hash
000000000000004c296e6376db3a241271f43fd3f5de7ba18986e517a243baa7,
which was the hash of a block ready for the miners in 2013. It uses 16 symbols, the numbers 0 to 9 and the letters a to f. The hash starts with 14 zeros, reflecting the threshold level. The difficulty of the problem is not constant over time: the number of zeros in the header varies in a non-manipulable way. The difficulty is calibrated in such a way that it is possible to find a block in about 10 minutes. The SHA hash goes one way: it has 2²⁵⁶ possible outputs, which one would need to enumerate in order to break the hash, i.e. to calculate the input.
The Bitcoin system is managed differently from a centralized network. How is the management organized such that the system can be improved and deficiencies can be corrected when there is no central party with the power to do so? Changes to the network that are not in the interest of the users can be prevented either by sanctioning such actions or by setting incentives such that for each member the dominant strategy is not to deviate from the existing rules - a kind of Nash equilibrium. It is this game-theoretic concept which is implemented in the Bitcoin system. To allow for changes, a voting system is used in which a predefined majority has to exist before a change is implemented. This democratic rule is complicated, since not all nodes have the same rights and action spaces (miners have an advantage), but other user groups also have the possibility to form coalitions which can then try to enforce their views. In any case, in such an MDLT no one can be forced to follow any decision. If part of the community is not willing to follow a change and decides to use the old code, the system separates into two systems: a fork is realized. In a soft fork, the rules for consensus are stricter than in the original chain, i.e. new ledger entries are also valid under the old system. In a hard fork, the new ledger entries are no longer valid under the rules of the blockchain before the fork happened.
5.3.8.2 Settlement
The time between a buyer and a seller agreeing to exchange a security (trade execution) and the date when the trade is settled (the assets are exchanged) can be 2 or 3 days, depending on the jurisdiction and the type of asset. A longer period between trade execution and settlement raises settlement risk - the risk that one leg of the transaction is completed but not the other - and counterparty risk - the risk that one party defaults on its obligation. Besides the reduction of risk, a decentralized blockchain technology could also reduce the costs of the trade and settlement process.
Trading.
• The investors (buyer and seller) who wish to trade contact their trading member, which places their orders on the exchange.
• The trades are executed on the exchange or any other platform, such as a multilateral trading facility or an organized trading system.
Clearing.
• Clearing members who have access to the clearing house or the central counterparty, and who are also trading members, settle the trades.
• Clearing and settlement can be bilateral, i.e. settled by the parties to each contract. After the GFC, the G20 enforced the switch from bilateral to central counterparty (CCP) clearing for OTC derivatives. A CCP acts as a counterparty for the two parties in the contract. This simplifies the risk management process, as firms now have a single counterparty to their transactions. Through a process termed novation, the CCP enters into bilateral contracts with the two counterparties, and these contracts essentially replace what would have been a single contract in the bilateral clearing case. This also leads to contract standardisation, and there is a general reduction in required risk capital due to multilateral netting of cash and fungible securities. CCP clearing thus transforms the bilateral clearing topology into a centralized, star-shaped one. From a systemic risk perspective, while the riskier bilateral connections are replaced by less risky centralized ones, the major risk concentration is now located in the few CCPs.
Settlement.
• The two custodians, who are responsible for safeguarding the assets, exchange the assets, where the typical instruction is 'delivery versus payment': delivery of the assets only occurs if the associated payment occurs.
Using a blockchain means transforming the centralized CCP topology back into a decentralized one, where there is no need for a CCP. In the trading-clearing-settlement cycle, a consortium blockchain can be used as follows to satisfy the present standards. On the trading level, a consortium of brokers can set up a distributed exchange, where each of them operates a node to validate transactions. The investors still trade through a broker, but the exchange fees can be drastically reduced. On the clearing level, a consortium of clearing members can set up a distributed clearing house, thus eliminating the need for a CCP. Contrary to bilateral clearing, the contract stipulations are administered through a smart contract, which reduces risk management issues. If the securities and money are digitalized, settlement no longer needs custodians with securities depositories; the assets are part of the permissioned blockchain.
Consider banks (the nodes) which search for a technology to record and enforce financial contracts such as cash, derivatives or any other type of product. More precisely, the banks want to record and manage the initiation and the life cycle of financial contracts between two or more parties, grounded in the legal documentation of the contracts and compatible with existing and emerging regulation.
These requirements led to the solution Corda. We state the most important changes compared to the Bitcoin blockchain. First, there are no miners and there is no proof-of-work, since no currency needs to be generated (mining), and due to the mixed private/public association of information no general consensus on the ledger is needed. The advantages are the avoidance of costly mining activities, of a deflationary currency and of a concentration of mining capabilities in a few nodes. Second, Bitcoin transactions can only contain a small amount of data due to the fixed-length data format. This is not sufficient if one considers all the economic, legal and regulatory information in an interest rate swap between two parties. Corda encodes the information of arbitrarily complex financial contracts in contract code - the prose of the allowable operations defined in term sheets is encoded. Corda calls this code a state object. Consider a cash payment from bank A to a company C. The state object contains the legal text describing the issuer, the date, the currency, the recipient, etc., and the codification of this information. The state is transformed into a true transaction if the bank digitally signs the transaction and if it is verified that the state object is not used by another transaction. Hence, there are two types of consensus mechanisms. First, one has to validate the transaction by running the code in the state object to see whether it is successful and to check all required signatures; this consensus is carried out only by the parties engaged in the transaction. In other words, the state object is a digital document which records all information of an agreement between two or more parties. Second, the parties need to be sure that the transaction under consideration is unique; this consensus, which checks the whole existing ledger, is done by an independent third party. Summarizing, the ledger is not globally visible to all nodes. The state objects in the ledger are immutable in the same way as we described for blockchains. Given that not all data is visible to all banks, strong cryptographic hashes are used to identify the different banks and the data.
Why are the leading banks pushing this system? They can all use a single ledger, which makes reconciliation and error fixing in today's individual ledgers a topic of the past. Furthermore, the single ledger does not change the competitive position of the banks in the ledger. The economic rationale, profit and risks of entering into a swap remain within UBS and Goldman Sachs, but the costs and operational risks of the infrastructure are reduced by collaborating to maintain shared records. In other words, while the banks keep the profit and loss from their banking transactions unchanged relative to the present competitive situation, they reduce the technology cost component through cooperation.
Denition 110. Smart contracts are digital contracts allowing terms contingent on de-
centralized consensus that are self-enforcing and tamper-proof through automated execu-
tion.
Vitalik Buterin wrote the Ethereum white paper in 2013. The market capitalization of Ethereum amounted to USD 1 bn in October 2016 and USD 74 bn in December 2017.
What happens if the software of a smart contract has a fault or the logic of the soft-
ware allows someone to use the software in his favour? This was the case in the so-called
Decentralized Anonymous Organization (DAO) Hack. DAO was a form of investor-
directed venture capital fund. It was the biggest crowdfunding experiment in the world
raising USD 150 millions within 21 days. However, on June 17 2016, a hacker exploited a
security bug on the smart contract and transferred USD 50 millions to his own account.
Th cryptocurrency Ether lost 50% of its value on the same day. Since the hacker did
nothing illegal but was just smarter than those who wrote the smart contract code there
was a priori no reason to consider any actions regarding the validity of the transaction.
But many in the community were invested and hence faced personal losses if one would
not o-set the hacker's transaction.
The first alternative was to cancel the transaction and restore the money to the DAO users. The second was to do nothing; then the hacker would keep the USD 50 million and many people invested in the DAO would lose their investment. The cancellation of the transaction, leading to a hard fork, would enable all DAO investors to exchange their tokens at a fixed price, as in a currency reform; they would only need to update their software. The old DAO would still exist on the old Ethereum blockchain, but should die out without investors, and the hacker's tokens would become worthless. But parts of the community refused the update: they saw a violation of the ideals of Ethereum. In protest, they stayed on the old blockchain and baptized it Ethereum Classic. Instead of losing value, the DAO won. The event damaged the reputation of the technology from a security perspective. In addition, the community damaged its own reputation during this period: responsibilities were unclear, blaming started, and it was not possible to find a single solution.
16 Store of value means that money can be reliably saved, stored, and retrieved and that its value remains relatively stable over time. Medium of exchange means that it is used to compare the values of dissimilar objects and as a standard of deferred payment, that is, an accepted way to settle a debt. Unit of account means a standard numerical monetary unit of measurement of the market value of goods, services, and other transactions. Divisibility and fungibility are further characteristics of a unit of account.
People's belief in the value of money is fundamental to any currency: it is not possible to enforce the value of a currency if people do not want to accept it. Figure 5.39 provides an overview of different currencies. Table 5.3 summarizes some features of fiat money, money issued in a permissioned blockchain and money issued in an MDLT.
5.4.3 Bitcoin
First, Bitcoin represents a cryptocurrency.¹⁷ This means a unit of Bitcoin is used to store and transmit value between individuals who believe in this currency. Second, Bitcoin represents a communication medium: all individuals using or creating Bitcoins communicate via the Bitcoin protocol over the internet. The protocol is the code which contains the set of rules used in the Bitcoin system.
At the time of writing, the number of Bitcoin transactions is around 300,000 per day, approximately equal to USD 3 bn at market exchange rates in November 2017, and the market capitalization of Bitcoin by the end of 2017 was USD 261 billion (Source: Blockchain.info). A cryptocurrency combines two main components: a new currency, such as Bitcoin, and a new decentralized payment system - the blockchain.
17 The text follows Antonopoulos (2015), Jogenfors (2016), Aste (2016), Khan Academy (2016), Boehme et al. (2015), Tasca (2016) and BIS (2018). For an economic review see Bank of England (2014) and Boehme et al. (2015).
Figure 5.39: Overview of the different currencies. Source: Bech and Garratt (2017).
Bitcoin has value because people believe in it. If people stop believing in it, and as long as there is no real economic production backing the coin, the value evaporates. Such belief can be created quickly and can vanish just as rapidly. Consider the period after the first Gulf War in the Kurdish region of Iraq: the Kurds used in their areas the Iraqi Swiss dinar.¹⁸ Hence, although a legal tender existed in Iraq, the Saddam dinar, it became worthless in the Kurdish regions. People cannot be forced to believe in a currency. So far we have not compared digital and crypto currencies, see Figure 5.40.
18 'Swiss' because the printing plates were made in Switzerland and stolen.
Figure 5.40: Digital versus crypto currencies.

Digital currencies:
• Every non-physical currency is a digital currency.
• Digital currencies consist of numbers and digits.
• 90% of global currency is digital, most of it fiat currency.
• Online banking, mobile payment, PayPal, Mint and credit cards are based on digital currencies.
• Digital currencies possess a monetary regulatory and institutional setting; they are accepted as legal tender.
• Money generation is mostly done by the inside money mechanism and secondarily by central banks.
• Convertible into cash; payments are not public.
• There is no anonymity.
• 7x24, worldwide payments, no need to know the recipient.
• Banks and other intermediaries act as third-party validators using accounts as ledgers which are centrally stored and not public. Trust is placed in these validators. Errors can be offset, stolen funds are often replaced, and a lost identification for authentication can be replaced by a new one (lost ID card).

Crypto currencies:
• A subset of digital currencies.
• Features: privacy, a distributed mutual ledger for transaction recording, cryptography.
• Ethereum, Bitcoin, Litecoin and more than 1,000 other crypto currencies.
• 99% are without regulatory or institutional backing; most are not considered legal tender.
• A lost cryptographic key for accessing the cryptocurrency, or stolen coins, are not replaced and are lost to the economic system forever, since there is no third party in the system.
• Coins are generated by the mutually distributed ledger technology.
• A fully transparent but fully anonymous payment system.
• 7x24, payments without knowing the recipients; payments are only possible within the peer group which accepts the coin.
• Trust, security and protection, see below.
• High volatility, e.g. the volatility of BTCUSD is around 14 times larger than that of CHFUSD.
by the beginning of 2020, 1.6 mn have been stolen and 5.01 mn have been lost. More
than 35 percent of all coins have been therefore stolen or lost.Reasons for losing the coins
are loss of private keys, operational risks by sending the coins to the wrong address when
people do not use the QR code but type wrongly the address or even by sending the coin
to the genesis block to exchange them. Bitcoins are not fungible: 1 Bitcoin is not always worth 1 Bitcoin. Premiums of up to 15% are paid for freshly mined Bitcoins. The reason is that these Bitcoins certainly do not have a harmful history with their owners and are therefore unproblematic when exchanged for a fiat currency via exchanges or banks. The all-in cost of mining a Bitcoin is about USD 5,600 at the beginning of 2020 at a price of about USD 8,500, which means that a miner's profit is currently almost USD 3,000 per Bitcoin received if it wins the PoW.
The Bank of England (2014) states that the volatility of Bitcoin is 17 times larger than the volatility of the British pound: the use of Bitcoin as a short-term store of value is questionable, although nothing can be inferred about its value as a long-term store of value. The number of transactions of retail clients is used to measure their willingness to accept Bitcoin as a medium of payment. Since this number is not observable, proxy variables such as data from 'My Wallet' are used instead, see Bank of England (2014). The analysis shows that the number of transactions per wallet has been decreasing since 2012 to a value of 0.02 transactions per wallet: most clients buy and hold their Bitcoins instead of using them. Finally, there is little evidence that Bitcoin is used as a unit of account.
Traditional payment systems are safe, cost-effective and scalable, i.e. they handle high volumes. Visa, Mastercard and PayPal handle between 240 and 3'500 transactions per second, while for Bitcoin and Ether the number is around 7-20.19 Bitcoins are so far only cheaper to produce than money in a centralized system because the miners in the crypto-currency system receive new coins as a subsidy for their proof-of-work efforts. Given that the production of new Bitcoins decreases over the next decades, that the energy needed to achieve consensus grows over-proportionally, and if the exchange value of Bitcoin does not stabilize at a large value compared to the USD, the effect of this subsidy diminishes, leading to increasing costs of Bitcoin issuance. Removing centralized trust by using a P2P trustless MDLT is costly in several respects. The energy consumption of the Bitcoin miners equalled in 2018 the total energy consumption of Romania, a nation of 20 million people. Ethereum is also highly energy intensive. It will be of vital importance whether consensus mechanisms other than PoW can be designed and accepted such that the MDLT consumes much less energy.
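To make the role of hashing concrete, the following toy Python sketch illustrates a PoW hash puzzle of the Bitcoin type (a minimal illustration only: real mining uses double SHA-256 over an 80-byte block header against a 256-bit difficulty target; the header string and difficulty values here are made up). The expected number of hash evaluations, and hence the energy, doubles with every additional difficulty bit.

```python
import hashlib

def mine(header: bytes, difficulty_bits: int, max_nonce: int = 10**8):
    """Toy proof-of-work: find a nonce such that SHA-256(header || nonce),
    interpreted as an integer, falls below the difficulty target."""
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
    return None  # no solution found within max_nonce

# Expected work is about 2**difficulty_bits hashes: energy scales with difficulty.
for bits in (8, 12, 16, 20):
    print(bits, mine(b"toy block header", bits))
```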
The number of hashes drives the energy costs. Aste (2016) estimates that keeping a capital of around USD 10 bn secure in the Bitcoin blockchain requires annual costs of 10%. The reason is the number of hashes generated for the PoW - of the order of 1 bn times 1 bn every second. Given the high transaction costs, most users access their cryptocurrency not directly but via an intermediary such as a crypto-wallet provider or a crypto exchange. That is, the main motivation of Bitcoin - not needing a central third party such as a central bank - ends with users trusting often unregulated third parties. It is then no surprise that
19 Committee on Payments and Market Infrastructures, Statistics on payment, clearing and settlement
systems in the CPMI countries, December 2017; www.bitinfocharts.com; Digiconomist; Mastercard;
PayPal; Visa; BIS calculations.
fraudulent or hacked institutions such as Mt. Gox lead to thefts and zero-recovery losses for the users.
Permissioned crypto currencies often do not face some of the above problems of Bitcoin. The World Food Programme's blockchain-based system handles payments for food aid serving Syrian refugees in Jordan. The unit of account is centrally controlled by the World Food Programme. Using a permissioned version of the Ethereum protocol, the deficits of Ethereum (slow, expensive) were overcome and transaction costs were reduced by 98%, also relative to bank-based alternatives.
Scalability is another limitation since the transaction ledger grows over time. The Bitcoin ledger amounted in 2017 to 170 GB, with a growth of 50 GB in 2017 alone. A simple Fermi-type calculation therefore shows that the network size needed to replace standard currency regimes is out of any feasible range. This concerns not only the storage of data but also the processing capacity needed for transaction verification.
Figure 5.41 shows the market capitalization of Bitcoin, Ripple and Ethereum, the average transaction costs, that Bitcoin mining takes around the 10 minutes it should, and that mining in Ethereum takes much less time, reflecting the proof-of-stake approach. Comparing the number of daily Bitcoin transactions - around 100'000 by the end of 2015 (Coinometrics, Capgemini) - with the number of daily transactions by Visa (212 mio.), MasterCard (93 mio.) and all other traditional entities together, summing up to 340 mio., the Bitcoin share is 0.03% of this total transaction volume.
The algorithm of the Bitcoin protocol defines the supply side. Supply is therefore fixed and inelastic, which is one source of the high price volatility. Since every currency loses value if it fails to be a scarce resource, new Bitcoins are issued in a controlled way. Bitcoins do not represent a claim on anybody, contrary to digital money created by the granting of loans, since each loan creates a deposit position on the borrower's bank account. The demand for and supply of Bitcoins have no physical foundation, and the total supply is limited to the creation of 21 million Bitcoins. Given the rule-based creation process, this amount will be reached around 2140. With this fixed supply side and its diminishing rate of production, Bitcoin is a deflationary currency. Bitcoin miners are in some sense the clearing houses which maintain the book-keeping system and verify the validity of transactions.
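A short Python sketch of the rule-based creation process (using the well-known protocol constants: an initial block subsidy of 50 BTC, halved every 210'000 blocks, one block roughly every ten minutes) shows why the cumulative issuance converges to just below 21 million coins:

```python
# Bitcoin issuance: the block subsidy starts at 50 BTC and halves every 210,000 blocks.
BLOCKS_PER_HALVING = 210_000
SATOSHI = 1e-8                      # smallest currency unit
subsidy, total, halvings = 50.0, 0.0, 0
while subsidy >= SATOSHI:
    total += BLOCKS_PER_HALVING * subsidy
    subsidy /= 2                    # the halving
    halvings += 1
print(f"total supply ~ {total:,.0f} BTC after {halvings} halvings")
# Geometric series: 210,000 * 50 * (1 + 1/2 + 1/4 + ...) -> just below 21 million BTC;
# at ~10 minutes per block the subsidy runs out around the year 2140.
```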
Bitcoin does not have a well-defined governance structure as central banks have. The identity of any participant in the network is, for example, unverified. This contradicts the increasing regulatory and legal fight against money laundering and tax evasion. Prominent in the early days was the use of Bitcoin on the anonymous Silk Road platform, whose main activity was trading narcotics. The U.S. investigation estimated that in the period February 2011 to July 2013, 9.9 million Bitcoin payments were made with an equivalent of USD 214 million. After the demise of Silk Road, an unclear number of successors or competitors actively use Bitcoin. But the initially significant fraction of money inflow into the Bitcoin system from criminal activities - the residual value - has decreased significantly. Tasca (2016) reports that in 2012 black markets and online gambling had a share of around 70% in the Bitcoin income flow; this number collapsed in the following two years to less than 10%. Bitcoin transactions are, contrary to real or electronic payments, strictly irreversible. This property is due to the desire to keep the Bitcoin system at a manageable level. Changing the protocol, as we discussed above, follows a complicated game-theoretic process which can lead to forks and where different types of network members have different rights.
From the risk perspective, the counterparty risk of currency exchanges is critical. Exchanges active in Bitcoin charge transaction fees between 20 and 200 bps. The number of such exchanges is modest since an exchange needs an internet infrastructure which is able to withstand attacks. The rules for setting up an exchange are strict in the U.S. and also, for example, in the UK and Germany. Prominent is the default of the Mt. Gox exchange in Japan in 2014: it reported the loss of 750'000 customer Bitcoins, which amounts to USD 450 million. The counterparty risk of exchanges matters for clients since most convert their electronic currencies into Bitcoins and leave the Bitcoins at the exchange; the exchange acts as a bank. Moore and Christin (2013) estimate that 45 percent of currency exchanges terminated operations. While large exchanges often faced security problems, the reasons for the smaller ones are unknown. Therefore, if an exchange, which in fact acts as a bank holding the Bitcoin accounts of its customers, shuts down, counterparty risk is realized. The loss given default following Moore and Christin (2013) is 46% - only 54% of the closed exchanges reimbursed their customers.
A statement often heard in the financial industry summarizes the discussion. But this statement does not mean that a different coin, based on a more mature blockchain, cannot become an important crypto currency; see the section about Libra below.
Figure ?? gives an overview of different blockchain consensus mechanisms. The figure shows that there is no such thing as 'the' blockchain technology; there are many different types of technologies, each with pros and cons which have to be considered if a specific application is to be implemented.
We start with some market facts. In 2016 the most active miners were located in China, covering around 50% of the total market share (Tasca (2016)), followed by Europe with around 25%. This is also reflected in the traded currency pairs: the traded volume in CNY/BTC is about three times larger than in USD/BTC. This dominance of Chinese activity can also be observed in the number of active Bitcoin clients normalized
by the number of users with direct access to the internet: the number in China is around 5 times larger than the second largest numbers, those of the US and Russia. Bitcoin start-ups raised around USD 1 bn in the three years 2012-2015, with an annual growth rate of 150%. This rate dominates other start-up rates, such as in crowdfunding, lending or banking in general, by a factor of 2-3. If a mining pool gains 51 percent of the computing capacity, it can attack the network by rewriting in principle all blocks and generating a new blockchain. The pool gash.io possessed 45 percent of the mining power on January 9, 2014, and had to appeal to pool members to exit the pool. Summarizing, the mining industry is an oligopoly where the market share of the ten largest miners was between 70% and 80% by the end of 2015 (Tasca (2016)). This raises security concerns since consensus about block transaction verification becomes more risky the fewer miners it takes to form a 51% majority.
A stream of theoretical work focuses on a rational analysis of the system. It treats Bitcoin as a game between competing rational single miners, or pools of miners, which maximize a utility function capturing the incentive structure of the system. The goal is to prove under which conditions Bitcoin achieves a stable game-theoretic equilibrium. Overall, the results are rather pessimistic: unless one imposes strong conditions, attacks on the Bitcoin mining protocol follow, leading for example to forks of the blockchain. Eyal and Sirer (2014), for example, show that the Bitcoin protocol is not incentive-compatible. They show that an attack by colluding miners
leads to a revenue which exceeds their fair revenue share. They propose a modification of the protocol which protects against selfish mining pools. Sompolinsky and Zohar (2013) analyze the implications of high transaction throughput on Bitcoin's security against double-spend attacks. They show that if volume increases, attacks can strengthen to the point of reversing even accepted transactions. They propose a reorganization of the Bitcoin blockchain by new rules - implemented by the Ethereum project - which also affect the expected success outlook of a competing mining pool. Lewenberg et al. (2015) analyze the stability of mining pools. The authors examine the dynamics of pooled mining and how pools should share the rewards when they behave in a cooperative way. Using cooperative game theory, they show that for particular networks under high transaction loads the distribution of the rewards is unstable: some miners have an incentive to switch between pools. These findings are in contrast with the empirical observation that no fork or substantial slowdown attributable to rational attacks has been observed to date.
Given this difference between theory and observations, Badertscher et al. (2018) ask:
How come Bitcoin is not broken using such an attack? Or, stated differently, why does it work and why do majorities not collude to break it?
Why do honest miners keep mining given the plausibility of such attacks?
They use a rational-cryptography framework to capture the economic forces underlying the tension between honest miners and deviating miners, and explain how these forces affect the miners' behavior. They show how the expected revenues of the miners, in combination with a high monetary value of Bitcoin, can explain the fact that Bitcoin is not being attacked in reality even though majority coalitions are in fact possible. Hence, assumptions about the miners' incentives, which depend solely on the costs of and rewards for mining, can substitute for the honest-majority assumption.
5.4.5 Libra
Facebook (FB) published the details of the Libra blockchain in June 2019. Compared to most small Fintech initiatives, the Libra Association was populated by economic and technological giants: Mastercard, PayPal, Visa, Ebay, Uber, Lyft, Spotify, Vodafone and Coinbase, among others. Some of them, such as Mastercard, PayPal or Visa, are financial intermediaries, whereas Facebook is a social network and Vodafone a telecommunications firm. Hence, the Libra Association consisted of around 100 firms from different sectors. Note that during 2019 almost all payment firms withdrew due to the strong political pressure on Libra in the US, see below. The cryptocurrency coins should have low volatility (stable coin) relative to a stable fiat currency to avoid Bitcoin-like volatility. Therefore, Libra is linked to a broad basket of ordinary currencies and low-risk government bonds. The coins should be transferred via Facebook channels to the payment centers PayPal or Visa, traded at Coinbase, stored at Xapo and accepted at Ebay, Uber and Spotify. Summarizing, see Figure 5.43, Libra is a hybrid structure.
Figure 5.43: The structure of Libra. The figure shows the interconnected balance sheets in the economy, which are split into the traditional financial sector backing Libra, the traders linking the P2P economy of Libra with the blockchain (BC), and the financial sector. In the P2P part, end users can act in a P2P way with other end users or with the traders. Adapted from Müller (2019).
The linkage to the financial sector via the reserves stabilizes the currency, which is a main advantage over the otherwise highly volatile crypto currencies. But this link also generates some delicate issues. Libra is linked to the traditional payment system infrastructure, which is old and has to be renewed: it is too costly and too slow, in particular when cross-country payments are considered. Given this link, Libra can be seen as a New USD where the old infrastructure is only partly used but the currency is privately controlled and mined. In this sense Libra can be seen as a wake-up call for the traditional payment infrastructure: either it develops a new system in the near future or the private sector will simply install such a system.
Therefore, political actors are watching Facebook's currency project suspiciously. Some fear that Libra could put systemically important banks under pressure and severely restrict the monetary policy leeway of states. Above all, financial politicians in the USA are critical. Some accuse Libra of endangering national security, posing a risk to cyber security and torpedoing data protection. The fact that FB is the figurehead of Libra is proving to be a heavy burden in the political arena. Many politicians are currently worried about data protection: Facebook has repeatedly trampled on the protection of the data of billions of people in the past, which is why similar behavior can also be expected for Libra.
Critical voices can also be heard in Europe. Some predict that Facebook could become a shadow bank that circumvents regulation. The French Finance Minister called on central banks worldwide to investigate whether the new Facebook currency could be a gateway for money laundering and terrorist financing. FB announced that it has a technological solution up its sleeve for the anonymity problem of its blockchain platform and that identities could be verified. In addition to data protection and system stability, the governments in Europe and the USA also have concrete interests at stake. If Libra were to become a success and its reserve policy increasingly abandoned the traditional currencies, this could considerably reduce the money creation profits (seigniorage) of states. In Switzerland, the Swiss National Bank (SNB) currently distributes one billion francs a year to the Confederation and the cantons. Since the dollar is still the world's reserve currency, seigniorage is particularly important in the USA. US President Donald Trump recently stressed the international supremacy of the dollar and explained that there is only one right currency in the US.
While Libra is facing this opposition, the real problem for the monetary system may lie in China. China is investing heavily in new payment technologies, and while western governments focus on stopping Libra, a Chinese crypto currency can emerge which cannot be stopped in the way Libra can and which is then likely to challenge the USD as the main world currency. The Chinese initiative is based on the country's advanced technological status and its acceptance by the population. A further problem is that Libra can be seen as a derivative of the USD; but then several intricate regulatory questions arise.
Since regulators emphasize 'mass regulation before mass adoption', Libra faces a rough regulatory process: by Libra's own goals it will be of systemic importance and thus has to comply with the highest prudential standards. The central governance and FB's data privacy track record add to the concerns.
Who could use Libra? There are almost 2.5 billion Facebook users. This defines an enormous client potential if Libra were open to retail clients too. Many of these users live in places where there is little trust in the traditional financial and state institutions. If Libra is stable in value preservation, why should these people not use this system, even if the system receives a vast amount of data about the behavior and preferences of its customers?
The Libra code is open source and Facebook created its own blockchain using its own programming language Move. The blockchain is not owned by Facebook but by the association, which is based in Switzerland; hence Facebook relieves itself of its responsibility towards governments and regulators. Each member operates validator nodes (miners) and the fee to become a member is USD 10 mn. This amount times the 100 starting members defines the USD 1 bn backing of the coin. Facebook is just one member. With this structure Facebook cannot be accused of controlling a possible worldwide crypto-currency, and the decentralization makes it easier for Facebook to achieve its actual goal, namely that the currency is used on the Facebook channels. It is intended that Libra, which at the beginning should be a wholesale cryptocurrency, should become open to anybody.
As in a classic blockchain, miners attach transactions bundled into blocks to a read-only database as part of a consensus process. The blockchain uses the best features of other designs such as Ethereum, Ripple and IOTA, among others. The blockchain 'should scale to billions of accounts, require high transaction throughput, low latency and an efficient storage system for high capacity' (Source: Libra White Paper). It is a centralized enterprise, potentially a gigantic systemically relevant fund manager (100% backup) supporting government debt. Even if Libra could technically manage blockchain consensus for many miners, it has no self-interest in going beyond the 100 nodes, as this would dilute the RoE. As is also explicitly stated, Libra is just the starting point: the system should become the basis for future innovations in the financial sector. Note that the programming language Move allows the creation of digital assets and smart contracts in general. Transactions are functions only of the current state of the blockchain and not of historical states. This keeps the option open to prune old transaction data and to enable full nodes to verify transactions even if they do not have the full history, which would reduce the storage problem dramatically. The Ethereum database, which requires the full history, has to date reached between 1 and 2 terabytes in size.
The blockchain is transparent, and users can store their own keys and verify the blockchain. Libra is a permission-free, low-cost digital payment method. Libra is challenging payment service providers outside the association as well as the issuers of fiat money.
Chapter 6
Proofs
We prove Proposition 37:
Proof. To prove the proposition the standard no-arbitrage argument is used. Assume F(t, T) > S(t)·e^{r(T−t)}. We set up a portfolio W as follows. We borrow a money amount S(t) to buy the cheap stock S and go short the more expensive forward for the same amount. Then W(t) = 0. At T, we pay back the loan, sell the stock to fulfil the forward contract obligation and settle the short forward contract, which pays F(t, T) − S(T). The cash balance at T is
$$F(t, T) - S(t)\,e^{r(T-t)} > 0\;.$$
Using such a strategy we start with zero value and end at T with certainty with a positive value - this is an arbitrage, which allows for the construction of a money machine. A similar argument applies for the other inequality.
We prove Proposition 4:
Proof. The proof follows from the fact that the variance of the sum is equal to the sum of the variances since there is no covariance:
$$\sigma_p^2 = \mathrm{var}\Big(\frac{1}{N}\sum_{j=1}^N R_j\Big) = \frac{1}{N^2}\sum_{j=1}^N \mathrm{var}(R_j) \le \frac{Nc}{N^2} = \frac{c}{N}\;.$$
We prove Proposition 5:
Proof. The proof is only slightly more complicated than the former proof, and leads to the result:
$$\sigma_p^2 = \frac{\overline{\mathrm{var}}}{N} + \Big(1 - \frac{1}{N}\Big)\,\overline{\mathrm{cov}}\;.$$
By increasing the number N of assets, the contribution $\overline{\mathrm{var}}/N$ of the average variance can be made arbitrarily small - the portfolio variance is then determined by the average covariance $\overline{\mathrm{cov}}$, which approaches a non-zero value.
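This decomposition is easy to verify numerically. A minimal sketch for an equally weighted portfolio on a randomly generated covariance matrix (all inputs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (5, 50, 500):
    A = rng.normal(size=(N, N))
    C = A @ A.T / N                       # a random positive-definite covariance matrix
    w = np.full(N, 1.0 / N)               # equal weights 1/N
    direct = w @ C @ w                    # portfolio variance w'Cw
    avg_var = np.mean(np.diag(C))         # average variance
    avg_cov = C[~np.eye(N, dtype=bool)].mean()  # average covariance
    print(N, direct, avg_var / N + (1 - 1 / N) * avg_cov)  # identical numbers
```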
We prove Proposition 7:
Proof. Differentiate both sides of the equation f(tu) = tf(u) with respect to t, apply the chain rule, and choose t = 1. For the converse, let g(t) = f(tu). Since ⟨tu, ∇f(tu)⟩ = f(tu) we have
$$g'(t) = \langle u, \nabla f(tu)\rangle = \frac{1}{t}\,f(tu) = \frac{1}{t}\,g(t)\;.$$
Solving this differential equation for g implies g(t) = g(1)t, i.e. f(tu) = tf(u).
We prove the optimal dynamic investment decision rules of the Merton model 4.13:
Proof. We first split the integral into two parts for small dt:
$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + \int_{t_0+dt}^{T} u(t,c,W)\,dt + f(W(T),T)\right]\;,$$
$$dW_t = g(t,c,W)\,dt + \sigma(t,c,W)\,dB_t\;,\quad W(t_0) = w_0\;. \tag{6.1}$$
Using the Principle of Optimality, the control function in the second integral should be optimal for the problem beginning at t_0 + dt in the state W(t_0 + dt) = w_0 + dW. Hence,
$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + E_{t_0+dt,w_0+dW}\Big[\int_{t_0+dt}^{T} u(t,c,W)\,dt + f(W(T),T)\Big]\right]\;.$$
Optimality implies $E_{t_0+dt,w_0+dW}\big[\int_{t_0+dt}^{T} u(t,c,W)\,dt\big] = J(t_0+dt, w_0+dW)$, i.e.
$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + J(t_0+dt, w_0+dW)\right]\;. \tag{6.2}$$
We next approximate the second value function since dt is small. This also allows us to assume that the control c is constant over a time interval of length dt. We expand J(t_0 + dt, w_0 + dW) to second order. This looks like a second-order expansion in the state variable - but the square of Brownian motion, (dB)², is linear in time (see the part and appendix on continuous-time finance), i.e. (dW)² = (g(t,c,W)dt + σ(t,c,W)dB)² = σ²dt. The only random component in the above value function expression is therefore the term ∂_wJ dW. Since E[dB] = 0, we get the fundamental PDE
$$0 = \max_c\Big[u(t,c,W) + \partial_t J + g(t,c,W)\,\partial_w J + \tfrac{1}{2}\,\sigma^2(t,c,W)\,\partial_{ww} J\Big]\;. \tag{6.4}$$
Therefore:
1. Taking formally the derivative w.r.t. c in the above PDE gives us optimal decision making c as a function of the unknown value function J.
2. Reinsert this candidate into the fundamental PDE (6.4) and solve the resulting J-equation with the boundary and initial conditions (if any).
3. Use this explicit solution J to obtain the fully specified optimal policy c*_t and the optimally controlled state dynamics W*_t.
Inserting J(t, W) = e^{−rt}V(W) into the fundamental PDE leads, after cancelling the exponential function, to
$$0 = \max_{c,\omega}\Big[\frac{c^a}{a} - rV + \partial_w V\,g + \frac{1}{2}\,\partial_{ww}V\,\sigma^2\Big]\;. \tag{6.5}$$
The wealth dynamics W_t follow from the asset dynamics and the consumption rate. There is a risky asset with dynamics dS/S = µdt + σdB, where the drift and the volatility are constant, and a so-called riskless asset with dynamics dB = rB dt. The growth rate of wealth equals the weighted sum of the asset growth rates minus the consumption rate, i.e.
$$dW/W = \omega\,dS/S + (1-\omega)\,dB/B - (c/W)\,dt\;.$$
The weight ω equals the number of risky assets times their price S divided by total wealth. Inserting the asset dynamics into the wealth growth rate equation gives the final wealth dynamics, and the fundamental PDE becomes
$$0 = \max_{c,\omega}\Big[\frac{c^a}{a} - rV + \big(\omega\mu W + (1-\omega)rW - c\big)\,\partial_w V + \frac{1}{2}\,(\sigma\omega W)^2\,\partial_{ww}V\Big]\;. \tag{6.6}$$
Taking the derivatives w.r.t. the two choice variables and setting them to zero gives the candidate solutions (First Order Conditions):
$$c^* = (\partial_w V)^{\frac{1}{a-1}}\;,\qquad \omega^* = \frac{r-\mu}{\sigma^2 W}\,\frac{\partial_w V}{\partial_{ww}V}\;. \tag{6.7}$$
These candidate optimal choices possess a drawback - they depend on the yet unknown value function. One has to determine the value function V. To achieve this, we reinsert the optimal candidate functions into the fundamental PDE. This gives an equation for the unknown value function V:
$$rV = \frac{1-a}{a}\,(\partial_w V)^{\frac{a}{a-1}} + rW\,\partial_w V - \frac{(r-\mu)^2}{2\sigma^2}\,\frac{(\partial_w V)^2}{\partial_{ww}V}\;. \tag{6.8}$$
This is a highly non-linear equation, but the value function V(W) is proportional to the expected value of c^a. Therefore, a guess is V(W) = αW^a as a candidate solution with α a constant. Testing this guess in the PDE, we see that all terms are proportional to W^a: we can factor out this power function times a complicated function which does not depend on the state variable W. Since this product has to be zero for all W, the complicated function has to be zero, which gives us a value for the constant α, and we obtain in this way a solution for the unknown value function. To carry this out we insert the guess into (6.8):
$$0 = W^a\,\alpha\,\underbrace{\left(\frac{1-a}{a}\,\alpha^{\frac{1}{a-1}} - 1 + ra - \frac{(r-\mu)^2}{2\sigma^2}\,\frac{a}{a-1}\right)}_{=:F(\alpha)}\;,$$
and solving F(α) = 0 for α completes the proof.
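With the guess V(W) = αW^a one has ∂_wV/(W ∂_{ww}V) = 1/(a − 1), so the FOC (6.7) reduces to the constant Merton fraction ω* = (µ − r)/((1 − a)σ²), independent of wealth. A minimal numerical sketch (the parameter values are illustrative, not taken from the text):

```python
# Merton fraction from the FOC (6.7) with the power-utility guess V(W) = alpha * W**a:
# omega* = (r - mu)/(sigma**2 * W) * dV/dW / d2V/dW2 = (mu - r)/((1 - a) * sigma**2)
mu, r, sigma, a = 0.08, 0.02, 0.20, -1.0   # relative risk aversion 1 - a = 2
omega_star = (mu - r) / ((1 - a) * sigma**2)
print(f"optimal constant risky-asset weight: {omega_star:.2f}")  # 0.75 here
```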
We prove the Markowitz mean-variance proposition. With the Lagrangian
$$L = \frac{1}{2}\,\langle\phi, C\phi\rangle + \lambda_1\big(1 - \langle e, \phi\rangle\big) + \lambda_2\big(r - \langle\mu, \phi\rangle\big)\;,$$
the first order conditions
$$0 = \frac{\partial L}{\partial \phi} := \left(\frac{\partial L}{\partial \phi_1}, \frac{\partial L}{\partial \phi_2}, \ldots, \frac{\partial L}{\partial \phi_N}\right)' \tag{6.9}$$
read:
$$0 = C\phi - \lambda_1 e - \lambda_2 \mu \tag{6.10}$$
$$1 = \langle e, \phi\rangle \tag{6.11}$$
$$r = \langle \mu, \phi\rangle\;. \tag{6.12}$$
The first equation gives
$$\phi = \lambda_1 C^{-1} e + \lambda_2 C^{-1}\mu\;.$$
Multiplying this last equation from the left with e and µ, respectively, and using the normalization condition and the return constraint, we get a linear system for the two Lagrange multipliers:
$$1 = \lambda_1\,\langle e, C^{-1}e\rangle + \lambda_2\,\langle e, C^{-1}\mu\rangle\;,\qquad r = \lambda_1\,\langle\mu, C^{-1}e\rangle + \lambda_2\,\langle\mu, C^{-1}\mu\rangle\;, \tag{6.13}$$
i.e. with y := (1, r)' and τ := (λ_1, λ_2)',
$$y = \begin{pmatrix} \langle e, C^{-1}e\rangle & \langle e, C^{-1}\mu\rangle \\ \langle\mu, C^{-1}e\rangle & \langle\mu, C^{-1}\mu\rangle \end{pmatrix}\tau =: A\tau\;. \tag{6.14}$$
If A is invertible we are done, since then y = Aτ can be trivially solved for τ. This determines the Lagrange multipliers λ_i*, and inserting the result into φ* = λ_1*C^{-1}e + λ_2*C^{-1}µ gives the optimal portfolio and proves the proposition. We prove that within the given model the matrix A is invertible, i.e. we claim that det A = ∆ > 0. To prove this we use the Cauchy-Schwarz inequality, i.e. for two arbitrary vectors x, y we have
$$\langle x, y\rangle^2 \le \langle x, x\rangle\,\langle y, y\rangle\;,$$
where the strict inequality holds if the two vectors are independent. To rewrite the determinant in the form needed for the Cauchy-Schwarz inequality, we first have to define the vectors x, y. We use the decomposition C = UU', which always exists for strictly positive definite, symmetric matrices. Using this, we get ⟨e, C^{-1}e⟩ = ⟨x, x⟩ with x := U^{-1}e, where we used
$$\langle x, A'Ax\rangle = \langle Ax, Ax\rangle$$
and properties of the matrix inverse. Proceeding in the same form with the other elements of A and defining ⟨µ, C^{-1}e⟩ = ⟨y, x⟩ with y := U^{-1}µ, the Cauchy-Schwarz inequality implies ∆ = ⟨x, x⟩⟨y, y⟩ − ⟨x, y⟩² > 0. Solving (6.14) then yields
$$\lambda_1^* = (A^{-1}y)_1 = \frac{1}{\Delta}\big(-\langle\mu, C^{-1}\mu\rangle + r\,\langle e, C^{-1}\mu\rangle\big)\;, \tag{6.15}$$
$$\lambda_2^* = (A^{-1}y)_2 = \frac{1}{\Delta}\big(-\langle e, C^{-1}\mu\rangle + r\,\langle e, C^{-1}e\rangle\big)\;. \tag{6.16}$$
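A compact numerical sketch of this closed-form solution (illustrative covariance matrix, expected returns and return target; numpy only):

```python
import numpy as np

# phi* = l1 * C^-1 e + l2 * C^-1 mu, with (l1, l2) solving the 2x2 system (6.14).
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])       # asset covariance matrix
mu = np.array([0.05, 0.08, 0.12])        # expected returns
e, r_target = np.ones(3), 0.09           # budget vector and return constraint

Ci_e, Ci_mu = np.linalg.solve(C, e), np.linalg.solve(C, mu)
A = np.array([[e @ Ci_e,  e @ Ci_mu],
              [mu @ Ci_e, mu @ Ci_mu]])  # the matrix A of (6.14)
l1, l2 = np.linalg.solve(A, [1.0, r_target])
phi = l1 * Ci_e + l2 * Ci_mu
print(phi, phi @ e, phi @ mu)            # weights sum to 1 and hit the return target
```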
Proof. Let φ and ψ be two solutions of the Markowitz portfolio problem. Then they satisfy the linear FOC, and hence any convex combination aφ + (1 − a)ψ also satisfies the FOC. Using that the weights times the Lagrange multipliers add up to one, the weight a follows.
Proof. The VaR_α for the quantile α and a fixed time horizon solves implicitly the inequality
$$P(X \le -\mathrm{VaR}_\alpha) \le \alpha\;.$$
If X ∼ N(µ, σ²), the inequality reads
$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{-\mathrm{VaR}_\alpha} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \le \alpha\;,$$
or, substituting z = (x − µ)/σ,
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{-\mathrm{VaR}_\alpha - \mu}{\sigma}} e^{-\frac{1}{2}z^2}\,dz \le \alpha\;.$$
The upper limit of the integral depends on α, the mean and the variance. Setting the variance to unity and the mean to zero, for a given α the critical factor k_α, and with it the VaR, follows. For α = 0.01, i.e. a VaR at 99% confidence, numerically solving
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{k_\alpha} e^{-\frac{1}{2}z^2}\,dz \le 0.01$$
gives k_α = −2.33. Increasing the confidence level to 99.9 percent, i.e. α = 0.001, the critical value becomes k_α = −3.09. From
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{-\mathrm{VaR}_\alpha - \mu}{\sigma}} e^{-\frac{1}{2}z^2}\,dz \le \alpha$$
follows
$$\frac{-\mathrm{VaR}_\alpha - \mu}{\sigma} \le k_\alpha\;,$$
i.e. under normality the VaR_α satisfies
$$-\mathrm{VaR}_\alpha \le \sigma k_\alpha + \mu\;,$$
with the binding case
$$-\mathrm{VaR}_\alpha = \sigma k_\alpha + \mu\;.$$
This is the VaR for a fixed time horizon. Calculating, for example, the variance on an annual basis but the VaR on a weekly basis, the square-root rule implies (for µ = 0):
$$-\mathrm{VaR}_\alpha = \sigma k_\alpha\sqrt{T}\;.$$
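The critical factors are just normal quantiles; a minimal check with scipy (the sign convention follows the proof above, and the parameter values in the scaling example are illustrative):

```python
from scipy.stats import norm

for alpha in (0.01, 0.001):
    print(alpha, round(norm.ppf(alpha), 2))   # k_alpha: -2.33 and -3.09

# Square-root-of-time scaling: annual sigma = 16%, mu = 0, weekly horizon T = 1/52.
sigma, T = 0.16, 1 / 52
var_99 = -sigma * norm.ppf(0.01) * T**0.5
print(f"99% one-week VaR: {var_99:.2%} of portfolio value")
```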
Proof. We prove the SML relationship. Form a portfolio consisting of asset i and the market portfolio M, where we invest the fraction of wealth φ in i and 1 − φ in M. The expected rate of return of this portfolio is
$$\mu_\phi = \phi\,\mu_i + (1-\phi)\,\mu_M\;.$$
As a function of φ, the pair (σ_φ, µ_φ) traces out a curve in the risk-return space. The curve cannot cross the CML; this would violate the property that the CML is an efficient boundary of the feasible region. Hence, as φ passes through zero, the curve traced out by (σ_φ, µ_φ) must be tangent to the CML at M. In other words, the slopes of the CML and of the curve must be equal at the point M, which is where φ = 0. The tangency condition implies the SML
$$\mu_i - \mu_0 = \beta_i\,(\mu_M - \mu_0)\;. \tag{6.17}$$
To check this:
$$\frac{d\sigma_\phi}{d\phi}\Big|_{\phi=0} = \frac{\mathrm{cov}(i, M) - \sigma_M^2}{\sigma_M}\;.$$
Next,
$$\frac{d\mu_\phi}{d\sigma_\phi}\Big|_{\phi=0} = \frac{\frac{d\mu_\phi}{d\phi}\big|_{\phi=0}}{\frac{d\sigma_\phi}{d\phi}\big|_{\phi=0}} = \frac{(\mu_i - \mu_M)\,\sigma_M}{\mathrm{cov}(i, M) - \sigma_M^2}\;.$$
Since this slope at φ = 0 should equal the slope of the CML, we have
$$\frac{(\mu_i - \mu_M)\,\sigma_M}{\mathrm{cov}(i, M) - \sigma_M^2} = \frac{\mu_M - R_f}{\sigma_M}\;.$$
Solving for the expected return of asset i proves the claim.
Proof. The proof uses the Separating Hyperplane Theorem. A hyperplane can be written in the form
$$\langle x, a\rangle = d\;.$$
Subspaces of R^n are kernels of linear maps F; the Riesz-Fischer theorem implies F(x) = ⟨x, a⟩ = 0 for the kernel. Since each affine subspace is representable by F(x) = d, we define:
$$U = \Big\{(x, y)\in\mathbb{R}^2\ \Big|\ x > 0\,,\ y \ge \frac{1}{x}\Big\}\;,\qquad V = \Big\{(x, y)\in\mathbb{R}^2\ \Big|\ x > 0\,,\ y \le -\frac{1}{x}\Big\}\;.$$
The sets are disjoint and convex. But they are not compact and therefore they cannot be strictly separated.
Figure 6.1: The hyperplane separates the two convex sets A and B in R². A set is convex if any line with start and end point in the set remains fully in the set.
Let d(C, K) be the shortest distance between C and K. For C compact and K closed, minimizing points x_0 ∈ C, y_0 ∈ K exist. Let H_{x_0} be the hyperplane through x_0 which is perpendicular to the line y_0x_0. We write H_{x_0} as follows:
$$H_{x_0} = \{z\in\mathbb{R}^n\mid \langle y_0 - x_0, z - x_0\rangle = 0\}\;.$$
For x ∈ C define φ(λ) := |y_0 − x_0 − λ(x − x_0)|². This function is continuously differentiable and we have φ(λ) ≥ φ(0) for all λ ∈ [0, 1], since x_0 is closest to y_0. Therefore, φ'(λ) = −2⟨y_0 − x_0, x − x_0⟩ + 2λ⟨x − x_0, x − x_0⟩ and
$$\varphi'(0) = -2\,\langle y_0 - x_0, x - x_0\rangle \ge 0\;,$$
i.e.
$$\langle y_0 - x_0, x - x_0\rangle \le 0\;,\quad\forall x\in C\;,$$
since C is convex. In the same way one shows for H_{y_0} the inequality
$$\langle y_0 - x_0, y - y_0\rangle \ge 0\;,\quad\forall y\in K\;.$$
It follows that H_{x_0} separates the sets C and K, and the same is true for H_{y_0}. Therefore the hyperplane through the midpoint z_0 = (x_0 + y_0)/2 separates the sets strictly.
Proof. ⇒: Let ψ be a vector where all components are strictly positive. We claim that it is a state vector if each attainable payoff V = Pφ satisfies ⟨ψ, V⟩ = ⟨S_0, φ⟩ (we omit the time index T). V = Pφ implies
$$\langle\psi, P\phi\rangle = \langle P'\psi, \phi\rangle = \langle S_0, \phi\rangle\;.$$
Hence, for each attainable payoff V = Pφ the identity ⟨ψ, V⟩ = ⟨S_0, φ⟩ holds. Therefore, if all components of V are positive, ⟨S_0, φ⟩ ≥ 0 also holds, i.e. arbitrage is not possible.
⇐: We set
$$M = \big\{(P\phi, -\langle S_0, \phi\rangle)\ \big|\ \phi\in\mathbb{R}^N\big\}\subset\mathbb{R}^{K+1}\quad\text{and}\quad K = \Big\{x\in\mathbb{R}^{K+1}\ \Big|\ x_i \ge 0\,,\ \sum_i x_i = 1\Big\}\;.$$
M is an augmented space of payoffs: it consists of all payoffs at date T plus the price of the portfolio, −⟨S_0, φ⟩, at time zero. K is a simplex. M is a convex and closed set and K is compact. Since the compact set lies in the positive orthant, the definition of no
arbitrage implies that M and K are disjoint. The Separation Theorem then applies: there exists a vector z ∈ R^{K+1} such that ⟨z, x⟩ < b < ⟨z, y⟩ for all x ∈ M, y ∈ K. Since M is a linear space, these inequalities can only hold if the vector z ∈ M^⊥. But then b > 0. Since also ⟨z, y⟩ > b > 0 for y ∈ K, all components of the vector z are strictly positive. This allows us to define the state price density as ψ_k := z_k/z_{K+1}, and ψ solves S_0 = P'ψ. To prove this, recall that z ∈ M^⊥ and therefore for each strategy vector φ ∈ R^N:
$$0 = \langle z_{K+1}\,\psi, P\phi\rangle - z_{K+1}\,\langle S_0, \phi\rangle = z_{K+1}\big(\langle P'\psi, \phi\rangle - \langle S_0, \phi\rangle\big)\;.$$
Therefore, ⟨P'ψ, φ⟩ = ⟨S_0, φ⟩, i.e. P'ψ = S_0, holds for all strategies φ. This proves the claim.
Theorem 113 (Riesz). Let X be a Hilbert space and p : X → R a linear map. There exists a vector r* ∈ X, the Riesz kernel, such that
$$p(x) = \langle r^*, x\rangle$$
for all x ∈ X.
Proof. We recall some facts from linear algebra and projection geometry first. Let M and M' be subspaces of R^n such that M' is the complement of M. If M is a linear subspace of R^n, we define the orthogonal complement M^⊥:
$$M^\perp = \{x\in\mathbb{R}^n\mid \langle x, y\rangle = 0\;,\ \forall y\in M\}\;.$$
M' is a complement of M iff each vector x ∈ R^n can be written as the sum of two vectors z ∈ M and z' ∈ M', i.e. x = z + z'. In particular, for the orthogonal complement each x decomposes as x = y + y' with y ∈ M and y' ∈ M^⊥. The kernel and the image of a linear map f : R^n → R^m are defined as follows:
$$\ker f := \{x\in\mathbb{R}^n\mid f(x) = 0\}\subset\mathbb{R}^n\quad\text{and}\quad \mathrm{im}\,f := \{y\in\mathbb{R}^m\mid y = f(x)\,,\ x\in\mathbb{R}^n\}\subset\mathbb{R}^m\;.$$
The dimension formula
$$\dim\mathbb{R}^n = \dim\ker f + \dim\mathrm{im}\,f$$
holds. Applied to the linear functional l := p restricted to M this gives
$$\dim M = \dim\ker l + \dim\mathrm{im}\,l = \dim\ker l + 1 = \dim\ker l + \dim(\ker l)^\perp\;.$$
Since the kernel is a subspace, it follows that dim(ker l)^⊥ = 1. Let e ∈ M be a basis of (ker l)^⊥. We decompose a vector y ∈ M as
$$y = y' + \lambda e\;,\quad y'\in\ker l\;,\ \lambda\in\mathbb{R}\;.$$
Taking the inner product with e gives ⟨e, y⟩ = λ⟨e, e⟩ and therefore
$$\lambda = \frac{\langle e, y\rangle}{\langle e, e\rangle}\;.$$
For all y ∈ M we get l(y) = l(y') + λl(e) = λl(e), where we used the linearity of l and that y' ∈ ker l. But this implies
$$l(y) = \lambda\,l(e) = \frac{\langle e, y\rangle}{\langle e, e\rangle}\,l(e) = \langle\tilde e, y\rangle\;,\qquad \tilde e = \frac{l(e)\,e}{\langle e, e\rangle}\;.$$
This proves that each linear functional can be represented in the claimed form by a scalar product. Uniqueness follows by taking two different vectors ẽ and ẽ' and showing that they indeed have to agree.
Proof. To do.
Proof. To do.
Proof. We prove the direction 'SDF implies the expected return representation'. Take an SDF M = a + b'f and consider an asset i with return R_i. The general covariance formula applied to 1 = E[MR] implies
$$E[R_i] = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\,\mathrm{cov}(M, R_i) = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\,b'\,\mathrm{cov}(f, R_i)\;.$$
With the factor regression
$$R_i = \alpha_i + \beta_i' f + \epsilon_i\;,$$
the vector of betas is given by β_i = C_f^{-1} cov(f, R_i), with C_f the factor covariance matrix. Substituting this expression into the above expected return formula for the asset return we get
$$E[R_i] = \kappa + \Lambda'\beta_i\;,$$
where
$$\kappa = \frac{1}{E[M]} - 1\;,\qquad \Lambda = -\frac{1}{E[M]}\,C_f\,b\;.$$
This proves the claim.
To prove the other direction, we assume that E[R_i] = κ + Λ'β_i holds for some scalar κ and some vector Λ for each asset i. We search for a, b such that M = a + b'f is an SDF. Since E[M] = a + b'µ_f, it suffices to have κ = 1/E[M] − 1 and b = −E[M]C_f^{-1}Λ. Choosing
$$b = -\frac{1}{1+\kappa}\,C_f^{-1}\Lambda\;,\qquad a = \frac{1}{1+\kappa}\,\big(1 + \mu_f' C_f^{-1}\Lambda\big)\;,$$
the random variable M = a + b'f is such that for each asset i the equation E[R_i] = κ + Λ'β_i holds. Therefore,
$$E[R_i] = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\,b'\,\mathrm{cov}(f, R_i)$$
holds too, and M is an SDF.
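A small Monte Carlo sketch of this equivalence (all parameters are made up for illustration): returns are constructed to satisfy E[R_i] = κ + Λ'β_i, the SDF coefficients a, b are chosen as in the proof, and the pricing condition 1 = E[M(1+R_i)] is then checked by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_f = np.array([0.05, 0.02])                  # factor means
C_f = np.array([[0.04, 0.01], [0.01, 0.02]])   # factor covariance matrix
kappa, Lam = 0.01, np.array([0.4, 0.2])        # beta-pricing parameters

Cf_inv_Lam = np.linalg.solve(C_f, Lam)
b = -Cf_inv_Lam / (1 + kappa)                  # b as in the proof
a = (1 + mu_f @ Cf_inv_Lam) / (1 + kappa)      # a as in the proof

T = 1_000_000
f = rng.multivariate_normal(mu_f, C_f, size=T)
M = a + f @ b                                  # the SDF M = a + b'f

beta = np.array([1.2, -0.3])                   # arbitrary factor loadings of asset i
R = kappa + Lam @ beta + (f - mu_f) @ beta + 0.01 * rng.standard_normal(T)
print(np.mean(M * (1 + R)))                    # ~1: the SDF prices the asset
```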
The proof is taken from Wikipedia. We prove the bias-variance equation ??: rearranging
$$\mathrm{Var}[X] = E[X^2] - \big(E[X]\big)^2$$
gives
$$E[X^2] = \mathrm{Var}[X] + \big(E[X]\big)^2\;.$$
Proof. The following bounds are used over and over in statistical learning theory. For independent random variables X_i with values in [a_i, b_i] and S_n = Σ_i X_i, Hoeffding's inequality states
$$P\big(|S_n - E(S_n)| \ge \epsilon\big) \le 2\,e^{-2\epsilon^2/W_n^2} \tag{6.20}$$
with W_n² = Σ_i (b_i − a_i)². The proof uses a technical lemma and the Chernoff bounding method.
Lemma 115. Let X be a random variable with expected value zero and taking values in the interval [a, b]. For s > 0,
$$E[e^{sX}] \le e^{s^2(b-a)^2/8}\;.$$
Proof of the lemma: by convexity of the exponential function,
$$e^{sx} \le \frac{x-a}{b-a}\,e^{sb} + \frac{b-x}{b-a}\,e^{sa}\;.$$
Taking expectations and using E[X] = 0, with p := −a/(b−a) and u := s(b−a),
$$E[e^{sX}] \le -\frac{a}{b-a}\,e^{sb} + \frac{b}{b-a}\,e^{sa} = \big(1-p+p\,e^{s(b-a)}\big)\,e^{-sp(b-a)} =: e^{g(u)}\;,\qquad g(u) = -pu + \ln\big(1-p+p\,e^u\big)\;.$$
The function g satisfies g(0) = g'(0) = 0 and, taking the derivative twice, g''(u) ≤ 1/4. Taylor's theorem up to second order around zero implies for some c ∈ [0, u] (the first two terms in the series are zero):
$$g(u) = \frac{1}{2}\,u^2 g''(c) \le \frac{u^2}{8} = \frac{s^2(b-a)^2}{8}\;.$$
Let X be a non-negative random variable and ε > 0. The inequality of Markov states
$$P[X \ge \epsilon] \le \frac{E[X]}{\epsilon}\;.$$
Hence for s > 0:
$$P[X \ge \epsilon] = P[e^{sX} \ge e^{s\epsilon}] \le \frac{E[e^{sX}]}{e^{s\epsilon}}\;.$$
The Chernoff method means finding a positive s such that an upper bound on a random expression is minimized:
$$
\begin{aligned}
P(S_n - E[S_n] \ge \epsilon) &\le e^{-s\epsilon}\,E\big[e^{s\sum_i (X_i - E[X_i])}\big] \\
&= e^{-s\epsilon}\,\prod_i E\big[e^{s(X_i - E[X_i])}\big] \\
&\le e^{-s\epsilon}\,\prod_i e^{s^2 (b_i - a_i)^2/8} \\
&= e^{-s\epsilon}\,e^{s^2 \sum_i (b_i - a_i)^2/8} = e^{-2\epsilon^2/W_n^2}\;,
\end{aligned}
$$
by using first the Markov inequality, then the independence of the random variables, then the technical lemma, and finally by choosing s = 4ε/W_n² appropriately. This concludes the proof for S_n − E[S_n]. The same bounds hold for E[S_n] − S_n and hence the proof of the theorem follows.
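A quick Monte Carlo illustration of the bound (i.i.d. uniform variables on [0, 1], so a_i = 0, b_i = 1 and W_n² = n; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps, trials = 100, 10.0, 100_000
S = rng.uniform(size=(trials, n)).sum(axis=1)     # trials realizations of S_n
empirical = np.mean(np.abs(S - n / 2) >= eps)     # P(|S_n - E S_n| >= eps)
bound = 2 * np.exp(-2 * eps**2 / n)               # Hoeffding with W_n^2 = n
print(f"empirical {empirical:.5f} <= Hoeffding bound {bound:.5f}")
```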
For the symmetrization step, let R'_emp(f) denote the empirical risk of f on an independent second sample ('ghost sample') with measure P'. Then
$$\chi_{R_{emp}(f) - R(f) \ge \epsilon}\;P'\big(R'_{emp}(f) - R(f) \le \epsilon/2\big) \le P'\big(R_{emp}(f) - R'_{emp}(f) > \epsilon/2\big)\;.$$
By Chebyshev's inequality,
$$P'\big(R'_{emp}(f) - R(f) > \epsilon/2\big) \le \frac{4\,\mathrm{var}(f)}{n\epsilon^2} \le \frac{1}{n\epsilon^2}\;,$$
since random variables with values in the unit interval have a variance of less than 1/4. Putting things together we have:
$$\chi_{R_{emp}(f) - R(f) \ge \epsilon}\,\Big(1 - \frac{1}{n\epsilon^2}\Big) \le P'\big(R_{emp}(f) - R'_{emp}(f) > \epsilon/2\big)\;.$$
Taking expectations w.r.t. the first sample proves the result.
Therefore, ⟨θ*, θ^(k)⟩ grows at least linearly in the number of updates k, and ||θ^(k)||² increases at most linearly in k. We consider the cosine
$$\cos\big(\theta^*, \theta^{(k)}\big) = \frac{\langle\theta^*, \theta^{(k)}\rangle}{\|\theta^{(k)}\|\,\|\theta^*\|} \ge \frac{k\gamma}{\sqrt{k\,r^2}\,\|\theta^*\|}\;.$$
Combining the two bounds shows that the cosine of the angle between θ^(k) and θ* has to increase by a finite increment with each update. Since the cosine is bounded, we can only make a finite number of updates.
Chapter 7
Appendix
AM Firm / Fund Description   AuM (USD bn)   52w   2y   3y   5y (returns in %)
Vanguard S&P 500 ETF 224 20.2 14 10.4 15.7
Vanguard 500 Inx 182 19.6 14.1 10.3 15.6
Vanguard TSM Idx, Adm 138 20.2 14 10.4 15.7
iShares:Core Instl Idx, Inst 133 20.2 14 10.4 15.7
Vanguard S&P 500 123 19.5 14 10.1 15.48
Vanguard TSM Idx;Inv 121 19.6 14.1 na na
Vanguard TSM Idx;Inst+ 116 28.1 12.6 6.7 8.1
Vanguard Tot I Stk, Ins 108 19.6 14.1 10.3 15.6
Vanguard TSM Idx, Inst 92 20.2 14 10.4 15.7
Fidelity Instl Indx, InsP 89 30.8 15.5 13.3 16.8
Vanguard Contrafund 88 28.3 12.7 6.8 8.2
Vanguard Tot I Stk, Ins 86 19.6 14.1 10.3 15.6
Vanguard TSM, Idx, ETF 85 14.3 10.5 7.7 10.8
Vanguard Wellington;Adm 85 3.5 2.75 2.3 1.9
American Tot Bd II, INV 84 24.1 15 12.1 16.4
iShares:MSCI Funds Gro, A 81 27.4 9.8 6.1 8.7
Vanguard EAFE ETF 81 3.7 2.9 2.4 2.0
Vanguard Tot BD, Adm 78 20.2 14 10.4 15.7
American 500 Index, ETF 77 12.7 9.8 6.2 9.6
Fidelity Funds Inc, A 72 20.1 14 10.4 15.7
American 500 Idx, Pr 71 15.08 8.5 4.9 7.9
Dodge Funds CIB, A 68 14.9 15.1 9.6 16.4
Vanguard Cox Stock 65 27.5 11.2 7.3 9.3
Dodge FTSE ETF 65 26.5 11.8 4.51 10
Largest Custodians
Rank Provider Assets under custody USD bn Reference date
1 BNY Mellon 28,300 Sep 30, 2014
2 J.P. Morgan 21,000 Mar 31, 2014
3 State Street 20,996 Mar 31, 2014
4 Citi 14,700 Mar 31, 2014
5 BNP Paribas 9,447 Jun 30, 2014
6 HSBC Securities Services 6,210 Dec 31, 2013
7 Northern Trust 5,910 Sep 30, 2014
8 Societe Generale 4,915 Sep 30, 2014
9 Brown Brothers Harriman 3,800 Mar 31, 2014
10 UBS AG 3,438 Sep 30, 2014
11 SIX Securities Services 3,247 Dec 31, 2013
12 CACEIS 3,200 Dec 31, 2013
References
1. D. Acemoglu, A. Malekian and A. Ozdaglar, Network Security and Contagion, Journal of
Economic Theory 166, 536-585, 2016.
2. A. Acquisti, C. Taylor, and L. Wagman, The Economics of Privacy, Journal of Economic Literature 54(2), 442-492, 2016.
3. Accenture, Digital Business Era: Stretch Your Boundaries, Accenture Technology Vision
2015, 2015.
4. C. Ackermann, R. McEnally and D. Ravenscraft, The Performance of Hedge Funds: Risk,
Return, and Incentives. Journal of Finance, 833-874, 1999.
5. V. Agarwal, N.D. Daniel and N.Y. Naik, Role of Managerial Incentives and Discretion in
Hedge Fund Performance. The Journal of Finance, 64(5), 2221-2256, 2009.
6. A. Agrawal, J. Horton, N. Lacetera and E. Lyons, Digitization and the Contract Labor
Market: A Research Agenda, in A. Goldfarb, S. Greenstein and C. Tucker, Economics of
Digitization: An Agenda. National Bureau of Economic Research, 2013.
7. H. Albrecher, P. Embrechts, D. Filipović, G. W. Harrison, P. Koch, S. Loisel, P. Vanini and J. Wagner, Old-Age Provision: Past, Present, Future. European Actuarial Journal, 6(2), 287-306, 2016.
8. G.S. Amin and H.M. Kat, Hedge Fund Performance 1990 - 2000: Do the 'Money Machines' Really Add Value?, Journal of Financial and Quantitative Analysis, 38(02), 251-274, 2003.
9. M. Andersson, P. Bolton and F. Samama, Hedging Climate Risk, Financial Analysts Jour-
nal, 72(3), pp. 13-32, 2016.
10. R.M. Anderson, S.W. Bianchi and L.R. Goldberg, Determinants of Levered Portfolio Performance, Forthcoming Financial Analysts Journal, UC Berkeley, 2014.
11. R. Anderson and T. Moore, The Economics of Information Security, Science 314, 610-613,
2006.
12. A. Ang, Mean-Variance Investing, Lecture Notes Columbia University, ssrn.com, 2012.
13. A. Ang, Asset Management. A Systematic Approach to Factor Investing, Oxford Univer-
sity Press, 2014.
14. A. Ang, W. Goetzmann, and S. Schaefer, Evaluation of Active Management of the Nor-
wegian GPFG, Norway: Ministry of Finance, 2009. (the Professor's Report)
15. A. Ang, S. Gorovyy and G.B. Van Inwegen, Hedge Fund leverage. Journal of Financial
Economics, 102(1), 102-126, 2011.
16. A. Ang, D. Basu, M. D.Gates and V. Karir, Model Portfolios, ssrn.com, 2018.
17. A. M. Antonopoulos, Mastering Bitcoin, O'Reilly Books, New York, 2015.
18. F. Allen and D. Gale, Financial Markets, Intermediaries and Intertemporal Smoothing, J.
Pol. Econom., 105, 523-546, 1997.
19. P. Artzner, F. Delbaen, J.-M. Eber and D. Heath, Coherent Measures of Risk, Mathematical Finance, 9(3), 203-228, 1999.
20. T. Aste, Blockchain, University College London, Center for Blockchain Technologies,
preprint ssrn.com, 2016.
21. C. S. Asness, Hedge Funds: The (Somewhat Tepid) Defense, AQR, October 24, 2014.
22. C.S. Asness, How Can a Strategy Still Work if Everyone Knows About it? International
Invest Magazine, September, 2015.
23. C.S. Asness and J. Liew, The Great Divide of Market Efficiency, Institutional Investor, March 03, 2014.
24. C.S. Asness, A. Frazzini, R. Israel and T. Moskowitz, Fact, Fiction, and Value Investing, Forthcoming, Journal of Portfolio Management, Fall 2015, 2015.
25. V. Agarwal, N. D. Daniel, and N. Y. Naik, Do Hedge Funds Manage Their Reported
Returns?, Review of Financial Studies, forthcoming, 2011.
26. V. Agarwal and N.Y. Naik, Multi-Period Performance Persistence Analysis of Hedge Funds, Journal of Financial and Quantitative Analysis, 35(03), 327-342, 2000.
27. F. Allen, J. Barth and G. Yago, Fixing the Housing Market: Financial Innovations for
the Future, Wharton School Publishing-Milken Institute Series on Financial Innovations,
Upper Saddle River, NJ: Pearson Education, 2012.
28. F. Allen and G. Yago, Financing the Futures. Market-Based Innovations for Growth.
Wharton School of Publishing and Milken Institute, 2012.
29. G.O. Aragon and J.S. Martin, A Unique View of Hedge Fund Derivatives Usage: Safeguard
or Speculation? Journal of Financial Economics, 105(2), 436-456, 2012.
30. Assenagon Asset Management, 1. Assenagon Derivatetag am See, 2013.
31. M. Avellaneda and D. Dobi, Structural Slippage of Leveraged ETFs, ssrn.com, 2012.
32. D. Avramov, R. Kosowski, N.Y. Naik and M. Teo, Hedge Funds, Managerial Skill, and
Macroeconomic Variables. Journal of Financial Economics, 99(3), 672-692, 2011.
33. Ph. Bacchetta, C. Tille and E. van Wincoop, Self-Fulfilling Risk Panics, American Economic Review 102, 3674-3700, 2013.
34. K. E. Back, Asset Pricing and Portfolio Choice Theory, Oxford University Press, 2010.
35. Bank of England, The Economics of Digital Currencies, Quarterly Bulletin, Q3, 2014.
36. D. H. Bailey, J. M. Borwein, M. L. de Prado and Q. J. Zhu, Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance, Notices of the American Mathematical Society, 61(5), 458-471, 2014.
58. A. Börsch-Supan, K. H. Alcser, Health, Aging and Retirement in Europe: First Results
from the Survey of Health, Ageing and Retirement in Europe. Mannheim: Mannheim
Research Institute for the Economics of Aging (MEA), 2005.
59. A. Börsch-Supan, A. Ludwig, and J. Winter, Ageing, Pension Reform and Capital Flows:
A Multi-Country Simulation Model, Economica 73.292, 625-658, 2006.
60. A. Börsch-Supan, M. Brandt, C. Hunkler, T. Kneip, J. Korbmacher, F. Malter and S. Zuber, Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE), International Journal of Epidemiology, dyt088, 2013.
61. C. Badertscher, J. Garay, U. Maurer, D. Tschudi and V. Zikas, But why does it Work?
A Rational Protocol Design Treatment of Bitcoin, In Annual International Conference on
the Theory and Applications of Cryptographic Techniques, Springer, Cham 34-65, 2018.
62. T. Bourgeron, E. Lezmi and T. Roncalli, Robust Asset Allocation for Robo-Advisors,
arXiv, arxiv.org/abs/1902.07449, 2018.
63. M.W. Brandt, Portfolio Choice Problems, in Y. Ait-Sahalia and L.P. Hansen (eds.), Handbook of Financial Econometrics, Volume 1: Tools and Techniques, North Holland, 269-336, 2010.
64. M. Brenner and Y. Izhakian, Asset Prices and Ambiguity: Empirical Evidence, Stern School of Business, Finance Working Paper Series, FIN-11-10, 2011.
65. R. Briand, F. Nielsen and D. Stefek, Portfolio of Risk Premia: A New Approach to Diversification, MSCI Barra Research Insights, 2009.
66. S. Browne, Reaching Goals by a Deadline: Digital Options and Continuous-Time Active
Portfolio Management, Adv. Appl. Prob. 31, 551-557, 1999.
67. S. J. Byun and B.H. Jeon, Momentum Crashes and the 52-Week High, 2018.
68. R. G. Brown, J. Carlyle, I. Grigg and M. Hearn, Corda: An Introduction, squarespace.com,
2016.
69. S.J. Brown, W. Goetzmann, R.G. Ibbotson and S.A. Ross, Survivorship Bias in Perfor-
mance Studies, Review of Financial Studies, 5(4), 553-580, 1992.
70. S.J. Brown, W. Goetzmann and R.G. Ibbotson, Offshore Hedge Funds: Survival and Performance, 1989-95, Journal of Business, 72(1), 1999.
71. S.J. Brown, W. Goetzmann and J.M. Park, Conditions for Survival: Changing Risk and
the Performance of Hedge Fund Managers and CTAs, ssrn.com, 1999.
72. B. Bruder, N. Gaussel, J.-C. Richard and T. Roncalli, Regularization of Portfolio Alloca-
tion, Lyxor White Paper Series, 10, 2013.
73. J. Bruna, Mathematics of Deep Learning, Courant Institute of Mathematical Science,
NYU, 2018.
74. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 121-167, 1998.
75. A. Corbellini, Elliptic Curve Cryptography: A Gentle Introduction, webpage of A. Cor-
bellini, 2015.
76. R.J. Caballero, Macroeconomics after the Crisis: Time to Deal with the Pretense-of-
Knowledge Syndrome, Journal of Economic Perspectives, Volume 24, Number 4, Fall, 85
- 102, 2010.
77. R.J. Caballero and A. Krishnamurthy, Collective Risk Management in a Flight to Quality Episode, The Journal of Finance, 63(5), 2195-2230, 2008.
78. C. Camerer, G. Loewenstein, and D. Prelec, Neuroeconomics: How Neuroscience can Inform Economics, Journal of Economic Literature, 9-64, 2005.
79. J.Y. Campbell and L. M. Viceira, Strategic Asset Allocation: Portfolio Choice for Long-Term Investors, books.google.com, 2002.
80. C. Cao, Y. Chen, B. Liang and A.W. Lo, Can Hedge Funds Time Market Liquidity?,
Journal of Financial Economics, 109(2), 493-516, 2013.
81. M.M. Carhart, On Persistence in Mutual Fund Performance, The Journal of Finance, 52(1), 57-82, 1997.
82. Z. Cazalet and T. Roncalli, Style Analysis and Mutual Fund Performance Measurement
Revisited, Lyxor Research Paper, 2014.
83. Y. Chen, Timing Ability in the Focus Market of Hedge Funds, Journal of Investment
Management, 5(2), 66, 2007.
84. Y. Chen, Derivatives Use and Risk Taking: Evidence from the Hedge Fund industry,
Journal of Financial and Quantitative Analysis, 46(04), 1073-1106, 2011.
85. CEM Benchmarking, CEM Toronto, 2014.
86. N. Chatsanga and A.J. Parkes, International portfolio optimisation with integrated cur-
rency overlay costs and constraints. Expert Systems with Applications, 83, 333-349, 2017.
87. P. Cheridito and E. Kromer, Reward-Risk Ratios, Journal of Investment Strategies 3(1), 1-16, 2013.
88. T. Chordia, A. Goyal and A. Saretto, p-hacking: Evidence from Two Million Trading
Strategies, University of Lausanne, preprint, 2017.
89. Y. Choueifaty and Y. Coignard, Toward Maximum Diversification, Journal of Portfolio Management, 35(1), 40, 2008.
90. M.M. Christensen, On the History of the Growth Optimal Portfolio, University Southern
Denmark, Preprint, 2005.
91. J. Cochrane, Asset Pricing, Princeton University Press, 2005.
92. J. Cochrane, The Dog That Did Not Bark: A Defense of Return Predictability, Review of Financial Studies 21(4), 1533 - 1575, 2008.
93. J. Cochrane, Discount Rates, Presidential Address AFA 2010, Journal of Finance, Vol
LXVI, 4, August, 2011.
94. P. Cocoma, M. Czasonis, M. Kritzman and D. Turkington, Facts about Factors. The
Journal of Portfolio Management, 43(5), 55-65, 2017.
95. N. Cuche-Curti, O. Sigrist and F. Boucard, Blockchain: An Introduction, Research and
Policy Notes, Swiss National Bank, 2016.
96. J. Cui, F. De Jong and E. Ponds, Intergenerational Risk Sharing within Funded Pension
Schemes. Journal of Pension Economics and Finance 10.01, 1-29, 2011.
97. C. Culp and J. Cochrane, Equilibrium Asset Pricing and Discount Factors: Overview and
Implications for Derivatives Valuation and Risk Management, Modern Risk Management:
A History. Peter Field, ed. London: Risk Books, 2003.
98. T. Dangl, O. Randl and J. Zechner, Risk Control in Asset Management: Motives and
Concepts, K. Glau et al. (eds), Innovation in Quantitative Risk Management, Springer
Proceedings in Mathematics and Statistics 99, 239-266, 2015.
99. V. DeMiguel, L. Garlappi and R. Uppal, Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?, Review of Financial Studies, 22(5), 1915-1953, 2009.
100. V. DeMiguel, Y. Plyakha, R. Uppal, G. Vilkov, Improving Portfolio Selection using Option-
Implied Volatility and Skewness, Forthcoming in Journal of Financial and Quantitative
Analysis, 2010.
101. G. De Nard, O. Ledoit, and M. Wolf, Factor Models for Portfolio Selection in Large
Dimensions: The Good, the Better and the Ugly, Working Paper No. 290, 2018.
102. M. L. de Prado, Building Diversified Portfolios that Outperform Out-of-Sample, ssrn.com, May, 2016.
103. L. Deville, Exchange Traded Funds: History, Trading, and Research, Handbook of Finan-
cial Engineering, Zopounidis, Doumpos and Pardalos (eds)., 67-99, 2007.
104. K. Daniel and T. Moskowitz, Momentum Crashes, The Q-Group: Fall Seminar, 2012.
105. K. Daniel and S. Titman, Evidence on the Characteristic of Cross Sectional Variation in
Stock Returns, Journal of Finance 55 (1), 380-406, 1997.
106. Deutsche Bank, Equity Risk Premia, Deutsche Bank London, February, 2015.
107. Deutsche Bank, A New Asset Allocation Paradigm, Deutsche Bank London, July, 2012.
108. F.X. Diebold, A. Hickman, A. Inoue, and T. Schuermann, Converting 1-Day Volatility to
h-Day Volatility: Scaling by Root-h is Worse than You Think, Risk, 11, 104-107, 1998.
109. D. Dobi and M. Avellaneda, Structural Slippage of Leveraged ETFs, Preprint NYU, 2012.
110. J. Dow and S. R. d. C. Werlang, Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio, Econometrica, Vol. 60, No. 1, 197 - 204, 1992.
111. M. Dudler, B. Gmür and S. Malamud, Risk-Adjusted Time Series Momentum, Working
Paper, 2014.
112. S. Duivestein, M. van Doorn, T. van manen, J. Bloem and E. van Ommeren, Design to
Disrupt, Blockchain: Cryptoplatform for a Frictionless Economy, SogetiLabs, 2016.
113. E. Van Duuren, A. Plantinga and B. Scholtens, ESG integration and the investment man-
agement process: Fundamental investing reinvented. Journal of Business Ethics, 138(3),
525-533, 2016.
114. F.R. Edwards and M.O. Caglayan, Hedge Fund Performance and Manager Skill, ssrn.com, 2011.
115. EFAMA, European Fund and Asset Management Association, Annual Figure 2013, 2014.
116. EFAMA, European Fund and Asset Management Association, Annual Figure 2017, 2018.
117. D. Ellsberg, Risk, Ambiguity, and the Savage Axioms, Quarterly Journal of Economics,
75, 643-669, 1961.
118. E.J. Elton and M. J. Gruber, Risk Reduction and Portfolio Size: An Analytical Solution,
Journal of Business: 415-437, 1977.
119. Ernst & Young, What's new? Innovation for Asset Management, 2012 Survey, 2012.
120. Ethereum, www.ethereum.org, 2016.
121. ETF Staff, A Short Course in Currency Overlay, etf.com, April, 1999.
122. I. Eyal and E. G. Sirer, Majority is not Enough: Bitcoin Mining is Vulnerable, International
Conference on Financial Cryptography and Data Security. Springer Berlin Heidelberg,
2014.
123. F. Fabozzi, R. J. Shiller, and R. Tunaru, Hedging Real-Estate Risk, working paper 09-12,
Yale International Center for Finance, 2009.
124. M. Faber, A Quantitative Approach to Tactical Asset Allocation. Journal of Wealth
Management 9 (4), 69 - 79, 2007.
125. E.F. Fama, The Behavior of Stock Market Prices, Journal of Business, 38, 34-101, 1965.
126. E.F. Fama, Efficient Capital Markets: A Review of Theory and Empirical Work, Journal of Finance 25, 383 - 417, 1970.
127. E.F. Fama, Efficient Markets: II, Journal of Finance, 46(5), 1575-1618, 1991.
128. E. F. Fama and J. D. MacBeth, Risk, Return, and Equilibrium: Empirical Tests, Journal
of political economy, 81(3), 607-636, 1973.
129. E.F. Fama and K. R. French, Permanent and Temporary Components of Stock Prices, Journal of Political Economy 96(2), 246 - 267, 1988.
130. E.F. Fama and K.R. French, Disagreement, Tastes, and Asset Prices, Journal of Financial
Economics 83 (3), 667-89, 2007.
131. E.F. Fama and K.R. French, A Five-Factor Asset Pricing Model, Journal of Financial
Economics, 116, 1-22, 2015.
132. B. Fastrich, S. Paterlini and P. Winker, Constructing Optimal Sparse Portfolios Using
Regularization Methods, ssrn.com, 2013.
133. J. D. Fisher, D.M. Geltner, and R.B. Webb, Value Indices of Commercial Real Estate: A Comparison of Index Construction Methods, The Journal of Real Estate Finance and Economics, 9(2), 137-164, 1994.
134. T. Fletcher, Machine Learning for Financial Market Prediction, PhD Thesis University
College London, 2012.
135. A. Frazzini and L. H. Pedersen, Betting Against Beta, Journal of Financial Economics
111.1, 1-25, 2014.
136. G. Frahm and C. Memmel, Dominating estimators for minimum-variance portfolios. Jour-
nal of Econometrics, 159(2), 289-302, 2010.
137. P. Franco, Understanding Bitcoin: Cryptography, Engineering and Economics, John Wiley & Sons, 2014.
138. J. Freire, Massive Data Analysis: Course Overview, NYU School of Engineering, 2015.
139. C.B. Frey and M.A. Osborne, The Future of Employment: How Susceptible are Jobs to Computerisation?, Oxford, September, 2013.
140. W. Fung, D.A. Hsieh, N.Y. Naik and R. Ramadorai, Hedge Funds: Performance, Risk,
and Capital Formation, The Journal of Finance, 63(4), 1777-1803, 2008.
141. W. Fung and D.A. Hsieh, Empirical Characteristics of Dynamic Trading Strategies: The Case of Hedge Funds, Review of Financial Studies, 10(2), 275-302, 1997.
142. W. Gale and R. Levine, Financial Literacy: What Works? How Could it be More Effective?, Financial Security Project, Boston College, 2011.
143. J. Gatheral, Random Matrix Theory and Covariance Estimation, New York, October 3,
2008.
144. M. Gao and J. Huang, Capitalizing on Capitol Hill: Informed Trading by Hedge Fund
Managers, In Fifth Singapore International Conference on Finance, 2011.
145. D.M. Geltner, N. G. Miller, J. Clayton, and P. Eichholtz, Commercial real estate analysis
and investments (Vol. 1, p. 642). Cincinnati, OH: South-western, 2001.
146. C. R. Genovese, A Tutorial on False Discovery Control, Carnegie Mellon University, 2004.
147. D.M. Geltner and J. Fisher, Pricing and Index Considerations in Commercial Real Estate Derivatives, Journal of Portfolio Management, Special Issue: Real Estate, 1 - 21, 2007.
148. E. Gerbl, Robo-Advisors. Kampf um das grosse Geld, Bilanz, 22.10.2019.
149. M. Getmansky, B. Liang, C. Schwarz and R. Wermers, Share Restrictions and Investor
Flows in the Hedge Fund Industry, Working Paper, University of Massachusetts, Amherst,
2015.
150. M. Getmansky, M.P. Lee, and A. Lo, Hedge Funds: A Dynamic Industry In Transition,
NBER, 2015.
151. G. Gigerenzer and D.G. Goldstein, Reasoning the Fast and Frugal Way: Models of Bounded Rationality, in Heuristics: The Foundations of Adaptive Behavior, G. Gigerenzer, R. Hertwig, and T. Pachur (eds.), New York: Oxford University Press, 31-57, 2011.
152. C. Gini, Measurement of Inequality of Incomes, The Economic Journal, 124-126, 1921.
153. P.W. Glimcher and E. Fehr (eds.), Neuroeconomics: Decision Making and the Brain, Academic Press, 2013.
154. Global Sustainable Investment Alliance, 2016 Global Sustainable Investment Review, GSIA
Report, March, 2017.
155. W.N. Goetzmann, J.E. Ingersoll and S.A. Ross, High-water Marks and Hedge Fund Man-
agement Contracts, Journal of Finance 58, 1685 - 1717, 2003.
156. W.N. Goetzmann and A. Kumar, Equity Portfolio Diversification, Review of Finance, Vol. 12, No. 3, 433 - 463, 2008.
157. W.N. Goetzmann and K. Rouwenhorst, The History of Financial Innovation, Carbon Finance Speaker Series at Yale, 2007.
178. S. Hayley, Diversification Returns, Rebalancing Returns and Volatility Pumping, City University London, 2015.
179. J.M. Griffin, Are the Fama and French Factors Global or Country Specific?, Review of Financial Studies, 15(3), 783-803, 2002.
180. S. Gu, B. Kelly and D. Xiu, Empirical Asset Pricing via Machine Learning, Booth School of Business University of Chicago, July 21, 2018.
181. E. Hazan, Theoretical Machine Learning, Princeton University, 2017.
182. G. He and R. Litterman, The Intuition Behind Black-Litterman Model Portfolios, Goldman
Sachs Asset Management Working paper, 1999.
183. R.D. Henriksson and R.C. Merton, On Market Timing and Investment Performance. II. Statistical Procedures for Evaluating Forecasting Skills, Journal of Business, 513-533, 1981.
184. O.C. Herfindahl, Concentration in the Steel Industry, Diss. Columbia University, 1950.
185. U. Herold, Portfolio Construction with Qualitative Forecasts, Journal of Portfolio Man-
agement, Fall 2003, 61-72, 2003.
186. E. Hjalmarsson, Portfolio Diversification Across Characteristics, The Journal of Investing, Vol. 20, No. 4, 2011.
187. S. Holden and J. VanDerhei, 401(k) Plan Asset Allocation, Account Balances, and Loan Activity in 2003, Investment Company Institute, Perspective, Vol. 6, No. 1, 2004.
188. K. Hou, C. Xue, and L. Zhang, Replicating Anomalies, NBER Working Paper No. w23394, National Bureau of Economic Research, 2017.
189. H. Hong and M. Kacperczyk, The Price of Sin: The Effects of Social Norms on Markets, Journal of Financial Economics, 93(1), 15-36, 2009.
190. G. Huberman and Z. Wang, Arbitrage Pricing Theory, Federal Reserve Bank of New York Staff Reports, Staff Report No. 216, 2005.
191. J. Huij and M. Verbeek, On The Use of Multifactor Models to Evaluate Mutual Fund
Performance, Financial Management, 38(1), 75-102, 2009.
192. M. Hulbert, The Prescient are Few, New York Times, July 13, 2008.
193. R.G. Ibbotson, P. Chen and K.X. Zhu, The ABCs of Hedge Funds: Alphas, Betas, and
Costs, Financial Analysts Journal, 67(1), 15-25, 2011.
194. T. Idzorek, A Step-By-Step Guide to the Black-Litterman Model, Incorporating User-Specified Confidence Levels, Working paper, 2005.
195. T. Idzorek, and M. Kowara, Factor-Based Asset Allocation vs. Asset-Class-Based Asset
Allocation, Financial Analysts Journal, Vol. 69 (3), 2013.
196. A. Ilmanen, Expected Returns: An Investor's Guide to Harvesting Market Rewards, Wiley
Finance, 2011.
197. A. Ilmanen and J. Kizer, The Death of Diversification Has Been Greatly Exaggerated, The Journal of Portfolio Management, Vol. 38, No. 3, 2012.
198. Investment Company Institute, Profile of Mutual Fund Shareholders, 2014, ICI Research Report, 2014.
220. W. Kinlaw, M. Kritzman, and D. Turkington, The Divergence of High- and Low-Frequency
Estimation: Causes and Consequences, The Journal of Portfolio Management. Special
40th Anniversary Issue. 2014.
221. W. Kinlaw, M. Kritzman, and D. Turkington, The Divergence of High- and Low-Frequency
Estimation: Implications for Performance Measurement, The Journal of Portfolio Man-
agement, 2015.
222. F. Knight, Risk, Uncertainty, and Profit, New York: Houghton Mifflin, 1921.
223. M.P. Kritzman, Puzzles of Finance: Six Practical Problems and Their Remarkable Solu-
tions, John Wiley, New York, NY, 2000.
224. R. Kunz, Asset Management, DAS in Banking and Finance, SFI, 2014.
225. Y.K. Kwok, Lecture Notes, University of Hong Kong, 2010.
226. C.H. Lanter, Institutional Portfolio Management, Swiss Finance Institute, Asset Manage-
ment Program, 2015.
227. B. Lawler, B. Mossmann, P. Nolan, and A. Ang, Factors and Advisor Portfolios, preprint
SSRN, July 15, 2019.
228. O. Ledoit and M. Wolf, Improved Estimation of the Covariance Matrix of Stock Returns
with an Application to Portfolio Selection, Journal of Empirical Finance, 10(5), 603-621,
2003.
229. O. Ledoit and M. Wolf, The Power of (Non-)Linear Shrinking: A Review and Guide to
Covariance Matrix Estimation, Working Paper University of Zurich No. 323, 2019.
230. W. Lee, Advanced Theory and Methodology of Tactical Asset Allocation, Duke University,
2000.
231. W. Lee and D.Y. Lam, Implementing Optimal Risk Budgeting, The Journal of Portfolio
Management, 28, 1, 73-80, 2001.
232. O. Ledoit and M. Wolf, A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices, Journal of Multivariate Analysis, 88(2), 365-411, 2004.
233. O. Ledoit and M. Wolf, Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks, Review of Financial Studies, Vol. 30, 2018.
234. B. Lehmann and D.M. Modest, Mutual Fund Performance Evaluation: A Comparison of Benchmarks and Benchmark Comparisons, Journal of Finance, 233-265, June 1987.
235. M. Leippold, Resampling and Robust Portfolio Optimization, Lecture Notes University of
Zurich, 2010.
236. M. Leippold, Asset Management, Lecture Notes University of Zurich, 2011.
237. M. Leippold and R. Rüegg, Fifty Shades of Active and Index Alpha, ssrn.com, 2018.
238. M. Leippold and R. Rüegg, Fama-French Factor Timing: The Long-Only Integrated Approach, University of Zurich, June 29, 2019.
239. E. Levina and R. Vershynin, Partial Estimation of Covariance Matrices, Probability Theory and Related Fields, 153(3-4), 405-419, 2012.
240. S. F. LeRoy and J. Werner, Principles of Financial Economics, Lecture Notes, UC Santa
Barbara and U Minnesota, 2000.
241. J. Lewellen, S. Nagel and J. Shanken, A Skeptical Appraisal of Asset Pricing Tests, Journal of Financial Economics 96, 175-194, 2010.
242. Y. Lewenberg, Y. Bachrach, Y. Sompolinsky, A. Zohar and J. Rosenschein, Bitcoin Mining
Pools: A Cooperative Game Theoretic Analysis, Proceedings of the 2015 International
Conference on Autonomous Agents and Multiagent Systems. International Foundation for
Autonomous Agents and Multiagent Systems, 2015.
243. H. Li, X. Zhang and R. Zhao, Investing in Talents: Manager Characteristics and Hedge
Fund Performance, Journal of Financial and Quantitative Analysis, 46(01), 59-82, 2011.
244. B. Liang, Hedge Funds: The Living and the Dead. Journal of Financial and Quantitative
Analysis, 35(03), 309-326, 2000.
245. C.-Y. Lin, Big Data Analytics, Lecture Notes, Columbia University, 2015.
246. A. Lo, Data-Snooping Biases in Financial Analysis, AIMR Conference Proceedings, Vol. 1994, No. 9, Association for Investment Management and Research, 1994.
247. A. Lo, The Statistics of Sharpe Ratios, Financial Analysts Journal, (58)4, 2002.
248. A. Lo, Efficient Markets Hypothesis, The New Palgrave: A Dictionary of Economics, L. Blume, S. Durlauf, eds., 2nd Edition, Palgrave Macmillan Ltd., 2007.
249. D. Luenberger, Projection Pricing, Stanford University, researchgate.net, 2014.
250. F. Maccheroni, M. Marinacci and D. Ruffino, Alpha as Ambiguity: Robust Mean-Variance Portfolio Analysis, Econometrica, Volume 81, Issue 3, 1075-1113, May, 2013.
251. G. Magnus, The Age of Ageing: Global Demographics, Destinies, and Coping Mechanisms,
First webcast: The Conference Board, 2013.
252. D. Mahringer, W. Pohl and P. Vanini, Structured Products: Performance, Costs and
Investments, SFI White Papers, 2015.
253. S. Maillard, T. Roncalli and J. Teiletche, On the Properties of Equally-Weighted Risk
Contributions Portfolios, ssrn.com 1271972, 2008.
254. B.G. Malkiel, The Efficient Market Hypothesis and Its Critics, Journal of Economic Perspectives, 59-82, 2003.
255. B.G. Malkiel and A. Saha, Hedge Funds: Risk and Return, Financial Analysts Journal, 61(6), 80-88, 2005.
256. L. Martellini and V. Milhau, Factor Investing: A Welfare-Improving New Investment
Paradigm or Yet Another Marketing Fad? EDHEC-Risk Institute Publication, July, 2015.
257. W. Marty, Portfolio Analytics. An Introduction to Return and Risk Measurement, Springer
Texts in Business and Economics (2nd edition), Springer Berlin, 2015.
258. J. F. May, World Population Policies: Their Origin, Evolution, and Impact, Canadian
Studies in Population 39, No. 1 - 2 (Spring/Summer 2012):125 - 34, Dordrecht: Springer,
2012.
259. McKinsey & Company, Looking Ahead in Turbulent Times - Strategic Imperatives for Asset Managers Going Forward, SFI Asset Management Education, R. Matthias, 2015.
260. McKinsey & Company, State of the Industry 2014/15 - A Perspective on Global Asset Management, SFI Asset Management Education, R. Matthias, 2015.
261. P. Mehta, M. Bukov, C.-H. Wang, A.G.R. Day, C. Richardson, C.K. Fisher and D.J. Schwab, A High-Bias, Low-Variance Introduction to Machine Learning for Physicists, Physics Reports, March, 2019.
262. Melbourne Mercer Global Pension Index, Report, 2015.
263. The Memo, Looking for a UK business loan? Amazon might be the answer, 2015.
264. The Millennial Disruption Index, Viacom Media Networks, 2013.
265. MIT, Applied Macro- and International Economics II, Spring 2016, MIT OpenCourseWare,
2016.
266. E. Moritz, The Big Four - werden Amazon, Google, Apple und Facebook die besseren
Banken?, Finance News, 2016.
267. R.C. Merton, Lifetime Portfolio Selection under Uncertainty: The Continuous-Time Case, The Review of Economics and Statistics, 51(3), 247-257, 1969.
268. R.C. Merton, Optimum Consumption and Portfolio Rules in a Continuous-Time Model, Journal of Economic Theory, 3(4), 373-413, 1971.
269. R.C. Merton, An Intertemporal Capital Asset Pricing Model, Econometrica: Journal of the Econometric Society, 867-887, 1973.
270. R.C. Merton, On the Pricing of Corporate Debt: The Risk Structure of Interest Rates, Journal of Finance, 29, 449-470, 1974.
271. A. Meucci, Black-Litterman Approach, Encyclopedia of Quantitative Finance, Wiley Finance, 2010.
272. A. Meucci, Fully Flexible Views: Theory and Practice, ssrn.com library, 2010b.
273. The Millennial Disruption Index, Viacom Media Networks, 2013.
274. P. Milnes, The Top 50 Hedge Funds in the World, hedgethink.com, 2014.
275. T.J. Moskowitz, Y.H. Ooi, and L.H. Pedersen, Time Series Momentum, Journal of Financial Economics, 104(2), 228-250, 2012.
276. J. Müller, Steht uns die Liberalisierung der globalen Währungsordnung bevor?, Presentation SFIRT, November, Zurich, 2019.
277. A.H. Munnell, M.S. Rutledge and A. Webb, Are Retirees Falling Short? Reconciling the Conflicting Evidence, CRR WP 16, November 2014.
278. A.H. Munnell and M. Soto, State and Local Pensions are Different from Private Plans, Center for Retirement Research at Boston College, Number 1, November, 2007.
279. S. Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
280. NHGRI Genome Sequencing Program (GSP), www.genome.gov/sequencingcostsdata, 2017.
281. S.V. Nieuwerburgh and R.S.J. Koijen, Financial Economics, Return Predictability, and Market Efficiency, University of Tilburg, Preprint, 2007.
282. R. Novy-Marx and J. D. Rauh, Policy Options for State Pension Systems and their Impact
on Plan Liabilities, Journal of Pension Economics and Finance 10.02: 173-194, 2011.
283. OECD Science, Technology and Industry Scoreboard: Innovation for Growth, Paris, 2013.
284. S. Pafka and I. Kondor, Estimated Correlation Matrices and Portfolio Optimization, Phys-
ica A, 343, 623-634, 2004.
285. S. Pal and T.K.L. Wong, Energy, Entropy, and Arbitrage, arXiv preprint arXiv:1308.5376, 2013.
286. A. Patton, T. Ramadorai and M. Streatfield, Change You Can Believe In? Hedge Fund Data Revisions, Journal of Finance, 2013.
287. L. Pastor, R.F. Stambaugh and L.A. Taylor, Scale and Skill in Active Management, Journal of Financial Economics, 2014.
288. L. Pastor and R.F. Stambaugh, Comparing Asset Pricing Models: An Investment Perspective, Journal of Financial Economics, 56, 335-381, 2000.
289. A.F. Perold and W.F. Sharpe, Dynamic Strategies for Asset Allocation, Financial Analysts Journal, Jan, 16-27, 1988.
290. L.H. Pedersen, Sharpening the Arithmetic of Active Management, Financial Analysts Journal, 74(1), 21-36, 2018.
291. G. W. Peters, E. Panayi and A. Chapelle, Trends in Crypto-Currencies and Blockchain
Technologies: A Monetary Theory and Regulation Perspective, arXiv preprint, 2015.
292. S. Perrin and T. Roncalli, Machine Learning Optimization Algorithms and Portfolio Allo-
cation, preprint, ssrn.com, 2019.
293. E. Podkaminer, Risk Factors as Building Blocks for Portfolio Diversication: The Chem-
istry of Asset Allocation, Investment Risk and Performance, CFA Institute, 2013.
294. PricewaterhouseCoopers, Asset Management 2020: A Brave New World, 2014.
295. PricewaterhouseCoopers, Asset & Wealth Management Revolution: Embracing Exponential Change, 2018.
296. E. Qian, A Mathematical and Empirical Analysis of Rebalancing Alpha, www.ssrn.com, 2014.
297. N. Rab and R. Warnung, Scaling Portfolio Volatility and Calculating Risk Contributions in the Presence of Serial Cross-Correlations, arXiv preprint, q-fin.RM, 2011.
298. M. Rabin, Risk Aversion and Expected-Utility Theory: A Calibration Theorem, Econo-
metrica 68.5, 1281-1292, 2000.
299. T. Ramadorai, Capacity Constraints, Investor Information, and Hedge Fund Returns,
Journal of Financial Economics, 107(2), 401-416, 2013.
300. S. Ramaswamy, Market Structures and Systemic Risks of Exchange-Traded Funds, BIS, 2011.
301. S.C. Rambaud, J.G. Perez, M.A. Granero and J.E. Segovia, Markowitz Model with Euclidean Vector Spaces, European Journal of Operational Research, 196, 1245-1248, 2009.
302. R. Rebonato and A. Denev, Portfolio Management under Stress: A Bayesian Net Approach to Coherent Asset Allocation, Cambridge University Press, Cambridge, 2013.
303. L. M. Rotando and E.O. Thorp, The Kelly Criterion and the Stock Market, The American
Mathematical Monthly, December, 1992.
304. J. Rifkin, The Zero Marginal Cost Society: The Internet of Things, the Collaborative
Commons, and the Eclipse of Capitalism, Palgrave Macmillan Trade, 2014.
305. C. O. Roche, Understanding Modern Portfolio Construction, ssrn.com working paper,
2016.
306. P. Rohner, Seminar Asset Management, University of Zurich, 2014.
307. R. Roll, A Critique of the Asset Pricing Theory's Tests, Journal of Financial Economics, 4, 129-176, 1977.
308. T. Roncalli, Introduction to Risk Parity and Budgeting, Chapman & Hall, Financial Math-
ematics Series, 2014.
309. T. Roncalli, How Machine Learning Can Improve Portfolio Allocation of Robo-Advisors,
swissQuant Conference, 2018.
310. S.A. Ross, The Arbitrage Theory of Capital Asset Pricing, Journal of Economic Theory, 13, 341 - 360, 1976.
311. S. Satchell and A. Scowcroft, A Demystification of the Black-Litterman Model: Managing Quantitative and Traditional Portfolio Construction, Journal of Asset Management, Vol. 1, 2, 138-150, 2000.
312. L.J. Savage, The Foundations of Statistics, Wiley, New York, 1954.
313. W.F. Sharpe, Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk, Journal of Finance, 19(3), 425-442, 1964.
314. B. Scherer, Portfolio Construction and Risk Budgeting, Third Edition, Risk Books, 2007.
315. SEC, Mutual Funds: A Guide for Investors, New York, 2008.
316. S. Schaefer, Factor Investing, Lecture at SFI Annual Meeting, 2015.
317. P. Schneider, Generalized Risk Premia, Journal of Financial Economics, forthcoming, 2015.
318. P. Schneider, C. Wagner and J. Zechner, Low Risk Anomalies, Preprint SFI, 2016.
319. C. Shimizu, H. Takatsuji, H. Ono, and K. Nishimura, Structural and temporal changes
in the housing market and hedonic housing price indices: A case of the previously owned
condominium market in the Tokyo metropolitan area. International Journal of Housing
Markets and Analysis, 3(4), 351-368, 2010.
320. J. Siegel, Stocks for the Long Run, McGraw-Hill, New York, NY, 1994.
321. S. Shalev-Shwartz, Introduction to Machine Learning, Lecture Notes, The Hebrew University of Jerusalem, 2016.
322. R.J. Shiller, The Uses of Volatility Measures in Assessing Market Efficiency, Journal of Finance, 36, 291 - 304, 1981.
323. R.J. Shiller, From Efficient Markets Theory to Behavioral Finance, Journal of Economic Perspectives, 17(1), 83 - 104, 2003.
324. R. J. Shiller, Speculative Asset Prices, Cowles Foundation Paper No. 1424, 2014.
325. R.J. Shiller, Market Efficiency and the Role of Finance in Society, Keynote Lecture, EFA 2014, Lugano, 2014.
326. R. J. Shiller and A.N. Weiss, Home Equity Insurance, The Journal of Real Estate Finance
and Economics, 19(1): 21-47, 1999.
327. M. Silver, How to Better Measure Hedonic Residential Property Price Indexes, IMF Working Paper, 2018.
328. A.J. Smola and B. Schölkopf, A Tutorial on Support Vector Regression, Statistics and Computing, 14(3), 199-222, 2004.
329. Y. Sompolinsky and A. Zohar, Secure High-Rate Transaction Processing in Bitcoin, In-
ternational Conference on Financial Cryptography and Data Security. Springer Berlin
Heidelberg, 2015.
330. State Street, The Folklore of Finance, Center for Applied Research, 2014.
331. G.V.G. Stevens, On the Inverse of the Covariance Matrix in Portfolio Analysis, The Journal
of Finance, Vol. 53(5), 1821-1827, 1998.
332. R. Sullivan, A. Timmermann, and H. White, Data-Snooping, Technical Trading Rule Performance, and the Bootstrap, The Journal of Finance, 54(5), 1647 - 1691, 1999.
333. M. Swan, Blockchain: Blueprint for a New Economy, O'Reilly Media, 2015.
334. swissQuant, Customer Retention, Big Data Analytics, 2017.
335. J. Syz, M. Salvi and P. Vanini, Property Derivatives and Index-Linked Mortgages, Journal
of Real Estate Finance and Economics, Vol. 36, No. 1, 2008.
336. J. Syz and P. Vanini, Real Estate, Swiss Finance Institute Annual Meeting, 2008.
337. N. Sullivan, A (Relatively Easy To Understand) Primer on Elliptic Curve Cryptography, Cloudflare blog, 2013.
338. N. Szabo, Formalizing and Securing Relationships on Public Networks, First Monday, 2(9),
1997.
339. P. Tasca, Economic Foundation of the Bitcoin Economy, University College London, Center
for Blockchain Technologies, Blockchain Workshop Zurich, 2016.
340. N. Taleb, The Black Swan. The Impact of the Highly Improbable. New York: Random
House, 2010.
341. J. Teiletche, Risk-Based Investing: Myths and Realities, CFA UK Masterclass, London
June 9th, 2015.
342. J. Teiletche, Active Risk-Based Investing, CQ Asia, Hong Kong, 2014.
343. M. Teo, The Liquidity Risk of Liquid Hedge Funds, Journal of Financial Economics, 100(1),
24-44, 2011.
344. J. Ter Horst and M. Verbeek, Fund Liquidation, Self-Selection, and Look-Ahead Bias in
the Hedge Fund Industry, Review of Finance, 11(4), 605-632, 2007.
345. J. Treynor and K. Mazuy, Can Mutual Funds Outguess the Market?, Harvard Business Review, 44(4), 131-136, 1966.
346. F. Trojani and P. Vanini, A Note on Robustness in Merton's Model of Intertemporal
Consumption and Portfolio Choice, Journal of Economic Dynamics and Control, Vol. 26,
No. 3, 423-435, 2002.
347. J. Tu and G. Zhou, Data-Generating Process Uncertainty: What Difference Does It Make in Portfolio Decisions?, Journal of Financial Economics, 72, 385-421, 2004.
348. S. Tilly and F. Triebel (eds.), Automobilindustrie 1945-2000, Oldenbourg Verlag, München, 2013.
349. UBS, Strategy and Regulation. Impact of Regulation on Strategy and Execution, SFI
Conference on Managing International Asset Management, N. Karrer, 2015.
350. UBS, Distribution Strategies in Action, SFI Conference on Managing International Asset
Management, A. Benz, 2015.
351. R. Vershynin, How Close is the Sample Covariance Matrix to the Actual Covariance Matrix?, Journal of Theoretical Probability, 25(3), 655-686, 2012.
352. The Millennial Disruption Index, Viacom Media Networks, 2013.
353. L. Vignola and P. Vanini, Optimal Decision-Making with Time Diversication, Review of
Finance, 6.1, 1-30, 2002.
354. I. Walter, The Asset Management Industry: Dynamics of Growth, Structure and Performance, edited by Michael Pinedo and Ingo Walter, 2013.
355. J.H. White, Volatility Harvesting: Extracting Return from Randomness, arXiv, November,
2015.
356. World Economic Forum, The Future of Long-term Investing, New York, 2011.
357. World Economic Forum, Future of Financial Services, New York, 2015.
358. World Economic Forum, Beyond Fintech: A Pragmatic Assessment Of Disruptive Potential
In Financial Services, New York, 2017.
359. A. Yeniay and A. Göktas, A Comparison of Partial Least Squares Regression with Other Prediction Methods, Journal of Mathematics and Statistics, Volume 31, 99-111, 2002.
360. A. Zellner and V.K. Chetty, Prediction and Decision Problems in Regression Models from the Bayesian Point of View, Journal of the American Statistical Association, 60, 608-616, 1965.
361. ZKB, Index Methoden, 2013.
362. H. Zou, The Adaptive LASSO and its Oracle Properties, Journal of the American Statistical Association, 101(476), 1418-1429, 2006.
363. G. Zyskind, N. Oz and A. Pentland, Enigma: Decentralized Computation Platform with
Guaranteed Privacy, arXiv preprint, 2015.
Index

Permissioned Protocol, 481
Active Investment and Benchmarking, 299
Active versus Passive
  Sharpe's Arithmetics, 67
Altcoins, 494
Alternative Investments (AIs)
  Insurance-Linked Investments, 102
Arithmetical Relative Return (ARR), 179
Asset Class
  Definition, 13
Asset Management Industry
  Wealth 2020, 17
Asset Management Overview, 14
Asset Pricing
  Absolute Pricing, 261, 413
  Fundamental Asset Pricing Equation, 263, 416
  General Equilibrium, 414
  Good and Bad Times, 264, 417
  Low Volatility Strategies, 282
  Multi Factor Models, 281
  Multi Period, 281
  What Happens if an Investment Strategy is Known to Everyone?, 285
Asset Pricing in Financial Markets, 193
Average Investment Capital (AIC), 182
Benchmark Return, 179
Benchmarking, 69
Beta and Volatility Based Low Risk Anomalies, 282
Beta Pricing Model, 260
Bias-Variance Trade-Off, 431
Bitcoin Protocol, 480
Bitcoin Security, 497
Black-Litterman Model, 329
Black-Scholes Equation, 222
Black-Scholes, Formula for Call, 218
Black-Scholes, Interpretation, 219
Black-Scholes, Interpretation No Arbitrage, 218
Brinson-Hood-Beebower (BHB) Effect, 179
Broken Covered Interest Parity (CIP), 206
Buy-and-hold, static, 164
Call Option, 195
Capital Weighted Index Funds, 91
Capital-Guaranteed Products (CP), 226
CAPM
  Appraisal (Information) Ratio, 376
  Assumption, 372
  Beta Pricing Model, 371
  CML and SML, 374
  Conditional CAPM, 380
Fundamental Law of Active Management, 406
Success of the Active Strategy, 402
TER and Performance, 85
UCITS, 86
FX Forward, 204
Game Theoretic Concept Blockchain, 484
Gamma, 220
General Linear Model, 460
Generalization Error, 430
Geometric Margin, 447
Global AM
  2014-2020, 74
  AM versus Trading, 75
  AM versus Wealth Management, 77
  Demand and Supply Side, 70
  Eurozone, 70
  Global Figures 2007-2014, 72
80 Strategies, 111
i all assets, 299
Incomplete Market, 185
Independent Sample Error, 369
Index Construction, 89
Index Funds and ETFs, 88
Index Sampling, 299
Information Coefficient (IC), 403
Information Ratio IR, 335, 403
Interest Rate Parity
  CIP, 205
  Covered, 204
  Trilemma, 206
  UIP, 206
  Uncovered, 204
Interest Rate Swaps (IRS), 153
Internal Rate of Return (IRR), 182
Interval Error, 369