Asset Management

Paolo Vanini
University of Basel
Chapter 1
Introduction
Asset management (AM) is a key discipline in a modern economy: we manage our
assets to maintain our standard of living after retirement, to buy property later,
or because a sovereign wealth fund does not want to lose the assets of future
generations. AM is the process of building, distributing, and maintaining assets
throughout the life cycle in a cost-efficient and compliant way. Pension funds,
institutional investors and private investors are different users of the AM process.
Game Changers
PwC (2015, 2012), McKinsey (2015), Oliver Wyman (2016) and many others identify the
following game changers for the asset management industry (note that the data
published by consulting firms are proprietary; the results can neither be verified
nor replicated by a third party):
• Growth of wealth: Global assets under management (AuM) will exceed USD 100
trillion by 2020, up from USD 64 trillion in 2012.
• Regulation: In the past, banks dominated the financial industry; they were the
innovators. After the 2008 Great Financial Crisis (GFC), regulation focused on banks
and insurers. AM initially faced fewer regulatory requirements and is now moving
more and more to center stage.
• Longevity and demographics: Retirement and health care will become critical issues
as populations age. The ratio of pensioners to the working-age population will reach
25.4 percent by 2050, up from 11.7 percent in 2010. This puts a strain on pension
systems. Still-increasing life expectancy - each new generation in the developed
world will live three months longer - increases the need for individual wealth
management solutions in retirement. Asset managers will therefore focus
on long-term investments and on individual asset decumulation. This change affects
in particular the US, Japan, most European countries, South Korea, Singapore,
Taiwan and China.
• Fees will continue to decrease for most asset management solutions, and regulation
requires the transformation of many existing fee models.
• Alternative investments transform into traditional ones and exchange traded funds
(ETFs) continue to proliferate.
Climate change is missing from the list above, although it will be one of the most
important game changers. Furthermore, the game changer 'performance' is missing,
although performance is a notorious problem for many investors and there is no
consensus about optimal investment behavior. We will give this topic wide scope.
While regulation dominated the decade after the GFC, the changes caused by tech-
nology are even more profound for the future of AM.
The current digitization wave differs from well-known automation. Technology has
matured to a level where abstract banking and asset management products can be
understood, researched and valued by clients in a completely different way than in
the past. Today's technology is closer to humans than it ever was. Technology is
also able to replace human labor even for complex activities in the AM value chain -
which work will still be human-specific in the AM industry?
Contents
The content is, from a methodological point of view, split into two parts: classical
methods and innovation. The former considers some of the main developments of the
last decades that are in use in the AM industry: the many ways portfolios are
constructed using the models or methods of Markowitz, factor investing,
Black-Litterman and many others, but also the way the AM value chain is structured
and organized. In innovation we focus on two topics. The first is data science,
i.e. how possibly better forecasts can be made or customer needs measured. The
second is platforms and blockchain, i.e. new forms in which the asset management
infrastructure and value chain can be designed. The traditional models are discussed
in Chapter 4 and innovation is considered in Chapter 5.
From a topical perspective, standard and trend topics can be differentiated. The
first includes understanding how different assets or asset classes behave and how
they are selected and managed. Besides the technological trends described above, the
focus is on retirement provision. The standard material appears in all first five
chapters. The trends in retirement provision are presented in Chapter 5.
I am grateful for the assistance of Dave Brooks and Theresia Büsser. I would like to
thank Sean Flanagan, Barbara Doebeli, Bruno Gmür, Jacqueline Henn-Overbeck, Tim
Jenkinson, Andrew Lo, Helma Klüver-Trahe, Roger Kunz, Tom Leake, Robini Matthias,
Attilio Meucci, Tobias Moskowitz, Tarun Ramadorai, Blaise Roduit, Olivier Scaillet,
Stephen Schaefer and Andreas Schlatter for their collaboration, their support or the
possibility to learn from them.
Chapter 2

Asset Management Overview
Definition 1. Financial assets are financial contracts that define resources over
which property rights are enforced and from which future economic benefits can flow
to the owner. An asset class is a group of financial assets that share predefined
economic, legal and regulatory characteristics.
Financial assets are intangible, non-physical assets. They are often more liquid
than tangible assets. Securities are tradable financial assets. They are issued
through financial intermediaries (primary market) and can often be traded on the
secondary market. They differ, among other things, in their ownership, complexity,
liquidity, risk and reward profile, transaction fees, accessibility and regulatory
compliance. Traditional asset classes are equities, fixed income securities, money
market instruments and currencies. Alternative asset classes include real estate,
commodities and private equity. Hedge funds are not an asset class but an investment
strategy defined on liquid asset classes.
The goal of investing is to save today for the benefit of future consumption. The
benefit after an investment period should be greater than the present direct
consumption of all resources. Investments are made through the use of securities of
all kinds - that is, money, stocks, bonds, ETFs, mutual funds or derivatives.
The AM firm's role in channeling savings towards investment can be structured as
follows. It creates products that match investors' needs. By trading the assets, AM
contributes to the liquidity of financial markets. Investments are used by firms and
governments; AM firms are among the biggest investors in government bonds.
AM makes investments in issued bonds and stocks accessible to small private
investors by using wrappers such as funds: investors get, for a small amount of
money, access to the economics of a diversified portfolio of assets. AM firms also
engage with investee companies. As shareholders they hold the companies accountable
and integrate environmental, social and governance (ESG) concerns into their
investment processes.
AM firms are required by law to act in the best interests of their clients and to
invest in accordance with a predefined set of rules and principles. They charge a
fee based on the value of the assets under management (AuM). AuM grow if the
investment performs, which leads to higher fees for the AM firm and higher returns
for investors; the incentives of investors and asset managers to achieve positive
returns are aligned. AuM refers to all the assets managed by a financial service
provider. This includes assets managed under a discretionary asset management
mandate as well as assets managed under an advisory asset management mandate.
Definitions and formulas for calculating AuM vary from company to company. Some
financial institutions include bank deposits, investment funds and cash in their
calculations; others limit them to funds where the investor assigns responsibility
for investment decisions to the company.
Pricing and price forecasts of assets are important for investors. There are two
ways to price assets in theory: absolute pricing as an equilibrium outcome in an
economy, and relative pricing using the concept of no arbitrage. Equilibrium pricing
is not relevant for the AM industry except for the CAPM as a benchmark model, while
no-arbitrage pricing is key in derivative pricing. To price stocks and bonds, also
called cash assets, empirical pricing models are often used. They follow from
working with data, such as the Fama-French model or, more recently, machine learning
and AI. This approach is by far the most used one in the industry, although a lack
of theoretical foundations and misuse of statistics often lead to flawed investment
strategies - data mining, data snooping and inaccurate backtests are examples.
1. Who decides?

4. How are asset management services produced and distributed in different
jurisdictions? - the profitability, process, client segmentation, regulation and
technology question.
In the past, technology was mostly needed to implement the investment strategies.
New technologies enable radically new investment approaches that differ from
traditional statistical models such as the Capital Asset Pricing Model (CAPM). But
technology is also the key factor in scaling the business and managing regulatory
complexity, i.e. in keeping or increasing profitability.
Question 4 attracted a large part of asset management resources in the decade after
the GFC, due to regulatory and technological changes and also to different client
expectations. This question can be considered as the sum of the following strategic
business issues (UBS [2015]):
• In which countries does an AM firm want to compete? The answer to this
geographical question depends on the AM firm's actual strength, its potential, the
costs to comply with country-specific regulation, the costs to build up the human
capital, and the business and technological complexity.
• Which products and investment areas should the AM firm focus on? Large AM firms
often offer up to several hundred investment strategies.
• What services should be provided and which technologies should be used for them?
• What operating model should be used? This question has a distribution dimension
(global vs. (multi-)local offering), an operational one (centralized vs.
decentralized), a value-chain one (in-house vs. outsourcing) and a legal/tax one
(onshore vs. offshore).
Figure 2.1: The size of the area indicates the proportion of global GDP produced in
that area during the years concerned. GDP is measured in USD to offset purchasing
power parity. In each chart, the total assets are displayed in USD. 1 AD means the
year 1 anno Domini in the Julian calendar (worldmapper.org).
Assets under Management (AuM) is the market value of assets that an investment
company manages on behalf of investors. AuM is often used as a measure for comparing
growth between asset managers. As profitability varies widely for different types of
assets, AuM should be used with caution to draw conclusions about an asset manager's
profitability. GIPS (Global Investment Performance Standards) is the market standard
for AuM reporting to investors.
PwC (2015) estimates that global AuM will exceed USD 100 trillion by 2020, up from
USD 64 trillion in 2012. Other estimates are similar. These figures imply an annual
global compound growth rate of roughly 6 percent (a quick check follows the list
below). This rate varies for different geographic regions (Boston Consulting Group
[2016]):
• Emerging Markets (EM): South America, BRIC states, Middle East, Eastern Europe:
8.5% p.a.
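As a quick check of the roughly 6 percent rate quoted above, a minimal Python sketch
(the 2012 and 2020 AuM figures are from PwC [2015]; the annual compounding
convention is our assumption):

```python
# Sanity check of the global AuM growth rate quoted in the text.
# Assumption: annual compounding over the 8 years from 2012 to 2020.
aum_2012 = 64.0   # global AuM, USD trillion (2012)
aum_2020 = 100.0  # projected global AuM, USD trillion (2020)
years = 2020 - 2012

cagr = (aum_2020 / aum_2012) ** (1 / years) - 1
print(f"Implied compound annual growth rate: {cagr:.2%}")  # ~5.74%, i.e. about 6%
```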
The different growth rates define opportunities for wealth managers in developed
markets to offer solutions in fast-growing markets. Therefore, market access plays a
prominent role in the development of AM. At the individual level, per capita GDP in
2016 was USD 11'000 for the emerging economies and USD 47'000 for the industrialized
countries. The growth estimates for the period 2016-2021 are 150% for the EM and 50%
for the developed economies (IMF World Economic Outlook [2016]).
The evolution of EM can also be seen by considering specific assets; see Table 2.1
for the emerging market (EM) bond market share growth. Twenty years ago almost 100%
of the EM bonds had a high-yield creditworthiness; in 2016 only 45% had such a
rating in the JP Morgan EM bond index, and the remaining 55% therefore had an
investment grade rating. Figure 2.2 shows other dimensions of the EM developments.

Table 2.1: Bond market shares (Barclays Capital, BIS, FactSet, J.P. Morgan Asset
Management [2016]).
Wealth growth must be compared to the dynamics of wealth inequality. An increase in
inequality is likely to destabilize the growth of wealth, as it leads to social and
political instability. Inequality risks are among the highest risks in the annual
global risk map of the World Economic Forum. On the one hand, the global increase in
wealth has been the main reason for poverty falling worldwide to a level never seen
before in history. On the other hand, CO2 emissions due to changed living
conditions, mobility, meat-dominated diets and tourism, among others, will trigger
or reinforce global economic and social tensions.
The global wealth projections of PwC (2015) for different types of investors are
shown in Table 2.2.
Clients               2012, USD tr.   2016, USD tr.   E2020, USD tr.   Growth rate p.a.
Pension funds              33.9            38.3            53.1             6.5%
Insurance companies        24.1            29.4            38.4             4.8%
SWF                         5.2             7.4            10.0             6.9%
HNWIs                      52.4            72.3            93.4             4.9%
Mass affluent              59.5            67.2            84.4             6.7%
Table 2.2: There are double counts: assets of wealthy individuals (HNWIs) are
invested in insurance and pension funds. Mass affluent refers to individuals with
liquid assets between USD 1-3 mn; HNWIs possess liquid assets of USD 3-20 mn. The
categorization is not unique. The predictions of the 2020 AuM changed between the
2015 and 2018 vantage points: while the numbers were stable for pension funds and
insurance companies, the forecast for HNWIs was significantly corrected upwards and
the mass affluent number for 2020 is now significantly lower (PwC [2015], PwC
[2018]).
Mass affluent clients and HNWIs in emerging markets are the main drivers of AuM
growth.
Figure 2.2: Upper left panel: EM and US shares of global nominal consumption,
measured in current USD expenditures, 1990-2013. Upper right panel: EM country
fundamentals at the time of the taper tantrum and at the beginning of 2017. Lower
panel: Creditworthiness of EM countries; the right panel shows the divergence across
EM countries. (J.P. Morgan Guide to the Markets, UN, World Bank, J.P. Morgan Global
Economics Research [2013, 2015, 2016])
The global middle class is projected to grow by 180 percent between 2010 and 2040,
with Asia replacing Europe as home to the highest proportion of the middle class as
early as 2015 (OECD, European Environment Agency, PwC [2014]). The growth of pension
funds will be large in countries with fast-growing GDPs, weak demographics and
defined contribution pension schemes.
2.2 Investors
There are different types of investors: private clients, high net worth individuals,
pension funds, family offices or state investment funds. At a higher level,
investors are divided into private investors and institutional investors. The
ownership of assets between these two categories changes over time; see Figure 2.3
for the US.
Figure 2.3: Equity ownership in the US. In the 1950s, 90% of equity in the US was
held by private investors. This number dropped almost linearly to 40% by the end of
2010 and then began to rise slightly. The fraction of equity ownership held by
institutional investors follows the opposite evolution. Source: Rohner [2014].
Private investors show a strong real estate dependence in their balance sheets; see
Figure 2.4 for the Swiss case. In particular, younger investors face a large
leverage effect from mortgage financing: the ratio of assets (real estate) to own
capital is large. Small changes in the property price have a significant impact on
the balance sheet equity of the investor. Interest rate risk and real estate market
price risk affect the asset. The latter risk is the more dangerous one for the
investor's default.
Consider a private investor who bought a house worth CHF 1 million. The 'golden rule
of affordability' in Swiss banking states that the investor needs to cover 20% of
the house price with his own capital and that the interest rate charge for the
mortgage should not exceed 1/3 of regular income, assuming a hypothetical high
interest rate level of 5%. For a mortgage of CHF 800'000, the regular income of the
investor must therefore not be lower than CHF 3 × 0.05 × 800'000 = 120'000. Suppose
that the investor gets a mortgage with a fixed 5-year rate of 1%, which is a
plausible number in a zero interest rate environment. He therefore pays, for the
next 5 years and without any amortization payments, CHF 8'000 per annum, which is
much less than renting the same object. Assume that the remaining liquid capital of
the investor is CHF 100'000 and his annual salary is CHF 150'000.

The leverage ratio of the investor, the ratio of the asset value to the equity
value, is λ = 1'000'000 / 100'000 = 10. Consider two scenarios. First, interest
rates rise to 3% within five years. Second, house prices fall by 15% in the next
five years. The first scenario implies that the investor has to pay CHF 24'000 per
annum in interest on the new mortgage after five years - three times more than
before, but still an affordable part of income. In the second scenario, the house is
only worth CHF 850'000. Since the investor must always cover 20% of the house price,
a maximum mortgage of 80% means a mortgage value of CHF 680'000. The investor has to
pay the difference between the old and the new mortgage value of CHF 120'000. This
is almost an annual salary! Hence, house price risk is a more severe risk than
interest rate risk.
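The arithmetic of this example can be reproduced in a few lines. The following
Python sketch (the variable names are ours, for illustration) recomputes the
affordability bound, the carrying cost and the two stress scenarios:

```python
# Swiss 'golden rule of affordability' example, all figures in CHF.
house_price = 1_000_000
mortgage = 0.80 * house_price              # 20% own capital => CHF 800'000 mortgage

# Affordability: interest at a hypothetical 5% must not exceed 1/3 of income.
min_income = 3 * 0.05 * mortgage
print(f"Minimum required income:    {min_income:,.0f}")      # 120'000

# Actual carrying cost at the fixed 5-year rate of 1%.
print(f"Annual interest at 1%:      {0.01 * mortgage:,.0f}") # 8'000

# Scenario 1: refinancing after five years at 3%.
print(f"Annual interest at 3%:      {0.03 * mortgage:,.0f}") # 24'000

# Scenario 2: house price falls 15%; the mortgage stays capped at 80%.
new_price = 0.85 * house_price             # 850'000
max_mortgage = 0.80 * new_price            # 680'000
print(f"Required capital injection: {mortgage - max_mortgage:,.0f}")  # 120'000
```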
Given the importance of real estate risk for private clients, it is hard to
understand why the myriad of sophisticated wealth management tools almost always
consider only the financial assets, leaving aside the house asset and the mortgage
debt. But retail investors are not the only ones who mostly use an asset-only
approach to investment. Research from State Street (2014), using data from a
worldwide survey of 3,744 investors, shows that although nearly 80 percent of
investors realize the importance of achieving long-term goals, proficiency in
achieving them deviates strongly. In the US, public pension funds were on average
less than 70 percent funded, with more than USD 1.3 trillion of unfunded
liabilities. A similar picture holds for private investors: while 73 percent cited
long-term goals, only 12 percent could say with confidence that they were on target
to meet those goals. Many academic papers address the misalignment between what
investors say is important (ALM) and what they do (asset only); the myriad of
possible reasons for this difference between what they state and what they do are
discussed in those papers.
Investors also differ in the type of financial assets they buy. The more
professional investors are, the more they invest in cash products. They do not use
mutual funds or structured products, since they can create the same payoffs without
paying the wrapper costs. Figure 2.4 shows, on the aggregate of all investors, that
bond investments and structured products did not grow in the last decade, in
contrast to the growth of funds and shares. Individuals and smaller pension funds
prefer mutual funds and structured products.
One reason is a lack of capital to reach a reasonable diversification. We discuss
below that a Swiss investor needs about CHF 1.5 million in order to achieve a
reasonable diversification by investing in cash products. The second reason is that
individuals lack direct access to some markets: they cannot enter into short
positions or are not allowed to trade derivatives under the International Swaps and
Derivatives Association (ISDA) agreement. They are forced to buy derivatives in the
packaged form of a mutual fund or a structured product.
Why are there so many SWFs in emerging markets? More than 50 percent of all large
SWFs originate in oil, and Asian governments are much more active in managing their
economies than some of their Western counterparts. According to Ang (2014), another
reason is that the US, after the many state bankruptcies of the 1980s and 1990s,
told emerging markets to save more. In recent years, a debate has begun on whether
it is productive to accumulate so much capital in sovereign wealth funds. Would it
not be more productive to invest the capital directly in the local economy?
Many SWFs accumulate liquid assets as reserves against unexpected future economic
shocks. This forms a long-term precautionary savings motive for future generations.
This motivation is crucial for the acceptance of an SWF: an SWF can only exist if it
has public support. This public support is a sensitive issue. Scandals due to
incompetent fund management, lack of integration of the fund into economic
strategies, political mismanagement and criminal acts should be avoided. All changes
in the risk policy for asset management must be documented and communicated to the
owners of the fund. For example, the Norwegian SWF initially invested only in bonds.
Only after a broad public discussion was a diversification of investments into other
asset classes considered. This behavior of the Norwegians is unique and rooted in
their democratic tradition.
Pension funds are one part of the total pension system of a country, which is often
divided into three pillars:
• Pillar I - This pillar should cover the subsistence level and is often organized ac-
cording to the pay-as-you-go system. Each month, employees pay part of their
salary, which is immediately distributed to pensioners.
• Pillar II - This is the pillar of the pension funds. Together with Pillar I, it
should be enough to cover the cost of living after retirement. The asset owners have
only limited access to their assets. There are two types of funds: Defined Benefit
(DB) and Defined Contribution (DC). DB plans are based on predetermined future
benefits for the beneficiaries but keep the contributions flexible. DC plans fix the
contributions but not the future benefits. In summary, the contributions define the
benefits in DC plans and the benefits define the contributions in DB plans.
• Pillar III - Privately managed investments, which often have tax advantages. Access
to assets before retirement is usually limited.
Figure 2.5 illustrates the importance of the different pillars in different
countries. Retirement systems are under pressure in most industrialized countries
due to demographic change and increasing longevity. For the first pillar,
demographic change means that the working population pays on average for a growing
number of retirees. This jeopardizes the first pillar.
Figure 2.5: Left panel: The importance of the three pillars as a percentage of
retirement income (ABP [2014]). Right panel: Basic form of DC and DB pension plans.
The threat to the first pillar has major implications for national budgets. The
first pillar accounts for more than 90 percent of retirement income in Spain. For
Germany, France and Italy, the value is between 75 percent and 82 percent. Given the
extremely low fertility rates and high youth unemployment in Spain and Italy, the
first pillar cannot survive. Shifts into the second or third pillar are required,
which represents an opportunity for asset management. But this only makes sense for
workers with a regular income. The pension problem of the mass of today's young
people without work remains unresolved.
• Benchmark
• Longevity
The results indicate the increase in first pillar contributions. The assumptions in
the scenarios are mild, given that in southern Europe the unemployment rate for
generations of young workers is higher than 20% and that the working population in
Japan will most likely drop by 50% in the next 20 years. This is the reason why
Japan invests heavily in robot technology to substitute for the missing human
workforce.
In DB plans, the pension is set in relation to the last average salaries; see Figure
2.5. The contributions are calculated in such a way that they generate a predefined
capital stock at the end of working life. Therefore, an increase in salary requires
additional funds in order to maintain the full pension. On the other side, a year
with very low income can have dramatic effects for the contributor in the retirement
period. Since the financing amount can change on an annual basis, DB plans are
considered intransparent.
In DC plans, the fixed contributions are invested in several asset classes and the
pension is only weakly related to the most recent salary of the contributor. The
growth of the invested capital, including interest payments, implies a final capital
value at the end of working life. The conversion rate applied to that final capital
level then defines the annual pension. Contributors to DC plans - contrary to those
who contribute to DB plans - bear the investment risk. This makes this form of
pension system cheaper for employers to offer. Unlike in DB plans, the contributors
can at least partially influence investment decisions - that is, choose the risk and
return levels of the investments. This is one reason why DC plans have become more
attractive to contributors than their DB counterparts. Finally, in some
jurisdictions, DC plans are portable from one job to the next, while DB plans often
are not.
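A stylized sketch of the DC mechanics just described; the contribution, return,
horizon and conversion rate below are illustrative assumptions, not figures from the
text:

```python
# Stylized DC plan: fixed contributions accumulate with investment returns;
# a conversion rate turns the final capital into an annual pension.
contribution = 10_000   # fixed annual contribution, CHF (assumed)
annual_return = 0.03    # assumed average investment return
years = 40              # assumed working life
conversion_rate = 0.05  # assumed annual pension per unit of final capital

capital = 0.0
for _ in range(years):
    capital = capital * (1 + annual_return) + contribution

annual_pension = conversion_rate * capital
print(f"Final capital:  {capital:,.0f}")         # ~754'000
print(f"Annual pension: {annual_pension:,.0f}")  # ~37'700
```

Note that the investment risk sits entirely in the realized return path: a weak
decade lowers the final capital and hence the pension, which is exactly the risk the
DC contributor bears.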
Underfunding is a serious problem. The S&P 500's biggest pension plans faced a USD
382 bn funding gap in 2018; of the 200 biggest DB plans in the S&P 500, 186 were not
fully funded in 2018. Companies like Intel have a ratio of pension assets to pension
obligations of less than fifty percent. In Switzerland, the average funding ratio of
private pension funds in 2013 was 107.9 percent (Kunz [2014]). The ratio for public
funds was 87.8 percent, showing strong underfunding. Private and public pension
funds differ even more severely when comparing the overfunding and underfunding
gaps: for the Swiss private sector, there is CHF 16.2 billion of overfunding capital
and CHF 6.4 billion of underfunding. In the public domain, the situation is the
opposite: CHF 1.4 billion of overfunding versus a CHF 49.5 billion funding gap.
Another perspective associated with the transition to DC-based plans is the average
undersaving in such plans. Munnell et al. (2014) report that in 2013 the average DC
portfolio at retirement was USD 110,000, while over USD 200,000 is needed. Finally,
DB and DC plans differ in their costs. The CEM benchmarking study (2011), which
considers 360 global DB plans with USD 7 trillion of assets, finds a fee range
between 36 and 46 bps. Munnell and Soto (2007) estimate the fees for DC plans at
between 60 and 170 bps.
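Compounded over a working life, this fee difference is substantial. A minimal
sketch, using the midpoints of the two fee ranges cited above together with an
assumed 5% gross annual return on an assumed CHF 100'000 investment:

```python
# Long-run fee drag: DB-style vs. DC-style fees (midpoints of the cited ranges).
gross_return = 0.05     # assumed gross annual return
capital = 100_000       # assumed initial investment, CHF
years = 30              # assumed horizon

for label, fee_bps in [("DB-style fees, 41 bps ", 41), ("DC-style fees, 115 bps", 115)]:
    net_return = gross_return - fee_bps / 10_000
    final = capital * (1 + net_return) ** years
    print(f"{label}: {final:,.0f}")
# The 74 bps difference costs roughly 20% of final wealth over 30 years.
```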
Furthermore, private savings are becoming more important due to the problems in the
first pillar. Individuals will be responsible for a larger part of their assets and
will bear the investment risk. Given the inability to recover retirement losses,
pension fund clients will ask for less risky assets.
Several financing and redistribution risks exist between the actively insured and
retirees. Many countries define a legal minimum fixed interest rate which has to be
applied to the minimum benefit pension plan. In Switzerland this rate was 1.75% for
2015 and 1.25% for 2016. Given that the 10-year CHF swap rate was close to zero in
2015, it is not possible for a pension fund to generate the legally fixed rate using
risk-free investments. This defines a financing risk for the population contributing
to a pension plan.
Figure 2.7: The return of the 10y Swiss government bond, the minimum legal rate for
Swiss pension plans and the technical rate for privately insured retired
individuals. If this status remains unchanged in the next years, underfunding
becomes a serious issue, and no significant return can be expected from investment
in the fixed income asset class. The technical rates are even higher than the
minimum rates, which indicates the extent to which actual pensions are too high.
(Swisscanto [2015], SNB [2015], OAK [2014])
If interest rates are low, pension funds are forced to consider alternative
investments: investing more or newly in stock markets, credit-linked notes, private
markets, liquid investment strategies (smart beta or factor investing),
insurance-linked investments, high-grade securitized mortgages or senior unsecured
loans. These alternatives induce different risks, and the experience of many pension
funds with them is limited. Pension funds can also reduce their costs. This would
help, but it would not solve any of the above problems due to demographics, low
interest rates or longevity risk.
Another reason is implicit or explicit return guarantees on the liability side.
Guarantees cut the linear payoffs of liabilities; i.e., options are generated.
Unlike standard financial derivatives on stocks, the pricing of these options is
much more complex and opaque: the underlying assets are not tradeable, and
risk-sharing mechanisms must be considered in the option pricing. These options are
often neither valued nor hedged, but they exist and adversely affect the goals of a
pension fund. A third reason is the overlapping of generations in the design of the
pension system, i.e. generation x also pays for, say, an already retired generation.
We pursue the less ambitious task of taking only the asset side into account, with
the liability side implicitly included in the asset return benchmark. It is
customary to divide the yield contribution into three parts: strategic asset
allocation (SAA), tactical asset allocation (TAA) and stock selection. The SAA is an
asset allocation over a long-term period of 5-10 years. It is based on unconditional
past information; returns are unconditional expectations. The TAA seeks to exploit
the predictability of returns over a short- to medium-term horizon. TAA forecasts
are conditional expectations; the current state of the financial market or the
business cycle matters. As a result, SAA weights change slowly over time while TAA
weights are more dynamic. Formally, with F_t the information set at time t, the SAA
works with the unconditional expectation E[r] of asset returns r, while the TAA
works with the conditional expectation E[r | F_t]. The definition of this set is
basic for the Efficient Market Hypothesis and the predictability of asset prices.
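A minimal sketch of the two forecast types on simulated data; the single predictor,
the linear model and all parameters are illustrative assumptions, not from the text:

```python
import numpy as np

# SAA vs. TAA forecasts: the SAA uses the unconditional mean E[r]; the TAA
# conditions on current information F_t, here proxied by one predictor.
rng = np.random.default_rng(1)

n = 240                                   # months of simulated history
signal = rng.normal(0.0, 1.0, n)          # predictor observed at the start of each month
returns = 0.004 + 0.002 * signal + rng.normal(0.0, 0.04, n)

# SAA-style forecast: one unconditional number for all states of the world.
saa_forecast = returns.mean()

# TAA-style forecast: conditional expectation E[r | F_t] via linear regression.
beta, alpha = np.polyfit(signal, returns, 1)   # slope, intercept
current_signal = 1.5                           # today's state of the market
taa_forecast = alpha + beta * current_signal

print(f"SAA (unconditional) forecast: {saa_forecast:.4f}")
print(f"TAA (conditional) forecast:   {taa_forecast:.4f}")
```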
The SAA establishes exposures to permissible asset classes and currencies. The end
result is a set of portfolio weights (of asset classes) that defines the investor's
risk-return trade-off.
The SAA's primary objective is to create a long-term optimal expected risk and
return asset mix. The SAA divides assets into different asset classes, geographic
regions, sectors, currencies and various credit rating levels.
The TAA bets on the predictability of asset returns. But are asset returns
predictable? Although the concept of a TAA has existed for more than 40 years,
practitioners and academics attribute different meanings to a TAA. Practitioners use
a one-period setup to define a TAA. Academics often use intertemporal portfolio
theory to derive dynamic optimal investment rules. This theoretically optimal TAA
has a myopic one-period component and a dynamic hedging demand component. The myopic
part corresponds to the TAA of practitioners; the other component is missing in
practice, see Sections 4.3.4.6 and 3.1.9.
The first investment firm to consider a TAA was Wells Fargo in the 1970s. The
decline in many assets during the 1973-1974 oil crisis increased investor demand for
alternatives to shifts within a particular asset class. Wells Fargo proposed shifts
across asset classes, i.e. between stocks and bonds. The system was able to generate
positive returns over a period when stock markets fell by more than 40 percent. In
the 1980s, portfolio insurance became popular, based on option pricing theory. These
dynamic strategies seek to maintain a guaranteed minimum portfolio return (floor).
The Constant Proportion Portfolio Insurance (CPPI) approach largely simplified the
option approach, making portfolio insurance even more attractive to investors. The
global stock crash of 1987 shifted investors' interest away from portfolio insurance
back to TAA, as portfolio insurance strategies mostly did not deliver the guaranteed
floor, while TAA strategies suffered before the crash but outperformed shortly
thereafter. We refer to Lee (2000) for a detailed discussion.
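A minimal CPPI sketch; the multiplier, floor and simulated price path are
illustrative assumptions (the text does not specify an implementation). The rule
keeps the risky exposure at a fixed multiple of the cushion above the floor:

```python
import numpy as np

# Minimal CPPI: risky exposure = m * (portfolio value - floor), rebalanced daily.
rng = np.random.default_rng(0)

value = 100.0        # initial portfolio value
floor = 90.0         # guaranteed minimum (floor), assumed constant
m = 4.0              # CPPI multiplier (assumed)

risky_returns = rng.normal(0.0005, 0.01, 250)   # simulated daily returns

for r in risky_returns:
    cushion = max(value - floor, 0.0)
    exposure = min(m * cushion, value)   # risky allocation, capped at 100%
    value += exposure * r                # riskless rate assumed to be zero

print(f"Final value: {value:.2f} (floor: {floor})")
```

In continuous time the floor is never breached; with discrete rebalancing a large
overnight drop can push the value below the floor (gap risk), which is one reason
portfolio insurance failed to deliver in the 1987 crash.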
Let us go back to the management of pension funds. We assume that the people at the
top of the funds have little investment knowledge; their decisions concern the SAA.
At the lower end of the fund hierarchy are the experienced asset managers. Their
success is measured relative to the TAA, and they seek to generate excess returns
over the TAA benchmark by selecting assets. However, many empirical studies show
that the SAA is the most important determinant of the total return and risk of a
broadly diversified portfolio. This defines the discrepancy between economic
relevance and know-how in the hierarchy of decision-makers:
• Brinson et al. (1986) report that around 90 percent of the return variance arises
from the passive investment part. Subsequent papers clarified these findings and
estimate the importance of these returns at between 40 percent and 90 percent (see,
for example, Ibbotson and Kaplan [2000]). Schaefer (2015), one author of the
professors' report to Norway's Government Pension Fund Global, states that the
variance attribution to the benchmark return was 99.1% and only 0.9% was attributed
to the active return.
• Between 5 and 25 percent is due to the TAA and related to the Chief Investment
Officer (CIO) function.

• Between 1 and 5 percent is due to security selection by the portfolio managers.
The economically most advanced societies face another population problem: each
future generation will be smaller than the one that preceded it. For some countries,
this has already become a matter of national survival. Triggered by low fertility
rates, this phenomenon is gaining ground worldwide: 46 percent of the world's
population has fallen into a low-fertility regime, and there is nothing to indicate
that this rate is going to recover. Magnus (2013) states that (i) the ratio of
children to older citizens stands at about 3:1 but is declining - by 2050, there
will be twice as many older citizens as children; (ii) the number of over-60s in the
rich world is predicted to rise by 2.5 times by 2050, to 418 million; and (iii) in
the emerging and developing worlds, the number of over-60s will grow by more than
seven times to over 1.5 billion by 2050, with, behind this, a 17-fold increase in
the expected population of those aged over 80, to about 262 million.
Malthus (1798) was the first to study the interdependence between economic growth
and population growth. He assumed that as long as there was enough to eat, people
would continue to produce children. Since this would lead to population growth rates
in excess of the growth in the food supply, people would be pushed down to the
subsistence level. According to Malthus's theory, sustained growth in per capita
incomes was not possible; population growth would always catch up with increases in
production and push per capita incomes down. Of course, today we know that Malthus
was wrong, at least as far as the now industrialized countries are concerned. Still,
his theory was an accurate description of population dynamics before the industrial
revolution, and in many countries it seems to apply even today (Doepke [2012]).
Hence, for Malthus, children were a normal good: when income went up, more children
were 'consumed' by parents. A microeconomic equilibrium model supports this
intuition: an increase in productivity causes a rise in the population, but only
until the wage is driven back down to its steady-state level. Even sustained growth
in productivity will not raise per capita incomes. The population size will catch up
with technological progress and put downward pressure on per capita incomes. This
model explains the relationship between population and output for almost all of
history, and it still applies to large parts of the world today (Doepke [2012]).
But ageing in developed countries occurs in parallel with better health, more
extensive education, and related societal changes. We are not just living longer, we
are slower to age. Boersch-Supan et al. (2005, 2006, 2013) make this precise. They
find that:
• The average healthy life expectancy of men at the age of 65 is larger than 5 years
in every European country.

• Using more than 4.8 million data records of a large insurance company, the authors
measured the productivity of workers of different ages for different types of work:
contract negotiation (the most challenging jobs), standard customer advice, and
repetitive jobs. They found that older workers made more errors in the repetitive
jobs than the younger ones but were significantly more productive in the challenging
jobs than their younger counterparts.
Fact 3. For the average population, the discussion of whether we can work until the
age of, say, 67 is not related to health: from a health point of view, the
retirement age could be raised to 70 years. The tendency to fire older workers
destroys productivity, since the experience of older, motivated workers generates
higher productivity in demanding jobs than is achieved by younger ones.
We spend longer in education; we travel more before permanently joining the
workforce; we start families later. We do not think of ourselves as being as old as
previous generations would have at the same age. The effect of all these changes
taken together is not that society is ageing but that it is getting younger.
Finally, a society with a predominantly young population has a different
productivity level than a more aged population. Syl and Galenson show that 40
percent of productivity increases are down to young people who enter new markets.
These young people break with tradition and manifest new ways of thinking; Google
and Facebook are two prominent examples. Older individuals possess more experience
and wisdom, but Syl and Galenson state that this only gradually changes
productivity.
To manage the emerging demographic regime, innovative policies and new ways of
thinking about population are called for (Romaniuk [2012]). This change in the
structure of society will have many consequences. One of the most significant will
be a labor shortage. If societies are going to maintain their standard of living,
they will have to avoid any reduction in the workforce as a proportion of the total
population. At the same time, many people are going to reach retirement age and
realize that they do not have enough income to maintain what they feel is an
acceptable standard of living. The combination of these two issues will put a lot of
pressure on our current views of the relationship between working and retirement.
Employment and retirement laws designed for a young and growing population no longer
suit populations that are predominantly old but healthy and capable of being
productive, all the more so in a work environment of automated technology.
Prevailing family assistance policies are equally antiquated. Though the maternity
instinct may still be present as it always was, women's conditions have radically
changed. The women of today in developed countries, and throughout the modernizing
world, are faced with many deterrents to maternity (e.g., widespread celibacy,
marital instability, financial insecurity) on the one hand, and with many
fulfilling, financially well-rewarded opportunities on the other - so much so that
they are left with little incentive to trade the latter for the uncertainties of
motherhood.
'It is easier to bring population down than to make it up,' writes John May (2012).
And that is why - in order to escape the sub-replacement fertility trap and to bring
the fertility rate to, and sustain it at, even a generational replacement level
(Romaniuk [2012]) - we need to bring to bear meaningful financial and social rewards
for maternity. The current family allowance and other welfare-type assistance to
families cannot do this. Societies under a demographic maturity regime may need to
have in place permanent, 'life-sustaining' mechanisms to prevent fertility from
sliding ever lower. Instead we need a more balanced resource allocation between
production and reproduction.
The Melbourne Mercer Global Pension Index report (MMGPI [2015]) from the Australian
Centre for Financial Studies and Mercer compared the status of the retirement
systems of 25 countries. The index is based on the following construction; see
Figure 2.8. Although it is called a 'pension index', it considers the entire
retirement system of each country. Figure 2.9 summarizes the results for the 25
countries surveyed.
Figure 2.8: The Melbourne Mercer Global Pension Index (MMGPI [2015]).

Figure 2.9: Summary for the 25 countries in the Melbourne Mercer Global Pension
Index as of 2015 (adapted from MMGPI [2015]).
• Increasing the retirement age. For countries with a high unemployment rate this
is not a feasible alternative.
• Keeping the pay-as-you-go systems and reducing the contribution to the pension
funds.
The asset allocation of pension fund assets differs significantly between countries.
The exposure to growth assets (including equities and property) ranges from less
than 10 percent in India, Korea and Singapore to about 70 percent in Australia,
South Africa, the UK, the US and Switzerland (GlobalPensionIndex [2015]). The more
growth assets are included in the asset allocation, the larger the risks: there were
significant declines in the value of assets in 2010 and 2011, reflecting the
consequences of the global financial crisis of 2007 and 2008. However, since that
time there has been a steady recovery in the level of pension assets in each country
surveyed as equity markets have recovered (GlobalPensionIndex [2015]).
The expansion of investment strategies means, for example, applying factor
investing; all pros and cons of the last sections also apply to the pension system
case. The third possibility is to make some illiquid asset classes accessible to
pension funds; private equity, insurance-linked investments and securitized loans
are the typical examples given.
Example
Asset managers can become more important financial actors by driving the raising of
capital and the capital deployment required to meet the demands of growing
urbanization and cross-border trade. The world's urban population is expected to
increase by 75 percent between 2010 and 2050, from 3.6 billion to 6.3 billion. The
urban profile in the East will see many more 'megacities' (cities with a population
in excess of 10 million) emerging. Today's 23 megacities will be augmented by a
further 14 by 2025, of which 12 will be in emerging markets.
This will create significant pressure on infrastructure. According to the OECD, USD
40 trillion needs to be spent on global infrastructure through 2030 to keep pace
with the growth of the global economy. Some policymakers appear to have taken the
problem on board: in Europe - after considerable debate - the European Long Term
Investment Funds (ELTIF) initiative was finally created in 2013, helping European
asset managers to invest in infrastructure. But infrastructure investments will
disproportionately target emerging markets, and emerging markets' asset managers
have recognized this and have already started to focus on it.
The authors use data from eVestment and limit their analysis to US long-only equity
products, which can be considered to be among the most efficient markets. In the
approximate period 1999 to 2011, one-quarter of these products were recommended
annually by investment consultants and the rest were not. The much larger number of
non-recommended products compared to the recommended ones remains stable across the
different years studied.
Regarding the first question, the authors find that consultants' recommendations are
partly driven by past fund performance, but also by other soft factors such as
service quality and investment quality factors (Jenkinson et al. [2014]): to be
recommended, it is not sufficient to have a strong return history. The authors then
analyze whether the size of the fees charged has an impact on the recommendation
rate; if this were the case, conflicts of interest would be suspected. The analysis
shows that this is not the case. Fees are very similar for recommended and
non-recommended products, independent of the size of the products and their styles
(growth, value, small- and mid-cap). The fees are in line with the fees in Section
2.7.4.3 - that is to say, close to 70 bps for larger products.
The answer to the third question attracted a lot of public attention. The authors
construct equal- and value-weighted portfolio returns of recommended and
non-recommended products. Using the returns of these portfolios, they estimate
one-factor (CAPM), three-factor (FF) and four-factor (FFC) alphas as well as excess
returns over portfolios of selected benchmarks.

For the equally weighted portfolios, the returns of the recommended products were
significantly lower than those of the non-recommended ones, on the order of 1
percent in magnitude, independent of the factor model chosen (see Figure 2.10). For
value-weighted portfolios, different factor models lead to different returns for the
two alternatives: 'Value-weighted returns and alphas are consistently lower,
suggesting that smaller products perform relatively better' (Jenkinson et al.
[2014]). Summarizing the evidence: investment consultants are not able to
consistently add value by selecting superior investment products.
worse. However, after adjusting for different sizes, the explanation turns out to be
wrong.
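A sketch of the kind of alpha estimation described above; the returns and the factor
are simulated here, whereas the study uses the recommended and non-recommended
product portfolios together with standard factor data:

```python
import numpy as np

# One-factor (CAPM) alpha of a product portfolio via OLS regression.
rng = np.random.default_rng(2)

t = 156                                   # monthly observations (1999-2011)
mkt_excess = rng.normal(0.005, 0.04, t)   # market excess return (simulated)
port_excess = 0.0008 + 0.95 * mkt_excess + rng.normal(0.0, 0.01, t)

X = np.column_stack([np.ones(t), mkt_excess])
coef, *_ = np.linalg.lstsq(X, port_excess, rcond=None)
alpha, beta = coef
print(f"CAPM alpha: {alpha:.4%} per month, beta: {beta:.2f}")
# Three- and four-factor alphas (FF, FFC) add size, value and momentum
# factors as further columns of X; Newey-West standard errors would be
# used for the t-statistics reported in Figure 2.10.
```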
Figure 2.10: The table shows the performance of portfolios of actively managed US
equity products that experience a net increase (decrease) in the number of
recommendations in the twelve- or twenty-four-month period following the
recommendation change. Performance is measured using raw returns; returns in excess
of a benchmark chosen to match the product style and market capitalization; and
one-, three-, and four-factor alphas (corresponding to the CAPM, the Fama-French
three-factor model, and the Fama-French-Carhart model). Excess returns and alphas
are expressed in percent per year. All reported figures are gross of fees. The first
part of the table shows the results for equally weighted portfolios of products,
whereas the second part shows the same statistics for portfolios of products
weighted using total net assets at the end of the previous year. t-statistics based
on standard errors - robust to conditional heteroscedasticity and serial correlation
of up to two lags, as in Newey and West (1987) - are reported in parentheses. ***,
**, * mean statistically significant at the 1, 5, and 10 percent levels,
respectively. The benchmarks for the investment products are the corresponding
Russell indices: large cap growth products are benchmarked against the Russell 1000
Growth, small cap value products against the Russell 2000 Value, etc. (Jenkinson et
al. [2014]).
These results raise several questions. First, why do pension funds use - on a
rational basis - investment consultants that add no value? The argument that
consultants act as insurance against being sued is simply not justifiable. Second,
it is difficult to understand why investment consultants are virtually unregulated
in most jurisdictions.
An investment decision today has to meet many more regulatory standards than in the
past. Regulation defines restrictions and rules for decision-making, but it never
sets an AM firm's goals.
Individual regulations can have strategic or operational implications for AM. UCITS,
PRIIPS, EMIR and MiFID II have high operational impacts.1 PRIIPS and MAD II have a
low strategic impact; MiFID II, the Volcker Rule or Dodd-Frank Act, and UCITS have a
high strategic importance. The ability of international banks and large AMs after
the GFC to comply quickly and to integrate the regulatory program into their
strategic planning resulted in a competitive advantage over smaller institutions.
The know-how of the international institutions enables them to participate actively
and efficiently in the technological change. They are almost invulnerable, despite
the many heavy fines imposed on them for the many scandals in recent years.
Example: Impact of regulation on the Swiss banking sector and asset management

Regulatory burden together with broken business models impacts the financial
industry. It is estimated that of the approximately 300 Swiss banks in 2014, about
one-third will stop operating as an independent brand. A KPMG study from 2013 (KPMG
[2013]) summarizes:
1 PRIIPs are the Packaged Retail Investment and Insurance-based investment Products
documents, and UCITS is the Undertakings for Collective Investment in Transferable
Securities Directive for collective investments in the European Union. Obligations
for central clearing and reporting (EMIR, Dodd-Frank) and higher capital
requirements for non-centrally cleared contracts (CRR), as well as the obligation to
trade on exchanges or electronic trading platforms, are addressed by the revision of
MiFID, the so-called Markets in Financial Instruments Regulation (MiFIR). US T+2
means the realization of a T+2 settlement cycle in the US financial markets for
trades in cash products and unit investment trusts (UITs). FIDLEG is part of the new
Swiss financial architecture, which should be equivalent to MiFID II of the euro
zone. In 2013, following the LIBOR and EURIBOR market-rigging scandals, the EU
Commission published a legislative proposal for a new regulation on benchmarks
(Benchmark Regulation). The Asia Derivative Reform mainly focuses on the regulation
of OTC derivatives and should therefore be compared with EMIR and the Dodd-Frank
Act. The Market Abuse Directive (MAD) in 2005 and its update MAD II resulted in an
EU-wide market abuse regime and a framework for establishing a proper flow of
information to the market. BCBS considers principles of risk data aggregation and
reporting by the Basel Committee on Banking Supervision. The Comprehensive Capital
Analysis and Review (CCAR) is a regulatory framework introduced by the Federal
Reserve in order to assess, regulate, and supervise large banks and financial
institutions. EU FTT means the EU Financial Transaction Tax. IRS 871(m) are
regulations of the IRS about dividend equivalent payment withholding rules for
equity derivatives. CRS are the Common Reporting Standards of the OECD for the
automatic exchange of bank account information.
• A total of 23 percent of Swiss banks faced losses in 2012, all of them with AuM of
less than CHF 25 billion.

• Banks that were not profitable in 2012 were mostly not profitable in previous
years either.

• The dispersion between successful banks (large and small ones) and non-performing
banks (small ones) is increasing.

• The performance of small banks is much more volatile than that of larger ones.

• A total of 53 percent of the banks reported negative net new money (NNM).
Small asset managers, many of them firms with fewer than 5 employees, faced a cost
and knowledge problem after the GFC due to the regulatory and legal changes. They
lacked legal and compliance know-how, and it was not profitable for them to hire
specialists in these fields. Similarly, they could not invest in new, scalable
technologies for accounting, strategy construction, performance calculation and
attribution, etc. Both factors led to platform-as-a-service (PaaS) innovations,
where the different services are outsourced and bought by connecting via API
technology.
Many of the regulatory initiatives launched in recent years are related to asset
management and trading. We consider the eurozone. The Alternative Investment Fund
Managers Directive (AIFMD) mainly acts in the hedge fund sector, whereas UCITS is
key for the fund industry. EMIR regulates the OTC derivative markets, and the PRIIPS
initiative is responsible for the key information for retail investors in the
eurozone. MiFID II provides harmonized regulation for investment services across the
member states of the EU, with one of the main objectives being to increase
competition and consumer protection in investment services. In the US, the
Dodd-Frank Act is the counterpart of many European initiatives.
Regulatory initiatives place greater demands on asset managers and their service
providers. They force changes in the areas of customer protection, agreements with
service providers, disclosure of regulatory and investor information, distribution channels,
trade transparency, and compliance and risk management functions (PwC [2015]).
2.4.1 MiFID II
The MiFID II Directive implements the 2009 G20 Pittsburgh Summit Agreement in the
euro area and applies to all non-EU financial intermediaries offering investment
products in the eurozone. It requires the adoption of 32 legal acts by the European
Commission, 47 regulatory standards, 14 performance standards and 10 packages of
measures.2

2 Similar remarks also apply to other regulatory initiatives such as the Dodd-Frank
Act in the US. Its implementation requires the creation of 398 new rules governing
financial activities, disclosures and processes, the conduct of 67 studies, and the
issuing of 22 periodic reports. The law itself consists of 2'300 pages, without
counting the final implementation documents.

MiFID II's main objectives include:
• The creation of a robust framework for all financial market players and financial
instruments.

• Improving the supervision of the various market segments and market practices, in
particular OTC financial instruments.
Investor protection is based on four topics. First, inducements: the need to
disclose the independent versus non-independent status of advice and the prohibition
for discretionary managers and independent advisers to be involved in inducements.
Second, product governance means that the manufacturer's product approval process
has to include the target market definition, which has to be taken into account by
the distributors and tracked by the asset managers. Third, suitability and
appropriateness requires all investment firms operating in EU countries to provide
clients with adequate information for assessing the suitability and appropriateness
of their products and services, and to comply with best execution obligations.
Finally, client information requires that enhanced information is shared with
clients, regarding both content and method, in particular the costs and charges for
services or advice.
• Execution only: Investors decide themselves and investment firms only execute
orders.

• Advisory: Investors and investment firm staff interact. While relationship
managers or specialists advise the investor, the investment decision is finally made
or approved by the investors themselves. Advisory was the traditional
intermediation channel before the financial crisis of 2007.
Figure 2.11: Client segmentation and intermediation segmentation as per MiFID II.
• Mandate: The investor delegates the investment decision in a mandate. The mandate
contract reflects the investor's preferences. The portfolio manager chooses
investments within the contracted limits. Many banks and asset managers motivated
their clients to switch from the advisory to the mandate channel after the GFC. The
main reasons for this are lower business conduct risk and better opportunities for
automation; these reduce production costs and enhance economies of scale. Since the
active portfolio managers are benchmarked against the CIO's TAA mandates, they face
the same problems as actively managed funds - most of them will turn out to be
zero-alpha funds; see Section 4.6.6.3. This will motivate many customers to move
back to the advisory or execution-only channel.
Client segmentation. Investment firms must define written policies and procedures
according to the following categorization:
Wealth as the sole variable for the classification of customers is no longer applicable. Customers can opt up or down, i.e. they can choose a more or less stringent protection category. Suitability and appropriateness requirements are defined in each
cell of the 3×3 segmentation matrix (Figure 2.11). Client suitability addresses the
following six points:
1. Information on clients
5. Investment objective
These six points reflect the parameters that define the optimization problem of a rational economic investor. To determine the preferences of an investor, one needs general information about the investor (4.1) and specific risk attitudes (6), which both enter into the objective function (5). The optimization of the objective function leading to the optimal investment rule is carried out under various restrictions: the budget restriction (4) and restrictions of admissible securities due to their complexity or the experience of the investor (3). Tax issues, legal constraints, and compliance issues also enter into the restriction set and require information to be provided to the client (4.3). These six points are therefore sufficient for the investor to determine his or her optimal investment strategy.
Client product suitability consists of requirements that ensure that the product
is suitable:
4. Disclaimer
These requirements become less demanding the more experienced the client is. Suitability in advisory services requires qualified staff and an appropriate incentive structure in the asset management firm.
A monitoring process follows the product over its life cycle and compares the risk and return properties with the initially defined client profile. If necessary, this process sends warning or action messages to the client and/or advisor. A CIO view typically consists of several inputs, such as a quantitative model, a macro research view, and a market view. Smaller institutions do not have the resources to provide all these inputs. They then buy the CIO view from another bank.
Figure 2.12: An investment process. The three channels from left to right are the client - advisor channel, the investment office, and the producers of the assets or portfolios (trading and asset management).
New trends in technology allow the process outlined in Figure 2.12 to be reshaped. In extremis, there will be no need for an investor to disclose his or her investment preferences since the data already exist in the virtual world. If, furthermore, the investment views are formed in a fully automated manner using publicly available data, then the functions both of the advisor and of the CIO become superfluous. Digital money managers entice with low barriers to entry. Their performance is impressive. Their greatest weakness is understanding the customer.
2.4.3 Robos
Selma looks young and trendy. Her white-blond hair is formed into a casual bob hairstyle. The glasses are tinted, the lipstick is bright red. Selma keeps smiling and winking at you. She is immediately on familiar terms with everyone. 'Hi! I'm Selma. Let's have a quick chat about your finances,' she writes. Selma is warm - but not a person of flesh and blood. She is the digital investment assistant of Selma Finance, a Robo Advisor. Robo Advisors have come to challenge traditional asset management and reinvent it with technical assistance. The idea is to take the complexity out of classic financial services in a playful way. Since the appearance of the iPhone in 2008, little by little, all areas of everyday life are being digitized, driven by technological progress and the belief that computers not only perform tasks faster and cheaper, but also better with the help of artificial intelligence. This section is based on Gerbl (2019).
2.4.3.1 Markets
The digitization of wealth management started in the USA and Great Britain after the GFC: a retail customer in the UK practically no longer gets investment advice, since cost and compliance pressure after the GFC forced banks to change their business model. A niche opened up that was quickly filled by financial service providers. Over 100 Robo Advisors are now active in the USA. Companies such as Vanguard, Charles Schwab, Betterment and Wealthfront dominate the market. ETF giant Vanguard's Robo alone is responsible for USD 120 billion in investment money. In total, Robos manage more than 800 billion dollars. The Robos are forecast to manage around USD 2.2 trillion in 2023; one-third of BlackRock's 2018 AuM.
In Switzerland or Germany, investors receive closer care. Every regional bank picks up the customer and covers him with products, with technology supporting the relationship manager (RM). Hence, a hybrid model applies so far. This evidently scales much less than the digital world in Anglo-Saxon or Scandinavian countries. But proponents of the hybrid model base their business approach on the assumption that wealth management is not bought, but sold. This does not usually happen with Robos, so there are no huge inflows in Switzerland or Germany so far. Currently, 200 million francs are being managed for end customers on Robo platforms in Switzerland (Gerbl (2019)). Is this cultural evidence strong enough to outweigh the advantages of the Robos?
Trading is also offered by some firms. In any case, the information structure used to form the portfolios follows the EMH as an anchor (say, mean-variance optimization using historical estimation of the input variables), with individual views and preferences as well as collective market participants' views as an overlay. They therefore do not engage in more expensive active management with its doubtful performance track record. So far there is not much intelligence in the Robos. But they follow the investment strategy of the customer, and rebalancing is therefore a service which Robos offer.
Clients are offered visualizations to change the weights in the portfolio construction following their preferences. Anecdotal evidence suggests that every second customer adopts such pattern strategies. Such patterns cannot be considered passive investment any longer. In fact, more and more Robos are offering such active components for wealthier and more experienced investors. They can playfully simulate different strategies and compare them by backtesting. The programme decides whether the wishes are fulfilled or not.
Costs are an important component of whether or not it pays to use Robos. In comparison to traditional asset management mandates with costs usually well above one percent, Robos with flat fees of 0.68 percent are a cheaper alternative. In the USA, the average cost of Robos is less than 0.4 percent. How profitable are Robos? Agnesens (2019) states that 'Since the beginning of 2000, Robo strategies have yielded up to two percent better returns than mixed funds every year.' She compared strategy funds and different Robo-advisory model portfolios, see Figure 2.13.
The differences increase with increasing risk, which is due to increasing costs for strategy funds facing increased risk. In Agnesens' comparison the Robos before costs are even slightly ahead of the strategy funds.
What are the main criticisms of Robos? First, it is claimed that many investors do not understand Robos, i.e. the Robos are primarily concerned with the investment side and less with the client. Second, it is claimed that Robos don't know their customers well enough and don't explain to them what's going on in the markets and with their investments. There is no deeper understanding of the customer and his needs. A Robo Advisor is far from the personal advisor you ideally know for many years. If this holds true, this can become a problem in turbulent markets. But one can also argue that knowing customers better was not of any help in the financial crisis.
Figure 2.13: Strategy funds versus Robo Advisory strategies after costs (annual return in percent versus annual volatility in percent). The blue dots represent strategy funds, the red symbols Robo Advisors. The red circles are model portfolios with up to 25% equity, the diamonds between 25% and 50%, and the squares portfolios with more than fifty percent equity (Agnesens [2019]).
Institutional investors such as pension funds must decide whether to delegate the TAA to external portfolio managers in the form of a mandate or whether they will manage the assets within the fund. Furthermore, the benchmark and the definition of risk-based areas for tactical asset allocation have to be determined. It must also be decided whether the reporting, administration, and risk controlling functions of the investment portfolios should also be outsourced. In case of outsourcing, requests for proposal are used. The entire investment decision outsourcing process is conducted with the involvement of external consultants. Goyal and Wahal (2008) estimate that 82 percent of US public pension funds use pension consultants.
We discuss in Section 2.3.2.4 that the extensive use of investment consultants is by no means free of conflicts of interest, both for the performance of the delegated investments and for the selected asset managers. Critics, for example, often accuse them of being drivers of new investment strategies which turn out to be more complex (hence more difficult to handle and understand, and also more expensive) than those actually used, while it is not clear whether they lead to better performance.
The other steps in the process, as illustrated and described in the last figure, are evident.
Example
The Financial Stability Board (FSB) stated in 2013: One of the key lessons from the crisis was that reputational risk was severely underestimated; hence, there is more focus on business conduct and the suitability of products, e.g., the type of products sold and to whom they are sold. As the crisis showed, consumer products such as residential mortgage loans could become a source of financial instability. The FSB considers, among others, the following elements of a sound risk culture:
• Tone from the top: The board of directors and senior managers set the institution's core values and risk culture, and their behaviour must reflect the values being espoused.
• Incentives: Financial and non-financial incentives should support the core values and risk culture at all levels of the financial institution.
Conduct risk is a real source of risk for investment firms: fines worldwide amounted to more than USD 100 billion for the period 2009-2014. These fines and the new regulatory requirements raise serious profitability concerns for investment firms and banks (see Figure 8). But there is more than just financial cost at play for the intermediaries: a loss of trust in large asset managers and banks can prove disastrous, in particular if new entrants without any reputational damage can offer better services thanks to FinTech. Figure 2.15 shows the evolution of the fines imposed by the British regulatory authorities (Left Panel) and the global value of fines. One sees that it took about three years after the GFC to charge the fines to the banks, insurance companies, and asset managers. The global figures now exceed USD 230 bn since the start of the GFC. The horizontal lines in the histogram show how large the individual fines were. One sees, for example, that there was a fine in 2014 of more than USD 15 bn for a single institution.
In the US, enforcement statistics from the Securities and Exchange Commission (SEC) show an increase in enforcement actions in the category investment advisor/investment company of roughly 50% following the GFC. Compared to the pre-crisis figures of 76 and 97 cases per year, respectively, 2011-2014 returned respective figures of 130 and 147 cases.
Figure 2.15: Left Panel: Table of fines imposed in the UK (FSA and FCA web pages). Right Panel: Global value of fines (FT research, June 2015).
Patton et al. (2013) show that disclosure requirements for hedge funds are not sufficient to protect investors. The SEC, for example, requires US-based hedge funds managing over USD 1.5 billion to provide quarterly reports on their performance, trading positions, and counterparties. The rules for smaller hedge funds are less detailed. Instead, one has to care seriously about the quality of the information disclosed. Are these voluntary disclosures by hedge funds reliable guides to their past performance?
Vintage analysis refers to the process of monitoring groups and comparing performance across past groups. These comparisons allow deviations from past performance to be detected. The authors find that in successive vintages of these databases, older performance records (as far back as 15 years) of hedge funds are routinely revised: nearly 40 percent of the 18,382 hedge funds in the sample have revised their previous returns by at least 0.01 percent at least once, and over 15 percent of funds have revised a previous monthly return by at least 1 percent. These are very substantial changes, given the average monthly return in the sample period is 0.64 percent.
Less than 8 percent of the revisions are attributable to data entry errors. About 25 percent of the changes were based on differences between estimated values at the reporting dates for illiquid investments and true prices at later dates. Such revisions can be reasonably expected. In total, 25 percent (50%) of the revisions relate to returns that are less than three months old (more than 12 months old). They find that negative revisions are more common, and larger when they do occur, than positive ones. They conclude that on average initially provided returns signal a better performance compared to the final, revised performance. These signals can therefore mislead potential investors. Moreover, the dangerous revision patterns are significantly more likely for funds-of-funds and hedge funds in the emerging-markets style than for other hedge funds.
Can any predictive content be gained from knowing that a fund has revised its history of returns? Comparing the out-of-sample performance of revising and non-revising funds, Patton et al. (2013) find that non-revising funds significantly outperform revising funds by around 25 basis points a month.
• show on an ad hoc basis when a portfolio is more than the sum of the parts - that
is, more return and less risk;
Table 2.3: Average annual returns and standard deviations of the asset classes and growth of capital after 88 years. The calculation logic is $71'239 = 100\,(1 + 0.0775)^{88}$.
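As a quick plausibility check of the compounding logic, the end value can be reproduced in a few lines of Python; this is an illustrative sketch, with 7.75% taken as the rate implied by the quoted end value.

```python
# Illustrative sketch: reproduce the growth-of-capital logic of Table 2.3.
# The 7.75% annual return is the rate implied by the quoted end value.
initial_capital = 100.0
annual_return = 0.0775
years = 88

final_capital = initial_capital * (1 + annual_return) ** years
print(f"Final capital after {years} years: CHF {final_capital:,.0f}")  # ~71,239
```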
Figure 2.16 shows the distribution of return and risk, measured by the standard deviation, over 88 years of investments.
In the long run, equity had in most economies higher returns and risks than its bond counterparts. We discuss below why, nevertheless, the advice to invest only in stocks if the investor has a long-term horizon is not an optimal strategy. Furthermore, a small difference in the average return creates a large difference in wealth accumulation: the compounding effect. Finally, gold has in this long period a large risk component but only a small average return. This first analysis allows us to consider diversification next.
Figure 2.16: The distribution of return and risk, measured by the standard deviation,
over 88 years of investments. The square marks represent equity, the diamonds bonds,
the triangle is cash, and the circle is gold (data from Kunz [2014]).
The four strategies follow a heuristic approach: the weights are not optimally chosen using a statistical model but are fixed based on heuristics (experience). We form four portfolio strategies - called conservative, balanced, dynamic, and growth - see Table 2.4.
Strategy                                Conservative  Balanced  Dynamic  Growth
Equity                                  25%           50%       75%      100%
  CH                                    10%           20%       30%      40%
  Rest of world total (six countries)*  15%           30%       45%      60%
  Rest of world per country             2.5%          5%        7.5%     10%
Bonds                                   75%           50%       25%      0%
  CH                                    66%           44%       22%      0%
  Rest of world total (six countries)*  9%            6%        3%       0%
  Rest of world per country             1.5%          1%        0.5%     0%
Table 2.4: Investment weights in four investment strategies (data from Kunz [2014]). *Investment in G, F, I, J, USA, UK.
Using data from Figure 2.16 for the different asset classes, we get the returns in Table 2.5.
Table 2.5: Average annual return, risk, and wealth growth for the four investment strategies.
Figure 2.17 shows that a combination of the risk and return figures of basic asset classes can lead to a portfolio from which more return can be expected for the same risk, or less risk for the same return. The green marks for the investment strategies form a virtual boundary line. In fact, the Markowitz model is an example where there is an efficient frontier such that, within this model approach, no portfolio construction can offer more return and lower risk than any portfolio on the efficient frontier.
Figure 2.17: Distribution of return and risk, measured by the standard deviation, over
88 years of investments. The square marks represent equity, the diamonds bonds, the
triangle is cash, and the circle is gold. The dots represent the four investment strategies
- conservative, balanced, dynamic, and growth (data from Kunz [2014]).
Consider the first question. Often employees own many stocks of their employer directly or indirectly in their pension scheme. Such stock concentration can be disastrous. Enron employees, for example, had over 60% of their retirement assets in company stock. They faced heavy losses when Enron went bankrupt. Diversification reduces such idiosyncratic risk.
For two assets with weights $\varphi_1, \varphi_2$ and volatilities $\sigma_1, \sigma_2$, portfolio variance reads
$$\sigma_p^2 = \varphi_1^2\sigma_1^2 + \varphi_2^2\sigma_2^2 + 2\varphi_1\varphi_2\,\rho\,\sigma_1\sigma_2$$
with $\rho$ the correlation between the two assets. Portfolio risk becomes additive only if the assets are not correlated. A negative correlation value reduces portfolio risk, which motivates the search for negatively correlated risks. If correlation is $-1$, the portfolio variance becomes a complete square and can be eliminated completely in the two risky asset case by solving $\sigma_p^2 = 0$ w.r.t. the strategy. If correlation is $+1$, which is typical for many asset classes when markets are under stress, portfolio risk is maximal.
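A minimal sketch of this two-asset logic, with illustrative volatilities: for $\rho = -1$ the variance is a complete square, and the weight $\varphi_1 = \sigma_2/(\sigma_1 + \sigma_2)$ eliminates portfolio risk entirely.

```python
import numpy as np

# Two-asset portfolio volatility as a function of the correlation rho.
# The volatilities are assumptions for illustration.
s1, s2 = 0.20, 0.30

def portfolio_vol(w1, rho):
    w2 = 1.0 - w1
    var = w1**2 * s1**2 + w2**2 * s2**2 + 2.0 * w1 * w2 * rho * s1 * s2
    return np.sqrt(var)

w1_star = s2 / (s1 + s2)   # risk-eliminating weight when rho = -1
for rho in (-1.0, 0.0, 1.0):
    print(f"rho = {rho:+.0f}: vol at w1 = {w1_star:.2f} is {portfolio_vol(w1_star, rho):.4f}")
# Output: zero volatility for rho = -1, maximal volatility for rho = +1.
```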
Figure 2.18: Pair-wise correlations over time for different asset classes (Goldman Sachs [2011]).
Elton and Gruber (1977) show that the individual risk of stocks could be reduced from 49 percent to 20 percent by considering 20 stocks per market. Adding another 980 stocks only reduces risk further to 19.2 percent. The effect of adding more and more assets has a diminishing impact on risk diversification.
Proposition 4. Assume N uncorrelated asset returns and equally weighted (EW) investment, that is, $\varphi_k = 1/N$ for all assets. Increasing the number of assets N reduces portfolio risk $\sigma_p^2$ arbitrarily and monotonically.
The EW assumption is not necessary but facilitates the proof. To eliminate portfolio risk completely in a portfolio with uncorrelated returns, one only has to increase the number of assets in the portfolio. The proof reads:
$$\sigma_p^2 = \mathrm{var}\Big(\sum_{j=1}^{N} \frac{1}{N} R_j\Big) = \frac{1}{N^2}\sum_{j=1}^{N} \mathrm{var}(R_j) \le \frac{Nc}{N^2} = \frac{c}{N}$$
with c the largest variance. If assets are correlated to each other, which removes an unrealistic assumption in the last proposition, then portfolio risk can no longer be diversified away completely. The proof is only slightly more complicated than the former one and leads to
$$\sigma_p^2 = \frac{\overline{\mathrm{var}}}{N} + \Big(1 - \frac{1}{N}\Big)\overline{\mathrm{cov}}\,,$$
with $\overline{\mathrm{var}}$ the average variance and $\overline{\mathrm{cov}}$ the average covariance of the asset returns.
Hence, covariances prove more important than single asset variances in determining the portfolio variance. Taking the derivative of the portfolio variance w.r.t. the number of assets N, the sensitivity becomes proportional to $-1/N^2$. Adding a further asset to N = 4 assets reduces portfolio risk by 1/25; adding another asset to N = 9 assets, the reduction is only 1/100. Therefore, reducing portfolio risk by adding new assets becomes less and less effective the larger the portfolio is.
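The decomposition $\sigma_p^2 = \overline{\mathrm{var}}/N + (1 - 1/N)\,\overline{\mathrm{cov}}$ can be visualized with a short sketch; the average variance and covariance are assumed values.

```python
# Sketch of the diminishing diversification effect for an EW portfolio:
# sigma_p^2 = avg_var / N + (1 - 1/N) * avg_cov (assumed averages).
avg_var, avg_cov = 0.04, 0.01

for n in (1, 4, 10, 100, 1000):
    var_p = avg_var / n + (1.0 - 1.0 / n) * avg_cov
    print(f"N = {n:5d}: portfolio variance = {var_p:.4f}")
# The variance converges to avg_cov = 0.01: the systematic part of
# portfolio risk that cannot be diversified away.
```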
We now show that the two-asset intuition does not carry over to three or more assets. Consider an investor who wants to increase the return on investment by selling volatility and correlation of two stocks S1 and S2. He sells the risk that any of the two stocks breaches a barrier level in a specified time period. The price for this sold volatility and correlation risk is transformed into a fixed coupon which the investor receives. The sold option is a down-and-in put option since the barrier level is typically lower than the strike of the option, and the option has a value different from zero only if the barrier is breached ('in'). Barrier reverse convertibles are a wrapper for such a payoff. An investor gets at maturity his invested amount plus the coupon if there was no breach, or the coupon plus the lowest stock value at maturity in case of a breach. The higher the probability of a breach, the higher the coupon to the investor.
Suppose that both stocks can move up and down with the same probability. If the correlation is +1, the chance of a barrier breach is 50% - either both move up, or both move down and breach. If it is −1, the probability is 1, since one stock has to go down and breach the barrier. If they are uncorrelated, the probability is 75%, since of the four possible states only the state with both stocks up avoids a breach. Hence, for two assets, the more negatively correlated the assets are, the higher the risk of breaching the barrier and therefore the higher the coupon.
Consider the same investment with 3 stocks. The intuition of the two-asset case does not generalize. Given 3 assets there are 3 pairwise correlations. That all three correlations equal −1, which would lead to the highest coupon, is not possible. If two correlations are −1, then the third one has to be +1. This shows that the 2-asset logic does not extend to the three-asset case.
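This feasibility constraint is equivalent to the requirement that a correlation matrix be positive semi-definite. A small sketch checks which three-asset correlation patterns are admissible:

```python
import numpy as np

# Check whether a 3x3 correlation matrix is positive semi-definite (PSD),
# i.e. whether the pairwise correlation pattern is feasible at all.
def is_feasible(rho12, rho13, rho23):
    m = np.array([[1.0, rho12, rho13],
                  [rho12, 1.0, rho23],
                  [rho13, rho23, 1.0]])
    return bool(np.all(np.linalg.eigvalsh(m) >= -1e-12))

print(is_feasible(-1.0, -1.0, -1.0))  # False: all three at -1 is impossible
print(is_feasible(-1.0, -1.0, +1.0))  # True: two at -1 force the third to +1
```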
A portfolio context is used since building a risk model on, say, 10'000 individual assets would mean considering 10'000 models. Therefore, a risk model is built for all assets.
Traditionally, risk is defined as the variance of returns. Most risk models in asset management are based on linear multi-factor return models. These models are simple, clear and tractable. The hope is to capture the dependency structure between the many assets by considering a much smaller number of factors. Factors should be independent of one another. If we have N assets, the dimension of the covariance matrix N(N − 1)/2 is reduced to K + N(K + 2) if there are K factors. Formally, for asset i out of N assets, a generic linear model reads
$$R_i = \alpha_i + \sum_{k=1}^{K} \beta_{ik} F_k + \epsilon_i \qquad (2.2)$$
with $D^2$ the diagonal idiosyncratic covariance matrix with the variances of the idiosyncratic risks $\epsilon$ as entries; for normalized factors, the factor covariance matrix $C_F$ equals the identity matrix I. The (N × K) matrix $\beta$ is the loadings matrix. The dynamics (2.2) implies for the asset covariance matrix
$$C = \beta C_F \beta' + D^2\,.$$
Consider four assets whose correlation matrix indicates that the first and second assets, as well as the third and fourth assets, are driven by the same risk factor, while the other correlations are of the same order of magnitude. Instead of considering (4 × 3)/2 = 6 correlations, one would start with a two-factor model.
The linear factor model for the assets transforms into the same functional form for portfolios. Let $\varphi_j$ be the portfolio weights (long or short) which add up to 1. Portfolio risk reads
$$\sigma_p^2 = \langle \varphi, C\varphi\rangle = \langle \varphi, (\beta C_F \beta' + D^2)\,\varphi\rangle\,.$$
Therefore, a risk model specification means to fix the factor covariance matrix $C_F$, the factor exposures $\beta$, and the residual risks $D^2$.
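A minimal sketch of such a risk model specification, with illustrative loadings, factor covariance, and residual risks:

```python
import numpy as np

# Two-factor linear risk model: C = beta @ C_F @ beta.T + D2.
# All numbers are illustrative assumptions.
beta = np.array([[1.0, 0.0],
                 [0.9, 0.1],
                 [0.1, 1.0],
                 [0.0, 1.1]])                 # (N x K) loadings matrix
C_F = np.array([[0.040, 0.004],
                [0.004, 0.020]])              # (K x K) factor covariance
D2 = np.diag([0.010, 0.012, 0.008, 0.015])    # diagonal idiosyncratic variances

C = beta @ C_F @ beta.T + D2                  # implied (N x N) asset covariance

phi = np.array([0.25, 0.25, 0.25, 0.25])      # portfolio weights
print(f"portfolio volatility: {np.sqrt(phi @ C @ phi):.2%}")
```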
• First, factors or betas are not specified. Then a statistical factor model is used, such as an Asset Pricing Theory (APT) model. A model provider is Sungard. Principal Component Analysis (PCA) is used for the estimation. Statistical factor models are the best in-sample performing ones by construction. The resulting factors are difficult to interpret and they can vary strongly. The models are not meaningful in wealth management, where portfolio risk has to be explained, but they are used in trading thanks to their precision for short time horizons, which circumvents the instability problem.
• Second, factors are dened and betas are estimated by a time-series regression.
This set-up is used by UBS, Blackrock, swissQuant, Quantec, R-Squared.
• Third, betas are defined and factors are estimated using a cross-sectional regression. Providers of this model are Barra, Axioma, Bloomberg.
The second and third models both lose information, i.e. estimation error enters the risk model either in the stock betas or in the factor returns and covariances. In the second method, the estimation errors in the betas are diversified away on the portfolio level if N is large. This is not true for the third model: estimation risk on the portfolio and individual asset level are the same. Both methods assume that the variables in the estimation are observable.
The time-series model (type 2) can only be used when the stock betas are stable over time. But style investing (factor investing) assumes that betas are not stable. In risk models where style factors enter, a hybrid approach is necessary - one part for the stable model (second model) and one part for the style part (third model).
In crisis periods, correlations increase towards one, which means that assets move in the same direction; we summarize this as: diversification disappears. This observation holds for individual stocks, country equity markets, global equity industries, hedge funds, currencies, and international bond markets. Basically, correlation seems to align in the left tail of the risk distribution over all assets. Hence, using full-sample correlations does not account for this tail behaviour and is misleading. Prudent investors therefore use additional risk figures such as downside risk measures and scenario analyses. Chua et al. (2009) documented significant undesirable correlation asymmetries for a broad range of asset classes: correlations increase on the downside and significantly decrease on the upside. This is exactly the opposite of what investors want: not all assets moving downwards in a crisis, but all moving upwards in a boom. If diversification fails in a crisis, diversified portfolios may have greater exposure to loss than more concentrated portfolios. Leibowitz and Bova (2009) showed that during the GFC a diversified portfolio underperformed a simple 60% US stocks/40% US bonds portfolio by 9 percentage points.
There are different ways of measuring correlation in the tails. Longin and Solnik (2001) and Chua et al. (2009) used double conditioning, i.e. they isolate months during which both assets moved (up or down) by at least a given percentage. Page and Panariello (2018) condition only on a single asset:
$$\rho(\theta) = \begin{cases} \rho(x, y \mid x > \theta), & \theta > 0 \\ \rho(x, y \mid x < \theta), & \theta < 0 \end{cases}$$
where x, y represent the two assets and θ is the return threshold which partitions the data. This anti-symmetric single-asset conditioning measures differences in tail correlations based on which market drove the selloff.
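A sketch of this conditional correlation on simulated data (bivariate normal with unconditional correlation 0.6; assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
x, y = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=10_000).T

def conditional_corr(x, y, theta):
    # Condition on the single asset x exceeding (or falling below) theta.
    mask = x > theta if theta > 0 else x < theta
    return np.corrcoef(x[mask], y[mask])[0, 1]

for theta in (-1.5, -0.5, 0.5, 1.5):
    print(f"theta = {theta:+.1f}: rho(theta) = {conditional_corr(x, y, theta):+.3f}")
# Under normality the conditional correlations drop symmetrically in both
# tails; empirical data instead show the asymmetry of Figure 2.19.
```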
Figure 2.19: US equity correlation with international equity using monthly data Jan
1970 to Jun 2017. Shown are conditional correlations by percentile based on US stock
returns between US stocks (MSCI US Total Return Index) and non-US stocks (MSCI
EAFE Total Return Index). The dotted line shows the correlation prole that we would
expect if both markets were normally distributed. Empirical conditional correlations are
adjusted by the data-augmentation methodology. Page and Panariello (2018).
The authors repeat the analysis for hedge fund strategies and for risk factors, all with similar results of asymmetry.
These facts have several implications. First, if a portfolio manager has a proven track record of forecasting market and asset movements within a certain confidence, then he should pick stocks on the upside and buy a protective put for the expected downside. These assumptions are in most cases not valid: either market movements come as a surprise or stock picking capabilities do not exist. Then one approach is, first, to take risk management seriously, i.e. to analyze the tail behaviour if markets boom or are under stress, and, second, to trade with discipline within the given risk governance framework.
The following diversification measures apply to long-only, non-leveraged portfolios. Tasche's diversification index is
$$TA(\varphi) = \frac{\sqrt{\langle \varphi, C\varphi\rangle}}{\langle \varphi, D\rangle}\,, \qquad (2.5)$$
where D is the vector of volatilities. The numerator is equal to the portfolio risk term in the Markowitz model (4.1). The most diversified portfolio (MDP) minimizes the diversification index of Tasche, see Choueifaty and Coignard (2008). The diversification ratio is defined by
$$DR(\varphi) = \frac{1}{TA(\varphi)}\,, \qquad (2.6)$$
i.e. the ratio of the weighted average of volatilities divided by the portfolio volatility. This ratio is at least one, and equal to one only if all wealth is invested in a single asset. Given a set of constraints M, the MDP is the portfolio which maximizes the diversification ratio under the set of constraints. If the expected returns of the assets are proportional to their volatilities, expected returns replace the numerator $\langle \varphi, D\rangle$ in DR. Then maximizing DR is the same as maximizing the Sharpe ratio of the portfolio, and the MDP is the tangency portfolio.
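A short sketch of the diversification ratio for an equally weighted and a single-asset portfolio (volatilities and a uniform correlation assumed for illustration):

```python
import numpy as np

vols = np.array([0.25, 0.22, 0.14, 0.30])   # assumed asset volatilities
rho = 0.5                                   # uniform correlation (assumed)
C = rho * np.outer(vols, vols)
np.fill_diagonal(C, vols**2)

def diversification_ratio(phi):
    # DR = weighted average volatility / portfolio volatility, see (2.6)
    return (phi @ vols) / np.sqrt(phi @ C @ phi)

print(f"EW portfolio: DR = {diversification_ratio(np.full(4, 0.25)):.3f}")             # > 1
print(f"single asset: DR = {diversification_ratio(np.array([1., 0., 0., 0.])):.3f}")   # = 1
```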
The Sharpe ratio is defined as
$$SR(R) = \frac{E(R)^+}{\sigma(R)} \ge 0 \qquad (2.7)$$
with R a general return (absolute, relative, net, gross) and $A^+ = \max(A, 0)$.
Often the Sharpe ratio is not constrained to be positive. But the ratio is not very meaningful for negative values, since for a fixed negative return, the higher the risk, the higher the Sharpe ratio. Assuming log-normal returns, the square-root scaling rule implies that the Sharpe ratio scales with $\sqrt{T}$ for an increasing time horizon, while the market price of risk (MPR) is time-scale invariant. While conceptually simple, there are many different interpretations and calculation methods for the Sharpe ratio: should one use linear or log returns, how does one scale the Sharpe ratio properly from one time horizon to another, and what are the industry standards in the calculation of the ratio? The widely used square-root scaling rule only holds in the IID case, see the Section Risk Scaling. For non-IID returns the situation is more complex and Lo (2003) is the reference to follow.
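A sketch of the square-root scaling rule for IID returns, with assumed monthly figures:

```python
import numpy as np

# Square-root time scaling of the Sharpe ratio under IID returns.
mu_monthly, sigma_monthly = 0.005, 0.04   # assumed monthly excess return and vol

sr_monthly = mu_monthly / sigma_monthly
# Means scale with T, volatility with sqrt(T), so the SR scales with sqrt(T):
sr_annual = sr_monthly * np.sqrt(12)
print(f"monthly SR: {sr_monthly:.3f}, annualized SR: {sr_annual:.3f}")
# For serially correlated returns this rule requires a correction; see Lo (2003).
```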
Weight concentration is maximal if all weights but one are zero. Risk concentration is minimal if the portfolio weights are equally weighted. The Herfindahl index, which is similar to the Gini index, is defined by
$$\text{Herfindahl Index} = \sum_{k=1}^{N} \varphi_k^2\,. \qquad (2.8)$$
It takes the value 1 in the case of maximum concentration and $1/N = N \cdot (1/N)^2$ in the EW portfolio case.
A further diversification measure is the entropy of the portfolio weights,
$$S(\varphi) = -\sum_{k=1}^{N} \varphi_k \ln \varphi_k\,. \qquad (2.9)$$
To understand entropy measurement, consider two dice - one symmetric and the other distorted. The outcome for the symmetric one is more uncertain than for the other die. Shannon axiomatized this notion of uncertainty in the 1940s in the context of information theory. He proved that the above function $S(\varphi)$ is the only one which satisfies his eight axioms describing uncertainty.
In finance, entropy measures how close different probability laws are to each other. The prior and the posterior distribution in the Black-Litterman model are an example. The space of probability laws is just a set and not a vector space. It is not trivial to find a reasonable measuring stick for the nearness of, say, two normal distributions, one with mean 0.1 and variance 0.2 and the other with mean 0.2 and variance 0.1. The relative entropy S(p, q), the Kullback-Leibler divergence (KLD), for two discrete distributions p and q, defined by
$$S(p, q) = -\sum_k p_k \ln\Big(\frac{q_k}{p_k}\Big)\,, \qquad (2.10)$$
measures the similarity of two probability distributions. In machine learning, KLD measures the information gain achieved if q is used instead of p. In Bayesian inference, KLD is a measure of the information gained by revising one's beliefs from the prior probability distribution q to the posterior p. It is the amount of information lost when a model distribution q is used to approximate the true data distribution p. Although KLD measures the nearness of two distributions, it is not a metric, since it is neither symmetric nor does it satisfy the triangle inequality.
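The three measures (2.8)-(2.10) in a compact sketch, with illustrative weight vectors:

```python
import numpy as np

def herfindahl(phi):
    return float(np.sum(phi**2))               # 1 for full concentration, 1/N for EW

def entropy(phi):
    return float(-np.sum(phi * np.log(phi)))   # maximal (ln N) for EW weights

def kld(p, q):
    return float(-np.sum(p * np.log(q / p)))   # relative entropy S(p, q), see (2.10)

ew = np.full(4, 0.25)
tilted = np.array([0.70, 0.10, 0.10, 0.10])
print(herfindahl(ew), herfindahl(tilted))   # 0.25 versus 0.52
print(entropy(ew), np.log(4))               # both equal ln(4) = 1.386
print(kld(tilted, ew), kld(ew, tilted))     # differ: the KLD is not symmetric
```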
Roncalli (2014) illustrates the different notions of diversification. There are 6 assets with volatilities 25%, 22%, 14%, 30%, 40%, and 30%, respectively, and the same returns.
Table 2.6: Comparison of the global minimum variance (GMV), equal risk contribution (ERC), most diversified (MDP), and equal weights (EW) portfolios. All values are percentages (Roncalli [2014]).
Since correlation is uniform except for one asset, it is 'overlooked' in the GMV allocation. Therefore, the GMV optimal portfolio picks asset 3 with the lowest volatility. The GMV portfolio is heavily concentrated. Portfolio risk measured by GMV is the smallest, which comes as no surprise.
The MDP, on the other hand, focuses on assets 5 and 6, which are the only ones that do not possess the same correlation structure as the others. Contrary to GMV, MDP is attracted by local differences in the correlation structure. The diversification index is lowest for the MDP. Considering the concentration measures of Herfindahl, the EW should be chosen if the investor wishes to have the broadest weight diversity, and the ERC if risk concentration is the appropriate diversification risk measurement for the investor.
We consider the ERC in more detail. The risk contribution of asset j to the portfolio risk is by definition the sensitivity of portfolio risk w.r.t. $\varphi_j$ times the weight $\varphi_j$. The Euler Allocation Principle states that the sum of all risk contributions equals portfolio risk:
$$R(\varphi) = \sum_j \varphi_j \frac{\partial R(\varphi)}{\partial \varphi_j} =: \sum_j RC_j(\varphi)\,. \qquad (2.12)$$
Calculating, say, the portfolio risk contributions for 1'000 positions in a portfolio directly is complicated. But using Euler's theorem, we need to calculate 1'000 sensitivities, multiply them by their positions, and sum the result, which is a much simpler task. For the volatility risk measure this means:
$$R(\varphi) = \sigma_p(\varphi) = \sum_j \varphi_j \frac{\partial R(\varphi)}{\partial \varphi_j} = \sum_j \varphi_j \frac{(C\varphi)_j}{\sqrt{\varphi' C \varphi}} \qquad (2.13)$$
where $(C\varphi)_j$ denotes the j-th component of the vector $C\varphi$. The Euler risk decomposition holds true for the volatility, VaR, and expected shortfall risk measures.
Consider four assets in a portfolio with equal weights of 25 percent. The volatilities are 30%, 20%, 40%, and 25%. The correlation structure is
$$\rho = \begin{pmatrix} 1 & & & \\ 0.8 & 1 & & \\ 0.7 & 0.9 & 1 & \\ 0.6 & 0.5 & 0.6 & 1 \end{pmatrix}.$$
The covariance matrix C is then calculated as (using the formula $C_{km} = \rho_{km}\sigma_k\sigma_m$)
$$C = \begin{pmatrix} 9\% & & & \\ 4.8\% & 4\% & & \\ 8.4\% & 7.2\% & 16\% & \\ 4.5\% & 2.5\% & 6\% & 6.25\% \end{pmatrix}.$$
The portfolio variance $\langle \varphi, C\varphi\rangle = 6.38\%$ follows. Taking the square root, the portfolio volatility of 25.25% follows. Using (2.13), the marginal risk contribution vector
$$\frac{C\varphi}{\sqrt{\varphi' C\varphi}} = \begin{pmatrix} 26.4\% \\ 18.3\% \\ 37.2\% \\ 19\% \end{pmatrix}$$
follows. Multiplying each component of this vector by the portfolio weight gives the risk contribution vector RC = (6.6%, 4.6%, 9.3%, 4.8%). Adding the components of this vector gives 25.25%, which is equal to the portfolio volatility. This verifies the Euler formula.
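The Euler decomposition of this example can be verified in a few lines (the numbers match the text up to rounding):

```python
import numpy as np

vols = np.array([0.30, 0.20, 0.40, 0.25])
rho = np.array([[1.0, 0.8, 0.7, 0.6],
                [0.8, 1.0, 0.9, 0.5],
                [0.7, 0.9, 1.0, 0.6],
                [0.6, 0.5, 0.6, 1.0]])
C = rho * np.outer(vols, vols)       # C_km = rho_km * sigma_k * sigma_m
phi = np.full(4, 0.25)               # equal weights

sigma_p = np.sqrt(phi @ C @ phi)     # portfolio volatility, about 25.25%
marginal = C @ phi / sigma_p         # marginal risk contributions, see (2.13)
rc = phi * marginal                  # risk contributions per asset

print(f"portfolio volatility: {sigma_p:.2%}")
print("risk contributions:", np.round(rc, 4))   # ~ (6.6%, 4.6%, 9.3%, 4.8%)
print(f"sum of contributions: {rc.sum():.2%}")  # equals portfolio volatility
```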
Table 2.7: Asset class diversification and risk allocation. The first two columns contain the diversification using the asset class view. The third column shows the result using risk allocation. While the investment seems to be well diversified using the asset classes, the risk allocation view shows that almost 80% of the risk is due to equity. IEQ means international equities, ICB international corporate bonds, CCR corporate credit risk.
This fact is often encountered in practice: equity turns out to be the main risk factor in many portfolios. Capital diversification is then a poor concept from a risk perspective.
The asset allocation of European asset managers was in 2013 (EFAMA (2015)):
• 43% bonds;
• 33% equity;
• 16% other assets (property, private equity, structured products, hedge funds, other alternatives).
The allocation has been fairly stable in the past, except in the GFC where equities lost massive value. This average allocation differs significantly between countries. The UK, for example, had an equity allocation between 46% and 52% in the past, while in France the same class is around 20%. This difference is due to differences in the preferences of home-domiciled clients and the large differences in cross-border delegation of asset management. The ratio of AuM/GDP in the UK is 302%, which shows the importance of the UK as the leading asset management center of Europe with a strong client base outside of the UK. Comparing the allocation for investment funds and discretionary mandates, the bond allocation is 28% in investment funds and 58% in the mandates, and equities have a share of 39% in the funds and 26% in the mandates. Hence, self-deciders are less risk averse than those who delegate the investment decisions using mandates.
• Modelling volatility at a short horizon and then scaling to longer horizons can be inappropriate, since temporal aggregation should reduce volatility fluctuations, whereas scaling amplifies them.
• Returns in short-term financial models are often not predictable, but they can be predictable in longer-term models. Applying the scaling law connects the volatility in two time domains that are structurally different.
• The scaling rule does not apply if jumps occur in the returns.
• If returns are serially correlated, the square-root rule needs to be corrected (see Rab and Warnung [2011] and Diebold et al. [1997]).
Consider an investment example with the following assumptions:
• 25% of the return arises from dividends, which face a taxation rate of 30%;
• investments can be made via an investment fund (mutual fund, SICAV) with annual costs of 1.5 percent, or an index fund with annual costs of 0.5 percent.
The net returns using these figures are given in Table 2.8. Given these net returns, an investment of CHF 100 reaches after 25 years the values in Table 2.9.
Fact 8. Using a cost and tax efficient wrapper for an investment amounts to an annual return gain of 1.45% compared to an investment fund.
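A sketch of the arithmetic behind Fact 8; the 6% gross return is an assumption chosen to be consistent with the quoted 1.45% gain, and the efficient wrapper is assumed to avoid the dividend tax drag:

```python
# Net-return comparison behind Fact 8 (assumed 6% gross return).
gross_return = 0.06
dividend_tax = 0.25 * gross_return * 0.30   # 0.45% drag: 25% dividends taxed at 30%

net_fund = gross_return - dividend_tax - 0.015  # mutual fund with 1.5% annual costs
net_wrapper = gross_return - 0.005              # tax-efficient wrapper, 0.5% costs

print(f"net mutual fund return:       {net_fund:.2%}")                # 4.05%
print(f"net efficient wrapper return: {net_wrapper:.2%}")             # 5.50%
print(f"annual return gain:           {net_wrapper - net_fund:.2%}")  # 1.45%
```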
Given the zero-sum game of active investment (see the next section), the fact that only 0.6% of 2,076 actively managed US open-end, domestic equity mutual funds generate a positive alpha after costs (see Section 4.6.6.3), and the possibility to wrap many investment ideas in cheap index funds or ETFs, it becomes clear why many investors prefer passive investments.
Let λ be the fraction of wealth managed actively; the market return then decomposes as
$$R_m = \lambda R_{active} + (1 - \lambda) R_{passive}\,. \qquad (2.15)$$
Assuming that the return of the passive investment equals that of the market, (2.15) implies that the active return equals the market return independently of the fraction λ. Therefore, without any probabilistic or behavioural assumptions, before costs the three investments pay the same return:
Proposition 9 (Sharpe). Before costs, the return on the average actively managed dollar
will equal the return on the average passively managed dollar.
Because active managers bear greater costs than a passive investment:
Proposition 10 (Sharpe). After costs, the return on the average actively managed dollar
will be less than the return on the average passively managed dollar.
These statements are strong and they are based on strong assumptions. Despite its beauty, the assumptions that lead to (2.15) trivialize the problem. Suppose all investors are active ones - who is on the other side of the trades? Returns are not independent of the demand and supply side but in fact arise in market equilibrium. Demand and supply matter. Pedersen (2018) extended the Sharpe arithmetic to cases where active management can on average be more profitable than passive management in an equilibrium context. He replaced the unrealistic assumption that an active investor's gain is the loss of another active investor, which leads in the aggregate to a zero-sum game. Next, the market portfolio is not constant. It changes over time since new shares are issued and corporate actions happen: passive investors also need to trade regularly. If they have to trade at less favourable prices than the active investors do, then the logic of Sharpe is broken.
Roll pointed out that a true market portfolio is not observable since it would include every single asset. Market-weighted indices are used as an approximation. In the US, the Wilshire 4'500 Index contains 4'500 stocks of approximately 5'000 listed stocks. In Switzerland, the SPI Index contains 210 of 270 listed stocks. The global market portfolios also differ significantly depending on who is calculating them. The major contributors are debt and equity, where equity is split into global equity, EMMA equity, private equity, and small cap equity, and debt is split into government bonds, agency bonds, asset backed securities, EMMA bonds, and corporate bonds. The assumption that passive investment means being invested in the market portfolio is an approximation. Consider funds. US retail funds are different from US institutional funds and are also different from non-US funds. The one-fits-all argument of Sharpe does not consider the heterogeneity of investment wrappers across different asset classes, geographical regions, and client segmentations. Finally, the result is based on average active managers. It does not account for the heterogeneity of skill among individual managers.
The goal of active asset management is to outperform benchmarks. The manager tries to beat the benchmark within a given Tracking Error (TE) limit. In fact, proponents of active investment use argumentation following the work of Berk and Green (2004), who show that efficient markets do not contradict the existence of skilled fund managers who beat the market consistently. The concept of benchmarking, and hence relative performance, has several advantages for the portfolio manager: performance measurement is simple relative to the benchmark, benchmarking has a disciplining force acting on the asset manager, and the structuring of the investment portfolio is simplified.
Active management often has both a passive component, the long-term goals in a
benchmark portfolio, and an active component, playing the views to exploit market
opportunities (TAA). The passive portfolio stabilizes the whole investment.
ETFs, trackers and index funds are examples of passive strategies. Mutual funds, opportunistic use of derivatives, and hedge funds are examples of active strategies. While the deviation of a strategy from a benchmark, the tracking error, should be as small as possible in passive investment, the tracking error in active investment describes how far the active manager moves away from the benchmark.
Different types of benchmarks are used: either the benchmark is used to compare the performance of a fund with its peers, or the benchmark is a market index. While both methods are meaningful for active investment, in a passive investment only index benchmarking makes sense.
The main stock benchmark indices are the MSCI World Index, FTSE, S&P 500 and some other well-known stock market indices. Since bond securities do not trade on open exchanges, there is less transparency about bond prices, and the indices used for benchmarking are those created by the largest bond dealers, such as the Barclays Global Aggregate Bond Index, which tracks the largest bond issuers globally. Benchmark indices for commodities are for example provided by S&P and Goldman Sachs (S&P GSCI) or by Bloomberg (Bloomberg Commodity Index). For credit risk, the Markit iTraxx indices reflect the creditworthiness of large corporates. A provider of real estate indices is MSCI. There are four different types of income-producing real estate assets: offices, retail, industrial, and leased residential. Non-income producing assets are houses, vacation properties, or vacant commercial buildings. These different types of real estate assets lead, together with the geographical segmentation, to many different real estate indices.
Trading units and asset management firms are the suppliers of assets for investment. Mutual funds or ETFs are often offered by non-banking firms such as BlackRock. These firms issue products but also provide other services.3
The largest asset management organizations in 2017 were BlackRock with USD 6.3 trillion AuM, followed by the Vanguard Group.4 The largest fund in 2014 was the SPDR ETF on the S&P 500 managed by State Street Global Advisors with assets of USD 224 bn; see the Appendix.
The AM firms also contribute to the real economy. Firms, banks and governments use AM firms to meet their short-term funding needs and the long-term capital requirements.
3 BlackRock Solutions - the risk management division of BlackRock - was mandated by the US Treasury Department to manage the mortgage assets owned by Bear Stearns, Freddie Mac, Morgan Stanley, and other financial firms that were affected by the financial crisis in 2008. The expertise gained boosted BlackRock Solutions to become more important than the asset management part of the firm.
4 The Vanguard Group 5.1 tr USD, Charles Schwab 3.4 tr USD, UBS 3.1 tr USD, State Street 2.8 tr USD.
The AM contribution to debt financing is 23%: European asset managers held this amount of all debt securities outstanding, which also represents 33% of the value of euro-bank lending. The equity financing figures are similar. The AM industry held 29% of the market value of euro area listed firms and 42% of the free float.
From a corporate finance perspective, the valuation and market capitalization of asset management firms compared to banks and insurance companies between 2002 and 2015 is shown in Table 2.10 (McKinsey (2015)):
Table 2.10: Key figures 2015 for asset management firms, banks and insurance companies (McKinsey [2015]).
Total Assets under Management (AuM) in Europe increased by 10% in 2017 to EUR 25.2 trillion. Comparing the growth of investment funds versus discretionary mandates in Europe, both categories have increased to a similar level of EUR 13.1 (2014: 9.1) trillion in investment funds and EUR 12 (2014: 9.9) trillion in discretionary mandates (EFAMA (2018) and (2015)). The share of investment funds compared to the mandates was falling from 2007 until 2011, but it then started to increase in the last three years. While mandates represented more than 70% of the AuM in the UK, the Netherlands, Italy, and Portugal, more than 70% of all AuM in Germany, Turkey, and Romania were invested in investment funds. The dominance of either type of investment can have different causes. In the UK and the Netherlands, pension funds play an important role in asset management and they prefer to delegate the investment decisions. The pool of professionally managed assets in Europe remains centered in the UK (37% market share), France (20%), Germany (10%), Italy, the Nordic countries, and Switzerland.
The number of individuals directly employed in the industry (asset managers, analysts) is estimated in 2017 at 110'000 (2013: 90'000), with one-third in the UK. Indirect employment such as IT, marketing, legal, compliance, and administration is estimated to boost the total number of employees in the whole industry to half a million individuals.
• Per annum, global AuM growth is 5%. The main driver was market performance. Typically, the net AuM flows are between 0% and 2% per annum.
• The growth of AuM is 13.1% in Europe, 13.5% in North America and 226% in emerging markets, which is largely due to the money market boom in China.
• The absolute value of profits increased in Europe by 5%, by 29% in North America, and by 79% in the emerging markets.
• Profit margins, as the difference between net revenue margin and operating cost margin, are 13.3 bps in Europe, 12.5 bps in North America, and 20.6 bps in emerging markets. The observed revenue decline in Europe is due to the shift from active to passive investments, the shift to institutional clients, and the decrease in management fees. The revenue margin in the emerging markets is only slightly lower in 2014 compared to 2007 (down to 68.1 bps from 70.6 bps), but the increase in operating cost margin from 33.8 bps to 47.4 bps in 2014 is significant.
• The absolute revenues in some emerging markets such as China, South Korea, and Taiwan range between USD 3.7 bn and USD 10.1 bn. They are almost at par with the revenues in Japan, Germany, France, and Canada (all around USD 10 bn). The revenue pools of the UK (USD 21.2 bn) and the US (USD 150.8 bn) are still leading the global league table.
• The cost margins in Europe are stable between 21 bps and 23 bps. The cost margin splits into sales and marketing (around 5 bps), fund management (around 8 bps), middle/back office (around 3.5 bps), and IT/support (around 6 bps). There is an increasing cost trend for IT/support, and decreasing costs for sales and marketing and middle/back office.
• In emerging markets, the CAGR for institutional customers is 13% compared to 11% for retirement/DC.
By considering the above facts, one should take into account the particular circumstances in the years after the GFC, such as the decreasing interest rate level and the stock market boom, which were the main factors in the success of the asset management industry in this period.
Table 2.11 illustrates the global distribution of AuM by product and its dynamics in
the last decade.
Table 2.11: Global distribution of AuM by product and its dynamics in the last decade in trillion USD. Alternatives include hedge, private-equity, real-estate, infrastructure, and commodity funds. Active solutions include equity specialties (foreign, global, emerging markets, small and mid caps, and sector) and fixed-income specialties (credit, emerging markets, global, high yield, and convertibles). LDIs (liability-driven investments) include absolute-return, target-date, global-asset-allocation, flexible, income, and volatility funds. Active core includes active domestic large-cap equity, active government fixed-income, money market, and traditional balanced and structured products (Valores Capital Partners [2014]). The figure for 2016 and the projection are from PwC (2018).
The table indicates that the growth rate of passive investments is larger than for active solutions. McKinsey (2015) states for the period 2008-2014 that cumulated flows are 36% for passive fixed income and 22% for passive equity. Standard active management is decreasing for some asset classes and strategies: active equity strategies lost 20% on a cumulated flow basis, while active fixed income gained 52%. A further observation is that active management of less liquid asset classes, or with more complex strategies, is increasing: an increase of 49% cumulated flows for active balanced multi asset and of 23% for alternatives. The global figures vary strongly across regions and countries. Swiss and British customers adopted the use of passive investments much faster than, for example, Spanish, French, or Italian investors. Figure 2.20 shows the distribution of global investable assets by region and by type of investor.
Regulation imposes a great deal of complexity on the whole business of asset management and banking. On the other side of the fence, there is a so-called shadow banking sector with much less regulatory oversight. Although the expression 'shadow bank' makes no sense at all - either an institution has a banking license or not - there is an incentive for banks to consider outsourcing their asset management units to this 'shadow banking' sector.
Figure 2.20: Global investable assets by region in trillions of USD (Brown Brothers Harriman [2013]).
Forward-looking estimates by PwC (2014, 2018) for the period 2014-2020 suggest that actively managed funds will grow at a CAGR of 5.4 percent and mandates at 5.7 percent (PwC [2014]). The growth driver for actively managed funds is the growing global middle-class client base. Mandate growth factors are institutional investors (pension funds and SWFs) and HNWIs, see Table 2.12. Furthermore, the ratio active:passive was 7:1 in 2012 and is estimated to fall to 3:1 by 2020. By the end of 2014, the AuM in actively managed funds were distributed as follows: 60% in the Americas, 32% in Europe, and 12% in Asia. Compared to 2010, there is a relative stagnation or decrease in Europe.
Table 2.12: Actively managed funds, mandates, and alternative investment (PwC [2014]).
The formation of four regional blocs in AM - South Asia, North Asia, Latin America, and Europe - creates opportunities, costs, and risks. These blocs develop regulatory and trade linkages with each other based on reciprocity - AM firms can distribute their products in other blocs. The US, given the actual trends, will stay apart since it prefers to adhere to its regulatory model. But integration will increase not only between these blocs but also within blocs. There will be, for example, a strong regulatory integration inside the South Asia bloc. The ASEAN platform between Singapore, Thailand, and Malaysia will be extended to include Indonesia, the Philippines, and Vietnam. All these countries possess a large, wealthy middle class of potential AM service investors. The global fund structure UCITS continues to gain attraction worldwide, and reciprocity between emerging markets and Europe will be based on the European AIFMD model for alternative funds. By 2013, more than 70 memoranda of understanding for AIFMD had been signed.
The traditional AM hubs London, New York, and Frankfurt will continue to dominate the AM industry. But new centers will emerge due to the global shift in asset holdings. There will be a balance between global and local platforms. Whether a global or a local platform is pushed depends on many factors: time-to-market, regulatory and tax complexity, behavior and social norms in the jurisdiction, and the education level all matter. AM firms recruit local teams in the key emerging markets - the people factor. The education of these local individuals started originally in the global centers but will diffuse more and more to the new centers in the emerging markets. Due to the positive brand identities that tech firms have, they can integrate part of the business layer into their infrastructure layer and offer AM services under tech firm brands instead of more traditional banking or AM company brands ('branding reversal'). Finally, alternative asset managers on the one hand offer new products - asset managers move into the space banks left vacant - and on the other hand try to make their alternative funds mainstream. New products include primary lending, secondary debt market trading, primary securitizations, and off-balance-sheet financing.
• Agency business model. Asset managers are not the asset owners; they act on a best-effort basis for their clients and the performance is attributed to their clients.
• Low balance sheet risk. Since asset managers do not provide loans, do not act as counterparties in derivatives, financing, or securities transactions, and seldom borrow money (leverage), their balance sheet does not face the risks of a bank's balance sheet.
• Protection of client assets. Asset managers are regulated, and in mandated asset management the client assets are held separately from the asset management firm's assets.
From a risk perspective, asset management is a fee business with conduct, business, and operational risk as the main risk sources.
Trading, by contrast, is a market, counterparty, and liquidity risk business which needs a strong balance sheet of the intermediary. Trading is a mixture of a fee business (agency trading) and a risk-taking business (principal and proprietary trading). Agency trading is a fee business based on client flow. Clients place their orders and the trading unit executes the orders on behalf of the client's account. For example, a stock order is routed by the trader to the stock exchange where the trade is matched. The bank receives a fee for this service. Principal trading already requires active market risk or counterparty risk taking by the bank, since the bank's balance sheet is affected by the profits and losses from trading. Principal trading is still based on clients' orders, but it requires the traders to take some trading positions in their market-making function or in order to meet future liabilities in issued structured products. This is a key difference to agency trading. Proprietary trading is not based on client flow at all. Proprietary traders implement trading ideas without any reference to a client activity. This type of trading puts the bank's capital at risk. New regulations limit proprietary trading by investment banks, such as the Volcker Rule in the US and 'ring-fencing' in the UK.
AM firms wrap the underlying assets into collective investment schemes ('funds'), while the trading unit of a bank offers issuance and market making for cash products, derivatives, and structured products. Despite their differences, trading and asset management are linked. Portfolio managers in the asset management function execute their trades via the trading unit or a broker. The market making of ETFs and listed fund trading takes place in the trading unit. Cash products are used by the asset management function in the construction of collective schemes, and asset managers use derivative overlays in their portfolios to manage risk and return characteristics.
The size of investments is very large for institutional asset management (IAM) and smaller for wealth management (WM). The risk management for IAM is comprehensive and of the same quality as that used by, say, banks for their own purposes. In WM, risk management is often less sophisticated. Fees are typically lower for IAM than for WM. While IAM is highly regulated, the regulation of WM was much weaker in the past. This changed after the GFC, where MiFID II, Know-Your-Client, product information sheets, etc. heavily increased the WM regulatory setup. Finally, the loyalty of IAM clients is decreasing while WM clients are more loyal. It will be interesting to observe how the loyalty of WM clients changes if technology makes investments not only more tailor-made but also more open-platform oriented and therefore less strongly linked to the home institution of the WM clients.
The 1920s saw the creation in Boston of the first open-end mutual fund - the Massachusetts Investors' Trust. By 1951 more than 100 mutual funds existed and 150 more were added in the following twenty years. The challenging 1970s - oil crisis - were marked by a number of innovations. Wells Fargo offered a privately placed, equally weighted S&P 500 index fund in 1971. This fund was unsuccessful, and Wells created a successful value-weighted fund in 1973. It required huge efforts - tax and regulatory compliance, building up stable operations, and educating potential investors. Bruce Bent established the first money market fund in the US in 1971, so that investors had access to high money market yields in a period when bank interest rates were regulated. In 1975, John Bogle created a mutual fund firm - Vanguard. It launched the first retail index fund based on the S&P 500 Index in 1976. In 1993, Nathan Most developed an ETF based on the S&P 500 Index. The following table summarizes the worldwide market figures of investment funds without funds of funds. The fund industry is not free of scandals.
Table 2.13: MM means Money Markets, GP Guaranteed and Protection, RE Real Estate, and IF Investment Funds. Data are end of Q4 2018. Source: EFAMA, Investment Company Institute (ICI), International Investment Funds Association (IIFA). Statistics from 47 countries are included in this report.
In 2003, for example, illegal late trading and market timing practices were uncovered in hedge fund and mutual fund companies. Late trading means that trading is executed after the exchanges are closed. Traders could buy mutual funds when markets were up at the previous day's lower closing price and sell at the purchase date's closing price for a guaranteed profit.
There are different types of funds: mutual funds, index funds, ETFs, hedge funds, and alternative investments. We note some broad characteristics:
• Index funds seek to match the fund's performance to a specific market index, such as the S&P 500, before fees and expenses.
• Mutual funds are actively managed and try to outperform market indexes. They
are bought and sold at the current day's closing price - the NAV (net asset value).
• ETFs are traded in real time at the current market price and may cost more or less than their NAV.
NAV is a company's total assets minus its total liabilities. If an investment company's assets are worth USD 100 and it has liabilities of USD 10, the company's NAV is USD 90. Since assets and liabilities change daily, NAV also changes daily. Mutual funds generally must calculate their NAV at least once every business day. An investment company calculates the NAV of a single share by dividing its NAV by the number of outstanding shares.5
5 We assume that at the close of trading a mutual fund held USD 10.5 mn in securities, USD 2 mn of cash, and USD 0.5 mn of liabilities. With 1 million shares outstanding, the NAV is USD 12 per share.
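A minimal Python sketch of this per-share arithmetic, using the footnote's figures (the variable names are illustrative):

securities = 10.5e6    # market value of the securities, USD
cash = 2.0e6           # cash holdings, USD
liabilities = 0.5e6    # liabilities, USD
shares_outstanding = 1.0e6

nav_total = securities + cash - liabilities     # fund-level NAV: USD 12 mn
nav_per_share = nav_total / shares_outstanding  # USD 12 per share
print(nav_per_share)   # 12.0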
Funds can be open- or closed-end. Open-end funds must buy back fund shares at the end of every business day at the NAV; see Table 2.14. Prices of shares traded during the day are expressed in NAV. Total investment varies based on share purchases, share redemptions, and fluctuations in market valuation. There is no limit on the number of shares that can be issued. Closed-end funds issue shares only once. The shares are listed and traded on a stock exchange: an investor cannot give back his or her shares to the fund but must sell them to another investor in the market. The prices of traded shares can differ from the NAV - either higher (premium case) or lower (discount case). The vast majority of funds are of the open-end style.
The legal environment is crucial for the development of the fund industry. About three-quarters of all cross-border funds in Europe are sold in Luxembourg. Luxembourg offers favorable framework conditions for holdings/holding companies, investment funds, and asset-management companies. These companies are partially or completely tax-exempt; typically, profits can be distributed tax free. For private equity funds, two-thirds have the US state of Delaware as their domicile. For hedge funds, one-third are in the Caymans and one-quarter in Delaware. As of Q3 2013, 48 percent of mutual funds had their domicile in the US, 9 percent in Luxembourg, and around 6 percent each in Brazil, France, and Australia.
Definition 12. A mutual fund is a company that pools money from many investors and invests the money in stocks, bonds, short-term money-market instruments, other securities or assets, or some combination of these investments. The combined holdings the mutual fund owns are its portfolio. Each share represents an investor's proportionate ownership of the fund's holdings and the income those holdings generate.
In Europe, mutual funds are regulated under the UCITS regime and mutual fund equivalents are called SICAVs. When we refer below to mutual funds, we always have US mutual funds in mind. Some characteristics of mutual funds are that investors purchase mutual fund shares from the fund and not via a stock exchange, that investors can sell their shares at any time, that they pay the NAV plus any shareholder fees for mutual fund shares, that mutual funds create and sell new shares if there is new demand, and finally, that the investment portfolios are managed by separate entities (investment advisers) registered with the SEC. Mutual funds are non-listed public companies that neither pay taxes nor have employees.
• Affordability. The basic unit of a fund requires only little money from the investors and gives access to assets.
• Liquidity. Mutual fund investors can redeem their shares at any time at the current NAV plus any fees and charges assessed on redemption.
• Investment strategy. The investor can choose between active and passive investment, can have access to rule-based strategies, etc. But he cannot choose a guaranteed payoff as for structured products. Hence, investors in funds believe that the fund manager's skills generate the performance.
• Price uncertainty. Pricing follows the NAV methodology, which the fund might
calculate hours after the placement of an order.
The Investment Company Institute and US Census Bureau (2015) state that a total of 43.3% of US households, with a median income of USD 85,000, own mutual funds. The median mutual fund holding is USD 103,000 and the median of household financial assets is USD 200,000. 86% own equity funds, 33% hybrids, 45% bond funds, and 55% money-market funds. Only 36% invest in global or international equity funds. The primary financial goal (74%) for mutual fund investment is retirement.
Cross-border distribution has been most successful within the European UCITS format. This is not only true for Europe: UCITS dominate global fund distribution in more than 50 local markets (Europe, Asia, the Middle East, and Latin America). This kind of global fund distribution is the preferred business model in terms of economies of scale and competitiveness. In 2016, around 80,000 registrations for cross-border UCITS funds existed. The average fund is registered in eight countries. Furthermore, UCITS are not required to distribute all income annually.
UCITS do not need to accept redemptions more than twice a month. Although the two previous points hold in general, many funds offer - for example - the option to distribute income annually or make redemptions possible on a daily basis. UCITS sponsors must comply with the EU guidelines on compensation for key personnel: the remuneration directive.
Both UCITS funds and mutual funds were originally quite restrictive in their investment guidelines. Then UCITS (similar remarks apply to mutual funds) were allowed to use derivatives extensively. Using derivatives means, among other things, leveraging portfolios or creating synthetic short positions - UCITS are not allowed to sell physical assets short. The strategies of these funds - referred to as 'newCITS' - are similar to hedge fund strategies, and they showed strong growth to USD 294 billion in 2013 according to Strategic Insight (2013).
But there are also differences between US mutual funds and European UCITS on a more fundamental level. US clients invest in existing funds while European investors are regularly offered new funds. That is, the number of US mutual funds has been decreasing in the last decade while European funds have shown a strong increase in numbers; see Table 2.15. The stability of the US fund industry is due to the influence of US retirement plans (defined contribution), which do not change investment options often. The tendency to innovate permanently in Europe leads to funds which are on average around six times smaller than their US counterparts.
                              2003      2013
US
  Number of funds            8,125     7,707
  Total assets (USD tr)        7.4      15.0
  Assets per fund (USD mn)     911     1,949
Europe
  Number of funds           28,541    34,743
  Total assets (USD tr)        4.7       9.4
  Assets per fund (USD mn)     164       270
Asia
  Number of funds           11,641    18,375
  Total assets (USD tr)        1.4       3.4
  Assets per fund (USD mn)     116       183

Table 2.15: Number of funds, average fund size, and assets by region (Investment Company Institute [2010, 2014] and Pozen and Hamacher [2015]).
While the NAV is theoretically simple, the process of implementing the calculation is not, since one has to accurately record all securities transactions, consider corporate actions, and determine the liabilities, for example. Digitization offers an opportunity to overcome present NAV calculation problems. If, say, the NAV can be calculated in real time, why should fund shares not be listed on a stock exchange?
Mutual funds as companies pay out almost all of their income - dividends and realized capital gains - to shareholders every year and pass on all their tax duties to investors. Hence, mutual funds do not pay corporate taxes. Therefore, the income of mutual funds is taxed only once, while the income of 'ordinary' companies is taxed twice.6
6 Mutual funds make two types of taxable distributions to shareholders: ordinary dividends and capital gains. The Internal Revenue Service (IRS) defines rules that prevent ordinary firms from transforming themselves into mutual funds to save taxes: one rule demands, for example, that mutual funds have only a limited ownership of voting securities and that funds must distribute almost all of their earnings.
Figure 2.21: The organization of a mutual fund (Adapted from ICI Fact Book [2006]).
There are tax-exempt and taxable fund types. The former invest in securities backed by municipal authorities and state governments; income from both security types is exempt from federal income tax. Which fund to choose is only a question of the after-tax yield. Tax-exempt funds make sense for investors who face a high tax bracket. In all other cases, taxable funds show a better after-tax yield. Fund sponsors typically offer a retail and an institutional investor series of MM funds.
Bond Funds
There are many types of bond funds. Bond funds can be tax-exempt or taxable and invest in US or global bonds. In each possible category different factors matter: the creditworthiness of the bonds, the maturity of the bonds, the segmentation of global bonds into emerging market bonds and general global bonds, and the classification of bonds according to different economic sectors or specific topics. Finally, alternative bond funds use techniques from hedge funds to shape the risk and return profile.
Stock Funds
For stock funds the difference between tax-exempt and taxable does not exist, since most of their income comes from price appreciation and income from dividends is very low. Categories are US versus global funds, sectors, regions, style, etc. As for bond funds, a 3 × 3 style box from Morningstar exists, with size as one dimension and style as the other.
The SEC (2008) defines the following components of mutual fund fees: (i) fees paid by the fund out of fund assets to cover the costs of marketing and selling fund shares ... (ii) 'distribution fees', including fees that compensate brokers and others who sell fund shares and that pay for advertising, the printing and mailing of prospectuses to new investors, ... (iii) 'shareholder service fees' - fees paid to persons who respond to investor inquiries and who provide investors with information about their investments.
The expense ratio is the fund's total annual operating expenses, including management fees, distribution (12b-1) fees, and other expenses. All fees are expressed as a percentage of average net assets. Other fees include fees related to the selling and purchasing of funds: the back-end sales load is a sales charge investors pay when they redeem mutual funds. The front-end sales load is the similar fee when funds are bought; it is generally used by the fund to compensate brokers. Purchase and redemption fees are not the same as the back- and front-end sales loads; they are both paid to the fund. The SEC generally limits redemption fees to 2 percent.
Class-A shares, for example, charge a front-end load and have low 12b-1 (distribution) fees. They are beneficial for long-run investors. In Europe, the type of share class can define the client segmentation, specify the investment amount, and specify the investment strategy. For example:
• N-class: only for clients who possess a mandate contract or an investment contract with the bank.
The performance over the periods 1, . . . , T is measured as
\[
P_{\%} = \frac{NAV_T \times f_1 \times \cdots \times f_T}{NAV_0} \times 100, \qquad (2.16)
\]
with the adjustment factor
\[
f = \frac{NAV_{ex} + BA}{NAV_{ex}},
\]
with BA the gross payout - that is to say, the gross amount of the earnings- and capital-gain payout per unit share to the investors - and NAV_{ex} the NAV after the payout.
Example
Consider a NAV at year-end 2005 of CHF 500 million, 2006 earnings of CHF 10 million, and a capital-gain payout of CHF 14 million. The NAV after payments is CHF 490 million and the NAV at the end of 2006 is CHF 515 million. The adjustment factor is
\[
f = \frac{490 + 10 + 14}{490} = 1.04898.
\]
This gives the performance for 2006:
\[
P = \frac{515 \times 1.04898}{500} - 1 = 8.045\%.
\]
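The example's arithmetic in a short Python sketch (figures in CHF million, taken from the example above):

nav_start = 500.0      # NAV at year-end 2005
nav_ex = 490.0         # NAV after the payouts
payout = 10.0 + 14.0   # earnings plus capital-gain payout (BA)
nav_end = 515.0        # NAV at year-end 2006

f = (nav_ex + payout) / nav_ex        # adjustment factor: ~1.04898
perf = nav_end * f / nav_start - 1.0  # ~0.08045, i.e. 8.045%
print(f, perf)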
There are several reasons why it is important to measure the performance of a fund correctly: to select the best fund, to check whether the fund managers do what they promise, and to check whether the fund manager added value.
The performance formula (2.16) can be rewritten in the effective return form
\[
(1 + P)\,NAV_0 = NAV_T \times f_1 \times \cdots \times f_T = NAV_T \prod_{k=1}^{T} \left(1 + \frac{BA_k}{NAV_{ex,k}}\right). \qquad (2.17)
\]
If the gross payouts are zero in all periods, then the performance reads
\[
(1 + P)\,NAV_0 = NAV_T
\]
with P the simple effective return. Contrarily, assume that in each period a constant fraction g = BA/NAV_{ex} is paid out. Then,
\[
(1 + P)\,NAV_0 = NAV_T\,(1 + g)^T.
\]
Since (1 + g)^T is larger than one, with the same effective return P, the fund without any payouts achieves a larger final value NAV_T than the fund with payouts.
Example
The return calculation for funds can be misleading. Consider the following reported
annual returns: 5%, 10%, −10%, 25%, 5%. The arithmetic mean is 7%. The geometric
mean is 6.41%. How much would an investor earn after 5 years if he or she starts with
USD 100?
100 × 1.05 × 1.1 × 0.9 × 1.25 × 1.05 = USD 136.4.
If the fund reports the arithmetic mean, the investor would expect 100 × 1.07^5 = USD 140.26. Using the geometric mean of 6.41%, the true value of USD 136.4 follows. Although it is tempting to report the higher arithmetic mean, such a report would be misleading. Some jurisdictions require funds to report returns in the correct geometric way.
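A small Python check of the two means (figures from the example):

returns = [0.05, 0.10, -0.10, 0.25, 0.05]

arithmetic = sum(returns) / len(returns)   # 0.07

terminal = 100.0
for r in returns:
    terminal *= 1.0 + r                    # USD ~136.43

geometric = (terminal / 100.0) ** (1.0 / len(returns)) - 1.0  # ~0.0641

# Reporting the arithmetic mean suggests 100 * 1.07**5 = USD 140.26,
# which overstates the true terminal value.
print(arithmetic, geometric, terminal)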
Both schemes are supervised by the Luxembourg financial sector regulator. A main reason for Luxembourg's attractiveness is taxation. Both SICAV and SICAF investment funds domiciled in Luxembourg are exempt from corporate income tax, capital gains tax, and withholding tax. They are only liable for a subscription tax at a rate of 0.05 percent on the fund's net assets. Also, favorable terms apply with regard to withholding tax. Total UCITS funds' AuM grew from EUR 3.4 trillion at the end of 2001 to EUR 5.8 trillion by 2010, with a value of EUR 6.8 trillion at the end of 2014. Roughly 85 percent of the European investment fund sector's assets are managed within the UCITS framework. On average, 10 percent of European households invest directly in funds: Germany, 16%; Italy, 11%; Austria, 11%; France, 10%; Spain, 7%; and the UK, 6%.
There have been five framework initiatives - UCITS I (1985) to UCITS V (2016). Goals of UCITS IV:
• Increase investor protection by the use of key investor information (KID). The KID replaces the simplified prospectus.
• Increase market efficiency by reducing the waiting period for fund distribution abroad to 10 days.
The Madoff fraud case and the default of Lehman Brothers highlighted some weaknesses in, and a lack of harmonization of, depositary duties and liabilities across different EU countries, leading to UCITS V. It considers the following issues. First, it defines which entities are eligible as depositaries and establishes that they are subject to capital adequacy requirements, ongoing supervision, prudential regulation, and some other requirements. Second, client money is segregated from the depositary's own funds. Third, the depositary is confronted with several criteria regarding the holding of assets. Fourth, remuneration is considered: a substantial proportion of remuneration, for example at least 50 percent of variable remuneration, shall consist of units in the UCITS funds and be deferred over a period that is appropriate in view of the holding period. Fifth, sanctions shall generally be made public and pecuniary sanctions for legal and natural persons are defined. Finally, measures are imposed to encourage whistle-blowing.
The evidence on mutual fund performance indicates not only that these 115 mutual funds were on average not able to predict security prices well enough to outperform a buy-the-market-and-hold policy, but also that there is very little evidence that any individual fund was able to do significantly better than that which we expected from mere random chance.
A growth analysis of the top ten global asset managers over the past five years confirms this trend. Vanguard, with its emphasis on passive products, is the strongest-growing AM, followed by BlackRock with its passive products forming the iShares family. Both index funds and ETFs aim at replicating the performance of their benchmark indices as closely as possible. Issuers and exchanges set forth the diversification opportunities they provide - like mutual funds - to all types of investors at a lower cost than mutual funds, but also highlight their tax efficiency, transparency, and low management fees. Although actively managed ETFs were launched around twenty years ago, their importance remains negligible. One major reason is that actively managed ETFs lose their cost advantage compared to mutual funds. As of June 2012, about 1,200 ETFs existed in the US, including only about 50 that were actively managed.
Example Core-satellite
2.8.1.1 Weighting
Various methods are used for determining the weight of individual members in the index.
Within the same category of members there can be subcategories.
• Market capitalization weighting: The members are weighted proportional to their market capitalization, counting only the shares taken into free float. This is the most common form of weighting for public indices and the rule for indices such as S&P, FTSE, MSCI, and SMI.
• Equal weighting 2 (currency weighting): The CHF weight assigned to each asset is the same; i.e., S_i w_i is the same for each asset. This means that if CHF 500 is to be invested in a basket of 10 assets, the amount bought of each asset would be CHF 50.
• Share weighting: The members are weighted proportional to the total number of tradable units issued; i.e., w_i depends on the number of shares outstanding for the equity asset class.
• Attribute weighting: The members are weighted according to their ranking score in the selection process. If our ranking is based on ethical and environmental criteria, and asset Y has a score of 75 and asset X of 25, then the weight ratio between asset Y and asset X will be Weight Y / Weight X = 3.
Free float is the portion of total shares held for investment purposes, as opposed to shares held for strategic purposes, i.e., for control. Some indices are quoted using different weighting schemes, e.g., MSCI. However, the main quoted value uses the market capitalization weighting method.
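The weighting schemes above can be illustrated with a small Python sketch; all prices, share counts, and scores below are made-up numbers, not data from the text:

prices = [100.0, 50.0, 25.0]   # asset prices S_i in CHF
shares_out = [1e6, 4e6, 2e6]   # tradable units issued (share weighting)
scores = [75.0, 25.0, 50.0]    # ranking scores (attribute weighting)
budget = 500.0                 # CHF to invest

# Equal weighting 2 (currency weighting): the same CHF amount per asset.
currency_amounts = [budget / len(prices)] * len(prices)

# Share weighting: proportional to the number of units issued.
total_units = sum(shares_out)
share_weights = [s / total_units for s in shares_out]

# Attribute weighting: proportional to the ranking score.
total_score = sum(scores)
attribute_weights = [s / total_score for s in scores]

print(currency_amounts, share_weights, attribute_weights)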
Remark:
The difference between the asset weighting scheme and the weight of an asset in the index is as follows. For a price-weighted index, w_1 = w_2 for asset 1 and asset 2. However, if S_1/S_2 = 3, the weight of asset 1 in the index will be 3 times larger than the weight of asset 2.
2.8.1.2 Divisor
The divisor is a crucial part of the index calculation. At initiation it is used for normalizing the index value. For instance, the initial SMI divisor in June 1998 was chosen as a value which normalized the index to 1,500. However, the main role of the divisor is to remove the unwanted effects of corporate actions and index member changes on the index value. It ensures continuity in the index value in the sense that changes in the index should only stem from investor sentiment and not originate from 'synthetic' changes. Corporate actions which need to be accounted for by changing the divisor value depend on the weighting scheme used for the index.
An example is the effect of a stock split for:
• Market capitalization weighting: The price of the stock will be reduced, but the number of free-floating shares will increase. These two effects are offsetting and no change has to be made to the divisor.
• Equal weighting 1 (price weighting): The stock price reduction will have an effect, but the number of free-floating shares has no impact on such a weighting. Therefore, the divisor has to be changed to a lower value in order to avoid a discontinuity in the index value.
How the dividends are handled in the index calculation determines the return type of the index. There are three versions of how dividends can be incorporated into the index value calculations:
• Price return index: No consideration is given to the dividend amounts paid out by the assets. The day-to-day change in the index value reflects the change in the asset prices.
• Total return index: The full amount of the dividend payments is reflected in the index value. This is done by adding the dividend amount on the ex-dividend date to the asset price. Thus, the index value 'acts' as if all the dividend payments were reinvested in the index.
• Total return index after tax: The dividend amount used in the index calculation is the after-tax amount, i.e., the net cash amount. In contrast, in the total return index case the gross dividend amount is used.
In addition, if the index constituents have a wide geographical span, there are other issues that need to be taken into consideration. Some of the rules that need to be defined are: the index value quotation currency, the source of currency rates, the index opening and closing hours, and the treatment of assets registered on multiple exchanges. For most major indices the quotation is real time and the currency rate used is also real time. The opening hour for the constructed index starts with the opening of the exchange of any index member, and the closing occurs when no index member exchange is open. A global index, with constituents from Japan to the USA, would be 'open' most hours of the day.
One must distinguish between the theoretical index and a strategy that replicates
the theoretical index using securities. The theoretical index is not an investable asset or
security. If we set \varphi_{i,t} for the weight of asset i in the index at time t, with R_{i,t} the gross return of the asset over the period t-1 to t, the index value I_t satisfies the dynamics
\[
I_t = I_{t-1} \left( \sum_{k=1}^{N} \varphi_{k,t} R_{k,t} \right), \qquad I_0 = 100. \qquad (2.18)
\]
The value of the index tomorrow is equal to the present value times the return of each stock generated until tomorrow, weighted by the asset weights. The index fund F_t aims to replicate (2.18) by investing in the stocks. At each date t the fund holds a number n_{k,t} of stocks k, and F_t is equal to the sum over all stocks of these numbers times their prices P_{k,t}. The difference between the values F_t and I_t is the tracking error; the accuracy of the replication is often measured with the volatility of the tracking error.
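As an illustration of the dynamics (2.18), the following Python sketch computes an index path for N = 2 assets over three periods; the weights and gross returns are made-up numbers:

weights = [[0.6, 0.4], [0.6, 0.4], [0.5, 0.5]]              # phi_{k,t}
gross_returns = [[1.02, 0.99], [0.98, 1.03], [1.01, 1.01]]  # R_{k,t}

index = [100.0]   # I_0 = 100
for phi, R in zip(weights, gross_returns):
    index.append(index[-1] * sum(p * r for p, r in zip(phi, R)))

print(index)      # [100.0, 100.8, 100.8, 101.808]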
Example
The tracking error (TE) can be calculated directly or indirectly. Consider a time series of returns for a portfolio and its benchmark (market portfolio). The indirect method uses the following replication of the tracking error: the TE is equal to buying the portfolio and selling the benchmark. We can use the general variance formula for two random variables, choosing the weights \varphi_1 = +1 and \varphi_2 = -1:
\[
\sigma^2 = \sigma_P^2 + \sigma_B^2 - 2\,\mathrm{cov}(P, B).
\]
The TE is equal to \sigma. In the example's data, the covariance of the two time series is 0.011 percent. Dividing by the volatilities of the two time series, the correlation factor \rho = 0.89 follows. This gives the TE per period; scaling it with the square-root law, the annualized TE of 0.92% follows.
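Both methods can be illustrated in Python; the return series below are hypothetical, since the example's data are not reproduced here, and statistics.covariance requires Python 3.10 or later:

import statistics

portfolio = [0.010, 0.005, -0.004, 0.012, 0.002, 0.007]
benchmark = [0.009, 0.006, -0.002, 0.010, 0.003, 0.006]

# Direct method: TE is the standard deviation of the active returns.
active = [p - b for p, b in zip(portfolio, benchmark)]
te_direct = statistics.stdev(active)

# Indirect method: long the portfolio, short the benchmark, i.e.
# sigma^2 = var(P) + var(B) - 2 cov(P, B).
var_p = statistics.variance(portfolio)
var_b = statistics.variance(benchmark)
cov_pb = statistics.covariance(portfolio, benchmark)
te_indirect = (var_p + var_b - 2.0 * cov_pb) ** 0.5

print(te_direct, te_indirect)  # the two methods agree
# Annualization by the square-root law: multiply by sqrt(periods per year).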
For capital weighting (CW), the weight of stock k at time t is
\[
\varphi_{k,t} = \frac{M_{k,t} P_{k,t}}{\sum_{j=1}^{N} M_{j,t} P_{j,t}}, \qquad (2.19)
\]
with M the number of outstanding shares. The numerator is the market capitalization of stock k and the denominator is the market capitalization of the index. The weights \varphi change as prices and the numbers of outstanding shares change. If the outstanding shares are constant over time, the same holds true for the numbers of shares n_k that are needed to construct the fund. This is one of the main reasons why CW is often used: the constancy of the shares implies low trading costs. This reason and the simplicity of the CW approach have made it the favorite index construction method.
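A sketch of the capital weights (2.19) and of the replication argument; the prices, share counts, and fund size below are made-up numbers:

shares_out = [1000.0, 500.0]   # M_k, assumed constant over time
prices_t0 = [100.0, 40.0]
prices_t1 = [110.0, 36.0]
fund_value = 1_000_000.0       # USD invested at t0

def cap_weights(M, P):
    caps = [m * p for m, p in zip(M, P)]   # market capitalizations
    total = sum(caps)
    return [c / total for c in caps]

w0 = cap_weights(shares_out, prices_t0)
n0 = [w * fund_value / p for w, p in zip(w0, prices_t0)]  # stocks held at t0

fund_value_t1 = sum(n * p for n, p in zip(n0, prices_t1))
w1 = cap_weights(shares_out, prices_t1)
n1 = [w * fund_value_t1 / p for w, p in zip(w1, prices_t1)]

print(n0, n1)  # identical holdings: prices moved, but no trades are needed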
Alternative weighting schemes - smart beta approaches - weight the indices not by their capital weights but either by other weights which should measure the economic size of companies better (fundamental indexation) or by risk-based indexation. Most often, investors will use a mixture of CW and alternative schemes. A first requirement for such a mix is that the two approaches show a low correlation. Fundamental indexation serves the purpose of generating alpha to dominate the CW approach, while risk-based constructions focus on diversification.
Examples of risk-weighted allocations are EW, MV, MDP, and ERC. Roncalli (2014) compares the different methods for the Euro Stoxx 50 index using data from December 31, 1992, to September 28, 2012. He computes the empirical covariance matrix using daily returns and a one-year rolling window; rebalancing takes place on the first trading date of each month and all risk-based indices are computed daily as a price index; see Table 2.17.
                          CW       EW       MV      MDP      ERC
Expected return p.a.     4.47     6.92     7.36    10.15     8.13
Volatility              22.86    23.05    17.57    20.12    21.13
Sharpe ratio             0.05     0.16     0.23     0.34     0.23
Information ratio          -      0.56     0.19     0.42     0.62
Max. drawdown          -66.88   -61.67   -56.04   -50.21   -56.85

Table 2.17: Statistics for the different index constructions of the Euro Stoxx 50. CW is capital weighting, EW is equal weighting, MV is mean-variance optimal, MDP is most diversified portfolio, and ERC is equal risk contribution (Roncalli [2014]).
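A minimal sketch of an equal risk contribution (ERC) construction for a toy three-asset covariance matrix; the damped fixed-point iteration below is one simple way to compute ERC weights and is not necessarily the algorithm used by index providers or by Roncalli (2014):

import numpy as np

cov = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.16]])

w = np.ones(3) / 3.0
for _ in range(200):
    marginal = cov @ w          # marginal risks (Sigma w)_i
    w = np.sqrt(w / marginal)   # damped fixed-point step towards ERC
    w /= w.sum()

contrib = w * (cov @ w)         # risk contributions w_i (Sigma w)_i
print(np.round(w, 3))                        # ~[0.473 0.290 0.236]
print(np.round(contrib / contrib.sum(), 3))  # ~[0.333 0.333 0.333]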
2.8.4 ETFs
Exchange traded funds (ETFs) are a mixture of open- and closed-end funds; the main source for this section is Deville (2007). They are hybrid instruments which combine the advantages of both fund types. Mutual funds must buy back their units for cash, with the disadvantage that investors can only trade once a day at the NAV computed after the close. Furthermore, the trustee needs to keep a fraction of the portfolio invested in cash to meet possible redemption outflows. Closed-end funds avoid this cash problem. But since it is not possible to create or redeem fund shares, there is no way to react to changes in demand for the shares of such funds: if there are strong shifts in demand, price reactions follow, such as significant premiums or discounts with respect to their NAV.
ETFs trade on the stock market, and shares can be created or redeemed directly from the fund due to the in-kind creation and redemption process.
The in-kind process idea is due to Nathan Most. ETFs are organized like commodity warehouse receipts, where the physicals are delivered and stored and only the receipts are traded, although holders of the receipt can take delivery. This 'in-kind' - securities are traded for securities - creation and redemption principle has been extended from commodities to stock baskets; see Figure 2.22.
The figure illustrates the dual structure of the ETF trading process with a primary market open to institutional investors (authorized participants, APs) for the creation and redemption of ETF shares directly from the fund. The ETF shares are traded on a secondary market. The performance earned by an investor who creates new shares and redeems them later is equal to the index return less fees, even if the composition of the index has changed in the meantime. Only authorized participants can create new shares of specified minimal amounts (creation units). They deposit the respective stock basket plus an amount of cash into the fund and receive the corresponding number of shares in return. ETF shares are not individually redeemable. Investors who want to redeem are offered the portfolio of stocks that make up the underlying index plus a cash amount in return for creation units.
Since ETFs are negotiated on two markets - the primary and the secondary market - they have two prices: the NAV of the shares in the primary market and their market price in the secondary market. These two prices may deviate from each other if there is pressure to sell or buy. The 'in-kind' creation and redemption helps market makers to absorb such liquidity shocks on the secondary market, either by redeeming outstanding shares or by creating new ones. It also ensures that departures between the two prices are not too large, since authorized participants in the primary market could arbitrage any sizable differences between the ETF and the underlying index component stocks. If the secondary market price is below the NAV, APs can buy cheap ETFs in the secondary market, take on a short position in the underlying index stocks, and then ask the fund manager to redeem the ETFs for the stock basket before closing the short position at a profit. Since ETF fund managers do not need to sell any stocks on the exchange to meet redemptions, they can fully invest their portfolio, and creations do not cause any additional costly trading within the fund. Finally, in the US, 'in-kind' operations are a nontaxable event.
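The arbitrage argument can be made concrete with a stylized computation; all figures below are hypothetical:

nav_per_share = 100.0    # value of the underlying basket per ETF share
market_price = 99.2      # ETF price in the secondary market (discount)
creation_unit = 50_000   # ETF shares per creation/redemption unit

# The AP buys the cheap ETF shares, shorts the index basket, redeems
# the ETF shares in kind for the basket, and closes the short with it.
cost_etf = creation_unit * market_price
proceeds_short = creation_unit * nav_per_share
profit = proceeds_short - cost_etf   # USD 40,000 before transaction costs
print(profit)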
Most ETFs track an index and are passively managed.
Figure 2.22: Primary and secondary ETF market structure where the 'in-kind' process for the creation and redemption of ETF shares is shown. Market makers and institutional investors can deposit the stock basket underlying an index with the fund trustee and receive fund shares in return. These created shares can be traded on an exchange as simple stocks or later redeemed for the stock basket then making up the underlying index. Market makers purchase the basket of securities that replicates the ETF index and deliver it to the ETF sponsor. In exchange, each market maker receives ETF creation units (50,000 shares or multiples thereof). The transaction between the market maker and the ETF sponsor takes place in the primary market. Investors who buy and sell the ETF then trade in the secondary market through brokers on exchanges. (Adapted from Deville [2007] and Ramaswamy [2011].)
ETFs generally provide diversification, low expense ratios, and the tax efficiency of index funds, while still maintaining all the features of ordinary stock, such as limit orders, short selling, and options. ETFs can be used as a long-term investment for asset allocation purposes and also to implement market-timing investment strategies. All of these features rely on the specific 'in-kind' creation and redemption principle described above. ETFs are constructed by index providers, exchanges, or index fund managers (the originators).
The costs of an ETF have two components: transaction costs and the total expense ratio (TER). Transaction costs are divided into explicit and implicit costs. Explicit transaction costs include fees, charges, and taxes for the settlement by the bank and the exchange. Implicit costs are bid-ask spreads and costs incurred due to adverse market movements. ETFs can be constructed by direct replication (physical) or by using synthetic replication (swap-based).
Trends in ETF investment arise from regulation and investor demand. From a regulatory perspective, there have been barriers for active managers due to the Retail Distribution Review (RDR) in the UK and MiFID II in the euro zone. Growth in passive strategies will also be driven by cost transparency and the search for cheap investments. New uses for ETFs will emerge as well: institutions will use them to get access to specific asset class or geographic exposures, and retail investors will invest in ETFs as a lower-cost alternative to mutual funds and UCITS funds. Finally, trends in recent years are to construct ETFs not on a CW basis but on a risk-weighted one using risk parity methods, and to focus on risk factors instead of asset classes as underlying instruments.
This approach minimizes the tracking error for the ETF investor and enables more underlyings to be accessed. The basket of securities used as collateral is typically not related to the basket delivered to the swap counterparty which mimics the index. Why should an investment bank, as swap counterparty, enter into such a contract? See the next example.
Example
Suppose the ETF sponsor delivers a basket of two liquid securities S1 and S2 to the swap counterparty; the missing S3-asset is the tracking error source. The swap counterparty (the investment bank (IB)) delivers to the ETF sponsor seven securities, C1, ..., C7, as collateral. These assets are in the inventory of the IB due either to its market-making activities or to the issuance of derivatives, i.e., business that is not related to ETFs. When these securities C_i are less liquid, they have to be funded either in unsecured markets or in repo markets with deep haircuts: the IB has, for example, to post securities worth 120% to obtain funding of 100% at a given date. By transferring these securities to the ETF sponsor, the IB may benefit from reduced warehousing costs for these assets. Part of these cost savings may then be passed on to the ETF investors through a lower total expense ratio for the fund holdings. The cost savings accruing to the investment banking activities can be directly linked to the quality of the collateral assets transferred to the ETF sponsor. A second possible benefit for the IB is lower regulatory and internal economic capital requirements, since the regulatory charge for the less liquid securities C_i is larger than for the more liquid securities S1 and S2 in the basket delivered by the ETF sponsor. Summarizing, a synthetic swap has a positive impact on the security inventory costs of the IB due to non-ETF business and on regulatory capital or internal economic risk capital charges.
The drawbacks of synthetic swaps are counterparty risk and documentation require-
ments (International Swaps and Derivatives Association [ISDA]).
Bond ETFs typically face huge demand when stock markets are weak, such as when recessions occur. An asset rotation from stocks to bonds is often observed in such cases. Figure 2.24 shows bond inflows of USD 800 billion and equity redemptions in long-only equities (LO equities) after the GFC. In recent years an opposite rotation began due to close-to-zero interest rates.
Figure 2.24: Bond inflows and equity redemptions (BoA Merrill Lynch Global Investment Strategy, EPFR Global [2013]).
Commodity ETFs invest in oil, precious metals, agricultural products, etc. The idea of a gold ETF was conceptualized in India in 2002. At the end of 2012, the SPDR Gold Shares ETF was the second-largest ETF. Rydex Investments launched the first currency ETF in 2005. These funds are total return products where the investor gets access to the FX spot change, local institutional interest rates, and a collateral yield.
Actively managed ETFs have been offered in the United States since 2008. Initially, they grew faster than index ETFs did in their first three years. But the growth rate was not sustainable: the number of actively managed ETFs has not grown for several years. Many academic studies question the value of active ETF management, since such funds face the same skill-and-luck issue as mutual funds and much higher costs than static ETFs.
Example
Consider a LETF with positive leverage factor 2 (bullish leverage). We follow Dobi
and Avellaneda (2012). There are three time periods 0, 1, 2 in the example (see Table
2.18). The index value of the ETF starts at 100, loses 10%, and then gains 10%.
Table 2.18: Data for the leveraged ETF example. t_{k,-} denotes the time t_k before the adjustment of the TRS and t_{k,+} the time after the adjustment of the TRS.
The initial AuM is USD 1,000 at day 0, and the AuM is USD 800 at day 1 due to the 10% index drop on day 1:
\[
\text{USD } 800 = 1{,}000\,(1 - 2 \times 0.1).
\]
This implies a required TRS exposure of 2 × 800 = USD 1,600. The notional value of the TRS from day 0 has become, at day 1, USD 1,800 = 2,000 × (1 - 0.1). This is the exposure before adjustment. Since the exposure needed at day 1 is USD 1,600, the swap counterparty must sell (short the synthetic stock) USD 200 = 1,800 - 1,600 of TRS. Doing the same calculation for day 2, the AuM is USD 960 and the exposure needed is USD 1,920. Similarly, on day 2 the swap counterparty must buy a TRS amount of USD 160 = 1,920 - 1,760, where USD 1,760 = 1,600 × (1 + 0.1) is the exposure before adjustment.
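The rebalancing arithmetic can be reproduced with a few lines of Python (figures from the example):

leverage = 2.0
aum = 1000.0               # USD at day 0
exposure = leverage * aum  # TRS notional at day 0: USD 2,000

for index_return in [-0.10, +0.10]:
    aum *= 1.0 + leverage * index_return  # fund NAV after the index move
    exposure *= 1.0 + index_return        # TRS notional before adjustment
    target = leverage * aum               # exposure needed
    adjustment = target - exposure        # < 0: sell TRS, > 0: buy TRS
    exposure = target
    print(aum, adjustment)
# Day 1: AuM 800, sell USD 200; day 2: AuM 960, buy USD 160.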
Example
We consider the compounding problem for a LETF. Fix an index and a 2x LETF, both beginning at 100. Assume that the index first rises 10% to 110 and then drops back to 100, a drop of 9.09%. The LETF will first rise 20% and then drop 18.18% = 2 × 9.09%. But 18.18% of 120 is 21.82. Therefore, while the index is back at 100, the LETF is at 98.18, which implies a loss of 1.82%. Such losses always occur for a LETF when the underlying index value changes direction. The more frequent such directional changes are - hence it is a volatility effect - the more pronounced the losses.
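A short Python sketch reproducing the compounding loss:

index_moves = [0.10, -1.0 / 11.0]   # 100 -> 110 -> 100

index_value = 100.0
letf_value = 100.0
for r in index_moves:
    index_value *= 1.0 + r
    letf_value *= 1.0 + 2.0 * r     # the 2x LETF doubles each period return

print(index_value, letf_value)      # 100.0 and ~98.18: a 1.82% loss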
These examples illustrate that a LETF always rebalances in the same direction as the underlying index, regardless of whether the LETF is a bullish one (positive leverage) or a bearish one (negative leverage). The fund always buys high and sells low in order to maintain a constant leverage factor. A similar result holds for inverse LETFs.
The trend of decreasing fees continues, but for index funds a bottom level seems to be close. Table 2.19 also considers ETF fees.
                    Equity   Bonds
Mutual funds (*)     0.74%   0.61%
Index funds (*)      0.12%   0.11%
ETFs (**, ])         0.49%   0.25%
ETF core (**, +)     0.09%   0.09%

Table 2.19: Fees p.a. in 2013 ((*) Investment Company Institute, Lipper; (**) DB Tracker; (]) Barclays; (+) BlackRock).
Figure 2.25: Expense ratios of actively managed funds (upper lines) and index funds (lower lines) - bps p.a. (Investment Company Institute and Lipper [2014]).
AIs are often defined as investments in asset classes other than stocks, bonds, commodities, currencies, and cash. These investments can be illiquid. We only consider insurance-linked securities in the sequel. It is estimated that alternative investments will reach USD 13 trillion by 2020, up from USD 6.9 trillion in 2014. One expects that more and more investors will be able to access AIs as regulators begin to allow them access to specific regulated vehicles such as alternative UCITS funds in Europe and alternative mutual funds in the US.
This section is based on LGT (2014). Insurance-linked investments are based on the events of life insurers and of non-life insurers, such as insurers against natural catastrophes. The main products are insurance-linked securities (ILS, such as CAT bonds) and collateralized reinsurance investments (CRI). The global size of this relatively young market is USD 200 bn as of 2014. Regulation plays a significant role in the use of alternatives. The creditworthiness of insurance and reinsurance companies requires a large capital basis from a regulatory perspective for the catastrophe cases. To reduce the capital charge under Solvency II, the catastrophe part of the risks is transferred to the capital markets using ILS and CRI.
2.9.0.1 ILS
Insurance buyers such as primary insurers, reinsurers, governments, and corporates enter into a contract with a special purpose vehicle (SPV). They pay a premium to the SPV and receive insurance cover in return. The SPV finances the insurance cover with the principal paid by investors. The principal is returned at the end of the contract if no event has occurred. The investor receives, in addition to the principal payback, the premium and a collateral yield.
An example is the catastrophe or CAT bond 'Muteki'. The Muteki SPV provided the insurance buyer Munich Re with protection against Japanese earthquake losses. Central to ILS investing is the description of the events. The description has to be transparent, unambiguous, measurable, verifiable, and comprehensive. The parametrization in Muteki is carried out using parameters from the 1,000 observatories located in Japan that use seismographs. 'Ground acceleration' is used to calculate the value of the CAT bond index. This determines whether a payout from the investors to the insurance protection buyers is due.7
Figure 2.26 shows the peak ground velocities measured during the 11 March, 2011 earthquake. The star indicates the epicenter; the regions with the highest ground velocities also experienced the related tsunami.
The insurance industry lost an estimated USD 30-35 billion. The ground acceleration data became available on 25 March, 2011. Multiplying the ground velocity chart by the weight-per-station chart of Munich Re implied an index level for the CAT bond of 1,815 points. This index level led to a full payout from the investors to the insurance buyer, since the trigger level - that is to say, the level of the index at which a payout starts to be positive - of 984 was exceeded and also because the exhaustion level of 1,420 points was breached. Hence, investors in this CAT bond suffered a 100 percent loss.
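The payout mechanics can be sketched in Python as follows, assuming a payout that is linear between the trigger and exhaustion levels (the exact Muteki payout schedule is an assumption here; only the two levels are quoted above):

trigger = 984.0
exhaustion = 1420.0

def principal_loss(index_level):
    """Fraction of the principal paid out to the protection buyer."""
    if index_level <= trigger:
        return 0.0
    if index_level >= exhaustion:
        return 1.0
    return (index_level - trigger) / (exhaustion - trigger)

print(principal_loss(900))    # 0.0: no payout
print(principal_loss(1200))   # ~0.50: partial loss of principal
print(principal_loss(1815))   # 1.0: the 2011 event level, full loss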
2.9.0.2 CRI
In collateralized reinsurance investments (CRIs), the same insurance protection buyers as for ILS buy insurance cover from an SPV in exchange for a premium. The SPV hands over the premium and collateral yield to the investor. The investor pays, in cases where he receives proof of loss, the loss payment to the SPV. Between the investor and the insurance buyer a letter of credit is set up to guarantee the potential loss payment. Table 2.20 summarizes the ILS and CRI product specifications. The ILS bond pays out if an event is realized and the triggers are met. For the CRI, if an event is realized and the triggers are met, the investor makes a loss payment.
ILS and CRI comprise 13 percent and 18 percent, respectively, of total reinsurance investments. The remainder are traditional uncollateralized reinsurance investments. The cumulative issuance volume of CAT bonds and ILS, which started in 1995, reached USD 20 bn
7 The exposure of Munich Re in Japan is not uniformly spread over the whole country. The insurer therefore weights the signals of the measuring stations such that the payout of the CAT bond matches the potential losses of Munich Re from claims incurred due to the event.
Figure 2.26: Ground velocities measured by Japan's 1,000 seismological observatories during the earthquake of 11 March, 2011, which also caused a huge tsunami and almost 20,000 fatalities (Kyoshin [2011]).
8 The main intermediaries or service providers to the catastrophe bond and insurance-linked securitization market in 2014 were Aon Benfield Securities, Swiss Re Capital Markets, GC Securities, Goldman Sachs, and Deutsche Bank Securities.
Table 2.20: Comparison between ILS and CRI investments (LGT [2014]).
Figure 2.27: Average expected coupon and average expected loss of CAT bonds and ILS
issuance by year (artemis.com [2015]).
Table 2.21: Correlation matrix for dierent asset classes. Monthly data in USD from 31
Dec 2003 until 30 Nov 2014 (LGT [2014], Barclays Capital, Citigroup Index, Bloomberg).
of a nation.
buys shares for the investors but is heavily involved in the management of the company.9 Fifth, PM transactions are characterized by significant access to capital and to networks with strong expertise.
The evolution of PM can be roughly classified into three periods. In the era 1970-1990, private markets meant the emergence of leveraged buyouts, focused on the US and on the retail, chemical, and manufacturing sectors. Such leveraged buyouts (LBOs) mean buying a company using a combination of equity and debt, where the company's cash flow is used to repay the borrowed money. Debt is used since its cost of capital is lower than that of equity: interest payments reduce the corporate income tax liability while dividend payments based on equity do not. The use of leveraged buyouts led to several defaults of firms whose debt ratios were too high. This led banks to require lower debt-to-equity ratios. In the period 1990-2010, private equity became broader in the industries invested in (healthcare, education) and PM became a global activity. In the last period, starting after the GFC, PM became broader with three pillars: debt, real estate, and infrastructure.
The low interest rate environment made private markets attractive for investors such as pension funds, which did not invest in these markets before the GFC. A study by Towers Watson in 2017 highlighted that 94% of current PM investors will increase or maintain their private market allocations in the longer term. The AuM in PM steadily increased from USD 1.5 tr in 2006 to more than USD 4 tr in 2017. Dry powder - capital committed but not yet invested, held in highly liquid securities - did not increase in the same period but dropped from around 40 percent before the GFC to values between 30 and 35 percent in recent years. If deal activity falls and dry powder accumulates, a risky situation can emerge when investors add pressure on the PM firm to deploy that capital, i.e., to do transactions it might not otherwise do.
A second observation relates PM and public markets over the last 25 years. First, the number of publicly listed firms dropped from 7,322 in 1996 to 3,671 in 2016 (Credit Suisse, Doldge et al. [2016]), and second, private firms stay private longer or even forever. Facebook, for example, was founded in 2004 and had its IPO only in 2012.
Valuations of shares in PM and public markets were both at historic highs in 2018. The S&P 500 index increased by a factor of almost 2.5 in the previous 6 years, and EV/EBITDA multiples in PM also increased by around 40 percent in the same period, to 14x for large caps and 12x for small and mid caps (sources: S&P and Partners Group [2018]).
Figure 2.28 shows that operational value creation drives performance more than financial engineering, which is the opposite of the leveraged buyout period.
A further significant tendency of investors is to abstain from excessive diversification
9 The ten largest PE firms in 2017 according to PEI Media are The Blackstone Group, Kohlberg Kravis Roberts, The Carlyle Group, TPG Capital, Warburg Pincus, Advent International Corporation, Apollo Global Management, EnCap Investments, Neuberger Berman, and CVC Capital Partners.
and to seek high-conviction portfolios instead. Excessive diversification was a result of a lack of transparency, whereas high conviction is the result of experience and the successful selection of investments. Typically, in the past, institutional investors spread their PM investments among hundreds of assets. Today, the most successful PM firms spread their investments across only several dozen assets.
Comparing returns in PM with public markets after the GFC, PM roughly have a 3 to 4 percent higher average return than their public counterparts in equity, debt, real estate, and infrastructure investments; considering maximum drawdowns in the period 2000-2015, the figures for PM are between 20 and 30 percent lower in the above four classes compared to the public counterparts.
Some major players in PM have started to offer part of their PM offering to wealthy private clients or affluent clients. This requires transforming some of the PM offerings into public ones. Since many investors became familiar with PM in recent years, they have increased their allocations and invest globally. This requires PM firms to consider portfolio construction techniques on a more sophisticated level than in the past.
HFs often have a limited number of wealthy investors. If a HF restricts the number of investors, it is not a registered investment company. It is then, in the US, exempt from most parts of the Investment Company Act of 1940 (the 40-Act). Most HFs in the US have a limited-partnership structure. The limitation of the number of investors automatically increases the minimum investment amount to USD 1 million or more. Many HFs do not allow investors to redeem their money immediately. The reason is the short positions of the funds: to reduce this risk, HFs need to post margin. If short positions increase, HFs need to add more and more margin and would eventually face liquidity problems if investors redeem their money at the same time. Mutual funds are not allowed to earn non-linear fees, while most HFs charge a flat management fee plus a performance fee (the 2/20 rule: 2% management fee, 20% performance fee). The business of running a hedge fund has become more expensive due to the increased regulatory burden. KPMG (2013) outlines the following figures for average set-up costs: USD 700,000 for a small fund manager, USD 6 million for a medium-sized one, and USD 14 million for the largest. In all, KPMG estimated hedge funds had spent USD 3 billion meeting compliance costs associated with new regulation since 2008 - equating to, roughly, a 10 percent increase in their annual operating costs (KPMG [2013]).
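To make the 2/20 rule concrete, the following stylized Python computation applies it to hypothetical figures; actual funds differ, e.g., in high-water marks, hurdle rates, and whether the performance fee is charged on gains net of the management fee (assumed here):

aum_start = 100e6     # USD
gross_return = 0.12   # gross fund return for the year

management_fee = 0.02 * aum_start
gains = aum_start * gross_return
performance_fee = 0.20 * max(gains - management_fee, 0.0)

net_to_investors = aum_start + gains - management_fee - performance_fee
print(management_fee, performance_fee, net_to_investors)
# USD 2 mn management fee, USD 2 mn performance fee, USD 108 mn to investors.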
HFs can face losses due to their construction or the market structure even in cases where there are no specific market events. As Khandani and Lo (2007) state, quantitative HFs faced a perfect financial storm in August 2007 in a normal market environment. The Global Alpha Fund, managed by Goldman Sachs Asset Management, lost 30 percent in a few days although it claimed to be designed for low-volatility and low-correlation strategies. The HF received an injection of USD 3 billion to stabilize it.
10 The main sources are the hedge fund review of Getmansky, Lee, and Lo (2015) and Ang (2013).
11 FATCA, the Foreign Account Tax Compliance Act, is a US extraterritorial regime of hedge fund regulation. It requires all non-US hedge funds to report information on their US clients. Europe's Alternative Investment Fund Managers Directive (AIFMD) requires information from any fund manager, independent of where they are based, if they sell to an EU-based investor.
The largest HFs in 2014, 2017, and 2019 are shown in Table 2.22. Total HF size in 2014 was USD 2.85 trillion versus USD 2.6 trillion in 2013. The average growth in HF assets from 1990 to 2012 was roughly 14 percent per year. The decrease in AuM after the GFC was fully recovered six years later. The losses incurred during the GFC were around 19 percent, which is only around half the losses of some major stock market indices. In the period 2009 to 2012, HF performance was lower than the S&P 500, ranging between 4.8 percent and 9.8 percent on an annual basis.
Hedge fund                               USD bn 2014   USD bn 2017   USD bn 2019
Bridgewater Associates (USA)                   87.1         122.2         124.7
AQR Capital Management (USA)                   29.9          69.9          62
J.P. Morgan Asset Management (USA)             59.0          45.0          47.7
Renaissance Technologies (USA)                 24.0          42.0         110
Two Sigma Investments/Advisers (USA)           17.5          38.9          51
D.E. Shaw (USA)                                22.2          34.7          62
Millennium Management (USA)                    21.0          33.9          39
Man Group, London (UK)                         28.3          33.9          62
Och-Ziff Capital Management (USA)              36.1          33.5          32
Winton Capital Management (UK)                 24.7          32.0          22.1
Elliott Management Corporation (USA)           23.3          31.3          35

Table 2.22: The largest hedge funds by AuM in 2014, 2017, and 2019.
The decreases in AuM during the GFC and the European debt crisis - from USD 2.1 tr to 1.5 tr - show that investors allocate money pro-cyclically to HFs, similar to mutual funds or ETFs. The following facts regarding the largest HFs are from Milnes (2014) (the number after the hedge fund's name is its ranking in the list of the world's largest HFs as of 2014).
• Bridgewater Associates (1). There was a relatively poor performance of the three flagship funds in 2012 and 2013 of 3.5%, 5.25%, and 4.62%. The performance over ten years is 8.6%, 11.8%, and 7.7%.
• J.P. Morgan Asset Management (2). J.P. Morgan bought the global multi-strategy firm Highbridge Capital Management in 2004 for USD 1.3 billion. Highbridge's assets have since multiplied by nearly 400 percent to USD 29 billion.
• Brevan Howard Capital Management (3). This HF maintains both solid returns and asset growth - which is the exception for a HF. The flagship is a global macro-focused HF (USD 27 bn AuM) which - since its launch in 2003 - has never lost money on an annual basis.
• Och-Ziff Capital Management (4) offers publicly traded hedge funds in the US with far greater disclosure than other HFs. Its popularity is mainly due to Daniel Och's conservative investing style.
• BlueCrest Capital (5) was a spin-off from a derivatives trading desk at J.P. Morgan in 2000. It has grown rapidly and is one of the biggest algorithmic hedge fund firms. Its reputation was boosted in 2008 when it made large profits while most other HFs faced losses.
• AQR Capital Management (7), co-founded by Cliff Asness, gives retail investors access to hedge fund strategies. Asness is also well known for his critique of the unnecessarily high fees charged by most HFs and for his scientific contributions.
• Man Group (9) was founded in 1783 by James Man as a barrel-making firm. It has 225 years of trading experience and 25 years in the HF industry. In recent years, its flagship fund AHL struggled with its performance.
• Winton Capital Management (13) has its roots in the quant fund AHL (founded in 1987 and bought by Man Group in 1989). David Harding - like many in the quantitative trading field, with a math or physics education - was also a pioneer in the commodity trading adviser (CTA) field. Winton is the biggest managed futures firm in the world.
The largest loss a HF has suffered was the USD 6 billion loss of Amaranth in 2006. This loss, around 65 percent of the fund's assets, was possible due to extensive leverage and a wrongheaded bet on natural gas futures.
2.11.2.1 HF Strategies
An important selling argument for HFs is that their returns only weakly correlate with traditional markets. Starting in 2000, the correlation between the MSCI World and the broad DJ CS Hedge Fund Index (HF Index) changed on a two-year rolling basis: correlation was 0.16 in the years 2000-2007 and jumped to 0.8 in 2007-2009, since a significant number of HF managers started in 2007 to invest traditionally in stocks and commodities. Many HFs use strategies similar to factor investing. The main differences are the transparency of the latter, the implementation of the factors as indices, and the construction of a cross-asset offering of factors. These advantages make it attractive for investors to switch from the more opaque and often more expensive HF to a factor portfolio.
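As a sketch, the rolling correlation just described can be computed as follows; the two monthly return series here are synthetic placeholders standing in for the MSCI World and the HF index, not the actual data.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    dates = pd.date_range("2000-01-31", periods=240, freq="M")

    # Synthetic monthly returns standing in for MSCI World and a HF index.
    equity = pd.Series(rng.normal(0.005, 0.04, len(dates)), index=dates)
    hf = pd.Series(0.3 * equity.values + rng.normal(0.003, 0.02, len(dates)),
                   index=dates)

    # Two-year (24-month) rolling correlation, as used in the text.
    rolling_corr = equity.rolling(window=24).corr(hf)
    print(rolling_corr.dropna().tail())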
Figure 2.29: Development of the managed futures industry. Data are from the Barclay CTA Index (Gmür [2015]).
The figure shows the strong inflow in 2009 after the GFC, when managed futures were successful and other investments in HFs faced heavy losses. The last four years show stagnation in the growth of AuM. Many events in the recent past made trend following difficult: the euro sovereign debt crisis, Greece, the China crisis of 2015, etc. The zig-zag behaviour of markets due to such events is the natural enemy of trend models, since trend reversal signals come 'too late'. The largest player as of end-2017, with around USD 32 bn, is Winton Capital, followed by Man AHL and Two Sigma Investments. Geographically, the London area dominates, followed by the US and Switzerland. In the last two decades there has been a significant shift from the US to London and other European countries.
12 The abbreviation CTA means Commodity Trading Advisor; CTAs are heavily regulated in the US by the NFA/CFTC. Typically traded instruments are futures (and options) on equities, equity indices, commodities, and fixed income, as well as spot, forwards, futures, and options in the FX asset class.
Are HF fees justified? Titman and Tiu (2011) document that, on average, HFs in the lowest R2 quartile charge 12 basis points more in management fees and 385 basis points more in incentive fees compared to hedge funds in the highest quartile. Feng et al. (2013) find that management fees act similarly to a call option at maturity, and that HF managers can therefore increase the value of this option by increasing the volatility of their investments. For CTAs one observes that very professional investors prefer to set the fixed management fee to zero and instead share even more than 20% of the performance fee.
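To make the option analogy concrete, here is a minimal sketch (with hypothetical parameter values) of a fee with a high-water mark: the fee payoff resembles a call option on the fund's value, and a simple Monte Carlo experiment shows its expected value rising with return volatility.

    import numpy as np

    def hwm_fee(nav_end, high_water_mark, incentive_rate=0.20):
        """Period-end fee: a call-option-like payoff on the fund's value."""
        return incentive_rate * max(nav_end - high_water_mark, 0.0)

    # Expected fee increases with volatility (convex, option-like payoff).
    rng = np.random.default_rng(1)
    nav0 = hwm = 100.0
    for sigma in (0.05, 0.15, 0.30):
        nav_end = nav0 * np.exp(rng.normal(0.0, sigma, 100_000))
        fees = 0.20 * np.maximum(nav_end - hwm, 0.0)
        print(f"vol {sigma:.0%}: expected fee {fees.mean():.2f}")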
Fees are particularly opaque for double-layer funds of funds; see Brown et al. (2004). They find that individual funds dominate funds of funds in terms of net-of-fee returns and Sharpe ratios. The performance fee drives the compensation of HF managers and owners: top hedge fund managers can earn billions of USD in one year, which exceeds the salaries of blue-chip CEOs by a factor of 10 to 30.
The fee discussion continues to damage the reputation of HFs. The California Public Employees' Retirement System (CalPERS) decided in 2014 to divest itself of its entire USD 4 billion portfolio of HFs.
Hedge funds often use leverage to boost returns. Since leverage increases both returns and risks, it is most relevant for low-volatility strategies. Besides return volatility, illiquidity is another risk source for leveraged investments, since the loans are linked to margin calls. This can force HFs to shut down in a crisis, when the HF is unable to cover large margin calls. Ang et al. (2011) find: '... hedge fund leverage decreased prior to the start of the financial crisis in 2007 and was at its lowest in early 2009 when the leverage of investment banks was at its highest.'
Leverage is not constant over time. Cao et al. (2013) find that HFs are able to adjust their market exposure to changing market conditions. HFs also restrict investors' redemptions; for example:
• New investors are often forced into a one-year 'lockup' period during which they cannot withdraw their funds.
Such restrictions protect against fire-sale liquidations causing extreme losses for the HF's remaining investors. The discretionary right to impose withdrawal gates can be very costly for investors if losses accumulate during the period when withdrawing is not possible; see Ang and Bollen (2010). Several studies document a positive empirical relationship between fund flows and recent performance: HF investors seek positive returns and flee from negative returns (Goetzmann et al. [2003], Baquero and Verbeek [2009], and Getmansky et al. [2015]). The relationship between fund flows and investment performance is often non-linear.13
• Survivorship bias and selection bias, i.e. there is a stronger reporting incentive if returns are positive. This bias increases the average fund's return by between 0.16% and 3%; see Ackermann et al. [1999], Liang [2000], and Amin and Kat [2003].
13 See Aragon, Liang, and Park (2013); Goetzmann et al. (2003), Baquero and Verbeek (2009), Teo (2011), and Aragon and Qian (2010) report on some non-linear relations.
• Backfill bias. The primary motivation for disclosing return data is marketing. HFs start to report after they have been successful: they fill in their positive past returns - the 'backfill bias'. Fung and Hsieh (2000) estimate a backfill bias of 1.4 percent p.a. for the Lipper TASS database (1994-1998). Malkiel and Saha (2005) estimate that the return of HFs that backfill is twice the return figure of those not backfilling.
Backfilling means that part of the left tail of the loss return distribution is missing in HF databases. Since large, well-known HFs do not need to engage in marketing by reporting to commercial databases, part of the right-hand return tail is also missing. We recall the findings of Patton et al. (2013) in Section 2.4.5 about the revision of previously reported returns.
Given these biases, why do databases not correct them in a transparent and standardized way when publishing their data? Figure 2.30 shows the impact: if one corrects for survivorship and backfill biases, annualized returns halve.
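A minimal sketch of how such corrections are typically applied in research, assuming one has a fund-month panel that already includes dead funds (which addresses survivorship); dropping each fund's first reported year is a common rough backfill adjustment. Column names are hypothetical.

    import pandas as pd

    def bias_corrected_mean(panel: pd.DataFrame, backfill_months: int = 12) -> float:
        """panel: one row per fund-month with columns ['fund_id', 'date', 'ret'].

        Survivorship: the panel must retain funds that stopped reporting.
        Backfill: drop each fund's first `backfill_months` observations,
        since those are typically filled in retroactively after the fund
        decided to start reporting.
        """
        panel = panel.sort_values(["fund_id", "date"])
        obs_rank = panel.groupby("fund_id").cumcount()
        return panel.loc[obs_rank >= backfill_months, "ret"].mean()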
Figure 2.30: Summary statistics for cross-sectionally averaged returns from the Lipper TASS database from January 1996 through December 2014. The last value - Box p-value - represents the p-value of the Ljung-Box Q-statistic with three reported lags (Getmansky et al. [2015]).
We consider entries into and exits from HF databases. From January 1996 to December 2006, more than twice as many new funds entered the Lipper TASS database each year as exited, despite the high attrition rates. This process reversed in the GFC period. After the peak number of new HFs in 2007-2008, the attrition rate jumped to 21 percent and the average return was at its lowest at −18.4 percent.
The survival rates of hedge funds have been estimated by several authors; see Horst and Verbeek (2007). Summarizing, 30-50 percent of all HFs disappear within 30 months of entry and 5 percent of all HFs last more than 10 years. These rates differ significantly across styles; see Getmansky et al. (2004).
Asness (2014) plots the realized alpha of hedge funds over rolling periods of 36 months. He takes the monthly returns over cash, subtracts 37 percent of the S&P 500 excess return - 0.37 being the full-period, long-term beta - and looks at the annualized average of this realized alpha (see Figure 2.31).
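A sketch of this realized-alpha construction; the inputs are assumed to be monthly excess-return series over cash, and the beta of 0.37 is the full-period value quoted above.

    import pandas as pd

    def rolling_realized_alpha(hf_excess: pd.Series, spx_excess: pd.Series,
                               beta: float = 0.37, window: int = 36) -> pd.Series:
        """Annualized rolling mean of the monthly realized alpha
        alpha_t = hf_excess_t - beta * spx_excess_t."""
        alpha = hf_excess - beta * spx_excess
        return alpha.rolling(window).mean() * 12  # annualize the monthly mean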
We observe a decreasing alpha over time, which ends up negative in the recent past. Recent years seem to have been particular. Unlike for mutual funds, a number of studies document positive risk-adjusted returns in the HF industry before the GFC. Ibbotson et al. (2011) report positive alphas in every year of the period 1995-2009. While the alphas of the HF industry have been decreasing steadily over the last two decades, the correlation with broad stock market indices shows the opposite evolution.
14 Convertible Arbitrage, Dedicated Short Bias, Emerging Markets, Equity Market Neutral, Event
Driven, Fixed Income Arbitrage, Global Macro, Long/Short Equity Hedge, Managed Futures, Multi-
Strategy, and Fund of Funds.
Figure 2.31: Average monthly returns (realized alpha) of the overall Credit Suisse Hedge
Fund Index and the HFRI Fund Weighted Composite Index for a rolling 36 months
(Asness [2014]).
• Agarwal and Naik (2000a), Chen (2007), and Bares et al. (2003) find performance persistence over short periods.
• Brown et al. (1999) and Edwards and Caglayan (2001) find no evidence of performance persistence.
• Fung et al. (2008) find a positive alpha-path dependency. Given that a fund has a positive alpha, the probability that the fund will again show a positive alpha in the next period is 28 percent. The probability for a non-alpha fund is only half this value. The year-by-year alpha-transition probability for a positive-alpha fund is always higher than that of a non-alpha fund.
We consider the performance of the CTAs Winton and Chesapeake. Starting with USD 1 invested in October 1997 and held until early 2013 (Quantica [2015]), the first CTA pays out around USD 9 and the second one around USD 18. Both CTAs had positive returns until the GFC. Then Chesapeake's volatility started to increase and the positive past trend became essentially a flat one. This behaviour is typical of other CTAs too. For Winton, returns hardly suffered during and after the GFC.
Figure 2.32: Monthly return distribution for Fairfield Sentry (line) and S&P 500 (dots) returns (Ang [2013]).
The reason is risk: Winton takes much less risk than Chesapeake. Why can a CTA strategy work? Empirical evidence for the equity index market shows that skewness and the Sharpe ratio are highly positively related: investors are compensated with excess returns for assuming excess skewness rather than excess volatility. Trend-following strategies offer positive risk premia with positively skewed returns. Market participants often believe that hedge funds make excessive use of short strategies. This is not the case for CTAs - around 80% of the investments are long-only strategies and 20% use short strategies.
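A stylized time-series momentum sketch in the spirit of a CTA trend model (an illustration only, not any specific fund's model): go long after a positive trailing 12-month return and short after a negative one.

    import numpy as np
    import pandas as pd

    def trend_following_returns(prices: pd.Series, lookback: int = 12) -> pd.Series:
        """Monthly trend strategy: position is the sign of the trailing
        `lookback`-month return, applied to the following month's return
        (unit leverage, no transaction costs)."""
        monthly_ret = prices.pct_change()
        signal = np.sign(prices.pct_change(lookback)).shift(1)  # trade next month
        return (signal * monthly_ret).dropna()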
Figure 2.33 shows the attribution of the profit and loss to the different asset classes over the last decade. During the GFC, CTAs did not produce positive returns through huge short positions in equity markets but through long positions in the fixed income trend model: the decreasing rates in this period were a constant source of positive returns.
Figure 2.33: Annual sector attribution of the profit and loss for the Quantica CTA (Quantica [2015]).
The study of Aragon and Martin (2012) gives evidence that HFs successfully use derivatives to profit from private information about stock fundamentals. Cao et al. (2013) find that HF managers increase (decrease) their portfolios' market exposure when equity market liquidity is high (low), and that liquidity timing is most pronounced when market liquidity is very low.
They consider equity long/short, emerging markets, equity market neutral, event driven, and global macro strategies. The main results are that the majority of funds are still zero-alpha funds (ranging from 41% to 97% across strategies), similar to mutual funds. But there is a higher proportion of positive-alpha funds compared to mutual funds (0%-45%), and the proportion of negative-alpha funds ranges between 2.5% and 18.6%. The most skilled funds follow emerging market strategies, followed by global macro and equity long/short. The proportion of skilled or unskilled funds differs across market stress periods. But there is no uniform decline of skilled funds over the period from 1992 to 2006 as there is for mutual funds. This is some evidence that successful mutual fund asset managers moved to HFs and/or that markets are less efficient for HF strategies than for mutual fund ones.
Figure 2.34: Monthly correlations of the average returns of funds for the 10 main Lipper
TASS hedge fund categories in the Lipper TASS database from January 1996 through
December 2014. Correlations are color-coded with the highest correlations in blue, in-
termediate correlations in yellow, and the lowest correlations in red (Getmansky et al.
[2015]).
Getmansky et al. (2015) use a factor model based on PCA to gain more insight into possible correlations. The size of the eigenvalues indicates that 79% of the strategies' return variation is captured by the first principal components.
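A sketch of such a PCA-based commonality measure (here on synthetic style returns driven by one common factor): the eigenvalues of the correlation matrix give the share of total variation attributable to each principal component.

    import numpy as np

    rng = np.random.default_rng(2)
    # Synthetic monthly returns for 10 hedge fund styles with one common factor.
    common = rng.normal(0.0, 0.02, (228, 1))
    styles = 0.8 * common + rng.normal(0.0, 0.01, (228, 10))

    corr = np.corrcoef(styles, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    explained = eigvals / eigvals.sum()
    print("variance share of the first component:", round(explained[0], 2))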
The heterogeneity and commonality among HF styles is shown in Figure 2.35. Dedicated Short Bias underperformed all other categories. Multi-Strategy hedge funds outperformed Funds of Funds. Managed Futures funds' returns appear roughly IID and Gaussian. The returns of the average Convertible Arbitrage fund are auto-correlated and have fat tails. The styles Long/Short Equity, Event Driven, and Emerging Markets have high correlations with the S&P 500 total return index, between 0.64 and 0.74. Return volatility of the average Emerging Markets fund is three times greater than that of the average Fixed Income Arbitrage fund.
Figure 2.35: Summary statistics for the returns of the average fund in each Lipper TASS style category and summary statistics for the corresponding CS-DJ Hedge Fund Index from January 1996 through December 2014. Sharpe and Sortino ratios are adjusted for the three-month US Treasury bill rate. The 'All Single Manager Funds' category includes the funds in all 10 main Lipper TASS categories and any other single-manager funds present in the database (relatively few) while excluding funds of funds (Getmansky et al. [2015]).
The CTA Quantica shows a low correlation with the traditional asset classes and also with the global hedge fund index: between 10 and 15% correlation to the S&P 500, USD government bonds 3-5y, and the GSCI commodity index, 24% to the HFRX Global Hedge Fund Index, and 68% to the Newedge CTA index. The large correlation with the CTA index indicates that many CTAs use similar models - broadly diversified trend-following models. Although CTAs show a persistent upward drift in the long run (see Figure 2.36), they may well suffer temporary heavy losses.
Figure 2.36: Drawdown periods for the S&P 500 total return, the GS commodity total return index, and the Barclays US Managed Futures index BTOP 50. Data are from Dec 1986 to Mar 2013 (Bloomberg).
It follows that the CTA index shows much less severe drawdowns than an equity or a commodity index. The main reason is discipline in investment, which has two components. First, CTAs are fully rule-based: if a stop-loss trigger is breached, losses are realized. Second, CTA allocations are risk-based, where again the risk attribution is carried out mechanically. CTAs therefore follow the investment advice of David Ricardo, written in The Great Metropolis (1838): 'Cut short your losses, and let your profits run on.'
But digital disruption has a much broader meaning than efficiency. Innovation and the new entrants, FinTechs and the Tech Giants, can disrupt the ownership of the FIs both on the production and on the customer side. FinTech innovation is driven from an end-customer perspective, and customers will follow this business model. Hence FIs have to adopt this view too and leave their bank-centric approach. Since FIs can integrate or copy the solutions of the many FinTechs, the FinTechs are not a real threat. But the few Tech Giants are: they already have a broad customer base, a technological advantage, and almost unlimited resources. The Revised Payment Service Directive of the European Union, effective 2018, is an example. It has disruption potential since it is an important step towards an open finance system, that is, a system where the end-customers choose their best products and services from a platform and the FIs deliver their services and products to the platform. Compared to the traditional business model, in an open finance economy the link between end-customer and FI is broken and the FIs compete with each other on the platform. Clearly, in an open economy the business becomes much less profitable unless the FI owns the platform. The above directive is a regulatory driver for Tech Giants to offer their superior data analytics capabilities to end-customers.
Besides breaking the customer-FI link, new entrants can act disruptively by changing the topology of the financial architecture (blockchain, cryptocurrencies, platforms). This means redefining the connections between market participants and reallocating ownership rights in the architecture. Their broad assumption is that the action space of FIs in the value chains can be largely reduced, sometimes even completely replaced. In fact, technology is able to replace monopolistic or oligopolistic ownership of key centralized FI functions by decentralized solutions based on game theory (aka blockchain). This defines the infrastructure channel of digital disruption in the financial industry.
The internet revolution of the 90s revolutionized the flow of information by sending information quickly and free of charge to many individuals. This is an efficient way to copy and distribute information. But financial intermediation is based on asset values and their distribution based on contracts. To revolutionize the existing generation and flow of values, the internet solution approach is useless: copying a USD 10 bill for payment purposes is worthless. In a digital value flow, someone has to validate that the payer owns USD 10, that he has not promised it to anyone else, and that the millions of payments in the system are synchronized to prevent fraudulent actions. Formally, transaction feasibility, transaction legitimization, and transaction consensus have to be assured for each transaction at each date. In the current financial world, banks, central banks, and exchanges offer and own these functions: they provide a payment system and they validate, as third parties, the transactions (consensus). Bitcoin, based on mutually distributed ledger technology, proved that completely digitized payment systems and currencies are possible, where code and mathematics replace all functions of the FIs in fiat money banking. Whether the thousands of cryptocurrencies survive is not clear. Each currency needs to reflect an economic value and not just a belief of investors; they need to be competitive (transaction fees, speed, security), ecologically sound (energy consumption), and address the monetary perspective (a stiff supply side).
The attack on the end-customer/FI link is different in nature. The iPhone method of integration makes it possible to integrate all FI activities in a single device. Furthermore, the methods to express and analyze customer needs will in the very near future replace the capabilities and quality of any relationship advisor. This sets the stage for an open finance system: end-customers want a single access point to an intelligent platform, where they can decide in a user-friendly way. The FIs, if they do not own the platform, are reduced to product and service providers and to running the accounts in the background. The quality of the digital services will ultimately create a time- and location-independent emotional relationship with the end-customers. At this stage there will definitively be no further need for human FI interaction. Some FIs have proven able to adapt quickly to a new environment, and they use their powerful resources to act as shapers.
The financial crisis of 2008 can be considered a starting point for digital disruption in the financial sector. Of the 248 European FinTechs surveyed in the Roland Berger (2016) study, 15 were founded before 2008 and the rest after the financial crisis. Three triggers accumulated in this period: the iPhone made it possible to empower the end client, FIs had to spend many of their resources on meeting the regulatory avalanche, and FIs had to increase profitability by lowering costs. The McKinsey (2015) survey of a sample of more than 120,000 FinTech start-ups states:
• Target clients: 62% of the start-ups target private customers, 28% SMEs, and the rest large enterprises.
• Function: most start-ups work in the area of payment services (43%), followed by loans (24%), investments (18%), and deposits (15%).
Even FinTechs consider Tech Giants to be more dangerous for FIs than they are themselves (Roland Berger [2016]). The Big Four - Amazon, Apple, Google, and Facebook - are examples of Tech Giant entrants in the financial sector. Our Western-centric view is meant to simplify the discussion: for each of the Big Four there is a comparable and equally successful Chinese counterpart.15 While the Big Four are less agile than FinTechs, their almost unlimited resources, their strong client base, and their technological advantage make them a real threat for FIs.
Although it has long been speculated that the Big Four will enter the banking business on a large scale, so far this has not happened. Google has had a banking license for Europe since 2011; Facebook requested one, but nothing has happened so far. One can speculate about the reasons: more profitable alternatives, too heavy regulatory costs, or business risk such as the AdWords Business Credit program, which was discontinued. Facebook could offer banking services to its 1.5 billion users who live in countries with unstable political, financial, social, and legal systems. Apple, which is active with Apple Pay, could do a lot more. It is meaningful to ask what disruption could mean here. One scenario is that the Big Four become full FIs. But they could also prefer to take over the end-customer interface due to their superior technology and data analytics methods. This latter model fits well into the so-called open finance paradigm, where end-customers are self-decision makers; they are connected to a platform or a cloud where data analytics methods provide decision-making services, for portfolio management for example; the data of the customers at different FIs are aggregated on the platform; and the best FI is selected to deliver products and services once an end-customer has made a decision. Hence, FIs become pure product providers and lose the interface to their clients. Since 2018, the Revised Payment Service Directive (PSD2) of the European Union points in this direction and puts core banking functions under stress. PSD2 obliges banks which are active in payments to reveal customer data to third parties if the customer wishes so. Banks could then lose a main part of their value chain, since the Tech Giants could use their excellent analytics to provide services to the end-clients. Banks will defend their value chain; their main weapon is the existing payment infrastructure
15 JD.com, Renren, Baidu, Sina, Tencent, and Alibaba are counterparts to the Big Four. Tencent, for example, started with a market cap of USD 210 million at its IPO in 2004, which has since grown to USD 233 billion according to Bloomberg. The user base of its application WeChat grew from 50 million in 2011 to over 800 million in 2016.
which they built up, such as IBAN and SWIFT, and they will pass these costs on to the new entrants. Besides the infrastructure channel discussed above, two further channels of digital disruption are:
• Efficiency channel.
• Customer-centricity channel.
Disruptive efficiency is the classical view of the financial intermediaries: 'All banks are looking at ways to cut costs and also generate more revenues' (Ermotti [2016]). Digital efficiency has a different meaning than past automation-based efficiency, which meant digitalizing the information workflow in a value chain to reduce human activity and to reach scalability: doing the same at lower costs, with fewer errors, and at scale. Disruption means redesigning the workflows and eliminating humans to a degree not seen before, using bots, avatars, or smart contracts. Smart contracts are not only digitized legal documents - say trade confirmations - but they also contain code which makes it possible for the documents to manage themselves over their life cycle. Platforms are a second example of disruption. While platforms have always changed the connectivity of their participants, present platforms possess two new features: not only numerical information flows but any form of information gained from structured and unstructured data, and platforms use AI to analyze, manage, control, and direct the information flow on the platform.
How will assets be protected in a digital world where cyber criminals, governments, and bank defaults define the risk sources? The FinTechs do not have the reputation and size to protect accounts, insurance contracts, and security deposits. The Tech Giants' reputation is decreasing, in particular in the US and Europe. In the FinTech study conducted by Roland Berger (2016), the European FinTechs mentioned customer trust in the financial intermediaries as the only success factor for financial intermediaries: their protection function has worked for decades. So far there is no strong alternative to FIs regarding the safekeeping of money and financial assets.
The WEF 2015 document The Future of Financial Services (FFS) summarizes and extends the discussion. The paper identifies 11 clusters of innovation in six functions of financial services; see Figure 2.37.
The approach of considering six independent intermediary functions and identifying the eleven clusters within these functions is a silo business view. The clusters can be grouped into six themes that cut across traditional functions:
• Niche, specialised products: new entrants with deep specialisations are creating highly targeted products and services, increasing competition in these areas and creating pressure for the traditional end-to-end financial services model to unbundle.
Two years later, in 2017, the WEF working group reconsidered and updated the FFS paper in a status report. Some expected trends had materialized in the two-year period, while for others the expectations were revised due to lack of demand, technological immaturity, or regulatory considerations. The main findings are:
• Fintechs have seized the initiative, defining the direction, shape, and pace of innovation across almost every subsector of financial services.
• Fintechs have reshaped customer expectations, setting new and higher bars for user experience.
Figure 2.37: The six functions (payments, market provisioning, investment management, capital raising, deposits and lending, and insurance) and the 11 innovation clusters (new market platforms, smarter & faster machines, cashless world, emerging payment rails, insurance disaggregation, connected insurance, alternative lending, shifting customer preferences, crowdfunding, process externalization, empowered investors) (The Future of Financial Services [2015]).
• Failure: customer willingness to switch away from incumbents has been overestimated.
• Fintechs have struggled to create new infrastructure and establish new financial services ecosystems.
We close this section with people's sentiments about digital disruption. On a broad scale, two-thirds of the 400 US CEOs surveyed by KPMG in 2016 believed that the next three years will be more critical for the business performance of their companies than the past 50 years. Additionally, Grossman (2016) states in a CEO survey, based on Russell Reynolds Associates, that there are only three industry sectors - Health Care, Asset Management, and Industries - where less than 50% of the CEOs expect massive or moderate digital disruption. For Media, Consumer Financial Services, and Telecom, more than 60% expect such a scenario. 34 percent of the 4,000 Chief Information Officer respondents surveyed in more than 50 countries note that digital disruption is already a reality in their companies, and a further 28% say that this will happen in the next 1 to 2 years (Harvey Nash [2015]). Those responsible for information expect a much stronger disruption for the services industry, due to the lack of physical components, than for the processing, pharmaceutical, or energy sectors.
It is a scientific fact that the financial literacy of the population is at a low level. Hence, any link to end-customers which is based not on a rational but on an emotional paradigm is likely to win the battle for the end-customer connection. Although in principle an emotional link can be formed through human interaction with the end-customer, this approach is not scalable: a client advisor in Europe has between 100 and 400 clients to serve. Therefore a digital link is the more promising solution. But how can communication between a human and a piece of software generate an emotional basis? The software needs to care and to perform. This means understanding the customer's needs in his life-cycle context and giving meaningful advice. AI is pointing in this direction. If this is possible, why should end-customers mind that they do not communicate with a human?
• Product management.
• Solution providers.
The front office consists of the distribution channel and the investment process. In this part of the chain the investor's preferences, risk capacity, and the type of investment delegation (execution-only, mandate, or advisory) are defined. All communication with end clients is made via this channel - new solutions, performance, risk reporting, etc. The investment process, headed by the CIO, starts with the investment view applied to the admissible investment universe. The view is then implemented by the portfolio managers, where different procedures can be followed. More precisely, the investment process has the following sub-processes for mandate clients:
The middle office is responsible for reporting and for controlling the client portfolio with respect to suitability, appropriateness, performance, and risk, and it also constructs the eligible client portfolio. The back office is responsible for the execution and settlement of trades.
Product management defines an eligible, suitable, and appropriate offering for the investor. It is also responsible for overall governance, such as market access and regulatory requirements. The product management strategy tries to understand where the market is headed, how this compares with current products, client segments served, and the firm's capabilities, and how competitors price their services in different channels. Product managers anticipate the people, process, and technology requirements for the product. They also assess gaps versus current capabilities and propose countermeasures. A main function is the new-product-approval (NPA) process office. This office guarantees both an optimal time-to-market and an effective implementation of new products. Finally, product management also oversees out- or insourcing opportunities in the business value chain. The solution providers in the investment process supply the building blocks for implementing the portfolios, including funds, cash products, ETFs, and derivatives.
The infrastructure layer naturally develops, maintains, and optimizes the IT infrastructure for the several functions of the business layer. The technology officer oversees the developments in technology and data management and considers the out- or insourcing opportunities along the infrastructure value chain.
To deal with digital disruption, many leading companies are looking at their businesses and operations anew, taking something of a 'blank sheet of paper' view of the world. Many outsource important parts of their back offices (NAV calculations, onboarding, investor statements, etc.), largely as a reaction to investor pressure following the scandals; see Section 2.4.5. According to PwC's recently released Alternative Administration Survey, 75 percent of alternative fund managers currently outsource some portion of their back office to administrators, and 90 percent of hedge funds do so. While the initial experience has been mixed in many respects, it has helped firms to rethink their business from scratch.
Externalization of processes is a key strategy for FIs in the digital world. FFS classifies different innovations in process externalization:
• Cloud computing improves connectivity with and within institutions. This allows for simpler data sharing, lowers implementation costs, streamlines the maintenance of processes, and enables real-time processing.
• Platforms, real-time databases, and expert systems leverage automation for the users and the solution providers.
• Capability sharing between institutions frees them from building up all possible capabilities themselves and allows the integration of different legal and technical standards.
• External service providers give small and medium-size asset managers access to sophisticated capabilities that were previously unattainable due to lack of scale. This gives them access to top-tier processes, and smaller players become able to compete with large incumbents.
• Cross-border offerings become profitable, with well-controlled conduct and regulatory risk, thanks to the platforms. But externalization could also amplify the risks of non-compliant activities and unclear liabilities when centralized externalization providers fail. Automation also increases the speed at which financial institutions implement regulatory changes; therefore, regulators will receive faster and more consistent inputs from financial institutions.
• Since more capabilities, technologies, and processes are externalized, asset management firms become more dependent on third parties and lose negotiating power and continuity.
If compliance services such as FundApps establish themselves in the future, they could ensure consistent compliance across financial institutions, make the dissemination of regulatory changes in disclosure regimes faster, and reduce the compliance burden faced by the industry (FFS).
Figure 2.39: Left panel: ESG classification of Nasdaq. Right panel: scoring of Vodafone using the Refinitiv scoring system.
16 Sources: Andersson et al. (2016), Global Sustainable Investment Alliance (2017), Van Duuren et al.
(2016), Hong and Kacperczyk (2009).
analysis data starting from the GFC, thereby avoiding the production of noisy and non-robust results that do not reflect how ESG is used nowadays. They use the ESG metrics for each company provided by the Amundi ESG Research department, which are not publicly available; the scoring system depends on the data of four external providers. The data are cleaned, normalized, and checked by data analysts.
They consider five investment universes covered by the MSCI indices North America, EMU, Europe ex EMU, Japan, and World, and three quarterly rebalanced strategies from Jan 2010 to Dec 2017: active management, passive management (optimized index portfolios), and factor investing. Standardization means eliminating geographical or sector biases. For each stock i, its corresponding ICB industry sector code is denoted by I(i) and its score at time t by S(i, t). The Z-score is defined as

Z(i, t) = (S(i, t) − S̄(i, t)) / σ(i, t),

where the bar value S̄(i, t) is the average score of the sector of stock i and σ(i, t) the corresponding sector standard deviation.
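A minimal pandas sketch of this sector standardization (the column names are hypothetical):

    import pandas as pd

    def sector_zscore(scores: pd.DataFrame) -> pd.Series:
        """scores: one row per stock at a fixed date t, with columns
        ['sector', 'score']. Returns Z(i, t) = (score - sector mean) / sector std."""
        by_sector = scores.groupby("sector")["score"]
        return (scores["score"] - by_sector.transform("mean")) / by_sector.transform("std")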
Figure 2.40: Annualized return of ESG-sorted portfolios. Sorted portfolios following Fama and French (1992) are constructed: stocks are ranked quarterly with respect to their score, forming five quintiles, with the equally weighted portfolio Q1 corresponding to the 20 percent best-ranked stocks (Roncalli et al. [2018]).
A main result is that the impact of ESG is highly dependent on the time period, the investment universe, and the strategy. There is no evidence of a consistent reward for ESG integration in stock prices between 2010 and 2013, although in each period there is variability between the five regions; see Figure 2.40. But for 2014 to 2017 most indicators are positive. In North America, buying the best-in-class stocks and selling the worst-in-class generated an annualized excess return of 3.3 percent, and 6.4 percent in the eurozone. For the relative impacts of the three factors E, S, and G we refer to the paper.
Institutional investors who prefer to implement ESG passively can do so by optimizing the tracking error between the benchmark portfolio and a non-ESG-based SAA. It follows that improving the normalized ESG score implies accepting an increase in tracking error: being an ESG investor requires taking on tracking-error risk. This integration of ESG in passive management reduced performance between 2010 and 2013 but improved annualized returns between 2014 and 2017.
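A sketch of this trade-off under simplifying assumptions (a synthetic covariance matrix, stylized ESG scores, long-only weights, and a linear portfolio ESG score): minimizing tracking error to an equally weighted benchmark subject to a minimum ESG score shows the tracking error rising with the ESG target.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(3)
    n = 8
    A = rng.normal(size=(n, n))
    cov = A @ A.T / n + 0.01 * np.eye(n)   # synthetic covariance matrix
    bench = np.full(n, 1.0 / n)            # equally weighted benchmark
    esg = np.linspace(-1.0, 1.0, n)        # stylized normalized ESG scores

    def te_squared(w):
        d = w - bench
        return d @ cov @ d

    for target in (0.0, 0.3, 0.6):         # required portfolio ESG score
        res = minimize(
            te_squared, bench, method="SLSQP", bounds=[(0.0, 1.0)] * n,
            constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0},
                         {"type": "ineq", "fun": lambda w, t=target: esg @ w - t}],
        )
        print(f"ESG target {target:.1f}: tracking error {np.sqrt(res.fun):.4f}")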
The authors characterize the asset pricing implications in order to better identify and understand the drivers of performance. Figure 2.41 shows four possible hypotheses for the relation between the ESG score and return or risk, respectively. Configuration (b) is not observed in North America and the eurozone, whatever the score used; instead, a skewed-risk market configuration is observed.
Table 2.23: Asset values in bn USD. Source: Global Sustainable Investment Alliance.
Sustainable investments extend across the range of asset classes. The majority, 51 percent of the assets, was allocated to public equities, followed by fixed income with 36 percent. Real estate/property and private equity/venture capital each held 3 percent of global sustainable investing assets.
Data show that humankind is facing climate change which will largely be irreversible. Global temperature anomalies of the recent past compared to the 1951-1980 baseline show that the last years were the warmest ones; see Figure 2.42.
Energy demand will further increase due to population growth, progressing industrialization, and increasing wealth. Without any countermeasures, this will increase man-made CO2 emissions. But keeping in line with the 2-degree goal requires a drastic reduction of CO2 emissions: for most countries a 30 percent reduction suffices, but 50 percent would be better.
As stated above, we do not focus on governmental law or laissez-faire, but we consider cases where it pays economically to reduce CO2 emissions and to invest in energy efficiency. The results depend on the following facts:
Figure 2.42: 2015 was the warmest year in the NASA/NOAA temperature record, which
starts in 1880. It has since been superseded by the following years (NASA/NOAA; 20
January 2016).
• There is enough clean energy, i.e. energy which can be used to replace CO2-emitting energy.
• There exist financial market solutions which match investors' demand for sustainable investments with the demand for energy project finance.
Figure 2.43 shows the impact of potential climate change on different dimensions of humanity and the ecosystem.
Figure 2.43: Impact of potential climate change. Sources: Stern Review, IPCC, 4th
Assessment Report, Climate Change 2007, WWF and Credit Suisse 2011.
The increase in CO2 over the last 100 years has led to a measurable change in climate. The research data show that climate change relates less to risk and more to uncertainty: there is a lack of knowledge about the speed, the irreversibility, possible feedback effects, and some hidden non-linearities. The worst-case scenarios forecast a drop in GDP of between 5 and 20 percent. Estimates by Credit Suisse and other institutions state that investment flows of between USD 700 and 2,000 billion per annum are required over the next decade to limit warming to 2 degrees Celsius. This would amount to around 2 percent of global GDP per annum. The majority of the required capital investment is concentrated in low-carbon energy, energy efficiency, and low-carbon transport infrastructure. Low-carbon energy is primarily linked to investment in renewables and electricity infrastructure such as grids, transmission, and storage. The opportunity is concentrated in China, the US, and the EU27, which together represent nearly 60 percent of the mitigation cost. Figure 2.44 shows the distribution of the investments necessary to achieve the 2-degree pathway; the matrix has the dimensions geographical location and area of investment. The authors state different types of barriers affecting the current decarbonization efforts. Since regulatory mechanisms which price the externalities of carbon emissions do not yet exist, technical and financial barriers persist: the economics of low-carbon projects are often less attractive than those of their high-carbon alternatives. Structural barriers include network effects (consumers will not buy electric cars unless there are workable and available charging solutions, but private investors hesitate to build a charging network unless there is sufficient demand), agency problems (under existing structures, the party making the low-carbon investment is often not the one which will benefit from the savings), and the status quo bias (a strong bias towards maintaining the status quo instead of making changes).
• Political risk. Countries such as Algeria, Libya, and Saudi Arabia, given their natural oil resources, would seem to have no interest in a solar energy project. But this is not the case; in fact, Saudi Arabia owns leading solar energy institutes. Furthermore, a solar energy project would benefit employment and job creation in these countries to a far larger extent, and due to the excess solar energy these states would be able to create new farming land.
• Why should these countries produce energy for Europe given their own needs? This point also led to heavy debates in Europe and to a standstill of the project.
• Another risk factor is political instability. The events starting in 2011 demonstrate that this risk exists and that without the protection of an army the project cannot be sustained. But which army should guarantee the functioning of the technology?
Figure 2.44: Annual investment required to achieve the 2 degrees Celsius pathway is USD 700 bn. Sources: Credit Suisse/WWF analysis (2011) based on McKinsey's Climate Desk tool.
• A further political risk is that the middle and northern parts of Europe would depend on a single point of entry in the Mediterranean region.
• Technology and financing risk. Natural damage risks and energy losses in transport are not material. But the need to construct a new powerful energy infrastructure in Europe raises delicate financing issues.
Besides the DESERTEC example, other examples show the risks of large-scale environmental projects. The Lisbon Strategy, adopted in 2000, largely failed on its three pillars, where the environmental pillar recognized that economic growth must be decoupled from the use of natural resources. The overly complex structure with multiple goals and actions, an unclear division of responsibilities and tasks, and a lack of political engagement from the member states led to its failure, and the GFC was then the final blow to the strategy. At the Spring Summit 2010, EU leaders endorsed the European Commission's proposal for a Europe 2020 strategy. This new strategy puts knowledge, innovation, and green growth at the heart of the EU's blueprint for competitiveness and proposes tighter monitoring of national reform programmes, one of the greatest weaknesses of the Lisbon Strategy.
Another example is water pollution in Switzerland in the 1960s. Through incentives defined and financial support provided by the Swiss government, a new industry emerged (sewage treatment plants) and the treatment of farming land changed. After some decades, water from Swiss lakes and rivers is often potable. In the US, acid rain led to the implementation of the Clean Air Act, which also solved that problem.
Under such a cap-and-trade scheme, each entity receives permits to emit some portion of the region's total amount. If an organization emits less than its allotment, it can sell or trade its unused permits to other businesses that have exceeded their limits. Entities can trade permits directly with each other, through brokers, or in organized markets.
The first green bond was issued in 2007 by the European Investment Bank (EIB) and the World Bank. For more details see www.climatebonds.net, from which the following data are taken. In November 2013 the first corporate green bond was issued by a Swedish company. Tesla Energy issued the first solar ABS in November 2013. The biggest ABS issuer is Fannie Mae; ABS include solar ABS, green MBS, green RMBS, green CMBS, and other types. Green bond issuance in 2018 reached USD 167.3 bn, with over USD 500 bn currently outstanding.
Using debt capital markets to fund climate solutions
The majority of the green bonds issued are green 'use of proceeds' or asset-linked bonds. The following products fall into the green bond category (taken from www.climatebonds.net):
• 'Use of proceeds' bond. Proceeds from these bonds are earmarked for green projects. The same credit rating applies as for the issuer's other bonds. Barclays Green Bonds are an example.
• 'Use of proceeds' revenue bond or ABS. Earmarked for the financing of green projects; revenue streams of the issuer through fees, taxes, etc. are collateral for the debt. The Hawaii State ABS, backed by a fee on the electricity bills of the state utilities, is an example.
• Project bond. These are ring-fenced for the specific underlying green project. Recourse is only to the project's assets and balance sheet. An example is the Invenergy Wind Farm bond, which is backed by the Invenergy Campo Palomas wind farm.
• Covered bond. These are earmarked for eligible projects included in the covered pool. Recourse is to the issuer and to the collateral pool. The Berlin Hyp green Pfandbrief is an example.
• Loan. A loan is not a security. Loans are earmarked for eligible projects, with full recourse to the borrower in the case of unsecured loans and to the collateral in the case of covered loans. Examples are MEP Werke, Ivanhoe Cambridge, and Natixis Assurances (DUO).
The benefits for issuers outweigh the additional costs of green bonds compared to non-green bonds, which arise because issuers must track, monitor, and report on the use of proceeds. The benefits for the issuers are reputation, branding, and the build-up of know-how about environmental investments.
Green bonds are priced flat to ordinary bonds of the same issuer, i.e. they rank pari passu with vanilla issuance. As an outlook, investors with USD 45 trillion of assets under management have made public 'commitments' to climate and responsible investment. This is around 50 percent of all AuM.
• Energy solution provider. A corporation provides the technology to realize the energy cost gains. The energy solution should lead to a substantial reduction in energy costs.
As an overview, the following figures hold as rough rules when buildings are made energy-efficient. The data are from Siemens (2015).
'Measure and visualize' means that a firm makes its energy consumption transparent at well-chosen locations within the firm. Elevators are often used, since most people in an elevator search for a fixed point to focus on; the entrance lobby is also well suited. Several studies have by now reported that simple transparency or monitoring, without any other actions, leads to an energy reduction of about 10 percent. It seems that such transparency changes the behavior of some employees, leading to this reduction.
When it comes to financing the project, a major requirement is that the project also makes economic sense; that is, we require Gain > 0. The gain is the sum of the investment costs I and the savings of energy costs over time. The saving of energy costs has four risk sources:
• Energy volume risk, i.e. the amount c_t of saved energy is

c_t = c̄ + dc_t,

with c̄ the expected amount of saved energy once the project is finished and dc_t the risk of deviation from that expectation.
• Energy price risk, i.e. the price p_t of saved energy (oil, electricity, or a mixture of them) is

p_t = p̄ + dp_t,

with p̄ the forward/futures price and dp_t the deviation risk from the forward prices.
• The last risk is the counterparty risk of the energy solution user - here the city administration. Depending on the type of financing of the project, the counterparty risk matters for the investors or not; see below for details. We write default risk in the form u = 1 − dk, with 1 for not defaulting and dk the expected default rate.
The gain of the project can be written symbolically - i.e. without using summation and discounting notation, but focusing on the different parts of the gain function - as follows:

Gain =  Ī,            expected investment costs;
        dI,           investment risk;
        c̄ × p̄,        estimated savings (costs and volume);
        dc × p̄,       volume risk;
        c̄ × dp,       energy price risk;
        dc × dp,      cross risk;
        dk × c × p,   default risk.
This defines the risk profile for the city without any structuring of risk. The next question therefore is: who bears which risk? Professional technology providers keep the investment and volume risk due to their experience and their large project portfolios: variations in these two factors are absorbed within a large project portfolio. Consider an investor. The investor is willing to pay the expected investment costs Ī in exchange for participating in the future energy savings. That is, the city and the investor share the future energy savings: the city participates with c̄ × p̄ × a and the investor with c̄ × p̄ × (1 − a). This defines the performance contract. The function a defines the future participation as a function of time. Since the investment has to be paid back to the investor, he will participate more strongly at the beginning than the city; otherwise, the payback time increases. In this set-up the whole investment is risk-free for the city. The only risk which is not attributed is the default risk of the city: either it is passed on to the investor and compensated, or the bank keeps this risk. This type is a structured product solution.
Other possible solutions are:
Before we consider some of these solutions, we provide an example for the structured product. Assume a project with a payback time of 4 years; then the amount of saved energy is c̄ = 25%. Assume that the project costs 100 in some currency, that a increases linearly from 10 to 40 percent (so that 1 − a decreases linearly from 90 to 60 percent), that energy price risk is ±2 percent per annum, that the default risk of the city is 10 bps, that the fees for structuring the deal are 1 percent per annum, and that interest rates are flat at 2 percent.
Then:
• After 5 years the investment amount is amortized, i.e. the years 6-8 generate return for the investor.
• The return for the investor is 6.3 percent in the case of constant energy prices, 5.3 percent if energy prices fall by 2 percent each year, and 7.1 percent in the monotonically increasing case. This return has to be corrected for a possible default of the city: if the investor does not want to take this default risk, the returns are lowered by the credit risk costs for the city. (A numerical sketch of these cash flows follows below.)
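The sketch below reconstructs the cash-flow mechanics under the stated assumptions; structuring fees, discounting, and the city's credit costs are ignored here, so the gross numbers it prints differ from the net returns quoted above.

    import numpy as np

    investment = 100.0
    years = 8
    savings_volume = 25.0                  # c-bar: 25% of project cost per year
    a = np.linspace(0.10, 0.40, years)     # city's share, 10% -> 40% linearly
    investor_share = 1.0 - a               # investor's share, 90% -> 60%

    for drift, label in ((0.00, "flat prices"), (-0.02, "falling"), (0.02, "rising")):
        prices = (1.0 + drift) ** np.arange(years)     # energy price path
        cf = investor_share * savings_volume * prices  # investor's yearly cash flow
        payback_year = int(np.searchsorted(np.cumsum(cf), investment)) + 1

        # Gross IRR of (-investment, cf_1, ..., cf_8) by bisection on the NPV.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            r = 0.5 * (lo + hi)
            npv = -investment + sum(c / (1 + r) ** (t + 1) for t, c in enumerate(cf))
            lo, hi = (r, hi) if npv > 0 else (lo, r)
        print(f"{label}: amortized in year {payback_year}, gross IRR ~ {r:.1%}")

With flat prices the investor's cumulative cash flows cross 100 in year 5, consistent with the amortization statement above.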
Finally, if an investor wishes to get rid of energy price risk, the structuring delivers him fixed energy prices or prices kept within a bandwidth.
Of the other financing possibilities we only mention the green bond. This bond is issued by the city as an ordinary bond. The difference to such a bond is the coupon payment: the value of the coupon each year is determined by the price of the saved energy amount, i.e. it is a coupon derived from the underlying value 'energy price × saved energy volume'.
Clearly, such a construction requires strong legal and documentation work for and between the different parties. Furthermore, moral hazard issues exist: the energy solution provider can charge an excessive price I for the investment to cover possible price risk dI, or it can predict biased low saved-energy amounts to reduce its energy volume risk. To avoid such potential disincentives, a simple solution is to let the energy firm itself invest in the project, i.e. to take a part of the investor's stake. This reduces the moral hazard related both to the investment amount and to the expected energy volume savings, since systematic deviations reduce the return on investment.
Figure 2.45: Comparing global GDP growth (percent, annual, real) in the Great Recession and the Great Depression for the US and developed non-US countries (Bacchetta and van Wincoop [2013]).
There was basically no difference during the Great Recession between GDP growth in the US and that in the G20 states, which represent the main worldwide economy without the US. But in the Great Depression, the decline in US GDP growth did not spread with comparable intensity to the rest of the world. This indicates that while the Great Recession can be called a global crisis, the Great Depression was more local in nature. The authors show that the Great Recession was, in historical terms, the first global recession. The first question is: how could the crisis spread from the US financial sector to the US real sector? The second question is: why did the Great Recession spread almost instantaneously from the US economy to the global economy - how did the recession become a global one?
As many authors have shown, the financial crisis was part of the so-called boom-bust cycle of the real economy. Of particular importance are real-estate boom-bust cycles. Reinhart and Rogoff (2008) illustrate the following pattern. Let T be the date of a banking crisis, and consider the growth rate of the real-estate asset class some years before and after this date. One typically observes that prices increase before T and fall after, or shortly before, the banking crisis. In this sense, a financial crisis is part of a boom-bust cycle. The surprising aspect of the most recent crisis was not that it happened, but that such a crisis could be strong enough to destabilize the financial system of a developed economy (here, the US).
Given this US view, how could the recession become a global one? The standard channel for explaining global linkages is trade. But the US is not a very open economy, and imports to the US are - for many countries - relatively small. There is no empirical evidence of a link between openness in terms of trade and a decline in growth. Hence, the macroeconomic trade channel fails to explain how the recession spread globally. Another possible channel is the financial channel: that is to say, the decline in asset prices and real-estate prices and changes to the credit supply channelled into the real economies outside the US. But this hypothesis is not supported by empirical evidence either. While real-estate prices dropped in, say, Spain and Ireland, they did not in Germany or Switzerland. While Switzerland has a much stronger financial link to the US than most European countries do, the European countries were much more affected by the Great Recession. While some countries faced a decline in credit supply, others did not. Although policy makers have often used the expression 'credit crunch', firms participating in surveys about the period have indicated that - during the Great Recession - lower demand was more important to them than reduced credit supply. Summarizing, standard macroeconomic models cannot explain the global recession.
Bacchetta and van Wincoop (2013) argue that there must have been other drivers that caused the global recession. They argue that it was not the globalization of the economy, as considered above, but rather the globalization of how individuals form expectations that was responsible for the recession spreading worldwide. This argument is, of course, linked to questions of information technology, information transmission, and information quality in worldwide terms. In contrast with the past, information today spreads around the world almost in real time, it is more difficult to control information distribution, and mainstream information is mostly costless to the consumer. Therefore, one can argue that - given a financial crisis and its related information flow - individuals around the world had access to similar information sets upon which to form their expectations. The authors claim that panic, by consumers and firms throughout the world, led to declines in aggregate demand in most countries. Such panic must have a systemic component to have a worldwide impact. They therefore assume that such panic is rational or self-fulfilling:
• Agents first expect low future income due to the information available and the uncertainty at play at the beginning of the financial crisis.
• This leads to low future production and income, which matches the agents' expectations as outlined in the first step.
Chapter 3
Fundamentals Theory
3.1 Returns and Performance Attribution
Returns are key in asset management for the calculation of risk and performance. The calculation of returns is not as straightforward as one might guess. One needs to calculate returns for arbitrarily complicated cash flow profiles where cash can be injected or withdrawn at different dates. Different assets possess different time scales for return calculations, varying from intraday to months for illiquid assets. Returns often need to be aggregated for risk calculations to reduce the dimensionality, and risk models are needed to value expected returns. Finally, the return for an investor can be the result of several money managers, i.e. returns should be decomposable to account for the different contributors.
Why do we work with returns and not with prices? Price time series are statistically hard to handle: the mean value has little meaning if prices grow exponentially. One therefore works with a scale-free quantity: returns. Why does one work with log-returns? The simple return over a period, $\frac{S_t - S_{t-1}}{S_{t-1}} \in [-1, \infty)$, is not useful if one tries to model returns assuming a normal distribution, since simple returns range from $-1$ (total loss) to $+\infty$ while the normal distribution ranges over the reals. Furthermore, the 10-day gross return is the product of ten one-day gross returns and not the sum. But the product of normal random variables is not normal. One therefore prefers to work with log-returns, where the product in return aggregation is replaced by a sum, and the sum of normally distributed log-returns is again normal.
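A minimal numerical sketch of this point (the prices and variable names are illustrative, not from the text): gross simple returns aggregate multiplicatively, while log-returns simply add.

```python
import numpy as np

# Hypothetical daily prices (illustrative values only).
S = np.array([100.0, 102.0, 99.0, 101.0, 105.0])

simple = S[1:] / S[:-1] - 1          # simple returns, in [-1, inf)
log_r  = np.diff(np.log(S))          # log-returns, on the whole real line

# Multi-period aggregation: gross simple returns multiply,
# log-returns add.
gross_total = np.prod(1 + simple)
log_total   = np.sum(log_r)

assert np.isclose(gross_total, S[-1] / S[0])
assert np.isclose(np.exp(log_total), S[-1] / S[0])
```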
$$u(c_t) \geq u(c_T)$$
with the utility function u. To make the investor indifferent, the consumption good at time T must be larger than at time t, i.e.
$$c_T = c_t + \Delta = (1+R)c_t$$
with $\Delta$ the interest and R the interest rate to compensate for impatience.
The function which weights CFs at different dates T > t is the discount function D(t, T). Discounting restores additivity, which makes it possible to add CFs at different dates; any two complicated cash flow profiles can then be compared for investment purposes. Discounting is the necessary ingredient such that the price of a product can be written as a probability- and time-weighted sum of future cash flows. Given the CF additivity, they can be mapped to a single point: the present value (PV) or the future value (FV). It is irrelevant which date is chosen for mapping all CFs when comparing two investment opportunities.
The discount function has the form D(t, T) = D(T − t), i.e. homogeneity of time or the irrelevance of the vista time. The inverse operation of discounting is compounding, $D(t,T)D(t,T)^{-1} = 1$. If the discount factor is 1, interest rates are zero; if it is larger than one, interest rates are negative.
Consider CHF 1 at time T and two scenarios. First, discount the CHF back directly to t. Second, discount it first back to a time s, t < s < T, and then from s to t. There is no risk. The value at t of the Swiss franc should be independent of the chosen discounting path; else, buying low and selling high generates a money machine (arbitrage in a risk-free environment). Formally,
$$D(t,s)D(s,T) = D(t,T), \qquad D(t,t) = 1. \qquad (3.1)$$
Cauchy proved that the exponential function is the unique continuous function which satisfies (3.1):
$$D(t,T) = e^{-a(T-t)}, \qquad a > 0.$$
This motivates exponential discounting. The parameter a has the dimension of inverse time, and calculating the growth rate of the discount factor, $\frac{\partial D / \partial T}{D} = -a$, identifies a with the interest rate R.
The discount function D(t, T) for different maturity dates T defines the spot rate term structure, which we write $\{D(t,T)\} := \{D(t,T),\ T \geq 0\}$. Assume that there exists interest rate risk. Then equation (3.1) makes no sense, since D(s, T) is a random variable. To restore the identity, we have to fix the rate between s and T at time t, i.e. with the discount factor D(t, s, T) we again have D(t, s)D(t, s, T) = D(t, T). This defines the forward rate term structure $\{D(t, s, T)\}$. Given one term structure, the other follows by no arbitrage.
• The par swap rate curve is the vector of spot-starting swap rates for all maturities.
Table 3.1: To obtain the discount factors from the swap rates we use (3.5). To get the spot rates from the discount factors we use $R(0,T) = \left(\frac{1}{D(0,T)}\right)^{1/T} - 1$, and the forward rates are calculated as $1 + F(0,S,T) = \frac{D(0,S)}{D(0,T)}$ for consecutive annual maturities. The day-count factor reads act/360/100 = 365/36'000 = 0.0101388.
The absence of arbitrage implies that there exists exactly one discount factor for each currency and for each maturity; else one can build a money machine. But there are many different interest rate, profit and loss, and performance calculations. The reasons are:
• Do we use market rates for discounting, or synthetic rates from an asset management perspective, such as the yield-to-maturity (YtM), to value and compare different investments?
• The calendar and day-count conventions differ: the number of days within a year varies for different countries, exchanges and products.
Examples
Compounding
Hence, $FV_{nd} \geq FV_{ns}$. The formulae can be generalized to the case with sub-annual periods and where R is not constant. The limiting future value is achieved for instantaneous interest rates, which results in the exponential compounding formula as the limit of how fast capital can grow.
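As a short hedged illustration of this hierarchy (the numbers PV, R and n are assumptions for the example, not from the text), the following sketch compares simple, discrete and continuous compounding:

```python
import math

PV, R, n = 100.0, 0.03, 10   # illustrative present value, rate, years

fv_simple     = PV * (1 + R * n)          # simple interest
fv_discrete   = PV * (1 + R) ** n         # annual compounding
fv_continuous = PV * math.exp(R * n)      # instantaneous compounding (upper bound)

assert fv_simple <= fv_discrete <= fv_continuous
print(fv_simple, fv_discrete, fv_continuous)
```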
Remarks:
• Simple discounting is used for LIBOR rates and products with maturity less than a year, discrete compounding for bonds, and continuous compounding for derivatives or Treasury bills.
The discount function is a simple function of the interest rate. But the interest rate itself is a complicated function of a risk-free rate, the creditworthiness of counterparties, liquidity in the markets, etc. The construction of the discount function is the key object in financial engineering.
Let p(t, T) be the price of a zero-coupon bond (ZCB) at time t paying USD 100 at maturity T if there is no default. Apart from counterparty risk, a ZCB is the same as a discount factor. ZCBs are the simplest interest rate products. More complex products such as coupon-paying bonds can be written as linear combinations of ZCBs. Consider a coupon bond with a yield R, i.e. R is the rate such that the PV of the bond's cash flows equals its present price. The slope of the price-yield graph is negative, since a bond issued today will have a lower price tomorrow if interest rates increase (opportunity loss). The relation is non-linear since
$$p(t,T) = D(t,T) \times 1 = \frac{1}{(1+R(t,T))^{T-t}} \times 1.$$
The effective simple rate $R_{e,s}$ is the gross return needed to reach the FV from the PV:
$$(1 + R_{e,s})PV := FV.$$
Consider $R_{e,s}$ for an n-year investment in a stock S (where $PV = S_0$, $FV = S_n$):
$$1 + R_{e,s} = \frac{S_n}{S_0} := \frac{S_n}{S_{n-1}}\frac{S_{n-1}}{S_{n-2}}\cdots\frac{S_1}{S_0} = \prod_{k=1}^{n}(1+R_{k,k-1})$$
where $R_{k,k-1}$ is the sub-period return. The effective simple gross return is equal to the product of the period returns. The compounded effective rate $R_{e,d}$ follows by taking the n-th root in the above formula. If compounding is continuous, the effective return is equal to the arithmetic sum of period returns, since the log of a product is a sum. This is one reason why continuous compounding is preferred.
Bond 1 has more attractive future CFs but bond 2 is cheaper. Which one should be preferred? If maturity increased, bond 1 should become more profitable; the opposite holds if the price of bond 2 becomes cheaper compared to bond 1. The yield-to-maturity (YtM) y is a decision criterion which assumes that products are kept until maturity. The YtM y solves by definition the equation:
$$\text{Price} = \sum_{j=1}^{n} \frac{c}{(1+y)^j} + \frac{N}{(1+y)^n}.$$
The bond with the higher y is the preferred one. This equation is easily solved numerically. The YtM, which has a flat term structure, is the most important example of a Money-Weighted Rate of Return (MWR), see below.
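Since the YtM equation has no closed form, a minimal numerical sketch is given below. The bond data (price 98, coupon 5, notional 100, five years) are assumptions for the example; bisection works because the price is decreasing in y.

```python
def bond_price(y, coupon, notional, n):
    """PV of a bullet bond with annual coupon and maturity n years."""
    return sum(coupon / (1 + y) ** j for j in range(1, n + 1)) + notional / (1 + y) ** n

def ytm(price, coupon, notional, n, lo=-0.99, hi=10.0, tol=1e-10):
    """Solve bond_price(y) = price by bisection (price is decreasing in y)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bond_price(mid, coupon, notional, n) > price:
            lo = mid      # model price too high -> yield must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Illustrative bond: price 98, coupon 5, notional 100, 5 years.
print(round(ytm(98.0, 5.0, 100.0, 5), 6))
```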
Originally, IRS were introduced for interest arbitrage reasons. Consider two firms A and B. A has a high creditworthiness, B a low one. Both firms can borrow at a fixed or floating rate given in Table 3.2.

             A        B                 Difference
Fixed        5%       6%                1%
Floating     LIBOR    LIBOR + 0.75%     0.75%

Table 3.2: Borrowing rates of the two firms.

The two firms can both benefit if they enter into an IRS, since the difference in fixed rate borrowing differs from the floating rate one by 0.25%. Both parties can realize and divide this amount using an IRS. To lock in the profit, each party borrows where it has a comparative advantage: A borrows fixed and B floating. A agrees to pay B the floating rate LIBOR plus 0.75 percent, and B agrees to pay A fixed 5.9 percent. A gets floating rate funding at LIBOR minus 0.15 percent, and B gets an advantage in fixed funding of 0.1 percent.
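A small sanity check of this comparative advantage arithmetic (rates from Table 3.2 and the swap just described; the variable names are ours):

```python
# Swap: A pays B LIBOR + 0.75%, B pays A fixed 5.9%.
fix_A, fix_B = 0.05, 0.06
spread_B = 0.0075            # B's floating spread over LIBOR

# A borrows fixed at 5%, receives 5.9% fixed, pays LIBOR + 0.75%.
# Net floating funding cost of A, as a spread over LIBOR:
a_spread = fix_A + spread_B - 0.059        # = -0.0015 -> LIBOR - 0.15%

# B borrows at LIBOR + 0.75% (cancelled by A's payment), pays 5.9% fixed.
b_fixed = 0.059                            # vs 6% direct -> 0.10% advantage

total_gain = (fix_B - fix_A) - spread_B    # 0.25% to be shared
assert abs((-a_spread) + (fix_B - b_fixed) - total_gain) < 1e-12
```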
The first swap was designed in 1981 between the World Bank and IBM, see Figure 3.1. IBM received DM from its funding program and used this money to finance projects in the US. Therefore, IBM needed to exchange USD for DM periodically to service the coupon payments. Since the USD strengthened against the DM in that period, IBM made currency gains. To realize these gains, IBM needed to get rid of its DM liabilities. The World Bank borrowed in the capital markets and lent to developing countries for project finance. The costs of the loans were the same as the financing costs of the World Bank in the markets. US interest rates were at 17 percent in this period, while in Germany and Switzerland they were 12% and 8%, respectively. The World Bank preferred raising all funds in the lower interest rate currencies. But it was constrained in borrowing in these countries and also had to use USD. It searched for a party which owed DM and wanted to exchange them against USD. An investment banker at Salomon Brothers realized that a currency swap would
1 The expression vanilla is used for basic or simple derivatives. Call and put options are examples. More complicated products are called exotic.
2 For the fixed payments the day-count convention of bond markets, 30/360, is used, and act/360 is used for the floating leg. '30/360' means that each month has 30 days and each year has 360 days. The convention 'act' means that the actual calendar dates are summed. The reset frequency is the frequency of floating payments.
[Figure 3.1: Diagram of the swap: the bond market, the World Bank, and clients with existing DM & CHF loans; flows of USD and of DM & CHF coupons, notionals, and loan repayments.]
solve the problems of both parties: IBM could change its DM liabilities into USD, and the World Bank could buy DM at favourable rates. The World Bank lent IBM the notional amounts and coupons denominated in DM and received notional and coupons in USD in exchange. Such a direct swap without involvement of a bank's balance sheet is a back-to-back swap.
In the 80s and 90s banks started to enter into own-name transactions. The swap counterparties discussed their desired risk and return profile directly with the bank as intermediary. Entering in between the two swap parties, the bank faced counterparty risk. One also started to develop standardized documentation which allowed customized transactions to be processed effectively: the ISDA agreements. The third period was characterized by the beginning of market making. Banks started to trade swaps with several counterparties. Market and counterparty risk increased due to these wider activities, and large investments in risk management followed. Market risk was often compensated with transactions in other markets.
since at initiation no cash flows are exchanged. Fixed payments $s_{0,T}(0)$ are made annually,3 floating ones quarterly.4 Figure 3.2 shows the replication of a swap by a par fixed bond and a floating rate note (FRN). We prove that the PV of an FRN must be worth par at each quarterly LIBOR reset date. Since the initial value of a swap is zero, the initial value of the fixed leg must then also be worth par.
Solving for the swap rate and using $\mathrm{PV(Float)} = (1 - p(0,T))N$ we get
$$s_{0,T} = \frac{1 - p(0,T)}{A_{0,T}(0)} = \frac{\text{PV Floating}}{\text{Annuity}} \qquad (3.3)$$
where $A_{0,T}(0) = \sum_{j=1}^{T} p(0,j)$ is the present value of an annuity and p(0, t) is the price of a zero-coupon bond; $A_{0,T}(0)$ is also called the level of the swap.
Proposition 13. The PV of a floating rate note is equal to the notional.
Since D(0, 0) = 1, the PV of the floating leg equals the notional (see Table 3.3).
Table 3.3: Valuation of an FRN. The forward rates are calculated using simple compounding:
$$F(0,S,T) = \frac{\frac{1+T\times R(0,T)}{1+S\times R(0,S)} - 1}{T - S}.$$
The FRN cash flows are derived from $\mathrm{CF}(T) = 10\,000 \times \frac{F(0,S,T)}{2}$, and the PVs follow from $\mathrm{PV}(\mathrm{CF}(T)) = \frac{\mathrm{CF}(T)}{1 + T\times R(0,T)}$.
We close with some OTC market figures. Figure 3.3 shows notional and gross amounts in OTC markets. Notional amounts are USD 600 trillion, which is 8 times worldwide GDP. The gross amount is more than a factor 10 smaller. The markets cover OTC foreign exchange, interest rate, equity, commodity and credit derivatives.
3 We assume that the difference between two consecutive dates is constant.
4 They are equal to act/360 times the 3m LIBOR rate at the beginning of the quarter. This is called setting in advance and paying in arrears.
Figure 3.2: Graphical representation of a payer swap replication (payer means the party which pays the fixed rate and obtains the floating one). Dotted lines represent floating cash flows. Replication is obtained by virtually adding and subtracting notional amounts at the beginning and at maturity of the swap. We assume for simplicity the same periodicity for the floating and fixed leg. The figure shows an important property of risk structuring: to obtain the cash flow profile of a new product, one can add new products to an existing profile vertically.
Figure 3.3: OTC market figures; left panel: notional amounts, right panel: gross market values. The statistics at the country level are based on data reported every six months by dealers in 12 jurisdictions (Australia, Canada, France, Germany, Italy, Japan, the Netherlands, Spain, Sweden, Switzerland, the United Kingdom and the United States) plus data reported every three years by dealers in more than 30 additional jurisdictions. (Source: BIS, 2018)
The gross positive market value is the sum of the replacement values of all contracts that are in a current gain position to the reporter at current market prices, and similarly for the gross negative market value. The gross market value is the sum of the two absolute values. Gross means that there is no netting or offsetting. Gross market values supply information about the potential scale of market risk in derivatives transactions, and they are a measure of comparable economic significance across markets and products.
• At time 12m the client has to pay CHF $-10(1+L(\text{6m},\text{6m}))$ Mio. without an FRA. But the client would like to pay CHF $-10(1+F(0,\text{6m},\text{6m}))$ Mio.
• An FRA contract pays/receives an amount A in 6m, worth $A(1+L(\text{6m},\text{6m}))$ in 12m, such that A balances the payments in 12m between the unwanted risky payment without an FRA and the wanted fixed payment. A solves in 12m the equation:
$$\underbrace{A(1+L(\text{6m},\text{6m}))}_{\text{Balance}}\ \underbrace{-\,10(1+L(\text{6m},\text{6m}))}_{\text{Without FRA}} = \underbrace{-10(1+F(0,\text{6m},\text{6m}))}_{\text{Desired Payment}}.$$
$$s_{0,T} = \sum_{j=1}^{T} w_j\, L(0, T_{j-1}, T_j), \qquad w_j = \frac{p(0,j)}{A_{0,T}(0)}.$$
The sum over all weights $w_j$ equals 1. This shows that an IRS is a weighted sum of FRAs.
$$D(0,1) = \frac{1}{1 + s_{0,1}\,\alpha_{0,1}}.$$
To obtain D(0, 2) we consider a 2y swap with par swap rate $s_{0,2}$. From the par swap condition,
$$D(0,T) = \frac{1 - s_{0,T}(0)\sum_{i=1}^{T-1}\alpha_{i-1,i}\,D(0,i)}{1 + s_{0,T}\,\alpha_{T-1,T}}. \qquad (3.5)$$
Table 3.1 shows how different rates are derived from the given swap rates. Using these rates we price a 5y swap with a notional of 50 Mio. in a given currency. Table 3.4 summarizes the floating leg pricing. The PV of the floating leg, see Proposition 13, gives
$$s = -\frac{\mathrm{PV}_{\text{Floating}}(0)}{\mathrm{PV}_{\text{Fix at }1\%}(0)} = 5.806\%.$$
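A minimal bootstrapping sketch of (3.5), assuming annual periods (all $\alpha = 1$) and illustrative par swap rates (not the rates of Table 3.1); the check at the end verifies that the par rate is recovered via (3.3):

```python
# Bootstrapping discount factors from par swap rates via (3.5),
# with annual periods and assumed (illustrative) par rates.
par = {1: 0.0450, 2: 0.0500, 3: 0.0550, 4: 0.0575, 5: 0.0600}

D = {}
for T in range(1, 6):
    s = par[T]
    annuity = sum(D[i] for i in range(1, T))   # empty sum = 0 for T = 1
    D[T] = (1 - s * annuity) / (1 + s)         # equation (3.5)

# Sanity check with (3.3): s = (1 - D(0,T)) / sum_j D(0,j).
for T in range(1, 6):
    level = sum(D[j] for j in range(1, T + 1))
    assert abs((1 - D[T]) / level - par[T]) < 1e-12
```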
So far, we assumed that the necessary input rates exist. What if there are holes, i.e. maturities where no observable instrument exists? Then we have to interpolate. Such a construction should satisfy several requirements:
Floating Leg        1y          2y          3y          4y          5y
Rates               4.5615%     5.5518%     6.5750%     6.2827%     6.3128%
Cash flows          -2'280'743  -2'775'921  -3'287'486  -3'141'361  -3'156'409
PV of cash flows    -2'181'246  -2'516'291  -2'799'551  -2'517'843  -2'380'228

Fix Leg 1%          1y          2y          3y          4y          5y
Fix 1%              1%          1%          1%          1%          1%
Cash flows          500'000     500'000     500'000     500'000     500'000
PV of cash flows    478'188     453'235     425'789     400'757     377'047

Table 3.4: Floating leg pricing. Up to 1y spot rates are used; for longer maturities forward rates apply. Lower panel: pricing with a fixed 1% rate.
• Stability. The constructed term structures should be stable when switching from one structure to another. Switching from a meaningful discount curve to a forward curve should also provide a meaningful forward curve.
Table 3.5: CHF interest rates as of July 2019. Note that all rates are negative. SARON (Swiss Average Rate Overnight) is an overnight interest rate average referencing the Swiss franc interbank repo market. The data in the table are blended: if several possibilities exist to construct the table, the most convenient instruments are used to fill out the table. Source: Swiss National Bank.
That this unknown rate matches the 4 given ones is equivalent to a linear system:
$$Mx = y, \quad x = (a,b,c,d)', \quad y = (4\%,\ 4.5\%,\ 5\%,\ 5.3\%)', \quad M = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 8 & 4 & 2 & 1 \\ 27 & 9 & 3 & 1 \\ 64 & 16 & 4 & 1 \end{pmatrix}$$
where the matrix M has the powers of the time index as entries. Using the inverse matrix $M^{-1}$ implies for $x = (-0.00033, 0.002, 0.00133, 0.037)$ the rate
$$R(0, 2.5) = -0.00033(2.5)^3 + 0.002(2.5)^2 + 0.00133(2.5) + 0.037 = 4.762\%.$$
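The same computation as a short sketch (numpy assumed available):

```python
import numpy as np

# Cubic polynomial through the four given spot rates R(0,1..4).
M = np.array([[ 1,  1, 1, 1],
              [ 8,  4, 2, 1],
              [27,  9, 3, 1],
              [64, 16, 4, 1]], dtype=float)
y = np.array([0.04, 0.045, 0.05, 0.053])

a, b, c, d = np.linalg.solve(M, y)     # coefficients of a t^3 + b t^2 + c t + d
t = 2.5
R_25 = a * t**3 + b * t**2 + c * t + d
print(round(R_25, 4))                   # about 0.0476, i.e. the 4.762% above
```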
The value or wealth process measures positions times prices (we often neglect the superscript $\psi$):
$$V^{\psi}(t) = \psi_0 S_0(t) + \sum_{j=1}^{N}\psi_j S_j(t) =: \langle\psi(t), S(t)\rangle, \qquad (3.6)$$
with $\psi_0$ the amount invested in the risk-free asset, $\psi_j$ the number of units of the risky security j held in a period $[t, t+1)$, and $\langle\psi, S\rangle$ the scalar product. The vector $\psi(t)$ is a portfolio or a strategy. Dividing (3.6) by the portfolio value leads to the normalized portfolio weights
$$\phi_j(t) = \frac{\psi_j(t)S_j(t)}{V(t)}.$$
If all positions are positive, i.e. a long-only portfolio, and there is no leverage, then the normalized weights are probabilities and add up to one.
In the first term on the RHS, a change in portfolio value between two dates is due to external money added or withdrawn. Self-financing rules out such strategies: $(\Delta\psi_t)S_t = 0$. We summarize some immediate facts:
2. The return of a portfolio is equal to the weighted sum of the portfolio constituents' returns:
$$R^{\phi} = \sum_{j=1}^{N}\phi_j R_j =: \langle\phi, R\rangle. \qquad (3.8)$$
$$V(t) = V(0) + \sum_{s=1}^{t}\sum_{j=1}^{N}\psi_j(s)\Delta S_j(s). \qquad (3.9)$$
The last fact states that the portfolio value at a future date is given by the sum of all portfolio profits and losses over time. Each intermediate P&L is determined by the investment decision at the beginning of the period times the random P&L in the period. The simple return of a portfolio is invariant to the size of the portfolio: scaling the portfolio value by a factor, the factor cancels out in the return calculation. Hence, without loss of generality we set V(0) = 1.
The proposition implies for the growth rate of wealth $R^{\phi}_{[0,t]}$ from 0 to t:
$$1 + R^{\phi}_{[0,t]} := \frac{V_t}{V_0} = \frac{V_t}{V_{t-1}}\frac{V_{t-1}}{V_{t-2}}\cdots\frac{V_1}{V_0} \qquad (3.10)$$
$$= (1+R^{\phi}(t))(1+R^{\phi}(t-1))\cdots(1+R^{\phi}(1))$$
$$= (1+\langle\phi(t),R(t)\rangle)(1+\langle\phi(t-1),R(t-1)\rangle)\cdots(1+\langle\phi(1),R(1)\rangle),$$
i.e.
$$V_t = V_0\prod_{s=1}^{t}(1+\langle\phi(s),R(s)\rangle).$$
Wealth growth follows a geometric rate and not an arithmetic one.
Definition 16. $\psi(t)$ is a buy-and-hold (BH) or static portfolio if $\psi(t) = \psi(0)$ for all $t \geq 0$. $\psi(t)$ is a constant rebalanced (RB) portfolio if $\psi_j(t-)S_j(t) = c_j$ for all positions j and all t, with $c_j$ given. $t-$ denotes a prior time arbitrarily close to t where the asset value of the period is realized and the portfolio weight $\psi_j(t-1)$ chosen at $t-1$ is changed to $\psi_j(t)$ such that the position value equals the predefined value $c_j$.
$$V_0 = \phi_0 S_0 + \psi_0 B_0 = 0.6V_0 + 0.4V_0.$$
To achieve the weights, the investor has to buy $\phi_0 = \frac{V_0}{S_0}\times 0.6$ units of asset S at time 0, and similarly for asset B. After one time step the absolute portfolio value before rebalancing reads:
$$V_1 = \phi_0 S_1 + \psi_0 B_1 \neq 0.6V_1 + 0.4V_1,$$
where a change in portfolio value is entirely due to changes in asset values and not to changes in the positions (self-financing investment strategy). Then the required weights are restored by rebalancing. It follows that the weight of the asset whose price increased is reduced, and vice versa for the other asset.
6 Literature for this section: Hallerbach (2014), Blitz (2015), Hayley (2015), White (2015), Pal and
Wong (2013) and Quian (2014)
consider this type of rebalancing unless otherwise stated. The proportion of capital in stock k just before rebalancing is given by
$$\psi_k(t+1)- = \frac{\psi_k(t)(1+R_k(t+1))}{\sum_{j=1}^{N}\psi_j(t)(1+R_j(t+1))};$$
the weights $\psi_k(t+1)-$ are the drifted weights. In a buy-and-hold portfolio, the portfolio weights equal the drifted weights at each date.
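A minimal sketch of this drifted-weight update; the 60/40 weights and the one-period returns are assumptions for the example:

```python
import numpy as np

def drifted_weights(w, r):
    """Weights just before rebalancing, after one period of returns r."""
    g = w * (1 + r)
    return g / g.sum()

w = np.array([0.6, 0.4])          # illustrative 60/40 portfolio
r = np.array([0.10, 0.02])        # assumed one-period asset returns

print(drifted_weights(w, r))      # the stock weight drifts above 60%
# Rebalancing to constant weights then sells the winner and buys the loser.
```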
We show below that rebalancing strategies take volatility into account. Buy-and-hold strategies and market weighted strategies do not consider volatility. The volatility drag expresses the difference between expected geometric and arithmetic returns in terms of volatility. Since wealth growth is geometric and rebalancing takes volatility into account, such strategies are expected to outperform non-volatility-based strategies.
We write GM for the geometric mean and AM for the arithmetic mean over T periods:
$$\mathrm{GM} = \left(\prod_{k=1}^{T}(1+R_k)\right)^{1/T} - 1. \qquad (3.11)$$
Taking logarithms,
$$\log(1+\mathrm{GM}) = \frac{1}{T}\sum_{k=1}^{T}\log(1+R_k)$$
with $R_k$ the return of the portfolio between time $k-1$ and k. Writing $\mu$ for the expected mean return of the portfolio we get:
$$E(\log(1+\mathrm{GM})) = \frac{1}{T}\sum_{k=1}^{T}E(\log(1+R_k)) = \frac{1}{T}\sum_{k=1}^{T}E\big(\log(1+\mu+R_k-\mu)\big)$$
$$= \log(1+\mu) + \frac{1}{T}\sum_{k=1}^{T}\left(E\left(\frac{R_k-\mu}{1+\mu}\right) - E\left(\frac{(R_k-\mu)^2}{2(1+\mu)^2}\right)\right) + o(\mu)$$
$$= \log(1+\mu) + 0 - \frac{\sigma^2}{2(1+\mu)^2} + o(\mu).$$
If $\mu$ is small, $\log(1+\mu) = \mu + o(\mu)$; using the Neumann series in the portfolio volatility term and approximating the log in GM implies the volatility drag equation:
$$E(\mathrm{GM}) = \mu - \frac{\sigma^2}{2} + o(\mu) = E(\mathrm{AM}) - \frac{\sigma^2}{2} + o(\mu). \qquad (3.12)$$
The equation also holds in the risk-free case and for individual assets. The volatility drag defines strategies to harvest volatility. Hence, strategies which take volatility into account, such as rebalancing strategies, are expected to outperform pure buy-and-hold or equal market weighted strategies.
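A simulation sketch of (3.12) under assumed i.i.d. normal returns (μ = 6% and σ = 20% are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, T = 0.06, 0.20, 10_000      # illustrative parameters

R = rng.normal(mu, sigma, size=T)      # i.i.d. one-period returns
AM = R.mean()
GM = np.expm1(np.mean(np.log1p(R)))    # geometric mean via logs, eq. (3.11)

print(round(AM, 4), round(GM, 4), round(mu - sigma**2 / 2, 4))
# GM is close to AM - sigma^2 / 2: the volatility drag of (3.12).
```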
Figure 3.4: Rebalancing example for the SMI, MSCI World UCITS ETF (MXWO), JPM Global Aggregate Bond Index (SZG2TR), an equal-weighted index of 1,600 hedge funds (JGAGGUSD), the FTSE NAREIT All Equity REITs Index (BCOMTR), the gold dollar price (XAU Curncy) and the S&P 500 Index (SPX).
The top left panel shows the index or price evolution. The dot-com and GFC crises are visible. The bottom left panel shows the rebalancing strategy: basically, winners are sold and losers are bought. This panel is a mirror image of the price chart. The panels on the right hand side show the performance of different investment strategies. On the top right, the rebalanced strategy and the equal weighted buy-and-hold strategy are shown. Both strategies fail to provide protection if markets are under stress, although the rebalancing strategy suffers from a lower shortfall. But it also cuts the upside potential, which leads to overall underperformance. The red line assumes transaction costs of 10 bps per rebalancing.
In the lower right panel, the rebalanced-to-EW strategy is compared with the inverse volatility strategy IV and two momentum strategies. In the IV strategy, the rebalancing update is adjusted by the past volatility of the indices - the more volatile an index was, the less weight it will have in the next period (negative leverage). With this strategy the large market stress periods are neutralized, but the strategy also annihilates the growth potential. In the momentum approach, strategies are updated according to whether a strategy belonged to the winners or losers over the past month. More precisely, the average last-month return of all strategies is calculated. Each strategy is compared to this average: if the performance is higher (lower) than the average, the strategy is a winner (loser), and the rebalancing weight is updated by adding/subtracting a constant number, respectively. The strategy is a long-only strategy, which is atypical for momentum strategies, which are implemented as long-short portfolios (buy the winners, sell the losers). The momentum strategy shows a boost and a crash before and during the GFC. These two effects are typically reinforced in a long-short set-up.
Consider a single risky asset S and a risk-free bond that pays 10 percent each period in a two-period binomial model. The stock starts with a value of 1 and can go up or down in each period with the same probability of 50 percent (see the data in Figure 3.5). If an up state is realized, the stock value doubles; otherwise the stock loses half of its value.
Using these assumptions, wealth projections for the buy-and-hold strategy follow at once. The value in the node 'up - up', that is, 2.884, follows from
$$0.6 \times 2^2 + 0.4 \times 1.1^2 = 2.4 + 0.484 = 2.884.$$
The payoffs after period 2 show that rebalancing adds more value to the sideways paths but less value to the extremes (up - up or down - down) compared to the buy-and-hold strategy. This transforms the linear payoff of buy-and-hold - that is, a payoff which is a linear function of the stock value - in a non-linear way. Precisely, consider a European call option with a strike of 3.676 at time 2 and a European put option with a strike of 0.466. The option prices at date 0 and date 1 follow from no-arbitrage pricing.
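A short sketch reproducing the two-period comparison of Figure 3.5 (u = 2, d = 0.5, a 10% bond and 60/40 weights, as in the text; the code structure is ours):

```python
from itertools import product

u, d, rf = 2.0, 0.5, 0.10          # stock doubles or halves; bond pays 10%
w_s, w_b = 0.6, 0.4                # 60/40 weights

for path in product([u, d], repeat=2):
    # Buy-and-hold: keep the initial units of stock and bond for two periods.
    s, b = path[0] * path[1], (1 + rf) ** 2
    v_bh = w_s * s + w_b * b
    # Rebalanced: reset to 60/40 at the end of each period.
    v_rb = 1.0
    for m in path:
        v_rb *= w_s * m + w_b * (1 + rf)
    print(path, round(v_bh, 3), round(v_rb, 3))
# up-up gives BH 2.884 > RB, while the sideways paths favor RB,
# matching the payoffs in Figure 3.5.
```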
Figure 3.5: Rebalancing as a short volatility strategy in a binomial tree model. Left are the risky asset's dynamics, in the middle are the wealth values if a buy-and-hold strategy (60/40) is used, and right are the wealth levels for a strategy rebalancing to fixed (60/40) weights. Note that up-and-down is the same as down-and-up; therefore, there are two paths for the stock value after period 2, both with the result of 1 (Ang [2013]).
• A rebalancing strategy.
• A short call + short put + long bond + long buy-and-hold strategy. The first two positions are the short volatility strategy.
The two strategies have identical payoffs in all states; therefore they are identical. This shows that a short volatility strategy, financed by bonds and the buy-and-hold strategy, is the same as a rebalancing strategy.
Since rebalancing means being short volatility, the investor automatically earns the volatility risk premium. The short volatility strategy makes the payoff in the center of the probability distribution larger at the cost of the extreme payoffs. Short volatility or rebalancing underperforms buy-and-hold strategies if markets are either booming or crashing, but it performs well if markets show time reversals.
The short-term weight is the myopic investment demand. The opportunistic weight is the hedging demand against changing investment opportunity sets. This means that besides the liquid asset's risk there are other risk sources. They are described with a state variable Y and can be correlated to the risky asset's risk source. Examples are inflation or deflation risk, house price risk, divorce risk or unemployment risk. This general rule follows from the 'Principle of Optimality' of R. Bellman, see Section 4.2.1.
We write (3.13) more explicitly in the case of a single risky asset for the fraction $\phi(t)$ of wealth invested in this asset:
$$\phi(t) = \frac{\alpha_t - r_t}{\sigma_t^2}\times \mathrm{RRA}^{-1} + (1 - \mathrm{RRA}^{-1})\,\Delta Y \times \mathrm{RIRA}^{-1} \qquad (3.14)$$
where:
• The Market Price of Risk (MPR): $\mathrm{MPR} = \frac{\alpha_t - r_t}{\sigma_t^2}$.
• $\mathrm{RRA}^{-1}$ is the inverse relative risk aversion - the investor's risk tolerance - where $\mathrm{RRA} = -\frac{u''(c)c}{u'(c)}$ with c consumption.
• $\mathrm{RIRA}^{-1}$ is the inverse relative Y-risk aversion.
If the investment opportunity set is constant, the state variable Y is zero and also $\Delta Y = 0$; then optimal investment is equal to myopic investment. The myopic component $\mathrm{MPR}\times\mathrm{RRA}^{-1}$ is equal to the optimal solution of a one-period model; it maximizes the Sharpe ratio and it is mean-variance efficient. The opportunistic weight represents the desire to hedge against future opportunity changes.
We comment on the optimal strategy formula (3.14). First, the optimal investment strategy is time-varying; buy-and-hold is not optimal from a dynamic investment point of view. But there is nothing in the optimal formula which states that it is optimal to rebalance to constant weights.
Third, which of the two components in (3.13) is more important? In the extreme cases where returns are not predictable, or stochastic opportunities are not changing over time, or the investor has a logarithmic utility function, the long-term hedging demand is zero. But in other, less extreme cases the literature is ambiguous about the relative strength. The result depends on the size of the opportunistic weight, which is driven by two factors: predictability and investment opportunity. The closer asset returns are to being unpredictable and/or the less stochastic opportunity set variations matter, the less important is the opportunistic component.
Fourth, the MPR keeps its form for many assets, but the division by the variance is replaced by a multiplication with the inverse covariance matrix $C^{-1}$:
$$\phi = \mathrm{RRA}^{-1}\,C^{-1}(\alpha - r).$$
Comparing this with the solution of the Markowitz problem with no risk-free asset (4.3), $\phi = \frac{1}{\theta}C^{-1}\mu$, shows that the first component of the optimal investment strategy (3.13) defines a mean-variance efficient portfolio. This rationalizes the Sharpe ratio and the Markowitz model in multi-period investing. Similarly, in the opportunistic weight the covariance between the liquid assets and the state variable Y enters.
Fifth, the inverse relative risk aversion measures the curvature of the utility function as a function of wealth: for logarithmic utility, $\mathrm{RRA}^{-1} = 1$. The more risk averse an investor is, the smaller $\mathrm{RRA}^{-1}$ and the more is optimally invested in the risk-free
asset. The notion of relative risk aversion raises two delicate issues. First, there is a calibration result by Rabin (2000) showing that expected-utility theory is an utterly implausible explanation for appreciable risk aversion over modest stakes. Second, the measurement of RRA is, in itself, a delicate matter.
Sixth, the opportunistic weight consists of three different terms. First, if the investor becomes more risk averse, $\mathrm{RRA}^{-1}$ decreases, and the myopic component in the optimal portfolio becomes less important. Second, the aversion to innovation risk sources. Third, a hedging demand against innovation risk. This is proportional to $\mathrm{cov}(\tilde{R}, R_I)$ in (3.13) - that is to say, the hedging demand follows from the correlation pattern of the innovation's portfolio return with the overall portfolio return. Investors will increase their holding of the risky asset given by the first term if it covaries negatively with state variables that matter in the investor's value function. A bond is such a hedge against falling interest rates.
Seventh, if liabilities matter, such as in goal-based investment, then in both expressions in (3.14) functions of time differences $f(T-t)$ enter, where T is the realization time of a liability. These functions take into account the 'way to go' effect. It is, for example, optimal to take more risk (given a positive drift and an actual funding level) if there are five years left to finance a liability than if only one month is left until its maturity.
Logarithmic utility facilitates calculations but is behaviorally specific. Log investors always act optimally myopically (one-period view), independent of the dynamic context. Their demand for hedging long-term risks is zero. To understand why, note that a log investor maximizes log returns. Assuming normality of the returns, the log return over a long time horizon is equal to the sum of one-step returns. The long-term return is therefore maximized if the sum over the one-period returns is maximized, which is the same as maximizing each one-period return.
To see how the optimal investment formula can fail in reality, consider the Great Financial Crisis (GFC). Pick an investor with a relative risk aversion of 2, a normal market return of 6% in stocks, a risk-free rate of 2% and a volatility of 18%. The investor assumes that returns are IID; he is a myopic investor. From the optimal portfolio formula (3.14):
$$\phi = \frac{0.06 - 0.02}{2\times 0.18^2} \approx 0.6.$$
That means the investor holds 60% in equities and 40% in a risk-less asset. In the GFC, volatility (both realized and implied) increased to levels around 70%. The optimal myopic formula then implies $\phi \approx 0.04$, i.e. a 4% equity position, or a reduction by 93% from the pre-crisis investment. But stock market participation was not reduced by 93%. Since the average investor holds the market, he did not show the same behavior as our theoretical investor.
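A minimal sketch of the myopic component with the numbers above (relative risk aversion 2; the function name is ours):

```python
def myopic_weight(alpha, r, sigma, rra):
    """Myopic part of (3.14): excess return over variance, times risk tolerance."""
    return (alpha - r) / (rra * sigma**2)

# Pre-crisis: 6% equity return, 2% risk-free, 18% volatility, RRA = 2.
print(round(myopic_weight(0.06, 0.02, 0.18, 2.0), 2))   # about 0.6
# GFC: volatility around 70% -> about 0.04, a 93% cut in the equity weight.
print(round(myopic_weight(0.06, 0.02, 0.70, 2.0), 2))
```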
We compare three well-known strategies with the myopic part of (3.14):
• Buy falling stocks, sell rising ones (constant-mix 60/40 rebalancing strategies) [contrary to the myopic part of (3.14)].
• Sell falling stocks, buy rising ones (portfolio insurance strategies) [in line with the myopic part of (3.14)].
We follow Perold and Sharpe (1988) and Dangel et al. (2015). They consider buy-and-hold, constant mix (say 60/40 strategies), constant-proportion portfolio insurance and option-based portfolio insurance. We start with the first two strategies with a risky asset S and a risk-free asset B. In the payoff diagrams, the value of the assets is a function of the value of the stock; in the exposure diagrams, the relation between dollars invested in stocks and total assets is plotted.
The payoff diagram for the 60/40 rule is a straight line with a slope of 0.6; the maximum loss is 60% of the initial investment and the upside is unlimited, see Figure 3.6. The exposure diagram is also a straight line in the space where the value of the assets is related to the stock position. For a buy-and-hold strategy, the slope is 1 and the line intersects the x-axis of the value of the assets at USD 40. If the portfolio value is less than USD 40, the demand to invest in the stock is zero; for the constant mix strategy there is always a demand.
If there is no volatility in the market, stocks either rise or fall forever. Then the buy-and-hold payoff always dominates the constant mix portfolio. But with volatile markets, the success of the strategy depends on the paths of asset prices, see volatility drag and volatility harvesting. A constant mix portfolio tends to be a superior strategy if markets show reversal behavior instead of trends.
This shows that the performance of rebalancing depends on the investment environment: different economic and financial market periods lead to different results. Ang (2013) compares the period 1926-1940 with the period 1990-2011. He compares buy-and-hold investments in US equities and US Treasury bonds, and pure investments in the two asset classes, with the rebalanced (60/40) investment portfolio in the two assets. The countercyclical behavior of rebalancing smooths the individual asset returns. It led to much lower losses after the stock market crash in 1929, but compared to the static strategy it was not able to follow the strong stock markets before the crash. The rebalancing strategy also leads to much less volatile performance than the pure stock or bond strategy.
Figure 3.6: Payoff and exposure diagrams for constant mix and buy-and-hold strategies (adapted from Perold and Sharpe [1988]). The left panels show the payoff diagram for the 60/40 buy-and-hold strategy and the exposure diagrams for the 60/40 strategy, once buy-and-hold and once dynamic, that is assuming a constant mix. The upper right panel shows the superiority of the buy-and-hold strategy when there are only trends, and the lower diagram shows that the constant mix strategy can dominate the buy-and-hold one if there is volatility, depending on the stock price path, which is represented by the thickness of the asset value line.
With a constant mix strategy, investors invest in risky assets even in market stress situations. In practice, however, there is a strong demand for portfolio insurance, since investors have a considerable downside-risk aversion. Therefore, a rebalancing method 'opposite' to the constant mix is required: selling stocks as they fall.
Returning to the three alternatives, the payoffs of the strategies are linear, concave or convex. The last strategy is called convex since the payoff function increases at an increasing rate if the stock values increase. Hence, rebalancing has an impact on the payoff. Concave strategies, such as the constant mix strategies, are the mirror image of convex strategies such as portfolio insurance. The buyer of one strategy is also the seller of the other one.
Summarizing, buying stocks as they fall leads to concave payoffs. These are good strategies in markets with no clear trend, since the principle 'buy low, sell high' applies. In markets under stress, losses are aggravated since more and more assets are bought.
The convex payoff of portfolio insurance strategies limits the losses in stressed markets while keeping the upside intact. But if markets oscillate, their performance is poor.
• Stop-loss strategies. The investor sets a minimum wealth target or floor that must be exceeded by the portfolio value at the investment horizon. This strategy is simple, but once the loss is triggered the portfolio will no longer be invested in the risky asset, and hence participation in a recovery of the risky asset is not possible.
• In the option-based approach one buys a protective put option. While simple, this strategy has several drawbacks. First, it acts against many investors' behavior: one should buy portfolio insurance when it is cheap - when stock markets boom. Second, buying an at-the-money option is expensive compared to the expected risky asset return, and since one has to roll the strategy, the costs multiply. Therefore, such option-based strategies are often used in long-short combinations (buying an out-of-the-money put and selling an out-of-the-money call).
We start with a classic example with two assets. Asset 1 earns a −50% return in all odd periods and a 100% return in all even periods. Asset 2 is a risk-free asset whose return is always 0%. Table 3.6 shows different portfolio returns for a buy-and-hold (BH) portfolio and a portfolio rebalanced (RB) to equal weights in each period. Initial wealth is USD 1.
Investing the dollar BH in either of the two assets leads to zero growth of wealth. Rebalancing to equal weights in each period leads to a portfolio growth of 0.75 × 1.5 = 1.125. Systematic rebalancing is capable of capturing profit from volatility even when the underlying assets experience zero growth. If we extend the model to many periods, a sequence
7 Recently, Schied et al. (2018) showed that also in continuous time a dynamics-independent master equation is possible.
Period        BH 1   BH 2   RB 1+2   RB 1    RB 2
0             1      1      1        0.5     0.5
1-            1      0.5    0.75     0.25    0.5
1+            1      0.5    0.75     0.375   0.375
2             1      1      1.125    0.756   0.375
Total Return  0%     0%     12.5%    +51.5%  -25%

Table 3.6: Buy-and-hold versus equal-weight rebalanced portfolios. 1- denotes the time before rebalancing and 1+ the portfolio and position values after rebalancing.
of alternating return products 0.75 × 1.5 = 1.125 determines the excess return of RB relative to BH. The order of the return products does not matter, only the number of such matching pairs. If we can form N pairs, the growth is boosted as $(1.125)^N$.
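A short simulation of Table 3.6 (the function and variable names are ours):

```python
# Asset 1 alternates -50% / +100%; asset 2 is cash at 0%.
def simulate(n_periods, rebalance):
    w1 = w2 = 0.5                      # equal-weight start, wealth = 1
    for t in range(1, n_periods + 1):
        r1 = -0.5 if t % 2 == 1 else 1.0
        w1 = w1 * (1 + r1)
        if rebalance:                  # reset to equal weights
            total = w1 + w2
            w1 = w2 = total / 2
    return w1 + w2

print(simulate(2, rebalance=False))    # 1.0  : buy-and-hold, zero growth
print(simulate(2, rebalance=True))     # 1.125: one matched pair harvests 12.5%
print(simulate(10, rebalance=True))    # (1.125)**5: growth compounds per pair
```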
To formalize the discussion, consider a binomial tree with two risky asset prices $S_1, S_2$. Then
$$X(t) = \log\frac{S_1(t)}{S_2(t)}$$
is a measure of relative price.8 We set $\Delta X(t) := \pm\sigma$, where $\sigma^2$ is the instantaneous variance of relative prices. For a strategy $\phi = (\phi_1, \phi_2)$,
$$W^{\phi}(t) := \frac{V^{\phi}(t)}{S_2(t)/S_2(0)} = \frac{S_1(t) + S_2(t)}{S_1(0) + S_2(0)}\,\frac{S_2(0)}{S_2(t)}$$
is the value of the portfolio $\phi$ relative to asset 2. It satisfies the dynamics (using the same arguments as for (3.20)):
$$\frac{W^{\phi}(t+1)}{W^{\phi}(t)} = 1 + \phi_1(t)\left(e^{\Delta X(t)} - 1\right) =: A(t).$$
Iterating this equation implies
$$W^{\phi}(t) = \prod_{s=0}^{t-1} A(s).$$
Assume that $\phi$ is constant. Then a volatility-matching pair of up and down moves generates a growth factor which is larger than 1, i.e. $A(s)A(s-1) > 1$. It is maximal for the equal-weighted portfolio $\phi = 0.5$. The recursive form of wealth implies that the wealth growth of the constant-weighted portfolio dominates the benchmark growth rate of asset 2 if the number of matching pairs dominates the moves in the price paths which do not match. In a perfect zig-zag price path all moves match; in a monotone increasing or decreasing path there is no matching at all, and volatility harvesting is a loser strategy.
8 The log follows from the standard representation of up and down moves in the binomial tree.
We use these ideas to develop the formal set-up of SPT, starting with some notation. The market weights $\mu_j(t)$, where $X_j(t)$ is the market capitalization at time t of asset j, read:
$$\mu_j(t) := \frac{X_j(t)}{\sum_{k=1}^{N} X_k(t)}. \qquad (3.16)$$
Investing in each period according to the market weights, the investment portfolio is the market portfolio $V^{\mu}$. The temporal update of the market weights, if only asset returns lead to capital changes, reads
$$\mu_j(t+1) = \frac{\mu_j(t)(1+R_j(t+1))}{\sum_{k=1}^{N}\mu_k(t)(1+R_k(t+1))} \qquad (3.17)$$
and:
$$V^{\mu}(t) = \frac{\sum_j X_j(t)}{\sum_j X_j(0)}. \qquad (3.18)$$
Let $V^{\mu}$ be the market portfolio value and $V^{\phi}$ any other portfolio value. We define the relative portfolio
$$V^{\phi/\mu}(t) := \frac{V^{\phi}(t)}{V^{\mu}(t)}. \qquad (3.19)$$
The time evolution of the relative portfolio depends only on the market weights for all $t > 0$:
$$\frac{V^{\phi/\mu}(t+1)}{V^{\phi/\mu}(t)} = \sum_{k=1}^{N}\phi_k(t)\,\frac{\mu_k(t+1)}{\mu_k(t)}. \qquad (3.20)$$
The expression $\gamma^{\phi/\mu}$ is always non-negative by Jensen's inequality, and it is strictly positive if $Y := \log\frac{\mu_k(t+1)}{\mu_k(t)}$ is not constant, i.e. if there is temporal volatility, which we assume to hold true.9 We write $\Gamma^{\phi/\mu}(t) = \sum_{s=0}^{t}\gamma^{\phi/\mu}(s)$ for the cumulated excess growth rate. Since $\gamma^{\phi/\mu} > 0$, $\Gamma \to \infty$ as time goes to infinity; Gamma is called the energy term.
The second transformation for the log return in (3.21) is to rewrite the first term using relative entropy:
$$\sum_{k=1}^{N}\phi_k(t)\log\frac{\mu_k(t+1)}{\mu_k(t)} = \sum_{k=1}^{N}\phi_k(t)\log\frac{\mu_k(t+1)}{\phi_k(t)} + \sum_{k=1}^{N}\phi_k(t)\log\frac{\phi_k(t)}{\mu_k(t)}.$$
With the relative entropy notation $S(p,q) = \sum_j p_j\log(p_j/q_j)$ for two probability distributions we get:
$$\sum_{k=1}^{N}\phi_k(t)\log\frac{\mu_k(t+1)}{\mu_k(t)} = S(\phi(t), \mu(t)) - S(\phi(t), \mu(t+1)).$$
Summarizing, the relative log return of any strategy can be decomposed into:
$$\Delta\log V^{\phi/\mu}(t) = \gamma^{\phi/\mu}(t) + S(\phi(t), \mu(t)) - S(\phi(t), \mu(t+1)). \qquad (3.22)$$
If rebalancing takes place to constant weights, then $\phi$ is constant, the entropy differences telescope, and the decomposition reads:
$$\log V^{\phi/\mu}(t) = \Gamma^{\phi/\mu}(t) + S(\phi, \mu(0)) - S(\phi, \mu(t)). \qquad (3.23)$$
Figure 3.7 shows the decomposition of a log portfolio value, rebalanced to constant weights, into its energy and entropy components. Gamma measures the amount of market volatility captured by the portfolio - the number of matched pairs in the introductory examples. The relative entropy term measures how much the relative performance deviates from Gamma. This term depends only on the initial and current positions of the market weight vector, i.e. on how the change in the capital distribution affects the performance of the portfolio. There is no volatility effect. The fluctuations of the log return are dominated in the short run by the entropy part, and long-term growth comes from the cumulated excess growth rate.
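A numerical sketch of the decomposition (3.22); the market weights and returns are assumptions, and since (3.21) is not reproduced here, γ is recovered from the relative return (3.20) and the definition of the excess growth rate:

```python
import numpy as np

def rel_entropy(p, q):
    """S(p, q) = sum_j p_j log(p_j / q_j)."""
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(1)
mu0 = np.array([0.5, 0.3, 0.2])                    # market weights at t (assumed)
R = rng.normal(0.0, 0.1, size=3)                   # one-period asset returns
mu1 = mu0 * (1 + R) / np.sum(mu0 * (1 + R))        # drifted market weights, (3.17)

phi = np.array([1/3, 1/3, 1/3])                    # constant-weight strategy

d_log_v = np.log(np.sum(phi * mu1 / mu0))          # relative log return, from (3.20)
gamma = d_log_v - np.sum(phi * np.log(mu1 / mu0))  # excess growth rate ("energy")

# Energy-entropy decomposition (3.22):
lhs = d_log_v
rhs = gamma + rel_entropy(phi, mu0) - rel_entropy(phi, mu1)
assert abs(lhs - rhs) < 1e-12
```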
For constant-weighted portfolios the following theorem characterizes log return growth.
Theorem 18. Consider a constant-weighted strategy $\phi$. Assume that the market weights $\mu(t)$ for all t are elements of a compact set K and that $\Gamma(t)$ increases to infinity as t tends to infinity. Then the portfolio value $V^{\phi/\mu}(t)$ also tends to infinity.
The statement is a pathwise one and free of any stochastic modeling assumptions. Long-term outperformance follows whenever the two path properties are satisfied. The validity of these two conditions can be evaluated by a portfolio manager at each date. Pal and Wong extend the discussion to non-constant rebalancing strategies.
9 $\gamma^{\phi/\mu} = \log E_{\phi/\mu}(e^{Y - E_{\phi/\mu}(Y)}) \geq 0$ follows by using the definition of Y and of the log. A Taylor approximation shows that Gamma is proportional to an excess growth rate.
Figure 3.7: Decomposition of a constant-weight rebalanced portfolio (log V) into the energy and entropy paths.
with b the benchmark portfolio weights. Figure 3.8 shows how this return difference can be split into three different rectangles for each j:
$$\mathrm{ARR}_j = 1 + 2 + 3 = \underbrace{(\phi_j - b_j)R_j^b}_{=:A} + \underbrace{(R_j^V - R_j^b)b_j}_{=:S} + \underbrace{(\phi_j - b_j)(R_j^V - R_j^b)}_{=:I}. \qquad (3.25)$$
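A minimal sketch of the decomposition (3.25) for two segments; all numbers are assumptions for the example:

```python
import numpy as np

phi = np.array([0.7, 0.3])      # portfolio weights
b   = np.array([0.5, 0.5])      # benchmark weights
Rv  = np.array([0.08, 0.01])    # portfolio segment returns
Rb  = np.array([0.06, 0.02])    # benchmark segment returns

A = (phi - b) * Rb              # allocation effect
S = (Rv - Rb) * b               # selection effect
I = (phi - b) * (Rv - Rb)       # interaction effect

arr = phi @ Rv - b @ Rb         # active (arithmetic relative) return
assert abs(arr - (A + S + I).sum()) < 1e-12
```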
Figure 3.9 shows the performance attribution tree for the MSCI World ESG Quality Index. The total return $R_T$ can be written in the form $R_T = R_T - R_b + R_b = \mathrm{ARR} + R_b$. Since fees are not available, the total return is a gross return. The figure shows that the ARR has several levels.
Figure 3.8: Arithmetic return decomposition. Source: Adapted from Marty (2015)
The ARR is first decomposed into asset classes. Then the asset class equity is further decomposed into three types: sector and geographical diversification G, the selection part S, and a part which invests in a portfolio of factor risk premia. In the return attribution, return numbers add up in the hierarchy, but non-linear risk figures do not add up.
Given the return attribution, how do we calculate returns? This is trivial if no cash inflows or outflows need to be considered. Two methods to calculate investment returns are distinguished: the time-weighted rate of return (TWR) and the money-weighted rate of return (MWR).
We refer to Marty (2015) for a detailed discussion. The TWR measures the return of an investment where in- or outflows do not affect the return of the investment. It reflects the return due to the asset manager's decisions taken in the past. As an example, start with USD 100 in period one, where another USD 200 is added at the beginning of period two, and the portfolio value at the end of period two is USD 300. The net gain of the portfolio is zero, but calculating the linear return results in a 200 percent return if one does not take intermediate cash flows into account. The TWR controls for these cash flows in return calculations. The MWR reflects the return from an investor's perspective: in- and out-flows as well as the profit and loss matter in this perspective. The MWR method is based on the no-arbitrage principle. Both the MWR and the TWR can be applied on an arbitrary investment period.
Figure 3.9: Performance attribution tree (total return, asset classes, risk premia) for the MSCI World ESG Quality Index, where the information written in red is added by the author (adapted from MSCI [2016]).
The TWR $R^{\mathrm{TWR}}_{0,T}$ of an investment starting at 0 and ending at T, with $T-1$ time points in between (not necessarily equidistant), is defined by:
$$1 + R^{\mathrm{TWR}}_{0,T} = \prod_{i=0}^{T-1}(1 + R_{i,i+1}) = \prod_{i=0}^{T-1}\left(1 + \frac{V_{i+1} - V_i}{V_i}\right) = \prod_{i=0}^{T-1}\frac{V_{i+1}}{V_i}. \qquad (3.26)$$
Proposition 19. 1. Adding or subtracting any cash flow $c_{\hat{t}}$ at any time $\hat{t}$ does not change the TWR.
2. If $\phi_i(j) = \lambda_i\,\phi_{i-1}(j)$ for all assets j and all time points i, then the TWR equals the return of the final portfolio value relative to its initial value. Hence, all intermediate returns cancel in (3.26).
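A sketch of the TWR for the text's 100/200/300 example; the intermediate value of 110 just before the inflow is an assumption added so that the sub-period returns are well defined:

```python
# (value just before the date, cash inflow at the date) per date
V = [(100.0, None), (110.0, 200.0), (300.0, None)]

twr = 1.0
prev = V[0][0]
for value, inflow in V[1:]:
    twr *= value / prev                 # sub-period gross return
    prev = value + (inflow or 0.0)      # restart base after the cash flow

print(round(twr - 1, 4))
# The 200 inflow does not enter the return; only market moves do.
```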
The TWR method is used by most index providers, since cash in- or outflows do not impact the return of the index. To prove the first property, fix a time $\hat{t}$ and let $c_{\hat{t}}$ be an arbitrary cash flow. The relevant terms in the TWR with this additional cash flow are the two sub-period factors around $\hat{t}$. Assuming that $\hat{V}_{\hat{t}} = V_{\hat{t}} + c_{\hat{t}}$, i.e. the additional cash flow is added, and inserting this into the last expression implies
$$\frac{V_{\hat{t}+1}}{V_{\hat{t}-1}},$$
which is the same result as simplifying the two terms in the TWR without any additional cash flows.
In the MWR, cash flows $c_j$ are reinvested at the internal rate of return (IRR), i.e. $R^{MWR}$ solves:
$$PV(C, R^{MWR}) = \sum_{j=1}^{T} D(0, j; R^{MWR})\,c_j \qquad (3.27)$$
where the discount factor D depends explicitly on $R^{MWR}$. Since $R^{MWR}$ enters the denominator of the discount factor, (3.27) is solved numerically. Using the first-order approximation $D \sim \frac{1}{1+R}$ transforms (3.27) into a linear equation for R - the so-called Dietz return (with AIC the average invested capital):
$$R^{\mathrm{Dietz}} = \frac{P\&L}{\mathrm{AIC}} := \frac{S_T - S_0 - \sum_{j=1}^{T-1}c_j}{S_0 + \frac{1}{2}\sum_{j=1}^{T-1}c_j}. \qquad (3.28)$$
This approximation implies simple compounding and assumes that the CFs are realized in the middle of the respective periods.
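A one-line sketch of the Dietz return (3.28), applied to the 100/200/300 example from above:

```python
def dietz_return(s0, sT, flows):
    """Simple Dietz return (3.28): flows assumed to occur mid-period."""
    pnl = sT - s0 - sum(flows)
    aic = s0 + 0.5 * sum(flows)
    return pnl / aic

# Start 100, inflow 200, end value 300 -> zero gain.
print(dietz_return(100.0, 300.0, [200.0]))   # 0.0, unlike the naive 200% figure
```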
$$R_0 = \langle\phi, R\rangle, \qquad \sum_i \phi_i = 1, \qquad (3.29)$$
where the first part represents the levered portfolio and the last term represents the borrowing costs for the leveraged position, which is an investment in the borrowed asset B. Note that this term is negative. In relative terms,
$$1 = \lambda(\phi_1 + \phi_2) + (1-\lambda)\phi_3,$$
which shows that for $\lambda = 1$ we are back in the unlevered case, and
$$\frac{\lambda}{2\lambda - 1} = \phi_1 + \phi_2.$$
If there is no leverage, $\phi_3 = 0$ or $\lambda = 1$. Calculating the excess return relative to a risk-free rate $R_f$ and to the borrowing rate $R_B$ we get:
The excess return relative to the borrowing rate scales linearly in the leverage ratio. But for the excess return relative to the risk-free rate, if $R_B > R_f$, increasing the leverage ratio reduces the gains in the original portfolio.
Anderson et al. (2014) call the first two terms on the right-hand side the magnified source terms due to leveraging. How important is the covariance correction in the last term? To quantify it we need to consider the volatility drag. Formula (3.33) summarizes that the expected return of a leveraged portfolio also contains a covariance reduction term between the random leverage ratio and the excess return. Summarizing, in a multi-period investment there are three factors which matter:
• Transaction costs.
• The covariance correction.
• The variance (volatility) drag.
Anderson et al. (2014) consider these three factors in a 60/40 target volatility investment with US equity and US Treasury bonds. They consider monthly returns from Jan 1929 to Dec 2012. The target volatility is set equal to the fixed 11.59% realized volatility in the observation period. Since volatility is not known ex ante, the leverage ratio is a random variable. The borrowing for the leverage is done at the 3m Eurodollar deposit rate, and trading costs are proportional to the traded volume.
The authors find that the magnified source return in equation (3.33) dominates all other components. But this portfolio is not realizable. The gross return of the source portfolio, i.e. the 60/40 target (gross of trading costs) and the risk parity portfolio with 3m Eurodollar financing (net of trading costs), is 5.75% in the period. The magnified source term contributes 9.72%. This implies that 3.97% is due to the leverage and excess borrowing return. The total levered arithmetic return is 6.84%. The difference to 9.72% consists of the covariance correction of −1.84% and the trading costs of −1.04%. Finally, the variance drag is −0.4%, which implies a total geometric levered return of 6.37%. Summarizing, the three effects - transaction costs, covariance correction and variance drag - reduced the positive leverage return impact of 3.97% by 82% to 0.69% (3.97 − 1.84 − 1.04 − 0.4 = 0.69%).
• The investor can buy the following contract - a call option $C_0$ at time 0 with payoffs $C_T$ at time T.
• There is a riskless instrument B with price 1 today which pays 1.1 at time T, independently of whether the stock rises or falls.
How much is the investor willing to pay at time 0 for the derivative C? This defines the pricing problem.
We show that there is a unique, fair answer to this question in complete markets. We start with the motivations of a seller (writer, or trader of a bank) and of a buyer of the derivative.
The writer of the derivative would like to obtain a price from the buyer at time 0 such that he can buy a portfolio $V_0$ at 0 which will have a value $V_T$ at time T that is always worth at least the liability value of the derivative $C_T$ at time T, i.e.
$$V_T \geq C_T \quad \text{in all states.}$$
The price at time 0 should be high enough that the writer can pay the liability at time T using the price change of the portfolio $V_0$ up to $V_T$ without additional money10 and using the three instruments S, B, C only. The buyer of the derivative does not want to pay a price at 0 for the derivative such that the writer can buy a portfolio V at time 0 which is worth more than the derivative value at time T:
$$V_T \leq C_T \quad \text{in all states}$$
is the buyer's intention. The price, if it exists, where both motivations are met,
$$V_T = C_T \quad \text{in all states } \omega,$$
is called the fair price of the replication portfolio (we skip the state variable $\omega$). There are no restrictions on the portfolio positions, i.e. we can be long or short any instrument.
What is a state? It represents everything that is relevant for the value of the firm, including firm-specific variables such as earnings and leverage, industry-specific variables such as product demand and input prices, and macroeconomic variables such as interest rates and exchange rates. The state includes everything that we are not going to model explicitly. Sometimes it includes the stock's price, so that, even though we are ultimately interested in deriving the stock price as a function of more primitive variables, the distinction between the state and the price becomes blurred.
$$\phi_2 \times 120 + \phi_1 \times 1.1 = 20$$
$$\phi_2 \times 80 + \phi_1 \times 1.1 = 0. \qquad (3.34)$$
The problem with a risky asset is expressed in terms of linear algebra, where no entry is risky. Whether or not a (unique) replication exists is therefore reduced to the question of when a linear system has no, many, or a single solution. By solving the system we get $\phi_2 = 0.5$ and $\phi_1 = -40/1.1 = -36.36$.
How do we get the fair price C(0)? To answer this we calculate the portfolio value at time 0 using the above strategy:
$$V_0 = 0.5 \times 100 - 36.36 \times 1 = 13.64.$$
This is the fair derivative price, i.e. $V_0 = C_0 = 13.64$. Indeed, we apply the Law of One Price, which is a weaker formulation of the no-arbitrage principle:
Definition 21 (Law of One Price). Consider a perfect market. Two assets with identical cash flows must trade at the same price, or: if the replication price of an option exists, then this price is unique.
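A minimal sketch solving the replication system (3.34); the initial stock price of 100 is implied by the fair price of 13.64:

```python
import numpy as np

# One-period replication: stock 120 / 80, bond pays 1.1, call payoffs 20 / 0.
A = np.array([[120.0, 1.1],
              [ 80.0, 1.1]])
payoff = np.array([20.0, 0.0])

phi_stock, phi_bond = np.linalg.solve(A, payoff)
fair_price = phi_stock * 100.0 + phi_bond * 1.0

print(round(phi_stock, 4), round(phi_bond, 4))   # 0.5 and about -36.36
print(round(fair_price, 2))                      # 13.64
```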
One often states the law of one price as follows: if we have three payoffs x, y, z at a given date with
$$z = x + y,$$
then the prices $p(\cdot)$ at any date of these equal payoffs also agree:
$$p(z) = p(x) + p(y).$$
11 In practice one often uses the word hedging both for replication and for the case with hedging risk.
Since $V_T = C_T$ we must have $V_0 = C_0$. If a different price prevails, the writer can make riskless profits in a risky environment. For $V_0 < C_0$, the writer invests the difference in the riskless asset. If $C_0 < V_0$, the writer buys the derivative from the investor and sells it at the fair price. Again, the difference is locked in and invested in the riskless asset. The law of one price is the most important special case of no arbitrage.
How does the Law of one Price ts into the no arbitrage relation (3.1) in a risk less
environment, i.e. D(t, s)D(s, T ) = D(t, T ), D(t, t) = 1? Take USD 1 at time T . Dis-
counting this dollar back to time t using two dierent paths has to give the same value
at time t by the Law of one price. This exactly what (3.1) states.
The probability P assumes that the risky asset goes up with 90 percent probability
and down with 10 percent. These probabilities are derived from historical data using
econometric methods. They do not matter explicitly in the pricing of derivatives.
The fair price in a complete market is independent of the individual beliefs of the market
participants or of real (historic) probabilities. This is a major reason for the success of
derivative pricing, since it liberates buyers and sellers from estimating these probabilities
- in a multi-period set-up this would amount to guessing the drift of the risky asset price
processes. Given this observation, people are tempted to state that beliefs are of no
importance at all. This is not true. Suppose that the common belief is that Google's
stock price will rise by 10% in one week. Then the belief does not enter a derivative
contract on Google, but it clearly affects the level of the stock. Therefore, beliefs matter
in derivative pricing by affecting the underlying's price level.12
In the replication approach a portfolio of bonds and stocks was set up to replicate the
derivative payoff. In the hedging approach one considers an unknown amount of the
stock and the derivative. This portfolio is then specified by requiring that the portfolio
has the same value in all states of the world as the riskless bond. Therefore, using the
option and the stock one replicates the bond. A portfolio with this property is a
hedge position. The same value for φ2 follows as under the replication approach. One
could equally take the last combination - the derivative and the bond - as a portfolio
and replicate the stock.
What happens if the two asset payoffs are linearly dependent (redundant assets)?
Then the replication system has no solution. Similarly, if there is only one asset, the
option cannot be replicated.
We change our market in the initial example as follows: the time-T values of the risky
asset are 80 and 105, and the derivative pays 10 in the upper state and 0 in the lower one.
12 As a game one should set up the above market and ask friends to report the price that they
would pay for the call option. For each announced price you can state an investment strategy which gives
you riskless profits, unless a friend announces the fair price. The strategy follows the above recipe:
if the announced price is higher than the fair price, use the fair fraction for hedging and invest the
difference in the risk free asset. People will be astonished to see how you (the writer) of the option
make riskless profits in a risky environment - a kind of magic in option pricing happens.
Forming the replication portfolio and solving the equations we get A = 0.4, B = −29.1
and V0 = 10.9. This price makes no sense: why should anybody pay 10.9 for a contract
which pays 10 or 0 at time T? The replication portfolio was set up correctly. Therefore
something must be wrong in the market structure. We show that the market is not free
of arbitrage. To see this, we write the risky asset price moves using the up ('u') and
down ('d') notation, i.e. 120 = 100u and 80 = 100d.
Proposition 22. Arbitrage is not possible in the above one-period market if and only if
u > 1 + r > d (3.36)
holds.
To explain the result, we note that u > d. Suppose 1 + r > u > d. Then the riskless
investment dominates the risky one in all possible states tomorrow. Therefore,
go short the risky asset and long the riskless one. In the case u > d > 1 + r a similar
argument applies. For the example at the beginning of the section, u = 1.2 > 1 + r =
1.1 > d = 0.8, so there is no arbitrage; in the modified market u = 1.05 < 1 + r = 1.1,
so (3.36) is violated and arbitrage exists. The above proposition gives us a simple
criterion to check whether a binomial model is free of arbitrage.
We now consider risk neutral pricing. We have seen that historical probabilities or
beliefs about the risky asset price dynamics do not matter for fair option pricing in the
replication approach. But there is a pricing approach where probabilities matter. These
probabilities are different from the empirical or subjective ones. We define
q := (R − d)/(u − d) , R = 1 + r . (3.37)
Proposition 23. Suppose that there are no arbitrage possibilities. Then q is a probability,
the so-called risk neutral probability.
To prove this we show 0 ≤ q ≤ 1. Since R − d > 0 and u − d > 0, we have q > 0.
Assume q > 1. This is equivalent to R − d > u − d, i.e. R > u. This contradicts the
assumption of no arbitrage.
If we calculate q in the original setup we get q = 0.75. In the variant with an arbitrage
opportunity we get q = (1.1 − 0.8)/(1.05 − 0.8) = 1.2 > 1. The RNP has the form of a
Sharpe ratio:

q = (return relative to the risk free rate) / volatility .

By the definition of q, E^Q[ST] = q S0 u + (1 − q) S0 d = R S0. Dividing by R,

E^Q[ST / R] = S0 .

In an arbitrage free market, the expected value of the discounted risky asset under the risk
neutral probability equals today's discounted asset value (note that S0/1 is the discounted
value in 0). The First Fundamental Theorem of Finance states that the converse also
holds: the existence of a martingale measure implies no arbitrage. Since martingales have
no drift, the expected value of the discounted price process is constant.
We relate the replication approach to the risk neutral one. The solution of the general
replication equations

φ2 S0 u + φ1 R = Cu
φ2 S0 d + φ1 R = Cd (3.38)

is

φ2 = (Cu − Cd)/(S0(u − d)) =: ∆ , φ1 = (u Cd − d Cu)/(R(u − d)) .
φ2, the Delta, measures the price sensitivity of the derivative given a price change of the
underlying risky asset. For a call, φ1 is negative. We have at time 0:
V0 = φ2 S0 + φ1 B0 = ∆S0 + φ1 B0 . (3.39)
From
V0 = C0 = ∆S0 + Cash
we get
L := ∆S0/C0 = 1 − Cash/C0 ,
the leverage ratio L.
L > 1 represents a loan (Cash is negative) and L < 1 the lending case. For L = 5, the
cost of the option buys an exposure to the underlying which is 5 times larger: we borrow
4/5 of the exposure and invest 1/5 - the option cost - from our own money.
• The discounted derivative process is a martingale under the risk neutral probability.
If markets are incomplete, these statements hold true with the exception that Q is not
unique or, equivalently, the price of the derivative is not uniquely determined using no
arbitrage.
Consider two assets S and X and a derivative C(S). Using X as numeraire, the discounted
option C/X and the discounted risky asset S/X are both martingales. A numeraire is by
definition a positive random variable; X = 1/(1 + r)^T is a deterministic numeraire.
Indeed, C/X or the replicating portfolio V/X is a martingale if and only if S/X is a
martingale:

C0/X0 = E^{QX}[CT/XT] if and only if S0/X0 = E^{QX}[ST/XT] . (3.41)
The probability QX depends on the choice of the numeraire:

qX = (S0/X0 − S^d_T/X^d_T) / (S^u_T/X^u_T − S^d_T/X^d_T) .

If we set X equal to a risk free asset, qX becomes the well-known q = (R − d)/(u − d).
We summarize. First, the advantage of relative pricing w.r.t. X is that the probability
QX is independent of the derivative and of the replicating portfolio, and the pricing
equation (3.41) holds for all derivatives C. Second, both the derivative and the risky
asset, relative to the numeraire, are martingales under the measure related to the
numeraire. Third, we could choose S as numeraire instead of X. This leads to a new
measure QS such that X/S and C/S are martingales under this new measure: the choice
of a numeraire does not alter the price of the derivative. One can choose the most
convenient numeraire for the calculations. Fourth, if we considered absolute pricing
(pricing using a general equilibrium model) instead of relative pricing, the martingale
measure for a derivative would depend on the specific derivative payoff VT. This is a
main reason why relative no arbitrage pricing is used in practice much more often than
a fully fledged general equilibrium model. We summarize:
Theorem 25. • Under no arbitrage pricing the pricing formula holds for all types
of payoffs or derivatives.
• The risk neutral probability in the linear pricing formula depends only on the assets'
characteristics (numeraire and other assets).
• In absolute pricing (general equilibrium) the probability entering the linear pric-
ing formula depends on the assets and on the payoff/derivative.
Definition 26. Consider a one-period model with S = s > 1 states at time T, N − 1
risky assets S and a riskless asset B. The price of asset j at time T in state k := ωk is
given by S^j(k). The payoff matrix P is defined by

$$P = \begin{pmatrix} B^1(1) & S^2(1) & \cdots & S^N(1) \\ \vdots & \vdots & \ddots & \vdots \\ B^1(s) & S^2(s) & \cdots & S^N(s) \end{pmatrix} . \qquad (3.42)$$

The matrix P has dimension S × N. The payoff or portfolio value X at time T is
X = Pφ . (3.43)
Definition 27. A payoff X is attainable given P if a portfolio φ exists such that X = Pφ.
The portfolio φ is called a replication portfolio. The space of attainable payoffs, the asset
span, is denoted ⟨S⟩ ⊂ R^S.
Definition 28. A market is complete if every payoff X ∈ R^S is attainable, i.e. if for
each X there exists a portfolio φ with
X = Pφ .
Definition 29. Consider a market with payoff matrix P and asset price vector S0. An
arbitrage is a portfolio φ = (φ1, . . . , φN)' such that
• the portfolio costs nothing today, S0'φ = 0,
• Pφ ≥ 0,
• and there exists at least a single state k̃ where (Pφ)(k̃) > 0 holds.
With an arbitrage, a portfolio of zero cost today ends up with no loss tomorrow in
all states and with the chance of a profit in at least one state. Consider a single risky asset
with prices S+ and S− and the risk free asset price B. In a risky environment, S− < B < S+
is the market structure leading to the absence of arbitrage. If B < S− < S+, you borrow
a large amount by selling the risk free asset and invest the whole amount in the risky one:
15 A+ is the left-inverse, A+A = I. P = AA+ and Q = A+A are orthogonal projection operators with
the properties P A = AQ = A, A+P = QA+ = A+. P is the orthogonal projector onto the range of A,
and I − P = I − AA+ is the orthogonal projector onto the kernel of A*.
$$P\phi = \begin{pmatrix} B(1)\phi_1 + S(1)\phi_2 \\ B(2)\phi_1 + S(2)\phi_2 \end{pmatrix} > 0 \quad \text{in both states.} \qquad (3.45)$$
When is a market free of arbitrage? Using state prices ψ, the First Fundamental
Theorem of Finance (FFTF) answers the question: a market is free of arbitrage if and
only if there exists a strictly positive state price vector ψ with

P'ψ = S0 . (3.46)

Componentwise,

S0^i = Σ_{j=1}^S ψj P_i(j) .

For the riskless asset,

B(0) =: B0 = Σ_{j=1}^S ψj B^1(j) = Σ_{j=1}^S ψj · 1 = Σ_{j=1}^S ψj =: ψ0 ,

since the riskless asset pays 1 in all possible states. We define the probabilities with
values in (0, 1)

qi := ψi/ψ0 , ∀i .
The riskless asset can be rewritten as

B0/ψ0 = Σ_{j=1}^S qj B^1(j) = E^Q(B^1) = E^Q(1) = 1 .

Therefore, ψ0 is the discount on a riskless borrowing. If r is the riskless annual interest
rate, we write

B0 = ψ0 = 1/(1 + r)^T .
This implies for all other risky assets:

S0^i = Σ_{j=1}^S ψj S^i(j) = (1/(1 + r)^T) Σ_{j=1}^S qj S^i(j) = E^Q[S^i/(1 + r)^T] = E^Q[M S^i] (3.47)
with M the stochastic discount factor (SDF). This factor is deterministic in this model
and does not depend on consumption, as it does in more general models. We will
consider the SDF in more detail below. This means that in the pure finance set-up, with a
given market structure and no arbitrage, the fundamental asset pricing equation (3.47) follows.
The sum of the qi's is equal to 1, qi > 0, and

S0^i = (1/(1 + r)^T) Σ_{j=1}^S qj S^i(j) = E^Q[S^{i,*}] ,

where S^{i,*} is the discounted asset price. The q's are then the risk neutral probabilities
(RNP). This shows the equivalence of risk neutral probabilities and state prices:

ψi = qi/(1 + r)^T . (3.48)
State prices are the prices of the Arrow-Debreu securities e(j), j = 1, . . . , S, where the
security e(m) pays 1 CHF if state m is realized and zero otherwise. Since the payoff
matrix of the Arrow-Debreu securities is the identity matrix, the FFTF implies

p(e(m)) = ψm = qm/(1 + r)^T , m = 1, . . . , S .
Corollary 31. State price densities are the prices of the Arrow-Debreu securities. State
price densities and risk neutral probabilities are equivalent.
Each payoff can be written as a linear combination of the Arrow-Debreu securities.
In practice one often prefers to work with risk neutral probabilities instead of state
prices. We state the FFTF using risk neutral probabilities.
The expected discounted return of the risky asset under a RNP is zero: the discounted
asset has no trend under this probability. If we consider several investment periods, the
risky asset condition for a RNP reads

S*_t = E^Q_t[S*_{t+1}]

with E^Q_t the conditional expectation at time t given the information set at this date. Inter-
preting conditional expectation as a best guess, the condition states that under a RNP the
best guess of future discounted prices is today's price. Such price processes are called
martingales. The absence of arbitrage does not imply that the risk neutral probability
is unique. The Second Fundamental Theorem of Finance considers this question.
where S*_j(t) = Sj(t)/B(t) are the discounted asset prices. A derivative is a contract
signed at t = 0 which leads to a state contingent non-negative reward XT(ω) at T.
How do we price X fairly?
Definition 34. X0 is a fair price for the contingent claim XT if the enlarged market
(Bt, S*_1(t), . . . , S*_N(t), X*_t), t = 0, T,
is free of arbitrage.
Then the fair price C0 of an attainable call option in an arbitrage free market is

C0 = E^Q[(S − K)*_+] = (1/(1 + r)) E^Q[(S − K)_+] .
3.4 Application
3.4.1 TAA Construction, Forwards and Futures
We consider the construction of the TAA for a Swiss intermediary (Source: ZKB [2013]).
Figure 3.10 shows the inputs in the TAA construction: for each asset class the figure
lists the region, currency, benchmark index, benchmark weight, the ETF implementation
and the ETF weight (for example: N-America, USD, stocks, MSCI North America USD
NR, 8.50, iShares MSCI North America, 8.50); both weight columns total 100.00.
Figure 3.10: Inputs in a TAA. The table shows the different asset classes, their volatility
adjusted benchmark, the implementation when using ETFs, and the weights and durations
of the benchmark and ETF portfolio. Source: ZKB (2013)
The asset classes are cash, bonds,
stocks, hedge funds, commodities, gold and real estate CH. Bonds are split into three time
buckets - 1-5y, 5-10y and more than ten years - and into government bonds, corporate
investment grade (IG) and high yield (HYCB) bonds. The list of ETFs represents a pos-
sible implementation of the benchmarks. The weights of the portfolio are the result of
an optimization such as a mean-variance optimization. The weights of the different asset
classes are volatility weighted (the ISV notation), which will be explained below.
The positions in the TAA are replicated using instruments which are more liquid and
cheaper than ETFs: futures, forwards and swaps, which we call REP instruments, see
Table 3.7. Equity, foreign bonds and commodities are replicated using futures, currencies
using forwards, and swaps are used for CHF bonds. This means that asset class Aj is
written as a linear combination of the REP instruments. Say the 11% allocation of Swiss
stocks is split into 9% SMI futures and 2% SMIM futures. The splitting of the individual
asset classes into the REP instruments is done by minimizing the tracking error of the
REP instruments towards the benchmark. Hence, a given REP instrument can contribute
to several asset
classes. The allocation of the REP instruments is shown in the column 'Allocation 100%'.
Hedge funds and real estate cannot be attributed to the liquid REP instruments; their
allocation weights are therefore zero.
Instrument Allocation 100% Allocation Index Instrument Allocation 100% Allocation Index
Liquidity CHF 4.5% 4.5% AUS 10 YR Bond Future 0.6% 1.2%
SMI Fut 10.7% 20.7% Natural Gas Future 0.3% 0.5%
SMIM Fut 1.2% 2.3% Crude Oil Future 0.2% 0.4%
FTSE Fut 4.1% 7.9% Brent Oil Future 0.4% 0.8%
Euro-Stoxx 50 Fut 6.2% 12% Live Cattle Future 0.2% 0.3%
S&P 500 E-mini Fut 8.4% 16.4% Wheat Future 0.2% 0.4%
TSX 60 Fut 0.7% 1.4% Corn Future 0.2% 0.4%
SPI 200 Fut 1.8% 3.5% Soybean Future 0.4% 0.8%
TOPIX Fut 4.2% 8.2% Sugar Future 0.2% 0.3%
Hang Seng Fut 0.6% 1.2% Aluminum Future 0.2% 0.4%
MSCI Singapore Fut 0.4% 0.7% Copper Future 0.4% 0.8%
MSCI EM Fut 4.8% 9.4% Gold Future 3.8% 7.3%
Swap CHF 3 YR 24.1% 46.8% EUR/CHF Fw 11.5% 22.4%
Swap CHF 7 YR 6.2% 12.1% GBP/CHF Fw 5.1% 10%
Swap CHF 10 YR 1.7% 3.2% USD/CHF Fw 18.7% 36.2%
Euro-Schatz Fut 1.6% 3.2% CAD/CHF Fw 1.8% 3.5%
Euro-Bobl Fut 2.4% 4.7% AUD/CHF Fw 2.9% 5.6%
Euro-Bund Fut 1.3% 2.6% JPY/CHF Fw 4.2% 8.2%
Short Gilt Fut 0.6% 1.2% HKD/CHF Fw 0.6% 1.2%
Long Gilt Fut 0.5% 0.9% SGD/CHF Fw 0.4% 0.7%
US 2 YR Note Fut 1.2% 2.3% Total Cash 4.5% 4.5%
US 5 YR Note Fut 2.4% 4.7% Total Futures 95.5% 185.5%
US 30 YR Note Fut 1.8% 3.4% Total Forwards 45.2% 87.8%
Can 10 YR Bond Fut 1.1% 2.1% Volatility 60d 4.1% 8%
Mini JGB 10 YR Bond Fut 0% 0% Investment Degree 100% 194.2%
AUS 3 YR Bond Fut 0.5% 0.9%
Table 3.7: Futures, forwards and swaps in the TAA replication. The 'Allocation 100%' column is
scaled up to the 'Allocation Index' column, which takes into account that the total volatility of
the TAA should equal 8%; see the text for explanations.
The next step is to consider volatility. For each instrument, calculate the daily re-
turns over one year. Then, at each date, form the allocation-weighted sum of the instru-
ment returns. This gives the return time series of the allocation index. The volatility of
this index is the standard deviation of these returns multiplied by the square root of the
number of return days within one year (square root rule). This gives a volatility of 4.12%.
Given the target volatility of 8%, the investment degree of 194.2% follows, see the
'Allocation Index' column. Such a model TAA is an input for the CIO, who makes
pairwise bets, each of a given dollar value, at inception.
Consider an index of forwards, futures and swaps replicating a TAA. The index value
It at time t is updated as follows from the index value at a prior time s < t:

$$I_t = \Big(1 + \sum_k \phi^k_s \frac{FX^k_t}{FX^k_s} R^k_{\mathrm{Fut},t} + \sum_l \phi^l_s\, \mathrm{Swap}^l_{s,t} + \sum_m \phi^m_s\, \mathrm{Forw}^m_{s,t}\Big) I_s = G_{s,t} I_s ,$$
where R^k_{Fut,t} is the simple futures return; the FX component only matters for the
futures since the swaps and the forwards are denominated in CHF. φ is the allocation
vector arising from an optimization problem. FX^k_t is the exchange rate at time t of
the currency of future k against the Swiss franc, where 1 currency unit is equivalent to
FX^k_t Swiss francs. The value of future k is calculated as the price of future k at
time s in local currency multiplied by the contract unit of the future. An oil future
with contract unit 1,000 and a local currency price of USD 108 has the value USD 108,000.
Swap^l_{s,t} is the value of swap l at time s with final date t at the fair value interest rate
and with nominal CHF 1.00. The value of the currency forward Forw^m_{s,t} at time t in
Swiss francs is given by the fair value forward rate fixed at time s with a nominal value
of CHF 1.00. The maturity of the forward corresponds to the next planned roll date.
The above formula shows that the index evolves by chaining the one-step adjustments:
It = ∏_{k=1}^{t} G_{k−1,k} · I0 .
Hence, the gross return R^g_{0,t} := It/I0 is given by the product of the one-step adjust-
ments. The gross return can also be written as a product of one-step gross returns:

R^g_{0,t} = It/I0 = (It/It−1)(It−1/It−2) · · · (I1/I0) = ∏_{k=1}^{t} R^g_{k−1,k} .

Therefore,

R^g_{0,t} = ∏_{k=1}^{t} R^g_{k−1,k} = ∏_{k=1}^{t} G_{k−1,k} .
Forwards can be used to hedge risks. Consider a German-based firm which wants to
buy goods in the US. The firm could buy the goods at a future date at the then prevailing
spot rate USDEUR S(t), which is risky. To avoid this risk the firm can enter today into a
forward contract. The forward price F(t, T) is fixed such that no cash flows exist at the
spot date t; the PV of a forward is zero. The delivery price K is set equal to F at t, i.e.
K = F(t, T). The equality K = F(t, T) does not hold any longer after t. If at maturity
S(T) = K, the German buyer of the contract faces neither losses nor gains. If S(T) > K,
the German firm makes a profit since USD can be bought at the price K, which is cheaper
than the spot price. Summarizing, a forward contract V has the following value for the
buyer at maturity:
V(T) = N · (S(T) − K) ,
with N the notional amount. The payoff is a linear function of S(T), contrary to
options, where the payoff is non-linear: the price of a forward does not depend on
(spot) volatility since the probability of making a gain or a loss at maturity is sym-
metric. No arbitrage leads to a unique forward price. Taking the risk neutral expectation
V(t) = N · e^{−r(T−t)} E^Q[S(T) − K] and using E^Q(S(T)) = e^{r(T−t)} S(t) - the
discounted price process is a martingale - we get

V(t) = N · (S(t) − e^{−r(T−t)} K) .

Since at contract initiation V(t) = 0, the initial forward F(t, T) = K has to be chosen
as follows:
Proposition 36. The unique arbitrage free forward price F(t, T) given a risk free rate
r is, under continuous compounding,

F(t, T) = e^{r(T−t)} S(t) .

This price only holds for forwards where no dividends, no costs of storage, no interest
rate differential costs and no convenience yield apply. The growth rate of the forward
price F is equal to r. If r ≠ 0 then F(t, T) > S(t) or F(t, T) < S(t), with F(T, T) = S(T).
For simple compounding, using the approximation e^x ∼ 1 + x,

F(t, T) = S(t)(1 + r(T − t)) .
We generalize to forwards on stocks paying dividends with a continuous rate d, com-
modities with storage cost rate s, bonds with coupon payment rate c, FX transactions with
different foreign and domestic interest rates i (interest rate differential) and commodities
with a convenience yield y. These extensions are captured by the net cost-of-carry yield
q:
q = r + s − y − d − c ± i .
Proposition 37. The unique arbitrage free forward price F(t, T) given q is, under
continuous compounding,

F(t, T) = e^{q(T−t)} S(t) .
If q > 0, the costs of holding the underlying are larger than the benefits it yields. This hap-
pens if storage costs are very high for a commodity forward. The buyer of the forward
therefore compensates the seller for these costs.
We next consider the valuation of a forward at intermediate dates. We recall that the
forward contract has value 0 at the initiation time t. For an intermediate date s, t < s < T,
the value V(s) of the forward contract is the spot value less the discounted delivery price,
which implies
V(s) = S(s) − D(s, T)F(t, T) .
Using the no arbitrage relation S(s) = F(s, T)D(s, T),
V(s) = D(s, T)(F(s, T) − F(t, T)) .
Setting s = t or s = T shows that the known initial and final values follow.
Consider a stock with S(0) = 25 CHF and a 6m forward contract. The 6m interest
rate is r = 7.12% p.a. Using simple compounding we get
F(0, 0.5) = 25 · (1 + 0.0712 · 0.5) = 25.89 CHF .
After 3m the stock price is S(0.25) = 23 CHF and 3m interest rates are
r = 8.08%. The forward price of a new contract with the same maturity reads
F(0.25, 0.5) = 23 · (1 + 0.0808 · 0.25) = 23.46 CHF .
The difference between the forward and the spot price is called the basis b: b(t) =
F(t, T) − S(t). When the underlying S of the futures market and of the cash market
are identical, the basis converges to zero at the maturity date. Basis risk arises if there
is either a mismatch of the underlying asset or a mismatch of maturity. Consider a forward
F(t, T) = e^{q(T−t)} S(t). If q is known at time 0, we set h = e^{−q(T−t)}. Then the payoff
at time t of h forwards,
h(F(t, T) − F(0, T)) = S(t) − S(0)e^{qt} ,
is the same as for a forward entered at time 0 with maturity t. Then there is no basis risk.
Forwards offer full flexibility to the two parties involved. But forwards also possess
some drawbacks: each seller needs to find a buyer, and both parties face counterparty
default risk. These two drawbacks are eliminated by using futures instead of forwards. Fu-
tures are traded at a futures exchange. Each futures exchange has a clearinghouse. A
clearinghouse is a well-capitalized financial institution. It acts as an intermediary be-
tween the two parties. The house guarantees contract performance to both parties. The
two parties have an obligation to the clearinghouse and no longer to each other. To re-
duce the default risk of the clearinghouse, the buyer and seller must deposit funds with
their broker - the margins. The margin must be in an eligible form such as cash or specified
securities. Since the initial margin is typically only a single-digit percentage of the goods
represented in the futures contract, potential losses can be much higher than the margin
deposit. To counteract this risk, the potentially large gains or losses in futures contracts
are not left to grow over time but are realized on a daily basis. Futures positions
are settled daily by marking-to-market of the contracts. Besides the initial margin, the
maintenance margin reflects the necessary minimum amount on the margin account, and
the variation margin is payable if there is a shortfall on the margin account.
Futures contracts are standardized w.r.t. the delivery date T, the underlying
value and the quantity to deliver. The two parties only have to agree on the delivery
price K and the number of contracts.
Since a future's initial price is zero, buying a future is equivalent to buying the underlying
financed by borrowing (leveraging). Futures allow for the same exposure as the
underlying but at lower costs - lower fees and smaller bid-ask spreads. Since the futures
price can vary heavily during the lifetime of the contract, enough liquidity for potential
margin calls is required. At the Chicago Mercantile Exchange the minimum amount is 250
thousand US dollars. The largest futures exchanges are CME, CBOT and Eurex.
We consider an example:
• Monday: The investor buys USDEUR futures with a notional amount of EUR
1'250'000. The underlying value is 0.7 USD per EUR and the maturity is 1 year.
• Tuesday: The price of the underlying falls to 0.5 USD per EUR: 0.2 USD/EUR ×
EUR 1'250'000 = USD 250'000 are taken away from the margin account.
• Wednesday: The price of the underlying rises to 0.8 USD per EUR: 0.3 USD/EUR ×
EUR 1'250'000 = USD 375'000 are credited to the investor's margin account.
In a Dax future the notional amount is equal to the Dax index value times 25 Euro.
For a Dax at 3'900 points the notional amount of one futures contract is 97'500 Euro. The
initial margin for a Dax future is 30'850 Euro. With 100'000 Euro on the margin account
one can enter into at most 2 futures contracts. The maturity of Dax futures is typically
3 months. The tick size of the future is 0.5 Dax index points. Since the value of a single
Dax point is 25 Euro, the value of a tick is 12.50 Euro. The exchange fee is 50 Euro cents
per contract.
Proposition 39 (Valuation of Futures). If there is no interest rate risk, no default risk of
the counterparties, and no arbitrage holds, then the valuation of futures is the same as the
valuation of the corresponding forward contracts.
Table 3.8 proves the proposition, where r is the fixed one-period interest rate.
Table 3.8: Equivalence of forwards and futures for deterministic interest rates
If we price futures in a market without any frictions, the pricing of futures is given by the
cost-of-carry model, i.e. no arbitrage is the driver. If C represents the expected cost-of-carry,
i.e. the costs which are necessary to carry the good from t to the delivery date T,
the no arbitrage relation reads
F(t, T) = S(t) + C . (3.53)
We illustrate (3.53) for S an equity index, D the value of the dividends before maturity
and r the annualized financing rate or money market yield. The fair futures price reads

F(t, T) = S(t)(1 + r (T − t)/360) − D . (3.54)

Next, let B(t, T) be the market price of a bond including accrued interest (dirty
price). The fair futures price is given by the cost-of-carry relation, where the interest
costs are the interest opportunity costs and the coupon payments are those up to the
expiration of the futures contract. Let c be the annualized coupon rate and A the days
of accrued interest; then

F(t, T) = B(t, T)(1 + r (T − t)/360) − c B(t, T) (T − t + A)/360 .
This formula assumes that the bond can be bought and delivered at any date. But this
need not be true. Consider US Treasury bond futures, which are traded on the Chicago
Board of Trade (CBOT). These futures have quarterly expiration dates. The size of one
futures contract is USD 100'000 face value of an eligible Treasury bond having
at least 15 years to maturity and which is not callable for at least 15 years. Therefore,
B(t, T) in the second bond expression in the last formula is replaced by a bond tradeable
for the short seller - the cheapest-to-deliver bond Bcd(t, T). Since different bonds have
different characteristics, standardization is lost at this stage. To give the short seller
flexibility in choosing which bond is actually delivered, the Treasury bond selected
by the short seller for delivery is price adjusted by a delivery factor f such that the bond
reflects a standardized 8 percent coupon rate:

F(t, T) = ( B(t, T)(1 + r (T − t)/360) − c Bcd(t, T)(T − t + A)/360 ) / f . (3.55)
Delta hedging is used to hedge the risk of futures. Consider gold with spot price 400
in some currency, net cost-of-carry 6% p.a. and time-to-maturity one year, i.e. F(t, T) =
400 e^{0.06} = 425. In a static hedge an investor is short futures and long spot. Table
3.9 summarizes the profit and loss for two spot price scenarios (N = 1).

             Scenario 1              Scenario 2
             today   tomorrow        today   tomorrow
Spot          400      600            400      200
Future        425      637            425      212
1:1 Hedge     -25      -37            -25      -12
P&L                    -12                      12

Table 3.9: Profit and loss of the 1:1 hedge for two spot price scenarios.

A gain or loss follows, which is not what we expect in a hedged position. The reason is
that spot and futures only move 1:1 if the cost-of-carry is zero.
The futures Delta relates spot and futures price changes:

∆^{Fut}_{Spot} = Change Spot / Change Futures .

If x is the change in the futures price and τ = T − t, no arbitrage between spot and
futures prices implies

∆^{Fut}_{Spot} = e^{−qτ} x / x = e^{−qτ} , (3.56)
i.e. the Delta is determined by the cost-of-carry. Hence, a Delta hedged portfolio
W is long spot and short ∆ = e^{−qτ} futures,
W = S − e^{−qτ} F ,
so that changes of W vanish to first order.
3.4.2 Interest Rate Parities
Two parity relations link interest rates and exchange rates:
• Covered interest rate parity (CIP): the return of a domestic risk free investment equals the
return of a foreign risk free investment if the FX risk is hedged using a forward
contract.
• Uncovered interest rate parity (UIP): the interest rate differential between two countries is
compensated by the expected FX changes.
We consider the covered parity for the Japanese yen (JPY) and the Brazilian real (BRL).
If JPY are exchanged against BRL there is no guarantee that the BRL does not devalue.
Using an FX forward we eliminate this risk. We assume:
• Interest rates are RJPY = 1% p.a. for yen and RBRL = 10% p.a. for real, and the
spot rate is S(t) = 0.025 BRL per JPY.
• The investor holds JPY 1'000. He changes the JPY into BRL at the spot rate,
which gives BRL 25, and invests them for one year at 10%.
At T, the investor changes the BRL 27.50 into JPY at the spot rate S(T), which is not
known at 0. This strategy is risky. To obtain a risk free FX strategy he replaces the un-
known spot rate S(T) by the known forward price F(0, T). The forward price is deter-
mined by the following no arbitrage argument. We write Rd for the nominal interest rate
in the domestic currency JPY and Rf for the interest rate in the foreign currency BRL.
Table 3.10 illustrates the strategy where borrowing is in the foreign currency. At 0:
• The investor borrows BRL for one year. He exchanges the BRL at the spot rate S(0)
into JPY and invests the JPY for 1y.
At T, to avoid arbitrage, the amount of foreign currency received must equal the amount
of foreign currency paid back. This implies the Covered Interest Rate Parity
Theorem (CIP):

F(t, T) = S(t) (1 + Rd)/(1 + Rf) , (3.57)

with Rd − Rf the interest rate differential. CIP states that the difference between the
domestic and the foreign interest rate determines the forward price.
What is the difference between the uncovered (UIP) and the covered parity? UIP
replaces the forward price in the CIP relation by the expected spot price, i.e. F(t, T) by
Et[S(T)]:

UIP: Et[S(T)] = S(t) (1 + Rd)/(1 + Rf) . (3.58)

Which view enters the expectation? Carry trades are bets that the formation of
expectations differs from the forward rate view:

Et[S(T)] ≠ F(t, T) = S(t) (1 + Rd)/(1 + Rf) . (3.59)
Consider a Swiss investor who needs JPY in 30 days. He buys the 30d JPYCHF forward,
which fixes the exchange rate in CHF for the date in 30 days. This is a covered position,
i.e. there is no FX risk. A different strategy is to exchange the CHF into JPY at the spot
rate S(t) and to invest the amount in the Japanese money market for 30d. This leads to
the CIP. Finally, a third strategy is to invest the CHF amount at home and to exchange
it into JPY in 30d. This investment is not covered; FX risk is only zero if the realized
30d spot rate equals the forward price. If the forward is lower than indicated by the CIP,
one borrows money in the foreign currency, exchanges it into the domestic currency at
the spot price and lends in the domestic currency.
Given the uncovered interest rate parity (UIP), arbitrage implies that the change of
an FX rate equals the nominal interest rate differential between the two currencies.
Hence monetary policy (fixing interest rates) and exchange rates are dependent: the
so-called Trilemma or Impossible Trinity holds. A country cannot simultaneously
choose all three policies: 1) a fixed exchange rate (exchange rate stability), 2) open capital
markets (financial integration) and 3) monetary policy autonomy. It can pick two; the
third follows by no arbitrage. If a country chooses open capital markets, uncovered
interest parity must hold: arbitrage equalizes expected returns at home and abroad; the
domestic interest rate must equal the foreign interest rate plus the expected appreciation
of the foreign currency. If a country chooses open capital markets and fixed exchange
rates, domestic interest rates have to equal the base-country interest rate, ruling out
monetary policy autonomy. If a country chooses open capital markets and wishes to set
domestic interest rates at levels suitable to domestic conditions, then exchange rates can
no longer be fixed.
Since 2014, the CIP between the USD and other major currencies is broken. Borio et
al. (2016) analyze the reasons for this fact. We show how one can exploit this arbitrage
opportunity. Consider a firm which denominates its income and balance sheet in CHF -
a currency for which the CIP with the USD is broken in the period since 2014. Although
CHF interest rates are negative up to several years of maturity, the firm cannot profit
directly from this fact since client rates are floored: deposits pay zero interest and loan
rates are shifted upwards. The broken CIP makes it possible for the firm to participate
in the negative interest rate environment: the firm asks for the loan in USD; together
with a USDCHF swap, the FX risk is hedged and participation in the negative CHF
interest rates follows.
Figure 3.11: Left panel: if a nation adopts position a, it maintains a fixed
exchange rate and allows free capital flows, the consequence of which is the loss of
monetary sovereignty. Sweden, for example, decided to control the interest rate
and to allow free international capital flows, and accepts that the exchange rate follows,
i.e. it cannot be controlled. Source: Wikipedia. Right panel: monetary policy selections
for four countries. Source: J.P. Danthine, Swiss Finance Institute (2011).
With market rates as of June 12, 2017, the mechanics are as follows:
• At t = 3m, the USD are bought back at the 3m forward rate USDCHF 0.9670,
which implies the pay-back USD amount.
The strategy can be rolled over until the CIP is eventually restored.
For a portfolio that invests in different countries, the portfolio value is affected by asset
price changes, interest-bearing income and by P&L from exchange rates. The investment
in each country is a composition of exposures to asset markets and to exchange rates. A
currency overlay modifies the currency positions such that the FX risk becomes acceptable.
We restrict ourselves to linear overlays, i.e. forwards.
                    US           D            UK           J
Bond                a11          a12          a13          a14
Equity              a21          a22          a23          a24
Cash                a31
Forward USDEUR      F1           -F1          0            0
Forward USDGBP      F2           0            -F2          0
Forward USDJPY      F3           0            0            -F3
Forward EURGBP      0            F4           -F4          0
Forward EURJPY      0            F5           0            -F5
Forward GBPJPY      0            0            F6           -F6
Overlay exposures   F1+F2+F3     -F1+F4+F5    -F2-F4+F6    -F3-F5-F6
Table 3.11: The structure of an international portfolio investing in four countries with two asset
classes.
We write aij for the exposure to asset class i of country (= currency) j, and F rep-
resents the respective forward position of a contract on a given currency. That is, in
a given currency j the currency exposure is equal to the sum of the asset exposures of the
investor in currency j plus the overlay position consisting of all forward contracts F
in that country or currency. For each forward contract, a minus sign indicates selling
and a plus sign indicates buying. The goal of the investor is to find the optimal asset
exposure weights a and the optimal forward contracts F. This optimization, using for
example a mean-variance framework, is done under several restrictions. Besides the
usual transaction cost constraints, overlay position constraints are of interest. Let L be
the total overlay limit allowed on a portfolio and Lm the total overlay of a portfolio,
equal to (1/2) Σ_j |Σ_i Fij| with Fij the forward position of contract i on currency j.
If L = 1, then the total currency exposure can deviate from the total asset exposure by
up to 100% of the portfolio. If Lm = 0, then the portfolio is unhedged and forward
contracts are not allowed to shift exposure from less-performing to better-performing
currencies, which would improve the risk-return profile of the portfolio.
Entering into forward contracts incurs the cost of carry, i.e. the interest rate differen-
tial. For an investment in any country j, the total return Rj is given by

Rj = aj Rja + cj Rjc + vj ij ,

where aj, cj and vj are respectively the asset exposure, the currency exposure and the
overlay position in country j, and the other variables are the expected asset return, the
expected currency return and the expected interest rate of country j. Since the overlay
position is defined as the difference between currency exposure and asset exposure,
vj = cj − aj, substitution gives

Rj = aj(Rja − ij) + cj(Rjc + ij) .

Hence, the portfolio total return is the sum of the adjusted returns weighted by asset
exposure and currency exposure, respectively. Therefore, the overlay positions are not
explicitly required to calculate the total return of a portfolio. For more details
see Chatsanga and Parkes (2017).
3.4.3 Call-Put-Parity
Consider an extended arbitrage free market with a call and a put option. There exists a
RNP Q such that

E^Q[S1/(1 + r)] = S0 , E^Q[C1/(1 + r)] = C0 , E^Q[P1/(1 + r)] = P0 .

From the definition of the call and the put follows (S − K) = (S − K)+ − (K − S)+ and
therefore

E^Q[S − K] = E^Q[S] − K = E^Q[C1] − E^Q[P1] .

Inserting the expressions above implies the put-call parity

C0 − P0 = S0 − K/(1 + r) .

The parity is useful since, once the price of say a call is known, the corresponding put
price follows: a call is a put and a put is a call. The parity holds for more general
markets too.
As an example, solving P'ψ = S0 gives the state price vector ψ = (1/4, 1)'. Since all
state prices are positive, the market is free of arbitrage. The existence of a unique solution
is an exception, since there are 3 equations and 2 unknowns: typically, the payoffs of
three non-redundant securities are conflicting.
Consider a market with a riskless asset with zero interest rate and a risky asset with
3 states:

$$P = \begin{pmatrix} 1 & 180 \\ 1 & 150 \\ 1 & 120 \end{pmatrix} , \quad S_0 = (1, 150)' .$$

This market is incomplete. Solving P'ψ = S0 with ψj > 0, the set of state prices is
parametrized by
ψ = {(a, 1 − 2a, a) , a ∈ (0, 1/2)} .
This incomplete market is free of arbitrage within the given parametrization set. Given
the incompleteness, there exist claims X ∉ ⟨S⟩, i.e. X ≠ Pφ for every portfolio φ.
Consider a call option with strike 150 and payoff (30, 0, 0). This call option is not
attainable. Since X − Pφ is not zero in all states, hedging risk exists.
No arbitrage alone does not lead to a unique price in this case. A second criterion is
needed to enforce uniqueness. There are many possible criteria. One is that the market
chooses the single RNP Q which is used for pricing. The derivative price is then fixed by
mapping the parametrized theoretical prices to observed market prices. This approach
is used in interest rate modelling ('inverting the yield curve').
The two equations (3.60) and (3.61) should determine the three-dimensional state price
vector. The solution of the two equations, which are two planes, is in general a line of
arbitrage free prices - and not a point as in a complete market. The state price vector is
not unique. Despite the incompleteness, the no arbitrage condition is the same as in the
well-known binomial model: there is no arbitrage if and only if
d < 1 + r < u .
The line is bounded by the requirement that state prices are positive. Each vector on
the line segment used to price derivatives leads to arbitrage free prices. Solving the two
equations, the boundary points of the line segment follow. For m ≥ 1 + r,

ψ1 = (1 + r − d)/((1 + r)(u − d)) , ψ2 = 0 , ψ3 = (u − 1 − r)/((1 + r)(u − d))

and

ψ1 = 0 , ψ2 = (1 + r − d)/((1 + r)(m − d)) , ψ3 = (m − 1 − r)/((1 + r)(m − d)) .

A similar corner solution holds for m < 1 + r.
The boundary values do not lead to arbitrage free prices since some components of
the state price densities are zero. If m → d or m → u, the trinomial model collapses to
the binomial one with the state prices

ψ1 = q/(1 + r) , ψ2 = (1 − q)/(1 + r) , ψ3 = 0 .
Sk = S0 (1 + u)^{Nk} (1 + d)^{k−Nk}

with Nk the random number of upward moves in k time steps. The value of a portfolio
Vt changes according to

∆Vt = φt ∆Bt + ψt ∆St ,

where ∆Vt = Vt − Vt−1, φt is the amount of CHF invested in the riskless asset at time t
and ψt the number of shares held at time t. The number of shares ψt has to be known
before t, that is at time t − 1. This property of sequences of random variables (stochastic
processes) is called predictability. We only consider self-financing strategies, see Section
3.1.5. If the strategy is self-financing, the portfolio value reads

Vt = V0 + Σ_{j=0}^t φj ∆Xj .
The final portfolio value is equal to the initial value plus the cumulative gains and losses
from the price changes of the asset X over time, weighted by the investment strategy. If
we recall that replication means Vt = Ct in all states and at all time points, the above
equation transforms into

Ct = C0 + Σ_{j=0}^t φj ∆Xj .

φj is the replication strategy which, given the initial option price C0, generates the ran-
dom option claims Ct. The martingale representation theorem states when such
a strategy exists. The notion of an arbitrage strategy carries over from the one-period
case.
The set of all observable outcomes ωj is the sample space Ω. To understand the informa-
tion dynamics, suppose that the first move was up. Then four paths are still possible
after this step; the others are impossible. After, say, a down move in the second step,
only two paths remain possible. After the last price move, a single realized path is left.
This allows us to introduce possible events. For 8 observable outcomes, the power set
A = 2^Ω with 2^8 = 256 elements defines all possible events. A filtration is an increasing
family of event sets:

Ft ⊂ Ft+1 , Ft ∈ A ∀t .
[Figure 3.12 shows, for the times 0, 1, 2, 3, how the set of observable states shrinks from
all of {ω1, . . . , ω8} to the events A1 = {ω1, . . . , ω4} or A2 = {ω5, . . . , ω8} after the first
move, and finally to a single realized state such as ω4.]
Figure 3.12: Illustration of the information and filtration structure for the three-period
CRR model.
Intuitively, increasing time means that the information resolution increases (there are more
sets). At t = 0, F0 = {∅, A}: everything is possible, i.e. all future information is random.16
At t = 1, we define the sets

A1 = {ω1, ω2, ω3, ω4} , A2 = {ω5, ω6, ω7, ω8} .

A1 (A2) is the set of all events where the first price move is 'up' ('down'). We set

F1 = {∅, A, A1, A2} .

This assures that F0 ⊂ F1. F3 is the power set of all eight observable states. The
information sets are generated by the evolution of the asset prices only. This is the
standard information structure set-up in asset and derivatives pricing.
The FFTF carries over to the CRR model. The theorem is based on the notion of
RNP or equivalent martingale measures.
Definition 41. Consider a price process under a probability P. A probability Q is
equivalent to P, written Q ∼ P, if they have the same impossible sets. A probability
Q ∼ P is a risk neutral probability if the discounted price process S̃ := S/N is a Q-
martingale with N > 0 the numeraire, i.e.

S̃t = E^Q[S̃s | Ft] =: E^Q_t[S̃s] for all s ≥ t.
16 The inclusion of the empty set guarantees that the set F0 is closed under countable intersection and
complement set formation.
Equivalence means that P(state k) > 0 for all states implies Q(state k) > 0 for all
states, and vice versa. Then, as in the static case, the CRR model is free of arbitrage if
and only if a RNP Q exists.
The price ratio St+1/St takes only the values (1 + u) or (1 + d). Since Q is strictly
positive, both values are attained with positive probability. This implies d < R = 1 + r < u,
since otherwise the equation E^Q[St+1/St] = 1 + r cannot hold. The last inequality is the
no arbitrage condition of the one-period model; if it is violated, arbitrage is possible.
We construct the measure Q:
Proposition 42. Assume 0 < d < 1 + r < u. The following statements are equivalent:
1. S̃k is a Q-martingale.
2. Q[Xk+1 = 1 + u] = q = (R − d)/(u − d) and Q[Xk+1 = 1 + d] = 1 − q = (u − R)/(u − d).
The risk neutral probability is unique, i.e. the CRR market is complete. If the
underlying instrument pays a dividend yield δ ≥ 1, the risk neutral probability and the
no arbitrage condition are

u > R/δ > d , q = (R/δ − d)/(u − d) .
We show how to price a European call option in the CRR model. Such a contract pays
at maturity C(ST) = max(ST − K, 0), with K the strike value. The following proposition
is proven in the Appendix.
Proposition 43. The arbitrage free price of a call option in the n-period CRR model is

$$C(S, t) = \frac{1}{R^n} \sum_{k=0}^{n} \binom{n}{k} q^k (1 - q)^{n-k} \max(S_0 u^k d^{n-k} - K, 0) . \qquad (3.62)$$

17 Due to the max operator, a separation leads to an adjustment of the summation range.
it represents a loan. The difference to the one-period model is the more complicated
factors in front of S and K; they are probabilities. In words, the formula states that the
price of a call (or put) option in the CRR model at a date t with maturity T is given by

C(S, t) = Σ_{paths} path probability × no. of paths × payoff at end node T , (3.63)

where the path probability equals q^{ku}(1 − q)^{kd} with q the risk neutral probability, ku
the number of 'up' moves on the given path from t to T and kd the number of 'down'
moves, and 'no. of paths' is the number of paths connecting the node at time t with the
end node at T.
We compare the accuracy of the binomial CRR model with observed option prices.
Consider a call on ABB Ltd. with strike CHF 31 and expiration June 20, 2008. The bid
and ask prices were CHF 0.33 and 0.34, respectively, and the actual ABB share price
was CHF 29.9. The figures are calculated using (3.63).
The first step is to calibrate the tree parameters u, d, r to the real world data. If R is
the annual gross rate, r the rate per tree period, n the number of periods in the tree and
τ = T − t the time to maturity, we have the relationship (1 + r)^n = R^τ. The number
of periods in the CRR model is n = 11 and the time to maturity is τ = 0.917. This
implies the riskless rate per period 1 + r = R^{τ/n} = 1.00327. The annualized volatility
follows from the daily volatility σ1d by the square root rule,

σ1y = √days · σ1d = √250 · σ1d = 0.2882 ,

where we assumed that there are 250 trading days in a year. This gives u = e^{σ√(τ/n)} =
1.087 and d = 1/u = 0.92. These values imply the risk neutral probability q = 0.499.
The table shows the pricing result.
The sum of the path weights over all end nodes is 1 and the sum of the payoffs over
all nodes, i.e. over the last column, is CHF 3.41205. Discounting this value back to time
zero gives 3.1383. Using the contract ratio 1:10 gives the theoretical price of CHF 0.31,
compared to the actual bid-ask prices of 0.33-0.34.
We relate the discrete and continuous time variables. Consider a continuous time
model for the risky asset where the mean and variance of the asset ratio St+dt/St are
given by

E(St+dt/St) = e^{r dt} , var(St+dt/St) = e^{2r dt}(e^{σ² dt} − 1)

with σ the volatility of the continuous time price process of the risky asset. Using a
Taylor approximation we get

E[dS/S] = r dt ,
Node  S0 u^k d^{11−k}  max(ST − K, 0)  No. of paths  q^k(1 − q)^{11−k}  S.P.  Sum payoff
11 74.541 43.541 1 0.0004 0.0004 0.020
10 63.114 32.114 11 0.0004 0.0052 0.168
9 53.440 22.440 55 0.0004 0.0264 0.593
8 45.248 14.248 165 0.0004 0.0796 1.134
7 38.312 7.312 330 0.0004 0.1600 1.170
6 32.440 1.440 462 0.0004 0.2250 0.324
5 27.467 - 462 0.0004 0.2260 0.000
4 23.257 - 330 0.0004 0.1622 0.000
3 19.692 - 165 0.0004 0.0814 0.000
2 16.673 - 55 0.0004 0.0272 0.000
1 14.118 - 11 0.0004 0.0054 0.000
0 11.954 - 1 0.0005 0.0005 0.000
Sum 1 3.41205
Table 3.12: Valuation of the call option in the 11-period model with ABB as underlying.
'S.P.' means 'sum of path weights'.
i.e. the expected risky asset growth rate equals the risk free rate. In the one-period
model, the mean and variance of the same asset ratio are

E(St+1/St) = qu + (1 − q)d

and

var(St+1/St) = qu² + (1 − q)d² − (qu + (1 − q)d)² = q(1 − q)(u − d)² .

Equating the moments in the two models we get

qu + (1 − q)d = e^{r dt} , q(1 − q)(u − d)² = e^{2r dt}(e^{σ² dt} − 1) .

Solving these equations gives

u = 1/d = 1 + σ√dt + (σ²/2) dt + . . . .

The first terms agree with the power series expansion of u = e^{σ√dt} - this justifies the
CRR parametrization. We do not derive the Black and Scholes model from scratch
using no arbitrage - this would take us too far afield - and therefore we state the Black
and Scholes model as a limit of the CRR model.
We have to consider a two-fold limit: the discrete time spacing and the discrete states
both become continuous. We fix the time T of the continuous model. We have to make
sure in the limit procedure that (i) the value of one dollar in [0, T] is the same in the
CRR model and in the Black and Scholes model and (ii) that the CRR price process St
converges towards a continuous price process which is log-normally distributed. This is
the assumed distribution in Black and Scholes; assuming a different distribution, different
continuous time models follow. For (i), set the period rate to rm = RT/m; then

lim_{m→∞} (1 + rm)^m = lim_{m→∞} (1 + RT/m)^m = e^{RT} = (1 + r)^T .

Hence, Bm converges towards the same value as in the T-maturity continuous model.
The risk neutral probability per period is

p̂m := (rm − dm)/(um − dm) .

The parametrization implies um dm = 1, i.e. um = 1/dm. The definitions guarantee that
the sum of the log Xi is asymptotically normally distributed, with a mean and variance
that reduce to the same ones as in the Black and Scholes model. Formally: the random
variable ln(S̃T/S̃0) converges for m → ∞ in distribution to a normally distributed random
variable with mean −σ²T/2 and variance σ²T.
We apply this result to price call and put options in the Black and Scholes model.
Consider a 6m call with strike K = 100 on a stock with price 90 which trades at CHF 2.
This positive price cannot be due to interest rates alone, since investing CHF 90 for 6m
gives 90 · e^{r(T−t)} = 90.90 < 100 CHF. The reason for a price of 2 is that the
underlying random variable S has the potential to grow above the strike value within the
next 6m. The returns of the underlying are log-normally distributed, i.e.

ST ∼ St LN(µ, σT) = LN(ln(St) + µ, σT) .

Since we know the distribution, we can price the call using the no arbitrage principle.
The price is given by
C(S, K, r, σ, t, T) = e^{−r(T−t)} E^Q[max(ST − K, 0)] . (3.65)
How do we find the risk neutral probability? No arbitrage implies that the discounted
price process S is a martingale with the risk free asset as numeraire. This means that the
expected value of S has to grow like the riskless asset - otherwise the drifts are not the
same. But if the drift of S and the drift r of the riskless asset are not the same, their
ratio - S/riskless asset - cannot be driftless. Summarizing, we must have at T

E^Q[ST] = St e^{r(T−t)} . (3.66)

But the expectation of a log-normally distributed random variable with mean µ and
volatility σT is given by
E[ST] = St exp(µ + σT²/2) . (3.67)

Comparing (3.66) and (3.67) gives the risk neutral drift µ = r(T − t) − σT²/2. The
volatility σT from t to maturity is determined from the annualized volatility σ by the
square root rule

σT = σ √(T − t) .

Summarizing,

ln(ST/St) ∼ N( r(T − t) − σ²(T − t)/2 , σ√(T − t) ) (3.69)

or

ST/St ∼ LN( r(T − t) − σ²(T − t)/2 , σ√(T − t) ) .
All these expressions enter d1 and d2 in the Black and Scholes formula. How can we
calculate the probability that we exercise the call? The option is exercised if ST > K.
For the continuous return rS = ln(ST/St) this reads rS > ln(K/St). Standardizing rS
with the mean and volatility in (3.69), a calculation shows that the probability of exer-
cising the option equals Φ(d2), while Φ(d1) is the Delta of the call.
As an illustration of the Delta, suppose a trader is long 10 call option contracts with a
Delta of 0.5 and 70 shares of the stock per option contract, and that he is short 200 Nestle
stocks. The position Delta is
−200 + 0.5 × 10 × 70 = +150 .
Gamma Γ states how much the Delta of an option changes when the price of the stock
moves. Theta Θ, or time decay, is an estimate of how much the theoretical value of an
option decreases when 1 day passes. Thetas for same-parameter calls and puts are not
equal. The difference depends on the cost-of-carry for the underlying stock. When the
dividend yield is less than the interest rate (positive cost-of-carry), Theta is higher for
the call than for the put. The difference between the extrinsic value of the option with
more days to expiration and the option with fewer days to expiration is due to Theta.
Long options have negative Theta and short options have positive Theta.
We consider Delta and Gamma hedging for the portfolio V:
• Short 1'000 calls, time-to-maturity (TtM) 90 days, strike 60, volatility 30%, risk-
less rate 8%. The currency is irrelevant.
• The fair option price using Black and Scholes is 4.14452 with Delta 0.581957. We
therefore receive a premium of 4'144.52 by selling the options.
• To hedge the position we buy 581.96 stocks at the price 60. For this we borrow
(cash)
581.96 × 60 − 4'144.52 = 34'917.39 − 4'144.52 = 30'772.88 .
The portfolio value today is zero. We consider the portfolio value after 1 day, i.e. the TtM
is 89 days. In the scenario 'unchanged' the underlying value remains at 60. Using Black
and Scholes, the option is worth 4.11833, i.e. Theta acts. This lower option liability
value is partly offset by the increased cash liability:
Value        unchanged       up            down
Underlying   34'917.39       35'499.35     34'335.44
Cash        -30'779.62      -30'779.62    -30'779.62
Option       -4'118.33       -4'721.50     -3'559.08
Sum              19.44           -1.77         -3.26
Table 3.13: Value of the portfolio V after 1 day for different scenarios.
This shows that the Delta hedge is effective for small changes in the underlying value.
Can we additionally hedge the Gamma? Since one option is used for the Delta hedge,
we need a second option to also achieve Gamma neutrality. The data of this option are:
• All other parameters are the same as for the first option, see Table 3.14.
Delta and Gamma neutrality means choosing a number x of stocks and a number z of
options 2 such that
∆V = x − 1'000 ∆Opt1 + z ∆Opt2 = 0
ΓV = −1'000 ΓOpt1 + z ΓOpt2 = 0 ,
with solution
x = 300.58 , z = 900.76 .
To be Delta and Gamma neutral we are long the underlying, long option 2 and short
cash. Table 3.15 compares the hedge effectiveness of Delta and of Delta & Gamma
hedging.
Vega is an estimate of how much the theoretical value of an option changes when the
volatility changes by 1 percentage point. Option prices and volatility are in a 1:1 relation
in the Black and Scholes model; one can quote an option in CHF or in volatility points.
Vega is highest for ATM options. Rho ρ is an estimate of how much the theoretical value
of an option changes when interest rates move by 1.00 percent.
The sensitivities are linked by the Black and Scholes pricing equation:

(σ²S²/2) Γ + rS∆ − rC = −Θ . (3.70)

Inserting the partial derivatives of the call price into the sensitivities shows that this is a
partial differential equation for the unknown call price. This equation follows from the
assumed market structure and the assumption of no arbitrage - no further economic
assumptions are needed. Adding the specific option contract as a terminal condition, the
solution C of the equation is the Black and Scholes formula for the option under consi-
deration - a second method to price options besides calculating expected values under a RNP.
We finally consider the creation of an option trading book in a liquid market. Consider
the liquid stock Lafargeholcim (LH). We start with a short position of 1000 calls on LH
with price 7.232 CHF (Step 1). The option price is theoretically calculated. If the LH
stock moves, up to first order ∂C/∂S =: ∆, a loss of CHF 587 on the derivative position
follows, see Table 3.16.
Step 2: To reduce the Delta risk, we buy 620 LH stocks at the price 80. Different
possibilities exist to generate P&L. First (Step 3), one sells the options at a slightly
higher price than their value. This gives a P&L of CHF 268. Second, price movements
as described above lead to P&L (Step 4, where LH gains 1). Step 5 describes how
volatility movements generate P&L. We assume that the portfolio V is Delta neutral.
Volatility is 20%. If volatility increases by 1 volatility point, the bank loses 304 CHF.
If the trader hedges the Vega exposure he needs to trade in different options. Step 6
shows that if he trades in a second option, the Vega of the position is reduced but Delta
moves away from zero. Hence, both Greeks can be controlled.
Warrant prices can be checked against a calculated reference volatility curve, e.g. the
Eurex curve: volatilities of the warrants are larger than the corresponding reference
values, and vice versa.
These products are not related in any sense to structured finance products such as
MBS or CDOs. The latter are based on pooling and slicing the risk of illiquid assets.
Consider markets which are disrupted unpredictably by certain events, where investors
want to choose an investment in response to the event. Such investments must hence be
deployed fast; they do not capture any diversification needs but are bets that, after the
market disruption, markets will drift back to normal levels.
There are different causes for these events - macroeconomic shocks, policy interventions,
the breakdown of investment strategies, or firm-specific events (for example, Lehman
Brothers). While some events are isolated and affect only single corporates, events at the
political or market level often lead to broader investment opportunities. Policy inter-
ventions can trigger market reactions that in turn can lead to new policy interventions.
The Swiss National Bank's announcement, in January 2015, that it would remove the
EURCHF cap and introduce negative interest rates had an effect on Swiss stock markets,
EURCHF rates, and fixed-income markets.
Such events can impact financial markets for a short period of time (a flash crash),
a medium time period (the GFC), or a long time (the Japanese real-estate shock of the
1990s). Making a bet when markets are under stress is simpler than in normal times.
SP are in some sense an opposite investment vehicle to funds since most of them do not
rely on the discretionary power of an asset manager; instead, the final payoff is promised
ex ante to the investor. The issuer has to generate the final payoff from the initial
investment amount in any market circumstances: trading, structuring, pricing, and
hedging are key disciplines for SP.
The replication of the payoff of an SP with cash products and vanilla options is central
to the pricing and hedging of the SP. The price of the SP is equal to the sum of the
prices of the building blocks; the no-arbitrage paradigm applies. The hedge corresponds
to the position of the dealer of the bank, which must generate the promised payoff of the
SP. Theoretically equivalent replications can differ in practice if components have
different liquidity or if taxation differs. The buyer of an SP faces only claims but no
obligations, unlike in a swap contract for example. The only counterparty for the investor
is the issuer, whose creditworthiness enters the pricing of the SP.
• (a) the customer is exposed to a range of outcomes in respect of the return of initial
capital invested;
• (b) the return of initial capital invested at the end of the investment period is linked
by a pre-set formula to the performance of an index, a combination of indices, a
'basket' of selected stocks (typically from an index or indices), or other factor or
combination of factors;
• (c) if the performance in (b) is within specified limits, repayment of initial capital
invested occurs but if not, the customer could lose some or all of the initial capital
invested.' Source: FSA Handbook.

Table 3.18: Mutual funds vs. structured products. COSI are structured products with
minimal issuer risk thanks to collateralization via the SIX exchange. Triparty Collateral
Management (TCM) serves the same purpose.
SP should not be confused with structured finance products such as MBS, CDO, or
CLN. The latter arise as products by pooling illiquid assets, whereas SP are defined on
liquid assets.
• Suppose that the annual interest rate is 2%. The PV of the guaranteed CHF
100 in 5 years is

90 = (1 − 5 × 0.02) × 100 CHF

using linear compounding. If the issuer invests today CHF 90 in a zero-bond, then
the capital guarantee promise in 5 years can be satisfied - if the issuer does not
default.
Therefore, the investment product SP V_t consists of a zero bond with price p(t, T) at
time t and maturity T and a participation product whose price depends on the price of
the underlying asset S_t. In the simplest variant, the value of the product V_T at maturity
T is determined as the product of the face value and the participation in the underlying
asset's price return:

V_T = N × ( 1 + max(0, b (S_T − S_0)/S_0) )

with N the face value and b the participation rate. Rewriting, we get

V_T = N + (bN/S_0) max(0, S_T − S_0) . (3.71)
We note that a '+' sign in a payoff value is a long position and a '−' sign a short position.
The payoff formula (3.71) is written from the buyer's perspective.
Equation (3.71) shows that the payoff of the CP at maturity equals an investment in
a zero bond and a long position in European call options C(S, K, T) with strike K = S_0.
The number of options is equal to the participation rate times the face value divided by
the initial price. (3.71) is a replication of the payoff at maturity fixed in the contract.
No arbitrage implies for the fair value of the contract V_0
V_0 = p(0, T) + (bN/S_0) C(0, S, K) (3.72)

with p(0, T) the zero bond price and C(0, S, K) the arbitrage-free option price, i.e.
C(0, S, K) = E^Q[D(0, T) max(S_T − S_0, 0)].
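A minimal pricing sketch of (3.72) follows. The inputs (5 years, 2 percent rates, 20 percent volatility, face value 100) are illustrative assumptions; the code computes the zero-bond leg, the ATM call, and the participation rate b that makes the note worth par.

```python
# Capital-protected note: zero bond + (b*N/S0) ATM calls, priced at par.
from math import log, sqrt, exp, erf

def norm_cdf(x): return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def bs_call(S, K, T, r, sigma):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

N, S0, T, r, sigma = 100.0, 100.0, 5.0, 0.02, 0.20
p0T  = exp(-r * T)                    # zero-bond price p(0, T) per unit face
call = bs_call(S0, S0, T, r, sigma)   # European call with strike K = S0

# Fair participation: p(0,T)*N + (b*N/S0)*call = N  (note issued at par)
b = (N - p0T * N) / (N / S0 * call)
print(f"zero bond {p0T:.4f}, ATM call {call:.2f}, participation b = {b:.1%}")
```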
Consider an investor with the following preferences:
• He believes that the UBS stock is likely to rise over the next year.
• He believes that the stock will not rise strongly. He also prefers a partial capital
protection if the UBS stock falls. He is in turn willing to give up the upside potential
of the stock.
hits the barrier, the capital protection is knocked out and the payback at maturity is
the UBS stock value at this date plus the 10 percent coupon. A BRC delivers a higher
coupon than the stock dividend, plus a contingent capital protection. Contrary to CP,
the investor faces the market risk of UBS breaching the barrier. The investor gives up
the stock's upside: the coupon is the maximum return possible, which is higher than the
UBS dividend yield. Consider the replication of the BRC. The BRC payoff at maturity
is replicated with two products:
• a short down & in put (DIP), i.e. the investor sells a DIP - a barrier option on
UBS. This money is used to generate the coupon value.
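A minimal payoff sketch of a BRC at maturity, under illustrative assumptions (nominal 100, the 10 percent coupon mentioned above, initial fixing S_0 = 100, barrier at 70):

```python
# BRC payoff: the coupon is paid in any case; the nominal is repaid unless
# the barrier was hit during the product's life and the stock ends below S0
# (then the knocked-in put is exercised and the stock value is delivered).
def brc_payoff(S_T, barrier_hit, N=100.0, coupon=0.10, S0=100.0):
    payoff = coupon * N
    if barrier_hit and S_T < S0:
        payoff += N * S_T / S0      # investor bears the stock loss
    else:
        payoff += N                 # full capital repayment
    return payoff

print(brc_payoff(120.0, barrier_hit=False))  # 110.0: upside capped at the coupon
print(brc_payoff(60.0,  barrier_hit=True))   # 70.0:  stock value plus coupon
```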
Contrary to the problem of classifying innovations in general, successful classification
schemes exist for retail structured products (RSP). One of them is the Swiss Derivative
Map of the Swiss Derivative Association. With minor adaptations this map is also used
by the European Structured Product Association. The Swiss map defines main categories:
• Capital protection.
• Participation, i.e. products which globally have a linear payoff profile. 'Globally'
means that for some bounded region in the underlying value the payoff can be
non-linear.
• Reference Entity Products. In addition to the credit risk of the issuer, redemption
is subject to the solvency (non-occurrence of a credit event) of the reference entity.
[Figure: payoff diagram of a discount certificate - profit/loss against the underlying value, with the profit capped at the cap level.]
products: the price of the DC is by no arbitrage equal to the price of the replicating payoff.
Since the payoff is non-linear, options are needed for replication and a model such as
Black and Scholes is used to price the options. The replication portfolio is long a LEPO
(Low Exercise Price Option) and short a call with strike K = 250. LEPOs are European-
type call options with a very low strike of K = 0.01 CHF. The current value of a LEPO is
equal to the current price of the underlying share compounded at the risk-free interest
rate, less the accumulated value of dividends and the strike price. Since K is close to
zero, the price sensitivity (Delta) of the LEPO is close to one, which implies that the
price of the LEPO is well approximated by the price of the underlying minus the PV of
the dividends. Hence, the DC payoff is graphically equivalent to a straight line (LEPO)
plus a short call payoff.
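A small sketch of this replication, ignoring dividends and approximating the LEPO by the stock (its 0.01 CHF strike is negligible):

```python
# Discount certificate = long LEPO + short call with strike at the cap.
def dc_payoff(S_T, cap=250.0, lepo_strike=0.01):
    lepo = max(S_T - lepo_strike, 0.0)      # ~ S_T for realistic stock prices
    short_call = -max(S_T - cap, 0.0)
    return lepo + short_call                # = min(S_T, cap) up to the tiny strike

for S_T in (200.0, 250.0, 300.0):
    print(S_T, round(dc_payoff(S_T), 2))    # payoff is capped at 250
```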
3.4.11 Political Events: Swiss National Bank (SNB), ECB, and SP Investment
The SNB announced, on 15 January 2015, the removal of the euro cap and the intro-
duction of negative CHF short-term interest rates. This decision caused the SMI to lose
about 15 percent of its value within 1-2 days, and the FX rate EUR/CHF dropped
from 1.2 to near parity. Similar changes occurred for USD/CHF. Swiss stocks of
export-oriented companies or companies with a high cost base in Swiss francs were most
affected. The drop in stock prices led to a sudden and large increase in Swiss stock
market volatility. Swiss interest rates became negative for maturities of up to thirteen
years.
It was also known at the time that the ECB would make public its stance on quan-
titative easing (QE) one week later. The market participants' consensus was that Mario
Draghi - president of the ECB - would announce a QE program. The events in Switzer-
land, which came as a surprise, and the ECB QE measures subsequently announced
paved the way for the following investment opportunities:
1. A Swiss investor could invest in high quality or high dividend paying EUR shares at
a discount of 15 percent. EUR shares were expected to rise due to the forthcoming
ECB announcement.
2. All Swiss stocks faced heavy losses, independently of their market capitalization
and of their exposure to the Swiss franc.
3. The increase in volatility made BRCs with very low barriers feasible.
4. The strengthening of the Swiss franc versus the US dollar, and the negative CHF
interest rates, led to a USD/CHF FX swap opportunity that only qualified investors
could benefit from.
5. The negative interest rates in CHF and rates of almost zero in the eurozone made
investments in newly issued bonds very unattractive. Conversely, the low credit risk
of corporates brought about by the ECB's decision offered opportunities to invest
in the credit risk premia of large European corporates via structured products.
This investment had two main risk sources. If it was denominated in euros, the
EUR/CHF risk held, and one faced the market risk of the large European companies whose
shares comprised the basket. Most investors classified the FX risk as acceptable since
a significant further strengthening of the Swiss franc against the euro would meet with
counter-measures from the SNB. More specifically, a tracker on a basket of fourteen Eu-
ropean stocks was issued. The issuance price was fixed at EUR 98.75. As of 1 April
2015 the product was trading at EUR 111.10 (mid-price) - equivalent to a performance
of 12.51 percent pro rata. Similar products were launched by all the large issuers.
Other issuers launched a tracker on Swiss stocks, putting into a basket all large Swiss
stocks that had only little exposure to the Swiss franc but which also faced a heavy
price correction after the SNB announcement in January. Again, the input of each issu-
ing bank's research unit in identifying these firms was key. The underlying investment
idea for this product can be seen as a typical application of behavioral finance: an over-
reaction of market participants to events is expected to vanish over time.
The risk in this investment was twofold. First, one did not know whether the SNB
would consider further measures, such as lowering interest rates further, which would
have led to a second drop in the value of Swiss equity shares. Second, international
investors with euros or US dollars as their reference currency could realize profits since
the drop in Swiss share values - around 15 percent - was more than offset by the gain
from the currency, which lost around 20 percent in 'value'; roughly, an institutional
investor could earn 5 percent by selling Swiss stocks. Since large investors exploit such
opportunities rapidly, it became clear three days after the SNB's decision was announced
that the avalanche of selling orders from international investors was over.
Investors and private bankers searched for cash alternatives with a 100 percent capital
guarantee. The negative CHF interest rates made this impossible: if 1 Swiss franc today
is worth less than 1 Swiss franc will be worth tomorrow, one has to invest more than 100
percent today to get a 100 percent capital guarantee in the future.
Low-barrier BRCs - say, with a barrier at 39 percent - could be issued with a coupon
of 1 to 2 percent, depending on the issuer's creditworthiness and risk appetite, for a ma-
turity of one to two years. The S&P 500, Eurostoxx 50, SMI, NIKKEI 225, and other
broadly diversified stock indices were used in combination as underlying values for the
BRCs. The low fixed coupon of 1 to 2 percent takes into account that the product is
considered as a cash alternative with a zero percent, or even a negative, return. See the
last section for more details about BRCs.
The reason for using quanto AUD is the higher AUD interest rates compared to JPY
interest rates. Higher interest rates lead to higher participation, and the participation
in the quanto product was 130 percent. The risk of the investment lay in whether
Abenomics would work as expected, and possibly in the FX rate AUD/CHF. The economic
program in Japan worked out well and the redemption rate lay at 198 percent after two
years. This redemption contains a loss of 16.35 percent due to the weakness of the
Australian dollar against the Swiss franc.
To invest in a negative basis product, the issuer of a structured product locks in the
negative basis for an investor by forming a portfolio of bonds and credit derivatives of
those firms with a negative basis. For each day on which the negative basis exists a cash
flow follows, which defines the participation of the investor. When the negative basis
vanishes, the product is terminated.
Corporate Credit basis in May 2003 (bps) Credit basis in November 2008 (bps)
Merrill Lynch 47 -217
General Motors -32 -504
IBM 22 -64
J.P. Morgan Chase 22 -150
Table 3.19: Credit basis for a sample of corporates in 2003 and their negative basis in
the most recent GFC.
Example
Investing in the negative credit basis of General Motors (see Table 3.19) leads to a
return, on an annual basis, of 5.04 percent if the basis remains constant for one year.
If the product has a leverage of 3, the gross return is 15.12 percent. To obtain the net
return, one has to deduct the financing costs of the leverage.
Structured products with this idea in mind were offered in spring 2009 to qualified
investors. The products offered an annual fixed coupon of around 12 percent and partic-
ipation in the negative basis. The high coupons were possible as some issuers leveraged
investors' capital. This could only be offered by those few issuers in the most recent GFC
that were cash rich, typically AAA-rated banks. The products paid one coupon and were
then terminated after 14 months since the negative basis approached its normal value.
The product value led to a performance of around 70 percent for a 14-month investment
period. Was this formidable performance, seen ex ante, a free lunch - that is to say,
a risk-less investment? No. If the financial system had fallen apart, investors would
have lost all the invested capital. But the investors basically only needed to answer the
following question: will the financial system and the real economy return to normality? If
yes, the investment was reduced to the AAA issuer risk of the structured product.
Many lessons can be drawn from these products. A very turbulent time for markets
can offer extraordinary investment opportunities. The valuation of these opportunities
by investors must follow different patterns than in times of normal markets: there is,
for example, no history and no extensive back-testing, and hence an impossibility of
calculating any risk and return figures. But there is a lot of uncertainty. Making an
investment decision when uncertainty is the main market characteristic is an entirely
different proposition to doing so when markets are normal and the usual risk machinery
can be used to support decision-making with a range of forward-looking risk and return
figures. If uncertainty matters, investors who are cold-blooded, courageous, or gamblers,
and analytically strong, will invest, while others will prefer to keep their money in a safe
haven.
A credit linked note (CLN) is a structured product. Its payoff profile corresponds to
a bond's payoff in many respects. A CLN pays - similarly to a bond - a regular coupon.
The size of the coupon and the amount of the nominal value repaid at maturity both
depend on the creditworthiness of a third party, the so-called reference entity (the issuer
of the comparable bond). This is also similar to the situation for bonds. But the size
of the CLN coupon derives from credit derivative markets. Hence, if the credit basis is
positive, a larger CLN coupon follows compared to the bond coupon of the same ref-
erence entity. CLNs are typically more liquid than their corresponding bonds since credit
derivative markets are liquid, while many bonds, even from large corporates, often suffer
from illiquidity. CLNs are flexible in their design of interest payments, maturities, and
currencies. CLNs also possess tax advantages compared to bonds; in fact, the after-tax
return for bonds that were bought at a price above 100 percent is often negative in this
negative interest rate environment. The investor in a CLN faces two sources of credit
risk: the reference entity risk, as for bonds, and the issuer risk of the structured product.
As an example, Glencore issued a new bond in Swiss francs with a coupon of 1.25 percent.
Due to the positive basis, the coupon of the CLN was 1.70 percent. Another product,
with Arcelor Mittal as the reference entity, implied a CLN effective yield in EUR that
was 1.02 percent higher than that of the bond.
Let us consider a more detailed example. Consider the reference entity Citigroup
Inc. The bond in CHF matures in April 2021 and its price is 102.5 with a coupon of
2.75 percent. The bond spread is 57 bps, which leads to a yield to maturity of −0.18
percent - an investor should sell the bond. The CLN has a spread of 75 bps - which
proves the positive basis - and an issuance price of 100. The coupon of the CLN is then
−0.71 percent, which leads to a yield to maturity of 0.57 percent if funding is subtracted.
Therefore, selling the bond and buying the CLN generates an additional return of 75 bps.
3.5 Collateral
3.5.1 Prime Finance
Prime Finance is an important trading activity which is frequently used by asset man-
agement firms. Prime Finance has different aspects:
• Lending and borrowing of securities, the Securities Lending and Borrowing business (SLB).
The general motivation for repos is the borrowing or lending of cash. In securities lend-
ing, the purpose is to temporarily obtain the security for other purposes, such as covering
short positions or for use in complex financial structures. Securities are generally lent
out for a fee. Securities lending trades are governed by different types of legal agreements
than repos.
Prime finance business changed heavily after the GFC and is still transforming. Sev-
eral rationales motivate prime finance activities and their transformation. A first rationale
is collateralized banking. The repo business can be considered as secured banking where
collateral serves as creditor protection for non-retail investors. Creditors are banks, in-
surance companies, governments, firms, asset managers, or pension funds. Widely
collateralized markets are, for example, fixed income repo, equity finance, exchange traded
securities, OTC derivatives, securities lending, bank loans, and asset backed securities. An
important property of collateral is its eligibility, i.e. the extent to which collateral can be
converted into an economic value if the counterparty defaults. Liquidity, quality in terms
of embedded credit risk, and the possibility to settle the collateral define the collateral eli-
gibility. Cash is the most used collateral, followed by government bonds and large-cap shares.
For traders, repos are used to finance long positions, obtain access to cheaper funding
costs for other speculative investments, and cover short positions in securities. A second
rationale is cost reduction in the custody of securities, where lending and borrowing secu-
rities generates earnings which lower these costs. Third, to cover short positions one has
to borrow securities. Short positions can be the result of market making, the hedging of
derivative positions, or part of an investment strategy. Finally, regulatory requirements
lead to lower risk-weighted assets in the regulatory capital charge if one switches from
unsecured to secured transactions.
• At time 1: redemption of the loan and interest rate payments to the buyer, and
reassignment of the security from the buyer to the seller.
The purchase price at time 0 equals the market value (dirty price) of the underlying security
minus a haircut. The haircut provides a restricted protection against falling
security prices. The payback price equals the purchase price plus an agreed interest pay-
ment (repo rate), which depends upon the quality of the security. If the security loses
value, a margin call follows. Using a repo, the seller obtains favourable rates compared
to an unsecured loan and the buyer receives collateral.
Almost any security may be employed in a repo. But highly liquid securities are
preferred because they can be easily secured in the open market where the buyer has
created a short position in the repo security through a reverse repo and market sale.
Treasury and government bills, corporate and Treasury/government bonds, and stocks may
all be used as collateral in a repo transaction. Coupons which are paid while the repo
buyer owns the securities are passed on to the repo seller, although the ownership of the
collateral rests with the buyer during the repo agreement. There are three types of repo
maturities: overnight, term (i.e. with a specified date), and open repo.
The most important forms of repo transactions are specified delivery and tri-party.
The first form requires the delivery of a prespecified bond at the onset and at maturity
of the contractual period. Tri-party is essentially a basket form of transaction and allows
for a wider range of instruments in the basket or pool. The tri-party agent acts as an
intermediary between the two parties to the repo and is responsible for the administration
of the transaction, marking to market, and substitution of collateral. The largest tri-party
agents are Clearstream and JP Morgan Chase.
A reverse repo is the same repurchase agreement from the buyer's viewpoint, not the
seller's. The term reverse repo is used to describe a short position in a debt instrument
where the buyer in the repo transaction immediately sells the security provided by the
seller on the open market.
Example:
While investors trade bonds on a stand-alone basis, trading desks use repo jointly with
bond trading. Buying a bond is completed immediately by selling the bond in a repo,
i.e. one finances the bond. We consider a US Treasury bond with the following dates:
at T the trader buys the bond for the price B(T) from a counterparty A. At T+1 the
repo transaction starts to finance the bond. To achieve this,
• the repo desk delivers the bond for 1 day - i.e. the period of the repo transaction is
overnight, from T_1 to T_2 - for a price B(T_1^Repo) to the repo counterparty, and
• the repo desk agrees to buy the bond back at T_2 for the price B(T_1^Repo) plus
the repo interest.
Using the data - notional 100 Mio. USD, coupon 4 percent, T = Oct 2 for trading the
bond, settlement Oct 3 - the clean price of the bond is 100'078'125 USD (= 100-02+
in US Treasury notation), and adding the accrued interest of (3/183) × 0.04/2 gives the
settlement price 100'110'911 USD: the bond accrues interest since Sept 30, and the half
year has 183 days. The repo rate r equals 3.4 percent, the cash rate is 3.5 percent. Since
the bond settles Oct 3, the repo desk finances the bond. The bond price changes from
Oct 2 to Oct 3 to (100-05). Therefore, the value of the position in dirty prices increased to

100'189'036 = (1 + (5/32)/100 + (3/183) × 0.04/2) × 100 Mio. USD .
At Oct 3 the following payments/transactions are made:
• Bonds are received with value USD 100'110'911 and exchanged for a secured loan
of USD 100'189'036 with the repo counterparty.
• The repo counterparty hands back the lent bond and obtains the repo rate interest.
• The bond is sold by the repo desk to the buyer. The price equals the clean price
of Oct 3 with Oct 4 settlement plus accrued interest. If the bond increased to
100-08, we have

100'293'715 = (1 + (8/32)/100 + (4/183) × 0.04/2) × 100 Mio. USD .
• Change in bond price: 100'293'715 − 100'110'911 = +182'804 USD.
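The settlement arithmetic can be checked with the sketch below; the small differences to the figures above stem from rounded intermediate prices.

```python
# US Treasury repo financing example: dirty prices, price change, repo interest.
NOTIONAL = 100_000_000
COUPON, REPO_RATE = 0.04, 0.034

def dirty(clean_32nds, accrued_days):
    clean = clean_32nds / 32 / 100                 # price quoted in 32nds of par
    accrued = accrued_days / 183 * COUPON / 2      # 183-day semi-annual period
    return NOTIONAL * (1 + clean + accrued)

settle_oct3 = dirty(2.5, 3)   # 100-02+ traded Oct 2, settles Oct 3 -> ~100'110'911
loan_oct3   = dirty(5.0, 3)   # repo loan against the 100-05 price  -> ~100'189'036
settle_oct4 = dirty(8.0, 4)   # 100-08 traded Oct 3, settles Oct 4  -> ~100'293'715

print(f"bond price change: {settle_oct4 - settle_oct3:,.0f}")          # ~ +182'804
print(f"overnight repo interest: {loan_oct3 * REPO_RATE / 360:,.0f}")  # ~ 9'462
```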
Contrary to the SLB business, a repo is always of the type cash against security. Both
transaction types face the same market risk but settlement risk can be different.
Eurex, one of the world's largest exchanges for futures and options trading, also
offers platforms for bond trading and for repo (Eurex Repo). The platform is open to all
financial institutions. The Eurex Repo platform is a TriParty platform with integrated
trading and settlement functionalities. This means that a third party to the buyer and
seller is responsible for administration and operations. The largest providers of TriParty
repo programs are Clearstream and JP Morgan Chase. The Eurex platform integrates
trading, settlement, and legal documentation. Participants at Eurex Repo can choose
from a broad menu of repo transactions. An advantage of the Eurex Repo platform is
that the securities which are received as collateral can be used immediately for a new
repo transaction. This allows banks to raise cash if they need to do so. The Eurex market
consists of four links for the participants in CHF repos.
As an example, consider a bond trader (seller) who wishes to borrow CHF 20 Mio.
for one week to finance an investment in Swiss Government Bonds of CHF 18 Mio. nominal
with a 3 percent coupon. A repo buyer offers a repo rate of 2 percent. The seller accepts the
rate and delivers CHF 18 Mio. nominal against CHF 20 Mio. cash; the same day, the
buyer pays CHF 20 Mio. in exchange for the CHF 18 Mio. of bonds. After one week
the buyer gives back the bonds to the seller, and the seller pays back the loan plus accrued
interest:

20'000'000 × (0.02 × 7)/360 = 7'777.8 CHF .
3.6 The Efficient Market Hypothesis (EMH)

Malkiel (2003): Revolutions often spawn counter-revolutions and the efficient market
hypothesis [EMH] in finance is no exception. The intellectual dominance of the efficient-
market revolution has more been challenged by economists who stress psychological and
behavioural elements of stock-price determination and by econometricians who argue that
stock returns are, to a considerable extent, predictable.
Lo (2007): The efficient market[s] hypothesis (EMH) maintains that market prices fully
reflect all available information. [...] It is disarmingly simple to state, has far-reaching
consequences ..., and yet is surprisingly resilient to empirical proof or refutation. Even
after several decades of research and literally thousands of published studies, economists
have not yet reached a consensus about whether markets - particularly financial markets
- are, in fact, efficient.
Asness and Liew (2015): The concept of market efficiency has been confused with every-
thing from the reason that you should hold stocks for the long run to predictions that stock
returns should be normally distributed to even simply a belief in free enterprise.
Shiller (2014): [If markets are efficient] there is never a good time or bad time to
enter the market [...]
Definition 46. A financial market is efficient when market prices reflect all available
information about value.
All available information includes past prices, public information, and private infor-
mation. These different information sets F_t lead to different EMHs (see below). The
statement 'reflecting all available information' is not defined. If a company announces
that it expects earnings to double, do stock prices double, triple, or fall? In the sense of
Jensen, reflecting all available information means that trading based on the information
set does not lead to an economic profit. An asset pricing model is needed to make precise
what reflecting all information means in the EMH. Efficiency testing means to test whether
the properties of expected returns implied by the model of market equilibrium are observed
in actual returns. This is referred to as the joint hypothesis problem (Fama [1970]):
• Pillar 1: Do prices reflect all available information - that is, are markets efficient?
Prices can only change if new information arrives (the information content).
• Pillar 2: Developing and testing asset pricing models - the price formation mecha-
nism (the asset pricing model).
See Section ?? for a discussion of information sets and their evolution over time.
19 This section is based on Fama (1965, 1970, 1991), Cochrane (2011, 2013), Malkiel (2003), Asness
(2014), Lo (2007), Nieuwerburgh and Koijen (2007), and Shiller (2014).
The standard asset pricing equilibrium model of the 1960s assumed that equilibrium
expected returns are constant: E(R_{t+1}|F_{M,t}) = constant. If the EMH (3.74) holds, then
E(R_{t+1}|F_t) = constant follows. To test the EMH, the regression of the future returns R_{t+1}
on the known information F_t should have a zero slope. If this is not the case, the market
equilibrium model could be wrong, or the definition of F_{M,t} overlooks information in price
setting (F_{M,t} and F_t are not equal), or both channels could be flawed.
Remarks
• The EMH does not hold if there are market frictions (trading costs, costs of obtaining
information). In the US, reliable information about firms can be obtained relatively
cheaply and trading securities is cheap too. For these reasons, US security markets
are thought to be relatively efficient.
• Grossman and Stiglitz (1980) show that perfect market efficiency is internally in-
consistent.
• The EMH does not assume rationality of investors. But to operationalize the
EMH one often assumes rationality; Fama proposes such an operational form.
• The EMH is applicable to all asset classes. If the EMH holds true, then prices react
quickly to the disclosure of information.
Why is the EMH important for AM? Fama's work on market efficiency (1965, 1970)
triggered passive investing, with the first index fund launched in 1971. In efficient markets
buying and selling securities is a game of chance rather than one of skill. Active management
is a zero-sum game. If the EMH holds, the variation of the performance of active
managers around the average is driven by luck alone. Many studies found little or no
correlation between strong performers in one period and those in the next one, see Figure
3.15.

Figure 3.15: Performance ranking of the top 20 equity funds in the US in the 1970s and
in the following decade. The average annual rate of return was 19 percent compared
to 10.4 percent for all funds. In the following decade, the former top 20 funds had an
average rate of return of 11.1 percent compared to 11.7 percent for all funds (Malkiel
[2003]).
Suppose that one is able to pick in advance those managers who outperform others.
As per the EMH, investors would give them all their money; no-one would select those
managers doomed to underperform. But who would be on the other side of the outper-
formers' trades? This process would be self-defeating.
The same conclusion also holds for technical analysis, the study of past stock prices
to predict future prices, and fundamental analysis, the analysis of financial company
information to select undervalued stocks. If the EMH holds, both approaches are useless
in predicting asset prices. The value of financial analysts is not in predicting asset values
but in analysing incoming information fast, such that the information is rapidly reflected
in the asset prices. In this sense analysts support the EMH. Fama (1970) defines three
different forms of market efficiency, that is, different sets F. In the weak-form
EMH, F is all available price information at a given date. Hence, future returns cannot
be predicted from past returns or any other market-based indicator. This precludes
technical analysis from being profitable. In the semi-strong EMH, F is all available
public information at a given date, i.e. financial reports, economic forecasts, company
announcements, etc. matter. Technical and fundamental analyses are not profitable in
this case. This is the form of the EMH which is usually assumed in the literature. In the
strong-form EMH, F is all available public and private information at a given date.
This extreme form serves mainly as a limiting case.
Example
A well-known story tells of a finance professor and a student who come across a
hundred dollar bill lying on the ground. As the student stops to pick it up, the professor
says, 'Don't bother - if it were really a hundred dollar bill, it wouldn't be there.' This
story illustrates well what financial economists usually mean when they say markets are
efficient. But suppose that the student assumes that nobody so far has tested whether the
bill is indeed real, everyone assuming instead that someone else checked the bill's validity.
Then no efforts were made to generate the information needed to value the bill. But if
nobody bore the costs of generating that information, then F_t is the empty set and
the EMH cannot hold. This shows that a reasonable assumption about human behavior
can lead to a violation of the EMH.
Example
A firm announces a new drug that could cure a virulent form of cancer. Figure 3.16
shows possible reactions of the price paths. The solid path is the EMH path:
prices jump to the new equilibrium value instantaneously and in an unbiased fashion.
The dotted line represents a path where market participants overreact and the dashed
one where they underreact. The dash-dotted line is a strong signal for insider trading,
front running, or any other form of illegal trading.
3.6.1 Predictability
If the EMH holds, returns follow a random walk:

R_t = m + ε_t .

If the sequence (ε_t) is IID with mean zero, variance σ², and zero covariance cov(ε_t, ε_{t−1}) = 0,
then R_t is a random walk with drift m.
Figure 3.16: Possible price reactions as a function of the day relative to the announcement
of a new drug.
With returns given by R_t = (S_t − S_{t−1})/S_{t−1}, the random walk equation implies

(S_{t+1} − S_t)/S_t = (S_t − S_{t−1})/S_{t−1} + ε_t

and after some algebra

S_t = S_0 ∏_{s=1}^{t} ( S_1/S_0 + ∑_{k=1}^{s} ε_k ) .
While in a random walk returns are a zero-sum game, prices are by no means driftless.
Returns are then not predictable, since conditioning on the information set F_t has no value.
When returns are not predictable, prices follow a martingale:
Definition 48. Assume^21

E^Q[S_{t+1} | F_t] = S_t , ∀t . (3.78)

20 Measurability is a well-defined mathematical notion and it is not equivalent to the above verbal
description.
21 The expected value of S exists.
Hence, the expected price of a non-predictable process is constant. The price process has
no drift, else the average value would not be constant. But the price itself can vary. If
returns are martingales, then the operational form of the EMH (3.75) holds true.
The return R_{t+1} in period t to t+1 of a stock is equal to the capital gain plus a
dividend yield D, i.e.

R_{t+1} = (S_{t+1} − S_t)/S_t + D_{t+1}/S_t . (3.80)

Rewriting this equation, S_t = (1/(1 + R_{t+1})) (S_{t+1} + D_{t+1}). Solving this linear difference
equation implies for k periods

S_t = ∑_{j=1}^{k} ( ∏_{m=1}^{j} 1/(1 + R_{t+m}) ) D_{t+j} + ( ∏_{m=1}^{k} 1/(1 + R_{t+m}) ) S_{t+k} . (3.81)
It is common to assume that asset prices grow at a lower rate than the return, so the
second term tends to zero for k → ∞. We get:
Theorem 49. If the asset price growth is lower than the asset returns, the price S_t is
equal to the discounted future dividends, i.e.

S_t = ∑_{j=1}^{∞} ( ∏_{m=1}^{j} 1/(1 + R_{t+m}) ) D_{t+j} . (3.82)
Since there is no randomness, the future dividends are known. We extend this formula
by adding risk and considering the EMH. Consider the operational form of the EMH
(3.75) and take conditional expectations in (3.80):

S_t = E[D_{t+1}|F_t] / E[R_{t+1}|F_t] .

Hence, capital gains do not matter for asset pricing. If dividends are martingales and
returns are random walks, then the famous pricing formula follows where asset prices are
equal to the ratio of the constant expected dividend and expected return; see the next
example.
If expected dividends and returns are constant, the above valuation equation reads

S_t = D/R = constant . (3.83)

But empirical evidence shows that expected returns and dividends are both not constant
over time. Therefore, (3.83) is too naive. It implies that the volatilities of the growth
rates are the same:

volatility(dR_t/R_t) = volatility(dD_t/D_t) .

But the return volatility is around 16% while the dividend volatility is only about 7%.
Therefore something else must be time varying. Furthermore, the return volatility is
itself time varying. Monthly market return volatility fluctuated between values of 20% and
more in market stress periods (Great Depression, Great Financial Crisis) and 2% in the
60s and mid-90s of the last century; see the next section.
Since the pricing formula has the same structure with and without risk, the formula
of the last theorem carries over to the case with risk, providing the ex-ante version:
Theorem 50. If the expected asset price growth is lower than the expected asset returns,
the price S_t is equal to the expected discounted future dividends, i.e.

S_t = ∑_{j=1}^{∞} ( ∏_{m=1}^{j} 1/(1 + E[R_{t+m}|F_t]) ) E[D_{t+j}|F_t] . (3.84)
We show that a small amount of skill makes a huge difference for wealth growth in a
gamble - the same observation holds if one considers skills in active asset management.
Consider an investor with initial capital W_0 playing the following dice game: she invests
in each period 1 unit of her capital. The outcome of the strategy in each period is +1
with probability p or −1 with probability q = 1 − p. She does not change her strategy
over time. The outcomes form an IID sequence (X_k) of random variables. Her wealth
after n plays is W_0 + ∑_{k=1}^{n} X_k.
What is the probability that she attains a final wealth level W_f > W_0? To derive the
wealth dynamics equation, the first step is to define disjoint sets of events which allow to
calculate probabilities. We set

A_{W_0,n} = { W_0 + ∑_{k=1}^{n} X_k = W_f , 0 < W_0 + ∑_{k=1}^{m} X_k < W_f for m < n }

for the set where she reaches the desired wealth level for the first time after n plays
without being bankrupt before. Since the sets (A_{W_0,n})_n are disjoint, the probability
p̃(W_0, W_f) that the investor reaches the desired wealth level W_f sometime is given by

p̃(W_0, W_f) = P( ∪_{n=1}^{∞} A_{W_0,n} ) = ∑_{n=1}^{∞} P(A_{W_0,n}) .
The recursion p̃(W_0, W_f) = p p̃(W_0 + 1, W_f) + q p̃(W_0 − 1, W_f) captures the game
logic. This is a linear difference equation. A solution is found by inserting a guess; with
r = q/p it is

p̃(W_0, W_f) = (r^{W_0} − 1)/(r^{W_f} − 1) , if p ≠ q ;
p̃(W_0, W_f) = W_0 / W_f , if p = q . (3.87)
If the game is fair (a martingale), then the probability of reaching a wealth level 50
percent higher than the starting value of 100 units is 66%. If the investor's strategy has a
small skill component such that q = 0.49 and p = 0.51, then the probability of reaching the
desired level is 98%.
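Both numbers follow directly from (3.87) with r = q/p, as this small sketch shows:

```python
# Probability of reaching W_f starting from W_0 before going bankrupt.
def reach_prob(p, W0, Wf):
    q = 1.0 - p
    if p == q:
        return W0 / Wf            # fair game: linear in the wealth levels
    r = q / p
    return (r**W0 - 1.0) / (r**Wf - 1.0)

print(f"fair  p=0.50: {reach_prob(0.50, 100, 150):.0%}")   # ~67%
print(f"skill p=0.51: {reach_prob(0.51, 100, 150):.0%}")   # ~98%
```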
Consider the regression

x_{t+1} = a + b x_t + ε_{t+1} (3.88)

with a, b constants and (ε_{t+1}) a sequence of IID normal random variables with mean 0 and
variance σ². The variable x_t can be the return itself or a market price variable such as
the price-dividend ratio. The regression (3.88) becomes a random walk if b = 0, or if a = 0,
b = 1 and x_t = R_t. For the latter choice, the random walk regression implies

R_{t+1} = R_0 + ∑_{j=1}^{t+1} ε_j , E_t(R_{t+1}) = R_t , σ²(R_t) = t σ² .
This shows that R is a martingale and that the variance increases over time. R_0 = 0
is a reasonable assumption for short-term returns, and it implies that discounted price
processes are martingales too. Therefore, discounted prices are martingales if the returns
are martingales with zero expected return.
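A short simulation sketch illustrates the point: for IID, mean-zero returns, the slope of a regression of R_{t+1} on R_t is statistically indistinguishable from zero.

```python
# Regress simulated IID returns on their own lag; the slope b is ~0.
import random

random.seed(0)
sigma = 0.16
R = [random.gauss(0.0, sigma) for _ in range(5000)]

x, y = R[:-1], R[1:]
mx, my = sum(x) / len(x), sum(y) / len(y)
b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
print(f"slope b = {b:.4f}")   # close to zero: no predictability
```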
Consider this regression for US stocks and T bills using annual data; see Table 3.20.
Table 3.20: Regression of returns on lagged returns, annual data 1927-2008. t(b) is the
t-statistic value and σ(E_t(R_{t+1})) represents the standard deviation of the fitted value
b R_t (Cochrane [2013]).
The result shows that stock returns are almost not predictable while T bill returns are.
A value of b = 0.04 for stocks means that if returns increase by 10% this year, the
expectation is that they will increase by 0.4% next year. Also, the R² is tiny and the
t-statistic is below its standard threshold value of 2. For the T bill returns the story
is different - high interest rates last year imply that the rates this year will again be
high with a high probability. Can this foreseeability of T bills be exploited by a trader?
Suppose first that stocks were highly predictable. Then one could borrow today and
invest in the stock market. But this logic does not work for T bills since borrowing would
mean to pay the same high rate as one receives. To exploit T bill predictability the
investor has to change his behavior - save more and consume less today - which is totally
different from the stock case. This is a main reason why one considers excess returns R^e
- the return on stocks minus the return on bonds - in forecasting, with R^b the benchmark
return. By analysing the excess return one separates the different motivations 'to consume
less and to save' from the willingness to bear risk. Table 3.20 shows that considering excess
returns we are back, for T bills, in the almost non-predictable stock case. Lo and MacKin-
lay (1999) find that short-run serial correlations are not zero and that the existence of
'too many' successive moves in the same direction enables them to reject the hypothesis
that stock prices behave as random walks. There is some momentum in short-run stock
prices. Even if the stock market is not a perfect random walk, its statistical and eco-
nomic significance have to be distinguished. The statistical dependencies are very small
and difficult to transform into excess returns. Considering transaction costs, for example,
will annihilate the small advantage due to the momentum structure (see Lesmond et al.
[2001]).
We consider longer time horizons and use market prices or yields to forecast returns,
following Cochrane (2005). Following the dividend/price (D/P) discussion of the last section,
we consider the return-forecasting regressions of Cochrane (2013) in Table 3.21. The
regression equation reads

R^e_{t→t+k} = a + b D_t/S_t + ε_{t+k} (3.92)

with R^e the excess return defined as the CRSP^22 value-weighted return less the three-month
Treasury bill return. The return-forecasting coefficient estimate b is large and it grows
with the time horizon:

Horizon   b      t(b)    R²     σ(E_t(R^e_{t+1}))   σ(E_t(R^e_{t+1}))/E(R^e_{t+1})
1 year    3.8    (2.6)   0.09   5.46                0.76
5 years   20.6   (3.4)   0.28   29.3                0.62

Table 3.21: Return-forecasting regressions (Cochrane [2013]).

Hence, high dividend yields D/S (low prices) mean high subsequent returns, and vice
versa. The R² of 0.28 is large when we compare it with the R² of predicting stock returns
on, say, a weekly basis, which are seen to be not predictable. Therefore, excess returns are
predictable by D/P ratios. Fama and French (1988) document that 25 to 40 percent of
the variation in long-holding-period returns can be predicted in terms of a negative
correlation with past returns. Behaviorists attribute this 'forecastability' to stock market
price 'overreaction': investors face periods of optimism and pessimism which cause
deviations from the fundamental asset values (DeBondt and Thaler (1995)).
The above tests are not stable. First, the point estimate of the return-forecasting
coefficient and its associated t-statistic vary significantly if different sample periods are
considered. Second, the definition used for 'dividends' impacts the results. Consider

E_t(R^e_{t+1}) = a + b D_t/S_t . (3.93)

Since the dividend/price ratio varies over time between 1 and 7 percent, return predictability
is the same as saying that expected returns vary over time. Using b = 3.8 and a variation
of D/P by 6 percentage points turns into a long-term variation of expected returns of
3.8 × 6 = 22.8 percentage points, which is too high given that the long-term average
expected return is 7 percentage points.
When we analyze the regression of dividend growth, where D_{t+k}/D_t replaces the return in
(3.92), Cochrane (2013) states: Returns, which should not be predictable, are predictable
[see Table 3.21]. Dividend growth, which should be predictable, is not predictable.
This contradicts the traditional view that expected returns are constant and that if
prices fall then future dividends should also decline: dividends have to be predictable
since they have to approach the low price levels. The above observation states that
on average we observe a different pattern. To deepen the discussion, we consider the
multi-period fundamental asset pricing equation (3.84),

S_t = E_t [ ∑_{j=1}^{∞} ( ∏_{k=1}^{j} 1/R_{t+k} ) D_{t+j} ] . (3.94)
Using log-variables (lower-case symbols) changes products into sums, and from the one-
period version of (3.94) we get

s_t − d_t ≈ E_t ∑_{j=1}^{∞} ρ^{j−1} (∆d_{t+j} − r_{t+j}) . (3.96)
Rearranging, it follows that long-run return uncertainty comes from cash-flow uncer-
tainty (changes in dividends and D/P ratios). The more persistent r and ∆d are, the
stronger is their effect on the D/P ratio, since more terms in the summation matter. If
dividend growth and returns are not predictable, their conditional expectations are con-
stant over time, and then the D/P ratio is constant - which is not observed. This extension
to many periods for the D/P ratio also holds for the variance equation (3.98), where the
discounted summation enters the return and dividend growth variables. As in the one-
period model, the long-run return and long-run dividend growth regression coefficients
must add up to one. Regressing the long-term return and dividend growth, Cochrane
(2013) states:
Return forecasts - time-varying discount rates - explain virtually all the variance of
market dividend yields, and dividend growth forecasts or bubbles - prices that keep rising
forever - explain essentially none of the variance of price.
This changes the traditional view of the EMH. Traditionally, expected returns were
assumed to be constant (asset pricing model) and stocks were martingales with zero drift
(random walks). In this reasoning, low D/P ratios happen when people expect declines
in dividend growth, and variations in D/P are due entirely to cash-flow news (dividend
predictability). The above result states that the opposite is true: the variance of D/P
is due to return news and not to cash-flow news.
Predictability is also related to the volatility of prices. Shiller states that if prices are
expected discounted dividends, then prices should vary less than their expected variables.
But prices vary wildly more than they should, even if we knew future dividends per-
fectly. This is the excess volatility of stock returns pointed out by Shiller.
We claim that return predictability and excess volatility have the same cause. To
obtain an equation for the variance we first write regressions of returns and dividend
growth on d_t − p_t with b_r, b_d the respective coefficients. Plugging the regressions into
(3.95) we get

1 = b_r − b_d , 0 = ε_{t+1,r} − ε_{t+1,d} , (3.97)

where the ε are the residuals of the two regressions. Therefore, the expected return can be
higher if the expected dividend is higher or the initial price is lower. The only way the
unexpected return can be higher is if the unexpected dividend is higher, since the initial
price cannot be unexpected. Since a regression coefficient is a covariance divided by a
variance, 1 = b_r − b_d reads

var(d_t − p_t) = cov(r_{t+1}, d_t − p_t) − cov(∆d_{t+1}, d_t − p_t) . (3.98)

This shows that D/P ratios can only vary if they forecast dividend growth or forecast
returns in the regressions. Since the difference between the two coefficients must be one
by (3.97), if one coefficient is small in the regression then the other one has to be large.
Short-run return dynamics are often modelled by an ARMA(p, q) process,

R_t = ∑_{k=1}^{p} a_k R_{t−k} + ∑_{k=1}^{q} b_k ε_{t−k} + ε_t + c ,

where the IID error terms are ε_t ∼ N(0, σ²). The variance of the error terms is then often
modelled using a Generalised Autoregressive Conditional Heteroskedasticity (GARCH)
model by Bollerslev (1986). The literature documents patterns of persistence which vary
with the asset classes and the markets under consideration. While such patterns are found
on a daily, weekly, monthly, or even annual basis for stocks and bonds, the time
periods are much shorter for FX markets, which we consider below.
Goyal and Jegadeesh (2018) make the two strategies comparable by correcting the
net long/short position. Since more stocks earned positive returns than negative returns
during the sample period, the time-series strategy's long positions are bigger than its short
positions. The average long and short positions are $1.24 and $0.76, respectively. Therefore,
the time-series-constructed portfolio earned returns for simply being net long during a
bullish period. The authors therefore add to the cross-sectional strategy a time-varying
investment in the market equal to the dollar value of the difference between the long
and short sides of the time-series strategy each month. Doing this exercise for NYSE-
quoted stocks, the adjusted cross-sectional strategies show an annual return of 9.4 per-
cent, similar to the 9.3 percent found when using time-series strategies. Therefore, the
literature's claim that time-series return predictability methods dominate cross-sectional
ones is erroneous.
The EMH combines the prices, probabilities, and preferences of investors in a form such
that no profits are possible by trading on the available information, since any profit is
already captured in present prices.
Given the three parts of the EMH - prices, probabilities, and preferences - and the strin-
gent martingale property which combines them, two critical points are immediate: be-
haviour and technology. From a behavioural perspective it is not convincing that these
highly behaviour-sensitive parts of the EMH are tied together by a single mathematical
property in which behavioural facets do not matter: whether investors are greedy or
fear market crashes affects neither how they perceive the odds in price formation (the
probabilities) nor how they value the outcomes of their decisions. The adaptive
EMH of Lo adapts the original EMH and makes it context-dependent and dynamic. The
adaptive EMH becomes a statement dependent on the environment of the economy and
the markets and on the behavior of market participants.
Denition 51 (Lo (2004)). Prices reect as much information as dictated by the combi-
nation of environmental conditions and the number and nature of species [types of agents]
in the economy.
The behavior of the agents is considered to follow evolutionary principles. The dy-
namics of how market participants interact, and therefore the price dynamics of the
assets, is driven by evolutionary principles, which are better suited to describe the
market dynamics than the equilibrium concept in the EMH.
Lo (2004) states the following implications of the adaptive EMH. The risk and reward
relation is not stable over time, since the population of agents and how they interact are
time varying; similarly, the institutional and regulatory set-ups are not constant
over time. A second implication is that temporary arbitrage opportunities are possible,
and therefore the EMH critique of Grossman and Stiglitz (1980) does not apply to the
adaptive EMH. The possibility of temporary arbitrage opportunities also shapes the per-
formance of active investment strategies, which under the EMH are useless. As an example,
one considers the rolling monthly first-order autocorrelation coefficient of the S&P Com-
posite Index returns from January 1871 to 2003. By the EMH, the coefficient should
be zero. The empirical plot shows instead that the coefficient is typically positive, with
periods in which its values cluster. Finally, innovation is the key to
survival. The EMH states that certain levels of expected returns can be achieved simply
by bearing a sufficient degree of risk. Since in the adaptive form the risk/reward relation
is not constant, adaptation to changing market conditions is the main
source of a stable risk/return reward.
Technology matters because it allows market participants to separate valuable information
from noise by permanently processing the extremely large number of signals from the
markets, the news, and other communication types.
The discussion so far considered non-specific individuals. The examples of Warren Buf-
fett and the Renaissance Medallion Fund show that particular skills and expertise allow in-
dividuals to generate excess returns as if the markets were inefficient, while for many
other investors the same markets are efficient. Buffett and the Renaissance Medallion
Fund use their skills to predict future returns in very different forms. Buffett's goal is to
understand a specific firm in detail on an idiosyncratic level and then to embed the firm-
specific investment view in a sector and macro context. The Medallion Fund is a quant
fund founded by the mathematician Jim Simons. The fund, which was set up in 1998 for
the employees of Renaissance, generated in the period 1998 to 2016 an annual return of
80%, with 1999 the only year with a loss, of around 4 percent. The fund generated in this
period more than USD 55 billion in profit, which is more profitable by several billions of
USD than the next best funds. Even more notably, the invested AuM have been smaller
than those of their competitors.
The fund has always been very secretive about the methodology used. One knows that
Simons hired top scientists from the computer industry, notably from IBM, and Ph.D.s
in mathematics or physics from the top universities. The model which they constructed
is based on signal detection. This means that their powerful IT system processes
all types of signals which are generated in the world. Signals include not only simple ones,
such as realized price changes, but also signals detected in speeches and documents.
The success of their model is given by their power to separate
noise from valuable information and then to translate it into trades. The model itself
is not a single strategy encoded by a quantitative model; many different strategies
are integrated into one system.
The general pricing equation reads

S_t = E_t (M_{t,t+1} X_{t+1}) (3.99)

with X_{t+1} the asset's payoff, M the stochastic discount factor (SDF), and M_{t,t} = 1.
Therefore, MS is a martingale. There are different expressions for M; the exact nature of
the SDF depends on the nature of the asset pricing model. Specifying the asset pricing
model specifies M and its name - SDF in general equilibrium, where intertemporal
marginal rates of substitution describe M, or equivalent martingale measure in derivative
pricing models.
Second, since S_t is known at time t, equation (3.99) reads in terms of the gross return -
payoff divided by price -

E_t (M_{t,t+1} R^g_{t+1}) = 1 . (3.101)

Note that also the excess return R^e = R − R^f, with R^f the risk-free return, is orthogonal
to the SDF: 0 = E_t(M_{t+1} R^e_{t+1}). Expanding the expectation of the product, the
expected asset return is expressed with its covariance with the SDF. Using the correlation
notation, this implies for asset j:
Hence, asset prices are equal to an expected discounted cash flow plus a risk premium.
Idiosyncratic risk is by definition the part that is not correlated with the SDF and hence
does not generate any premium. What can be said about the sign of the covariance in
(3.105)? Since the SDF is an indicator of bad times while assets pay off well in good times,
the covariance between them is typically negative. This generates a risk premium and
allows risky assets to pay more than the interest rate.
Setting X equal to the stock price S and writing S̃_t = S_t/M_t, (3.106) can be restated
for S̃. Investors expect positive gross asset returns. The asset price dynamics is not a
martingale under the empirical probability. If the asset price dynamics were a fair coin
toss, then returns would not be predictable. Contrarily, to generate risk premia, asset
prices have to be predictable in the statistical sense.
Example
This orthogonality equation states that the forward price is given by the orthogonal pro-
jection:

f_{t,T} = E_t(S_T) + cov_t(M_T, S_T) R^f .
The forward price is therefore equal to the expected future spot price at time T plus a
risk premium.
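A two-state numerical sketch (illustrative numbers, not from the text) confirms this: the forward price defined by E_t[M_T (S_T − f)] = 0 equals the expected spot plus the covariance term.

```python
# Two-state check of f = E(S_T) + cov(M, S_T) * Rf.
probs = [0.5, 0.5]
M   = [1.10, 0.86]     # SDF is high in the bad state, low in the good one
S_T = [90.0, 110.0]    # the stock pays off badly exactly when M is high

E = lambda x: sum(p * v for p, v in zip(probs, x))
Rf  = 1.0 / E(M)                                        # gross risk-free return
cov = E([m * s for m, s in zip(M, S_T)]) - E(M) * E(S_T)

f = E([m * s for m, s in zip(M, S_T)]) / E(M)           # solves E[M*(S_T - f)] = 0
print(f, E(S_T) + cov * Rf)   # both ~98.78 < E(S_T) = 100: negative covariance
```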
M_t = b R_t + ε_t    (3.108)

with R any portfolio of N assets and b a vector of weights. Here geometry enters into play; we introduce the geometry in the next section. We estimate the optimal value of b. Think of M and R as vectors. Then the optimal value of b is given by the shortest distance of M to R. But this is the perpendicular of M on R, i.e. the orthogonal projection. The noise term is then perpendicular to bR. In other words, regressions are nothing but projections in a suitable space. This geometric notion is made precise in the next section.
We assume that the random variables M, R and ε are square-integrable. The space of returns is an infinite dimensional complete normed vector space H (a Hilbert space). Although of infinite dimension, the geometric intuitions of the Hilbert space R^3 can be applied. In a Hilbert space, the notions of a basis of vectors, orthogonality, projection and least-squares distance common in R^3 are well-defined. The norm is induced by the scalar product for random variables x, y:

⟨x, y⟩ := E(xy) ,  ‖x‖ = E(x^2)^{1/2} .
R = b_1 F_1 + b_2 F_2 + ε ,

where for simplicity we omitted the time index and we consider the regression of a return R on two random variables F, called factors. We switch to this example since it represents the prototype problem in AM. Let R ∈ R^3 and the factor space F be 2-dimensional. We distinguish (i) the factor space is a vector space, and (ii) the factor space is a plane which does not intersect the origin (an affine hyperplane), see Figure 3.17. The second case is generic in our context since the factor space is generated by random variables plus a constant: the risk-free return.
A map P is an orthogonal projection of a real vector space onto a subspace if it is linear, P^2 = P (projecting a projection does not alter the result), and P′ = P. An orthogonal projection in Hilbert space of a vector X on a vector Y reads

P_Y(X) = (⟨X, Y⟩ / ⟨Y, Y⟩) Y =: E(X|Y) ,    (3.110)
Figure 3.17: Left panel - projection on the factor space which is a vector space. Right panel - projection where the factor space is an affine space, i.e. translated away from the zero vector.
The full spanning property holds:

P_F(R) + P_{F^⊥}(R) = R ,  P_F + P_{F^⊥} = I .    (3.111)

Then, for orthogonal factors,

P_F(R) = Σ_{i=1}^{2} (⟨R, F_i⟩ / ⟨F_i, F_i⟩) F_i .    (3.112)

Let F̃_i be linearly independent vectors which are not orthogonal. Set F̃_1 = F_1. To get the second vector, project F_2 on the orthogonal complement of the first one:

F̃_2 = P_{(F̃_1)^⊥}(F_2) = F_2 − P_{F̃_1}(F_2) = F_2 − (⟨F̃_1, F_2⟩ / ⟨F̃_1, F̃_1⟩) F̃_1 .

This vector is orthogonal to the first one, and the construction is continued for the next vector by projecting the third vector on the orthogonal complement of the first two orthogonal vectors.
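A short numerical sketch of this Gram-Schmidt step (an addition for illustration; the factor draws are assumptions), with the inner product ⟨x, y⟩ = E(xy) replaced by a sample average:

```python
# Gram-Schmidt step with the inner product <x,y> := E(xy).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
F1 = rng.normal(size=n)
F2 = 0.6 * F1 + rng.normal(size=n)   # correlated, hence not orthogonal to F1

inner = lambda x, y: np.mean(x * y)  # sample analogue of E(xy)

Ft1 = F1
Ft2 = F2 - inner(Ft1, F2) / inner(Ft1, Ft1) * Ft1  # project F2 off Ft1

print(inner(Ft1, Ft2))  # ~ 0: the constructed vector is orthogonal to the first
```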
If the factor space is an affine space translated by a constant vector a, the projection reads

P_F(R) = a + Σ_{i=1}^{2} (⟨R − a, F_i⟩ / ⟨F_i, F_i⟩) F_i .    (3.113)

Written with covariances,

P_F(R) = a + Σ_{i=1}^{2} (cov(R − a, F_i) / σ²(F_i)) F_i .    (3.114)
Defining cov(x, y)/σ²(y) =: β_{x,y}, standard formulae such as the CAPM follow. Let F = R_M be the single market return factor; then, for a = 0 and omitting the time index,

P_F(R − R_f) = (cov(R − R_f, R_M) / σ²(R_M)) R_M = (cov(R, R_M) / σ²(R_M)) R_M ,    (3.116)

since R_f is a constant.
Consider the matrix form of a linear regression,

y = Xβ + ε ,

with ε, y ∈ R^n, β ∈ R^{K+1} and X ∈ R^{n×(K+1)}. The factor associated with β_1 is the constant 1, i.e. the first column of the matrix X has entry 1 in each cell. Using the above formalism we get:
Proposition 52. Given the above matrix linear regression, the least-squares fit ŷ = P_X(y) = X β̂ is given by

P_X(y) = X (X′X)^{−1} X′ y ,  β̂ = (X′X)^{−1} X′ y .    (3.117)
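A quick numerical check of Proposition 52 (a sketch; the data are simulated assumptions): the hat matrix P_X is idempotent and X β̂ reproduces P_X y.

```python
# Verify P_X = X(X'X)^{-1}X' is a projection and X beta_hat = P_X y.
import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K))])  # first column all 1
y = X @ rng.normal(size=K + 1) + rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(P @ P, P))            # P^2 = P: projecting twice changes nothing
print(np.allclose(P @ y, X @ beta_hat)) # fitted values are the projection of y
```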
24 For example, using ⟨F_j, F_i⟩ = δ_{ij} ⟨F_i, F_i⟩ for orthogonal factors,

P_F(P_F(R)) = P_F( Σ_{i=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) F_i ) = Σ_{i=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) P_F(F_i)
= Σ_{i=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) Σ_{j=1}^{2} (⟨F_j, F_i⟩/⟨F_i, F_i⟩) F_j = Σ_{i=1}^{2} Σ_{j=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) δ_{ij} F_j
= Σ_{i=1}^{2} (⟨R, F_i⟩/⟨F_i, F_i⟩) F_i = P_F(R) .
The SDF can be decomposed in an orthonormal basis (e_j) of the Hilbert space:

M = Σ_{j=0}^{∞} a_j e_j ,

where the sum over |a_j|² is finite. This is the analogue of finite dimensional vectors, except that we need a notion of convergence due to the infinite dimensionality of the Hilbert space. A similar decomposition applies to the factors, and the factors span the SDF if M can be replicated exactly by the factors. If the factors span only a subspace of the total space, then the factor representation of the SDF has an error.
with

β := C_F^{−1} cov(F, R)    (3.120)

the vector of multiple regression betas of the return R on the factors F and C_F the covariance matrix of the F's.
The elements F_j are the risk factors and λ is the factor risk premium. If λ > 0, then an investor is compensated for holding extra risk by a higher expected return when risk is measured with the beta w.r.t. F. The coefficient β is the coefficient of an orthogonal projection of the return R on the space generated by the factors F plus a constant. The risky asset's risk premium is proportional to the covariance between its returns and the SDF (its systematic risk). In the CAPM for example, the market return replaces the SDF. Factors can be abstract random variables, portfolio returns, excess returns or dollar-neutral returns. The model is exact, because there is no error term in (3.119). One can always take factors to have zero means, unit variances and be mutually uncorrelated by using

F̂ := C_D^{−1} (F − E(F)) ,    (3.121)

with C_D the Cholesky factor of the covariance matrix C_F.
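A sketch of this normalization (an illustration added here; it assumes the Cholesky reading C_F = C_D C_D′ of (3.121), and the covariance matrix below is made up):

```python
# Whitening factors: F_hat = C_D^{-1}(F - E(F)) has zero mean, unit variances
# and zero correlations when C_D is the Cholesky factor of C_F.
import numpy as np

C_F = np.array([[1.0, 0.3], [0.3, 2.0]])   # assumed factor covariance matrix
C_D = np.linalg.cholesky(C_F)

rng = np.random.default_rng(3)
F = rng.multivariate_normal(mean=[0.1, -0.2], cov=C_F, size=500_000)

F_hat = np.linalg.solve(C_D, (F - F.mean(axis=0)).T).T
print(np.cov(F_hat.T))                      # ~ identity matrix
```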
When is a factor pricing model exact? By the Riesz theorem, see below, the price of an asset q(S_t) is given by a scalar product ⟨M_{t+1}, S_{t+1}⟩ of the future asset price and the SDF. If the SDF is an element of the space spanned by the factors, then the pricing of the asset can be done using the factors with arbitrary precision: beta pricing and factor pricing models are then equivalent, see Proposition 63. In other words, the quality of the factors used to replicate the SDF determines the quality of the representation of expected returns by their betas.
We consider the Riesz Representation Theorem. It states that any continuous linear functional on a Hilbert space can be represented by a scalar product.

Theorem 55. (Riesz) Let H be a Hilbert space and p : H → R a continuous linear map. There exists a vector r* ∈ H, the Riesz kernel, such that

p(x) = ⟨r*, x⟩

for all x ∈ H.
To apply it in asset pricing, let X be the future payoff of an asset and p(X) the present price of the payoff. This pricing functional maps future payoffs, elements of H, into current prices, the reals. It is natural to assume that the pricing function p is linear: for no-arbitrage reasons we impose value-additivity. Furthermore, p should be continuous: small payoff variations have small price effects. That is, p is a continuous linear functional on the space of future payoffs. The Riesz theorem states that there exists a random variable M such that

p(X) = ⟨M, X⟩ = E(M X) .

Hence, by the Riesz Theorem, to price any payoff it suffices to use M - the stochastic discount factor (SDF) - and to apply the expectation. The theorem does not tell how to construct M.

25 To prove this consider a one-factor model F = R in (3.119). If there is a risk-free asset, then R_0 = R_f.
They face the same endowment (salary) and only differ in their impatience: the time discount rates b_1 and b_2 are different, and hence the time value of money is different. The only asset to invest in the financial market is a risk-free bond B, which they can exchange, i.e. there is no money.

An optimal policy fixes the optimal consumption levels at the two dates and the investment amount in the bond at the first date. These optimizations determine optimal consumption c_i(B) and investment φ_i(B) for each investor. The policies depend on the yet exogenously given bond price B. Inserting these strategies in the market clearing condition fixes the endogenous price B = e^{−R_f} of the bond, i.e. the risk-free interest rate R_f follows from the interaction of the investors. Let φ_k(B) be the number of bonds investor k buys and keeps at time 0. Market clearing means φ_1 + φ_2 = 0: what investor 1 sells (buys), investor 2 must buy (sell). Inserting the individual optimal investment strategy functions fixes the equilibrium risk-free interest rate

R_f = 2 (1 − b_1 b_2) / (b_1 + b_2 + 2 b_1 b_2) .
All quantities which enter symmetrically in the optimization, such as endowment, have to cancel in the equilibrium expressions. The time value of money is driven by impatience. If

26 We follow Cochrane (2005), Back (2010), Campbell and Viceira (2002), Cochrane (2011), Culp and Cochrane (2003), Merton (1971, 1973), Martellini and Milhau (2015), Schaefer (2015) and Shiller (2013).
impatience is zero, the risk-free rate is zero. Other limit or sensitivity cases follow at once. We derive the solution formally by allowing for more heterogeneity. The log preferences are

u_i = ln c^i_0 + b_i ln c^i_1 ,

and the budget constraints read

c^i_0 − e^i_0 = −φ^i / (1 + R_f) ,  c^i_1 − e^i_1 = φ^i ,

with R_f the yet unspecified risk-free rate. We introduce the Lagrangian L:

L_i(c^i, φ^i, λ^i) = u_i − λ^i_0 (c^i_0 − e^i_0 + φ^i/(1 + R_f)) − λ^i_1 (c^i_1 − e^i_1 − φ^i) .
The FOCs read:

∂L_i/∂c^i_j = 0  ⟹  c^i_0 = 1/λ^i_0 ,  c^i_1 = b_i/λ^i_1 ,
∂L_i/∂φ^i = 0  ⟹  λ^i_0 = λ^i_1 (1 + R_f) ,
∂L_i/∂λ^i_j = 0 .
Solving the FOCs together with the budget constraints yields

c^i_0 = e^i_0/(1 + b_i) + e^i_1/((1 + R_f)(1 + b_i)) = PV(e^i)/(1 + b_i) ,
c^i_1 = b_i (e^i_0 (1 + R_f) + e^i_1)/(1 + b_i) ,
φ^i = (b_i e^i_0 (1 + R_f) − e^i_1)/(1 + b_i) ,
λ^i_0 = (1 + R_f)(1 + b_i)/(e^i_0 + R_f e^i_0 + e^i_1) ,
λ^i_1 = (1 + b_i)/(e^i_0 + R_f e^i_0 + e^i_1) .
Assuming that the endowment is the same for both agents, endowment cancels in the last expressions and the above equilibrium rate follows. If risk enters the model, the FOCs become equations with expected values, but the same logic applies.
We derive the fundamental asset pricing equation in the context with risk. Assuming separable preferences, a rational investor derives expected utility from two-period consumption at the present date t and a future date t + 1,

U(c_t, c_{t+1}) = u(c_t) + b E_t(u(c_{t+1})) ,

with b the time preference rate. He chooses investment to maximize expected utility, where consumption is assumed to be already optimally chosen. There is only a single risky asset S and two budget constraints at times t and t+1 (with e the endowment):

c_t − e_t = −φ_t S_t ,
c_{t+1} − e_{t+1} = φ_t X_{t+1} .

Introducing the Lagrangian, the FOCs imply the Fundamental Asset Pricing Equation (3.122) for asset S at time t:

S_t = E_t(M_{t+1} X_{t+1}) ,  M_{t+1} = b u′(c_{t+1})/u′(c_t) .    (3.122)
The SDF relationship between asset prices and consumption states that investments proposed by asset managers should protect investors' optimal consumption in the short and long run. This sound theoretical model has drawbacks. First, investments derived from consumption data often underperform. Second, the assumption and knowledge of a single utility function is unrealistic. Data science is a feasible and powerful alternative.
The ratio of marginal utilities in the SDF reflects that investors value money more when they need it in bad times than in good times. Marginal utility can therefore be seen as an index of bad times, and the SDF, as a substitution measure between present and future consumption, is an index of growth in different times. The price changes of S in the fundamental pricing equation (3.122) can have three causes: the probability p, the discount factor M or the payoff X. There is strong evidence that expected return variation over time and across assets dominates, and that asset valuations move far more on news affecting the discount factor than on news about expected cash flows, that is, the payoff X.
For a risk-free bond with payoff S_T = 1 at maturity, the gross risk-free return is

1 + R_f = S_T / S_0 = 1/S_0 = 1/E(M) .
Assuming a constant relative risk aversion utility function u(c) = c^{1−γ}, 0 < γ < 1, the SDF reads

M = b (c_{t+1}/c_t)^{−γ} = b e^{−γ ln(c_{t+1}/c_t)} ∼ b (1 − γ Δc_{t+1})

up to first order, where Δc_{t+1} = ln(c_{t+1}/c_t). Expanding again up to first order:

1 + R_f = 1/E(M) ∼ (1/b)(1 + γ E_t(Δc_{t+1})) .
Hence interest rates are higher if people are impatient (low b) or if expected consumption growth is high. Since high consumption growth means people get richer in the future, one has to offer a high risk-free rate such that they consume less now and save.
Asking how much R_f varies over time is the same as asking how much one must offer individuals to postpone consumption. This variation is governed by the risk aversion parameter γ. Expanding the risk-free rate relation up to second order:

1 + R_f ∼ (1/b)(1 + γ E_t(Δc_{t+1}) − (γ²/2) σ_t²(Δc_{t+1})) .

Therefore, higher consumption growth volatility lowers interest rates, which motivates investors to save more in uncertain times.
Using the beta representation of the risk premium,

E(R^e_i) = β_i λ .    (3.124)

If assets covary positively with consumption growth, or equivalently negatively with the SDF, then they must pay a higher average return. High expected returns are equivalent to low asset prices. From a risk perspective, the above equations state that average returns are high if the beta on the SDF or on consumption growth Δc is large. This is the above 'bad times - low consumption growth - high SDF - high returns or low asset prices' story.
Using the fundamental equation (3.122) with a risk-free rate and the approximation for the SDF we get:

S_t = E_t(M_{t+1} X_{t+1}) ∼ E_t(X_{t+1})/R_f − γ cov(X_{t+1}, Δc_{t+1}) .    (3.126)

Again, the price is higher if the asset payoff is a good hedge against consumption growth (negative correlation).
scalar product which in our set-up is induced by an expected value: the inner product is ⟨x, y⟩ := E(xy). The main source for this section is LeRoy and Werner (2000).
All random variables defined on the asset span ⟨S⟩ ⊂ R^S also span a Hilbert space. Therefore, the Riesz Representation Theorem applies to functionals on the asset span too. The expectations functional and the payoff pricing functional are of particular interest. Pricing functionals p are linear functionals ⟨S⟩ → R. The extension of p to the whole asset space R^S is the valuation functional. If markets are free of arbitrage, p is strictly positive. If x ∈ ⟨S⟩ is an Arrow-Debreu security, p = ψ is a state price. Hence, p is a linear combination of the basis state prices. If markets are complete, the unique representation ψ(x) = ⟨ψ, x⟩ holds for the valuation and pricing functional. Formally:
Definition 56. The expectations functional E maps every payoff x ∈ ⟨S⟩ into its expectation E(x). The payoff pricing functional p maps every payoff x ∈ ⟨S⟩ into its price p(x).

By the Riesz Representation, for both functionals there exist unique vectors k*, M* such that

E(x) = E(k* x)

and

p(x) = E(M* x) .
The construction of the different kernels is straightforward. For the pricing kernel, consider the two-dimensional set ⟨S⟩ = span{(1, 1)} ⊂ R², with (1/4, 3/4) the probabilities defining the expectation in the inner product and p(s) := 2 s_1 for s = (s_1, s_2) ∈ ⟨S⟩. Since (1, 1) is a basis of the span, the Riesz kernel has to be a multiple a (1, 1) of the basis vector with a ∈ R:

p(1, 1) = 2 × 1 = 2 = E(r* (1, 1)′) = (a/4) × 1 + (3a/4) × 1 = a ,

i.e. a = 2 and the kernel reads r* = (2, 2). To calculate the expectation kernel, let x^1, ..., x^m be m payoffs with S components for the states with probabilities p_s, s = 1, ..., S. Then E(x^j) = Σ_s p_s x^j_s and E(k* x^j) = Σ_s p_s k*_s x^j_s. Since the expectation kernel is in the asset span, k* = Σ_v a_v x^v: the kernel can be spanned by the payoffs with unknown coefficients. But then

Σ_s p_s x^j_s = Σ_v a_v Σ_s p_s x^j_s x^v_s

defines a linear system for the a's. Solving this system and using k* = Σ_v a_v x^v provides the expectation kernel. If there are three states with equal probability and two payoffs x^1 = (1, 1, 0) and x^2 = (0, 1, 1), then the expectation kernel reads k* = (a_1, a_1 + a_2, a_2), and from 2/3 = E(k* x^j), j = 1, 2, the linear system

2/3 = (1/3) a_1 + (1/3)(a_1 + a_2) ,  2/3 = (1/3)(a_1 + a_2) + (1/3) a_2

follows. Solving the system, k* = (2/3, 4/3, 2/3).
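The same linear system can be solved numerically (a sketch added here, reproducing the three-state example above):

```python
# Expectation kernel k* = a1 x1 + a2 x2 from E(x_j) = E(k* x_j), j = 1, 2.
import numpy as np

p = np.array([1/3, 1/3, 1/3])          # equal state probabilities
X = np.array([[1.0, 1.0, 0.0],         # payoff x1
              [0.0, 1.0, 1.0]])        # payoff x2

# A[i, j] = E(x_i x_j); right-hand side is E(x_1), E(x_2) = (2/3, 2/3).
A = np.array([[np.sum(p * X[i] * X[j]) for j in range(2)] for i in range(2)])
rhs = X @ p
a = np.linalg.solve(A, rhs)

print(a, a @ X)                        # a = (2/3, 2/3), k* = (2/3, 4/3, 2/3)
```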
Theorem 57. 1. If the risk-free payoff is in the asset span, then the expectations kernel is risk-free and equal to one in every state.
2. If the risk-free payoff is not in the asset span, then the expectations kernel is the orthogonal projection of the risk-free payoff on the asset span.
3. The pricing kernel is unique regardless of whether markets are complete or incomplete.
4. Let ψ_1, ..., ψ_S be the state prices of the S states and p_s the corresponding probabilities of the states. Then M* is the orthogonal projection of the vector ψ/p on the asset span.
Definition 58. The mean-variance frontier is the set M which consists of all payoffs x ∈ ⟨S⟩ such that there exists no other payoff x′ in the asset span with the same expected value and the same price but a smaller variance.
Hence, a payoff is a mean-variance frontier payoff iff it lies in the span of the expectations kernel and the pricing kernel. Since a return is defined as payoff divided by price, and the price is given by the valuation functional, we have

R^{M*} := M*/p(M*) = M*/E[(M*)²] ,  R^{k*} := k*/p(k*) = k*/E(M* k*) .
Proposition 60. Assume that the pricing and expectation kernels are not collinear.
a) The set of frontier returns is given by the line spanned by the two frontier returns R^{k*} and R^{M*}: for λ a number,

R_λ = R^{k*} + λ (R^{M*} − R^{k*})

is a frontier return.
c) If the risk-free payoff is in the asset span, then the risk-free return is the minimum-variance frontier return. If the risk-free payoff is not in the asset span, then

λ_0 := − cov(R^{k*}, R^{M*} − R^{k*}) / var(R^{M*} − R^{k*})

defines the minimum-variance frontier return R_{λ_0}.
d) Given any frontier return R_λ which is different from the minimum-variance frontier return, there exists a zero-covariance frontier return R_{λ^C}, i.e. cov(R_λ, R_{λ^C}) = 0.
Using this proposition, we can recover beta pricing models. Let R_j be the return of an asset j. Then

R_j = P_E R_j + ε_j

defines the projection on the space E spanned by the two kernels; the epsilon term is orthogonal to this space. Since this space is generated by the expectation and pricing kernel, epsilon is orthogonal to these two kernels and hence has zero expectation and price. This implies that the projected return P_E R_j is a frontier return. We span this return in a new basis, R_λ and the zero-covariance return, i.e. for some parameter β_j,

P_E R_j = (1 − β_j) R_{λ^C} + β_j R_λ .

Taking expectations and the covariance w.r.t. R_λ, which is uncorrelated with the zero-covariance return and the epsilon, implies that the beta coefficient is the ordinary regression coefficient of R_j on R_λ. If the risk-free payoff is in the asset span, the beta pricing equation

E(R_j) = R_f + β_j (E(R_λ) − R_f)

follows. Since the market return in the CAPM turns out to be also a frontier return, R_λ can be replaced by the market return. Hence, the SML of the CAPM is a special case of beta pricing. The analysis holds not only for a single asset but also for a portfolio.
The beta pricing with one factor R_λ is generalized in the above geometric set-up in a straightforward way. The span E is replaced by a span F of K normalized factors f_j, i.e. their expected value is zero, and the risk-free asset x_f. Projecting an arbitrary payoff x_j on the new span space and switching from prices to returns, the usual representation

R_j = E(R_j) + Σ_{k=1}^{K} β_{jk} f_k + ε_j    (3.128)

follows, with the betas the factor loadings. As in the proof of the beta pricing representation, if the pricing kernel and the risk-free asset are elements of F, then the exact factor pricing equation

E(R_j) = R_f + Σ_k β_{jk} λ_k

holds with λ_k = −E(R^{M*} f_k) R_f.
So far we did not consider any equilibrium economy analysis in this representation set-up. To do so, consider a two-period economy where agents derive utility from consumption of a single good, the utility function is a smooth function, individuals are strictly risk averse, there are K factors f_j, and the expected error epsilon in (3.128) conditional on the factors is zero.

Theorem 61. Under the above assumptions, if the risk-free asset, the factors, and agents' endowments at date 0 lie in the asset span and if the aggregate date 0 endowment lies in the factor set, then exact factor pricing holds in any equilibrium in which the consumption allocation is interior.
We finally relate this representation to the case where the SDF M is linearly related to factor returns, such as for the CAPM

M_{t+1} = a + b R_{M,t+1} ,    (3.129)

if the parameters a and b are appropriately chosen. For the mean-variance model,

M_{t+1} = a + b R_{mv,t+1} ,

where R_{mv,t+1} is any mean-variance efficient return. Again, given any R_{mv,t+1} and a risk-free rate, we find an SDF that prices all assets, and vice versa. This shows that the CAPM and the Markowitz model are approximations to the general equilibrium pricing kernel or SDF: the ratio of marginal utilities of consumption at different dates is approximated by affine functions in the market and mean-variance return, respectively.
It is worth expressing the relationship between factor models and beta representations in general, since the expression of a risk premium given in (3.124) is of limited practical use because it involves the unobservable SDF. The idea is to start with investable factors and then derive the beta representation which is equivalent to the SDF approach. The equivalence between factor models and beta pricing models is given in the next proposition.
Proposition 63. A scalar a and a vector b exist such that M = a + b′F prices all assets if and only if a scalar κ and a vector λ exist such that the expected return of each asset j is given by

E(R_j) = κ + λ′ β_j ,    (3.131)

where

λ = − cov(M, F)/E(M) ,  κ = 1/E(M) − 1 .

The K × 1 vector β_j is the vector of multivariate regression coefficients of the return of asset j on the risk factor vector F.
The vector λ is called the vector of factor risk premia. The constant κ is the same for all assets and is equal to the risk-free rate if such a rate exists. We mentioned above that factors often are given neither as payoffs nor as returns, while the fundamental pricing equation is expressed using payoffs. It is possible to replace a given set of pricing factors by a set of payoffs that carries the same information: 'mimicking' means that the new SDF is chosen as close as possible to match the payoff. Summarizing, there is no loss of generality from searching for pricing factors among payoffs.
Cochrane (2013) distinguishes between pricing factors and priced factors. Consider M = a + b′F and the factor risk premia λ of Proposition 63. The coefficient b in the SDF is the multivariate regression coefficient of the SDF on the factors. Each component of the factor risk premia is proportional to the univariate beta of the SDF with respect to the corresponding factor. A non-zero b for a given factor means that the factor adds value in pricing the assets given all other factors - a pricing factor. If the component of the factor risk premia is non-zero, then the factor is rewarded - a priced factor. The two concepts are not equivalent except in the case where all factors are independent.
But there is bad news. Factors are related to consumption data entering the SDF. While multi-factor models try to identify variables that are good indicators of bad vs good times - such as market return, price/earnings ratios, the level of interest rates, or the value of housing - the performance of these models often varies over time. The overall difficulty is that the construction of the SDF from empirical risk factors is more an art than a science. There is no constructive method that explains which risk factors approximate the SDF in all possible future events reasonably well.
So far we did not consider how to choose risk factors for investment. We discuss some theoretical recommendations for the choice of risk factors. First, factors should explain common time variation in returns. Second, assuming that there exists a risk-free rate r_f and M = a + b′F, the definition of the SDF implies for any asset k with return r_k:

b′ cov(r_k, F) = 1 − E(r_k)/(1 + r_f) .

For all assets earning a different expected return than the risk-free rate, the vector of covariances between the risk factors and the asset's return must be non-zero. Regressing the returns on the candidate pricing factors, all assets should have a statistically significant loading on at least one factor. This choice recommendation is model independent.
The next recommendation is based on the APT model. APT not only requires that factors explain common variation in returns; the theory suggests that these factors should also explain the time variation in individual returns. This ensures that the payoff, and hence the price, of an asset can be approximated as the payoff of a portfolio of factors. Therefore, the idiosyncratic terms should be as small as possible. Performing a PCA yields the largest eigenvalues and hence the main factors, as the sketch below illustrates.
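The following sketch is an addition; the return panel is simulated with two hidden common factors, so the two largest eigenvalues of the return covariance matrix stand out.

```python
# PCA factor extraction: leading eigenvectors of the return covariance matrix.
import numpy as np

rng = np.random.default_rng(4)
T, N = 2_000, 50
common = rng.normal(size=(T, 2))                 # two assumed common factors
loadings = rng.normal(size=(2, N))
returns = common @ loadings + 0.5 * rng.normal(size=(T, N))

C = np.cov(returns.T)
eigval, eigvec = np.linalg.eigh(C)               # eigenvalues in ascending order
factors = returns @ eigvec[:, -2:]               # scores on the top 2 components

print(eigval[-4:])   # the two largest eigenvalues dominate the remaining ones
```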
• there are sufficiently many securities available to diversify away any idiosyncratic risk: in a large and diversified portfolio the idiosyncratic risk contributions should be negligible due to the law of large numbers - investors holding such a portfolio require compensation only for the systematic part.
APT assumes neither an economic equilibrium nor the existence of risk factors driving the opportunity set for investments. While the CAPM and ICAPM represent the SDF in terms of an affine combination of factors, APT decomposes returns into factors. The CAPM explains the risk premia; APT leaves the risk premia unspecified.

Assume that there are K factors F_k with a non-singular covariance matrix C_F and N ≫ K returns R_1, ..., R_N. Projecting the returns orthogonally on the set generated by the factors plus a constant gives

R_i = E(R_i) + Σ_{k=1}^{K} β_{ik} F̃_k + ε_i ,    (3.132)

with F̃_k = F_k − E(F_k) the centred factors and idiosyncratic risks ε_i satisfying E(ε_j) = cov(F̃_k, ε_j) = 0 and E(ε_j ε_k) = 0 for all j ≠ k. The restriction that the residuals should be uncorrelated across assets implies

C = β′ C_F β + C_ε ,    (3.133)

where C_ε is a diagonal matrix whose non-zero elements are the variances of the idiosyncratic risks, C_F is the factor covariance matrix and β is a K × N matrix of betas.
Definition 65. The returns in equation (3.132) have a factor structure with the factors F_1, ..., F_K if all residuals are uncorrelated.
To understand APT, assume first that idiosyncratic risks are zero in (3.132). We can derive an exact beta pricing model starting from the fundamental asset pricing equation E(M R_i) = 1. Writing the expectation of the product as the product of single expectations plus the covariance term, inserting (3.132) for the return and rearranging implies the beta pricing equation (3.131) of Proposition 63:

E(R_j) = κ + λ′ β_j .    (3.134)

With non-zero idiosyncratic risks,

E(R_j) = κ + λ′ β_j − E(M ε_j)/E(M) ,    (3.135)

with an additional pricing error term. The idea is that E(M ε_j) → 0 if we increase the number of uncorrelated assets, see Proposition 4. The analysis requires a precise mathematical modelling under the assumption that no arbitrage holds. The APT theorem states that if there are enough assets, then the beta pricing equation is approximately true for most assets.
Example

Consider two assets with two different factor loadings but the same factor F. What relationship holds between their expected returns if there is no arbitrage? Let φ be the weight of the first asset in a portfolio and 1 − φ the weight of the second one. The portfolio return (we set for simplicity the idiosyncratic risk to zero) is linear in the loadings, and no arbitrage implies

μ_R = μ_0 + β λ .    (3.136)
First, the market for real estate is often larger in valuation than the entire stock market. In Switzerland, the value of real estate in 2014 was about 4 to 5 times larger than the value of all companies listed on the SIX exchange. Second, pure real estate risk is illiquid. The annual turnover of privately owned real estate is in the low single-digit percentage range. Table 3.22 illustrates the illiquidity using data from the state of Zurich in 2011.

The median holding period of private persons' homes is 25 years. Hence, the construction of a repeated sales index, which would be a transaction-based price index, is not meaningful. Third, one cannot short property. Fourth, historically, real estate risk is the most prominent and frequent driver of financial crises. Fifth, friction costs for direct real estate transactions are high. Sixth, each property is unique.
What do we mean by real estate risk? Figure 3.19 provides an overview of investments and consumption in the real estate asset class.

Figure 3.19: Different uses of the real estate asset class (extension of Zürcher Kantonalbank (2015)).
A constant quality price index by the US Census Bureau is compared with the Case and Shiller 'Repeat Sales' price index (Case and Shiller [1987, 1989, 1990]) in Figure 3.20.

Figure 3.20: Two indices of US home prices divided by the Consumer Price Index (CPI-U), both scaled to 1987=100. Monthly observations in the period 1987-2013 are considered (Shiller [2014]).
A hedonic model regresses the price S of object j between dates t and t+1 on K object characteristics z and a time dummy D:

S_j^{t,t+1} = β_0 + Σ_{k=1}^{K} β_k z_{k,j} + δ^{t+1} D_j^{t+1} + ε_j^{t,t+1} ,    (3.137)
27 See Fisher, Geltner, and Webb (1994), Hansen (2009), Silver (2018) and Shimizu et al. (2010).
with β the estimated weights of the characteristics. More general models account for time-varying betas over longer time periods. Hedonic models contain between 20 and 30 different characteristics for private property.
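A minimal hedonic-regression sketch in the spirit of (3.137) (an addition; the transaction data, characteristics and the index path are all simulated assumptions):

```python
# Hedonic regression: prices explained by characteristics z and time dummies D.
# The estimated dummy coefficients trace out the price index.
import numpy as np

rng = np.random.default_rng(5)
n, K, T = 5_000, 4, 8
z = rng.normal(size=(n, K))                 # object characteristics
t = rng.integers(0, T, size=n)              # sale period of each transaction
D = np.eye(T)[t]                            # time dummies
delta_true = np.linspace(0.0, 0.3, T)       # hypothetical index path

logS = 1.0 + z @ rng.normal(size=K) + D @ delta_true + 0.1 * rng.normal(size=n)

X = np.column_stack([np.ones(n), z, D[:, 1:]])   # drop one dummy (base period)
coef, *_ = np.linalg.lstsq(X, logS, rcond=None)
print(coef[-(T - 1):])                      # estimated index relative to period 0
```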
Figure 3.20 shows that both indices are smooth over time: for real estate, price momentum dominates price volatility. The boom in house prices after 2000 is visible in the Case-Shiller index but not in the Census Constant Quality Index. The reason is that new homes are built where it is possible and profitable to build them. This is often not the case in the expensive areas of a city. Therefore, the constant quality index level through time is more accurately determined by simple construction costs if, as in the US, there is a huge reservoir of cheap land.
Figure 3.21, left panel, shows that in the mid-1990s house prices in Zurich and London started to grow at different rates. This is in line with London becoming the world's major financial center. In the GFC, the greater vulnerability of the Halifax index is visible, while during the whole GFC house prices in Zurich never fell.

The right panel shows forwards on the Halifax index at different time periods in the GFC period. In May 2007 the forecast was still for increasing house prices: market participants failed to identify the GFC. During the GFC, forward levels of the index corrected sharply in each month. The culmination point was October 2008, where the forward levels were predicted too low but the turning point of the index was identified almost perfectly.
The EMH requires that markets are free of frictions. But in housing markets there are many sources of friction. Figure 3.22 shows friction sources for different types of real-estate investments in Switzerland. 'Direct' means that investors buy houses, 'indirect' means to invest in stocks that are related to housing or in investment funds, and 'derivative' refers to property derivatives defined on property indices.

Figure 3.21: Left panel: the Halifax Greater London price index and the Zurich price index (ZWEX) (ZKB and Lloyds Banking Group). Right panel: Halifax Greater London price index and forwards on the index (Syz and Vanini (2008)).
in London. The markets started in 2005 in the US, where again OTC products dominated. In 2006 the CME launched futures, with very limited success, for residential investment based on the S&P/Case-Shiller Index. The most common transactions are swaps. Derivative instruments allow investors to gain exposure to the real estate asset class without having to buy or sell properties, by replacing the real property with the performance of a real estate return index. The most popular instruments are swaps and total return swaps, while options are much less established.
Consider the case of derivatives on the residential, hedonic, real-transactions property index ZWEX of the Zurich area. In 2006 warrants, calls and puts on the ZWEX were issued to allow investors to protect home owners' capital against falling future house prices (index mortgages, i.e. an ordinary mortgage plus a put on the real estate index) and to offer leveraged investments at the same time. Consider a home owner who seeks protection from falling house prices at the end of his 5-year fixed mortgage contract. He buys a put option on the ZWEX, see Salvi et al. (2008). The put option should finance possible forced amortizations at maturity of the mortgage if the ZWEX falls. Figure 3.23 shows the impact on capital protection under different price scenarios.
Figure 3.22: Frictions for investment in real-estate markets in Switzerland. Lex Koller is a federal law which restricts the purchase of property by foreigners (Syz and Vanini [2008]).
A second example of property derivatives are property swaps, i.e. OTC contracts, see Geltner and Miller. Assume that a small firm BUY wants to invest in real estate without facing the high costs and illiquidity of a direct investment. The firm SELL is over-invested in real estate and wants to sell real estate market risk exposure. Neither party intends to buy or sell the objects they are actually invested in: this circumvents large transaction costs and keeps the regular income stream from the physical objects.

Figure 3.23: Effectiveness of the put option hedge for a 5-year mortgage under three different real estate price scenarios (Syz and Vanini (2008)).

A NCREIF Appreciation Swap ('Swap') allows BUY to swap a fixed return for the NPI appreciation return, i.e. the return of the property index. SELL takes the short position of BUY: he pays the floating, quarterly NPI appreciation return and receives from BUY quarterly the fixed return. Netting of the payments occurs quarterly and notional amounts are not exchanged.
We price the Swap using a replication portfolio and no arbitrage. We assume that it is possible to replicate the NPI return with a portfolio of assets, that there are no frictions, and that short-selling is possible. Although these assumptions are violated in practice, the pricing defines a benchmark which can be compared to the second, equilibrium pricing. The assumptions allow us to construct a risk-less hedge using the replicating portfolio and the swap. We consider two periods spanning the dates t, t+1, t+2, with I_t the value level of the NPI, E[y] the expected income of the NPI with y the same random income in each period, and S the unknown fixed leg / spread of the swap.

The t+1 and t+2 parts of the hedge are risk-less, see Table 3.23. Setting the NPV of the hedge equal to zero - this means excluding arbitrage - implies for the fixed leg

S = R_f − E[y] .    (3.138)

The fixed spread S is independent of the NPI level value; only the borrowing costs of the investor BUY as well as the expected income stream matter. If we consider a total return swap, i.e. all proceeds from the index are also exchanged, then the expected income
                 t                t+1                          t+2
Short Index      I_t              −g_{t+1} I_t − E[y] I_t      −g_{t+2} I_t − E[y] I_t − I_t
Risk-less ZCB    −R_f I_t         0                            I_t
Long Swap        0                g_{t+1} I_t − S I_t          g_{t+2} I_t − S I_t
Hedge            (1 − R_f) I_t    −(S + E[y]) I_t              −(S + E[y]) I_t

Table 3.23: Risk-less hedge of the Swap, long position of BUY. ZCB means zero-coupon bond, D denotes discounting with the risk-free rate and g is the growth rate of the NPI.
E[y] is also part of the index value, and S = R_f follows using the same replication approach.
Define the index risk premium

F_I = E[R_I] − R_f

and decompose E[y] = E[R_I] − E[g] with E[g] the real estate appreciation rate. Then the fixed no-arbitrage spread

S = E[g] − F_I

is equal to the expected index appreciation rate minus the risk premium. A no-arbitrage argument is, however, not really admissible since it is not possible to short I_t; we assume linear pricing rules in equilibrium instead. BUY expects the net return - which consists of the NPI appreciation return E^{BUY}(g), minus S, plus R_f received from the covering bond position - to be not smaller than the swap risk premium:

E^{BUY}[g] − S + R_f ≥ R_f + F_I .

SELL also considers his overall net return. It consists of S plus the expected return on his real estate portfolio E^{SELL}(R_S), which should be as close as possible to the return of the NPI, minus the NPI appreciation return E^{SELL}(g). Since by assumption E[y] is constant and the NPI swap obligation is covered by the bond portfolio, the net risk exposure is zero. Summarizing, SELL's requirement is:
S_t = E_t( Σ_{j=1}^{∞} D_{t+j}/(1 + R)^j ) ,    (3.139)
with R the internal rate of return on expected dividends: for two stocks with the same expected dividends but different prices, the stock with the lower price has to have a higher expected return. Merton's (1973) multi-factor inter-temporal CAPM (ICAPM) generalizes the CAPM to the case of several factors, assuming:

• Investors care about the risk factors market return R_M and so-called innovations Y.
In the Markowitz model, the investment opportunity set consists of all efficient and inefficient portfolios. If the investment opportunity set changes over time, then variables Y other than the market returns drive returns. Working without these factors trivializes human behavior and needs: using the market return only, for example, all investors are jobless since no labor income exists. The possible change of the investment opportunity set is more important for longer-term investment horizons than for shorter ones, since the deviations from a static opportunity set can become larger over longer time horizons. The solution of the ICAPM model generalizes (3.124) to

E_t(R^e_i) = Θ cov_t(R_i, R_M) + Ω cov_t(R_i, Y) ,    (3.140)

where Θ is the average relative risk aversion of all investors and Ω is the average aversion to innovation risk. The mean excess returns are driven by covariances with the market portfolio and with each innovation risk factor. The geometric intuition of this beta pricing model is the same as in the case with fixed opportunity sets. The first term in (3.140) is mean-variance efficient, but the total portfolio is no longer mean-variance efficient. Economically, the average investor is willing to give up some mean-variance efficiency for a portfolio that better hedges innovation risk. The mutual fund theorem of the Markowitz model generalizes to a K+2 fund theorem if there are K innovation risk sources. Investors will split their wealth between the tangency portfolio and K portfolios for innovation risk.
3.8 Applications

3.8.1 Low Volatility Strategies

In many empirical studies, low-beta stocks outperform high-beta stocks, and volatility negatively predicts equity returns (negative leverage effect), see Haugen and Heins (1975), Ang et al. (2006), Baker et al. (2011), Frazzini and Pedersen (2014), Schneider et al. (2016). This means that high beta (risk) is not rewarded as it should be according to the asset pricing equations. These observations define the beta and volatility low-risk anomalies.
There are different ways to rationalize these anomalies by enlarging the models which lead to them. Schneider et al. (2016) show that taking equity return skewness into consideration rationalizes these anomalies. They generalize the CAPM, which serves as an approximation, and allow for higher moments of the return distribution. This leads to skew-adjusted betas. They use the creditworthiness of the firms as the source for skewness in returns: the higher a firm's credit risk, the more the CAPM overestimates the firm's market risk, because it ignores the impact of skewness on asset prices (Schneider et al. (2016)). Returns benchmarked against the CAPM then appear to be too low since the CAPM fails to capture the skewness effect. Formally, starting with (3.124), defining the regression coefficient β_i = cov(M, R_i)/var(M) and the variable λ = −var(M)/E(M), we get the equation equivalent to (3.122)

E_t(R^e_i) = − (cov(M, R_i)/σ(M)) (σ(M)/E(M)) .    (3.141)
Schneider (2015), Kraus and Litzenberger (1976) and Harvey and Siddique (2000) define the risk premium as the difference between the expected value of a derivative X under the historical probability P and under the risk-neutral probability Q:

Risk Premium = E_t^P(X_T) − E_t^Q(X_T) .    (3.142)

The two probabilities P, Q can be related to each other by the state price density L:28

L = dQ/dP ,  E^P(L) = 1 .    (3.143)
To illustrate the technique, consider two states with probabilities P = (1/2, 1/2) and Q = (1/3, 2/3). Then in state 1, L_1 = (1/3)/(1/2) = 2/3, and in state 2, L_2 = (2/3)/(1/2) = 4/3. Therefore,

E^P(X) = p_1 X_1 + p_2 X_2 = (X_1 + X_2)/2 = E^Q[X/L] = q_1 X_1/L_1 + q_2 X_2/L_2 .
Using M = L in (3.141) and the risk premium for the market return we get:

E_t(R^e_i) = (cov(L, R_i)/cov(L, R_M)) E_t(R^e_M) .    (3.144)

28 The Radon-Nikodym derivative L (mathematics), the state price density (economics), the likelihood ratio (econometrics).
The expected return on asset i is proportional to the expected excess return on the market, scaled by the asset's covariation ratio with the pricing kernel - the true beta. Since L is not observable, the authors approximate L(R) := E^P(L|R) by a power series in R.29 Using a linear and a quadratic approximation of L in (3.144) changes the true beta into a CAPM beta (linear case) or a skew-adjusted beta in the quadratic case.

'... a firm's market risk also explicitly depends on how its stock reacts to extreme market situations ... and whether its reaction is disproportionally strong or weak compared to the market itself. A firm that performs comparably well ... in such extreme market situations, has a skew-adjusted beta that is lower relative to its CAPM beta. ... investors require comparably lower expected equity returns for firms that are less co-skewed with the market.' Schneider et al. (2016)
To incorporate time-varying skewness in stock returns, the authors consider corporate credit risk using the Merton (1974) model. In this model, equity value at the maturity date is a European call option on the firm value with strike equal to debt (which is a zero-coupon bond). For firms with high credit risk, the increased probability of default is reflected in a strongly negative skew of the return distribution. The forward value of equity is then given by the expected value of the call option discounted with the SDF M = L under P. Together with the call option value, this forward value defines firm i's excess equity return R^e_i. The expected gross return is given by (3.144), with the linear or quadratic approximation replacing the SDF. For the linear CAPM, the betas increase with credit risk, i.e. the asset volatility or the leverage, and with the firm's correlation to the market. Comparing this beta with the skew-adjusted one, the latter is in general larger. The difference increases with credit risk: the firm becomes more and more an 'idiosyncratic risk factor', and hence less connected to the market, the stronger the skew is. In this sense the CAPM approximation overestimates expected equity returns, i.e. the return anomaly.
Schneider et al. (2016) apply their model implications to low-risk anomalies, the so-called Betting-Against-Beta (BAB) strategy, see Frazzini and Pedersen (2014). BAB is based on the empirical observation that stocks with low CAPM betas outperform high-beta stocks. Hence, investors believing in BAB go long a portfolio of low-beta stocks and short a portfolio of high-beta stocks. To reach an overall zero beta, the strategy takes a larger long than short position. The strategy is financed with riskless borrowing. Frazzini and Pedersen (2014) document that the BAB strategy produces significant profits across a variety of asset markets. Using empirical evidence from 20 international stock markets, Treasury bond markets, credit markets, and futures markets, Frazzini and Pedersen (2014) ask:
• How can an unconstrained arbitrageur exploit this effect, i.e., how do you bet against beta?

29 To achieve this, L is written as an infinite series. The coefficients in the series depend on P, Q, i.e. the price dynamics of the assets, and the risk aversion of the investor. Geometrically, the representation of L is equivalent to orthogonal projections of L on the space generated by the powers of R.
• What is the magnitude of this characteristic relative to the size, value, and momentum effects?

They find that for all asset classes, alphas and Sharpe ratios almost monotonically decline in beta. Alphas decrease from low-beta to high-beta portfolios for US equities, international equities, treasuries, credit indices by maturity, commodities and foreign exchange rates. Constructing the BAB factors within 20 stock markets, they find for the US a Sharpe ratio of 0.78 between 1926 and March 2012, which is twice as much as the value effect and still 40% larger than momentum. The results for international assets are similar. They also report that BAB returns are consistent across countries, time, within deciles sorted by size, and within deciles sorted by idiosyncratic risk, and are robust to a number of specifications. Hence, coincidence or data mining are unlikely explanations.
The BAB strategy is rationalized in the model of Schneider et al. (2016) as follows. The CAPM betas increase, for fixed credit risk (fixed volatilities and leverage), with the firm's correlation to the market: buy stocks with low and sell stocks with high correlation to the market. The alpha of this strategy, the excess expected return relative to market covariance risk, is given by the firm's expected return compensation for skewness. These typically positive alphas increase with increasing credit risk. Summarizing, the BAB returns can be directly related to the return skewness induced by credit risk.

The relative impact of both explanations can vary over time. During the tech bubble of 1999-2000, for example, cheap value stocks - which typically are cheaper because they are riskier - were cheaper because investors were making errors.
The two explanations behave differently when a strategy becomes known. In the rational model, the value strategy still works, but at a level consistent with the equilibrium demand and supply side. The equilibrium property conserves both the expected return and the risk of the strategy.

In the behavioural explanation, the risk source is not systematically linked to the return in equilibrium. There is no systematic demand and supply as in the equilibrium model to guarantee that the risk premium will not go away. It is therefore difficult to be convinced that the risk remains stable over time.
Asness (2015) compares these two different views using historical data and the Sharpe ratio. If a strategy impacts the risk premia as it becomes more common, the Sharpe ratio is expected to fall, either because excess return diminishes or because risk increases. With regard to the returns, one could argue that if the value strategy becomes more popular, then the 'value spread' between the long and short sides of the strategy gets smaller. This spread measures how cheap the long portfolio is versus the short portfolio. If more and more investors invest in this strategy, then both sides face a price movement - the long side is bid up and the short side is bid down. This reduces the value spread.

The author uses the FF approach for value factor construction. He calculates the ratio of the book-to-price ratio (BE/ME) of the cheapest one-third over the BE/ME of the most expensive one-third of large stocks. Clearly, cheaper stocks always have a higher BE/ME than the expensive stocks. But the point is to compare how the ratio of large-cheap over large-expensive changes over time as an approximation of the attractiveness of the value strategy. Considering 60 years of data, the ratio is very stable, with a 60-year median value of 4. There is no downward or upward trend. The only two periods during which the ratio grew significantly - reaching a value of 10 - correspond to the dot-com bubble and the oil crisis of 1973. This measurement shows little evidence that the simple value strategy was arbitraged away in the last 60 years.
To analyze the risk dimension, the annualized, rolling, 60-month realized volatility of the value strategy for the last 56 years is considered. Again, the dot-com bubble is the strongest outlier, followed by the GFC and the '73 oil crisis. There is again little evidence that the volatility of the strategy is steadily rising or falling. The attractiveness of a strategy is best measured by the in- and outflows of investment in the strategy. Increasing inflows should, on a longer time scale, increase the return of a strategy, and the opposite holds if large outflows occur. This was not observed in the above return analysis.
• Conservative investors are advised to hold more bonds relative to stocks than aggressive investors. This contrasts with the constant bond-stock ratio in the tangency portfolio of the CAPM. This is the asset allocation puzzle.

• The judgement of risk may be different for long-term and short-term investors. Cash, which is considered risk-free for the short term, becomes risky in the longer term since it must be reinvested at an uncertain level of real interest rates.
If the interest rate return is IID, then the optimal strategy is the myopic one, i.e. the second (hedging) term is zero. Assume returns are not IID. If the investor becomes more risk averse, RRA^{−1} tends to zero. A conservative investor will not invest in the risky asset to capture its short-term risk premium but will rather fully hedge the future risk of the risky asset. Hence, short-term market funds are not a risk-less asset for a long-term investor. Campbell and Viceira (2002) show that the risk-less asset is in this case an inflation-indexed perpetuity or consol. Note that for all results an individual investor's viewpoint is considered, but not an equilibrium. Hence, possible equilibrium feedback effects on the asset prices and returns are missing.
Predictable asset returns lead to a hedging demand. If equity is predictable, there will be an inter-temporal hedging demand for stocks. Campbell and Viceira (2002) consider a model where long-term investors face a time-varying opportunity set due to changing interest rates or changing equity risk premia. A striking result is that a conservative investor will hold stocks even if the expected excess return of the stock is negative. How is he compensated for doing so? We first assume that the covariance between risky asset returns at two consecutive future dates is negative. This captures that equity returns are mean-reverting: an unexpectedly high return today reduces expected returns in the future. This describes how the investment opportunities related to equity vary over time. If the average expected return is positive, the investor will typically be long stocks. Given a negative correlation, for stocks with a high return today the future return will be low, and hence the investment opportunity set deteriorates. The conservative investor wants to hedge this deterioration. Stocks are just one asset that delivers increasing wealth when investment opportunities are poor. Figure 3.24 illustrates, for a conservative investor, three alternative portfolio rules.
Figure 3.24: Portfolio allocation to stocks for a long-term investor, a myopic investor,
and for a CIO choosing the TAA (Campbell and Viceira [2002]).
The horizontal line represents the optimal investment rule if the expected excess stock return is constant and equal to the unconditional average expected excess stock return. The TAA is the optimal strategy for an investor who observes, in each period, the conditional expected stock return. The myopic strategy and the TAA cross at the point at which the conditional and unconditional returns are the same. The TAA investor is a myopic investor with a one-period horizon. The SAA line represents the optimal investment of a long-term investor. There is a positive demand for stocks even if the expected return is negative. This reveals that the whole discussion in this section can be seen as describing the structure of strategic asset allocation (SAA); in fact, Formula (3.13) can be transformed accordingly.

The long-term investor should hold long-term, inflation-indexed bonds and increase the average allocation to equities in response to the mean-reverting stock returns (time-varying investment opportunities). Empirical tests suggest that the response to changing investment opportunities occurs with a higher frequency for stocks than for the interest rate risk factor. Therefore, this long-term weight or SAA should be periodically reviewed and the weights should be reset.
1. Liability profile - the degree to which the investor must service short-term obligations, such as upcoming payments to beneficiaries.

2. Investment beliefs - whether the institution believes long-term investing can produce superior returns.

3. Risk appetite - the ability and willingness of the institution to accept potentially sizable losses.

4. Decision-making structure - the ability of the investment team and trustees to execute a long-term investment strategy.

Comparing this with the optimal investment formula (3.14): point 3 is captured by risk aversion, point 2 defines the asset universe selection of the model, and point 1 is part of the utility function.
The WEF (2011) report considers the question of who the long-term investors are. It builds the following five categories: family offices with USD 1.2 trillion AuM, endowments or foundations with USD 1.3 trillion AuM, SWFs with USD 3.1 trillion AuM, DB pension funds with USD 11 trillion AuM and life insurers with USD 11 trillion AuM. Matching these different types of investors to the four constraints listed above leads to the following long-term investment table (source for the table is WEF (2011) and the many sources cited therein):
The following model portfolio construction of Ang et al. (2018) provides a practitioner's approach to long- and short-term investment in asset classes. Their model portfolios are parametrized by the investor's preferences, such as risk tolerance and the selection of the asset universe. Their construction combines three portfolios: a performance benchmark reflecting the investor's risk appetite, a strategic model portfolio constructed relative to the benchmark which reflects long-term views on the market, and finally a tactical model portfolio mimicking short-term views.
Table 3.24: 'Decision' represents the decision-making structure, D the average duration and 'Estimated' the estimated allocation to illiquid investments (WEF [2011]).
The benchmark portfolio φ_B is a fixed equity-bond portfolio, say 80/20. The chosen fraction mimics the risk tolerance of the investor. Such benchmarks can be implemented at low costs, and the performance of more complicated portfolios can be measured against them without difficulty. The strategic portfolio φ_S is the solution of a mean-variance optimization problem relative to φ_B, where both the risk aversion and the covariance matrix are long-term parameters. Several constraints are used, such as equalizing the equity component of the strategic and the benchmark portfolio, long-only, full investment and many more. The short-term or tactical portfolio φ_T also solves a mean-variance problem, where short-term expected returns and covariance matrix parameters enter. The two main constraints are ⟨φ_T, e⟩ = 0, i.e. it is a zero-dollar long-short portfolio which shapes the strategic allocation, and ⟨φ_T, C_S φ_T⟩ = 1, i.e. the short-term risk aversion follows from this constraint. This short-term portfolio is weighted by market signals, implying the so-called long-short combined portfolio φ_C = ⟨w, φ_T⟩, where the weights w_i add up to one. Adding φ_C + φ_S defines the target portfolio φ*. Finally, the model portfolio φ_M is the portfolio which minimizes the variance ⟨φ − φ*, C_S (φ − φ*)⟩ subject to the full-investment constraint and linear constraints for the asset classes.
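The final minimization step can be sketched as follows (an addition, not the authors' implementation; the covariance matrix, the target weights and the long-only bound are assumed inputs):

```python
# Model portfolio: minimize tracking variance to the target phi_star
# subject to full investment and a long-only bound (one possible linear constraint).
import numpy as np
from scipy.optimize import minimize

C_S = np.array([[0.04, 0.01, 0.00],
                [0.01, 0.09, 0.02],
                [0.00, 0.02, 0.02]])        # assumed short-term covariance matrix
phi_star = np.array([0.55, 0.25, 0.20])     # assumed target phi* = phi_C + phi_S

objective = lambda phi: (phi - phi_star) @ C_S @ (phi - phi_star)
cons = ({'type': 'eq', 'fun': lambda phi: phi.sum() - 1.0},)  # full investment
bounds = [(0.0, 1.0)] * 3

res = minimize(objective, x0=np.ones(3) / 3, bounds=bounds, constraints=cons)
print(res.x)                                 # model portfolio phi_M
```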
The authors use liquid ETFs to implement the approach. Besides the broad equity and bond ETFs which enter the benchmark portfolio, they use ETFs for the styles momentum, minimum volatility, value, quality and size. These more complicated indices are added in the strategic portfolio and the tactical portfolio. Figure 3.25 illustrates their model portfolio construction.
The figure shows that the starting point is to choose a performance benchmark reflecting the preferences, i.e. blending the MSCI USA Index with the MSCI USA Minimum Volatility Index. Active risk of 250 bps relative to the performance benchmark is added in the strategic model portfolio. For the model portfolio capturing short-term tactical views on the chosen styles, the factors are re-weighted relative to the strategic portfolio. This adds an additional average of 110 bps of risk; the full amount of active risk relative to the performance benchmark is approximately 300 bps. The model is tested using data from Jan 2000 to Jun 2017. Note that not all styles existed for the whole period, i.e. the index values are then theoretically calculated. The final model portfolio generated an annual return of 8.9%, outperforming the performance benchmark by 3.4% per year. The outperformance is attributed to two sources.
Figure 3.25: Tactical U.S. equity model portfolio construction process. White: MSCI USA Index; red: MSCI USA Minimum Volatility Index; blue: MSCI USA Momentum Index; green: MSCI Risk Weighted Index; light blue: MSCI Value Index (Ang et al. (2018)).
First, the strategic portfolio tilts towards factors which possess inherent and persistent risk premia. Second, the short-term indicators have some ability to predict factor returns; these time-varying active positions versus the strategic benchmark generate excess return.

Comparing this approach with the general optimal decision-making formula, the two components of long term and short term are present. The way they enter the final model portfolio is a multi-stage process which consists of several plausible particular optimizations. Whether the whole approach is optimal at all is not considered. Furthermore, since one-period decisions are made in each model type, risks are not distributed over time in an optimal way but split into a kind of static long-term part and a varying short-term allocation.
interval.
Are equities less risky than bonds in the long run? Siegel states (Siegel [1994]):
It is widely known that stock returns, on average, exceed bonds in the long run. But it is little known that in the long run, the risks in stocks are less than those found in bonds or even bills! [...] But as the horizon increases, the range of stock returns narrows far more quickly than for fixed-income assets [...] Stocks, in contrast to bonds or bills, have never offered investors a negative real holding period return yield over 20 years or more. Although it might appear riskier to hold stocks than bonds, precisely the opposite is true: the safest long-term investment has clearly been stocks, not bonds.
Using the standard deviation, Siegel advises long-term investors to buy and hold equities due to the reduced risk of stock returns at long horizons. But such a risk reduction only holds if stock returns are mean reverting, i.e. if returns are not IID. We showed, however, that a long-term buy-and-hold strategy is not optimal: the optimal strategy is a strategic market timing strategy with a mixture of myopic and hedging demand components. Hence, following Siegel's advice is not optimal, and the converse also holds: an optimal long-term investment strategy does not produce Siegel's suggested portfolio weights.
The herding of pension funds. Pension funds consider, by their very definition, an infinite time horizon in their investments since each year there are new entrants to the pension scheme. As long-term investors, one would expect pension funds to focus on their long-term investment strategies. They should therefore behave differently than typical short-term asset-only managers. But there is a different investment motivation which may counteract long-term investment behavior: the fear of underperforming relative to their peer group, which defines such funds' incentive to herd.
Such herding may be stronger for institutional investors than for private investors. First, there is more trade transparency between institutional investors than between private investors. Second, the trading signals that reach institutional investors are more correlated and hence increase the likelihood of eliciting similar reactions. Finally, because of the size of the investments, institutional herding is more likely to result in stronger price impacts than the herding of private investors. Therefore, adopting a position outside the herd has a stronger return impact for an institutional investor than for private clients.
Blake et al. (2015) study the investment behavior of pension funds in the UK, analyzing on an asset-class level to what extent herding occurs. Their data set covers UK private sector and public sector defined-benefit (DB) pension funds' monthly asset allocations over the past 25 years. They present information on the funds' total portfolios and asset class holdings, and are also able to decompose changes in portfolio weights into valuation effects and flow effects.
The authors also find that pension funds mechanically rebalance their short-term portfolios if restrictions in their mandates are breached. They therefore, on average, buy in falling markets on a monthly basis and sell in rising markets. This is suboptimal given the optimal investment rule (3.140). Therefore, pension funds' investments fail to move asset prices toward their fundamental values, and hence do not stabilize financial markets. The market exposure of the average pension fund and the peer-group benchmark returns match very closely the returns on the relevant external asset-class market index. This is evidence that pension fund managers herd around the average fund manager: they could simply invest in the index without paying any investment fees.
As a final result, the pension funds studied captured a negative liquidity premium, contrary to the expectation that these long-term investors should be able to provide liquidity to the markets and earn a risk premium in return.
Chapter 4
Portfolio Construction
4.1 Steps in Portfolio Construction
So far, we did not consider the logic of portfolio construction but used different portfolios in examples on an ad hoc basis. Several steps define portfolio construction:
• Allocation of assets: How much wealth (weight) do we invest at each date in the specific assets?
The grouping of the assets or asset selection can be done on different levels:
• Single assets
• Risk factors
The implementation of the asset allocation can be done using different liquid assets:
• Cash products such as stocks and bonds
• Options
Further implementation issues are liquidity, tax and compliance (eligibility, suitability
and appropriateness).
We assume that investors use the expected utility criterion as a rule of choice: the higher the expected utility of an investment, the more such an investment is preferred. Like any mathematical model, expected utility theory is an abstraction and simplification of reality. There exists a large academic literature reporting systematic violations of the predictions of expected utility theory in investors' empirical behavior. A prominent alternative is prospect theory by Kahneman and Tversky (1979), which is also an optimization problem but enriches models such as Markowitz's with typical observed behaviors. But most investment theories used in practice are still based on expected utility theory.
The theory assumes that investors form correct beliefs and that they choose optimal actions or decisions. The beliefs define the probabilistic set-up for the dynamics of future returns. One optimal action is the choice of the portfolio weights over time. The optimal decision is based on the investor's preferences, which are represented by her utility function. Optimization requires maximizing expected utility subject to constraints such as the budget constraint. Decision problems in terms of mathematical optimization have been an active field of research for decades.
If investors face situations where risks (probabilities) are not known, uncertainty dominates. Then it makes little sense to rely on optimal investment theory; heuristic reasoning is used instead, see Section 4.2.4. We further assume that investors are impatient: they prefer 1 CHF today to 1 CHF tomorrow.
The mean-variance model was the first model in portfolio optimization based on the return-risk trade-off. Markowitz stated in 1952: The investor should consider expected return a desirable thing and variance of return an undesirable thing. Three methods are common to operationalize this principle:
1. The investor chooses a portfolio φ to maximize the expected return where volatility cannot exceed a predefined level σ, or
2. volatility is minimized such that the expected return cannot be lower than a predefined level r, or
3. a risk-aversion weighted trade-off between expected return and variance is maximized.
All three formulations are equivalent. We formalize the ideas. Consider N risky assets with a return vector R in a single period. The expected returns are µ = E(R) and the covariance matrix of the returns is C = E((R − µ)(R − µ)′). The objective is to maximize the quadratic utility function which reflects the trade-off between reward and risk:1
u(R) = φ′R − (θ/2) φ′(R − µ)(R − µ)′φ .
Taking expectations,2
EP(u(R)) = φ′µ − (θ/2) φ′Cφ .
Optimization means finding a portfolio φ which maximizes the above expected utility, i.e.
max_φ EP(u(R)) = max_φ ( φ′µ − (θ/2) φ′Cφ ) . (4.1)
1 We always assume that the utility functions are continuously differentiable.
2 The factor 1/2 cancels a factor of 2 which arises when differentiating the quadratic term to calculate the optimal portfolios.
The unconstrained first-order condition yields
φ∗ = (1/θ) C⁻¹µ . (4.2)
The matrix C⁻¹ is the information matrix. The elegance of this formula, the simplest one in the Markowitz model, cannot be overstated: just plug in the information matrix and the expected returns and you get an optimal portfolio. But both inputs are not observable and hence must be estimated. What is the best way to do this? This question led to half a century of academic research and to frustration among practitioners using this model. We consider the reasons in detail below. Suppose that there is only one risky asset and a risk-free asset with return µf. Then the above optimal rule reads:
φ∗ = (1/θ) (µ − µf)/σ² . (4.3)
The fraction (µ − µf)/σ² is the market price of risk. It is proportional to the Sharpe ratio.
An investor with zero risk aversion puts all the money in the asset with the largest expected return. If risk aversion is not zero, then since risk is always positive, the higher the risk, the lower the optimal level of expected utility. Formula (4.2) states that the optimal amount invested in each asset is given by a mix of the expected returns of all assets, with the information matrix doing the mixing. What is the intuition for how the information matrix acts? Does it favour diversification? Again, we consider this below.
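The rule (4.2) is easy to check numerically. Below is a minimal sketch, with illustrative values for µ, C and θ that are not taken from the text:

import numpy as np

# Assumed illustrative inputs: two risky assets
mu = np.array([0.05, 0.08])            # expected returns
C = np.array([[0.04, 0.01],
              [0.01, 0.09]])           # covariance matrix
theta = 3.0                            # risk aversion

# Unconstrained mean-variance optimum (4.2): phi* = C^{-1} mu / theta
phi_star = np.linalg.solve(C, mu) / theta

# Expected utility at the optimum: phi'mu - theta/2 * phi'C phi
eu = phi_star @ mu - 0.5 * theta * phi_star @ C @ phi_star
print(phi_star, eu)   # note: without constraints the weights need not sum to one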
The success of mean-variance optimization is mathematically due to the success of quadratic programming (QP): the problems are easy to solve and powerful mathematical software is available. Since mean-variance optimization does not imply diversification, the meaningfulness of an allocation depends on the chosen constraints. Fortunately, many practical problems with constraints can be rewritten as QP problems. A quadratic program (QP) is an optimization of a quadratic objective function subject to linear inequality constraints:
φ∗ = arg max_φ ( φ′µ − (θ/2) φ′Cφ ) , V φ ≤ Z (4.4)
where C is an N × N matrix and V, Z are a constraint matrix and vector, respectively. The constraints Vφ ≤ Z allow for equality constraints (budget or full-investment constraint), inequality constraints or band constraints a ≤ φ ≤ b (asset class bounds in a TAA). QP problems are solved using active set, gradient projection and interior point methods.
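As an illustration of (4.4), the following sketch solves a small constrained mean-variance problem with a general-purpose solver (SLSQP) rather than a dedicated QP code; all inputs are assumed for illustration:

import numpy as np
from scipy.optimize import minimize

# Assumed illustrative data, not taken from the text
mu = np.array([0.05, 0.08, 0.06])
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.06]])
theta = 3.0

def neg_utility(phi):
    # negative of the QP objective in (4.4)
    return -(phi @ mu - 0.5 * theta * phi @ C @ phi)

constraints = [{"type": "eq", "fun": lambda phi: phi.sum() - 1.0}]  # full investment
bounds = [(0.0, 1.0)] * 3                                           # long-only band constraints

res = minimize(neg_utility, x0=np.ones(3) / 3, bounds=bounds,
               constraints=constraints, method="SLSQP")
print(res.x)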
Several practical variations of the Markowitz problem are in fact QPs; see Perrin and Roncalli (2019) for details. The first one is to consider a general benchmark b. The difference e = φ − b between an actively managed portfolio φ and the benchmark b defines the active bets of the investor, and µ(φ, b) = ⟨φ − b, µ⟩ is the expected excess return or expected tracking error. The tracking error volatility TE is by definition the volatility of this difference:
TE = σ(φ, b) = σ(e) = √(⟨φ − b, C(φ − b)⟩) . (4.7)
Minimizing the tracking error volatility while maximizing the expected excess return (or the alpha) can be written as the QP
φ∗ = arg max_φ ( φ′µ̃ − (θ/2) φ′Cφ ) , (4.8)
where µ̃ = µ + θCb is the regularized vector of expected returns; see below for regularization.
Second, consider index sampling, i.e. replicating an index portfolio b with a smaller number of assets than the index contains. The goal is to minimize tracking error volatility under constraints such as the full-investment and long-only constraints, which are linear. The extra constraint ensuring that the number of assets is smaller than the index size can be written as
Σj χ{φj > 0} ≤ s ,
with χ the indicator function and s the desired number of assets. Although the indicator function is non-linear, the resulting constraint formulation is linear and hence the whole problem remains of the QP type. Other relevant models consider a turnover constraint, i.e. the amount of assets sold and bought in an optimization is limited, or transaction cost constraints, where the expected portfolio return net of bid-ask trading costs enters; these are also QP problems.
An insightful investor doubts that the probability law is known. He could therefore consider the investment situation where different probabilities matter in the portfolio choice problem; then uncertainty matters besides risk. Formally, let P be a set of admissible probabilities. The optimization becomes
min_{P∈P} max_φ EP(u(R)) = min_{P∈P} max_φ ( φ′µ − (θ/2) φ′Cφ ) . (4.9)
The investor assumes that out of all possible probabilities (who defines this set?) the worst one is chosen by a second player called 'nature'. This defines a robust optimization problem. The solution will be more conservative than the original one. If one asset is risk-free, this asset will attract a large part of the invested money. Although theoretically sound, robust investments in this sense are hardly used since the wealth allocation is often too conservative and it is difficult to single out the set of admissible probabilities. We do not consider this approach any further.
• The constraints define the admissible set A(ξ). Examples are the full investment constraint, the budget constraint, the maximum and minimum amounts for each asset class, a turnover constraint or a downside risk bound.
4.2.1.1 Examples
Consider a single-period investment problem where the investor derives utility u(W1) from final wealth W1. The investor chooses a portfolio φ ∈ Rⁿ of n assets to maximize E(u(W1)) under the two budget constraints at times 0 and 1: Σj φj Sj,0 = W0, with Sj,0 the price of asset j, and W1 = Σj φj Sj,1. The first-order condition (FOC) for optimality reads:
E(u′(W1)(Ri − Rj)) = 0 , (4.10)
for all asset pairs i, j. This equation has several implications. First, Ri − Rj means that a long-short combination is optimal. Second, the FOC also holds if one asset is the risk-free asset. Third, geometrically the condition states that the excess return vector and marginal utility are orthogonal to each other. Fourth, assume that the investor is risk averse, u′′ < 0. Then it is never optimal to fully invest in the risk-free asset. By contradiction, assume that the investor puts all his initial wealth in the risk-free asset. Then final wealth W1 is non-random, u′(W1) is deterministic and can be taken outside the expected value in (4.10). But then, unless all risky returns are the same, the FOC cannot be satisfied.
The utility function defines risk preferences. Consider an investor who is given the choice between a lottery that pays off either 50 or 100 with equal probability and a lottery with a guaranteed payoff of 75: the bet has the same expected value as the guaranteed payoff. A risk-neutral investor is indifferent between the two lotteries; a risk-averse investor prefers the guaranteed payoff.
Figure 4.1 shows the payoffs and utilities for the risk-averse and the risk-neutral investor. For the risk-averse investor, the expected value of the bet also lies on a straight line, but its utility value (yellow dot) is strictly lower than the utility of the guaranteed payoff (red dot). A risk-averse investor needs an extra compensation 'red minus yellow dot' to become indifferent.
reasonable strategy. But every constraint has an economic price: the shadow price. The larger this price, the lower the constrained optimum compared to the unconstrained one. Furthermore, adding many ad hoc constraints makes it difficult to explain whether a portfolio is optimal due to the investor's preferences or due to the many constraints. Often in wealth management several dozen constraints are imposed - constraints expressing the client's preferences ('not investing in hedge funds'), compliance constraints ('Chinese bonds are excluded') or CIO-related constraints ('the weight of Swiss equity is between 20 and 40% for a specific investor').
We show the loss of utility in restricted optimization. The optimal value of an unrestricted optimization problem is never worse than the value of a restricted problem. Consider the minimization of the parabola u(x, y) = x² + y². The minimum is achieved at the vector (0, 0) and the optimal value is u(0, 0) = 0. We insert the restriction x + y = r > 0. This means that x and y are positioned on a line. The optimal values are x = y = r/2 and u(r/2, r/2) = r²/2, which is larger than the optimal unrestricted value. The Lagrange multiplier λ associated with the constraint x + y = r has the value λ = r (the shadow price). Since the unrestricted optimum is at the origin, the larger we choose r, that is, the more distant the line is from the origin, the more value is lost. This is exactly the statement of the shadow price.
Fact 66. Optimal dynamic investment allows distributing investment risk not only in the cross-section (single-period models) but also over time.
Despite the meaningfulness of multi-period models, most investment models used in practice are static ones. There are three main reasons. First, in the past technology was not able to solve dynamic problems in time, i.e. machines were not fast enough. Second, most asset managers are well educated in static models, but knowledge about dynamic models is sparse. Third, static models are already flawed by parameter uncertainty (estimation risk); the intertemporal set-up adds additional uncertainty.
Optimal dynamic investment is able to take into account changing future investment opportunities in an optimal way. Static models have no foresight to react today to what could happen in future periods. Changing investment opportunities are key for long-term investors such as pension funds.
Consider the case where you have to drive from New York to Boston. Using a repeated static model (forward induction), you decide at each crossroad, given the traffic situation, which direction to follow next. Using this strategy you will never arrive in Boston. Dynamic optimality means that you start with the end in mind: you work backwards starting in Boston. At each crossroad in the backward approach, you calculate whether it is best to turn left or right, knowing that all decisions which follow are optimal.
The origin of the GOP (growth optimal portfolio) is attributed to Kelly (1956), which also leads to the Kelly criterion in investment. Kelly was not interested in investment but wrote his work with gambling and information theory in mind. The Kelly strategy, i.e. a GOP, is an optimal strategy such that with probability one the strategy accumulates more wealth in the long run than any other strategy. The expression 'with probability one' is key: deleting it leads to wrong statements and decisions concerning the GOP.
To motivate the GOP, consider a binary gamble, see Rotando and Thorp (1993). Let W0 be initial wealth, Bk the bet at trial k, p the probability of winning and q the probability of losing the bet. Then,
E(Wn) = W0 + Σ_{k=1}^{n} (p − q) E(Bk) .
If the game's expectation is positive, p > q, then maximizing E(Wn) is the same as maximizing E(Bk) at each trial. Therefore, it is optimal to bet all resources in each trial - B1 = W0 is the starting bet. The ruin probability of such a strategy is 1 − pⁿ, i.e. bankruptcy occurs quickly almost surely. Conversely, minimizing the ruin probability also minimizes the expected return. The GOP is an intermediate strategy between these two over-aggressive and over-timid strategies.
Consider the strategy of investing a fixed fraction c of present wealth in the next bet, i.e. Bk = cW_{k−1}. If s is the number of successful bets and f the number of failures in n bets, then
Wn = W0 (1 + c)^s (1 − c)^f .
If 0 < c < 1, ruin is impossible. Using the compounding identity
Wn = W0 e^{n log (Wn/W0)^{1/n}} ,
the exponential growth rate per trial is
log (Wn/W0)^{1/n} = (s/n) log(1 + c) + (f/n) log(1 − c) .
Setting G(c) equal to the expected value of this growth rate, we get
G(c) = p log(1 + c) + q log(1 − c) .
3. The fastest time to reach a target wealth level starting from any level W is given
asymptotically by a strategy which maximizes expected log-wealth utility.
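For the binary gamble, G(c) = p log(1 + c) + q log(1 − c) is maximized at the Kelly fraction c∗ = p − q (set G′(c) = 0 and use p + q = 1). A short sketch with an assumed win probability confirms this:

import numpy as np

p, q = 0.6, 0.4  # assumed win/loss probabilities (illustrative), p > q

def G(c):
    # expected log-growth per trial when betting the fraction c of wealth
    return p * np.log(1 + c) + q * np.log(1 - c)

c_grid = np.linspace(0.0, 0.99, 9901)
c_num = c_grid[np.argmax(G(c_grid))]
print(c_num, p - q)   # numerical maximizer vs. Kelly fraction c* = p - q = 0.2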
Rotando and Thorp apply the GOP to S&P investing using data from 1926-1984. First, they calculate the probability of a return below the T-bill return. This probability decreases from 38% for n = 2 years to 21% after ten years and to 8% after 30 years. The optimal fixed fraction to invest is 117%, i.e. it is optimal to borrow 17% of existing wealth each year. This suggests that the GOP needs long-term investment horizons and that the optimal strategy is leveraged. Summarizing, a GOP has the theoretical advantage of the maximum rate of growth of wealth, but it turns out to be too risky in practice.
These results triggered many discussions about the usefulness of the GOP. A main critique was formulated by Samuelson in the 1960s. He states that if one is not willing to accept a single bet, then one rationally will never accept a sequence of such bets: if the ruin probability is not acceptable for the first year's investment given c∗, I will never accept 30 bets of this type. Samuelson rejects the implied non-transitivity of preferences. Thorp answered that the limit GOP respects transitivity.
risk-free rate r. Then, even with a Sharpe ratio of 0.5, it would take almost 30 years to beat the risk-free bond with 90% probability.
Summarizing, GOPs are too risky: no money manager can survive offering such a strategy if he is hit, say, twice in the first 5 years of his mandate. The impatience of investors rules out any long-term investment strategy which focuses on maximum return growth without controlling the possible finite-horizon shortfall risks. But controlling for shortfall risks in a mathematical way brings us back to a return-risk framework. A different approach is to mix the mathematics of the GOP with business experience by selecting for the GOP those stocks which are not expected to have shortfall risks. W. Buffett seems to apply an investment approach of this form.
One reason for the use of heuristics arises if one distinguishes between risk and uncertainty. According to Knight (1921), risk refers to situations of perfect knowledge about the probabilities of all outcomes for all alternatives. This makes it possible to calculate optimal choices. Uncertainty refers to situations in which the probability distributions are unknown or unknowable - that is to say, risk cannot be calculated at all. Situations of known risk are relatively rare. Savage (1954) argues that applying standard statistical theory to decisions in large, uncertain worlds would be utterly ridiculous because there is no way of knowing all the alternatives, consequences, and probabilities. Using optimal solutions in a world with uncertainty just adds non-controllable model risk. Understanding when people use statistical models in decision-making and when they prefer heuristics requires studying how the human brain functions, see Camerer et al. [2005] and Glimcher and Fehr [2013].
Ellsberg (1961) invented the following experiment to reveal the distinction between risk and uncertainty. An individual considers the draw of a ball from one of two urns:
• Urn A has 100 balls, 50 red and 50 black.
• Urn B has 100 balls, with an unknown mix of red and black.
First, the subjects are offered a choice between the following two bets:
• USD 1 if the ball drawn from urn A is red and nothing if it is black.
• USD 1 if the ball drawn from urn B is red and nothing if it is black.
Second, the same subjects are offered a choice between the following two bets:
• USD 1 if the ball drawn from urn A is black and nothing if it is red.
• USD 1 if the ball drawn from urn B is black and nothing if it is red.
In both cases, the first bet is generally preferred in experiments. That is, individuals believe in the first case that the number of red balls in urn B is less than 50%, and in the second case the same individuals assume that the number of black balls in urn B is also smaller than 50%. These probability assessments are inconsistent. Ellsberg's interpretation was that individuals are averse to the ambiguity regarding the odds for the ambiguous urn B. They therefore prefer to bet on events with known odds. Consequently, they rank bets on the unambiguous urn A higher than the risk-equivalent bets on B.
Caballero (2010) and Caballero and Krishnamurthy (2008) consider the behavior of investors in the following flight-to-quality episodes:
• 1970 - The default of Penn Central Railroad's prime-rated commercial paper caught the market by surprise.
• 1987 - The speed of the stock market's decline led investors to question their models.
They find that investors re-evaluated their models, behaved conservatively or even disengaged from risky activities. These reactions cannot be captured by an increased risk aversion towards macroeconomic phenomena. The reaction of investors in an uncertain environment is fundamentally different from that in a risky situation with a known environment.
In spring 2015, uncertainty about the future of Greece in the EU increased. Four different scenarios were considered:
• A: Status quo. Greece and the EU institutions agree on a new reform agenda such that Greece receives the remaining financial support of EUR 7.2 billion from the second bailout package.
• B: Default with subsequent agreement between the EU and Greece. There is no agreement under A; Greece fails to repay loans and there is a bank run in Greece. The ECB takes measures to protect the European banking sector.
• C: Default without a subsequent agreement.
• D: Grexit - that is, Greece leaves the eurozone. Greece stops all payments and the ECB abandons its emergency liquidity assistance. Similar conclusions hold for the Greek banking sector as under C. Greece needs to create a new currency since the country cannot print euros.
The evaluation of the four alternatives is related to uncertainty and not to risk: the probability of each scenario is not known, there are no historical data with which to estimate the probabilities, and the scenarios have dependencies, but of a fundamental cause-effect type which cannot be captured by the statistical correlation measure. This shows that valuable management judgment is tied to situations of uncertainty.
The use of 'uncertainty' and 'risk' does not follow clear standards and conventions in practice. A volatility index such as the VIX is sometimes called a measure of uncertainty: if volatility increases, one often states that uncertainty increases. Strictly speaking this makes no sense since the VIX is a calculated index of risk. Hence, risk increases or decreases, but this has a priori no relation to uncertainty. A similar logic is the investors' statement that if uncertainty increases, markets often become more volatile and equity markets fall (the negative leverage effect), or that if uncertainty increases, then credit spreads of corporates or governments should widen. Again, risk and uncertainty are used interchangeably. The year 2016 provides an example of why one should not mix risk and uncertainty. In 2016 many events happened where it was impossible to calculate risk - Brexit, the election of Trump, increasing geopolitical tensions in the Middle East, political instability in major countries such as Brazil and Turkey. There were, for example, no data to assess the risk of the Trump election. If large uncertainty meant large risk, then heavy market reactions should have followed. But most asset classes ended the year with positive returns; there was almost no market reaction to the events. Furthermore, plotting an uncertainty index such as policyuncertainty.com against credit spreads measured in USD shows that uncertainty increased in 2016 while the spreads fell.
The 60/40 portfolio turns out to be not diversified enough when markets are distressed or booming. The dot-com bubble and the financial crisis of 2008 revealed that different asset classes moved in the same direction and behaved as if they were all of the same type, although capital diversification was maintained: risk weights are not the same as dollar weights.
Deutsche Bank (2012) reports the following risk contributions, using volatility risk measurement, for 60/40 portfolios of the S&P 500 and US 10y government bonds. The long-term risk contribution by asset class, 1956 to 2012, was 79/21 percent - quite different from the 60/40 capital diversification. The risk contribution of US government bonds in extreme market periods varied between 53% in 1998 and 7% in 1973.
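Such risk contributions follow from the Euler decomposition σp = Σi wi (Cw)i / σp. A sketch with assumed stock/bond parameters (not the Deutsche Bank inputs) illustrates how a 60/40 dollar split can become roughly a 90/10 risk split:

import numpy as np

w = np.array([0.60, 0.40])        # 60/40 capital weights
vol = np.array([0.15, 0.05])      # assumed annual volatilities (illustrative)
rho = 0.2                         # assumed stock-bond correlation
C = np.outer(vol, vol) * np.array([[1, rho], [rho, 1]])

sigma_p = np.sqrt(w @ C @ w)
rc = w * (C @ w) / sigma_p        # Euler risk contributions, sum to sigma_p
print(rc / sigma_p)               # percentage risk contributions, approx. 92%/8%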
The left panel in Figure 4.2 illustrates the strong positive correlation between equities and a balanced portfolio: world-wide equity portfolios are compared to a balanced equity and bond portfolio. The linear relationship between the two returns with low variability indicates that a single global equity portfolio is as good as a balanced equity-bond portfolio. The performance and risk of traditional balanced portfolios are mostly driven by the equity quota. The R² is 95%, i.e. 95% of the risk is explained by equity risk. Hence, asset classes consist of a bundle of 'risk factors', where the same risk factors can belong to several asset classes. This extends to all assets in the case of systemic liquidity events: the monthly dollar returns between the classic asset classes and alternative classes show rather low correlations between 2000 and 2007, but the correlations increase sharply during the GFC and remain elevated as the sovereign debt crisis follows in 2011. This failure of alternatives to diversify during the GFC led to critique of the diversification concept based on asset classes per se, see Figure 4.2. In the middle panel, commodities and hedge funds are added to the balanced portfolio. While the variability increases, one still sees that equity risk factors are driving the returns; the allocation of risk is only slightly improved. Still 90% of the risk is explained by the equity risk factor. Finally, if one replaces equities by bonds in the right panel, a cloud-type scatter plot follows. This indicates that equity and not bond risk factors are the return drivers.
The time-varying correlation in Figure 2.18 shows that the correlation between stocks and bonds varies over time. Historically, periods of rising inflation and heightened sovereign risk have driven stock and bond correlations sharply positive.
Figure 4.2: Left panel: monthly return of world equities vs monthly return of a balanced portfolio (world equities: 50%, world bonds: 50%), Bloomberg: 12/1998-3/2013. Middle panel: monthly return of world equities vs monthly return of a balanced portfolio (world equities: 40%, world bonds: 40%, commodities: 10%, global hedge funds: 10%); commodities database: DJUBSTR, hedge funds database: HFRXG. Right panel: monthly return of world bonds vs monthly return of a balanced portfolio (world equities: 50%, world bonds: 50%), Bloomberg: 12/1998-3/2013, local data.
In contrast, correlations often turned negative when inflation and sovereign risk were at low levels.
Summarizing, for the 60/40 asset allocation based on asset classes, correlations between asset classes are time-varying, not risk-stable and difficult to forecast. Risk weights are not the same as dollar weights. Asset classes do not seem to be the right level for risk aggregation.
The maximization is done subject to the dynamic budget constraint for the wealth dynamics Wt. Wealth growth is driven by the price evolution of a single risky asset S, a risk-free asset B and the consumption rate at each date. The risky asset S follows a geometric Brownian motion with constant drift µ and volatility σ, and the risk-free asset grows at the rate r. Inserting this information provides us with the dynamic budget constraint
dW = (φµW + (1 − φ)rW − c) dt + σφW dB
with B the standard Brownian motion. The optimality principle of Bellman, starting in t0 for a period t0 + dt, reads:
V(t0, W0) = max_{c,φ} E( ∫_{t0}^{t0+dt} u(t, c, W) dt + V(t0 + dt, W0 + dW) ) . (4.12)
Hence, the value at t0 is equal to the sum of the optimal utility over the short time dt plus the value reached at t0 + dt, i.e. all decisions are optimal after t0 + dt. Expanding the future value in a Taylor series and using the dynamics of the assets transforms the above equation into a non-linear partial differential equation for the value function V. The solution of this equation implies the following optimal strategies:
V(W) = α∗ W^a , c∗ = W (aα∗)^{1/(a−1)} , φ∗ = (µ − r)/σ² · 1/(1 − a) , (4.13)
where α∗ is the explicit solution of an algebraic equation involving the preference and growth rate parameters. The optimal investment in the risky asset φ∗ is equal to the market price of risk (MPR) (µ − r)/σ² times the relative risk aversion coefficient 1/(1 − a). The MPR is itself proportional to the Sharpe ratio (which is also the solution of the Markowitz problem). This validates the claim that the Markowitz solution also holds in a dynamic context unless the investment opportunity set changes over time, see Section ??. Optimal consumption is proportional to the wealth level, which is reasonable. There are many extensions of the basic Merton model - many assets, adding income, allowing for a bequest motive, adding linear investment constraints. However, analytical tractability is lost in most extensions.
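A quick numerical illustration of the investment rule in (4.13), with assumed parameters:

# Merton's constant-mix rule (4.13): phi* = (mu - r) / (sigma^2 * (1 - a))
mu, r, sigma = 0.07, 0.02, 0.20   # assumed drift, risk-free rate, volatility
a = -1.0                          # power-utility exponent; relative risk aversion 1 - a = 2

phi_star = (mu - r) / (sigma**2 * (1 - a))
print(phi_star)                   # 0.625: invest 62.5% of wealth in the risky asset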
Assume that risk is needed to finance the goals. Goal-based investment (GBI) means finding a strategy φ(t) which maximizes the probability
P(WT ≥ G) (4.14)
of reaching the goal G with terminal wealth WT. To this objective function one adds the asset dynamics, the initial wealth level and additional constraints. Assume that there are N risky assets which are all coupled by a time-varying but deterministic covariance matrix C and where each asset has a time-varying expected return µ(t). There is a riskless asset with a time-varying deterministic short-term rate r(t). The asset dynamics defines the wealth dynamics dWt starting at W0. The optimal policy, using the Bellman principle, is derived by Browne (1999):
φ^S(t) = ( C⁻¹(t)Θ(t) / √(∫_t^T Θ′(s)Θ(s) ds) ) · ( φ(N⁻¹(z(t))) / z(t) ) · Wt (4.15)
with the discount factor D(t, T) = e^{−∫_t^T r(s) ds}, φ the density function of a standard normal distribution, N the associated cumulative distribution function, Θ = C⁻¹(µ − re) the market price of risk (MPR), e an N-dimensional unit vector, and z(t) = Wt/(G D(t, T)) the percentage of the discounted goal reached at time t.
• The investor or asset manager observes at each date t the wealth Wt and then chooses the investment for the next (infinitesimal) period according to the optimal formula. The problem can be discretized in order to obtain real investment periods.
• At each date the deterministic expected means and covariances enter. These functions can be determined by the CIO office or the advisory function using an SAA and TAA approach. Besides the actual values, the values for the remaining lifetime also matter. Changing these forecast values at time t therefore implies a reshaping of the optimal investment policy at this date. Given the simplicity of the optimal formula, the investment universe can be set up with a large number of different assets, ensuring diversification of wealth growth.
• Suppose that all assets lose value from the beginning for some time. If wealth has dropped enough in value, there is not enough time left for the wealth level to beat the goal. Then the investor has to borrow, to inject additional money or to reduce the size of the goal. Browne shows in an example that for T = 10y the wealth has to drop by more than 62% in the first year before borrowing becomes necessary. If there is only one month left, the investor must borrow unless wealth has already covered 88% of the distance to the investment goal.
The approach can be generalized to include income and consumption streams, beating a
benchmark portfolio and controlling for downside risk, see Browne (1999).
Definition 68. 1. If a portfolio offers a larger expected return than another portfolio for the same risk, then the latter portfolio is strictly dominated by the first one.
2. Portfolios that are not strictly dominated are called mean-variance efficient. The set of these portfolios forms the efficient frontier.
3. The portfolio φm at the point D is the global minimum variance (GMV) portfolio.
The lines 1, 2b and the line between D and B are efficient frontiers.
4.3. PORTFOLIO CONSTRUCTION EXAMPLES 311
Figure 4.3: Portfolio frontiers in the two-asset case. The portfolio opportunity set is a
hyperbola in the portfolio coordinates expected return and standard deviation.
1. There are N risky assets and no risk-free asset. The prices of all assets are exogenously given.
2. There is a single time period. Hence risks cannot be distributed over time but only in the cross-section.
5. Assets are infinitely divisible. Without this assumption, we have to rely on integer programming, which makes sense and which today is feasible.
8. The vectors e, µ are linearly independent. If they are dependent, then the optimization problem does not have a unique solution.
312 CHAPTER 4. PORTFOLIO CONSTRUCTION
9. All first and second moments of the random variables exist, i.e. the mean and covariance are well-defined.
We define the auxiliary variables a = ⟨µ, C⁻¹µ⟩, b = ⟨e, C⁻¹e⟩, c = ⟨e, C⁻¹µ⟩, ∆ = ab − c² and
A = ( a c ; c b ) .
Proposition 69. Consider N risky assets and the above assumptions. Then the Markowitz problem
min_{φ∈Rⁿ} (1/2) ⟨φ, Cφ⟩ (M) (4.16)
s.t. ⟨e, φ⟩ = 1 , ⟨µ, φ⟩ = r ,
has the solution φ_MV = r φ∗1 + φ∗2 with
( φ∗1 ; φ∗2 ) = A⁻¹ ( C⁻¹µ ; C⁻¹e ) . (4.18)
The portfolio weights are linear in the expected portfolio return r. Inserting φ_MV into the variance implies the optimal minimum portfolio variance σp²-hyperbola:
σp²(r) = ⟨φ_MV, Cφ_MV⟩ = (1/∆) ( r²b − 2rc + a ) . (4.19)
Diversification in the mean-variance model means that adding more assets causes the efficient frontier to widen: for the same risk, a higher expected return follows (see Figure 4.4).
The Markowitz model fails to be stable in the following sense. Consider a GMV portfolio with two assets; the optimal portfolio then only depends on the covariance and not on the returns. Suppose that both assets have a volatility of 20 percent and a full positive correlation of 1. Then the optimal weights are 50 percent in each asset. Suppose next that asset 1 has only 19.9% volatility, all other numbers unchanged. Then 100 percent is invested in this asset and zero in the second one.
Example
Consider three assets with expected returns (20%, 30%, 40%) and covariance matrix
C = ( 0.10 0.08 0.09 ; 0.08 0.15 0.07 ; 0.09 0.07 0.25 ) .
Figure 4.4: Different efficient frontiers for different numbers of assets. It follows that adding new assets allows for a higher expected return at a given risk level (measured by the portfolio standard deviation). The portfolio with the lowest standard deviation is the global minimum variance (GMV) portfolio (Ang [2012]).
We assume that the investor requires a minimum return of r = 30%. He could fully invest in asset 2 to achieve this return goal. But the optimization shows that he can reach this target with lower risk; the short numerical check below verifies this.
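A short numerical check of this example, solving the two-constraint problem (M) via its Lagrange conditions (the printed numbers follow directly from the stated inputs):

import numpy as np

mu = np.array([0.20, 0.30, 0.40])
C = np.array([[0.10, 0.08, 0.09],
              [0.08, 0.15, 0.07],
              [0.09, 0.07, 0.25]])
e = np.ones(3)
r = 0.30

Ci_mu, Ci_e = np.linalg.solve(C, mu), np.linalg.solve(C, e)
a, b, c = mu @ Ci_mu, e @ Ci_e, e @ Ci_mu
lam = (b * r - c) / (a * b - c**2)   # Lagrange multipliers of (M)
gam = (a - c * r) / (a * b - c**2)
phi = lam * Ci_mu + gam * Ci_e

print(phi.round(3))                     # approx. [0.283, 0.434, 0.283]
print(np.sqrt(phi @ C @ phi).round(3))  # approx. 0.328 < sqrt(0.15) = 0.387 for asset 2 alone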
Example
Consider two assets with expected returns (1, 0.9) and covariance matrix
C = ( 0.10 −0.10 ; −0.10 0.15 ) .
Asset 1 seems more attractive than asset 2: it has the higher expected return and the lower risk. Naively, one would invest fully in the first asset. But the negative correlation makes an investment in asset 2 necessary to obtain an optimal allocation. The expected return constraint is set equal to r = 0.96. We consider four strategies:
• φ1 = (1, 0), full investment in asset 1.
• φ2 = (1/2, 1/2), an equal distribution.
• φ3 = (5/9, 4/9), the optimal Markowitz strategy without the expected return constraint.
• φ∗MV = (0.6, 0.4), the optimal Markowitz solution with the expected return constraint.
The following expected portfolio returns and variances hold for the different strategies:
Strategy   µ       σP²
φ1         1       0.1
φ2         0.95    0.0125
φ3         0.955   0.011
φ∗MV       0.96    0.012
φ1 satisfies the expected return condition, but its risk is much larger than in all other strategies - a lack of diversification. The risk of φ3 is minimal, but its return is smaller than required. To generate the required return and keep risk minimal, 40 percent has to be invested in the seemingly unattractive asset. This is the Markowitz phenomenon: to reduce the variance as much as possible, a combination of negatively correlated assets should be chosen.
4.3. PORTFOLIO CONSTRUCTION EXAMPLES 315
Figure 4.5 shows that portfolios on the efficient frontier are under-diversified, and that this becomes more pronounced for higher risk levels. Furthermore, the steep vertical changes in the asset allocation indicate that the allocations are not robust: small changes in covariance data lead to large changes in the asset allocations. Does a Markowitz portfolio provide reasonable diversification for portfolios over time? The answer, see Figure 4.10, is again no: one observes under-diversification and non-stability of the asset allocation.
Figure 4.5: Efficient allocations for 21 different portfolios. The first portfolio is the GMV portfolio; moving to the right, optimal portfolios on the efficient frontier follow. Data 1991-2016, monthly data, long-only portfolio constraint.
Proposition 70. Any minimum variance portfolio can be written as a convex combina-
tion of two distinct minimum variance portfolios.
10 Sometimes called the 'two fund theorem' or 'separation theorem'.
Formally, if φ∗MV(r) is any optimal minimum variance portfolio, then for any two other distinct optimal minimum variance portfolios φ∗1(r), φ∗2(r) there exists a function ν(r) such that
φ∗MV(r) = ν φ∗1(r) + (1 − ν) φ∗2(r) . (4.20)
The entire mean-variance frontier can be generated from just two distinct portfolios. This holds since the efficient frontier is a one-dimensional affine subspace of Rⁿ. The Mutual Fund Theorem allows investors to generate an optimal portfolio by searching for cheaper or more liquid portfolios and investing in them in the prescribed way. This theorem led to the growth of the mutual fund and ETF industry. The Mutual Fund Theorem also holds for some dynamic models, such as the Merton model of the last section. But if there are risk sources for assets which cannot be hedged, then more than two funds are needed to construct an optimal investment strategy. In general, the structure of the investor's preferences and the structure of the asset markets both determine whether a mutual fund theorem is valid.
With a risk-free asset, the efficient frontier becomes a straight line which has at least one point in common with the risky-asset efficient frontier - the case where the portfolio is fully invested in risky assets. The portfolio where the two frontiers intersect is the tangency portfolio T (see Figure 4.6, left panel). Natural candidates for the mutual fund theorem are the tangency portfolio and the risk-less-asset investment. In the right panel of Figure 4.6, different portfolios on the efficient frontier are shown. Investors can add cash to become more conservative or borrow cash for an aggressive investment. The portfolios on the Capital Market Line (CML) depend on the investor's preference θ in (4.1): the higher the risk aversion, the closer the point on the CML is to the risk-free investment. Ang (2012) estimates an aggregate risk aversion parameter value as follows. He calculates the optimal minimum variance portfolio using USA, JPN, GBR, DEU, and FRA risky assets only. Then he adds a risk-free asset and searches for the point on the CML that delivers the highest utility. This point implies a risk aversion of θ = 3. The optimal portfolio with a risk-free asset lies in the region of Figure 4.6 where the aggressive investor is shown: the investor is long all risky assets and short the risk-free asset. But in reality, only half of investors put their money in the stock market; the remainder keep their money risk-free. In some European countries stock market participation is lower than 10 percent. This is the non-participation puzzle of mean-variance investing.
µp = Rf + ((µT − Rf)/σT) σp
Figure 4.6: Mean-variance model with a risk-free asset. Left panel - straight line ecient
frontier (CML), which is tangential to the ecient frontier when there are risky assets
only. The tangency point T is the tangency portfolio where investment in the risk-free
asset is zero. Right panel - investors' preferences on the ecient frontier. Moving from
the tangency portfolio to the right, the investor starts borrowing money to invest in the
risky assets. The investor is short cash in this region to nance the borrowing amount.
with µT, σT the expected return and standard deviation of the tangency portfolio, respectively. The slope of the CML is the Sharpe ratio SR = (µT − Rf)/σT. This is the price of one unit of risk for an efficient portfolio.
To gain some idea about stress periods, Table 4.1 reports data about periods when the Swiss stock market was under stress. Besides the maximum drawdown, the periods during which prices were falling and rebounding are shown. The last two periods represent the global financial crisis and the dot-com bubble, respectively. On average it takes longer for markets to recover than to drop; a second observation is the severity of the maximum drawdowns. This illustrates that large drawdowns also matter in optimal portfolio choice.
Table 4.1: Periods involving large drawdowns in Swiss equity markets. The drawdown is the measurement of the decline from a historical peak. The maximum drawdown (MDD) up to time T is the maximum of the drawdown over the overall time period considered; yfp means years with falling prices, yrp years with rising prices, and Av. average (Kunz [2014]).
where µ is the portfolio return, σ the volatility of the portfolio return, and k(a) a tabulated function of the confidence level 1 − a. Hence, under normality, VaR is proportional to volatility. This translates into the optimization problem: mean-variance is equivalent to mean-VaR after rescaling the volatility.
Figure 4.7 shows an efficient frontier and several VaR constraints, i.e. the problem is to maximize the expected return under the constraint
P(R ≤ x) ≤ a . (4.22)
These VaR(a) constraints define straight lines under normality. The impact on the optimal portfolio choice is as follows. One starts with a benchmark loss capacity of, say, x = −3%: the probability a of the loss is given, while x is the corresponding dollar VaR amount.
This loss capacity defines the straight blue line. The intersection between this line and the mean-variance frontier selects the optimal mean-VaR portfolio. If the loss capacity increases, the line moves parallel to the right, implying higher possible optimal risks and returns. The same effect follows if, for a fixed loss capacity, the confidence level is lowered - more risk and return becomes optimal.
Figure 4.7: Mean-drawdown optimal portfolio. The straight lines represent the VaR constraints. If the loss capacity increases, the VaR lines move to the right, indicating higher risk and returns in the optimal portfolio. The same result follows if the confidence level is lowered.
We conclude with two VaR calculations. Consider a position with value USD 1 million. Assuming normality of returns, the goal is to calculate the one-day VaR at the 95 percent level. The estimated daily mean is 0.3 percent and the volatility is 3 percent. With k(5%) = 1.6449 it follows that
VaR = (0.003 + 1.6449 · 0.03) · 1,000,000 ≈ USD 52,347 .
Therefore, on average in 1 out of 20 days the loss is larger than the calculated VaR of USD 52,347.
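The same arithmetic as a short script (note that the text's convention adds the daily mean to the quantile term; the usual convention VaR = (kσ − µ)·V would subtract it):

# One-day 95% VaR for a USD 1 million position under normal returns
value, mu_d, sigma_d, k = 1_000_000, 0.003, 0.03, 1.6449

var_95 = (mu_d + k * sigma_d) * value
print(round(var_95))   # about USD 52,347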
To be more realistic for AM, we calculate the VaR for a EUR investor with the following portfolio: there are three equity risk sources (DAX, DJ, Novartis), two FX risks, USDEUR (spot 1.05) and CHFEUR (spot 0.8), and US interest rate risk for the bond, i.e. 6 risk factors. The goal is to calculate the weekly EUR VaR at the 95% level.
We first need the variance and covariance information, then the calculation of the exposure in EUR and the allocation of the EUR exposure to the risk factors using the market data in the following table.
Table 4.4: EUR exposure and allocation of the exposure to the risk factors.
The portfolio variance σp² is given by σp² = ⟨X, CX⟩, where X is the EUR exposure vector allocated to the risk factors and Cij = σiσjρij. Calculating these matrix products gives σp² = 16,008,040,032 on an annual basis; to obtain the result on a weekly basis, the variance is divided by 52. The critical value at the 95% level is k95% = 1.644853. This implies the 1-week EUR VaR using
VaRα = kα √T σp = kα √T √(X′CX) ,
where the drift is set to zero. Differentiating with respect to the exposures,
∂VaR/∂X = kα √T · CX/√(X′CX) = kα² T · CX/VaR .
Since the VaR is homogeneous of degree one in X, Euler's theorem yields the decomposition
VaR = Σj Xj ∂VaR/∂Xj = kα √T Σj Xj (CX)j/√(X′CX) = Σj VaRj . (4.23)
Applying this to the portfolio, the contribution of the US Treasury bond is negative: due to its negative correlations with the other factors, it reduces the VaR by 6 percent. The largest VaR contribution comes from the DAX risk factor with 31 percent, although its exposure is only 10.5 percent. The contribution of USDEUR to the VaR is 19 percent, whereas its factor exposure of 36 percent is the largest one.
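The decomposition (4.23) in code; the exposures and covariances below are assumed for illustration and are not the Table 4.4 data:

import numpy as np

# Assumed illustrative inputs: EUR exposures X and annual factor covariance C
X = np.array([100_000.0, 50_000.0, 80_000.0])
C = np.array([[0.040,  0.010, -0.004],
              [0.010,  0.090,  0.002],
              [-0.004, 0.002,  0.003]])
k, T = 1.644853, 1.0 / 52.0            # 95% quantile, one-week horizon

var_total = k * np.sqrt(T) * np.sqrt(X @ C @ X)
var_j = k * np.sqrt(T) * X * (C @ X) / np.sqrt(X @ C @ X)  # component VaRs (4.23)

# the component VaRs sum to the total; negatively correlated factors get negative shares
print(round(var_total), var_j.round(0), np.isclose(var_j.sum(), var_total))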
φ = φGM V + φX .
To introduce the SAA, we use the unconditional long-term (equilibrium) mean of the returns. Adding and subtracting the long-term mean µ̃ in the second component, the solution can be written after some algebra in the form:13
φ = φGMV + φS + φT . (4.24)
The second and the third components are the SAA and the TAA components, respectively. The sum of the three components is an efficient portfolio.
Each SAA component φj,S is proportional to µ̃j − µ̃k for k ≠ j. If the long-term forecasts of all assets are the same, the SAA component is zero. If the long-term forecasts differ, the holdings are shifted to the asset with the higher equilibrium return. The size of the pairwise bets depends on the relative risk aversion θ and the covariance C, which enter
12 We have φX = (1/θ) C⁻¹ ( µ − (⟨e, C⁻¹µ⟩/⟨e, C⁻¹e⟩) e ) .
13 φS = (1/θ) C⁻¹(µ̃e′ − eµ̃′)C⁻¹e / ⟨e, C⁻¹e⟩ and φT = (1/θ) C⁻¹((µ − µ̃)e′ − e(µ − µ̃)′)C⁻¹e / ⟨e, C⁻¹e⟩ .
φS. The sum of the GMV and the strategic portfolio is called the benchmark portfolio in the asset management industry and the strategic mix portfolio in investment theory.
Each TAA component φj,T is proportional to the deviations (µj − µ̃j) − (µk − µ̃k) for k ≠ j. Hence, there are again pairwise bets between the assets, there are no bets against the same asset, and the bets are of an excess return type with the SAA as benchmark. For N assets, there are N(N − 1)/2 bets. As in the SAA case, the bets are weighted by the covariance matrix and the relative risk aversion.
Proposition 71. Consider the active risk and return optimization in (4.25) with the full investment constraint. The efficient frontier consists of straight lines in the (σ(ψ, b), µ(ψ, b))-space. Inserting further linear constraints, the efficient frontier becomes a non-degenerate hyperbola.
• Mean-variance (MV), equal weights (EW), global minimum variance (GMV) and equal risk contribution (ERC) are four of the strategies.
• Risk parity (RP). The optimal portfolio weights are chosen proportional to inverse volatility. This approach mimics the negative leverage effect in the markets - if asset prices fall, volatility rises. This strategy ignores the correlation structure.
Table 4.5: Risk and return figures for the different investment strategies (Ang [2012] and own calculations).
The mean-variance portfolio is the strategy with the worst performance: choosing market weights, diversity weights, or EW leads to higher returns and lower risk. A reason for the outperformance of the GMV portfolio is the tendency of low-volatility assets to have higher returns than high-volatility assets.
The Markowitz model is the most widely used model in portfolio allocation. There are two main reasons for this. First, its economic assumption about the risk and return trade-off is simple and convincing. Second, it defines a quadratic optimization problem (QP): ⟨µ, φ⟩ − (θ/2)⟨φ, Cφ⟩ is maximized under a set of linear constraints Aφ ≤ b, with A a matrix and b a vector. In its simplest form, the QP problem is even analytically solvable. Adding more constraints, the problem is approached numerically, where decades of research in this direction provide efficient algorithms. Summarizing, portfolio optimization with a benchmark, the tracking-error problem, the Black-Litterman problem with views, index sampling, turnover constraints and the case of linear and quadratic transaction costs are all QPs! This specific mathematical form is therefore a success factor for mean-variance portfolio allocation.
Having explained why mean-variance analysis is successful, we consider its general properties:
• Portfolio theory in general and the mean-variance approach in particular are assumed to be related to diversification. But what does this really mean?
We start with the diversification issue and recall that the optimal investment in the basic Markowitz model is proportional to C⁻¹µ: it is the information matrix, not the covariance matrix, which mixes the expected returns into the optimal allocation. But what can be said about the information matrix? Stevens (1998) derives an expression for the information matrix in the Markowitz model. Using general matrix inversion, the OLS regression of Rt,i on the returns of all other assets Rt,−i plus a noise term, normally distributed with mean zero and variance σi², reads:
Rt,i = αi + ⟨βi, Rt,−i⟩ + εt,i .
Proposition 73 (Bourgeron et al. (2018)). Consider the standard Markowitz model (4.1). Then
φ∗i = φ∗i,0 + ω(φ∗i,0 − φ∗i,h) ,
where φ∗i,0 is the optimal portfolio under the assumption of zero correlations, φ∗i,h is the optimal portfolio of the hedging strategies, and ω is the leverage, defined as the ratio between the idiosyncratic variance and the tracking error variance.
If the tracking error is small, a large leverage follows. This characterization shows that MV diversification means leveraging a hedge portfolio: the MV optimal portfolio is an aggressive portfolio which selects a few bets!
To control these incentives, one uses constraints. The simplest one is full investment, Σi φi = 1. Solving (4.1) with such a constraint amounts to considering a Lagrangian function and then calculating the first-order condition (FOC). Further constraints can be added. Real optimizers in asset and wealth management can consider up to hundreds of constraints. This destroys analytical tractability and in some sense leads optimization ad absurdum: if you know what you want by imposing many constraints, why don't you simply state the investment policy? Furthermore, each constraint has an economic price, the shadow price, which reduces unconstrained utility. It should be made transparent to the investor what the economic price of his own constraints is, such as favoring a particular stock, and what the price of the constraints induced by the AM firm is, such as the bandwidths of the SAA and TAA. From an efficient frontier perspective, adding constraints transforms the hyperbola into piecewise straight lines and piecewise hyperbolas. Globally, the constrained frontier lies below the efficient frontier and is shifted towards more risk.
We now consider the second challenge: how to complement the Markowitz optimization such that, for example, solutions are smooth, i.e. varying the inputs slightly leads to smooth changes of the allocation, rebalancing is smooth, and turnover costs are controlled. Reconsidering the optimal portfolio φ = (1/θ)C⁻¹µ, the two inputs C and µ need to be estimated. Estimation error in the covariance matrix is amplified in the information matrix. Since covariance matrices are large, say 1000 × 1000, and µ is also an estimate, the solution of the optimality condition is only an approximation to the unique solution that would obtain if the inputs were known. We discuss this issue below. The returns also need to be estimated, which is no easier than for the covariance matrix - myriads of factors and factor models all try to explain or predict returns.
The main strategy is to add an additional term to the quadratic utility function (Tikhonov regularization):
⟨µ, φ⟩ − (θ/2)⟨φ, Cφ⟩ − c ||Γφ − φ0||₂²
with c > 0, Γ a matrix, φ0 an initial portfolio and ||·||₂ the Euclidean norm. The parameter c controls the importance of the regularization term. Such terms are commonly added to promote sparsity or to reduce sensitivity to outliers. There are many different ways in which regularizations can be implemented.
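For Γ = I (ridge), the penalized objective has the closed-form first-order condition (θC + 2cI)φ = µ + 2cφ0, which the following sketch implements with assumed inputs:

import numpy as np

# Assumed illustrative inputs
mu = np.array([0.05, 0.08, 0.06])
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.06]])
theta, c = 3.0, 0.5
phi0 = np.ones(3) / 3                  # initial (target) portfolio

# FOC of mu'phi - theta/2 phi'C phi - c ||phi - phi0||_2^2 with Gamma = I
phi_ridge = np.linalg.solve(theta * C + 2 * c * np.eye(3), mu + 2 * c * phi0)

# c -> 0 recovers the unregularized optimum; a large c pins phi to phi0
print(phi_ridge)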
The de-noising techniques for the covariance matrix are not sufficient to obtain stability of the solution. Figure 4.9 provides the intuition. Each covariance matrix can be diagonalized, where the eigenvalues are all positive real numbers in the diagonal matrix. Ordering the eigenvalues by size, we show below that the largest eigenvalues of C account for portfolio risk. The smallest eigenvalues, however, matter for the information matrix C⁻¹, which is the proportionality constant of the optimal investment rule: the noisy eigenvalues drive optimal investment. Regularization techniques handle this small-eigenvalue problem. But this is not sufficient to obtain a meaningful optimal asset allocation since, as stated above, the Markowitz model makes bets on the long-short portfolio of expected return minus the beta hedge. These factors are distributed over the whole range of eigenvalues. Therefore, considering the largest eigenvalues and treating the smallest ones using regularization leaves out all intermediate eigenvalues, which also impact the stability and smoothness of the optimal allocation. Hence, more than de-noising of the covariance matrix is needed.
Figure 4.8: Ridge solution for a portfolio with and without a target. The figures show how the ridge solution provides smooth portfolio components (denoted by xj) as a function of the control parameter c. (Source: Roncalli (2018))
• Compare two constrained models. Is one allocation better than the other because of a better model or because of the chosen constraints? Constraints are ad hoc, discretionary decisions that impact a model's performance in a complicated way.
Figure 4.9: Eigenvalues of a covariance matrix C, ordered by size (schematic).
$\phi_1 = 10^{100}$ and $\phi_2 = 1 - 10^{100}$ are two solutions which we do not like. Adding the penalty $\|\phi\|_1 := |\phi_1| + |\phi_2|$, an optimal solution is the sparse solution $\phi_1 = 1,\ \phi_2 = 0$. For different choices of $c$ and $\Gamma$, different well-known regularization approaches follow. If $\Gamma$ is set equal to the identity matrix, ridge regularization follows. Next, denote by $\hat C$ the unbiased empirical covariance matrix and by $F$ an estimator which is biased but converges more quickly than the empirical covariance; set $\hat C(\nu) := \nu \hat C + (1 - \nu)F$ and let $\nu^*$ be the minimizer of $E(\|\hat C(\nu) - C\|^2)$. If we set $c = \frac{1-\nu^*}{\nu^*}$ and $\Gamma$ equal to the Cholesky decomposition of $F$, the Ledoit-Wolf covariance shrinkage method follows; see Section 4.4.2 for a discussion. We discuss regularization in different sections below.
For further reading, in addition to BL, we cite Walters (2014), Satchell and Scowcroft (2000), Brand (2010), Meucci (2010), Idzorek (2006), Herold (2003), and He and Litterman (1999).
• The equilibrium market portfolio serves as a starting prior for the estimation of
asset returns.
For non-linear views, consider entropy pooling. In the construction of BL, the first step is to define the reference model. Assume that the returns satisfy $R \sim N(\mu, C)$, where both mean and covariance are unknown. Since the goal of BL is to model expected returns, we start with a model for the mean: $\mu \sim N(\pi, C_\pi)$. Hence, $\mu = \pi + \varepsilon$ with $\varepsilon \sim N(0, C_\pi)$. The covariance $C_R$ of the returns about the estimate $\pi$ is - given that the return noise and $\varepsilon$ are not correlated - given by
$$C_R = C + C_\pi\,. \qquad (4.26)$$
Therefore, the reference BL model is $R \sim N(\pi, C_R)$. The mean $\pi$ represents the best guess for $\mu$, and the covariance $C_\pi$ measures the uncertainty of the guess. How do we fix $\pi$, the prior estimate of returns, that is to say, the returns before we consider views?
Using the CAPM means that all investors have a mean-variance utility function. Without any investment constraints, the optimal strategy $\phi$ maximizes the expected utility given in (4.1),
$$E(u) = \phi'\pi - \frac{\theta}{2}\,\phi' C\phi\,,$$
where we have replaced the expected returns by the unknown expected return estimate $\pi$. The solution gives us the optimal strategy $\phi$ as a function of the return and covariance: $\phi = \frac{1}{\theta}C^{-1}\pi$.
Given the equilibrium strategy $\phi$ in the CAPM, we immediately get the excess return estimate
$$\pi = \theta C\phi\,. \qquad (4.27)$$
How do we fix the risk aversion parameter? Multiplying (4.27) by the market portfolio $\phi$ implies
$$R_M - R_f = \theta\,\sigma_M^2 \qquad (4.28)$$
with $R_M$ the total return of the market portfolio. In other words, the risk aversion parameter is equal to the market price of risk. Using (4.28) in (4.27), the CAPM specifies in equilibrium the prior estimate of returns $\pi$.
We consider next the insertion of views, where we follow Walters (2014). A view is a statement on the market. Views can exist in an absolute or a relative form. A portfolio manager can, for example, believe that the fifth asset class will outperform the fourth one. BL assumes that views
• are fully invested (the sum of weights is zero for relative views or one for absolute views), and
• are mutually independent and uncorrelated.
More precisely, an investor with $k$ views on $N$ assets uses the following matrices:
• The $k \times N$ matrix $P$, where each row contains the asset weights of one view.
• The $k \times 1$ vector $Q$ of the returns for each view. That is, $P\pi = Q$ expresses the views.
• The $k \times k$ diagonal matrix $\Omega$ of the covariance of the views, with $\omega_{nn}$ the matrix entries. The matrix is diagonal as the views are required to be independent and uncorrelated. The entries $1/\omega_{nn}$ of the inverse matrix are known as the confidence in the investor's views.
The conditional distribution of the mean and variance can be represented in the view space as
$$P(\text{View}\,|\,\text{Prior}) \sim N(Q, \Omega)\,.$$
Since the matrix $P$ is in general not invertible, this expression cannot be written in a useful way in the asset space. But using Bayes' theorem, a posterior distribution of the returns that blends the above prior and conditional distributions follows. Since the asset returns and views are normally distributed, the posterior is also normally distributed. It is given by the Black-Litterman master formula for the mean returns $\pi_{BL}$ and the covariance $C_{BL}$:
$$\pi_{BL} = \left(C_\pi^{-1} + P'\Omega^{-1}P\right)^{-1}\left(C_\pi^{-1}\pi + P'\Omega^{-1}Q\right) \qquad (4.29)$$
$$C_{BL} = C + \bar C_\pi\,, \qquad \bar C_\pi = \left(C_\pi^{-1} + P'\Omega^{-1}P\right)^{-1}\,,$$
where $\bar C_\pi$ is the posterior uncertainty about the mean.
The parameters $\Omega$ and $C$ are not observable and must be fixed additionally. $C$ is typically replaced by the estimated covariance matrix $\hat C$. There are several ways of specifying $\Omega$: one can assume that the variance of the views is proportional to the variance of the asset returns, one can use a confidence interval, or one can use the variance of residuals if a factor model is used. We refer to Walters (2014) for details. How do we estimate the variance of the mean $\pi$ - that is, how do we fix $C_\pi$? BL assume the proportionality
$$C_\pi = \tau C \qquad (4.30)$$
with $\tau$ the proportionality constant. The uncertainty level $\tau$ can be chosen proportional to the inverse investment period $1/T$. The longer the investment horizon, the less uncertainty exists about the market mean; the higher the value of $\tau$, the less weight is attached to the CAPM prior. Summarizing, the prior return distribution is a normally distributed random variable with the mean given in (4.27) and variance $(1+\tau)C$. With these choices, the Black-Litterman master formula for the mean returns and the covariance follows from (4.29) by replacing $C_\pi$ with $\tau C$.
Example
Consider four assets and two views. The investor believes that asset 1 will outperform asset 3 by 2 percent with confidence $\omega_{11}$, and that asset 2 will return 3 percent with confidence $\omega_{22}$. The investor has no other views. Mapping these views into the above-defined matrices implies
$$P = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}, \quad Q = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad \Omega = \begin{pmatrix} \omega_{11} & 0 \\ 0 & \omega_{22} \end{pmatrix}. \qquad (4.32)$$
• Use reverse optimization to compute the CAPM equilibrium returns for the assets.
• Blend the CAPM equilibrium returns with the views using the Black-Litterman model; a numerical sketch follows below.
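A minimal numerical sketch of these two steps for the four-asset example follows; the covariance matrix, market weights, risk aversion $\theta$, uncertainty $\tau$, and view confidences are illustrative assumptions, not values from the text.

```python
import numpy as np

# Black-Litterman sketch for the 4-asset, 2-view example (illustrative inputs).
theta, tau = 2.5, 0.05
C = np.array([[0.040, 0.012, 0.020, 0.010],
              [0.012, 0.030, 0.015, 0.008],
              [0.020, 0.015, 0.035, 0.012],
              [0.010, 0.008, 0.012, 0.025]])
phi_mkt = np.array([0.40, 0.25, 0.20, 0.15])   # assumed market weights

# Step 1 - reverse optimization (4.27): equilibrium prior returns.
pi = theta * C @ phi_mkt

# Step 2 - views (4.32): asset 1 beats asset 3 by 2%, asset 2 returns 3%.
P = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0,  0.0, 0.0]])
Q = np.array([0.02, 0.03])
Omega = np.diag([0.001, 0.001])                # view confidences (assumed)

# Master formula (4.29) with C_pi = tau * C, see (4.30).
C_pi_inv = np.linalg.inv(tau * C)
Omega_inv = np.linalg.inv(Omega)
A = np.linalg.inv(C_pi_inv + P.T @ Omega_inv @ P)   # posterior uncertainty
pi_BL = A @ (C_pi_inv @ pi + P.T @ Omega_inv @ Q)
C_BL = C + A

print(np.round(pi, 4), np.round(pi_BL, 4))
```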
These steps only define one part of the investment process of a CIO. In general, the CIO receives information from different sources in the investment process: a macroeconomic view from research analysts, market information, chartist information, and valuation information. Assume that one output of this information is to 'overweight Swiss stocks - underweight European stocks'.
This defines a pair-wise bet. All bets of this type form the tactical asset allocation (TAA). Several questions follow:
A How strong is the bet - that is to say, how much should the two stock positions deviate from the actual level 'overweight Swiss stocks - underweight European stocks'?
B Is the bet consistent with the other bets and exposures of the TAA?
C What is the time horizon of the bet?
D How confident is the CIO and his or her team about the bet?
E Is the bet implementable, and what is the precision of such an implementation measured by the tracking error?
F Will there be a stop-loss or profit-taking mechanism once the bet has been implemented?
The approach to question A is often based on the output of a formal model. That is to say, a risk budgeting model, a BL model, or a mean-variance optimization model proposes to increase Swiss stocks by 5 percent and to reduce the European stock exposure by 5 percent. It is then common practice that this proposal is corrected by the CIO, either because it creates too much turnover for the portfolio managers or because he or she considers such a change to be too strong.
Question B is - among other things - a consistency question since, on the one hand, the $\pm 5$ percent change in equities also changes the FX exposure of the whole TAA and, on the other hand, there could be a CHF-EUR bet following from the many information sources. Typically - question C - bets are made for one month. This is the standard time after which the CIO and his or her team review the TAA.
Question D is the information risk issue. Information risk is different from statistical risk. The most well-known statistical risk measure in the industry is the tracking error, which measures the volatility of alpha over a period of time. The risk sources are market, counterparty, and liquidity risk of the assets. Bernstein (1999) defines information risk as the quality of the information advantage of a decision-maker under uncertainty.
Reconsider the above Swiss stock-European stock bet. This view must be driven by our information set, as well as by the proprietary process of analyzing the information and data. To evaluate information risks, Lee and Lam (2001) propose a set of questions. These questions suggest that some information risk may be quantified with a good deal of precision, while in most cases precise measurement of information risks seems impossible, and well-informed judgement may be necessary. This may result in a final statement on the decision-maker's confidence of adding alpha. If, say, the confidence is 50 percent, we are not confident at all about the bet. A standard approach to measuring the performance of bets is the hit rate (HR).
A hit rate of 60 percent means that we add alpha in 60 percent of the months in which we make an active bet. The confidence in adding alpha can be interpreted as the expected value of the hit rate. Information risk is then quantified by the expected hit rates of our investment views.
Example
We follow Lee and Lam (2001). They assume that alpha is normally distributed around its mean value. Then there is a unique one-to-one mapping between the hit rate HR and the information ratio IR. To derive this relation, the alpha of an asset follows a normal distribution, $\alpha_i \sim N(\bar\alpha, TE^2)$, with $\bar\alpha$ the arithmetic average alpha and $TE$ the tracking error. The hit rate is the probability of a positive alpha. Changing variables with $y = \frac{\alpha_i - \bar\alpha}{TE}$:
$$HR = \frac{1}{\sqrt{2\pi}}\int_{-\frac{\bar\alpha}{TE}}^{\infty} e^{-\frac{1}{2}y^2}\, dy\,.$$
Defining the information ratio $IR = \frac{\bar\alpha}{TE}$, we get
$$HR = \int_{-IR}^{\infty} f(y)\, dy = 1 - \Phi(-IR)\,, \qquad (4.33)$$
with $f$ the standard normal density function, $\Phi$ the standard normal distribution function, and $IR$ the information ratio. Once the expected alpha and expected tracking error, and therefore the expected information ratio, are stated, the complete ex ante distribution of alpha is specified. The hit rate is the area to the right of 0% alpha. Using the square-root law, the information risks, confidence levels, and information ratios in Table 4.6 follow; a code sketch of (4.33) is given after the table.
Table 4.6: Information risks, confidence levels, and information ratios (Lee and Lam [2001]).
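A one-line implementation of (4.33) is sketched below, assuming scipy is available; the IR values are illustrative, and the square-root law is used to scale an annual IR to a monthly one.

```python
from scipy.stats import norm

def hit_rate(ir: float) -> float:
    """Hit rate implied by an information ratio, eq. (4.33)."""
    return 1.0 - norm.cdf(-ir)          # equals norm.cdf(ir)

for ir_annual in (0.25, 0.50, 1.00):    # illustrative annualized IRs
    ir_monthly = ir_annual / 12 ** 0.5  # square-root law
    print(ir_annual, round(hit_rate(ir_monthly), 3))
```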
The first property derives from the mentioned problems with quadratic optimization. The second one reflects the difficulty of forecasting expected returns. Although only risk is explicit, returns are implicit, and the approach therefore does not a priori lead to very conservative portfolios.
• Define and solve the risk-budgeting problem. This implies the investment strategy.
1. The risk of the sum of two portfolios is smaller than the sum of their risks.
2. The risk of a leveraged portfolio is equal to the leveraged risk of the original portfolio.
3. Adding a cash amount to a portfolio reduces the risk of the portfolio by the cash amount.
Let $B_1$ and $B_2$ be two risk budgets in USD. For a strategy $\phi = (\phi_1, \phi_2)$, the risk budgeting problem is defined by the two constraints which equate the two risk contributions $RC_1$ and $RC_2$ to the risk budgets - that is to say, the strategy is chosen such that the following equations hold:
$$RC_k(\phi) = B_k\,, \quad k = 1, 2\,. \qquad (4.34)$$
Summing the left-hand sides of (4.34) is, by the Euler principle, equal to total portfolio risk. The sum on the right-hand side is the total risk budget. Problem (4.34) is often recast in a relative form. If $b_k = cB_k$ is the percentage of the sum of total risk budgets, (4.34) reads
$$RC_k(\phi) = b_k\, R(\phi)\,, \quad \sum_k b_k = 1\,, \qquad (4.35)$$
with $R(\phi)$ the total portfolio risk. The goal is to find the strategies $\phi$ which solve (4.34) or (4.35). This is in general a complex numerical problem. But introducing the beta $\beta_k$ of asset $k$, the solution can be written as
$$\phi_k = \frac{b_k\, \beta_k^{-1}}{\sum_j b_j\, \beta_j^{-1}}\,. \qquad (4.36)$$
The weight allocated to component $k$ is thus inversely proportional to the beta. This equation is only implicit since the beta depends on the portfolio $\phi$. The next theorem summarizes some explicitly solvable cases.
Theorem 74. Consider the risk budgeting program (4.35) for $N$ assets with the volatility risk measure.
1. If all assets are perfectly correlated, the RB portfolio is
$$\phi_k = \frac{b_k\, \sigma_k^{-1}}{\sum_j b_j\, \sigma_j^{-1}}\,. \qquad (4.38)$$
3. If correlation is minimal among all assets, i.e. $\rho = -\frac{1}{N-1}$, the ERC portfolio follows:
$$\phi_k = \frac{\sigma_k^{-1}}{\sum_j \sigma_j^{-1}}\,. \qquad (4.39)$$
Case 1 implies, for example, that the higher the volatility of a component, the lower is its weight in the RB portfolio. For the equal risk contribution (ERC) model, where all risk budgets $b_k$ are set equal to $1/N$, Maillard et al. (2008) show that the volatility of the ERC portfolio is located between the volatility of the minimum variance portfolio (MVP) and the volatility of the equally capital-weighted (EW) portfolio:
$$\sigma_{MVP} \leq \sigma_{ERC} \leq \sigma_{EW}\,.$$
The ERC portfolio is equal to the MV portfolio if (i) the correlation is constant and (ii) the correlation value attains its lowest possible value. The ERC is equal to the EW portfolio if all volatilities are identical.
Definition 75. The ERC approach is also called the risk parity (RP) approach.
Although closed-form analytical solutions for risk budgeting problems are possible only in some particular cases, there is a simplified heuristic allocation mechanism, inspired by the allocation (4.36):
$$\phi_k = L \times \frac{\mathrm{Risk}_k^{-m}}{\sum_j \mathrm{Risk}_j^{-m}} \qquad (4.43)$$
with Risk any risk measure, $L$ the portfolio leverage, which is needed if one defines ex ante a risk level for the portfolio (risk-targeting approach), and $m$ a positive number. If $m = 0$, the portfolio is equally weighted. For increasing $m$, the portfolio allocation becomes more and more concentrated on the assets with the lowest individual risk. For example, the GMV portfolio follows if all correlations are set equal to zero and $m = 2$, and the ERC portfolio follows if all correlations are constant and $m = 1$.
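The heuristic (4.43) is a few lines of code; the volatilities below are illustrative assumptions.

```python
import numpy as np

def heuristic_weights(risk, m=1.0, leverage=1.0):
    """Heuristic risk-based weights (4.43): w_k proportional to Risk_k^(-m)."""
    w = np.asarray(risk, dtype=float) ** (-m)
    return leverage * w / w.sum()

vols = [0.02, 0.12, 0.08]              # illustrative asset volatilities
print(heuristic_weights(vols, m=0))    # equal weights
print(heuristic_weights(vols, m=1))    # inverse-vol (ERC under constant rho)
print(heuristic_weights(vols, m=2))    # GMV under zero correlation
```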
Teiletche (2014) illustrates some properties of the above four portfolios using Kenneth French's US industry indices, 1973-2014; see Figure 4.10.
Figure 4.10: Risk-weighting solutions for EW, GMV, MD, and RP (ERC) portfolios using sector indices from Kenneth French. The variance-covariance matrix is based on five years of rolling data (Teiletche [2014]).
Figure 4.10 indicates that GMV has a preference for lower-volatility sectors (e.g., utilities or consumer non-durables), MD prefers low correlation (e.g., utilities or energy), EW is not sensitive at all to risk measures, and RP (ERC) is mixed. RP and EW show similarly regular asset allocation patterns, while the GMV and MD asset allocation patterns are much less regular. The latter react much more to changing economic circumstances and are therefore more defensive.
Maillard et al. (2009) compare the ERC portfolio with the 1/N and MVP portfolios for a representative set of the major asset classes from Jan 1995 to Dec 2008.15 The ERC portfolio has the best Sharpe ratio and average returns. The Sharpe ratio of the 1/N portfolio (0.27) is largely dominated by MVP (0.49) and ERC (0.67). MVP and ERC differ in their balance between risk and concentration. The ERC portfolios are much less concentrated than their MVP counterparts, and their turnover is also much lower. The lack of diversification in the MVP portfolios can be seen by comparing the maximum drawdown values: the value for MVP is −45% compared to −22% for the ERC portfolio.
When we restrict the risk measurement to volatilities, the heuristic approach (4.43)
15 The asset class representatives are: S&P 500, Russell 2000, DJ Euro Stoxx 50, FTSE 100, Topix, MSCI Latin America, MSCI Emerging Markets Europe, MSCI AC Asia ex Japan, JP Morgan Global Govt Bond Euro, JP Morgan Govt Bond US, ML US High Yield Master II, JP Morgan EMBI Diversified, S&P GSCI.
takes the following generic component-wise form (Jurczenko and Teiletche [2015]):
$$\phi_k = k\,\sigma_k^{-1}\,, \qquad k = \frac{1}{\sum_j \sigma_j^{-1}}\,. \qquad (4.44)$$
So, (4.44) is the heuristic model (4.43) with $m = 1$ and no leverage ($L = 1$). If we use a volatility-target constraint $\sigma_T$ for the risk-based portfolio, we get
$$k = \frac{\sigma_T}{N \times \text{Concentration}} = \frac{\sigma_T}{N\, C(\rho)} \qquad (4.45)$$
with $\rho$ the average pair-wise correlation coefficient of the assets and $C(\rho)$ the concentration measure16
$$C(\rho) = \sqrt{N^{-1}\left(1 + (N-1)\rho\right)}\,. \qquad (4.46)$$
The concentration measure varies from 0, when the average pair-wise correlation reaches its lowest value, to +1, when the average correlation is +1. Hence, $k$ increases when the diversification benefits are important - that is, when the concentration measure decreases. In this case, each constituent's weight needs to be increased to reach the desired volatility target: the risk-based portfolio may even become leveraged. Risk-based investing often faces the criticism that it cannot allow for views. This is not true; see Jurczenko and Teiletche (2015) and Roncalli (2014).
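Equations (4.44)-(4.46) are easy to combine in code; the volatilities, average correlation, and volatility target below are illustrative assumptions.

```python
import numpy as np

# Volatility-targeted inverse-vol weights, eqs. (4.44)-(4.46).
def vol_target_weights(sigma, rho_bar, sigma_target):
    sigma = np.asarray(sigma, dtype=float)
    N = len(sigma)
    C_rho = np.sqrt((1 + (N - 1) * rho_bar) / N)   # concentration (4.46)
    k = sigma_target / (N * C_rho)                  # scaling factor (4.45)
    return k / sigma                                # weights (4.44)

w = vol_target_weights([0.05, 0.10, 0.20], rho_bar=0.3, sigma_target=0.10)
print(w, w.sum())   # a sum above 1 means the portfolio is leveraged
```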
16 To prove this formula, write $\Lambda_\sigma$ for the diagonal matrix with the vector of volatilities $\sigma$ on its diagonal, $\rho$ for the correlation matrix of returns, and $e$ for the vector of ones. The covariance matrix can be written in the form $C = \Lambda_\sigma \rho \Lambda_\sigma$, which implies for the variance of the portfolio $\phi = k\sigma^{-1}$:
$$k^2\,\langle \sigma^{-1}, \Lambda_\sigma \rho \Lambda_\sigma\, \sigma^{-1}\rangle = k^2\,\langle e, \rho e\rangle = k^2 N^2 C(\rho)^2\,.$$
Setting this equal to $\sigma_T^2$ implies (4.45).
4.4 Estimation: The Covariance Matrix
The estimation problem of the covariance matrix faces estimation risk: the true parameters in the models are not known, and one has to estimate these parameters given only a finite data set. Whichever statistical approach we choose, there is a risk that the estimated parameters differ from the unknown, true parameter values. There are different methods to estimate the covariance matrix; they can be classified along several dimensions.
To quantify this, let $R(j)$ be the rate of return in the past month $j$. The average return of $n$ observations, assuming IID returns, has itself a mean $\bar R$ and a standard deviation $\sigma/\sqrt{n}$. These are the true values. For an assumed annual return of 12%, the true monthly return is $R_{1m} = 1\%$. For an annual standard deviation of $\sigma = 5\%$, the monthly estimate $\sigma_{1m} = 5\%/\sqrt{12} = 1.44\%$ follows. This estimate is larger than the mean itself, i.e. not meaningful. Using $n = 60$ (five years of data), the standard deviation estimate becomes $0.05/\sqrt{60} = 0.645\%$, which is not significantly smaller than the mean. If we would like to have a standard deviation estimate of, say, $1/10$ of the mean, the equation $0.05/\sqrt{n} = 0.001$ implies $n = 2{,}500$. This corresponds to a time series of more than 208 years (2,500/12).
The plug-in approach inserts the sample estimates into the optimal rule:
$$\phi_{MV} = \frac{1}{\theta}\,\hat C^{-1,S}\,\hat\mu^S\,. \qquad (4.47)$$
Assuming that the plugged-in parameters are the true ones leads to zero estimation risk. But this is not an optimal approach.18 One has to define a procedure outside of the investment optimization program which fixes the values of the parameters.
Bouchaud and Potters (2009) illustrate this. They consider the Markowitz model without the full investment constraint. The optimal policy, if we assume the true correlation matrix $\rho$ is known, is
$$\phi_{MV} = r\, \frac{\rho^{-1}\mu}{\langle \mu, \rho^{-1}\mu\rangle} \qquad (4.48)$$
with $r$ the target expected return. The true minimal risk is then
$$\sigma^2_{MV} = \langle \phi_{MV}, \rho\,\phi_{MV}\rangle = \frac{r^2}{\langle \mu, \rho^{-1}\mu\rangle}\,. \qquad (4.49)$$
We compare this optimal case with the in-sample and out-of-sample risks. The in-sample estimate uses the known empirical correlation matrix $\hat\rho^S$ of the corresponding period; the out-of-sample estimate uses the empirical correlation matrix $\tilde\rho^S$ observed in the next period. The portfolio risks read
$$\sigma^2_{MV,in} = \langle \hat\phi_{MV}, \hat\rho^S \hat\phi_{MV}\rangle\,, \qquad \sigma^2_{MV,out} = \langle \hat\phi_{MV}, \tilde\rho^S \hat\phi_{MV}\rangle\,, \qquad (4.50)$$
with $\hat\phi_{MV}$ the strategy (4.48) evaluated at $\hat\rho^S$. One obtains:
18 Tu and Zhou (2003), Kan and Zhou (2011), Zellner and Chetty (1965), Pastor and Stambaugh (2000).
$$\sigma^2_{MV,in} \leq \sigma^2_{MV} \leq \sigma^2_{MV,out}\,. \qquad (4.51)$$
How far away are the in-sample and out-of-sample risks from the true risk? Pafka and Kondor (2004) show that for IID returns and large portfolios:
$$\sigma^2_{MV,in} = \sigma^2_{MV}\sqrt{1-q} = \sigma^2_{MV,out}\,(1-q)\,, \qquad q = \frac{N}{T}\,. \qquad (4.52)$$
The in-sample risk is a factor $\sqrt{1-q}$ smaller than the true risk, while the out-of-sample risk is larger than the true risk by the factor $1/\sqrt{1-q}$. This quantifies the data snooping effect.
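A quick numeric illustration of (4.52), with an illustrative choice of N and T:

```python
# Bias of in- and out-of-sample risk under (4.52), q = N/T.
N, T = 250, 1000
q = N / T
print((1 - q) ** 0.5)    # in-sample variance understates true risk: ~0.87
print((1 - q) ** -0.5)   # out-of-sample variance overstates it: ~1.15
```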
But estimation risk in the mean matters too. Ang (2014) estimates the original mean-variance frontier using data from January 1970 to December 2011. The mean of US equity returns is 10.3 percent. Ang changes the mean to 13.0 percent; such a change is within two standard error bounds. The minimum variance portfolios for a desired portfolio return of 12 percent are given in Table 4.7. The change caused the US position to move from -9 percent to 41 percent, and the UK position to move from 48 percent to approximately 5 percent.
Table 4.7: MV portfolios for two different expected equity returns (Ang [2014]).
Returning to the estimation of the covariance matrix, the main question is how to reduce the number of degrees of freedom for the estimation. The fully agnostic view of assuming equal weights to be optimal, $\phi = 1/N$, is the other extreme to using sample estimates. Assuming EW means avoiding the need to estimate any input parameter - neither variances, correlations, nor returns. DeMiguel et al. (2009) compare 14 optimized portfolio approaches across 7 datasets with the 1/N EW investment. Surprisingly, 1/N is difficult to beat by the 14 optimal portfolios. They empirically compare the Sharpe ratios, analytically derive the critical estimation window length for the mean-variance strategy to outperform 1/N, and use simulations to extend the models to classes of models designed to control estimation risk. The findings are:
• Empirically, none of the 14 portfolio models consistently dominates 1/N across all data sets in terms of Sharpe ratio and turnover.
• Using US stock data, for 25 assets in the portfolio the critical estimation window is around 3,000 months of data. This figure doubles for twice as many assets in the portfolio.
• Models which control for estimation risk also need very long data series to outperform 1/N.19
These results contradict the common view that heuristics are less successful than statistical optimization models. Ignoring part of the ambiguous information - insufficient historical data for estimating model input parameters - is what makes the 1/N heuristic robust for the unknown future. But as we discuss below, there are now convincing alternatives that beat this agnostic case.
The 1/N model can also be used to get insights about which estimation risk is more severe - return or covariance risk? We follow Rohner (2014). Assume that $\mu$ and $C$ are known for 10 assets such that 1/10 is invested optimally in each asset. To see the impact if either of the two parameters is not known, simulate multivariate normal returns, use rolling window estimation with an integration period of 100, and calculate 300 sample means and estimated covariances. Use these estimates to calculate the optimal portfolio weights for each date; see Figure 4.11.
Figure 4.11: Significance of return and covariance estimation risk. Source: Rohner [2014].
Even if the distribution of returns is known, estimated portfolio weights can deviate largely from their theoretical optimal values. It follows that estimation error in expected returns has a larger impact on the optimal portfolio weights than dependency estimation errors. Therefore, GMV portfolios, which do not depend on estimated returns, are more robust.
19 They consider Bayesian portfolios, portfolios with moment restrictions, portfolios with short-sale
constraints and combinations of optimal portfolios.
Having argued against the agnostic method and the $N^2$-parameter sample estimate of the covariance matrix, the next step is to consider low-dimensional methods or methods which grow linearly with N.20 The linear shrinkage approach and the factor model approach are examples of low-dimensional models, and we compare these approaches with the order-N approach of Ledoit and Wolf (2018), a non-linear shrinkage approach.
Let
$$R_k \approx \mu + \sum_{j=1}^K \beta_{kj}\, v_j$$
be an approximation with $W = (v_1, \ldots, v_K)$, where $(v_j)$ is an orthonormal basis of a $K$-dimensional subspace. Then $W'W = I$ due to the orthonormality of the vectors $(v_j)$. Finding the best linear fit means solving the least squares optimization
$$\min_{\mu,\, W,\, \beta_k,\, W'W = I}\ \sum_{k=1}^N \|R_k - (\mu + W\beta_k)\|^2\,.$$
Optimizing over $\mu$ implies
$$\mu^* = \mu_N\,,$$
i.e. the sample mean follows. Optimizing over $\beta_k$ implies (orthonormality of the $v$'s):
$$\beta_k = W'(R_k - \mu_N)\,.$$
Inserting this into the objective function shows that the problem is equivalent to
$$\max_{W,\, W'W = I}\ (N-1)\,\mathrm{tr}(W' C^S W)$$
20 Ledoit and Wolf (2003, 2004a,b), Kan and Zhou (2007), Brandt et al. (2009), DeMiguel et al. (2009,
2013), Frahm and Memmel (2010), and Tu and Zhou (2011).
with $C^S$ the sample covariance and tr the trace of a matrix. The matrix under the trace can be diagonalized since the spectral theorem of linear algebra applies:
Proposition 76. Let $C$ be a symmetric, positive definite, real matrix of dimension $N \times N$. There exists a diagonal matrix $\Lambda$ and a matrix $W$ such that
$$W'CW = \Lambda\,. \qquad (4.53)$$
The diagonal elements of $\Lambda$ are real-valued and positive (the eigenvalues $\lambda_1, \ldots, \lambda_N$). The eigenvalues solve the polynomial equation $\det(C - \lambda I) = 0$ with $I$ the identity matrix. Given any eigenvalue $\lambda_k$, the solution of the linear equation $Cv_k = \lambda_k v_k$ is called an eigenvector $v_k$. The eigenvectors form an orthonormal basis and $W = (v_1, \ldots, v_N)$.
Hence, the PCA is given by finding the largest eigenvalues and the corresponding eigenvectors. The restriction to the largest eigenvalues means 'de-noising' the covariance matrix. A covariance matrix $C$ of dimension $N \times N$ does not tell us how much the unobservable risk drivers of the $N$ assets add to the total portfolio variance. Transforming the matrix using PCA allows us to derive how important the risk factors are in explaining portfolio risk. Consider Figure 4.12, where the left panel shows the closing values of the Dow and the S&P 500 index. The two series are heavily dependent: a data point of the Dow corresponds to an S&P closing price such that the pair is close to the diagonal (think about the bifurcation for low closing prices). The dependence can be offset if we rotate the coordinate system. In the new coordinate system, data points have almost no variance in the $y_2$ direction but only in the $y_1$ direction. Therefore, the $y_1$-direction factor explains most of the portfolio variance. De-noising then means neglecting the $y_2$ risk contribution. PCA does this.
The eigenvalues measure the variance contributions of the factors in (2.2). Factors with low eigenvalues add only little to the portfolio risk and are therefore dropped - the de-noising of the covariance matrix. But the eigenvalues that are important from a risk perspective are the least important ones from a portfolio optimization perspective, where $C^{-1}$ matters; see (4.3), $\phi = \frac{1}{\theta}C^{-1}\mu$. The eigenvalues of the information matrix are the reciprocal values $1/\lambda_k$ of the eigenvalues $\lambda_k$. This trade-off between risk and investment is one reason why portfolio managers often do not use portfolio optimization methods. Furthermore, the small eigenvalues, whose inverses are needed for optimal portfolios, are not robust - a small change of the values heavily changes the portfolio. Therefore, regularization techniques are used; a de-noising sketch follows after Figure 4.12.
Figure 4.12: Closing values for the S&P 500 and Dow Jones Index in 2006. The red
coordinate systems denote the rotation applied in PCA.
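The de-noising idea can be sketched in a few lines; keeping the $k$ largest eigenvalues and replacing the noisy bulk by its average is one simple convention (an illustrative assumption, not the only choice used in practice), and the data are simulated.

```python
import numpy as np

# De-noising sketch: keep the k largest eigenvalues of a sample covariance
# matrix and replace the remaining (noisy) ones by their average.
def denoise(C, k):
    lam, W = np.linalg.eigh(C)           # ascending eigenvalues, W orthonormal
    lam, W = lam[::-1], W[:, ::-1]       # sort descending
    lam[k:] = lam[k:].mean()             # flatten the noisy bulk
    return (W * lam) @ W.T               # rebuild W diag(lam) W'

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))           # T=500 observations of N=50 assets
C = np.cov(X, rowvar=False)
C_dn = denoise(C, k=5)
```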
As a small worked example, consider a symmetric $2 \times 2$ matrix $M$ whose characteristic polynomial yields the two eigenvalues $\lambda = 3$ and $\lambda = 2$. Matrix $M$ is therefore positive definite and satisfies all the mathematical properties of a covariance matrix. The information matrix $M^{-1}$ has the inverse eigenvalues $1/3$ and $1/2$ on its diagonal, which shows the reversal of the ranking order. Solving the two linear systems $Mv_k = \lambda_k v_k$ yields the corresponding eigenvectors.
As an application, consider the linear factor model (2.2) with $N$ assets and $K$ risk factors $F$. How does one estimate the model? Let $R = (R_1, \ldots, R_T)$ be the $N \times T$ return matrix and assume that $K < N$. Following Kempthorne (Factor Models, MIT Lecture Notes, Fall 2013), the model is estimated by the PCA-based steps listed after Table 4.8.
PCA of C [%]:
                            Factor 1   Factor 2   Factor 3
Asset 1                        65        -72        -22
Asset 2                        70         69        -20
Asset 3                        30         -2         95
EV (eigenvalue)                 8        0.8        0.3
Cumulated σp-contribution      88         97        100

Table 4.8: PCA analysis of a covariance matrix. Note that the eigenvalues of $C^{-1}$ are 12, 119, 380 for the factors 1, 2, 3, i.e. the ordering relation is inverted compared to the covariance matrix. The first factor in the covariance matrix is a market factor since all components in the eigenvector are positive. It has the largest eigenvalue and contributes 88 percent to the portfolio's volatility.
• $\hat C = \frac{1}{T}X^* X^{*\prime}$: sample covariance of the demeaned data $X^*$.
• PCA: $\hat C = \hat W \hat\Lambda \hat W'$.
• $\hat\alpha_0 = \bar x$: the sample mean.
• $\hat\beta_0 = \hat W_m\, \hat\Lambda_m^{1/2}$, where the subindex $m$ indicates the submatrix of the first $m$ columns.
• $\hat D_0 = \mathrm{diag}(\hat C) - \mathrm{diag}(\hat\beta_0 \hat\beta_0')$: the idiosyncratic variances.
• $\hat C_0 = \hat\beta_0 \hat\beta_0' + \hat D_0$: the structured covariance estimate.
• Step 3: Adjustment.
A code sketch of these steps is given below.
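A minimal implementation of the steps above (ignoring the final adjustment step); the simulated data and the choice m = 2 are illustrative assumptions.

```python
import numpy as np

# Statistical factor model via PCA, following the steps above;
# m is the number of retained factors.
def pca_factor_cov(X: np.ndarray, m: int) -> np.ndarray:
    Xc = X - X.mean(axis=0)                    # alpha_hat = sample mean
    C = Xc.T @ Xc / len(X)                     # sample covariance
    lam, W = np.linalg.eigh(C)
    lam, W = lam[::-1], W[:, ::-1]             # descending order
    beta = W[:, :m] * np.sqrt(lam[:m])         # loadings of first m factors
    D = np.diag(np.diag(C - beta @ beta.T))    # idiosyncratic variances
    return beta @ beta.T + D                   # structured covariance

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 4))                 # T=1000 returns, N=4 indices
C0 = pca_factor_cov(X, m=2)                    # two-factor model as below
```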
Consider the S&P 500, SMI, Eurostoxx 50, and Nikkei 225 indices from Apr 1995 to Apr 2015. Calculating the correlation matrix on a weekly basis using closing prices gives
$$\rho = \begin{pmatrix} 1 & & & \\ 0.80 & 1 & & \\ 0.82 & 0.88 & 1 & \\ 0.67 & 0.56 & 0.58 & 1 \end{pmatrix}.$$
The data indicate that the correlation between the European and American markets is stronger than between the Japanese market and the European or American ones. We therefore set up a two-linear-factor model.
The matrix $\beta$ follows from maximum likelihood estimation:
$$\beta = \begin{pmatrix} -0.015 & 0.21 & 0.29 & 0.35 \\ 0.91 & 0.93 & 0.96 & 0.76 \end{pmatrix}.$$
The portfolio is long-only in one factor, the market factor by definition, and long/short in the second factor; here it is short the S&P 500 and long the other three indices.
Given a PCA analysis of a covariance matrix - how noisy are the estimated eigenvalues? Random Matrix Theory (RMT) studies the eigenvalues and eigenvectors of large-dimensional matrices whose entries are sampled according to known probability densities. Basically, if the eigenvalue distribution of a covariance matrix is close to that of a matrix with completely random entries, then randomness dominates in the covariance matrix. A main feature of RMT is universality: the asymptotic behavior of random matrices is often independent of the distribution of the entries. A second one is that the limiting distribution takes non-zero values only on a bounded interval, displaying sharp edges. Sharp edges indicate that eigenvalues outside of the asymptotic range are non-random.
Consider the sample covariance matrix $C^S = \frac{1}{T}RR'$, where $R$ is an $N \times T$ matrix whose rows are the time series of the returns, one row for each stock. We assume that returns are normalized by their standard deviation such that their variance is 1. Suppose that the entries of $R$ are IID random variables with mean zero and variance $\sigma^2$ (e.g. normally distributed); then $R$ is a random matrix. Using PCA, the hope is to find a low-dimensional structure in the distribution corresponding to large eigenvalues of $C$. How close are the spectral properties of $C^S$ and $C$? If $N$ is fixed and $T \to \infty$, the law of large numbers guarantees $E[C^S] = C$. But $N$ is often of the order of $T$ or even larger. In this case it is not clear whether $C^S$ converges towards $C$.
Figure 4.13: Simulation for N = 1000 stocks and T = 500, i.e. daily data for two years.
The red line is the theoretical eigenvalue distribution of Marcenko and Pastur. Source:
Gatheral [2008].
The random matrix corresponds to the null hypothesis that the set of stocks considered is strictly independent and that the correlation matrix is the identity matrix. Any deviation from this structure in the empirical correlation matrix suggests the presence of true information. All eigenvalues which fall within the theoretical spectrum of eigenvalues are noisy and should not be considered in portfolio optimization.
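A sketch of this null model follows; to keep the sample matrix full-rank the sketch uses N < T (the figure's N = 1000, T = 500 would add a point mass of zero eigenvalues), and the Marcenko-Pastur edges $\lambda_\pm = (1 \pm \sqrt{q})^2$ hold for unit-variance entries.

```python
import numpy as np

# Null model: eigenvalue spectrum of a pure-noise sample covariance matrix
# versus the Marcenko-Pastur edges, q = N/T (here q = 0.25, full rank).
N, T = 250, 1000
q = N / T
R = np.random.default_rng(1).standard_normal((N, T))
C_S = R @ R.T / T
lam = np.linalg.eigvalsh(C_S)

lam_minus, lam_plus = (1 - q ** 0.5) ** 2, (1 + q ** 0.5) ** 2
print(lam.min(), lam.max())   # empirical edges, close to the theory
print(lam_minus, lam_plus)    # 0.25 and 2.25
```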
How are the PCA eigenvalues related to the eigenvalues of the random matrix covariance? The answer is given by the Marcenko-Pastur theorem: in the limit $N, T \to \infty$ with $q = N/T$ fixed, the eigenvalue distribution of $C^S$ converges to a density $\rho$ supported on $[\lambda_-, \lambda_+]$ with edges $\lambda_\pm = \sigma^2(1 \pm \sqrt{q})^2$. Here the random matrix $C^S$ is a random Wishart matrix, $\lambda_\pm$ are the theoretical minimum and maximum eigenvalues of the random correlation matrix, and $\rho$ is the Marcenko-Pastur density.22
The proof of the theorem is based on the following moment expansion and combinatorics:
$$\frac{1}{N}\,E\!\left[\mathrm{tr}\!\left((R'R)^k\right)\right] = \int_{\lambda_-}^{\lambda_+} \lambda^k \, d\rho(\lambda)\,.$$
What can be said about the distribution of the largest eigenvalue? Can we find the cut-off eigenvalue separating noisy eigenvalues from those carrying true information? What can be said about the eigenvalue distribution if $N > T$, and can the IID assumption in RMT be relaxed? The answer to the first question is given by the Tracy-Widom law: the probability distribution of the largest eigenvalue can be expressed analytically in the case of normally distributed random variables. We refer to the literature for details.
Figure 4.14 compares the case of a random identity matrix with the eigenvalue distribution of a risk model (blue histogram). The higher frequency for large eigenvalues indicates that the largest eigenvalues in the risk model, which determine the risk, are not driven by noise, in contrast to the identity matrix assumption. In other words, the risk model is able to capture true risk information. A similar conclusion follows for the small eigenvalues, which dominate in optimal portfolio construction. For the intermediate eigenvalues there is virtually no difference to the pure noise case. These are the factors which matter in the Markowitz model in the long-short portfolio of the expected return minus the beta hedge and which lead to unstable optimal allocations.
22 $\rho(\lambda)$ is defined as the limit of $\rho_N(\lambda) := \frac{1}{N}\sum_{j=1}^N \delta(\lambda - \lambda_j)$ with $\delta$ the Dirac delta function.
Figure 4.14: Simulation for N = 1000 stocks and T = 500, i.e. daily data for two years. The blue histogram corresponds to the eigenvalue frequencies of a risk model.
Ledoit and Wolf (2004) extended Stein's shrinkage approach to the covariance matrix. To this end, we use the Frobenius norm for an $N \times N$ matrix $A$:
$$\|A\|_F := \sqrt{\mathrm{tr}(A'A)/N}\,.$$
Linear shrinkage combines a highly structured estimator - say the identity matrix $I$, representing the 1/N approach - with the unstructured sample covariance matrix $C^S$ and its $N^2$-growth in the form
$$\hat C = a_1 I + a_2 C^S\,.$$
The shrinkage weights $a_j$ are constants. To find the optimal weights, one minimizes the expected Frobenius distance to the true covariance matrix.
Proposition 78. The above optimization problem has the unique solution
$$\hat C^* = a^*\mu I + (1 - a^*)\, C^S \qquad (4.57)$$
where
$$a^* = \frac{E\big[\|C^S - C\|_F^2\big]}{E\big[\|C^S - \mu I\|_F^2\big]}\,, \qquad \mu = \mathrm{tr}(C)/N\,.$$
The optimal solution can be interpreted as shrinking the sample covariance matrix towards the shrinkage target $\mu I$ with (shrinkage) intensity $a^*$. Ledoit and Wolf (2003) state: 'The beauty of the principle is that by properly combining two extreme estimators one can obtain a compromise estimator that performs better than either extreme.'
To illustrate the shrinkage idea, consider an $N \times N$ covariance matrix where $\sigma_{ij}$ is the non-observable true covariance for $i \neq j$ and $\mathrm{cov}_{ij}$ is the sample covariance. The squared deviation of the shrunk estimate from the true value reads $\left((1-a)\,\mathrm{cov}_{ij} - \sigma_{ij}\right)^2$, which is a loss measure. Since the sample covariances are random, we take the expected loss function; minimization is a routine quadratic optimization with the optimal shrinkage intensity
$$a = \frac{\sum_{j>i} \mathrm{var}(\mathrm{cov}_{ij})}{\sum_{j>i} \left(\mathrm{var}(\mathrm{cov}_{ij}) + \sigma_{ij}^2\right)}\,.$$
In terms of the spectral decomposition, linear shrinkage keeps the eigenvectors of the sample covariance matrix but replaces the eigenvalues by the convex combination
$$\hat\lambda_i = a^*\mu + (1 - a^*)\lambda_i$$
where $\lambda_i$ is an eigenvalue of the sample covariance matrix. Non-linear shrinkage then uses different shrinkage intensities for different sample eigenvalues. Since there are $N$ eigenvalues, the method is of order $N$. As in the case of linear shrinkage, one defines a similar optimization problem, which leads to an infeasible estimator; one therefore also relies on asymptotic analysis. But since there are $N$ parameters in the limit $N, T \to \infty$, the number of parameters explodes. To perform the analytics one has to use the machinery of random matrix theory. We refer to the literature.
We review two extensions of the set-up: the extension to dynamic models and the extension to factor models. The dynamic models allow us to drop the IID assumption for the $T$ observations. In order not to run into the curse of dimensionality, instead of multivariate GARCH models, Ledoit and Wolf suggest using a version of the dynamic conditional correlation (DCC) model of Engle (2002) based on correlation targeting. They define a GARCH(1,1)-type model for the conditional covariance of devolatized returns; see the literature for details.
The second extension is to use factor models to estimate the covariance matrix of a large universe of asset returns. Setting up a factor model, the approach is to use shrinkage estimation for the residual covariance matrix of a general factor model. The factor model can be static, i.e. the intercepts and the factor loadings are time-invariant, the conditional covariance matrix of the vector of factors is time-invariant, and the conditional covariance matrix of the vector of errors is time-invariant. Dynamic factor models are then given by assuming all static components to be dynamic except the intercept, since the authors found that in this context of portfolio selection the conditional factor models do not work better.
The test goal is the estimation of the GMV portfolio without any short-sales restrictions. They consider 11 portfolios, but we restrict attention to the following cases:
• 1/N portfolio.
• Sam: the sample covariance matrix.
• Lin: linear shrinkage.
• Non-Lin: non-linear shrinkage.
Table 4.9 presents the results. Since the true GMV portfolio minimizes the standard deviation, the out-of-sample standard deviation is the primary performance criterion.
Table 4.9: Performance measures for various estimators of the GMV portfolio. AV, average; SD, standard deviation; SR, Sharpe ratio; FF, Fama-French three-factor model. All measures are based on 10,080 daily out-of-sample returns in excess of the risk-free rate. In the rows SD, the lowest number appears in bold. In the columns Lin and Non-Lin, significant out-performance of one of the two portfolios over the other in terms of SD is denoted by asterisks: *, **, and *** indicate significance at the 10%, 5%, and 1% level, respectively (Ledoit and Wolf [2018]).
Non-Lin has the lowest standard deviation among the rotation-equivariant portfolios. For $N = 250$ and $500$, the Sharpe ratio gains are 0.08 and 0.06, or in relative terms 15% and 12%, respectively. If one forms the factor portfolio Non-Lin-Sharpe, then it outperforms FF, which outperforms SF (numbers not displayed). Summing up, Non-Lin dominates all other rotation-equivariant portfolios in terms of the standard deviation and additionally dominates Lin in terms of the Sharpe ratio. Considering the summary statistics of portfolio weights over time, the most dispersed weights among the rotation-equivariant portfolios are found for Sam. The three shrinkage methods generally have the least dispersed weights. The authors provide robustness tests, tests with transaction costs, and tests where individual stocks are replaced by the Ken French portfolios.
Table 4.10 presents the results for the case where dynamic and factor models are used.

                     N=100                    N=1000
                 AV     SD     IR         AV     SD     IR
EW              16.55  21.33  0.78       17.55  20.30  0.87
NL              14.76  14.16  1.04       15.00   8.75  1.71
DCC-NL          14.95  14.13  1.06       14.82   7.95  1.86
EFM1            15.37  16.50  0.93       16.33  12.78  1.28
EFM5            15.22  15.49  0.98       15.94  11.39  1.40
AFM1-NL         14.79  14.16  1.04       15.00   8.75  1.72
AFM5-NL         14.78  14.17  1.04       14.90   8.75  1.70
AFM1-DCC-NL     14.69  14.02  1.05       15.76   7.84  2.01
AFM5-DCC-NL     14.58  14.09  1.04       15.28   7.91  1.93
Table 4.10: Annualized performance measures (in percent) for various estimators of the Markowitz portfolio with momentum signal. AV = average; SD = standard deviation; IR = information ratio. AV and SD are computed from the 10,080 out-of-sample returns and then scaled to one year; IR is the ratio AV/SD. EFM means Exact Factor Model, AFM Approximate Factor Model; the number after EFM and AFM is the number of Fama-French factors considered. DCC denotes the dynamic model and NL non-linear shrinkage. *** denotes significance at the 0.01 level (Ledoit and Wolf [2018]).
The return signal is given by momentum, i.e. the geometric average of the previous 252 returns on the stock, excluding the most recent 21 returns. The vector of these averages defines the expected return signal $\mu$. The first result is that all models consistently outperform the 1/N model. Second, approximate factor models consistently outperform the exact factor models. Third, DCC-NL outperforms the other structure-free models and the exact factor models, and AFM-DCC-NL consistently outperforms DCC-NL for large portfolio sizes. For the one-factor AFM-DCC-NL with N = 1000 the outperformance is statistically significant. In a nutshell, dynamic models dominate static ones, 1/N becomes a dominated strategy in this fine-tuned non-linear shrinkage approach, and dynamics plus one factor does better than using more factors. This indicates that instead of searching for a large number of factors, sound dynamics in the estimation of the covariance matrix leads to better-performing results.
4.5 Factor Models
There are three approaches to selecting factors. The first one uses theory. The classic is the CAPM, where the market portfolio return is the only factor determining expected returns. Merton (1973) extended the theory to the inter-temporal context. In this model, any state variable that predicts future investment opportunities - such as the term premium, volatility premium, default premium, or inflation - defines an additional factor.
Statistical factor selection is a second approach, with the arbitrage pricing theory (APT) of Ross as the classic model. Finally, identifying factors based on firm characteristics, with the famous three-factor model of Fama and French (1993), defines the empirical approach to factor selection.
Grouping of risk premia by asset class and style:
  Equities: EQ Dividends, EQ Merger Arb, EQ Global Vol (Diversified), EQ Mean Reversion
  Interest Rates: IR Carry Diversified, IR Muni/Libor, IR Vol
  Credit: CR Carry HY vs. IG
  Currencies: FX Global Carry, FX Vol Basket, FX Vol Single
  Commodities: CO Carry (Curve), CO Vol Diversified, CO Vol Single
The premia range from those covered by the global market portfolio over those only weakly covered to those whose driving factors are orthogonal to it.
Stocks with a higher score led to a larger return than those with a lower score. This empirical feature is called EQ Quality. If one believes that this historical return pattern will continue to hold on average in the future, one can invest in such a strategy. A long-short EQ implementation removes directional risks. There are institutional investors which do not want to or cannot invest in long-short vehicles. But investing long-only in a risk premium is not market neutral: market neutrality is lost, and the correlations between risk premia and with traditional asset classes move significantly away from a weak correlation structure. A long-short strategy, however, is not free of risk either; see the momentum crash below. Factor investing has emerged as the new paradigm among sophisticated institutional investors. A large body of literature suggests that shorting is difficult to implement. Therefore, institutional investors often prefer long-only approaches since they are also less exposed to liquidity risk, have greater capacity, and do not require the use of leverage or derivatives. Producers offer the risk premia products as fully transparent indices. Different wrappers are used for risk premia investments - UCITS funds, ETFs, or structured notes.
Figure: Construction of an ARP quality strategy. Stocks are ranked by their quality score (Q-score); the 20% of stocks with the highest scores form the long position and the 20% with the lowest scores form the short position. The difference of the historical returns of the two legs defines the return of the ARP strategy.
Momentum strategies historically exhibited a beta to the market of −0.125 and an annualized Sharpe ratio of 0.82. Momentum is documented to be pervasive for equities, currencies, commodities, and futures. The maximum monthly momentum return was 26.1%, and the worst five monthly returns were −79%, −60%, −46%, −44%, and −42%. Intuitively, the premium is positive if the winners' return is larger than the losers' one. In a momentum crash, past winners are future losers and vice versa - one is wrong in both the long and the short leg of the investment. This happened in fast market rebounds:
• In June 1932 the market bottomed. In the period July-August 1932, the market
rose by 82 percent. Over these two months, losers outperformed winners by 206
percent.
• In March 2009 the US equity market bottomed. In the following two months, the market was up by 29 percent, while losers outperformed winners by 149 percent. Firms in the loser portfolio had fallen by 90 percent or more (such as Citigroup, Bank of America, Ford, GM). In contrast, the winner portfolio was composed of defensive or countercyclical firms like AutoZone.
The rationale is simple. Suppose markets are crashing. Losers already lost in value before the crash, and during the crash they become extremely cheap if one believes that they will not default. Since investors are convinced that markets will recover, the demand for the losers exceeds that for the winners, which leads to the winner-loser reversal.
Byun and Jeon (2018) suggested adapting the momentum strategy in order to reduce the impact of momentum crashes. They propose to observe past returns for 12
Figure 4.17: Stocks are screened based on their past return over the last J = 3 months (J = 6, 12 months are also used). This screening identifies the past winners and losers and defines the formation period. After this identification, no action is taken for one month; the reason is to filter out possible erratic price fluctuations in the selection of past winners and losers. Finally, in the holding period the selected stocks are held for K = 3 months, where again longer holding periods are possible. Afterwards the positions are closed. This procedure is repeated monthly, leading to an overlapping roll-over portfolio allocation.
months but invest for only one month, with the decision criterion for going long and short being the cumulated past 52-week return. The authors expect that the 52-week high subsumes the predictive power of the past 12-month return, and investing for only one month adapts to the often observed fast momentum reversals. The mechanism is that, as the market rebounds, investor demand increases for stocks that are far from their 52-week highs; this bias makes the 52-week high negatively related to future returns. The authors show that during crash periods, stocks far from their 52-week highs outperform stocks near their 52-week highs.
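A pandas sketch of the J/K screening of Figure 4.17 follows; the `prices` DataFrame (monthly closes, dates by tickers) and the 20% quantile cut are illustrative assumptions.

```python
import pandas as pd

# J-month formation, one-month skip, winner-minus-loser momentum sketch.
def momentum_signal(prices: pd.DataFrame, J: int = 3) -> pd.DataFrame:
    # Past J-month return, lagged one month (the skip month).
    return prices.pct_change(J).shift(1)

def winner_minus_loser(prices: pd.DataFrame, J: int = 3, q: float = 0.2) -> pd.Series:
    sig = momentum_signal(prices, J)
    fwd = prices.pct_change().shift(-1)           # next month's return
    ranks = sig.rank(axis=1, pct=True)            # cross-sectional percentile
    long_leg = fwd[ranks >= 1 - q].mean(axis=1)   # past winners
    short_leg = fwd[ranks <= q].mean(axis=1)      # past losers
    return long_leg - short_leg                   # monthly W-L return
```

Holding for K > 1 months then corresponds to averaging K such overlapping sub-portfolios, as described in the figure caption.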
Figure 4.19 shows the return of investing $1 from 1956 until 2015 in a market factor and in the styles size, value, and momentum.23
In this long-term view three observations are immediate: simple grouping of assets can lead to significant outperformance of the market return for long periods, but there
23 Value effect: low price-to-book (P/B) stocks (value stocks) typically outperform high P/B stocks (growth stocks). Size effect: smaller stocks typically outperform larger stocks.
Figure 4.18: Long-only momentum strategies. Left panel: momentum strategies 1947-2007. Right panel: momentum strategies during the GFC (Daniel and Moskowitz [2012]).
Figure 4.19: Investment return of $1 in 1956-2014 in the market, market plus value, market plus size, and market plus momentum factor (Ken French's website).
are also short time periods where factor investments can crash. This is most dramatically seen in the momentum crash during the GFC. Finally, the factors do not seem to be independent - momentum and the market crash and boom in parallel during the GFC and the following decade.
Factors should:
• exhibit significant premiums which are expected to persist in the future [Persistence];
• be uncorrelated among themselves and with asset classes in good times, and negatively correlated in bad times [Independence].
The notion of 'good' and 'bad' times is made precise in economic theory by the stochastic discount factor (SDF).
The financial industry defines factor investing similarly to the professors' report. Deutsche Bank [2015] adds to the above requirements:
• Accessible - risk factors must be accessible at a level of cost that is sufficiently low to avoid dilution of the return.
• Fully transparent - strategies are fully systematic and work within well-defined rules.
Factor investing means alternative strategies defined on liquid assets, not the creation of new, illiquid asset classes. Transparency has changed radically in the last decade. In the past, an investment bank's offering of a momentum strategy was basically a black box for the investor. Today, each factor is constructed as an index with comprehensive documentation about the index mechanics, risks, and governance. Hedge funds often use factor investing strategies too, but often they are not transparent.
• First identify the key objectives of the portfolio and the preferences of the investor.
• Start with a long list of potential risk factors and select a core portfolio made up of the most attractive risk factors.
• Finalize the list of selected risk factors and construct a portfolio using a risk-parity methodology.
Figure 4.20, upper panel, shows the cross-asset risk factor list of DB and some key figures. Risk and return properties of the different risk factors differ. Therefore, if one invests in a portfolio with a target volatility to control downside risk, leverage is needed, since otherwise combining a low-vol 2% interest rate risk premium with a 12%-vol equity premium makes no sense. Table 4.11 shows monthly correlations. The lower triangular matrix correlations are calculated for turbulent markets; those for normal markets are shown in the upper triangular matrix.
The equally weighted portfolio of risk factors (ARP) has an annualized correlation of 4% in normal markets and 5% in stressed ones. The correlation with traditional asset classes is also low. Correlations between the different asset classes are, however, much larger. In this sense the risk factors are almost mutually uncorrelated compared to asset classes.
Figure 4.20: Upper panel: risk factor list of DB London. Risk factors are grouped according to their asset class base (equities, interest rates, credit, currencies, commodities) and the five styles used by practitioners. Lower panel: average annualized volatilities, returns, and Sharpe ratios for the risk factors (DB [2015]).
This lower correlation is due to the use of long and short positions: short positions give factors the appearance of lower correlations. We discuss in the next section that it is impossible to produce more efficient portfolios, in sample, by expressing exposures as factors instead of assets, as long as the investable units are the same.
Low beta portfolios - that is to say, portfolios of risk factors that have low correlation to equities and bonds in normal market periods and negative correlation to equities in turbulent markets - are of particular importance since they promise to resist a joint market downturn. Suitable risk factors are value and momentum risk factors for all asset classes, low beta risk factors, quality, and US muni curves vs. Libor. The correlation of this portfolio to equities is −1.6% and to bonds 7.6%. In turbulent markets, the correlation to equities is −37.5% and to bonds 8.8%. The Sharpe ratio is very high and the maximum drawdown is low, at −5.6%; see Table 4.12.
A deeper analysis of the correlation structure reveals that the risk factors can be clustered into three broad groups.
Table 4.11: The correlation in the top-left cell is the average correlation of the equally-weighted portfolio of factors (ARP) with all DB risk premia. PE means Private Equity. In the lower triangular matrix the correlations are calculated for turbulent markets; those for normal markets appear in the upper triangular matrix (DB [2015]).
Table 4.12: Summary statistics for the low beta portfolio (DB [2015]).
Following DB (2015), these groups are:
• High beta, higher information ratio factors. These factors exhibit high information
ratios but also contain some equity market risk.
• Low beta, stable correlation factors. Factors with moderate correlation levels which
are typically stable.
• Negative beta, lower information ratio factors. Factors that exhibit negative corre-
lations to equity markets.
This observation leads to timed factor portfolio investments; see the literature for details. We conclude this section by comparing a low-volatility portfolio of risk premia of JP Morgan - the 7.5% target volatility index - with the MSCI World; see Figure 4.21.
Figure 4.21: Top panel: the return of the JP Morgan risk premia index and the MSCI World, 2006-2016. The middle panel shows the cumulative returns of the two indices for three stress events. The bottom panel shows the monthly returns of the JPM index; for the three stress events - GFC, EU debt crisis, Q1 2016 - the returns of the MSCI are also shown (JPM [2016]).
The top panel shows that investing worldwide in a diversified way did not provide any positive return over the ten-year investment period if the concept of asset diversification is used. The JPM index, by contrast, showed an impressive performance. In more detail, the slope of the risk premia performance is not constant over the ten-year period: after the GFC 2008 until the end of 2012, returns were largest with very low risk. Then, for about one and a half years, there was a standstill period, followed by a positive return period with larger risks - the return chart is more zigzagged than in previous years. If we compare the performance of the JP Morgan index with the MSCI in the three stress periods - GFC, EU debt crisis, and Q1 2016 - we observe that the risk premia index did well compared to the MSCI in the GFC and the EU debt crisis: the construction mechanics of being uncorrelated with traditional asset classes in general and negatively correlated in market stress situations worked. In the Q1 2016 event, things are more complicated. While the same can be said for January and February 2016, the March data show that the risk premia index largely underperformed the MSCI. The reason, from an asset class perspective, was a sharp and fast rebound of stock markets after ECB president Draghi's speech. This rebound was too fast for the risk premia index's quarterly rebalancing frequency. Furthermore, Draghi's speech also affected credit risk premia in a way which is the exception rather than the rule: the credit spread tightening was more pronounced for the iTraxx Europe Main index than for the Crossover index of the same family. This means that risk factors collecting the credit risk premia generated negative returns since they were wrong in both the long and the short risk premia portfolios. A similar remark applies to interest rate risk premia.
How many risk factors are there? Harvey et al. (2015) use 313 published works and selected working papers and catalogue 316 risk factors, and Hou, Xue, and Zhang (2017) report 447 factors in their study. It is clear that not hundreds of factors will be rewarded; most are anomalies. We show in the section on backtesting that an appropriate use of statistical methods rules out most of them. Hou et al. (2017), for example, find that two-thirds of the 447 factors are insignificant at the 5 percent level using the usual critical t-value of two, and 85 percent become insignificant if a critical value of three is used.
Table 4.13 shows, for premia indices and for individual premia, that most premia fail to deliver the promised performance. Essentially, the indices show zero performance over the last three years. What are the reasons for this underperformance compared to the promising values of the premia providers in the previous section? First, backtesting is not used correctly; see the section on backtesting. Second, global stock markets closed out their worst year since the financial crisis. The equity market was weighed down by concerns about a slowing global economy, tightening monetary policy, and mounting geopolitical tensions (the trade war between the US and China, Brexit). The year on the stock market was marked by zigzag movements that followed one another quickly. This was an expression of uncertainty, which was exacerbated by the constant tweets of the US President: one day he made threats in the trade war, and the next he spoke again of great achievements. The long-short factor portfolios could not follow this rapid change; investors were often positioned on the wrong side.
Table 4.13: Performance of risk premia year-to-date (December 12, 2018) and for the last three years. In the upper part, the performances of risk premia indices are shown. Below, I selected the best and worst performing individual risk premia for the six asset classes on the three-year basis. (Source: HFR Database (2018)).
• Interval error: monthly estimates on the large past sample deviate from the annual forecast in the future period.
• Small-sample error: estimates on the large past sample can differ from estimates on the smaller future sample.
• Stationarity can differ between the large past and the smaller future samples.
• Reducing the dimensionality of the set of assets to a smaller set of factors reduces noise more effectively than reducing dimensionality to a smaller set of assets.
In all four cases the predictions for the future investment could be more reliable for factors than for assets. If this is the case, then factor allocation dominates asset allocation.
Before we consider these issues, we comment on the widely documented fact that the pairwise correlations among risk factors are often lower than those among asset classes. Does this imply that risk factors are superior to asset classes? Sources are Idzorek and Kowara (2013) and Martellini and Milhau (2015). Idzorek and Kowara (2013) first provide an answer in an idealized world where the number of risk factors equals the number of asset classes and unconstrained mean-variance optimization is considered. The same dimensionality of asset classes and risk factors implies a one-to-one relationship, and then, with no surprise, returns are the same.
The authors then consider a real-world example. They focus on liquid US asset classes and risk factors. The number of risk factors (eight) is not equal to the number of asset classes (seven). The data set consists of monthly data from January 1979 to December 2011. They first confirm that the average pairwise correlation for risk factors (0.06) is smaller than for asset classes (0.38). Besides the long-short construction of factors, another main reason is that the market
portfolio is part of the asset classes but not of the risk factors. The authors then consider two different time horizons to derive the optimal allocations: the full time series and, in the second case, January 2002 to December 2011.
The risk factor weights define a lower-dimensional space than the asset class weights, since there are more constraints for the long-short risk factors. This lower dimensionality seems to favour the asset classes. But it is in fact not possible to state which opportunity set is larger, since the exposure to risk factors can be −100% compared to asset classes, which are long only. Summarizing, the opportunity sets are complex, high-dimensional spaces and it is not possible to determine in general which set is larger. Since one efficient frontier dominates another only if the same type of opportunity set is used, both frontiers are subject to the same constraints, and the results are shown in the same return units as the inputs, it is not clear which optimal allocation - assets or factors - dominates.
Figure 4.22: Optimal asset classes versus optimal risk factors. Left panel: long time series. Right panel: short time series. The US asset classes are large value stocks, large growth stocks, small value stocks, small growth stocks, Treasuries, mortgage-backed assets, credit, and cash. The risk factors are market, size, value, mortgage spread, term spread, credit spread, and cash (Idzorek and Kowara [2013]).
The results indicate that by cherry-picking a particular historical time period, almost any desired result can be found. This illustrates that there is nothing obvious about the superiority of asset allocation based on risk factors, and the conclusion does not depend on the fact that historical data are used (Idzorek and Kowara (2013)).
The interval error arises if analysts' assumption that the square-root rule applies between one-month past estimates and longer periodicities is not true, that is, if lagged auto-correlations, which the evidence shows to be non-zero, are assumed to be zero. The standard deviation of the cumulative continuous return of x over N periods, R_{t,N} = Σ_{n=0}^{N−1} R_{t+n}, reads

    σ(R_{t,N}) = σ(R_{t,1}) √( N + 2 Σ_{m=1}^{N−1} (N − m) ρ_{t,t+m} ) .   (4.58)
If all auto-correlations are zero, the square-root rule follows. A similar formula holds for the correlation between the cumulative returns over N periods and the one-period case. Again, the longer-interval correlations will differ from shorter-interval correlations due to the auto-correlations. The error due to non-zero lagged correlations, the interval error IE, is defined as the absolute difference between the parameter estimate R_1 using the full-sample, one-month returns and the estimate from the full-sample, three-year rolling returns R_R, scaled by the full-sample standard deviation of three-year returns:

    IE = |R_1 − R_R| / σ_R .

An IE of 0.3 means that the parameter value estimated from monthly returns is 0.3 standardized units away from the parameter value estimated from three-year returns, expressed in monthly units. We refer to Cocoma et al. (2017) for the definitions of the small-sample error (SSE) and the independent-sample error (ISE).
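As an illustration of (4.58), the following minimal Python sketch (all inputs hypothetical) compares the autocorrelation-adjusted N-period volatility with the square-root rule:

    import numpy as np

    sigma_1, N = 0.02, 12                        # monthly volatility, annual horizon
    rho = 0.1 * 0.5 ** np.arange(1, N)           # hypothetical decaying lag-m autocorrelations

    # Equation (4.58): sigma(R_{t,N}) = sigma(R_{t,1}) * sqrt(N + 2 * sum (N - m) rho_m)
    scale = np.sqrt(N + 2 * sum((N - m) * rho[m - 1] for m in range(1, N)))
    print(sigma_1 * scale, sigma_1 * np.sqrt(N))  # adjusted volatility vs square-root rule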
The authors distinguish between two types of assets - asset classes and industry groupings - and three types of factors: fundamental factors, security attributes, and statistical factors derived from principal components analysis. We refer to their paper for the construction of the factors, the asset selection, and the various tests.
They summarize that no evidence was found that factors produce more stable results for IE, SSE, and ISE than assets across varying frequencies. On the contrary, they found evidence of the opposite, on average. The same conclusion follows when comparing the complexity reduction of the asset set to factors versus the reduction to a smaller asset set: no evidence was found that factors are meaningfully more effective than assets at noise reduction.
where α is the intercept, β_{i,M} the slope or regression coefficient, and ε_t the standard normal error term satisfying (2.2). Beta measures the unit change in the stock's excess return for every unit change in the market excess return. The intercept indicates the performance of the stock that is not related to the market and that a portfolio manager attributes to her skills.
Example
Consider the linear regression between a European equity fund's returns (dependent variable) and the EUROSTOXX 50 index (independent variable). For 20 observation dates, statistical analysis implies the estimates β = 1.18 and SEE = 0.147 with 18 = 20 − 2 degrees of freedom. The Student's t-value at the 0.05 significance level with 18 degrees of freedom is 2.101. This implies the confidence interval 1.18 ± (0.147) × (2.101) = [0.87, 1.49]. There is only a 5 percent chance that β is either less than 0.87 or greater than 1.49; with 95% confidence, the fund is at least 87 percent as volatile as the EUROSTOXX 50, but no more than 149 percent as volatile, based on our sample.
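A minimal sketch of this interval computation in Python, using the numbers of the example (scipy assumed to be available):

    from scipy.stats import t

    beta_hat, see, dof = 1.18, 0.147, 18     # estimates from the example
    t_crit = t.ppf(0.975, dof)               # two-sided 5% level -> 2.101
    lower, upper = beta_hat - t_crit * see, beta_hat + t_crit * see
    print(f"t_crit = {t_crit:.3f}, CI = [{lower:.2f}, {upper:.2f}]")   # ~[0.87, 1.49]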
The risk premium of asset i is E(R_i) − R_f and the market portfolio risk factor is F = E(R_M) − R_f. The CAPM states that some assets have higher average returns than others, but it is not about predicting returns: an asset has a higher expected return because of a large beta, and not the other way around. Furthermore, projection theory implies that beta is the projection coefficient:

    β_{i,M} = cov(R_i, R_M) / σ²(R_M) .   (4.61)
Summarizing, the time series regression (4.59) fixes the β which enters the CAPM model (4.60), and the CAPM predicts that alpha should be zero.
The linear relation (4.60) in the CAPM between the excess return of an asset and the market excess return follows from the following assumptions:
• All investors have the same beliefs about the future security values.
• Investors can borrow and lend at the risk-free rate, short any asset, and hold any fraction of an asset.
• There is a risk-free asset in zero net supply. Since markets clear in equilibrium, total supply has to equal total demand. Given the net supply of the risk-free asset, we combine the investors' portfolios to get a market portfolio. This will imply that the optimal risky portfolio for each investor is the same.
• All information is accessible to all investors at the same time - there is no insider information.
• Markets are perfect: there are no frictions such as transaction costs or lending or borrowing costs, no taxes, etc.
• For each asset i, the linear relationship between risk and return (the security market line [SML]) (4.60) holds:

    E(R_i) = R_f + β_{i,M} (E(R_M) − R_f) ,

with the beta given in (4.61) measuring the risk between asset i and the market portfolio M.
The SML implies that beta measures how systematic risk is rewarded in the CAPM; no idiosyncratic risk enters the SML. There is no reward, via a high expected rate of return, for taking on risk that can be diversified away. A higher beta value does not imply a higher variance, but a higher expected return.
26 If an asset i is uncorrelated with the market, its beta is zero although the volatility of the asset may be arbitrarily large.
27 β = 1 implies E(R_i) = E(R_M), β = 0 implies E(R_i) = R_f, and β < 0 implies E(R_i) < R_f.
The behavioural assumption that all investors consider the same mean-standard deviation chart implies that all hold a mean-variance efficient portfolio. By the mutual fund theorem, each minimum variance portfolio is a combination of a riskless asset and a fixed risky asset portfolio. Therefore, all investors invest in all risky assets in the same proportions. Since demand equals supply in the asset market equilibrium, all investors must hold the market portfolio, which in turn is mean-variance efficient. Therefore, no investor needs to perform a mean-variance analysis but can just invest in the market portfolio.
The linearity of (4.62) implies that the portfolio beta is the sum of the asset betas multiplied by the portfolio weights. In the CAPM, all optimal portfolios are a combination of the risk-free portfolio and the market portfolio. Tobin's separation states how individually tailored portfolios can be constructed: first, the portfolio manager constructs the risk-free and market portfolios; then, the investment advisor determines the client's risk profile, which fixes the optimal allocation between risk-free and risky investments.
Inserting cov(R_k, R_M) = ρ(k, M) σ_k σ_M in (4.62) implies for the Sharpe ratio

    SR_k := (µ_k − R_f) / σ_k = ρ(k, M) (µ_M − R_f) / σ_M .   (4.63)
The Sharpe ratio of asset k is equal to the slope of the CML times the correlation coefficient. Comparing SML and CML, see Figure 4.23: all portfolios lie on the SML but only efficient portfolios lie on the CML. Finally, the SML plots rewards versus systematic risk while the CML plots rewards versus total risk.
Consider three risky assets, the market portfolio, and a risk-free asset given by the data in Table 4.17 (taken from Kwok (2010)). The CML implies, at the standard deviation levels of 10 percent and 20 percent, respectively, expected returns of 13 percent and 16 percent. Therefore portfolio 1 is efficient,
28 A portfolio lies on both the SML and CML if the correlation between the portfolio return and the market portfolio is 1.
Figure 4.23: Left panel - capital market line in the Markowitz model. Right panel - security market line in the CAPM model. As an exercise, assume that the borrowing and lending rates are different and draw the CML for these two rates.
but the other two portfolios are not. Portfolio 1 is perfectly correlated with the market portfolio, but the other two portfolios have non-zero idiosyncratic risk. Since portfolio 2 has a correlation closer to one, it lies closer to the CML. The expected rates of return of the portfolios for the given values of beta, calculated with the SML, agree with the expected returns in the table; for portfolio 1,

    µ = µ_f + (µ_M − µ_f) β = 10% + 6% × 0.5 = 13% .

    Portfolio           σ      ρ with R_M   β     µ
    1                   10%    1            0.5   13%
    2                   20%    0.9          0.9   15.4%
    3                   20%    0.5          0.5   13%
    Market portfolio    20%    1            1     16%
    Risk-free asset     0%     0            0     10%
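A short sketch reproducing the µ column of the table from the SML:

    rf, mu_m = 0.10, 0.16                    # risk-free rate and market return from the table
    betas = {"Portfolio 1": 0.5, "Portfolio 2": 0.9, "Portfolio 3": 0.5}
    for name, beta in betas.items():
        mu = rf + beta * (mu_m - rf)         # SML: mu = mu_f + (mu_M - mu_f) * beta
        print(f"{name}: mu = {mu:.1%}")      # 13.0%, 15.4%, 13.0%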
Jensen's alpha

    α_k := µ_k − R_f − β_k (µ_M − R_f)   (4.64)

is a performance measurement between the realized and theoretical returns of the CAPM. Since alpha is a return, it can be used for the compensation of portfolio managers. While the Sharpe ratio can be illustrated in the return-volatility space, Jensen's alpha is shown in the return-beta space. Jensen's alpha measures how far above the SML the asset's performance is. It does not consider the systematic risk that an investment took on to earn the alpha. The Treynor ratio (TR) adjusts for this systematic risk taken:

    TR_k := (µ_k − R_f) / β_k .
The TR equals the slope of the SML for the actively managed portfolio. If the CAPM holds, then the Treynor ratio is the same for all securities. Both the Jensen and Treynor measurements do not adjust for idiosyncratic risk in the portfolio.
The appraisal ratio (AR) or information ratio (IR) divides the excess return over the benchmark by the tracking error (TE). Values of the IR around 0.5 are considered good, while a value greater than 1 is extraordinary. The IR generalizes the Sharpe ratio since it replaces the risk-free rate by a passive benchmark.
The beta of A is equal to its market portfolio correlation times its volatility divided by the market volatility - that is, 0.9 × 15%/20% = 0.675. The Sharpe ratio for A is SR = (12% − 4%)/15% = 0.53. Jensen's alpha for portfolio A reads 12% − 4% − 0.675 × (15% − 4%) = 0.575%, and the Treynor ratio for A is given by (12% − 4%)/0.675 = 0.119. The IR and the TE follow in the same way. We finally get:
It follows that portfolio C is the best portfolio. We summarize the relevance of the different performance measurements:
• Beta is relevant if the individual risk contribution of a security to the portfolio risk is considered.
• TE is relevant for risk budgeting issues and risk control of the portfolio manager relative to a benchmark.
• The Sharpe ratio is relevant if return compensation relative to total portfolio risk is considered.
• Jensen's alpha is the maximum amount one should pay an active manager.
• The Treynor measurement should be used when one adds an actively managed portfolio to a passive portfolio that already contains many actively managed ones.
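A minimal sketch of these measures, using the numbers for portfolio A from the example above (µ_A = 12%, σ_A = 15%, ρ = 0.9, µ_M = 15%, σ_M = 20%, R_f = 4%):

    mu_a, sig_a, rho = 0.12, 0.15, 0.9       # portfolio A from the example
    mu_m, sig_m, rf = 0.15, 0.20, 0.04       # market and risk-free rate

    beta = rho * sig_a / sig_m                       # 0.675
    sharpe = (mu_a - rf) / sig_a                     # 0.53
    jensen_alpha = mu_a - rf - beta * (mu_m - rf)    # 0.00575 = 0.575%
    treynor = (mu_a - rf) / beta                     # 0.119
    print(beta, sharpe, jensen_alpha, treynor)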
Warnings: if return distributions are not normal - they show fatter tails, higher peaks, or skewness - then the use of these ratios can be problematic, since moments higher than the second contribute to risk. Furthermore, the IR depends on the chosen time period and benchmark index. Finally, the chosen benchmark index affects all benchmark-based ratios: managers benchmarked against the S&P 500 Index can show a lower IR than the same managers benchmarked against a different index.
Standard assumptions for testing the CAPM are rational expectations - in particular, realized returns are a proxy for expected theoretical returns - and that the holding period of assets is known, typically one month. The CAPM equation to be tested raises several questions: Are betas stable measures of systematic risk? Are the expected returns linearly related to the betas (Q1)? Is beta the only systematic risk measure (Q2)? Does the expected return of the market portfolio exceed the expected return of assets uncorrelated to the market (Q3)? Finally, do assets uncorrelated to the market portfolio earn the risk-free rate (Q4)? There are two linear tests of the CAPM equation: first, the returns of different assets are regressed on the betas (cross-section, Q2, Q3); second, the CAPM equation for each individual asset is regressed over time (time series, Q4).
The cross-sectional regression is used to test the CAPM equation over a period of T years. Since expected returns are not measurable, the CAPM equation is tested for average annual realized returns. The temporal individual asset test using time-series regression tests the CAPM on a number of fixed sub-periods up to time T: the excess asset return is regressed on the excess market return in each sub-period.
Using the time series regression equation to estimate alpha, beta, and epsilon, it follows that the estimates β̂ of beta are volatile both for stocks and for sectors; see Figure 4.24. Since the CAPM is only interesting for portfolios where beta is the significant risk measure, an application to single securities does not make sense.
We consider tests of the CAPM, restricting ourselves to three key papers. The beta instability led to CAPM tests for portfolios only. The first one is the paper of Black, Jensen, and Scholes (1972).
Figure 4.24: Beta estimates for AT&T (left panel) and the oil industry (right panel)
(Papanikolaou [2005]).
Consider the cross-section where the factors F are non-traded portfolios and a risk-free rate exists. Using the time series regression, estimates of factor risk premia and pricing errors can be obtained. But in the cross-section, the estimation is simplified using two-pass regressions: first, betas are estimated from the time-series regressions,
29 Consider a period of, say, 50 years, i.e. 600 months. Use, say, 60 months to estimate the beta for each stock (pass one). Rank the securities by the estimated betas and form ten portfolios. Recalculate the betas for the next five years, and so on, which defines a rolling regression. We then have monthly returns for the time period minus five years for each portfolio. Calculate mean portfolio returns and estimate the beta coefficient for each of the 10 portfolios. This provides beta estimates for the portfolios. Do pass two for the portfolios, i.e. regress the portfolio means against the portfolio betas, that is, estimate the ex-post SML.
and then a cross-sectional regression of average returns on the betas follows; that is, the estimated betas are the explanatory variables in the second step. The pricing errors are given by the cross-sectional residuals α̃. The estimates of the cross-section can be obtained by OLS or, since the cross-sectional residuals are correlated, by GLS, which yields more efficient estimates. The betas in the second-pass cross-sectional regression are time series estimates, which leads to the problem of errors-in-variables. Shanken (1992) showed how to correct the standard errors of the risk premium and pricing error estimates. The predictions of the CAPM are that alpha is zero, that lambda equals the market premium, and that any other variables are zero. Typically, alpha is estimated to be positive, lambda is positive but smaller than the market premium, and other factors are not rejected.
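A minimal two-pass sketch on simulated data (all numbers hypothetical, not the cited studies' data): pass one estimates betas from the time series, pass two regresses average returns on those betas:

    import numpy as np

    rng = np.random.default_rng(0)
    T, N = 240, 25                            # months and number of portfolios
    f = rng.normal(0.005, 0.04, T)            # market excess return
    true_beta = rng.uniform(0.5, 1.5, N)
    R = np.outer(f, true_beta) + rng.normal(0, 0.02, (T, N))

    betas = np.polyfit(f, R, 1)[0]            # pass one: time-series betas

    X = np.column_stack([np.ones(N), betas])  # pass two: mean returns on betas
    alpha, lam = np.linalg.lstsq(X, R.mean(axis=0), rcond=None)[0]
    print(alpha, lam)                         # CAPM predicts alpha ~ 0, lambda ~ E[f]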
So far, all results assume a small number of assets, and the estimators are consistent in the time horizon T. If the number of assets increases for a fixed T, the errors-in-variables problem also leads to biased and inconsistent coefficient estimates. Shanken (1992b) derives an estimator that is N-consistent. Finally, Gagliardini et al. (2011) explore the properties of these estimators when both T, N → ∞.
Suppose that the R² is large in the cross-sectional CAPM equation (4.60). The CAPM then explains the cross-section of average returns successfully and the alpha in the cross-section is small. This can be the case even if the R² of the time series regression (4.59) is low. The main goal of the CAPM is to see whether high average returns in the cross-section are associated with high values of the factors.
Summarizing, the findings are that excess returns on high-beta stocks are low, that excess returns are high for small stocks, and that value stocks have high returns despite low betas, while momentum stocks have high returns and low betas. The CAPM does not explain why, in the past, firms with high B/M ratios outperformed firms with low B/M ratios (value premium), or why stocks with high returns during the previous year continue to outperform those with low past returns (momentum premium). Despite these findings, the CAPM is used for determining the appropriate compensation for risk, serves as a benchmark model for other models, and is elegantly simple and intuitive.
The conditional CAPM works as follows. Consider two stocks. Suppose that the times of recessions and expansions are not of equal length in an economy, that the market risk premia differ, and that the two stocks have different betas in the different periods. The CAPM then observes only the average beta of each stock over both periods. Assume that this average beta is 1 for both stocks. The CAPM will then predict the same excess return for the two stocks. But in reality, due to their heterogeneity, the two stocks will show different returns in the two economic periods. One stock can, for example, earn a higher return than explained by the CAPM since its risk exposure increases in recessions, when bearing risk is painful, and decreases in expansions. Such a stock is therefore riskier than the CAPM suggests, and the CAPM would detect an abnormally high return, suggesting this is a good investment. The conditional CAPM corrects this since the return comes from bearing the extra risk of undesirable beta changes.
Lewellen and Nagel (2006) did not question the fact that betas vary considerably over time. But they provide evidence that betas do not vary enough over time to explain large unconditional pricing errors. As a result, the performance of the conditional CAPM is similarly poor as that of the unconditional model: it is unlikely that the conditional CAPM can explain asset-pricing characteristics like book-to-market and momentum. These statistical criticisms are not unique to the CAPM; most asset pricing models are rejected in tests with power.
While the CAPM has a theoretical foundation, the FF model is an ad hoc model introduced to better fit empirical data. The three-factor model is routinely included in empirical research.
We follow Kenneth French's web site for the FF factor construction. The factors are constructed using the six value-weighted portfolios formed on size and book-to-market.
• SMB (small minus big) is the average return on the three small portfolios minus the average return on the three big portfolios,

    SMB = 1/3 (Small Value + Small Neutral + Small Growth) − 1/3 (Big Value + Big Neutral + Big Growth) .

• HML (high minus low) is the average return on the two value portfolios minus the average return on the two growth portfolios,

    HML = 1/2 (Small Value + Big Value) − 1/2 (Small Growth + Big Growth) .
• Whether a stock belongs to, say, Small Value depends on its ranking: Small Value contains all stocks whose market value is smaller than the median market value of, say, the NYSE and whose book-to-market ratio is larger than the 70th-percentile book-to-market ratio of NYSE stocks; Small Growth analogously requires a book-to-market ratio below the 30th percentile.
• SMB for July of year t to June of t + 1 includes all NYSE, AMEX, and NASDAQ stocks for which market equity data exist for December of t − 1 and June of t, and (positive) book equity data for t − 1.
Why should one include factors which cannot explain average returns? The CAPM worked until stocks were grouped by their book-to-market ratio (value), but it still works when stocks are grouped according to their size. If FF were only to consider factors which explain average returns, they could have left size out. But size is important for reducing return variance.
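A minimal sketch of the two formulas above, with hypothetical monthly returns for the six portfolios:

    # Hypothetical monthly returns of the six value-weighted size/B-M portfolios
    small_value, small_neutral, small_growth = 0.012, 0.010, 0.008
    big_value, big_neutral, big_growth = 0.009, 0.008, 0.007

    smb = (small_value + small_neutral + small_growth) / 3 \
        - (big_value + big_neutral + big_growth) / 3
    hml = (small_value + big_value) / 2 - (small_growth + big_growth) / 2
    print(f"SMB = {smb:.4f}, HML = {hml:.4f}")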
Suppose the CAPM holds exactly,

    E(R_k) = β_k E(R_M) ,

where we set the risk-free rate to zero. Include an additional industry portfolio in the regression, i.e.

    R_{t,k} = α_k + β_{k,M} R_{t,M} + β_{k,I} R_{t,I} + ε_{t,k} .

The regression generically leads to a coefficient β_{k,I} > 0, and taking expectations gives E(R_k) = β_{k,M} E(R_M) + β_{k,I} E(R_I). This additional industry portfolio return term contradicts the CAPM being perfect. To resolve the puzzle, one uses a nested projection approach: first project the industry portfolio on the market return,

    R*_{t,I} := R_{t,I} − β_{I,M} R_{t,M} .

This is equivalent to beta-hedging the portfolio. The expected value of the new return is zero if the CAPM is right. Run a regression on this orthogonality-adjusted CAPM: this improves the R² and the t-statistics and reduces the volatility of the residual, while the mean of the CAPM is unchanged.
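A minimal simulated sketch of the beta-hedging step (all data and coefficients hypothetical):

    import numpy as np

    rng = np.random.default_rng(1)
    T = 600
    r_m = rng.normal(0.005, 0.04, T)                      # market return
    r_i = 0.8 * r_m + rng.normal(0, 0.02, T)              # industry return
    r_k = 1.1 * r_m + 0.3 * r_i + rng.normal(0, 0.02, T)  # stock return

    beta_im = np.polyfit(r_m, r_i, 1)[0]     # project industry on market
    r_i_star = r_i - beta_im * r_m           # beta-hedged industry return

    X = np.column_stack([np.ones(T), r_m, r_i_star])
    print(np.linalg.lstsq(X, r_k, rcond=None)[0])   # intercept, market beta, hedged-industry loading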
The FF model increases the R² of the portfolios from 78 percent to 93 percent in the FF portfolios. Roncalli (2013) states that the improvement in the R² is not uniform:
• The difference in R² between the FF model and the CAPM is between 18 percent and 23 percent in the period 1995-1999.
• The difference then decreases and is around 11 percent during the GFC.
• In the period starting after the GFC and running until 2013, the difference is 7 percent.
• SMB and HML explain the variation of returns across stocks; the market factor explains why stock returns are on average higher than the risk-free rate.
Are the FF factors global or country specific? Griffin (2002) concludes that the FF model exhibits its best performance on a country-specific basis. This view is largely accepted. While FF originally performed regressions on portfolios of stocks, Huij and Verbeek (2009) and Cazalet and Roncalli (2014) provide evidence that mutual fund returns are more reliable than stock returns, since transaction costs, trade impact, and trading restrictions have less impact.
Figure 4.25 illustrates the performance of the different FF factors and of the momentum factor since 1991. The size factor only generates low returns; this is the reason why most risk premia providers do not offer size risk premia. Cyclicality is common to most risk factors: some factors show persistent excess risk-adjusted returns over long time periods, but over shorter horizons they show cyclical behavior with underperformance. Ang (2013) argues that the premia exist to reward long-horizon investors for bearing that risk.
31 The figure shows periods with momentum crashes. Heavy monthly losses occurred during the Great Depression; the risk factor faced losses of up to 50 percent in one month. The risk factor performed much better in the post-WWII period until the burst of the dot-com bubble: in this period, investing, say, USD 100 in 1945 led to a payback of USD 3,500 around 50 years later. The average monthly return over the whole period is 0.67 percent.
Figure 4.25: Left panel - FF annual factor performance in the period 1991-2014, starting each year in January and ending in December; Mkt is the market return, RF the risk-free return, and WML the momentum factor. Right panel - monthly returns of the momentum risk factor, 1927-2014 (Kenneth French's web site).
FF (1993) tested their model in the period 1963-1991. They rejected the assertion that all intercepts from the regression of excess stock returns on the excess market return, SMB, and HML are zero. The FF model performed better than any single-factor model and failed only slightly, due to the low-B/M portfolios: their return was too low and the return on the big-size portfolio was too high, i.e. the size effect was missing in the lowest-B/M quintile.
    M_t / B_t = Σ_{j=1}^{∞} E_t[ (Y_{t+j} − ΔB_{t+j}) / (1+R)^j ] / B_t ,   (4.69)
with M the current market cap, Y total equity earnings, ΔB the change in total book value in the period, and R the internal rate of return of the expected dividends. Equation (4.69) follows from the fundamental pricing equation; see Equation (3.139). Equation (4.69) implies that the B/M value is an imperfect proxy for expected returns: the market cap M also responds to forecasts of earnings and investment (expected growth in book value), which define the two new factors. The regression (4.67) reads (neglecting time indices)

    R_i − R_f = β_{i,M} (R_M − R_f) + Σ_{k ∈ {SMB, HML, RMW, CMA}} β_{i,k} R_k + α_i + ε_i ,   (4.70)

with R_RMW the profitability risk factor (difference between robust and weak profitability) and R_CMA the investment risk factor (difference between low- and high-investment firms). This is again an exact factor model by definition. The explicit construction of the risk factors is a long/short combination similar to (37); see Fama and French (2015).
Fama and French (2015) rst analyze the factor pattern in average returns following
the construction of the three-factor model:
Figure 4.26: Return estimates for the 5x5 size and B/M sorts. Size is shown on the
vertical and B/M on the horizontal. OP are the earnings factor portfolios and Inv the
investment factor portfolios. Returns are calculated on a monthly basis in excess to the
one-month US treasury bill rate returns. Data start in July 1963 and end in December
2013, thus covering 606 months (Fama and French, 2015).
Panel A in Figure 4.26 shows that average returns typically fall from small to big stocks - the size effect. There is only one outlier - the low portfolio. In every row, the average return increases with B/M - the value effect. It also follows that the value effect is stronger among small stocks. In Panel B, the B/M sort is replaced by operating profitability (OP), as defined in Fama and French (2015). Patterns are similar to the size-B/M sort in Panel A: for every size quintile, extremely high rather than extremely low operating profitability is associated with a higher average return. In Panel C, the average return on the portfolio in the lowest investment quintile dominates the return in the highest quintile. Furthermore, the size effect exists in the lowest four quintiles of the investment factor.
The authors perform an analysis to isolate the effect of the factors on average return. The main results are:
• Persistent average return patterns exist for the factors HML, CMA, RMW, and SMB.
• The model explains between 71 percent and 94 percent of the cross-section variance of expected returns for HML, CMA, RMW, and SMB.
• HML (value) becomes a redundant factor: its high average return can be completely generated by the other four factors, in particular by RMW and CMA.
• Small stock portfolios with negative exposure to RMW and CMA are problematic: negative CMA exposures are in line with evidence that small firms invest a lot, whereas negative exposure to RMW is not in line with a low profitability.
Why did Fama and French not include momentum? Asness et al. (2015) state that momentum and value are best viewed together, as a system, and not stand-alone. Therefore, it is not a surprise that value becomes redundant in the five-factor model, where momentum is not considered. The authors then redo the estimation of the five-factor model, where they also find that HML can be reconstructed and is better explained by a combination of RMW and CMA; but the other direction is not true - CMA cannot be explained, for example, by HML and RMW. The authors then add momentum, which is negatively correlated to value: value then becomes statistically significant in explaining returns.
quarterly or even semi-annually. Second, some factors are pro-cyclical with the business cycle while others are historically defensive or not related to the business cycle. Value, growth, momentum, size, and liquidity are pro-cyclical; factors exploiting volatility, yield, and quality are defensive or of low volatility. This suggests that there should be discretionary control over which factors are included in the investment portfolio. Given the periodicity of the cyclical behavior, such control should take place on an annual or even bi-annual basis.
The portfolios are based on factor scores. They rank each individual factor f at each date t and normalize it from zero (worst) to one (best) to obtain a factor score s_{f,i,t} for each security i. Assuming N securities and F factors, they write S_t for the F × N factor score matrix. To build the aggregate score a_t, the factor scores are weighted with time-dependent weights φ_t, i.e. a_t = φ_t' S_t. The essential part of the model is the choice of the weights φ. As a benchmark, they use the naive strategy that equally weights the factor scores over time. Setting the weights equal to the sign function of past factor returns provides a first timing rule, which does not take into account the interaction effects of the other factors. To include interaction among different factors, a one-month momentum strategy that invests in the optimal weight combination of the most recent month is created; see the paper for the details of how this is done. The final timing strategy is to use the information in the covariance matrix, i.e. a minimum-risk optimization. To estimate the large-dimensional covariance matrix, the shrinkage estimator of Ledoit and Wolf (2003) is used. Figure 4.27 shows the results for the different strategies.
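A minimal sketch of the score-weighting step a_t = φ_t' S_t, with hypothetical dimensions and data (not the authors' implementation):

    import numpy as np

    rng = np.random.default_rng(2)
    F, N = 4, 10                              # factors and securities (hypothetical)
    S_t = rng.uniform(0, 1, (F, N))           # factor scores in [0, 1]
    past_factor_returns = rng.normal(0, 0.02, F)

    phi_naive = np.ones(F) / F                # benchmark: equal weights
    phi_sign = np.sign(past_factor_returns)   # first timing rule: sign of past factor returns

    print(phi_naive @ S_t)                    # aggregate score per security, naive weights
    print(phi_sign @ S_t)                     # aggregate score per security, sign weights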
Figure 4.27: The cumulative logarithmic excess return in US dollars above the one-month Treasury bill rate of the value-weighted market portfolio in the US (MKT US, dashed line), and the optimized one-month momentum strategies 3 FF, 5 FF, and 5 FF including momentum (FFC), together with the excess return of the Opt strategy over the MKT strategy for transaction costs of 0, 5, 10, and 15 bps (bottom). The analyzed period is from July 1963 to June 2018. Source: Leippold et al. [2019].
implementable investment strategy. A client portfolio can differ from the model portfolio due to frictions such as tax constraints, for example. On average, advisor portfolios contain 17 direct holdings, with mutual funds and ETFs the main instrument types. Advisor portfolios are grouped into five classes, from conservative portfolios (<30% equities) to aggressive portfolios (>80% equities). This classification is comparable to the balanced, growth, etc. classification which is common in Europe. Most advisor portfolios have an equity weight of 50% to 65%.
A hierarchical factor model is used to compare the many different types of portfolios. The first factor level consists of the macro factors: economic growth (which is mostly accessed through equities), real rates, inflation, credit, emerging markets, and commodities. Each macro factor is proxied by a representative portfolio; economic growth is, for example, modelled as a weighted basket of various equity indices from around the world. Advisor portfolios are dominated by exposure to economic growth: for 88% of advisor portfolios, economic growth risk accounts for 74.7% of portfolio volatility. On average, rates and credit exposures explain 66.9% and 21.8%, respectively, of fixed income variation. But the U.S. Bloomberg Barclays Aggregate Bond Index is 104.3% rates and -4.9% credit - investors are relatively short rates and long credit. Furthermore, advisor models are consistently short duration to safeguard against the potential of unexpectedly rising interest rates. The second level consists of style factors, which allow comparison within specific asset classes. For equities, they investigate the exposure to value, momentum, small size, and low volatility strategies. Advisors do not have meaningful style factor exposures in equities except for small size stocks. Table 4.18 summarizes the statistics.
Table 4.18: Statistics for advisor portfolios. The total number of BlackRock portfolios, collected between October 2017 and September 2018, is 9'940 as of September 30, 2018. Ex-ante average annual volatilities as of 9/30/2018. The benchmark for the Conservative cohort is 11% S&P 500, 4% MSCI All Country World ex US, and 85% Bloomberg Barclays U.S. Universal Index; for the other cohorts the weights of the three indices vary. Fees are in bps. (Lawler et al. (2018)).
The average number of individual equity holdings is 3.5 and the median is 2.5. Figure 4.28 shows the breakdown of macro factors across the different cohorts, with an increasing exposure to equity for more aggressive advisors and an overall insignificant exposure to style factors.
4.6 Backtests
Backtests are historical simulations of quantitative investment strategies: they compute the P&L the strategy would have generated had it been run over the given time period. The performance is expressed using performance measures such as the Sharpe ratio. Backtests often look very promising. But many practitioners fear that once they invest in a backtested strategy, the backtest performance evaporates. This fear is justified if statistics are not used appropriately, as is too often the case.
Figure 4.28: Decomposing macro factor exposure; Other Macro means mostly exposure to equity. As of 9/30/2018. Source: Lawler et al. (2019).
An investment strategist believes that the following mathematical proposition of Fermat regarding prime numbers provides meaningful investment signals:
Proposition 80. For any prime number p, the division of 2^{p−1} by p always leads to a remainder of 1.
Dividing 2^{13−1} by 13, for example, gives 315 with a remainder of 1. This holds for all prime numbers. But the converse is not true: if the division of 2^{p−1} by p leads to a remainder of 1, it does not follow that p is a prime number. The converse is, however, 'almost true': there are very few numbers that satisfy the division property and are not prime. In the first 10,000 numbers there are only seven such numbers, one of them being 1105.
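A minimal sketch verifying the proposition and searching for non-primes that pass the test (base-2 pseudoprimes) over a small range:

    def is_prime(n):
        return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

    print(pow(2, 13 - 1, 13))    # remainder of 2**(13-1) divided by 13 -> 1

    # Odd non-primes passing Fermat's test: base-2 pseudoprimes
    pseudo = [n for n in range(3, 2000, 2)
              if pow(2, n - 1, n) == 1 and not is_prime(n)]
    print(pseudo)                # includes 341, 561, 1105, ...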
He relates this prime number property to stock market performance as follows: select those stocks where one of these seven numbers is embedded in the CUSIP identifier. Given the aforementioned seven numbers, there is only one CUSIP code that contains such a number: CUSIP 03 110510. This CUSIP represents the stock Ametek, which had exhibited, by the time of Lo's writing, extraordinary performance: a Sharpe ratio of 0.86, a Jensen alpha of 5.15, a monthly return of 0.017, and so on.
There is no reason why the link 'Prime Number Theorem - CUSIP Stock Selection' should work in general; the return driven by this relationship is pure luck. Here, the highly significant performance is a product of chance, not of a meaningful signal.
32 A CUSIP is a nine-character alphanumeric code that identifies a North American financial security for the purposes of facilitating clearing and settlement.
Consider an order statistics example. Assume that there are N = 100 IID securities with normally distributed annual returns with a mean of 10 percent and a standard deviation of 20 percent. The probability that the return of security k exceeds 50 percent is then 2.3 percent; it is unlikely that security k will show this strong return. But if we ask for the winner return - that is to say, the probability that the maximum return will exceed 50 percent - the probability is 90 percent.
But this winning question does not tell us anything about the nature of the winning stock, since the returns are IID distributed. Nothing can be inferred about the future return if one knows at a given date which stock is the winner. Choosing today the past winner and predicting that it will also be the future winner is data snooping; the prediction is only related to luck.
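A minimal sketch of the two probabilities (scipy assumed to be available):

    from scipy.stats import norm

    p_single = 1 - norm.cdf(0.50, loc=0.10, scale=0.20)   # P(return of security k > 50%)
    p_max = 1 - (1 - p_single) ** 100                     # P(max of 100 IID returns > 50%)
    print(f"single: {p_single:.1%}, max of 100: {p_max:.1%}")   # ~2.3% and ~90%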
4.6.2 Overfitting
Researchers in investment algorithms often publish in-sample results without stating the number of trials. Not reporting the number of all trials increases the probability of overfitting: the published investment algorithm fails to fit additional data or to predict future observations reliably. There is a risk that a strategy with a high in-sample Sharpe ratio but a zero out-of-sample Sharpe ratio is reported. Consider an investment algorithm for stock investment where 1000 paths are simulated: if one selects and publishes the best performing path, then all investors using this algorithm will be disappointed.
The following example is from Bailey et al. (2014). Consider an IID sequence of normal returns with mean µ and volatility σ. The annualized Sharpe ratio can be computed as (Lo (2002))

    SR = (µ/σ) √T ,

where T is the number of returns per year. The true values of the drift and volatility are not known. Hence they are estimated, leading to an estimated annualized Sharpe ratio SR̂. Lo proves that this estimate converges asymptotically, for large y (the number of years used to estimate the Sharpe ratio), to

    SR̂ → N( SR, (1 + SR²/(2T)) / y ) .

For µ = 0 and y = 1 this reduces to SR̂ → N(0, 1).
The key quantity is the expected maximum of the estimated Sharpe ratios across N independent trials; see Figure 4.29.
Figure 4.29: Overfitting of backtests for µ = 0 and y = 1 (expected maximum Sharpe ratio versus the number of trials N) and the minimum expected backtest length.
The results carry over to y ≠ 1 by scaling the above result. Again, the more independent configurations a researcher tries, the more likely overfitting becomes. Hence, increasing N requires a higher acceptance threshold for the backtested result to be trusted. By increasing the sample size y, the above overfit problem can be at least partially mitigated. This means that a minimum backtest length can be calculated such that one does not select an in-sample strategy whose Sharpe ratio equals the expected maximum but whose expected out-of-sample Sharpe ratio is zero; see Figure 4.29.
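A Monte Carlo sketch of this effect: under the null (µ = 0, y = 1), each estimated Sharpe ratio is approximately N(0, 1), and the expected maximum across N trials grows with N despite zero skill:

    import numpy as np

    rng = np.random.default_rng(3)
    for n_trials in (10, 100, 1000):
        max_sr = rng.normal(0, 1, (5000, n_trials)).max(axis=1)
        print(n_trials, round(max_sr.mean(), 2))   # expected maximum in-sample Sharpe ratio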
This trade-off implies, for say 6 years of data at hand, that no more than 100 independent model configurations should be tried; otherwise, strategies are almost surely produced with positive Sharpe ratios in-sample but zero ones out-of-sample. The authors state: A researcher that does not report the number of trials N used to identify the selected backtest configuration makes it impossible to assess the risk of overfitting.
The Financial Math Organization presents some questions and answers on overfitting in a blog related to the paper of Bailey et al. (2014). We focus on some of the questions and answers.
• Why do so many quantitative investments fail? ... some of the most successful investment funds in history apply rigorous mathematical models (..., Winton, Citadel, ...). Many of them are closed to outside investors, and the public rarely hears about them. This void is often filled by pseudo-mathematical investments, which apply mathematical tools improperly as a marketing strategy. One of the most widely misunderstood experimental techniques is historical simulation, or backtesting.
• Is it true that every backtest is intrinsically flawed? Not at all. ... The purpose of our research is to highlight how easily backtest results can be manipulated, ...
• Can the 'hold-out' method, i.e., reserving a testing set to validate the model discovered in the training set, prevent overfitting? Unfortunately, this method cannot prevent overfitting. ... Perhaps the most important reason for hold-out's failure is that this method does not control for the number of trials attempted. If we apply the hold-out method enough times (say 20 times for a 95% confidence level), it is expected that we will obtain a false negative (i.e., the test fails to discard an overfit strategy). ...
• Are you saying that Technical Analysis is a form of charlatanism? No. Technical analysis tools rely on a variety of filters that make them prone to overfitting. We are simply stating that technical analysts and their investors should be particularly aware of the risks of overfitting. When the probability of backtest overfitting is correctly monitored, technical analyses may provide valuable insights to investors.
    SR = t / √T .   (4.72)

Therefore, for a fixed time horizon, an increasing SR implies an increasing t-ratio, which implies a higher significance level, and vice versa for the other direction. This is equivalent to a lower p-value for a single strategy test:

    p_S = P(|R| > t) = P(|R| > SR √T) .   (4.73)
Assuming a distribution for the returns, a distribution for the t-statistic and hence for the Sharpe ratio follows. Summarizing, if the SR is the right measure of performance, (4.72) states that it is one-to-one related to the t-statistic. Returning to the test of whether a specific trading strategy is profitable, and assuming normality and that the strategy is not profitable (the null hypothesis), the chance of making an error of the first kind is 5 percent: deciding to reject, meaning to implement a strategy which would lose money. Since the hypothesis is null, the rejection was false - a false discovery happened. What is the appropriate p-level if multiple tests are used? A practitioners' rule is to apply ad hoc discounts in backtesting: discount the Sharpe ratios of the single tests by 30% or even 50%. While easy to implement, this approach lacks any justification.
36 To prove this, let p_M be the p-value for the multiple test, defined as

    p_M = P( max_{i=1,...,N} |R_i| > t ) = 1 − (1 − p_S)^N .   (4.74)

Using the standard t = 2 value for a single test, or equivalently p_S = 5%, implies p_M = 99% for N = 100. The search for a strategy which is at least as profitable as the observed strategy largely reduces the statistical significance of the single test. In this sense, p_M is seen as the adjusted p-value which takes data mining into account. Equating the two p-values, the adjusted or haircut Sharpe ratio follows, which is smaller than SR (since p_M > p_S).
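A minimal sketch of the adjustment (4.74) and of an implied haircut t-value; the Bonferroni-style back-out in the last step is our own illustrative assumption, not the authors' exact procedure:

    from scipy.stats import norm

    p_single, N = 0.05, 100
    p_multiple = 1 - (1 - p_single) ** N       # (4.74): ~0.99 for N = 100
    print(f"p_M = {p_multiple:.2f}")

    # Illustrative haircut: the t-value whose Bonferroni-adjusted p-value equals 5%
    t_adj = norm.ppf(1 - p_single / (2 * N))
    print(f"haircut t-value ~ {t_adj:.2f}")    # ~3.5 instead of 2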
Consider a hedge fund manager using Commodity Trading Advisor (CTA) strategies; that is, he relates detected changes of trend in the securities to changes in the exposure. Different parameters define the change detection, such as the length of the time series used to calculate the moving averages, thresholds to enter and to exit, and, from a risk management perspective, stop-loss trigger points. Given that many assets are tested, the number of combinations runs into the millions or even billions. Suppose that each strategy is individually tested, say by calculating the Sharpe ratio for each trial and testing its significance at the 95% level. Given the large number of individual tests, multiple testing raises the concern that an increasing number of them will be positive purely due to chance. That is, a large fraction of the individual tests that are ex post positive will be false discoveries, i.e. due to chance. If the false discovery rate is 100%, the significance of all individual tests is completely uninformative.
The conservative FWER rule was improved by Holm (1979) and Benjamini and Hochberg (1995). They proposed to allow non-performing strategies as long as there are enough performing ones; in doing this, we gain power to detect the skillful managers. But how many non-performing strategies are we willing to accept? We fix the rate of false discoveries (FDR) at 20% - we are willing to accept that out of five discovered strategies, one is non-profitable. Assume that 2 out of the 100 strategies add value while the others destroy wealth. Benjamini and Hochberg found an upper bound: even if all 100 strategies are null, we will get our 20 percent by adjusting the threshold, assuming the test statistics are normally distributed. If some strategies are profitable, then we get a better rate than 20 percent.
How do we find the threshold which gives the chosen FDR? The theory is rather involved, but an algorithm is used to derive the correct threshold. At a t-value of 2, we expect 100 × 0.05 = 5 significant variables by pure chance. Starting with a t-value of 2, we observe, say, seven significant variables: two skillful managers, while five have no skill. The ratio 5/7 = 71% is much higher than the 20% accepted FDR. The algorithm then raises the threshold above 2 such that the ratio of expected false to observed significant variables becomes 20%. The resulting number s of observed variables - such that the expected number of false discoveries divided by s equals the FDR rate - means that if we know that there are 2 performing strategies among the 100, then by controlling the FDR at 20%, the test has the power to discover the performing strategies. In variable selection terms, we are willing to add estimation noise to our model (a variable which is not important) as long as we also add relevant information (include more relevant variables).
If the tests are dependent, then p_M depends on the joint distribution of all N single test statistics. To limit the occurrence of incorrectly discovered profitable strategies - false rejections of the null hypothesis occur more likely than in a single test - two methods are used: the control of the family-wise error rate (FWER) and the control of the false discovery rate (FDR). Both methods define type I errors in multiple testing, thus generalizing type I error probabilities for single tests. Summarizing,
FDR conceptualizes the rate of type I errors in null hypothesis testing when conducting multiple comparisons. FDR-controlling procedures are designed to control the expected proportion of discoveries, i.e. rejected null hypotheses, that are false (incorrect rejections). Formally, we denote by R the number of rejections, by N the number of tested hypotheses, and by N_{0,r} the number of false discoveries among the rejections.
FDR considers the proportion of false rejections and is based on the false discovery proportion (FDP), the proportion of type I errors, defined by

    FDP = N_{0,r}/R   if R > 0,   and   FDP = 0   if R = 0.   (4.76)
FDR measures the expected proportion of false discoveries among all discoveries, i.e. FDR = E[FDP]. Given the type I error definitions, p-value adjustments control for data mining. Based on the adjusted p-values, the corresponding t-ratios are transformed into Sharpe ratios. There are different methods to adjust the p-values; two methods for the FWER are:

Bonferroni's method:

    p^{Bonf}_{(i)} = min( N p_{(i)}, 1 ) .

Holm's method:

    p^{Holm}_{(i)} = min( max_{j ≤ i} (N − j + 1) p_{(j)}, 1 ) .

For the FDR, the method of Benjamini, Hochberg, and Yekutieli (BHY) reads

    p^{BHY}_{(i)} = p_{(N)}   if i = N,

and, if i ≤ N − 1,

    p^{BHY}_{(i)} = min( p^{BHY}_{(i+1)}, (N c(N)/i) p_{(i)} ) ,

with the normalization constant c(N) = Σ_{k=1}^{N} 1/k and where the algorithm works through the ordered p-values from the largest downward.
For the p-values of Table 4.19 below, for example,

    p^{BHY}_{(7)} = min( p^{BHY}_{(8)}, (8 × 2.72 / 7) × 0.16758 ) = min(0.5485, 0.5209) = 0.5209 ,
    Fund                   Ret     Vol    SR     √T     t-stat   t-value   p-value
    Energy                 -19.58  16.16  -1.21  1.41   -1.71    0.95637   0.08726
    Diversified Dividend     6.70   3.87   1.73  1.41    2.45    0.99266   0.01468
    Multi-Asset Income       1.58   3.70   0.43  1.41    0.60    0.72575   0.54850
    Global RE Income         5.14   2.14   2.40  1.41    3.40    0.99966   0.00068
    Low Vol Equity Yield     8.03   5.38   1.49  1.41    2.11    0.98257   0.03486
    Low Volatility Yield     7.77   5.37   1.45  1.41    2.05    0.97982   0.04036
    Real Estate              9.20   9.37   0.98  1.41    1.39    0.91621   0.16758
    Dividend Income          9.25   4.37   2.12  1.41    2.99    0.99861   0.00278

Table 4.19: Eight investment funds from Invesco. Data from January 2015 to December 2016. (Engesser (2018)).
and the other adjusted p-values follow in the same way. Doing the calculation, we observe that all p-values increase except the highest one, and that only two of them, p^{BHY}_{(2)} = 0.0302 and p^{BHY}_{(1)} = 0.0148, are statistically significant, compared to the five significant strategies in Table 4.19 before correcting the p-values.
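A minimal sketch of the BHY recursion applied to the eight p-values of Table 4.19, reproducing the two significant adjusted values:

    import numpy as np

    p = np.sort([0.08726, 0.01468, 0.54850, 0.00068, 0.03486,
                 0.04036, 0.16758, 0.00278])      # p-values of Table 4.19, ascending
    N = len(p)
    c = sum(1.0 / k for k in range(1, N + 1))     # c(8) ~ 2.72

    adj = p.copy()
    for i in range(N - 2, -1, -1):                # work downward from p_(N)
        adj[i] = min(adj[i + 1], N * c / (i + 1) * p[i])
    print(adj.round(4))                           # smallest two: 0.0148 and 0.0302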
The next example considers the FWER adjustment for the momentum strategy following the construction of Kenneth French. He considers all stocks on the NYSE and NASDAQ, where six portfolios are formed according to market cap (small, big) and historical returns (high, medium, and low). We consider data from July 1963 to December 2012, i.e. 594 monthly returns. The null hypothesis is that returns are not different from zero. Calculating first the performance of the strategy without any adjustments using the Sharpe ratio, we get

    SR_p.a. = (µ/σ) √12 = (0.7/4.29) √12 = 0.57 .
Calculating the p-value using (4.73) with t = (0.7/4.29) × √594 ≈ 3.98 gives p_S ≈ 0.007 percent: the strategy appears highly significant before any multiple-testing adjustment.
First, in the multiple testing problem, FDR-controlled p-values replace the single-test p-values. Second, there must be a huge number of putative papers that did not find any significant explanation for the cross-section of expected returns. These papers were never published and hence their information content never entered the traditional statistical setup. There are two reasons for these non-publications: one does not make an academic career in finance by publishing non-results, and it is also difficult to publish a replication of a successful argument. There is a bias toward publishing papers that establish new factors. Third, Lewellen et al. (2010) show, using cross-sectional R-squared and pricing errors to judge the success of new factors, that the explanatory power of many documented factors is spurious. The Fama-French 25 size-B/M portfolios in their three-factor model explain more than 90% (75%) of the time-series variation in the portfolios' returns (cross-sectional variation in their average returns). Any new factor added to this model which is correlated with size and value but not with the residuals will produce a large cross-sectional R-squared.
Harvey et al. (2015) apply the false discovery proportion (FDP) and the false discovery rate (FDR). The authors derive the following results. Between 1980 and 1991, only one factor was discovered per year, growing to around five factors per year in the period 1991-2003. In the last nine years, the annual factor discovery rate has increased sharply to around 18: 164 factors were discovered in the last nine years, doubling the 84 factors discovered cumulatively in the past. They calculate t-ratios for each of the 316 factors discovered, including those in working papers. The vast majority of t-ratios exceed the 1.96 benchmark, and the non-significant factors typically belong to papers that propose a number of factors.
The authors first apply their method to the case in which all tests of factor cross-sectional returns are published. This (false) assumption defines a lower bound for the true t-ratio benchmark. They obtain three benchmark t-ratios, two of which we describe:
• Factor-related sorting results in cross-sectional return patterns that are not explained by standard risk factors. The t-ratio for the intercept of the long/short strategy returns regressed on common risk factors is usually reported.
• Factor loadings as explanatory variables. They are related to the cross-section of expected returns after controlling for standard risk factors. Individual stocks or stylized portfolios (for example, the FF 25 portfolios) are used as dependent variables. The t-ratio for the factor risk premium is taken as the t-ratio for the factor.
They transform the calculated t-ratios into p-values for all three methods. Then these p-values are transformed back into t-ratios, assuming that the standard normal distribution accurately approximates the t-distribution; see Figure 4.30.
Figure 4.30 presents the benchmark t-ratios for the three different methods. Using Bonferroni, the benchmark t-ratio starts at 1.96, increases to 3.78 by 2012, and will reach 4.00 in 2032. The corresponding p-value for a t-ratio of 3.78 is 0.02 percent, which is much lower than the starting level of 5 percent. Since Bonferroni detects fewer discoveries than Holm, the t-ratios of the latter are lower. The BHY t-ratio benchmarks are not
Figure 4.30: The green solid curve shows the historical cumulative number of factors discovered, excluding those from working papers. Forecasts (dotted green line) are based on a linear extrapolation. The dark crosses mark selected factors proposed by the literature: MRT (market beta; Fama and MacBeth [1973]), EP (earnings-price ratio; Basu [1983]), SMB and HML (size and book-to-market; Fama and French [1992]), MOM (momentum; Carhart [1997]), LIQ (liquidity; Pastor and Stambaugh [2003]), DEF (default likelihood; Vassalou and Xing [2004]), IVOL (idiosyncratic volatility; Ang, Hodrick, Xing, and Zhang [2006]), DCG (durable consumption goods; Yogo [2006]), SRV and LRV (short-run and long-run volatility; Adrian and Rosenberg [2008]), and CVOL (consumption volatility; Boguth and Kuehn [2012]). T-ratios over 4.9 are truncated at 4.9 (Harvey et al. [2015]).
monotonic but fluctuate before the year 2000 and stabilize at 3.39 after 2010. Figure 4.30 also shows the t-ratios of a few prominent factors, which is the main result of this section.
The authors extend the analysis by testing, for example, for robustness and by assuming correlation between the factors; the above results do not change notably. The analysis suggests that a newly discovered factor today should have a t-ratio that exceeds 3.0, which corresponds to a p-value of 0.27 percent. The authors argue that the value of 3.0 should not be applied uniformly: for factors derived from first principles, the value should be lower.
Harvey et al. (2015) conclude that many of the factors discovered in the field of finance are likely false discoveries: of the 296 published significant factors, 158 would be considered false discoveries under Bonferroni, 142 under Holm, 132 under BHY (1%), and 80 under BHY (5%). In addition, the idea that there are so many factors is inconsistent with principal component analysis, where perhaps five 'statistical' common factors drive the time-series variation in equity returns (Ahn, Horenstein and Wang (2012)).
4.6.5 p-Hacking
In general, p-hacking means pushing down the p-value to create significance. For
example, testing multiple hypotheses increases the likelihood of false results: the null
hypothesis is rejected although it is correct, while the true p-value is larger and not
significant. Chordia, Goyal and Saretto (2017) show that the published performance of
investment strategies is doubtful, since the manner in which they are evaluated does not
align with research quality standards. First, there is a publication bias, since only
significant strategies are reported - only they have a viable path to publication. Second,
data snooping leads to a number of false rejections of the null. Finally, a number of
data choices, test procedures, and samples may be tried until a significant result is
discovered, and only the significant result is reported. All this is referred to as p-hacking.
They use all accounting variables in the Compustat database and basic market variables
in the CRSP database. They construct all possible trading signals based on the
Compustat data items satisfying minimal requirements. The signals consist of all types
of levels and growth rates and ratios of two levels or growth rates, i.e.

\frac{x_1 - x_2}{x_3}

and all possible permutations. This leads to a total of approximately 2.1 million signals
in 1972-2015. It is clear that most of these signals are economically meaningless
combinations of items, but this large sample accounts for existing and yet-to-be-studied
trading strategies. Using this sample, they ask whether they can put a bound on the
magnitude of p-hacking and, after accounting for p-hacking, how likely a truly abnormal
trading strategy is.
The authors use the FDP to control the proportion of false discoveries, since the trading
strategies are not independent of each other (cross-correlation in stock returns) and the
FDP delivers statistical cutoffs that rely on the cross-correlations present in the data.
They calculate measures of risk-adjusted performance for each strategy by first
constructing a long-short portfolio based on the top and bottom decile of each signal's
distribution, computing portfolio alphas using the Fama and French (2015) five-factor
model augmented with the Carhart (1997) momentum factor, and calculating the Fama
and MacBeth (1973) (FM) coefficient for each signal.
Imposing a tolerance of 5 percent FDP and the same significance level, the critical
value for the alpha t-statistic is 3.79 (for FM it is 3.12). These numbers are comparable
to those of Harvey et al. (2015). At these thresholds, 2.76 percent of strategies have
significant alphas and 10.80 percent have significant FM coefficients.37
Using single hypothesis testing (SHT) with a t-statistic higher than 1.96 rejects the
null hypothesis in about 30 percent of the cases for both alpha and FM t-statistics. The
majority of the discoveries (rejections of the null of no predictability) based on SHT are
likely false, since SHT does not account for the very large number of strategies that are
never made public.
The authors add economic reasoning to these so far purely statistical considerations to
gain more robust conclusions. They impose consistency between performance measures
obtained by portfolio sorts (alpha) and those derived from FM regressions. Eliminating
strategies that have a statistically significant t-value for alpha but an insignificant one
for FM, or vice versa, reduces the number of successful strategies to 806 under MHT and
to 33,881 under SHT.
The second restriction is an economic hurdle based on the Sharpe ratio: they eliminate
strategies that do not have a Sharpe ratio higher than that of the value-weighted market
portfolio. Imposing the two hurdles leaves 17 strategies that are both statistically and
economically significant under MHT and 801 under SHT. The likelihood of a researcher
finding a truly abnormal trading strategy thus tends to zero.
Surprisingly, the 17 surviving strategies fail to have any economic meaning - their
sorting makes no economic sense. The authors conclude that market efficiency is as
strong as ever. A different conclusion is that while accounting- and economics-based
sorting is meaningless, this could be different for sorting based on financial market
signals such as implied versus realized volatility, credit basis trades or carry trades.
37 The larger critical values for FM than for the alphas are due to the longer tails of the former.
As a simple example, consider a trade whose profit, when successful, is the positive part
of a normal return with volatility σ. Its expected value is

E(R) = \sqrt{\frac{2}{\pi}}\,\sigma \approx 0.8 \times \sigma \equiv 80\%\ \text{percentile}.
Since risk scales with the square root of the number of trades, the risk of n trades equals
\sqrt{n}\,\sigma. Consider two portfolio managers: one is always successful; the other is
successful in x percent of all trades. Both trade n times. The information ratio (IR),
the measure of a manager's generated value, measures the excess return of the active
strategy over risk:
IR = \frac{\text{Excess Return of Active Strategy over Benchmark}}{\text{Tracking Error (Active Risk)}} , \qquad (4.77)
where the tracking error is the standard deviation of the active return. For the investor
with 100% success rate, we get
IR = \frac{\sqrt{2/\pi}\, n\sigma}{\sqrt{n}\,\sigma} = \sqrt{\frac{2n}{\pi}} .
The trader with a success rate of x percent faces a loss in 1 − x percent of the trades,
leading to a net profit of x − (1 − x) = 2x − 1 per trade. Hence, after n trades

E_x(R) = (2x - 1)\, n\sigma \sqrt{\frac{2}{\pi}} , \qquad IR_x = (2x - 1)\sqrt{\frac{2n}{\pi}} . \qquad (4.78)
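A minimal numerical sketch of (4.78) in Python (the function name and the example
values are ours, not from the text); note that σ cancels in the ratio:

    import numpy as np

    def information_ratio(x, n):
        """IR of a manager with success rate x over n trades, as in (4.78)."""
        return (2 * x - 1) * np.sqrt(2 * n / np.pi)

    # An always-successful manager (x = 1) versus a 55%-hit-ratio manager,
    # both trading 100 times per year.
    print(information_ratio(1.0, 100))   # ~7.98
    print(information_ratio(0.55, 100))  # ~0.80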
For a fixed success rate x, an increasing trading frequency n increases the information
ratio. But raising the trading frequency brings diminishing returns due to the
square-root scaling in n.38
38 E(R) = \frac{2}{\sqrt{2\pi\sigma^2}} \int_0^\infty x\, e^{-\frac{x^2}{2\sigma^2}}\, dx = \sigma\sqrt{\frac{2}{\pi}} .
Percentile   IR
    90       1.0
    75       0.5
    50       0.0
    25      −0.5
    10      −1.0
The skill versus trading frequency (breadth) trade-off reads qualitatively, see (4.78),

x \sim \frac{IR}{\sqrt{n}} , \qquad (4.79)

and is of different severity for different asset classes. Many investors in interest rate
risk trade on a monthly or quarterly basis since they are exposed to fundamental economic
variables; they cannot increase their trading frequency arbitrarily. To achieve a high IR
they need to be very successful. But if markets are efficient, this is not possible. One
expects to observe more skill within (global) asset managers which can exploit
inefficiencies between different markets. It is easier to increase the IR by increasing
the trading frequency, but this increases trading costs. Besides the naive approach of
trading more often, other methods are to enlarge the set of eligible assets for the asset
managers or to expand the risk dimension by allowing investment strategies which
generate separate risk premia.
Following this first example, we add some structure to the discussion. Skill has different
meanings. In its basic form, a measure of skill is a hit ratio: it accounts for playing a
game well. This is not a statistical measure. The information coefficient (IC) is such a
statistical measure of skill: it correlates forecast residual return with ex-post residual
return. The information ratio relates skill, say the IC, directly to capital market theory
such as the CAPM, i.e. by assuming specific IC properties and a specific investor
decision process.
Like the alpha, the IR has an ex-post and an ex-ante interpretation. Ex post it measures
an achievement: the ratio of (annualized) residual return to (annualized) residual risk.
Such a realized IR is often negative, and in a return regression it is related to the
t-statistic one obtains for the alpha: roughly, the IR equals the alpha's t-statistic
divided by the square root of the number of observation years. The ex-ante IR measures
opportunities, given by the expected level of annual residual return per unit of annual
residual risk.
Proposition 84. Consider mean-variance portfolio optimization where the optimal active
weights φ_A maximize the utility function μ_A − λσ_A^2 with the expected active return
and active return variance. If the residual stock returns are uncorrelated and if no budget
constraint is imposed, then:

IR \sim IC \cdot \sqrt{BR} = \text{Skill} \times \text{Frequency} , \qquad (4.80)

where IC is the information coefficient of the manager and BR - the strategy breadth -
is the number of independent forecasts of exceptional returns made per year.
The IC measures the correlation between actual realized and predicted returns and
provides a measure of a manager's forecasting ability. Equation (4.80) states that
investors have to play often (high BR) and play well (high IC) to win a high IR. The
fundamental law (4.80) is additive in the squared information ratios. Formula (4.77)
shows the same intuition: 2x − 1 represents IC and \sqrt{n} represents BR. The
derivation of (4.80) depends on several assumptions, see Buckle (2005) for a review.
Roughly, on the behavioral side, the portfolio manager knows the metric of skill and
optimizes it according to a model, say the CAPM. Regarding securities, the same skill
level applies to all asset choices and the sources of information are independent -
forecasts are unbiased and residual returns have zero expected value. Furthermore, the
information coefficient is a small number, and the impact of estimation error in
investment information on out-of-sample optimized investment performance is not
considered. Some consequences following Grinold (1999) are:
IR = \frac{\text{Portfolio Alpha}}{\text{Portfolio Residual Risk}} = \frac{\alpha_p}{\omega_p} , \qquad (4.81)

with ω_p the portfolio residual risk, i.e. risk orthogonal to the systematic return.
Equivalently,

\alpha_p = IR \cdot \omega_p . \qquad (4.82)

The objective of an active mean-variance asset manager is to maximize

E(u) = \alpha_p - \frac{\theta}{2}\,\omega_p^2 . \qquad (4.83)
Replacing the alpha by the IR using (4.82) implies the optimal level of residual risk:

\omega_p^* = \frac{IR}{\theta} . \qquad (4.84)
Using the fundamental law,

\omega_p^* = \frac{IR}{\theta} = \frac{IC\sqrt{BR}}{\theta} . \qquad (4.85)
The breadth allows for diversification among the active bets, and skill increases the
possibility of success, so that the overall level of aggressiveness ω_p^* can increase.
A manager wants to forecast the direction of the market each quarter. The market
direction takes only two values - up and down - i.e. the random variable x(t) = ±1 with
mean zero and standard deviation 1. The forecast y(t) of the manager takes the same
values and has the same mean and standard deviation as x(t). The information
coefficient IC is given by the covariance of x and y. If the manager makes N bets and is
correct N_1 times (x = y) and wrong N − N_1 times (x = −y), then

IC = \frac{1}{N}\big(N_1 - (N - N_1)\big) = \frac{2N_1}{N} - 1 . \qquad (4.86)
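A short sketch of (4.86) and (4.80) in Python (the function names and example
figures are ours); the last line reproduces the 3.29 example discussed below:

    import numpy as np

    def ic_binary(n_correct, n_bets):
        """Information coefficient for binary direction bets, equation (4.86)."""
        return (n_correct - (n_bets - n_correct)) / n_bets

    def ir_fundamental_law(ic, breadth):
        """Fundamental law of active management, equation (4.80)."""
        return ic * np.sqrt(breadth)

    ic = ic_binary(14, 25)                       # right 14 times out of 25: IC = 0.12
    print(ir_fundamental_law(ic, 4))             # four quarterly bets per year
    print(ir_fundamental_law(0.03, 12 * 1000))   # monthly IC 0.03 on 1,000 stocks: ~3.29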
The fundamental law of active management has been generalized. One reason is that the
IR given by (4.80) seems to overestimate the IR which a portfolio manager can reach.
Assume a forecast signal with an average monthly IC of 0.03 and a stock universe of
1,000 names. Then the expected annualized IR from (4.80) is 3.29 - beyond what the
best portfolio managers realize. Ding (2010) generalizes the law by considering
time-series dynamics and cross-sectional properties. He shows that cross-sectional ICs
are different from time-series ICs and that IC volatility over time is much more
important for a portfolio IR than breadth: playing a little better has a stronger impact
on the IR than playing a little more often. He proves

IR = \frac{IC}{\sqrt{1 - IC^2}}\,\sqrt{BR} , \qquad (4.87)
Forecasts are refined by conditioning the expected return on the raw forecast signal g:

E(R|g) = E(R) + \frac{\mathrm{cov}(R, g)}{\mathrm{var}(g)}\,\big(g - E(g)\big) . \qquad (4.88)
The covariance term is the IC. This equation relates forecasts that differ from their
expected levels. The refined forecast is then defined as the difference between E(R|g)
and the naive forecast E(R), the consensus expected return; the naive forecast is the
informationless forecast and leads to the benchmark holdings. The forecast formula has
the same structure as the CAPM or any other single-factor model. This is not a surprise
but follows from linear regression analysis.
We analyze how skillful fund managers are. Scaillet et al. (2013) use the FDR to control
for false discoveries, i.e. mutual funds that exhibit significant alphas by luck alone.
They estimate the proportions of unskilled, zero-alpha, and skilled funds in the
population. A fund is unskilled if the return from stock picking is smaller than the costs
(the alpha is negative net of trading costs and expenses), a zero-alpha fund if the
difference is zero, and a skilled fund otherwise (the alpha is strictly positive).
Consider the distribution functions of the three groups: unskilled, zero-alpha, and
skilled funds. Plotting them as functions of the t-statistic, we obtain three density
functions with the zero-alpha group in the middle, see Figure 4.31. Neighbouring
densities overlap: unskilled with zero-alpha, and zero-alpha with skilled. Consider the
latter region of overlap. If a zero-alpha fund shows a high t-value, this high t-value is
driven by luck. Therefore, in the cross-sectional distribution of all funds, some funds
with high t-values are genuinely skilled and others are merely lucky.
Figure 4.31: Intuition about luck and skill for the three groups of mutual funds:
unskilled, zero-alpha and skilled (Scaillet et al. [2013]).
Of course, it is not possible to observe the true alpha of each fund. The inference for
the three skill groups is carried out as follows. First, for each fund, the alpha and its
standard deviation are estimated; the ratio of the two estimates defines the t-statistic.
Choosing a significance level, the t-estimate lies within or outside the implied
threshold, and estimates outside are labelled significant. The FDR measures the
proportion of lucky funds among the funds with significant estimated alphas. The data
set consists of monthly returns of 2,076 actively managed US open-end domestic equity
mutual funds that existed at any time between 1975 and 2006 (inclusive).
Of the funds, 75.4 percent are zero-alpha, 24.0 percent are unskilled, and 0.6 percent
are skilled. Unskilled funds under-perform for long time periods. Aggressive growth
funds have the highest proportion of skilled managers, while none of the growth and
income funds exhibit skill. During the period 1990-2006, the proportion of skilled funds
decreases from 14.4 to 0.6 percent, while the proportion of unskilled funds increases
from 9.2 percent to 24.0 percent. Although the number of actively managed funds
increases over this period, skilled managers have become exceptionally rare. This is also
reflected in a decreasing overall alpha, reaching -1% in 2016, see Figure 4.32. These
facts seem to be a good motivation for passive investments.
What could be the reasons for these facts, given that the education level of the average
asset manager increased during these two decades? After the peak in 1993, when the
alpha started to decline, the internet was launched and the cost of information started
to decrease over time; markets therefore became more and more efficient. In other
words, luck became more important than skill over time. But luck is not persistent,
which leads to an overall decreasing alpha of the industry. The authors test whether
funds lose their outperformance skills due to their increasing size. They treat each
five-year fund record as a separate 'fund' and find that the proportion of skilled funds
equals 2.4 percent, implying that a small number of managers have 'hot hands' over
short time periods.
Figure 4.32: Proportion of unskilled and skilled funds (Panel A) and total number of
mutual funds in the US versus average alpha (Scaillet et al. [2013]).
Skilled funds are concentrated in the extreme right tail of the estimated alpha
distribution, which suggests a way to detect them. If in a year the tests indicate higher
proportions of lucky, zero-alpha funds in the right tail, the goal is to eliminate these
false discoveries by moving further into the extreme tail. Carrying out this control each
year, they find a significant annual alpha of 1.45 percent. They also find that all
outperforming funds waste, through operational inefficiencies, the entire created surplus.
The authors re-examine the relation between fund performance and turnover, expense
ratio, and size. For each characteristic, the proportion of zero-alpha funds is around 75
percent. The proportion of unskilled funds is qualitatively larger for funds with high
turnover - many unskilled funds trade on noise to pretend that they are skilled. The size
of the fund has a bipolar effect: for large funds, both the proportion of unskilled and of
skilled funds is larger than for smaller funds.
What about European funds? Scaillet (2015) considers 939 open-end funds between
2001 and 2006. The main findings are, first, that the proportion of zero-alpha funds is
72.2 percent, the proportion of skilled funds is 1.8 percent, and the proportion of
unskilled funds is 26 percent. Second, skilled funds have low betas with respect to the
MSCI Europe; some skilled funds are known to play bonds and depart from their pure
equity mandates.
Leippold and Ruegg (2018) extend these analyses in three directions. First, they do not
consider equity markets only but also take into account a multi-risk-factor analysis for
fixed income mutual funds; the risk factors are changes in the level, slope, and
curvature of the local yield curve, together with a credit spread. Second, they compare
value-weighted returns of active against index mutual funds within the same investment
category. This allows them to avoid choosing multi-factor benchmarks, and they can
compare two investable alternatives where the corresponding friction costs and
restrictions are included in both. They use 30 different investment categories across
asset classes. Finally, they distinguish between retail and institutional funds and change
the statistical methods of the last section.
We consider the last point in more detail. The studies of Scaillet et al. (2010) and
Fama and French (2010) state or assume that autocorrelation is of minor importance.
Leippold and Ruegg test for autocorrelation in mutual fund returns using a
distribution-free test. They find that already in the first three lags, serial dependence is
present for 20 percent of single mutual funds and 30 percent of mutual fund portfolios.
This evidence calls for controlling temporal dependence in the analysis of alphas of
single mutual funds and portfolios of mutual funds against different benchmark models.
They suggest block-bootstrapping the alpha of a strategy against its benchmark returns,
see Ledoit and Wolf (2008, 2011). This improves inference accuracy for dependent
time-series data, and the bootstrapped t-statistics and p-values are then the inputs to
the multiple hypothesis frameworks, see Romano and Wolf (2005a). Since the authors
test whether single active or index funds significantly outperform the theoretical
multi-factor models, there are many hypotheses and thus they use the FDR. For
portfolios of mutual funds there are only a few hypotheses and they use the FWER.
Table 4.21 summarizes some findings, which are comparable to those of the former
section. The results show the differences between retail and institutional funds, for
example
                        Equity                                 Fixed Income
              US    Glob.   EU    Jap   Asia   Aver    USD    CHF    EUR    GBP   Aver
Retail
 Active
  Zero alpha  55.1  39.6   66.2  67.9  83.0   62.3    38.9   71.0   77.8   83.3  67.8
  Skilled      0.0   0.0    3.0   5.7   0.0    1.7    23.3    3.3   22.2   16.7  16.4
  Unskilled   44.9  60.4   30.8  26.4  17.0   35.9    37.8   25.7    0.0    0.0  15.9
 Index
  Zero alpha  61.9  30.1   76.5  73.7 100.0   68.4    41.6   93.3   71.6  100.0  76.6
  Skilled      0.0   0.0    3.6   5.9   0.0    1.9    29.2    6.7   28.4    0.0  16.1
  Unskilled   38.1  69.9   19.9  20.4   0.0   29.7    29.2    0.0    0.0    0.0   7.3
Instit.
 Active
  Zero alpha  69.3  53.5   78.5  88.2  97.4   77.4    38.5   77.4   60.1   82.7  64.7
  Skilled      0.0   0.0    8.2   9.4   0.0    3.5    40.7   22.6   39.9   17.3  30.1
  Unskilled   30.7  46.5   13.3   2.4   2.6   19.1    20.8    0.0    0.0    0.0   5.2
 Index
  Zero alpha  66.9  55.7   91.9  92.5  90.6   79.5    57.5   71.4   56.5   95.0  70.1
  Skilled      0.0   0.0    6.8   0.0   0.0    1.4    21.3   26.2   43.5    0.0  22.7
  Unskilled   33.1  44.3    1.4   7.5   9.4   19.1    21.3    2.4    0.0    5.0   7.2
Table 4.21: For equity, the five-factor benchmark model uses the regional models from
the Fama and French homepage for MKT, SMB, HML and WML, and the AQR
homepage for BAB. For fixed income, the benchmark model uses four factors: 'shift',
'twist', 'butterfly' and the spread of BBB to AAA credit from MSCI. The Morningstar
database from Dec 1991 to Dec 2016 includes 61,269 funds (Source: Leippold and Ruegg
[2018]).
the percentage of skilled active institutional funds of 3.5 percent compared to 1.4
percent and 1.9 percent for skilled single mutual funds. Among the active and index
equity mutual funds, only managers in Europe and Japan show skill. For fixed income
funds, the number of zero-alpha funds is lower; the highest skill is observed in the US
and euro markets.
Figure 4.33 presents the hall of fame of successful investors who outperformed the S&P
500 for more than 10 years. The only persistent quantitatively managed investment,
from Renaissance, is based on strict secrecy about the methods used and on hiring top
scientists from the natural and computer sciences to design the algorithms. Only one
money manager from the alternative investment group is listed in the hall of fame.
Furthermore, it is notable that the macro investors dominate those fundamental
investors who cannot be grouped into the Buffett/Graham school. Finally, the
appearance of Lord Keynes shows that it was possible to successfully outperform the US
markets in days when technology was in its infancy, relying instead on a deep
understanding of the macro economy.
Figure 4.33: Hall of Fame of investors (gurufocus, Hens and FuW [2014]).
Chapter 5
Asset Management Innovation
The process in Figure 5.1 can be split into two steps. First, raw data are transformed
into model variables such as averages, aggregates or conditioned versions of the raw
data; the raw data are complex, huge, structured and unstructured. The second step
generates outputs using algorithms. Pre-processing the data so that they can be used by
the algorithms requires much more time than running the algorithms afterwards. In
recent years, many tools and software packages were designed to master the complexity
of data pre-processing: the data are not only available in different formats, they are
also incomplete, have different integrity properties, vary in structure over time and are
only partially digitized. Thanks to the innovation in data pre-processing, analytics using
algorithms are in the focus.
1 The sources in this section are Lin (2015), Roncalli (2014), McKinsey Global Institute (2011, 2013),
Varian (2013), Hastie et al. (2009), Harvey et al. (2014), Novy-Marx (2014), Bruder et al. (2011), Freire
(2015), Fastrich et al. (2015), Zou (2006), DeMiguel et al. (2009), Belloni et al. (2012), Burges (1998),
Smola and Schölkopf (2004), Jaakkola (2006).
2 1 peta means 10^15, i.e. 1,000 trillions.
[Figure: internal and external data - structured, semi-structured and unstructured, more
than 1 peta in size - feed analytics (prediction, visualization, clustering, learning
algorithms), which serve the business goals: develop, retain and acquire customers,
optimize pricing, improve products, marketing.]
Figure 5.1: Definition of big data, adapted from Roncalli [2014]. The economic value of
the big data process starts at the end, with a clear business perspective.
Such business perspectives are to:
1. develop customers,
2. retain customers,
3. acquire customers,
4. digitize all documentation work such as trade confirmations, legal contracts and
workflow documentation (Legal Tech and Reg Tech).
The market for big data rose from USD 7.3 bn in 2010 to USD 130 bn in 2016 and to
USD 189 bn in 2019 (wikibon.org, Forbes, IDC). The revenues of big data providers
split into hardware, software and service revenues. Large IT companies like IBM, HP or
Dell dominate in absolute revenues, but the share of big data in these companies' total
sales is still at a low single-digit percentage. Newer companies with large big data
revenues are Palantir and Pivotal.
Artificial intelligence (AI), machine learning and deep learning are different concepts,
see Figure 5.2.3

[Figure 5.2: AI as the broadest concept, containing machine learning (ML) - supervised
learning (classification), unsupervised learning (patterns), reinforcement learning - and
deep learning, overlapping with big data.]

Originally, the goal of artificial intelligence was to model the brain as an artificial
neural network, and some research still heads in this direction. But
many apply AI as a tool today to solve problems in engineering, finance, economics,
marketing, etc. Machine learning (ML) is a narrower concept. ML, a statistical theory,
extends well-known methods such as linear regression to situations where the data set is
enormous or where the linearity assumption is not suitable. While econometrics is based
on causal inference, ML is not: ML is based on prediction and categorization using
optimization. A learner or algorithm detects characteristics on a training set, such as
typical words in email spam, and applies the insight to new emails. Being a
probabilistic theory, there can be errors. In the task of classifying emails into spam and
non-spam, the word 'casino' is labelled as a spam indicator, but the word can also
appear in a non-spam email. While human learners can rely on common sense to filter
the meaning of such a word, a machine learner needs well-defined principles in order
not to reach useless conclusions. Basic is the incorporation of prior knowledge, a
hypothesis that biases the learning mechanism: the inductive bias. Evidently, there is a
trade-off between too restrictive and too broad biases.
3 Funding of AI industry start-ups rose from USD 282 million in 2011 to USD 2.4 billion in 2015
(WEF (2017)), and the number of merger and acquisition deals in AI also rose to 20 to 40 deals per annum.
If a human tells the machine's algorithm the correct answers on a training set, we
consider a supervised learning problem - the teacher's case. If the values of the
output are not known, unsupervised learning is used: the task is to find structures or
meaningful groups in the inputs. This arises in consumer behaviour, where the algorithm
tries, for example, to pool customers with similar behaviour. This section is based on
Luxburg and Schölkopf (2008), Shalev-Schwartz (2016), Hazan (2016), Bruna (2018).
5.2.1 Set-Up
X is the set of examples or instances, such as pictures of animals where the goal is to
classify them into cats and non-cats. Every x ∈ X has features such as four legs or two
ears. Y is the label space, such as the binary set {+1, −1} (cat, no cat). The data set S
consists of pairs of instances and labels,

S = \{(x_1, y_1), \ldots, (x_m, y_m)\} \subset X \times Y.

The data are randomly split into labelled training data, test data with hidden
labels, and validation data used for parameter tuning.
X, Y, S are the inputs of the statistical learning model. The output is a prediction
rule or hypothesis f : X → Y with f ∈ F, where F is a space of functions - linear
functions, polynomials, more general functions, or a set of rectangles or circles - called
the hypothesis class. Given a training set and a set F, the goal is to find the optimal
parameters θ for the function f(θ) ∈ F such that the classifier is able to classify well all
new data of the test set.
The following assumption describes the mechanism which generates the data: there
exists a joint probability function P on X × Y, each training example (x_i, y_i) is
sampled IID from P, and the label is given by some unknown function h : X → Y.
Note that P is not known - otherwise learning would be trivial. By definition, P does
not change over time. This stationarity of the unknown distribution has to be relaxed if
financial time series are forecasted. We write |F| for the cardinality of a set F, i.e. the
number of its elements. If F is the set of classifiers from X with m examples into a
yes/no classifier set, then |F| = 2^m.
Consider the example of classifying apples as sweet or sour based on the features weight
and size. Suppose first that there are only a finite number of apples. An algorithm
could memorize all labels in the training set, but this is not what we would call learning.
Assume then that the number of apples is not bounded. Without any a priori knowledge
(hypothesis) from a human, the learner might always err: without any a priori
hypotheses, learning cannot be defined.
We provide the learner with more knowledge and assume that the environment
produces the labels of the apples by applying an unknown function h : X → Y, h ∈ F:
there is a functional relationship between the apples' features and their sweet or sour
taste. But this is still too little information, since F is too big: it can contain any
polynomial function, any trigonometric function, any geometric figure classifier or any
stochastic process, for example. We therefore restrict the set F of classifiers to
axis-aligned rectangles - a simple type of a priori knowledge. We furthermore assume
that the largest rectangle has size 200 g times 100 mm. This turns the learning problem
into a finite one: there are only a finite number of possible rectangles as classifiers. The
prediction rule f is f(x) = 1 if x is an element of the interior of a rectangle and
f(x) = −1 else. The learner knows F but not h. Figure 5.3 illustrates the construction.
The size |F| of the set of rectangles is bounded but still a large number: a grid of m
times n points contains

\binom{m}{2}\binom{n}{2} = \frac{1}{4}\, m(m-1)\, n(n-1)

rectangles. Hence, with m = 200 and n = 100,

|F| = \binom{200}{2}\binom{100}{2} \le 2 \times 10^8 .
Figure 5.3: Left panel: optimal rectangle classifier; blue denotes sour and red sweet
apples. Right panel: no optimal rectangle classifier exists. The optimal region (yellow)
is a complex domain which classifies the shown apples correctly but will hardly classify
additional apples correctly. This corresponds to overclassification (similar to
overfitting): the shown optimal algorithm will generalize poorly to not-yet-classified
apples.
In the right panel of Figure 5.3, realizability does not hold. Realizability - the
assumption that a perfect classifier h lies in F - simplifies the theory; the assumption is
waived by using so-called agnostic learning.
1. Consistent Learner. At time t, he predicts with an arbitrary function f from the
version space F_t of all functions in F consistent with the examples seen so far.
2. Halving Learner. He behaves as the consistent learner, except that he predicts the
majority vote of f(x_t) over f ∈ F_t. Hence, at time t the learner errs only if at least
half of the functions in F_t are not in F_{t+1}.
Theorem 85. The consistent learner makes at most |F| − 1 errors; the halving learner at
most \log_2 |F|.
For the halving learner, the result follows by induction from |F_{t+1}| ≤ |F_t|/2, since
for any error at least half of the functions in F_t are not in F_{t+1}. Although the
halving learner makes dramatically fewer errors, the runtime of halving grows with |F|;
therefore, the algorithm is ruled out from a computational efficiency perspective. For
200 million rectangles, the consistent learner can make at most 200 million − 1 errors,
while the halving learner makes at most 27 errors:

\log_2(200'000'000) \approx 27 .
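A toy sketch of the halving learner in Python (the hypothesis class of threshold
classifiers, the stream and the function names are our illustrative assumptions):

    def halving_learner(F, stream, h):
        """Predict the majority vote of the version space; after each revealed
        label, keep only the functions consistent with it. Assumes the true
        function h is in F (realizability). Returns the number of mistakes,
        which is at most log2 |F|."""
        version_space = list(F)
        mistakes = 0
        for x in stream:
            votes = sum(f(x) for f in version_space)
            prediction = 1 if votes >= 0 else -1
            label = h(x)
            if prediction != label:
                mistakes += 1
            # Consistency update: discard every function that erred on x.
            version_space = [f for f in version_space if f(x) == label]
        return mistakes

    # Toy class: 100 threshold classifiers on the integers 0..99.
    F = [lambda x, t=t: 1 if x >= t else -1 for t in range(100)]
    h = F[37]
    print(halving_learner(F, range(100), h))  # <= log2(100) ~ 7 mistakes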
How well does an algorithm f perform? The error function or true risk measures the
performance given a perfect classifier h and the unknown P:

R(f) = P\big(\{(x, y) : f(x) \neq h(x)\}\big) .

The identity P(A) = E(\chi_A) for a set A shows that risk is an expected value.
Therefore,

R(f) = E\big(l(f(x), y)\big)

with l the loss function: the error or risk can be equivalently expressed as the expected
loss. Since P is not known, this risk is purely theoretical. To compare the risk with the
best possible learning rule, we define the minimum risk value, the Bayes risk,

R^* = \inf_{f \in F} R(f) .

For binary classification, the classifier leading to minimum risk can be explicitly
calculated.
A perfect R(f) = 0 cannot be achieved: if the probability of an example x_2 is much
smaller than 1/m, then the probability of not seeing x_2 in the training data tends to
one. Therefore, we are satisfied if

R(f) \leq \varepsilon

with the accuracy ε chosen a priori by the human. There is a second problem arising
from the randomness of the input data: the probability that the learner observes the
same example over and over again is not zero, so R(f) ≤ ε cannot be guaranteed by any
algorithm. We allow the algorithm to fail with a chosen confidence probability δ over
the random choice of examples. Summarizing, the learner asks for training data S
containing m(ε, δ) examples. This defines Probably (with probability at least 1 − δ)
Approximately (up to accuracy ε) Correct learning - PAC learning.
Let us apply the theorem to the apple classification. |F| has almost 200 million
elements. It follows - thanks to the logarithm - that about 14'500 training examples are
sufficient to learn with the desired precision. The theorem shows that a clear
requirement for the precision of the learning algorithm must be set first; the math then
says how many data, here apples, are needed to learn satisfactorily: with 14'500 apples
the algorithm can determine with 1% error and 99% certainty whether an apple is sweet
or sour.
One piece is missing in the discussion: how do we define a risk or error measure which
can be observed? We fill this gap in the next section.
As a first example, consider predicting a house price from the size of the house by the
linear hypothesis

\hat{y} = \theta_0 + \theta_1 x

with x the feature size, ŷ the predicted house price, θ_0 the unknown intercept and θ_1
the slope of the straight line. Suppose the algorithm predicts that you could sell the
house for
[Figure: scatter plot of house prices (100'000 to 300'000 euros) against house size, with
a fitted straight line (linear) and a parabola (quadratic).]
300'000 euros. This price is too high, as the figure shows. The algorithm could do
better by fitting a parabola, i.e. a second-order polynomial:

\hat{y} = \theta_0 + \theta_1 x + \theta_2 x^2 .

Then the house price prediction is around 250'000 euros, which is closer to true prices.
Which function fits observable house prices best? Whichever function is chosen, it
defines a supervised learning algorithm, and our choice of the function is the hypothesis.
This is a regression problem, since we want to predict a continuous-valued output.
House pricing in reality uses more than 20 features - age of the house, centrality, view,
standard of construction, distance to the next public transport station, etc. Assume
that in the house price prediction in Napoli, additional features are the number of
rooms, the age of the house and the number of floors. With many features, vector and
matrix notation simplifies the presentation and the understanding of what is going on.
With six examples and two features in the exercise below, the design matrix and its
transpose read:

X = \begin{pmatrix} 1 & 5 & 1 \\ 1 & 3 & 2 \\ 1 & 2 & 2 \\ 1 & 4 & 1 \\ 1 & 5 & 3 \\ 1 & 1 & 1 \end{pmatrix} , \qquad
X' = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 \\ 5 & 3 & 2 & 4 & 5 & 1 \\ 1 & 2 & 2 & 1 & 3 & 1 \end{pmatrix} .
The m × (n + 1) design matrix X consists of all entries x_j^{(k)} plus a first column of
ones for the parameter in front of the constant feature x_0^{(k)} = 1. This matrix
contains all features' information across all training data.
Features often differ in numerical size, as in the house pricing example: 300 square
meters of house size, 2 bedrooms, 3 km distance to the next railway station, etc. The
features need to be made comparable for two reasons. First, if an algorithm calculates
the nearness of data, then 300 − 200 and 3 − 2 are the same on a relative scale, but in
absolute terms the first difference dominates the second. An example is Netflix
recommendations, which compare the nearness or distance of your movie preferences to
those of other Netflix users. The range of all features should be normalized such that
each feature contributes approximately proportionately to the prediction. A second
reason is the much faster convergence of the gradient descent algorithm to optimal
parameter values if the data are normalized, see below.
The scaling of features can be done in different ways. Either all features are normalized
to take values in [−1, 1] by dividing through the largest number, or the features are first
de-trended by subtracting their mean value μ_i and then divided by their standard
deviation σ_i, i.e.

\tilde{x}_i = \frac{x_i - \mu_i}{\sigma_i} . \qquad (5.3)

This normalization is often used, since powerful convergence theorems of probability
theory apply.
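A minimal sketch of (5.3) in Python (the function name and the feature values are our
illustrative assumptions):

    import numpy as np

    def standardize(X):
        """Z-score normalization (5.3): subtract each feature's mean,
        then divide by its standard deviation."""
        mu = X.mean(axis=0)
        sigma = X.std(axis=0)
        return (X - mu) / sigma

    # House features: size in m^2, bedrooms, distance to station in km.
    X = np.array([[300.0, 2, 3.0],
                  [120.0, 4, 0.5],
                  [200.0, 3, 1.5]])
    print(standardize(X))  # every column now has mean 0 and std 1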
5.2.2.3 Learning
We search for an algorithm that chooses the parameter vector θ in linear regression such
that the predicted house prices deviate as little as possible from the true values;
learning here means finding the optimal parameters. Hence, ŷ − y should be small.
Since we do not want positive deviations to offset negative ones, J = (ŷ − y)^2 is
minimized. This is a quadratic function, and convex if we use the linear regression
ŷ = θ'x. The difference should be minimal over the whole training set.
The following quadratic objective function J measures the average degree of deviation;
it is also called the empirical cost, risk or error function on the training set:

J(\theta) = \frac{1}{2m}\sum_{k=1}^m \big(h_\theta(x^{(k)}) - y^{(k)}\big)^2 = \frac{1}{2m}\sum_{k=1}^m \Big(\sum_{j=0}^n x_j^{(k)}\theta_j - y^{(k)}\Big)^2 . \qquad (5.4)
Contrary to the discussion of the last section, this function is perfectly observable on
the training data: it is an empirical expression which can be calculated without
introducing any probabilistic model. In matrix notation:

J(\theta) = \frac{1}{2m}\,(X\theta - y)'(X\theta - y) . \qquad (5.5)
Note that L(f_\theta(x), y) = (f_\theta(x) − y)^2 is the loss function. If Xθ = y, the
prediction is perfect on all training data and the cost or error function J is zero. The
corresponding function on the test set with N data points is

J(\theta) = \frac{1}{2N}\sum_{k=m+1}^{m+N} L\big(f_\theta(x^{(k)}), y^{(k)}\big) . \qquad (5.6)
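A sketch of the vectorized cost (5.5) in Python, using the design matrix from the
exercise above (the target values y are our illustrative assumption):

    import numpy as np

    def cost(theta, X, y):
        """Empirical risk (5.5): J(theta) = (X theta - y)'(X theta - y) / (2m)."""
        m = len(y)
        residual = X @ theta - y
        return residual @ residual / (2 * m)

    X = np.array([[1.0, 5, 1],
                  [1.0, 3, 2],
                  [1.0, 2, 2],
                  [1.0, 4, 1],
                  [1.0, 5, 3],
                  [1.0, 1, 1]])
    y = np.array([6.0, 5.0, 4.0, 5.0, 8.0, 2.0])  # hypothetical targets
    print(cost(np.zeros(3), X, y))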
Figure 5.5 illustrates the level curves, where the cost function of two variables has
constant value; the levels are ellipses in our case. Starting from an initial point x_0,
the center is the minimum cost point x^*. The closer the level lines are, the steeper the
corresponding cost function - think of the level curves as altitude curves on a map. The
derivative perpendicular to the level curves, the gradient, is larger the closer the level
curves are. We show that the gradient descent algorithm converges faster to the optimal
value the steeper the level curves are.
Minimizing cost or error means finding the right critical point. Critical points are by
definition points where the first derivative vanishes. This is necessary for a point to be,
say, a minimum, but it is not sufficient. Figure 5.6 shows the different types of critical
points. By definition, at a critical point the first derivative f'(x) equals zero - a flat
slope of the tangent. Besides maxima and minima, saddle points are the third type of
critical point.
If the function has a global minimum or maximum, we are in the best situation, since
an algorithm to find this point can be designed. But if a saddle point arises, we are
[Figure 5.5: elliptic level curves of the cost function over two features; the solution of
Xθ = y lies at the center, and the level curves are closest in the steep surface region.]
Figure 5.6: The three types of critical points and the difficulty of finding the global
minimum if a function has several critical points.
facing problems, in particular with algorithms: in some directions the algorithm slips
away from the saddle point, and in others it bounces back after an iteration step.
Whether a critical point is, say, a global minimum depends not only locally on the
vanishing derivative but on the function as a whole. In optimization theory, so-called
convex functions, such as our empirical risk, lead to the best possible situation: a local
minimum is a global minimum. Unfortunately, in deep learning models the optimization
criterion is not convex; the optimization becomes very intricate and many questions are
left open for future research.
A second example of difficulties arises for curves with several minima. Starting on the
left in Figure 5.6, we are likely to end up in the first local minimum - the algorithm
will not find the global minimum. Starting on the right side of the graph, we encounter
the problem that the function can be very flat in a certain region; then convergence can
become very slow. So it is not enough in optimization to let the algorithm solve the
first-order condition. We also need to consider the second-order derivatives of the
function, the Hessian, in order to control for saddle points, for example.
5.2.2.4 Gradient
The gradient ∇ is a differential operator: it acts on smooth real-valued functions f of n
variables,

\nabla f(x) = \Big(\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n}\Big)' .
By definition, the gradient vector measures the changes of the function at a point in the
directions of the standard basis. The gradient is the direction of greatest change of the
function f - that is why it is so important in the gradient descent algorithm for finding
minima.
Why does the gradient point in the direction of greatest change of the function? By
definition, the gradient measures how fast the function changes with respect to the
standard basis. Let us compare this with the change of the function in an arbitrary
direction v, where v is a unit vector. The projection of the gradient at a point x on this
vector v is given by the scalar product (\nabla f(x))' v. Calculus implies

(\nabla f(x))' v = |\nabla f(x)|\,|v| \cos\alpha

with α the angle between the gradient and v. Since |v| = 1, the expression is maximal
when the cosine is one, i.e. when v points in the same direction as the gradient.
Consider the function f(x, y) = 4x^2 + y^2. The level curves are ellipses, see Figure
5.7. The gradient vector reads ∇f = (8x, 2y). The gradient is normal to the level curve
through (x, y); it points in the direction of the greatest rate of increase of f(x, y). The
gradient is a linear operator and satisfies the product and chain rules.
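A quick numerical check of this example in Python (a finite-difference sketch of our
own; the point (1, 2) and the step size are arbitrary choices):

    import numpy as np

    def f(v):
        return 4 * v[0] ** 2 + v[1] ** 2

    def grad_f(v):
        return np.array([8 * v[0], 2 * v[1]])  # analytic gradient (8x, 2y)

    # Central finite differences along the standard basis vectors.
    v, eps = np.array([1.0, 2.0]), 1e-6
    numeric = np.array([(f(v + eps * e) - f(v - eps * e)) / (2 * eps)
                        for e in np.eye(2)])
    print(grad_f(v), numeric)  # both ~ [8. 4.]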
Figure 5.7: Level curves of f(x, y) = 4x^2 + y^2 and the normal gradient vector shown
at different points.
Figure 5.8 is a snapshot of an app from the website mathinsight.org which interactively
illustrates how the gradient and a directional derivative evolve in a relief.
Given a function f and a point x, we would like the algorithm to find the minimum as
fast as possible. Hence we consider, at the point x, the direction in which the function
is steepest. Since there are many possible directions in which we could move, let v be a
unit vector. The directional derivative tells us how strongly the function changes locally
in the given direction; we then choose the direction with the largest change - we know
that this is the negative gradient. The directional derivative is

\frac{\partial f(x + \alpha v)}{\partial \alpha}\Big|_{\alpha = 0} = v' \nabla f(x) = |v|\,|\nabla f(x)| \cos\beta .
To minimize f, we therefore choose v in the direction of the negative gradient. This is
the gradient descent method. In our cost minimization problem it reads as follows: for
every parameter component θ_j in step n,

\theta_{j, n+1} = \theta_{j, n} - \alpha\, \frac{\partial J(\theta_n)}{\partial \theta_{j,n}}
Figure 5.8: The function is shown as a surface plot and as a two-dimensional level curve
plot. The red point, where the gradient (red vector) and the directional derivative
(green vector) are calculated, can be moved. The gradient is illustrated by the red
vector emanating from the red point as well as by its shadow below the surface plot.
The angle between the gradient and the directional derivative can be chosen interactively.
with the learning rate parameter α. If the derivative is large in absolute value, the
parameter is updated by a large amount: we are far away from the minimum of the cost
function, in a steep region of the function. If the derivative is zero, we are at the
optimal cost level. In optimization, a descent direction is a vector p that moves us
closer towards a local minimum. Formally, in iterate n of the calculation of a minimum,
a descent direction vector p_n is defined by

p_n' \nabla f(x_n) < 0 .

This guarantees that for small steps along the direction p_n the function f is reduced.
In the gradient descent algorithm, the descent vector is the negative gradient itself:
p_n = -\nabla f(x_n). Given a descent direction, the line search algorithm computes a
step size or learning rate α that determines how far the algorithm should move in the
descent direction, see below.
Analytically, the partial derivatives of the cost function yield for the gradient descent:

\theta_{j, n+1} = \theta_{j, n} - \alpha\, \frac{1}{m} \sum_{k=1}^m \big(f_{\theta_n}(x^{(k)}) - y^{(k)}\big)\, x_j^{(k)} , \qquad j = 0, 1, \ldots, n.
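A minimal runnable sketch of this update rule in Python (the data-generating line,
learning rate and iteration count are our illustrative assumptions):

    import numpy as np

    def gradient_descent(X, y, alpha=0.1, iterations=5000):
        """Batch gradient descent for linear regression: update all
        components of theta by the averaged gradient of the cost (5.4)."""
        m, n1 = X.shape
        theta = np.zeros(n1)
        for _ in range(iterations):
            gradient = X.T @ (X @ theta - y) / m
            theta -= alpha * gradient
        return theta

    # Noisy data generated from a known line: intercept 2, slope 3.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, 200)
    y = 2 + 3 * x + 0.1 * rng.standard_normal(200)
    X = np.column_stack([np.ones_like(x), x])
    print(gradient_descent(X, y))  # ~ [2.0, 3.0]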
Gradient descent needs a choice of the learning rate α, and possibly many iterations are
needed to come close to the minimum value. In the analytic approach below - the
normal equation - we need neither a learning rate nor iterations.
How do we make sure that gradient descent is working correctly, and how is the
learning rate α chosen? A simple method is to plot the cost function against the
number of iterations for a chosen value of α. This is not a mathematical approach but
a widely used practitioner's one; we consider a mathematical approach below. If the
chart looks like the left panel of Figure 5.9, you are on the right track.
Figure 5.9: Plot of the cost function against the number of iterations. Left panel:
convergence. Right panel: overshooting when the learning parameter is, for example,
chosen too large.
After each iteration you get a new value of θ; insert this value into the cost function
and calculate the costs. If gradient descent works properly, the cost function should
decrease after every iteration. When do we stop, i.e. when do we say that the algorithm
has empirically (not mathematically!) converged? Define a small number, say 1/3000:
if the values of the cost function change by less than this number as the iteration count
increases, the algorithm is stopped.
What if the cost function grows with the number of iterations, see the right panel in
Figure 5.9? Then the chosen α was too large; the figure illustrates how overshooting
can arise, leading to non-convergence. What if it takes extremely long to reach
convergence? Then the chosen α is too small.
For different applications, the curvature of the cost function and the chosen learning
rate differ. This leads to applications with convergence after just a few dozen iterations,
up to cases where thousands and even millions of iterations are needed. Figure 5.10
illustrates level curves where gradient descent converges slowly due to a very flat
plateau. We consider more advanced algorithms to overcome such shortcomings below.
Figure 5.10: Level curves with a slowly converging gradient descent (Source: Sven
Leyffer, Argonne National Laboratory, 2016).
We now consider the mathematical analysis of whether and how fast an algorithm
converges. So far we considered the first-order derivative, which led to critical points
that can be maxima, minima or saddle points. To decide which type of critical point we
face, we have to consider the second-order approximation of the function, i.e. the
second derivative. The second derivative tells us whether a gradient step will cause as
much of an improvement as we would expect based on the gradient alone: while the
first derivative measures the slope of the tangent, the second derivative measures
curvature.
If the second derivative - the Hessian matrix - has eigenvalues of mixed signs, it is
neither positive nor negative definite, which implies that the critical point is a saddle
point and not a maximum or minimum. Note that we always assume that the order of
partial derivatives does not matter, i.e. that the Hessian matrix is symmetric. This
assumption is not critical for most applications in machine learning.
To see how the Hessian affects the gradient descent algorithm, we make a second-order
Taylor series approximation of the function f(x) around the current point x_0:

f(x) = f(x_0) + (x - x_0)' \nabla f(x_0) + \frac{1}{2}(x - x_0)' H(x_0)(x - x_0) + \text{error} .

With the learning rate α, inserting the gradient descent rule x = x_0 − α∇f(x_0) into
the Taylor series implies:

f\big(x_0 - \alpha \nabla f(x_0)\big) = f(x_0) - \alpha\, (\nabla f(x_0))' \nabla f(x_0) + \frac{1}{2}\alpha^2\, (\nabla f(x_0))' H(x_0) \nabla f(x_0) + \text{error} .
There are three terms: the original value of the function, the expected improvement due
to the slope of the function, and the correction for the curvature of the function. When
this last term is too large, the gradient descent step can actually move uphill and the
algorithm will not converge. When the curvature term is negative or zero, the Taylor
series approximation predicts that increasing the learning rate will forever decrease the
function value to be minimized. When the curvature term is positive, the optimal
learning rate or step size that minimizes the Taylor series approximation is given by
(take the derivative w.r.t. the learning rate and set the expression equal to zero)

\alpha^* = \frac{(\nabla f(x_0))' \nabla f(x_0)}{(\nabla f(x_0))' H(x_0) \nabla f(x_0)} .

If (!) the function is well approximated by the second-order polynomial, then the
Hessian or curvature determines the scale of the learning rate.
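A small numerical sketch of this step-size formula in Python, for the earlier example
f(x, y) = 4x^2 + y^2 with its constant Hessian (the evaluation point is our choice):

    import numpy as np

    H = np.array([[8.0, 0.0],
                  [0.0, 2.0]])  # Hessian of f(x, y) = 4x^2 + y^2

    def optimal_step(g, H):
        """Learning rate minimizing the second-order Taylor model along -g:
        alpha* = g'g / g'Hg (assumes positive curvature g'Hg > 0)."""
        return (g @ g) / (g @ H @ g)

    g = np.array([8.0, 4.0])   # gradient at the point (1, 2)
    print(optimal_step(g, H))  # ~0.147: larger curvature, smaller step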
We return to linear regression,

J(\theta) = \frac{1}{2m}\sum_{k=1}^m \big(f_\theta(x^{(k)}) - y^{(k)}\big)^2 \to \min! \qquad (5.7)

The first-order condition for a minimum is the system of equations

\nabla_\theta J(\theta) = 0 .
\nabla_\theta J(\theta) = \frac{1}{2m}\, \nabla_\theta \sum_{k=1}^m \big(f_\theta(x^{(k)}) - y^{(k)}\big)^2 = \frac{2}{2m} \sum_{k=1}^m \big(f_\theta(x^{(k)}) - y^{(k)}\big)\, \nabla_\theta f_\theta(x^{(k)})

and, continuing with f_\theta(x) = \theta' x,

\nabla_\theta J(\theta) = \frac{1}{m} \sum_{k=1}^m \big(\theta' x^{(k)} - y^{(k)}\big)\, x^{(k)} = 0 .
Xθ = y is the equation for the optimal theta values. But the inverse of X does not exist
in general: the number of data points m and the number of features n + 1 are not the
same, so the matrix is not square, which is a necessary condition for its inversion.
Multiplying the equation with the transpose X' from the left gives

X' X \theta = X' y ,

and

\theta^* = (X'X)^{-1} X' y
is the analytical expression for the optimal parameters. Fine - so why bother with the
gradient descent algorithm if we can just plug the values into the above formula? The
matrix (X'X)^{-1} is of dimension (n + 1) × (n + 1), and calculating such inverse
matrices is costly: the computation time grows with n^3. Hence, with 10 features the
cost is of order 1'000, but with 1'000 features it is of order 10^9, i.e. one billion. For
large-scale problems the analytical solution becomes slow, while gradient descent works
well even if the number of features is large.
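A sketch of the normal equation in Python (the synthetic data are our assumption);
solving the linear system is cheaper and numerically more stable than forming the
inverse explicitly:

    import numpy as np

    def normal_equation(X, y):
        """Closed-form least squares: solves X'X theta = X'y."""
        return np.linalg.solve(X.T @ X, X.T @ y)

    rng = np.random.default_rng(1)
    X = np.column_stack([np.ones(100), rng.standard_normal((100, 2))])
    theta_true = np.array([1.0, -2.0, 0.5])
    y = X @ theta_true + 0.01 * rng.standard_normal(100)
    print(normal_equation(X, y))  # ~ [1.0, -2.0, 0.5]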
Consider a regression of the health status of individuals on different features. One
feature x_1 is the weight in kg and another, x_2, is the body height in meters. Naively,
one could set

\hat{y} = f_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2

and estimate θ for the health status prediction. Insight from medicine teaches us that
this choice is nonsense: it is the ratio x_1 / x_2^2, the body-mass index, which is a
meaningful indicator or feature for the health status. Think about why the square of
the body height enters the index definition.
Returning to house prices, a third-order polynomial in the size reads

\hat{y} = f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 .

Setting x_1 = x (the size), x_2 = x^2 (the squared size) and x_3 = x^3 (the cube of
the size) defines a multivariate linear regression. This is trivial, but you have to be
careful about feature scaling: suppose a house size is 100 square meters; then the square
is 10'000 and the cube is 1'000'000. The ranges of the features thus become very broad
for higher-order polynomials, and the features need to be scaled to become comparable
if you, say, apply the gradient descent method.
We next develop a classification algorithm for the binary case of tumor classification
with the two states malignant and benign. We first discuss why linear regression is not
a good idea here. Assume that the feature size is the only one used to classify tumors
into benign (0) or malignant (1) ones, see Figure 5.12.
[Figure 5.12: two panels plotting the labels benign = 0 and malignant = 1 against
tumor size, each with a fitted linear regression h(x) = θ'x and a threshold of 0.5; in the
right panel, additional large tumors flatten the regression line.]
A fitted linear regression, as shown in the left panel, looks reasonable when we choose a
threshold value of 0.5. But now assume that we have more data points of tumors with
large size. Then the estimated coefficients of the linear regression change such that the
slope becomes worse at separating the points. The newly added data carry little
information, since large tumors are most likely malignant and adding even larger ones is
not very informative for the classification problem. The figure shows that some large
tumor sizes can be wrongly classified due to the fixed threshold value and the moving
linear regression. Using a straight line misses the intuition that a kind of threshold size
would separate the data points better: a separation function which looks more like a
step function makes more sense.
Choosing a fixed threshold value with a step function to separate the points can be
meaningful if you consider electrical circuits, for example. But in social, financial or
medical applications such a zero-one function might be too severe and lead to errors.
Instead, a softer function which approximates the zero-one cutoff is used. Such
functions are called sigmoid functions.
Logistic regression is such a smoothed-out step function approach. The predictions of
logistic regression always lie between zero and one. Note that although the word
regression appears, the highly non-linear logistic regression is a classification algorithm,
applied to settings where the label y is a discrete value.
Instead of the not very useful multivariate linear regression

\hat{y} = f_\theta(x) = \theta' x

we consider a non-linear transformation which gives us a smoothed-out step function.
That is, we consider a function g for logistic regression, the logistic function:

\hat{y} = f_\theta(x) = g(\theta' x) .

For a real number x,

g(x) = \frac{1}{1 + e^{-x}}

takes values in (0, 1) and is an S-shaped approximation of the step function, see Figure
5.13.
5.13.
0,8
1
1 1 + 𝑒 −𝑥
0,6 1 + 𝑒 −3− 𝑥
0,4
0,2 1
1 + 𝑒 −3𝑥
0
-10
-9,7
-9,4
-9,1
-8,8
-8,5
-8,2
-7,9
-7,6
-7,3
1,1
8,3
-7
-6,7
-6,4
-6,1
-5,8
-5,5
-5,2
-4,9
-4,6
-4,3
-4
-3,7
-3,4
-3,1
-2,8
-2,5
-2,2
-1,9
-1,6
-1,3
-1
-0,7
-0,4
-0,1
0,2
0,5
0,8
1,4
1,7
2
2,3
2,6
2,9
3,2
3,5
3,8
4,1
4,4
4,7
5
5,3
5,6
5,9
6,2
6,5
6,8
7,1
7,4
7,7
8
8,6
8,9
9,2
9,5
9,8
Figure 5.13: Sigmoid function. Multiplying x by 3, i.e. using larger parameters θ, makes
the sigmoid function look more like a step function. Adding −3 to the exponent, i.e.
changing θ_0, does not change the shape of the sigmoid function but shifts it parallel to
the x-axis.
The sigmoid function, our hypothesis, approaches 1 asymptotically as x tends to infinity
and 0 as x tends to minus infinity. Since the outputs lie in the unit interval, they can
be interpreted as probabilities. A prominent example is the probability of default in
banks' rating systems for counterparties asking for loans. In this case the features are
financial variables such as liquidity ratios, earnings per share and investment ratios, or
qualitative variables such as the quality of the management and the competitive
strength of the firm in its sector.
The product θ'x is called the score S of the rating system, a real number. It delivers an
ordinal ranking of the counterparties of a bank: the higher the score, the lower the
creditworthiness of the client. But this is not yet a price which can be charged to the
clients, i.e. how much they have to compensate the bank for taking their credit risk on
its balance sheet. This requires turning the score into a probability of default (PD) on a
one-year horizon using the sigmoid function. A PD of, say, 3% is then mapped via a
so-called master scale into a price bucket and a rating: all PDs in the interval
[2.5%, 3.5%) are mapped into a rating BB with a price of 3.25% per annum. To arrive
at the master scale interval boundaries and the price in each interval, the parameters θ
are fitted such that the sigmoid function satisfies additional business requirements.
First, choosing the thetas - calibrating the model - should lead to PDs such that the
credit risk prices cover the effective losses over a whole portfolio of counterparties in one
year. Second, the shape of the sigmoid function should be chosen such that as many
good risks as possible accept the bank's pricing and do not take an offer from a
competitor, and vice versa, that there is no incentive for bad risks to accept the bank's
offer because it is more favourable than the competitors' offers. These trade-offs are
handled by choosing the shape and the level (i.e. the parallel shift) of the sigmoid
function appropriately. Clearly, the classification of clients into the rating classes AAA,
AA+, ..., BB-, D is a multi-classification problem and not a binary one.
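A sketch of the score-to-rating pipeline in Python. The master scale values below are
hypothetical except for the BB bucket quoted in the text; function names are ours:

    import numpy as np

    def pd_from_score(score):
        """Map a rating score theta'x to a one-year probability of default
        via the sigmoid; a higher score means a higher PD here."""
        return 1.0 / (1.0 + np.exp(-score))

    MASTER_SCALE = [          # (upper PD bound, rating, price p.a.) - hypothetical,
        (0.005, "AA", 0.006), # except the BB bucket [2.5%, 3.5%) from the text
        (0.025, "BBB", 0.018),
        (0.035, "BB", 0.0325),
        (1.0, "B", 0.08),
    ]

    def rating(score):
        pd = pd_from_score(score)
        for upper, grade, price in MASTER_SCALE:
            if pd < upper:
                return grade, price

    print(rating(-3.5))  # PD ~ 2.9% -> ('BB', 0.0325)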
For a score of θ'x = −1.4, for example,

P(y = 1 \,|\, x, \theta) = \frac{1}{1 + e^{1.4}} \approx 0.2 .

The logistic regression hypothesis thus gives an estimate of the probability that y
equals 1.
Since g(0) = 1/2, predicting y = 1 requires the argument to be larger than zero, which
means the probability is larger than 50 percent: θ'x > 0 is one side of the decision
boundary, the set θ'x = 0.
Using higher-order polynomials, more complex decision boundaries such as circles or
ellipses can be generated.
Consider a training set where every feature vector x is (n + 1)-dimensional and every
label y is either 0 or 1. We already introduced the cost and loss function in (5.7):

J(\theta) = \frac{1}{2m}\sum_{k=1}^m \big(f_\theta(x^{(k)}) - y^{(k)}\big)^2 = \frac{1}{2m}\sum_{k=1}^m \Big(\frac{1}{1 + e^{-\theta' x^{(k)}}} - y^{(k)}\Big)^2 =: \frac{1}{2m}\sum_{k=1}^m \mathrm{Loss}\big(f_\theta(x^{(k)}), y^{(k)}\big) .
This was the appropriate loss function for linear regression, but for logistic regression
the squared difference no longer has a single minimum: contrary to the linear case,
where a single global minimum exists, it can have many local optima, see Figure 5.14.
This is not a way to follow, since algorithms that search for the minimum cost - and
hence deliver the estimate of theta - will likely be trapped in local minima. We would
like a convex loss function, i.e. one with a single minimum.
Figure 5.14: Plot of the loss function $\big(\frac{1}{1+e^{-\theta' x}} - y\big)^2$. We assumed that y = 0 holds for almost all negative values of θ'x, and similarly y = 1 for almost all positive values. Around the value zero we added some errors, i.e. y = 1 for some values with θ'x < 0 and similarly for y = 0. These few errors create the erratic behaviour of the loss function around zero. Note that even for zero error the loss function is not convex!
Convex functions play an important role in many areas of mathematics and in particular in optimization problems. A strictly convex function on an open set has no more than one minimum. A real-valued function defined on an n-dimensional interval is called convex if the line segment between any two points on the graph of the function lies above or on the graph, see Figure 5.15. If the function is differentiable, then it is convex if and only if its second derivative is non-negative on its entire domain. Examples of convex functions are x, x², eˣ and |x|.
An inexact line search updates the iterate as x_{n+1} = x_n + α_n p_n and solves at each step the problem of finding the step length in the direction p_n which has the largest impact on the function f, i.e. which reduces the function as much as possible: x_n is the current best guess in step n, the vector p_n is a search direction, and the number α_n is the step length. Such inexact line searches provide an efficient way of computing an acceptable step length.
Clearly, p_k has to be a direction in which f decreases, i.e. a descent direction: p'_k∇f(x_k) < 0. Different algorithms lead to different choices of the descent direction. For the gradient descent algorithm, p_k = −∇f(x_k).
Given a descent direction, the step length α_k is assumed to satisfy the Wolfe conditions:

i) $f(x_k + \alpha_k p_k) \le f(x_k) + c_1 \alpha_k\, p_k' \nabla f(x_k)$,

ii) $-p_k' \nabla f(x_k + \alpha_k p_k) \le -c_2\, p_k' \nabla f(x_k)$,

with constants 0 < c₁ < c₂ < 1. One sets c₁ ∼ 10⁻⁴ and c₂ ∼ 0.9 for Newton or quasi-Newton methods, see the next section. Inequality i), the Armijo rule, ensures that the step length decreases f sufficiently: the function evaluated after moving in the descent direction is smaller than the function before, plus a term proportional to the directional derivative, which is negative. Inequality ii) is a curvature condition; it ensures that the slope has been reduced sufficiently.
Many problems require finding the roots of a system of non-linear equations

$$f_j(x_1, \ldots, x_n) = 0, \quad j = 1, 2, \ldots, k,$$

which can in general only be found by an iterative process which approximates the solution. The Newton-Raphson method uses a second-order approximation of f, compared to gradient descent, which uses only the first-order approximation.

The idea for a single non-linear function f of one variable is to compute the x-intercept of the tangent line, i.e. of the first-order Taylor approximation. We write x_n for the current approximation and derive x_{n+1}. The equation of the tangent line to the curve y = f(x) at x_n is

$$f(x) = f'(x_n)(x - x_n) + f(x_n).$$
The x-intercept of this line is taken as the next approximation, i.e. solving 0 = f'(x_n)(x_{n+1} − x_n) + f(x_n) for x_{n+1} gives the updating rule x_{n+1} = x_n − f(x_n)/f'(x_n). In several dimensions the derivative is replaced by the Jacobian matrix J and the update reads x_{n+1} = x_n − J(x_n)⁻¹f(x_n).
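A minimal sketch of the one-dimensional updating rule, assuming an illustrative function and starting point:

```python
def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Iterate x <- x - f(x)/f'(x) until |f(x)| < tol."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / fprime(x)
    return x

# Example: the positive root of f(x) = x^2 - 2 is sqrt(2).
print(newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0))
```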
Improvements replace the inverse Jacobian by an approximation, since the exact calculation grows as O(N³) with the dimension of the Jacobian matrix. The asymptotics is the same as for matrix multiplication; the latter follows from the worst case where N³ scalar multiplications and (N − 1)N² additions are needed to compute the product of two N × N matrices, which leads to the claimed asymptotics.
We are interested in the method for optimization problems. Hence the root we search for is the root of the first derivative: f is replaced by f' and f' by f''. Then

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}$$

is the updating rule for the algorithm. If the function f is a positive quadratic, then one application of the updating rule already reaches the minimum. If the function is more complex than quadratic, then iterating the updating rule often converges faster to the minimum than gradient descent if the Hessian is positive definite. But if we encounter saddle points, then the Newton method can fail to converge. In the multi-dimensional Newton-Raphson algorithm the descent direction is given by p_n = −H⁻¹∇f(x_n) with H the Hessian, and the update reads

$$x_{n+1} = x_n + p_n = x_n - H^{-1}\nabla f(x_n).$$
As an application of the Newton method consider the saddle function f(x, y) = x² − y², see Figure 5.11. The Hessian and its inverse read

$$H = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}, \qquad H^{-1} = \begin{pmatrix} 1/2 & 0 \\ 0 & -1/2 \end{pmatrix},$$

and the gradient is ∇f = (2x, −2y)'. This implies:

$$\begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} x_n \\ y_n \end{pmatrix} - \begin{pmatrix} 1/2 & 0 \\ 0 & -1/2 \end{pmatrix}\begin{pmatrix} 2x_n \\ -2y_n \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$
The Newton method leads to the saddle point at the origin in one step. The gradient descent method will not lead to the saddle point: the gradient is zero at the saddle point, but a tiny step away from it would pull the optimization away. Clearly, the example is specific in the sense that the algorithm finds the saddle point after one step independent of where the starting point is; this is particular to a second-order polynomial. Although the Hessian is not positive definite - we have a saddle point and no minimum - this critical point can be found. This is an atypical situation: Newton's method works well and often outperforms gradient descent if the Hessian is positive definite.
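A small numerical check of the one-step jump to the saddle point; the starting point is an arbitrary illustrative choice:

```python
import numpy as np

H_inv = np.array([[0.5, 0.0], [0.0, -0.5]])   # inverse Hessian of x^2 - y^2

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y])

p0 = np.array([3.0, -1.7])                    # arbitrary starting point
p1 = p0 - H_inv @ grad(p0)                    # one Newton step
print(p1)                                     # [0. 0.]: the saddle point
```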
So far, we have had to calculate the second derivatives of the Hessian at each stage. In quasi-Newton methods, the Hessian matrix of second derivatives is not computed; instead, it is approximated using specific updates given by gradient evaluations. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm is such a method.
The search direction p_n at stage n is given by the solution of the analogue of the Newton equation,

$$\tilde H_n p_n = -\nabla f(x_n),$$

with $\tilde H_n$ the current approximation of the Hessian. A line search then minimizes

$$f(x_n + \alpha p_n)$$

over the step length α > 0. The quasi-Newton condition imposed on the update of $\tilde H_n$ is:

$$\tilde H_{n+1}(x_{n+1} - x_n) = \nabla f(x_{n+1}) - \nabla f(x_n).$$

Instead of requiring the full Hessian matrix at the point x_{n+1} to be computed, the approximation $\tilde H_{n+1}$ is used. Some remarks:
• Neural networks typically fail to define convex problems. This issue, together with the occurrence of many saddle points in the high-dimensional feature space and the computational burden, limits the use of Newton's method for training large neural networks. See deeplearningbook.org, Section 8.6 'Approximate Second-Order Methods', for an overview.

• Alternatives such as the quasi-Newton methods stated above have their own issues.

• In machine learning outside of deep learning, BFGS and variants of it are fairly common optimization algorithms; a minimal usage sketch follows this list.

• Often, even in the literature, drawbacks of algorithms are stated which in fact follow from incorrect use of the methods and are hence of limited informational content.
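As announced in the third point, here is a sketch of quasi-Newton optimization with BFGS via SciPy's standard minimize interface; the objective (a Rosenbrock-type function) and starting point are illustrative choices:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    """Rosenbrock-type test objective."""
    return (1.0 - x[0]) ** 2 + 100.0 * (x[1] - x[0] ** 2) ** 2

def grad_f(x):
    return np.array([
        -2.0 * (1.0 - x[0]) - 400.0 * x[0] * (x[1] - x[0] ** 2),
        200.0 * (x[1] - x[0] ** 2),
    ])

res = minimize(f, x0=np.array([-1.2, 1.0]), jac=grad_f, method="BFGS")
print(res.x)    # converges to the minimum at [1, 1]
```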
Assume that the true values are generated by the same hypothesis f but with the unknown, true parameters θ̂. Then we can write for the vector y of true values

$$y = f(\hat\theta, x) + \epsilon$$

with the noise ε = (ε₁, ..., ε_m) a normally distributed variable with mean zero and variance E(εε') = σ²I. Noise is independent of the data set X. Hence, true data are generated by a signal f and noise.
What happens if we consider a different training data set, also with m data points? Then the empirical loss and error functions change: they depend on the features, the parameters and the training data sample. Suppose that we consider many different training data sets, each drawn at random from a large data pool, and that we calculate an average empirical error over all samples. The expectation is that this average approaches the unknown true error in some mathematical sense. This true error is assumed to be also an average, the expected value of the true loss function under an unknown probability P, i.e. E_P(L(x, y)) is the true error which we defined in the last section. Inserting y = Xθ̂ + ε with the assumed correct values θ̂ of the parameters implies⁶

$$\theta^* = \hat\theta + (X'X)^{-1}X'\epsilon.$$
The parameter estimates can be decomposed into the sum of the correct underlying parameters and an estimate based on noise alone. If each set is drawn from a large data pool independently of all other ones and we consider many independent training data sets X, then the average parameter value conditional on the specific data input X is given by

$$E_P[\theta^*|X] = \hat\theta$$

since E_P[ε|X] = 0 and the design matrix is not stochastic. Therefore, our parameter estimates are unbiased w.r.t. the unknown true parameters on average, when averaging over many training sets. In the same way, see the exercises, the conditional covariance can be calculated:
$$\mathrm{cov}(\theta^*, \theta^*|X) = \sigma^2(X'X)^{-1}. \tag{5.10}$$

Contrary to the expectation, the covariance is a function of the design matrix, and it is proportional to the unknown variance of the noise σ². We introduce the Euclidean norm ||x||² = Σ_{j=1}^n x_j² of an n-dimensional vector. Note that (Xθ − y)'(Xθ − y) = ||Xθ − y||².
Using the calculated bias and variance we obtain the mean squared error (MSE) of the parameter estimates for fixed true parameters (omitting the X dependency):
⁶ This follows from

$$\theta^* = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\hat\theta + \epsilon) = (X'X)^{-1}(X'X)\hat\theta + (X'X)^{-1}X'\epsilon = \hat\theta + (X'X)^{-1}X'\epsilon =: \hat\theta + A\epsilon.$$
We summarize:
$$E\big[\|\theta^* - \hat\theta\|^2\,\big|\,X\big] = \sigma^2\,\mathrm{tr}\big((X'X)^{-1}\big) \sim \frac{\sigma^2(n+1)}{m} \tag{5.12}$$
with n the number of features and m the number of training examples. If the number of training examples increases, the mean squared error of the parameter estimates vanishes. Conversely, for a fixed number of data points, an increasing number of features increases the mean squared error of the parameters. Although we don't know the noise variance σ² of the correct model, it only appears as a multiplicative constant. Hence, it will not affect how we should choose the features: a natural approach would be to choose them to minimize the above trace. But this makes sense only if the assumed linear relation holds true.
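A small Monte Carlo sketch of (5.12), assuming synthetic data and illustrative values for m, n and σ:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma, trials = 500, 5, 0.3, 2000
theta_hat = rng.normal(size=n + 1)                            # "true" parameters
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])    # fixed design matrix

mse = 0.0
for _ in range(trials):
    y = X @ theta_hat + sigma * rng.normal(size=m)            # fresh noise per training set
    theta_star = np.linalg.lstsq(X, y, rcond=None)[0]
    mse += np.sum((theta_star - theta_hat) ** 2)
print(mse / trials)                                           # simulated E||theta* - theta_hat||^2
print(sigma ** 2 * np.trace(np.linalg.inv(X.T @ X)))          # theoretical sigma^2 tr((X'X)^-1)
```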
Assuming y = θ̂ sin(x) + ε in one dimension for the true model and plugging this into (5.9), it follows that the optimal parameter of the linear regression θ* is no longer unbiased:

$$E_P(\theta^*|X) = (x'x)^{-1}x'\sin(x)\,\hat\theta \neq \hat\theta$$

in general. Besides variance, we then also get a bias-squared term in the mean squared error (MSE) of the parameter in equation (5.11). The next formulae make this precise. Let y = f(x) + ε with f any unknown function and ε noise with mean zero and variance σ². Let h(x) be any hypothesis used to predict y and X the IID-sampled training data set.
We saw that in the linear case the MSE is given by variance only; now, with the sine function generating the data and a polynomial function used in the regression, a bias term also follows. As in the linear case, the mean squared error (MSE) between the estimated and true values is of interest. We could follow the strategy of making the MSE as small as possible - using many data sets to reduce variance and a high-order polynomial to approximate the sine function well in order to also minimize bias. Plotting the MSE as a function of model complexity, we indeed get a decreasing error with increasing model complexity, see Figure 5.17.
Figure 5.17: Left panel: behavior of training and test error as a function of the amount of data. Right panel: bias-variance tradeoff and model complexity - decreasing and then increasing test error as a function of model complexity, and the optimal model complexity.
Hence, it looks as if, by taking say a 10th-order polynomial and many training data samples, we achieve the best MSE. This is true for the training data, but it fails to be true for the test data MSE - and it is on this set of unseen data that we want the algorithm to perform well. Why does the test data MSE increase after a certain level of complexity, see Figure 5.17?
A more complex model gives the better fit if the training data set looks complicated, since we have many more parameters to fit the complicated data structure. But by choosing many parameters for fitting the training data, we are likely to fit spurious data patterns which are common in the training data but fail to be true patterns that also show up in the test data or unseen data (overfitting). The extremely precise fit to the training data will fail to be close to the new test data. This failure of the algorithm to generalize well - the random artefacts of the training data do not reproduce on other data - leads to an increasing error for the test data if model complexity becomes too large. The property of a U-shaped test MSE as a function of model complexity is an intrinsic property of statistical machine learning models: the bias-variance tradeoff. Figure 5.18 shows the different cases when using polynomials of different complexity.
Figure 5.18: The authors perform three fits to the shown training data. The training data was generated synthetically: the feature was chosen randomly and y was a quadratic function of the feature plus noise. The center plot, assuming a quadratic function, suffers neither from overfitting nor from underfitting. Overfitting is produced by a ninth-order polynomial which perfectly matches all training data; but the wild graph produces a lot of artificial patterns inconsistent with the synthetic quadratic function. Source: Deep Learning, Ian Goodfellow, Yoshua Bengio and Aaron Courville, MIT Press, 2016.
Our ultimate goal in machine learning is to minimize the expected test MSE, that is, we must choose a statistical machine learning model that simultaneously has low variance and low bias. In order to estimate the expected test MSE, we can use techniques such as cross-validation.

Summarizing, bias and variance are the two key notions in over- and underfitting. Bias is the difference between the average prediction of our model and the correct value. A model with high bias pays very little attention to the training data and oversimplifies the model; it leads to high errors on training and test data. Variance is the variability of the model prediction for a given data point, i.e. it tells us the spread of our predictions. A model with high variance pays a lot of attention to the training data and does not generalize to data which it has not seen before. As a result, such models perform very well on training data but have high error rates on test data.
We conclude with the formal decomposition of the error into bias, variance and irreducible risk for any hypothesis h and true model f. The error Err(x) or cost is by definition the expected squared difference between the predicted and the true value, i.e.

$$\mathrm{Err}(x, X) := E\big[(y - h(x))^2\,|\,X\big] = \mathrm{Bias}_X\big[h(x)\big]^2 + \mathrm{Var}_X\big[h(x)\big] + \sigma^2$$

with

$$\mathrm{Bias}\big[h(x)\big] = E(h(x)) - f(x), \qquad \mathrm{Var}\big[h(x)\big] = E\big[(h(x) - E(h(x)))^2\big].$$

Hence, the error is the sum of the squared bias, the variance and the irreducible error, which is a measure of the amount of noise in our data. Our data will have a certain amount of noise or irreducible error that cannot be removed.
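A small simulation sketch of this decomposition, in the spirit of Figure 5.18: polynomials of increasing degree are fitted to noisy sine data, and bias and variance are estimated at a test point. All sample sizes, degrees and the noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.sin
sigma, m, trials, x0 = 0.3, 30, 1000, 1.0

for degree in (1, 3, 9):
    preds = []
    for _ in range(trials):
        x = rng.uniform(-np.pi, np.pi, size=m)        # fresh training sample
        y = f(x) + sigma * rng.normal(size=m)
        coeffs = np.polyfit(x, y, degree)             # least-squares polynomial fit
        preds.append(np.polyval(coeffs, x0))          # prediction at test point x0
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x0)) ** 2
    print(f"degree {degree}: bias^2={bias2:.4f}, variance={preds.var():.4f}")
```

Low degrees show large bias and small variance, high degrees the reverse - the bias-variance tradeoff.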
Proposition 89. The minimum of the loss function L = ||y − Xθ||² is given by the OLS parameter estimates θ̂ = (X'X)⁻¹X'y. The bias is given by Bias(θ̂) = E(θ̂) − θ and the variance by Var(θ̂) = σ²(X'X)⁻¹, where σ² is estimated from the residuals, σ̂² = (y − Xθ̂)'(y − Xθ̂)/(m − n).
The OLS estimator is unbiased, but its variance can be huge if predictor variables are highly correlated or if there are many predictors (for n → m, the variance explodes). To reduce the variance one has to introduce some bias, which moves us in Figure 5.17 from the right-hand side, where unbiased OLS puts us, towards the center with an optimized trade-off.
5.2.5 Regularization
5.2.5.1 Theory
Regularization is a method to counteract overfitting. Consider the hypothesis

$$f_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 + \theta_5 x^5$$

where the parameters of the fourth- and fifth-order terms are much smaller than the other ones. Clearly, these parameters add to overfitting, since they add to model complexity. We do not want to set them to zero manually, but rather penalize the algorithm if it chooses them - we do not throw away information but set a hurdle such that they get a positive weight only if they are indeed important. Regularization achieves this. It takes the cost function and modifies it such that all parameters are shrunk - those which are already small become almost negligible. We modify our cost function by introducing an extra term:
$$J(\theta) = \frac{1}{2m}\left(\sum_{k=1}^m \big(h_\theta(x^{(k)}) - y^{(k)}\big)^2 + \lambda\sum_{j=1}^n \theta_j^2\right) = \frac{1}{2m}\big(\|X\theta - y\|^2 + \lambda\|\theta\|^2\big) \tag{5.13}$$
with λ the regularization parameter. This addition induces a trade-off between the goals of fitting the training set well (first term) and keeping the parameters small (second term). Using this regularized objective function, we obtain a smoother fit and a much better hypothesis. For λ large, all parameters are penalized in the sense that they get close to zero. Hence, we have to be careful not to end up with underfitting by choosing the parameter too large.
The solution of the regularized problem reads

$$\theta^* = (X'X + \lambda I)^{-1}X'y$$

with I an (n+1) × (n+1) matrix whose first row contains zeros and whose remaining n × n block is the identity matrix (the intercept is not penalized). To check this, we write in matrix notation

$$J = \frac{1}{2}\big((X\theta - y)'(X\theta - y) + \lambda\theta' I\theta\big).$$

Taking the derivative w.r.t. θ,

$$\nabla J = X'(X\theta - y) + \lambda I\theta = 0,$$

which is solved by the stated θ*.
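A minimal sketch of this closed-form ridge solution on synthetic data, with the intercept left unpenalized as described above:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, lam = 100, 3, 5.0
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.5 * rng.normal(size=m)

I = np.eye(n + 1)
I[0, 0] = 0.0                       # leave the intercept unpenalized
theta_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
theta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print(theta_ols)
print(theta_ridge)                  # shrunk towards zero relative to OLS
```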
Considering the bias and the variance of the regularized model, it follows that we add a little bias but can reduce the variance in exchange; the parameter estimates are shrunk towards zero, and more so the larger the value of λ. To prove these claims one repeats the argumentation and calculation of the last section, which we omit. The result is

$$E_P(\theta^*|X) = \big(I - \lambda(\lambda I + X'X)^{-1}\big)\hat\theta,$$

showing the bias. Using the spectral theorem of linear algebra one can show that the matrix I − λ(λI + X'X)⁻¹ indeed leads to a shrinkage of the parameters, i.e. E_P(θ*|X) ≤ θ̂.
One can next express the variance and the MSE explicitly for the regularized problem. The variance expression reads

$$\sigma^2(\theta^*|X) = \sigma^2(\lambda I + X'X)^{-1} - \lambda\sigma^2\big((\lambda I + X'X)^{-1}\big)^2.$$
We omit the derivation of these formulae but turn instead to an explicit calculation for a one-dimensional case.
Suppose that there is a single feature which can take only the two values 1 and −1. Hence, the design matrix X reads

$$X = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \qquad X'X = \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}.$$

But then

$$\lambda I + X'X = \begin{pmatrix} 2+\lambda & 0 \\ 0 & 2+\lambda \end{pmatrix}$$

and the inverse of this diagonal matrix is simply

$$(\lambda I + X'X)^{-1} = \begin{pmatrix} \frac{1}{2+\lambda} & 0 \\ 0 & \frac{1}{2+\lambda} \end{pmatrix}.$$

The variance expression then reads

$$\sigma^2(\theta^*|X) = \sigma^2\begin{pmatrix} \frac{1}{2+\lambda} & 0 \\ 0 & \frac{1}{2+\lambda} \end{pmatrix} - \lambda\sigma^2\begin{pmatrix} \frac{1}{(2+\lambda)^2} & 0 \\ 0 & \frac{1}{(2+\lambda)^2} \end{pmatrix} = \frac{2\sigma^2}{(2+\lambda)^2}\,I.$$
If λ > 0, the regularized variance is always smaller than the non-regularized one. This is indeed the benefit of regularization: we can reduce a large variance at the cost of introducing a bit of bias. If noise dominates the size of the input parameters, σ² > θ̂₀² + θ̂₁², then for λ = 2, for example, the MSE is smaller than half the variance.
In high-dimensional spaces, phenomena that are absent or rare in low-dimensional spaces become generic. In regression notation, the ridge problem reads
$$L' = \min_{\beta\in\mathbb{R}^N} \sum_{i=1}^N (y_i - x_i'\beta)^2 + \lambda\sum_{j=1}^N \beta_j^2 = \min_{\beta\in\mathbb{R}^N} \|y - x\beta\|^2 + \lambda\|\beta\|^2 \tag{5.15}$$
Proposition 90. The minimum of the loss function L' is given by the parameter estimates β̂ = (x'x + λI)⁻¹(x'y), equivalently β̂ = (I + λ(x'x)⁻¹)⁻¹β_OLS, with β_OLS the beta of the ordinary least squares problem with zero penalty term.
If λ increases, the variance decreases and the bias increases. Ridge regression shrinks all coefficients of OLS by a uniform factor; it does not set any coefficients to zero. What is the optimal value for λ? A traditional approach is to choose λ such that some information criterion (AIC, for example) is smallest. An ML approach is to minimize the cross-validated sum of squared residuals (or some other measure).
The LASSO problem replaces the quadratic penalty by an L1 penalty:

$$L'' = \min_{\beta\in\mathbb{R}^N} \sum_{j=1}^N (y_j - x_j'\beta)^2 + \lambda\sum_{j=1}^N |\beta_j| = \min_{\beta\in\mathbb{R}^N} \|y - x\beta\|^2 + \lambda\|\beta\|_1. \tag{5.16}$$
The innocent-looking difference is that the penalty term uses a different distance measure - the L1 instead of the L2 norm. But the impact on the optimal betas is qualitatively and quantitatively different: increasing λ, the model parameters not only become smaller, but some originally small parameters attain the value zero. Therefore, LASSO performs a feature selection.

Consider a two-dimensional problem. The level sets of the quadratic term in the loss function are ellipses. The penalty term's level sets are circles around zero in the L2 norm and a diamond around zero in the LASSO case. The minimum is the point where the ellipses and the circle or diamond intersect. In the ridge case, this point is generically not on a coordinate axis: both beta estimates are non-zero. In the diamond case, the intersection is generically at a corner on one axis, and hence one parameter is zero.
The analytical solution of the problem L'' in the case where the x_i are orthonormal is given next:

Proposition 91. Assume that the vectors x_i are orthonormal. The minimum of the loss function L'' is given by the parameter estimates

$$\hat\beta_j = \hat\beta_j^{OLS}\,\max\Big(0,\ 1 - \frac{N\lambda}{|\hat\beta_j^{OLS}|}\Big).$$
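A minimal sketch of this soft-thresholding rule; the OLS coefficients and the penalty are illustrative values:

```python
import numpy as np

def lasso_orthonormal(beta_ols, lam, N):
    """Soft-thresholding: shrink OLS coefficients, setting small ones to zero."""
    return beta_ols * np.maximum(0.0, 1.0 - N * lam / np.abs(beta_ols))

beta_ols = np.array([2.5, -0.8, 0.1, -1.6])
print(lasso_orthonormal(beta_ols, lam=0.1, N=4))
# -> [ 2.1 -0.4  0.  -1.2]: the smallest coefficient is selected away
```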
In portfolio construction, the LASSO penalty is added to the mean-variance problem:

$$\min_{\phi\in\mathbb{R}^N} \phi'C\phi + \lambda\sum_{j=1}^N |\phi_j|\,, \quad \text{s.t.}\ e'\phi = 1\,,\ \phi'\mu \ge r. \tag{5.17}$$
Deviations from the zero vector are punished, since one superimposes a 'V'-type function on the risk function. Small values of φ are eventually reduced to zero. This results in a sparser investment vector. There are many different variants of the LASSO approach; see Fastrich et al. (2013) and Zhou (2006) for the adaptive LASSO approach to counteract some biases inherent in (5.17).
Bruder et al. (2013) compare the OLS mean-variance approach with the LASSO mean-variance one for the S&P 500, with monthly rebalancing from Jan 2000 to Dec 2011, see Table 5.1.
The LASSO approach shows a better risk-adjusted performance than the traditional one. The extreme losses are comparable in both approaches; the LASSO approach does not provide any form of tail hedge. The turnover is much smaller for the LASSO approach, which is a consequence of the sparse optimal investment vector and information matrix in the LASSO approach. The Google stock, for example, is hedged by 99 stocks in the OLS model compared to only 13 stocks in the LASSO model.
LASSO requires powerful numerical software tools. Take the MSCI World with around 10 500 stocks. Theoretical convexity of the problem is lost in most types of LASSO approaches due to the sparsity of the matrix: the curvature is almost zero, and one needs to search carefully for the true global minimum. Since the covariance matrix is high-dimensional with many zero entries, its inversion becomes delicate; one has to use advanced algorithms to produce a meaningful inverse.
5.2.6 Theory ML
5.2.6.1 Learning Finite Classes
Assume F, S are given and realizability holds. Empirical risk is defined by

$$R_{emp}(f) = \frac{1}{m}\,\big|\{k : f(x_k) \neq y_k\}\big|,$$

which counts the errors of the algorithm; it is observable, contrary to the theoretical risk (5.1), and it depends on the data set. Empirical Risk Minimization (ERM) is given by any algorithm f_ERM that minimizes empirical risk:

$$f_{ERM} \in \arg\min_{f\in F} R_{emp}(f). \tag{5.19}$$

This is the most important estimator for the unknown theoretical risk. It should be considered with care, since overfitting may lead to a very low performance of the ERM.
Theorem 92 (ERM PAC Learnable, Finite Case). Assume that F is a finite set, realizability holds, f_ERM is defined in (5.19) and

$$m \ge \frac{1}{\epsilon}\log\frac{|F|}{\delta}.$$

Then, with probability at least 1 − δ,

$$R(f_{ERM}) \le \epsilon.$$
This theorem applies to any machine learning model satisfying the assumptions; it does not restrict P nor F. We prove more general theorems below. In agnostic learning, the error is compared to the best f ∈ F: Definition 88 remains unchanged except that R_P(f) < ε is replaced by

$$R_P(f) < \min_{f'\in F} R_P(f') + \epsilon.$$
5.2.6.2 Generalization
Consider a hypothesis class F and let f_m be the classifier with smallest empirical risk R_emp(f_m). Is the true risk R(f_m) small too? Is the error still small if f_m is applied on all of X with the unknown P? The strong law of large numbers states that |R(f_m) − R_emp(f_m)| converges to zero for m → ∞ for a fixed f. The classifier f_m then generalizes well.
But there are two issues to consider. First, the rate of convergence is left unknown in the strong law of large numbers: convergence can be so slow that the number m of data points needed for a given accuracy becomes extremely large. Second, empirical risk should approximate true risk uniformly in F and P, i.e. not only for a fixed f. This defines consistency.

Why is consistency important? Consider a finite set F such that for each f there exists a sample where the difference between true and empirical risk is small. For a given sample, however, we don't know for how many of the functions in F the difference is small. Furthermore, f_m, which minimizes empirical risk, need not minimize true risk too; the difference between the two risk measures can become large. We want to rule out both cases: empirical risk needs to converge towards true risk independently of f ∈ F - this is called uniform convergence.
The inequality of Hoeffding and Chernoff addresses the speed of convergence for a given fixed function f: empirical risk is close to the actual risk,

$$P\big(|R_{emp}(f) - R(f)| \ge \epsilon\big) \le 2e^{-2m\epsilon^2}. \tag{5.20}$$
Hence, for m sufficiently large, the training error provides a good estimate of the test error. Thinking about the risk as an expected value or a mean, the bound states that the mean values of two random variables get close in probability at an exponential rate as the size of the data increases.
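A small simulation sketch of the bound (5.20) for a fixed classifier, modelling its errors as Bernoulli draws with an illustrative true risk:

```python
import numpy as np

rng = np.random.default_rng(3)
true_risk, m, eps, trials = 0.3, 200, 0.05, 20000

# Each draw of m Bernoulli(true_risk) errors yields one empirical risk.
emp_risks = rng.binomial(m, true_risk, size=trials) / m
freq = np.mean(np.abs(emp_risks - true_risk) >= eps)
bound = 2.0 * np.exp(-2.0 * m * eps ** 2)
print(f"empirical frequency {freq:.4f} <= Hoeffding bound {bound:.4f}")
```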
We consider empirical risk minimization. We would like empirical and unknown true risk to become close to each other independent of the chosen f ∈ F and the unknown probability P; then ERM is called a consistent algorithm. The difference between empirical and true risk should become simultaneously small for all functions f ∈ F. Uniformity requires control even of the worst possible function f, i.e. of⁷

$$\sup_{f\in F} |R_{emp}(f) - R(f)|.$$
7 If you are not familiar with the supremum, replace it by the maximum.
This supremum is what the uniform law of large numbers deals with. Proving uniform convergence of empirical risk also implies convergence of |R(f_m) − R(f_F)|, where f_F ∈ F is the theoretical classifier which minimizes true risk:
$$\begin{aligned} |R(f_m) - R(f_F)| &= R(f_m) - R(f_F)\\ &= R(f_m) - R_{emp}(f_m) + R_{emp}(f_m) - R_{emp}(f_F) + R_{emp}(f_F) - R(f_F)\\ &\le R(f_m) - R_{emp}(f_m) + R_{emp}(f_F) - R(f_F)\\ &\le 2\sup_{f\in F} |R(f) - R_{emp}(f)|, \end{aligned}$$

where we used in the second-last line that R_emp(f_m) − R_emp(f_F) ≤ 0 by definition of f_m being the optimum for ERM, and in the first line that R(f_m) ≥ R(f_F) by definition of the optimal theoretical risk, so that the absolute value can be dropped.
Hence, if we can prove consistency for the empirical risk expression, consistency for |R(f_m) − R(f_F)| follows. The proof for the empirical risk consistency expression sup_{f∈F} |R(f) − R_emp(f)| is done in several steps.
• Step I Ghost sample (symmetrization) The unknown true risk is compared with the empirical risk on an independent second sample of the same size:

$$P\Big(\sup_{f\in F} |R_{emp}(f) - R(f)| \ge \epsilon\Big) \le 2\,P'\Big(\sup_{f\in F} |R_{emp}(f) - R'_{emp}(f)| \ge \epsilon/2\Big),$$

where the second distribution P' refers to the IID distribution of a sample of size 2m; R_emp measures the risk on the first m samples and R'_emp on the second m samples. Using the ghost trick we can replace the unknown R(f) on the left-hand side by an empirical risk in the desired bound.
• Step II Finiteness Step I implies that the infinite set F can be replaced by a finite one, F̃, with at most 2^{2m} elements: the bound on the left-hand side requires considering an infinite set, which is reduced to a calculation using a finite set on the right-hand side.
• Step III Shattering coefficient, union bound, Hoeffding The last step is to bound the above expression by

$$P'\Big(\sup_{f\in F} |R_{emp}(f) - R'_{emp}(f)| \ge \epsilon/2\Big) \le S(F, 2m)\,e^{-m\epsilon^2/4} \tag{5.21}$$

with S the shattering coefficient, see below. The union bound trick as well as the Hoeffding inequality are used in the proof.
If the RHS of the last expression converges to zero for m to infinity, then ERM is consistent for the infinite function set F. Given the exponential function, convergence follows if the shattering coefficient does not grow too strongly.
Given uniform convergence, how should the space F be chosen such that the shattering coefficient times the exponential function in Step III converges? If we choose F to be the set of all functions, then the classifier f_m contains all Bayes classifiers, which leads to inconsistency. This follows from the so-called no-free-lunch theorem in ML, which should not be confused with the no-free-lunch theorem in the advanced theory of no-arbitrage pricing.
To prevent such a situation, we use prior information or a hypothesis to restrict F. Clearly, we do not want to reduce it to the extent that the classifier with zero error or small error (PAC, agnostic) is ruled out. The following error decomposition shows that there is an intermediate reduction, see Figure 5.19:
Figure 5.19: Error decomposition between the Bayes classifier f_Bayes, the best classifier in the class f_F, and the empirical minimizer f_n.
with $W_m^2 = \sum_i (b_i - a_i)^2$.
We rewrite (5.24) using δ := 2e^{−2mε²} as

$$P\Bigg(|R_{emp}(f) - R(f)| \ge \sqrt{\frac{\log\frac{2}{\delta}}{2m}}\,\Bigg) \le \delta. \tag{5.25}$$
This abstract result is not very useful for applications, since there is no characterization of whether the uniform law of large numbers holds for a given set F. Since our set F is high-dimensional, we face the problem: which properties of F determine uniform convergence? We start with the union bound trick.
$$P\Big(\sup_{f\in F} |R_{emp}(f) - R(f)| \ge \epsilon\Big) \le \sum_i P\big(|R_{emp}(f_i) - R(f_i)| \ge \epsilon\big) \le 2|F|\,e^{-2m\epsilon^2}. \tag{5.28}$$
⁸ Convergence is meant in the almost-everywhere sense. The inequality of DKW quantifies how fast an empirical distribution function approaches the distribution function from which the empirical samples are drawn. It generalizes the Glivenko-Cantelli Lemma of uniform convergence of empirical functions. We define the empirical distribution function:

$$F_m(x) = \frac{1}{m}\sum_{i=1}^m \chi_{\{X_i \le x\}}, \quad x \in \mathbb{R}.$$
This proves that empirical risk minimization over a finite set F is consistent with respect to F: the supremum can be taken outside of the probability. Equivalently, for each function f ∈ F we have with probability 1 − δ

$$R(f) \le R_{emp}(f) + \sqrt{\frac{\log|F| + \log\frac{2}{\delta}}{2m}}. \tag{5.29}$$

We see that uniformity increases the error bound by the factor log|F|. This finite-dimensional theory cannot be directly generalized to infinite sets F, since for |F| → ∞ the bound (5.29) becomes meaningless.
5.2.6.5 VC Theory
Can F be learned if its cardinality is infinite, i.e. does the theorem about agnostic learning for finite F generalize? We start with an example which shows that finiteness of F is a sufficient condition for learnability but not a necessary one. Hence, the size of the class F is not the measure needed to classify the complexity of ML models into learnable and non-learnable ones.
Consider the class of threshold functions on the real line: its cardinality is infinite, yet it is PAC learnable and agnostic learnable. To simplify the calculation assume realizability, i.e. there is a threshold f* which perfectly classifies the data. To find f_ERM, the algorithm selects the maximal r such that no real number x < r is assigned the value 1, i.e. r > f*. Let f̂ be the threshold chosen by our algorithm. Then there is a region [f*, f̂] with probability mass ε where f* and f̂ disagree: f̂ assigns 0 to an x in this interval while f* assigns 1. On the remaining interval (f̂, ∞), both functions agree with probability 1 − ε. Then,
Then,
|S|
Y
/ [f ∗ , fˆ]) =
P (RP (fˆ) > ) ≤ P (∀(xi , yi ) ∈ S, xi ∈ / [f ∗ , fˆ]) ≤ (1 − )|S| ≤ e−|S| .
P (xi ∈
i=1
If we choose |S| ≥ m(ε, δ) = (1/ε) log(1/δ), the probability that the error of f̂ is larger than ε can be made smaller than δ, i.e. the class is learnable. The infinite set F is described by a single parameter. We might guess that if an infinite set F can be described by a finite number of parameters, then the set is statistically learnable. This is true for the above example in higher dimensions, but it fails to be true in general.
The key step in determining which infinite sets F can be learned is based on the ghost sample trick of Vapnik and Chervonenkis: it reduces an infinite to a finite problem where the union bound trick can be applied, and where the factor valid in finite dimensions is replaced by a capacity measure which can be computed for infinite sets.
Let x₁, ..., x_m be data points and Z_m the sample of the m points (x_i, y_i). Set |F_{Z_m}| equal to the cardinality of F restricted to Z_m. Although F is infinite, |F_{Z_m}| is finite. The shattering coefficient S(F, m) of F is defined by:

$$S(F, m) = \max\{|F_{Z_m}|\,:\, x_1, \ldots, x_m \in X\}.$$
In other words, a set of m instances X_m from the input space X is shattered by a function class F if all possible 2^m labellings can be generated using functions from F. If we consider three points in the plane, i.e. m = 3, there are 2³ = 8 labellings. Using hyperplanes as the only functions in F, the points are shattered by the hyperplanes, see Figure 5.20.
Since the bound for consistency is only an upper bound, i.e. a sufficient condition, we cannot directly conclude that ERM is inconsistent if we use all functions. The condition

$$\log S(F, m)/m \to 0$$

is a necessary and sufficient condition for ERM to be consistent. On the unrestricted set of all functions, S(F, m) = 2^m and log S(F, m)/m = log 2 does not converge to zero: ERM over all functions is not consistent. Therefore, finally:
Proposition 98 (Vapnik and Chervonenkis). For any δ > 0, with probability 1 − δ any function f ∈ F satisfies

$$R(f) \le R_{emp}(f) + \sqrt{\frac{4}{m}\big(2\log S(F, 2m) - \log\delta\big)}. \tag{5.30}$$
The shattering coefficient which we used so far has the drawback that it is difficult to calculate. It turns out that a different capacity figure, the VC dimension, is better suited. To define this number: a sample Z_m of size m is shattered by the class F if the function class can realize any labelling on the given sample, i.e. |F_{Z_m}| = 2^m. The VC dimension of F is defined as the largest number m such that there exists a sample of size m which is shattered by F. If the VC dimension of F is finite, then F is learnable:
Theorem 99. F is PAC learnable if and only if the VC dimension of F is finite. Then the sample complexity m_F(ε, δ) grows at the same rate as

$$\frac{\mathrm{VC\text{-}dim}(F)}{\epsilon}\,\log\frac{1}{\delta}.$$

For agnostic learning, the first ε is replaced by its square.
The VC dimension measures the ability of a set of functions to fit available finite data. A set of functions has VC dimension h if there exist h samples that can be shattered by this set of functions, but there do not exist h + 1 samples that can be shattered. If one considers the half-spaces in R^d, then the VC dimension is d + 1; see Figure 5.20 for the case d = 2, where there exist three points that can be shattered but four points cannot be shattered.
If e_n(y), n = 1, ..., m, is a set of m linearly independent functions, then the function class

$$f(y, \theta) = \chi_{\{\sum_n \theta_n e_n(y) + a > 0\}}$$

has finite VC dimension, given by the number of free parameters, m + 1.

Linear predictors are functions

$$y(\theta) = x_0 + \langle\theta, x\rangle,$$

that is, each function is parametrized by θ. These functions form the class F. Different classifiers (hypothesis classes) of linear predictors are compositions g ∘ y(θ): in binary classification g is chosen to be the sign function, and in a regression g is the identity function.
In binary classification this gives (setting x₀ = 0)

$$f(x, \theta) = \mathrm{sign}(\langle\theta, x\rangle) = \begin{cases} +1, & \langle\theta, x\rangle \ge 0 \\ -1, & \langle\theta, x\rangle < 0 \end{cases}, \qquad \theta \in \mathbb{R}^n.$$
Each classifier forms a hyperplane that is perpendicular to the vector θ and passes through the origin. The θ vector is orthogonal to the plane; it points in the direction where ⟨θ, x⟩ increases most, see Figure 5.21. The sign does not change if we scale the x's: the linear classifier does not care about the nearness of the labels. The next theorem summarizes learnability:
Assuming m data points in the training set and realizability, the ERM classifier for half-spaces is expected to make zero errors on the training set. The ERM can be implemented by using the perceptron algorithm on half-spaces. The idea is to adjust the parameters θ incrementally in order to minimize the classifier's training error step by step. Consider the task of classifying images into 'access/no access' to a building.
Figure 5.21: A linear classifier: the vector θ is orthogonal to the decision boundary ⟨θ, x⟩ = 0; images with ⟨θ, x⟩ < 0 are labelled −1, those with ⟨θ, x⟩ > 0 are labelled +1.
If f(x, θ) = +1, the image has been classified correctly. By the perceptron update rule, the adjustment after k steps for image m reads

$$\theta^{(k+1)} = \theta^{(k)} + y_m x_m,$$

i.e.

$$y_m\langle\theta^{(k+1)}, x_m\rangle = y_m\langle\theta^{(k)}, x_m\rangle + \|x_m\|^2 \ge y_m\langle\theta^{(k)}, x_m\rangle.$$

Given a mistake at stage k, the updated value becomes more positive at k + 1, and after a certain number of updates the value becomes positive - the image, kept fixed, is classified correctly. Then the next image is considered; updating the parameters in the same way leads to a correctly classified new image after some steps. This is continued for all images. But will these updates keep the former updates stable - the convergence question of the algorithm?
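A minimal sketch of the perceptron updates on synthetic, linearly separable data:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 2))
y = np.where(X @ np.array([1.0, -2.0]) > 0, 1, -1)    # separable toy labels

theta = np.zeros(2)
for _ in range(100):                     # epochs over the training set
    mistakes = 0
    for xm, ym in zip(X, y):
        if ym * (theta @ xm) <= 0:       # misclassified (or on the boundary)
            theta += ym * xm             # perceptron update rule
            mistakes += 1
    if mistakes == 0:                    # all points classified correctly
        break
print(theta)
```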
Proposition 101. Assume that for all training images m there exists a constant γ > 0 such that y_m⟨θ*, x_m⟩ ≥ γ, and that all training images have bounded norm ||x_m|| ≤ r. Then the perceptron algorithm converges in a finite number of steps k with

$$k \le \frac{r^2\|\theta^*\|^2}{\gamma^2}. \tag{5.31}$$
The number γ is called the margin, a name whose meaning will become clear below. θ* is the decision parameter for the plane ⟨θ*, x⟩ = 0. Therefore, the assumption y_m⟨θ*, x_m⟩ ≥ γ > 0 means that there exists a linear classifier in our class with finite parameter values that correctly classifies all training images. The ratio γ/||θ*|| is the smallest distance in the image space from any image to the decision boundary specified by θ*; it measures how well the two classes of images are separated by a linear boundary. This is the geometric margin γ_geom, and its inverse is a measure of how difficult the problem under consideration is: the smaller the geometric margin, the more difficult the problem, see Figure 5.21. The bound (5.31) can then be written as

$$k \le \frac{r^2}{\gamma_{geom}^2}.$$
Remarkably, the bound does not depend directly on the dimension of the images (pixels) nor on the number of training images. Nevertheless, the bound turns out to be a measure of the complexity of the problem of learning linear classifiers - the VC dimension. How well does the perceptron classify images which are not in the training set? If the two assumptions of the theorem hold true also for new images, then after k ≤ r²/γ²_geom mistakes in classifying the new images⁹, all further images will be classified correctly. In this sense, the above result generalizes to the new images.
We assumed that there exists a linear classifier that has a large geometric margin. Is it possible to find such a large-margin classifier directly? Support vector machines (SVM) provide the answer: the margin is maximized subject to correct classification of all training points,

$$\max_\theta \frac{\gamma^2}{\|\theta\|^2}\,, \quad y_k\langle\theta, x_k\rangle \ge \gamma\,,\ \forall k. \tag{5.32}$$
⁹ The algorithm does not know when it made a mistake; this detection has to be added to the model.
Figure 5.22: The optimal SVM hyperplane is shown together with a non-optimal hyperplane whose margin is smaller than in the optimal case. The two data points which are elements of the two hyperplanes belonging to the SVM-optimal one are marked by a square.
This problem is recast in a more suitable form: replacing max by min provides a quadratic objective function; we insert the usual factor 1/2; and since the result depends only on the ratio θ/γ, we set without loss of generality γ = 1. Summarizing, the problem reads

$$\min_\theta \frac{1}{2}\|\theta\|^2\,, \quad y_k\langle\theta, x_k\rangle \ge 1\,,\ \forall k. \tag{5.33}$$
This defines a quadratic optimization problem which can be generalized. Using the Lagrangian, the Kuhn-Tucker conditions are necessary and, for this convex problem, also sufficient for an optimum. If α_k is the Lagrange multiplier associated with constraint k in the optimization problem, the complementarity condition of Kuhn-Tucker,

$$\alpha_k\big(y_k\langle\theta, x_k\rangle - 1\big) = 0,$$

implies that only data points which are elements of the two hyperplanes in Figure 5.22 marked with a square can have α_k > 0, since they are the only points where the constraint holds with equality. These data points are called support vectors. For all other data points the alphas are zero. In the pixel example, the solution depends only on the subset of images which are exactly on the margin; the remaining images do not matter. Hence, the support vectors are sufficient to define the training set.
The many conditions in the Kuhn-Tucker approach, which are due to the inequality constraints, make it difficult to solve the problem. Therefore, one transforms the optimization problem from the above formulation (the primal model) to its dual form, which is easier to solve. For that, one solves in the primal model the equation ∂L/∂θ = 0, with L the Lagrangian, and substitutes this solution back into the Lagrangian; this yields the dual Lagrangian L_D, which depends only on α, y and x_k. From a statistical learning theory perspective, maximizing the margin means minimizing the VC dimension of the support vector machine: support vector machines minimize both the empirical risk and the confidence interval.
So far we have not considered the typical situation where images are difficult to classify because of labelling errors, i.e. a few images pop up in the wrong half-plane in the optimal solution. We alter the SVM optimization problem to account for these types of errors in the maximum-margin linear classifier. The simplest form is to introduce 'slack' variables: we measure the degree to which each margin constraint is violated and associate a cost for the violation in the objective function. The problem then reads

$$\min_\theta \frac{1}{2}\|\theta\|^2 + c\sum_{k=1}^n \xi_k\,, \quad y_k\langle\theta, x_k\rangle \ge 1 - \xi_k\,,\ \xi_k \ge 0\,,\ \forall k, \tag{5.34}$$

where the ξ are the slack variables. If we have to set ξ_k > 0, then the margin constraint is violated (possible misclassification) and the penalty cost occurs. Increasing the constant c, i.e. increasing the penalty cost, leads to ξ_k = 0 for all k: we are back in the original problem. For small c, many margin constraints can be violated. It is reasonable to ask whether this is indeed the trade-off we want.
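A minimal sketch of the soft-margin problem using scikit-learn's SVC, where the parameter C plays the role of the slack penalty c in (5.34); the data are synthetic:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(-1.5, 1.0, size=(50, 2)),
               rng.normal(+1.5, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)   # larger C penalizes slack more
print("number of support vectors:", len(clf.support_vectors_))
print("training accuracy:", clf.score(X, y))
```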
So far we assumed that the data sets can be separated linearly by a hyperplane, but often a non-linear curve is needed instead. We refer to the literature for the powerful methods which rely on the clever idea of transforming the non-linear problem into a linear one by mapping the data into a higher-dimensional space and then using the above linear theory in this space (kernel methods).
In the performance label, 'no' means that the track record of the asset manager is not outperforming the benchmark track record.
To start the construction of the tree, we choose the employment status as the first node or question, and we count how strongly the three different types of employment status allow us to attain as many pure states as possible. These are states where the asset managers under consideration either all belong to the performing class or all to the non-performing one; they are the most informative states, whereas a node in the tree which leads to 50 percent performers and 50 percent non-performers is not informative at all - we could flip a coin to obtain the same information level. Let us first construct a decision tree on an ad hoc basis and then, in a second step, analyze how the tree should be constructed optimally. Figure 5.23 shows the different steps in the construction of the tree.
The blue nodes in the figure are pure nodes and the red ones are non-pure end nodes, where either no question is left to ask or the question cannot split the node further towards a pure node. The node 1/1 after the CFA question is an example where asking the degree question is useless, since both have a master's degree.
To construct the tree optimally we use the entropy measure S = −Σ_i p_i log₂ p_i. The lower the entropy for a given question, the higher the information gain from asking the question. From the raw data, p₁ = 10/17 and p₂ = 7/17 are the probabilities of being an outperforming AM or not. The entropy of the raw data is:

$$S_{raw} = -\frac{10}{17}\log_2(10/17) - \frac{7}{17}\log_2(7/17) = 0.977,$$
Figure 5.23: Step-by-step construction of the decision tree: the root node Employment Status splits the asset managers into Self-Employed/University (3/3), a further University branch with CFA and Bachelor/Master degree questions, and Employed (5/0).
which is close to 1, the totally uninformative case. The entropy of a node with two outperforming and four non-performing managers is

$$-\frac{2}{6}\log_2(2/6) - \frac{4}{6}\log_2(4/6) = 0.918,$$

similarly uninformative as the raw data. The weighted entropy for the Employment Status split over its three sub-nodes (3/3, 2/4 and 5/0) is

$$S_{empl} = \frac{6}{17}\times 1 + \frac{6}{17}\times 0.918 + \frac{5}{17}\times 0 = 0.677.$$
Therefore, the information gain of the Employment Status split relative to the raw data is the highest at 0.3; the gain from the Academic question is 0.034 and from the CFA question 0.021. This defines the ordering of the questions in the tree and, if there were many more questions, which questions should be ruled out since they add only little entropy gain, i.e. information gain.
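A minimal sketch of this entropy and information-gain computation; the class counts follow the 10/17 vs. 7/17 example and the sub-node counts reconstructed above:

```python
import numpy as np

def entropy(counts):
    """Entropy S = -sum p_i log2 p_i of a class-count vector."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log2(p))

s_raw = entropy([10, 7])                                  # 0.977
# Employment Status splits the 17 managers into sub-nodes with class
# counts (3,3), (2,4) and (5,0), as in the worked example above.
groups = [(3, 3), (2, 4), (5, 0)]
s_split = sum(sum(g) / 17 * entropy(g) for g in groups)   # 0.677
print(f"information gain: {s_raw - s_split:.3f}")         # ~0.300
```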
Definition 102. A classification tree is a decision tree in which each node has a binary decision based on X_i < a or not, for a fixed value a ∈ ℝ.
The root node contains all data (X_i, Y_i). For both model types, the prediction space is cut into disjoint subsets. Splitting from the top down cuts the prediction space into new branches until the user decides to terminate the splitting process. If there are too many splits, overfitting follows: performance will be poor when applied to new data. Pruning is the counter-measure to reduce overfitting. How does the tree decide when to make the next split? Many different algorithms are used. The general goal is that at each node the feature X_i and the threshold a are chosen to minimize the resulting diversity in the child nodes. Consider the splits w.r.t. Creditworthiness and Sector, respectively. The Gini index split first calculates the index for each sub-node and then the weighted Gini score of the split over the nodes of that split.
The weighted Gini index for the split Creditworthiness is then 1/4 ∗ 10/30 + 1/4 ∗
20/30 = 1/8 + 1/6 = 0.29. For the split Sector, the Gini Index is
The Gini score for the split on Sector is higher than the other one, so the node split will be on Sector. Intuitively, there is more diversity in the Sector split compared to the other split, where prediction is close to a random coin toss. From an information perspective, the purer a node is, the less information is needed to describe it. Hence, the entropy S defined above is another quantity with which to calculate a split. A method for continuous variables is the reduction-of-variance calculation, using the above two-step procedure of calculating the variance for each node and then the weighted average as the split's variance value.
Controlling for overfitting - in the extreme, 100% accuracy on the training set by making one leaf for each observation - is achieved either by setting constraints or by pruning. Constraints can be set on the parameters of the tree: one can fix the minimum number of observations in a node for a split, define the minimum samples for a terminal node, the maximum depth of the tree, the maximum number of terminal nodes, or the maximum number of features to consider for a split. These restrictions prevent the model from learning relations which are specific to the training data but do not generalize.
Splitting is a myopic approach: the algorithm checks locally whether a split should happen but does not take a global view, and it only stops when it reaches a constraint value. In this sense, the algorithms are greedy: myopic decision makers which do not take into account any future decisions. In the investment analogy, they act as one-period optimizers in a multi-period context. Pruning, by contrast, considers effects a few steps ahead. The implementation follows the usual backward induction logic: first generate the decision tree to a large depth and then work backwards, removing all nodes which imply negative gains.
Comparing tree-based models with linear logistic regression for classification and with linear regressions: the classic models are appropriate if the relationship between dependent and independent variables is indeed linear. But if there are non-linearities between the variables, then only tree-based models can account for them. Furthermore, tree-based models are often simpler to explain than their linear counterparts.
Often not a single model is used, but an ensemble of models, to achieve a better accuracy and model stability. Like any ML model, tree-based models suffer from the bias-variance tradeoff: small trees, for example, lead to low variance and high bias. Increasing the complexity of the model reduces the prediction error due to lower bias, but at some point high complexity starts to overfit the model, that is, variance increases. Ensemble models are a method to manage the bias-variance trade-off. Ensemble methods include bagging, boosting and stacking approaches. Boosting combines many 'weak' or high-bias models in an ensemble that has lower bias than the individual models, while bagging combines 'strong' learners in a way that reduces their variance.
Bagging reduces the variance of predictions by combining the results of multiple classifiers modelled on different sub-samples of the same data set. Starting with a training set X of size N, bagging generates M new training sets X'_i by sampling from X uniformly and with replacement. The M models are fitted using these M samples and combined by averaging the outputs for regression or by voting for classification. This averaging procedure stabilizes the single algorithms.
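A minimal bagging sketch with scikit-learn, combining bootstrapped decision trees by voting; the data are synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
# 50 decision trees (the default base learner), each fitted on a bootstrap
# sample drawn from X with replacement, combined by voting.
bag = BaggingClassifier(n_estimators=50, random_state=0).fit(X, y)
print(bag.score(X, y))
```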
Typically, we don't know the probability P(c|x̃). But Bayes' Theorem tells us how to calculate this quantity using P(x̃|c) and P(c) instead:

$$c^* = \arg\max_{c\in C} \frac{P(\tilde x|c)P(c)}{P(\tilde x)}$$

and since P(x̃) is the same for all classes, the maximum does not change if we delete the denominator:

$$c^* = \arg\max_{c\in C} P(\tilde x|c)P(c). \tag{5.35}$$
The Naive Bayes Classifier assumes that the attributes are conditionally independent given the classification:

$$P(\tilde x|c) = \prod_{i=1}^n P(\tilde x_i|c). \tag{5.36}$$
This assumption is, for example, violated if we consider text classification (Natural Language Processing, NLP). Here x is a sequence of words, and say there are two classes c₁, c₂ which classify the text into 'complaints' and 'non-complaints'. That the probability of a whole sequence of words given the class 'complaint' equals the product of the probabilities of the individual words does not hold true, since the meaning of a sentence is not generated by independence between the single words.
Definition 103. The naive Bayes classifier finds the most probable class for x̃:

$$c^* = \arg\max_{c\in C} P(c)\prod_{i=1}^n P(\tilde x_i|c). \tag{5.37}$$
The prediction ŷ is whether a new client with given features will buy or not buy the offered portfolio solution. The features of the new client are x̃ = (Risk Profile = Low, Experience = Medium, Bias = Yes, Liquidity = Fair). Two classes are of interest: c₁ means that an investor will buy the portfolio and c₂ that she will not do so. To decide this question, we use the naive Bayes classifier formula and start with P(c₁) = 13/18, P(c₂) = 5/18. Table 5.2.10 summarizes the necessary conditional probabilities. Finally,

$$P(c_1)P(\tilde x|c_1) = 0.05\,, \qquad P(c_2)P(\tilde x|c_2) = 0.005.$$

The individual x̃ will most likely buy the portfolio product.
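A minimal sketch of the decision rule (5.37) for this example; since Table 5.2.10 is not reproduced here, the conditional probabilities below are hypothetical stand-ins chosen to yield scores of roughly 0.05 and 0.005:

```python
import math

priors = {"buy": 13 / 18, "not_buy": 5 / 18}   # P(c1), P(c2) from the text
cond = {  # hypothetical P(feature value | class) for Low/Medium/Yes/Fair
    "buy":     [0.5, 0.4, 0.6, 0.6],
    "not_buy": [0.3, 0.3, 0.5, 0.4],
}
scores = {c: priors[c] * math.prod(cond[c]) for c in priors}
print(scores)                                  # ~{'buy': 0.052, 'not_buy': 0.005}
print("decision:", max(scores, key=scores.get))
```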
Consider a new client x̃ (red dot). If we consider the first nearest neighbour, the client belongs to the class which does not buy the product. Considering the 3 nearest neighbours, we assign the new client the class blue, i.e. buy the product. Considering the 5 nearest neighbours, the next assignment follows, and so on. The reason to consider only an odd number of neighbours is to avoid ambiguous assignments. We denote by N_k(x̃) the neighbourhood of x̃ consisting of the k nearest instances given a distance metric d. Which metric d should one choose? If the inputs x are real numbers, the Euclidean distance is a possible metric.¹⁰
10 If inputs are binary valued, the Hamming distance is used.
If we use the Euclidean distance, the distance in income and the distance in risk aversion are of different sizes. Therefore, the attributes are normalized to take values in [0, 1]; furthermore, if attributes have different weights, the terms in the Euclidean norm are weighted respectively. Typically, the weighting function v(x, y) is chosen inversely proportional to d(x, y): the closer two points x, y are, the more weight is attributed. The classification task is to find the class c_j ∈ C such that the weighted distance in a neighbourhood is maximized, i.e.

$$c(\tilde x) = \arg\max_{c\in C} \sum_{x\in N_k(\tilde x)} v(x, \tilde x)\,d(c_j, c(x)). \tag{5.38}$$
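A minimal k-nearest-neighbour sketch with normalized attributes; the income/risk-aversion data and the new client are illustrative:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MinMaxScaler

# Hypothetical clients: (income, risk aversion); label 1 = buys the product.
X = np.array([[30_000, 0.2], [45_000, 0.5], [120_000, 0.9],
              [80_000, 0.7], [50_000, 0.3]])
y = np.array([0, 0, 1, 1, 0])

scaler = MinMaxScaler().fit(X)          # normalize attributes to [0, 1]
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)
print(knn.predict(scaler.transform([[60_000, 0.6]])))  # class of the new client
```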
The Manhattan distance is given by d(x, y) = Σ_j |x_j − y_j|; alternatively, the p-norms can be used, or the Chebyshev distance, given by d(x, y) = max_i |x_i − y_i|.
One way is to assume that the signal impacts the time-series volatility of the factors. This implies a new adjusted volatility σ_a which captures the additional information in the sentiment signals. The simplest relationship between the original and adjusted volatilities is a linear regression

$$\sigma_a(t) = a_0 + a_1\,\Xi(t)\,\sigma(t)$$

with Ξ the sentiment analytics signal. If there exists for each factor an adjusted volatility, then the factor θ_k = σ_{a,k}/σ_k is integrated in the risk model as follows. Setting

$$A^\theta_{ij} = \theta_i A_{ij}$$

we get

$$\sigma^2_{a,k} = \theta_k^2\sigma_k^2.$$

The modified model X = A^θ would change the volatilities as desired without changing the correlation structure.
5.2.13.1 Pre-Processing
The dataset has two parts. Part I consists of 10,000 complaints regarding three different topics - credit reporting, debt collection and mortgage - from the U.S. database from October 2018 to March 2019. Part II consists of 10,000 authorized random tweets from various users located in the U.S. Retweets, which don't have meaningful content, and tweets with less than 200 characters are excluded. Each complaint and tweet has around 199 and 46 words, respectively.
Four steps follow in the pre-processing, see Figure 5.27. First, we remove anything that doesn't provide useful information for classification, such as the 'X' strings covering anonymized information.
Figure 5.27: The text classification pipeline: data acquisition, text pre-processing, modeling, and evaluation/validation (performance assessment, feedback loop).
To whom it may concern : Can ANYONE help? I have fullled all requirements re
Claim † XXXX dd XXXX Green Tree Loan Servicing, & I am being told that I MUST
wait until further investigations are complete before the check which was issued by XXXX
XXXX can be released to me for completion of repairs to the home I live in. I have an
XXXX 15 year old daughter who is suering in this HEAT WAVE -along with XXXX
cats, a bird my wife and myself. No hotel will take us. My next step will be an attorney
& the Media. XXXX XXXX
may concern can anyone help fullled requirements re claim dd green tree loan ser-
vicing told must wait investigations complete check issued can released completion repairs
home live i year old daughter suering heat wave along cats bird wife no hotel take us my
next step attorney media
The overall accuracy is

$$acc = \frac{TP + TN}{M}.$$

The hit rate or recall is given by rec = TP/P, i.e. how many complaints are correctly classified given the total number of complaints, and the false alarm rate by FP/N. Precision, pre = TP/P', measures how many of the predicted complaints are actually complaints. The F-score is another evaluation metric; it is the harmonic mean of precision and recall. From a FI perspective, the FN rate is more important to control than the FP rate: it is better to quickly realize a false alarm than to neglect a complaint because the classifier thinks it is general text.
                      Predicted Complaint      Predicted General (Tweet)    Total
True Complaints       TP (True Positive)       FN (False Negative)          P
True Tweets           FP (False Positive)      TN (True Negative)           N
Total                 P'                       N'                           M
A late or improper response to complaints would jeopardise the customer relationship. Therefore, we use the F2 value, in which recall is weighted twice as strongly as precision:

$$F_2 = \frac{(1+\beta^2)\,\mathrm{pre}\times\mathrm{rec}}{\beta^2\times\mathrm{pre} + \mathrm{rec}}$$
where β equals 2. High accuracy rates could be attributed to chance. We use the bootstrap method with replacement to estimate the overall accuracy, the F2 value, and the classification performance of the classifiers. Here, the classifiers build their models based on a temporary training set, and the models are evaluated using the corresponding test set. We repeat this procedure a number of times with a for loop, where different bootstrapped samples are randomly generated in each iteration. The total accuracy rate and F2 score are the averages of the accuracy rates and F2 scores calculated in each iteration.
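A minimal sketch of this bootstrap evaluation with the F2 score as metric; the classifier and the synthetic data are placeholders for the text-classification setup:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import fbeta_score
from sklearn.naive_bayes import MultinomialNB

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X = np.abs(X)                     # MultinomialNB requires non-negative features
rng = np.random.default_rng(0)

scores = []
for _ in range(100):              # bootstrap iterations
    idx = rng.integers(0, len(X), size=len(X))    # sample with replacement
    oob = np.setdiff1d(np.arange(len(X)), idx)    # out-of-bag points as test set
    clf = MultinomialNB().fit(X[idx], y[idx])
    scores.append(fbeta_score(y[oob], clf.predict(X[oob]), beta=2))
print(np.mean(scores))            # bootstrapped average F2 score
```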
In an example, the confusion matrix of the naive Bayes classifier reads:

                      Predicted Complaint    Predicted General
Actual Complaint      49.54%                 3.12%
Actual General        1.87%                  45.47%
The diagonal elements show the correct predictions. There is almost no general text predicted to be a complaint (Type I error). About 6% of the complaints are not detected as such (Type II error). The business risk the firm faces due to big data analytics is that if the employees who write the answers do not recognize a Type II error communication, the customer will receive an improper response. The parametrization of the algorithm can change the ratio of Type I to Type II errors.
Figure 5.29: Performance of the four classifiers Naive Bayes, Support Vector Machine, Random Forest and Artificial Neural Network (Zin (2019)).
Gu et al. (2018) apply machine learning to return prediction for about 30,000 individual stocks over the 60-year horizon from 1957 to 2016: "Our predictor set includes 94 characteristics for each stock, interactions of each characteristic with eight aggregate time series variables, and 74 industry sector dummy variables, totalling more than 900 baseline signals. Some of our methods expand this predictor set much further by including non-linear transformations and interactions of the baseline signals."
Dimension reduction or penalization techniques are needed in the OLS case. Parameter shrinkage and variable selection, which both limit the degrees of freedom of the regression, bring the out-of-sample R-squared back to 0.09% per month. Principal component regression (PCR) and partial least squares (PLS), which reduce the dimension of the predictor set to a few linear combinations of predictors, raise the out-of-sample R-squared to 0.28% and 0.18%, respectively. Non-linear specifications further improve predictions. The authors use generalized linear models, regression trees, and neural networks. Regression trees and neural nets lead to an R² between 0.27% and 0.39%. The economic gains are considerable. An investor in the S&P 500 who uses neural network forecasts reaches an annualized out-of-sample Sharpe ratio of 0.63, an increase of 0.21 relative to the 0.42 Sharpe ratio of a buy-and-hold investor. Forming a long-short decile spread, sorted on stock return predictions from a neural network, the strategy earns an annualized out-of-sample Sharpe ratio of 2.35, compared to 0.89 for their benchmark.
We describe some of the machine learning methods used. All methods aim to minimize the mean squared prediction error (MSE). They describe the excess return of asset j by an additive prediction error model:
$$R^j_{t+1} = E(R^j_{t+1} \mid \mathcal{F}_t) + \epsilon^j_{t+1} = g(z^j_t) + \epsilon^j_{t+1} \qquad (5.39)$$
where $z^j_t$ is the vector of predictors. Since the function g depends neither on time nor on the individual stock, the estimates of the risk premia are more stable for any individual asset than under standard methods, where the cross-section is re-estimated in each period. Since g depends only on $z^j_t$, information prior to t or from other stocks is not used. ML requires careful construction of the sub-samples for testing, estimation and hyperparameter tuning in order to control model complexity for out-of-sample performance. Depending on the algorithm used, different tuning methods apply; see Gu et al. (2018) for details.
Different choices of the function g define different models. The simple linear model imposes that conditional expectations can be approximated by a linear function of the raw predictor variables, i.e. $g(z^j_t) = \langle z^j_t, \phi \rangle$ with parameter vector φ. The objective function is the ordinary mean squared error loss
$$L = \frac{1}{NT} \sum_{j,t=1}^{N,T} \left( R^j_{t+1} - \langle z^j_t, \phi \rangle \right)^2.$$
Using statistical robustness methods (the Huber loss function), this least squares objective can be tuned to account better for the more informative observations. Penalty models induce sparsity in the sense that they force small coefficients to become zero.
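A hedged sketch of such a penalized regression on synthetic data; the lasso here stands in for the penalty models described above and is not the authors' exact specification:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
Z = rng.normal(size=(5000, 900))                 # stacked predictors z_t^j
phi = np.zeros(900); phi[:10] = 0.05             # only a few signals matter
R = Z @ phi + rng.normal(scale=1.0, size=5000)   # excess returns R_{t+1}^j

# the l1 penalty shrinks coefficients and forces small ones to exactly zero
lasso = Lasso(alpha=0.01).fit(Z, R)
print("non-zero coefficients:", np.count_nonzero(lasso.coef_))
```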
If predictors are highly correlated, the above shrinkage and selection methods are not optimal: it is better to choose an average of the predictors as the sole predictor in a univariate regression. This averaging is the essence of dimension reduction, and principal components regression (PCR) and partial least squares (PLS) are two such approaches. PCR starts with a principal components analysis (PCA), which conserves the covariance structure among the predictors; the leading components, given by the highest eigenvalues, are then used in the predictive regression. PCR thus rules out coefficients by considering the covariation among the predictors before considering their goodness in predicting future returns. PLS, in contrast, performs the dimension reduction by exploiting the covariation of the predictors with the forecast target. We refer to Yeniay and Göktas for a comparison of PCR, OLS and PLS.
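The contrast between the two approaches can be sketched with scikit-learn on synthetic data (the component counts are arbitrary assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 50))
y = X[:, 0] - X[:, 1] + rng.normal(scale=2.0, size=1000)

# PCR: components chosen only from the predictor covariance (largest eigenvalues)
pcr = make_pipeline(PCA(n_components=5), LinearRegression()).fit(X, y)

# PLS: components chosen from the covariation of the predictors with the target
pls = PLSRegression(n_components=5).fit(X, y)
print(pcr.score(X, y), pls.score(X, y))  # in-sample R^2 of each approach
```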
The generalized linear model expresses the model return forecast error as a sum of an approximation error (bias, not knowing the true model g∗), an estimation error (variance, not knowing the true parameters of the model g) and an intrinsic error term. Generalized linear means that non-linear univariate transformations of the predictors are considered. Ex post, a weakness is that this does not allow for interactions among predictors. Considering multivariate functions of predictors would generate such interactions, but the number of parameters of such a model becomes computationally intractable.
Instead, regression trees are used to incorporate multi-way predictor interactions. Formally,
$$g(z_{i,t}; \theta, K, L) = \sum_{k=1}^{K} \theta_k \, \chi_{\{z_{i,t} \in C_k(L)\}}$$
where $C_k(L)$ is one of the K partitions of the data and $\theta_k$ is the sample average of the outcomes within the partition. The formula says: given a tree, consider all paths from the root node to the end nodes (the sum over K); at each node on a given path, check whether the feature is above or below the threshold value (the indicator function); and multiply all indicator functions along the path. Based on the basic decision tree model, boosting and random forest ensemble methods are introduced in order to stabilize the results, to improve the performance and to manage the bias-variance tradeoff.
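The tree formula can be evaluated directly; the partitions C_k and leaf averages θ_k below are illustrative placeholders for a fitted tree:

```python
import numpy as np

# two features z = (size, momentum); K = 3 leaf partitions C_k(L)
partitions = [
    lambda z: z[0] <= 0.5,                   # C_1: small stocks
    lambda z: z[0] > 0.5 and z[1] <= 0.0,    # C_2: large stocks, weak momentum
    lambda z: z[0] > 0.5 and z[1] > 0.0,     # C_3: large stocks, strong momentum
]
theta = [0.02, -0.01, 0.03]                  # sample average outcome per partition

def g(z):
    # sum of theta_k * indicator{z in C_k}: exactly one indicator equals 1
    return sum(t * C(z) for t, C in zip(theta, partitions))

print(g(np.array([0.8, 0.4])))               # large, strong momentum -> 0.03
```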
Given these models, the out-of-sample R² for individual excess stock return forecasts is calculated as
$$R^2_{oos} = 1 - \frac{\sum_{(i,t) \in \tau} (R_{i,t+1} - \hat{R}_{i,t+1})^2}{\sum_{(i,t) \in \tau} R_{i,t+1}^2}$$
where τ indicates that the fit is assessed only on the testing sub-sample. The metric is used without demeaning, which is meaningful since we consider individual stocks and not broad indices; using historical averages would add a lot of noise. All models under consideration increase their monthly R² by 3 percentage points when benchmarked against the historical means. Table 5.2 shows the monthly out-of-sample stock-level prediction performance.
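As a small sketch, the metric is straightforward to compute:

```python
import numpy as np

def r2_oos(realized, predicted):
    """Out-of-sample R^2 without demeaning, as in the formula above."""
    realized, predicted = np.asarray(realized), np.asarray(predicted)
    return 1.0 - np.sum((realized - predicted) ** 2) / np.sum(realized ** 2)

print(r2_oos([0.01, -0.02, 0.03], [0.012, -0.015, 0.02]))
```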
Table 5.2: Monthly R² for the entire panel of stocks using OLS, OLS with only size, book-to-market, and momentum (OLS-3), PLS, elastic net (ENet), random forest (RF), gradient boosted regression trees (GBRT), and neural networks with one to five layers (NN1-NN5). The generalized linear model (GLM) results are not reported (poor performance). '*' indicates the use of the Huber loss instead of the l2 loss. Top 1,000 or bottom 1,000 refers to the largest or smallest 1,000 stocks by market value. (Gu et al. (2018))
The negative results for OLS reflect in-sample overfitting. Restricting OLS to three style premia or using penalization as in ENet improves the performance significantly. Regularizing the linear model via dimension reduction improves predictions even further, as in the PLS case; hence, dimension reduction dominates variable selection. Boosted trees and random forests are competitive with these methods. Neural networks are the best performing predictors overall. Their drawback is the difficulty of interpreting the results: what is the economic meaning of the different layers one to five, and why do they generate the performance? Neural networks thus fail to be interpretable, which is a serious drawback in asset management, both from a client perspective and from a regulatory one.
To assess the statistical significance of the return performances, the authors use Diebold-Mariano test statistics for pairwise comparisons of a column model versus a row model. The statistics imply that the performance differences among regularized linear models are all insignificant; that is, all OLS models, ENet, PLS and PCA produce statistically indistinguishable forecast performance. Random forests and boosted trees improve over linear models only marginally. Again, neural networks are the only models that produce large and statistically significant improvements over all linear models. When one considers which characteristics matter in the different models, a few turn out to contribute significantly to the return performance in all models: momentum on several time scales, volatility characteristics, and spreads, for example. That is, as we have seen in other parts, market-driven characteristics dominate macroeconomic or accounting-type characteristics.
Correlation matrices can be represented as complete graphs, which lack any notion of hierarchy: each investment is substitutable with any other, there is no hierarchical relationship, and all nodes are of the same importance. Small estimation errors are magnified in such a structure. Consider an investor who invests in many assets, where some assets are close substitutes for each other while others are complementary. Say stocks with similar liquidity and from the same economic sector are more substitutable than stocks with different characteristics. Such a classification of the dependence leads to a tree structure, which includes hierarchical models, and not to a symmetric complete graph where the weights between any nodes can vary freely, see Figure 5.30.
While a covariance matrix, seen as a complete graph, has N(N − 1)/2 edges connecting the N nodes, a tree has only N − 1 edges to rebalance the weights among peers at the various hierarchical levels. Furthermore, in the covariance matrix the weight distribution has no natural starting point, whereas in a tree the weights are distributed top-down, which is consistent with many asset managers' investment behaviour.
The HRP (hierarchical risk parity) algorithm is constructed in three steps. First, similar investments are grouped into clusters based on a proper distance metric; this defines the tree clustering. Second, the rows and columns of the covariance matrix are reorganized so that the largest values lie along the diagonal; this leads to a quasi-diagonalization of the clustered tree and circumvents the problems of inverting the covariance matrix. Third, the allocation is split top-down through a recursive bisection of the reordered covariance matrix.
The tree construction, step one, proceeds in several stages. The first stage generates the tree clustering out of the data. Let a T × N matrix of observations X be given, with N the number of assets and T the number of periods. The goal is to map the N column vectors into a hierarchical structure of clusters, such that allocations can flow downstream. From the correlation matrix ρ one obtains the distance matrix d with entries $d_{ij} = \sqrt{(1-\rho_{ij})/2}$, and from it the matrix $\bar d$ of Euclidean distances between the columns of d, $\bar d_{ij} = \sqrt{\sum_k (d_{ki}-d_{kj})^2}$. The pair of assets with the smallest $\bar d$ forms the first cluster u(1). Finally, the matrix $\bar d_{ij}$ is updated by appending $\bar d_{i,u(1)} = \min\{\bar d_{i,j} : j \in u(1)\}$ and dropping the clustered columns and rows j ∈ u(1); see the example for illustration:
$$\rho = \begin{pmatrix} 1 & & \\ 0.7 & 1 & \\ 0.2 & -0.2 & 1 \end{pmatrix} \rightarrow d = \begin{pmatrix} 0 & & \\ 0.3873 & 0 & \\ 0.6325 & 0.7746 & 0 \end{pmatrix} \rightarrow \bar d = \begin{pmatrix} 0 & & \\ 0.5659 & 0 & \\ 0.9747 & 1.1225 & 0 \end{pmatrix} \rightarrow u(1) = (1,2)$$
and
$$\bar d_{i,u(1)} = \begin{pmatrix} 0 \\ 0 \\ 0.9747 \end{pmatrix} \rightarrow (\bar d)_{i,h=1,\dots,4} = \begin{pmatrix} 0 & & & \\ 0.5659 & 0 & & \\ 0.9747 & 1.1225 & 0 & \\ 0 & 0 & 0.9747 & 0 \end{pmatrix}.$$
The above steps are applied recursively, so that N − 1 clusters are formed and appended until the algorithm stops when the final cluster contains all of the original items. The sequence of cluster formation can be illustrated by a dendrogram.
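Stage one can be reproduced on the worked 3×3 example with scipy; single linkage on the Euclidean column distances recovers $\bar d$ and the first cluster u(1) = (1,2):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

rho = np.array([[1.0, 0.7, 0.2],
                [0.7, 1.0, -0.2],
                [0.2, -0.2, 1.0]])
d = np.sqrt(0.5 * (1.0 - rho))         # distance d_ij = sqrt((1 - rho_ij)/2)
dbar = pdist(d, metric="euclidean")    # d-bar: distances between columns of d
link = linkage(dbar, method="single")  # recursive clustering
print(dbar)   # [0.5659, 0.9747, 1.1225]
print(link)   # first row: assets 0 and 1 merge at distance 0.5659
```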
The next stage, quasi-diagonalization, reorganizes the rows and columns of the covariance matrix so that the largest values lie along the diagonal. This operation places similar investments close to each other and dissimilar ones far apart. The algorithm used, which we do not discuss, preserves the order of the clustering.
Figure 5.31: Quasi-diagonalization of the clustered correlation matrix: the largest values lie along the diagonal. Unlike, for example, the PCA approach, HRP does not require a change of basis; it works with the original investments. (Lopez de Prado (2016)).
Stage three uses the fact that inverse-variance allocation is optimal for a diagonal covariance matrix. For the quasi-diagonal matrix of stage two, one approach to the recursive bisection is to split the allocation between adjacent subsets of the quasi-diagonal matrix in inverse proportion to their aggregated variances.
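A sketch of one bisection step, under the assumption (as in the original HRP algorithm) that a cluster's variance is computed with inverse-variance weights within the cluster:

```python
import numpy as np

def cluster_variance(cov):
    """Variance of a cluster under inverse-variance weights (HRP convention)."""
    ivp = 1.0 / np.diag(cov)
    w = ivp / ivp.sum()
    return float(w @ cov @ w)

cov = np.diag([0.04, 0.09, 0.16, 0.25])   # toy quasi-diagonal covariance
v1 = cluster_variance(cov[:2, :2])        # left cluster
v2 = cluster_variance(cov[2:, 2:])        # right cluster
alpha = 1.0 - v1 / (v1 + v2)              # weight share given to the left cluster
print(alpha)                              # the split is then applied recursively
```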
The author compares, in and out of sample, the global minimum variance portfolio (GMV), the inverse volatility portfolio construction (IVP) of risk budgeting (where correlation information is discarded) and HRP; all portfolio constructions apply long-only constraints. The simulation is done for 10 assets. The GMV allocates 92.66% to the 5 top holdings and zero to three assets. The HRP allocation lies between the highly concentrated GMV and the almost equal distribution of the IVP. The GMV and HRP portfolios have almost the same risk although GMV uses only half of the assets; an event impacting the top five assets will therefore have a more severe impact than in the HRP case. For the out-of-sample perspective, Gaussian returns are generated with mean zero and 10% standard deviation, random shocks are added to account for price jumps, and the portfolios are rebalanced monthly (every 22 observations). The simulations are repeated 10,000 times. All mean portfolio returns out of sample are essentially zero, but the variances of the out-of-sample portfolios differ heavily. The GMV variance is the highest out of sample, 72.47% greater than HRP's. Intuitively, shocks affecting a specific investment penalize the concentrated GMV, while shocks involving several correlated investments penalize the IVP, which ignores the correlation structure. HRP protects against both common and idiosyncratic shocks by balancing diversification across all investments with diversification across clusters of investments at multiple hierarchical levels.
5.3 Blockchain
Blockchain, a technology, Bitcoin, a cryptocurrency, and cryptography, the mathematics of encryption/decryption and digital signatures, are the three main pillars of a 'digital asset' world.
5.3.1 Cryptography
Cryptography is a main mathematical discipline in a digital world. It makes protection
and validation possible in a world of strangers when we want to exchange values in a
blockchain at a zero human trust level. The main goals of cryptography are:
1. Confidentiality (privacy)
2. Data integrity
3. Authentication
4. Obligation (non-repudiation)
For the first goal, encryption and decryption are used; for the other three goals, digital signatures are used.
This shows that cryptography needs invertible functions, or rather functions whose preimage is hard to compute: so-called one-way functions. These are functions where it is easy to calculate f(x) from x but hard to invert, i.e. to calculate x from f(x). Although one-way functions are believed to exist, a mathematical proof of their existence is missing. An example is the multiplication of prime numbers, i.e. f : (x, y) → f(x, y) = xy. The product is simple to calculate, but finding the prime factors of the product is difficult.
Definition 104. Private keys are denoted by pkX, with X the owner of the private key, and vkX denotes a public key.
Figure 5.32: Symmetric-key and asymmetric-key encryption and the Diffie-Hellman key exchange (Source: Wikipedia [2016]).
Rivest, Shamir and Adleman proposed a first candidate trapdoor function, the RSA system. Before we consider this algorithm, we introduce some basic number theory: modular arithmetic (MA). This maths is used to formalize the encryption and decryption algorithms as well as the generation of the keys. MA is arithmetic for integers where the addition of two numbers restarts after a certain value (the modulus). The clock is the prototype example: hour arithmetic is modulo 12, written mod 12. 3 = 15 mod 12 means that 3 − 15 = −12 is an integer multiple of 12. An equivalent definition of a = b mod n is that there exists an integer k with a = kn + b.¹¹ We summarize some calculus rules.
11 Examples are 40 = 18 mod 11, −40 = 8 mod 6, −40 = 8 mod 8. The modulo operation defines an equivalence relation on the integers; the relation satisfies reflexivity, symmetry and transitivity.
Basic for cryptography is the existence of the inverse a⁻¹ of a. If c = d mod φ(n), where φ is Euler's totient function, then $a^c = a^d \pmod n$, provided a is coprime¹² with n; this replaces the last false statement in the above properties. Euler's theorem states that
$$a^{\varphi(n)} \equiv 1 \mod n.$$
Hence the equation ax = b mod n has the solution
$$x = a^{\varphi(n)-1} b \mod n.$$
As an example, assume that each letter of the alphabet is associated with a number from 0 to 25. Encryption means
$$y = ax + b \mod n$$
with n = 26.
12 Two integers a and b are coprime if the only positive integer that divides both of them is 1, or equivalently, if their greatest common divisor is 1.
We set the key (a, b) = (9, 13); note gcd(a, 26) = 1. We encrypt the letter C, which is mapped to the number 2. Then
$$y = 9 \cdot 2 + 13 \mod 26 = 31 \mod 26 = 5.$$
Decryption gives
$$x = 3(5 - 13) \mod 26 = 2,$$
where calculating the inverse is the tedious part: since gcd(9, 26) = 1, the inverse a⁻¹ = 3 exists. The number of possible keys, 12 · 26 = 312, is very small.
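The affine cipher example is easily reproduced; pow(a, -1, n) (Python 3.8+) computes the modular inverse:

```python
def encrypt(x, a=9, b=13, n=26):
    return (a * x + b) % n

def decrypt(y, a=9, b=13, n=26):
    a_inv = pow(a, -1, n)          # modular inverse, here a^-1 = 3
    return (a_inv * (y - b)) % n

x = ord('C') - ord('A')            # letter C -> 2
y = encrypt(x)                     # 9*2 + 13 mod 26 = 5
print(y, decrypt(y))               # 5 2
```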
Definition 106. A set G is a group if there is an operation ∗ such that for two elements g₁, g₂ ∈ G also g₁ ∗ g₂ ∈ G, the operation ∗ is associative, there exists a unit element e such that g ∗ e = e ∗ g = g for all g, and for each g there exists an inverse g⁻¹ such that g ∗ g⁻¹ = g⁻¹ ∗ g = e.
Closure of addition modulo n follows, for example, by writing aᵢ = sᵢn + rᵢ and r₁ + r₂ = qn + r:
$$a_1 + a_2 = s_1 n + r_1 + s_2 n + r_2 = (s_1 + s_2 + q)n + r = r \mod n.$$
The set Z∗ₙ consists of all integers m, 1 ≤ m ≤ n, such that gcd(m, n) = 1.
Proposition 107. Zₙ is a group under addition modulo n, and Z∗ₙ is a group under multiplication modulo n.
The order of the group Z∗ₙ is given by Euler's totient function, and the group is cyclic - all group elements can be generated as powers of a single element - if and only if n = 1, 2, 4, qᵏ or 2qᵏ, where k is a positive integer and q any prime number different from 2.
As an illustration, let n = 91 and let the public key be 5; the message is C = 67. Encryption raises 67 to the fifth power, reducing modulo 91 after each multiplication:
$$67 \times 67 = 4489 = 91 \times 49 + 30.$$
The result after the first multiplication is therefore 30. This is then multiplied again by 67; the product is larger than 91, and applying the same division as above, the result is 8 (the remainder). Continuing in this way leads to the number 58 - the encryption E of C = 67 is E(67) = 58. This is the message Alice receives. Now she uses the private key number 29 and raises 58 to the 29th power, applying the same logic - after each multiplication the next multiplication is done with the remainder - for decryption D:
$$\underbrace{58 \times 58 \times \cdots \times 58}_{29 \text{ times, modulo } 91} = 67, \qquad D(58) = D(E(67)) = 67.$$
Without knowledge of the private key 29, one does not know how many times to multiply 58 by itself in the above time-consuming way, taking the remainder at each step. While encryption (multiplication) is easy, decryption is a hard-to-solve factoring-type problem.
Summarizing:
• Alice chooses prime numbers p, q, which she keeps secret, and sets n = pq.
• Alice chooses vkA such that the greatest common divisor of vkA and φ(n) is 1, i.e. the public key and the Euler number φ(n) are coprime: vkA ∈ Z∗_φ(n).
• Alice computes the inverse of vkA, the private key, satisfying pkA · vkA = 1 mod φ(n).
• Alice makes n and the public key public, keeping p, q and the private key secret.
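A sketch of the toy RSA example with p = 7, q = 13, n = 91 and, as implied by the worked example above, public exponent 5 and private key 29:

```python
p, q = 7, 13
n, phi = p * q, (p - 1) * (q - 1)   # n = 91, phi(n) = 72
vk = 5                              # public exponent, coprime with phi(n)
pk = pow(vk, -1, phi)               # private key: inverse of vk mod 72 -> 29

M = 67                              # the message C = 67
cipher = pow(M, vk, n)              # E(67) = 67^5 mod 91 = 58
plain = pow(cipher, pk, n)          # D(58) = 58^29 mod 91 = 67
print(cipher, plain)
```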
Although the example explains the basic concept, real-life algorithms are more refined.¹³ The example also did not consider in detail how keys are distributed and managed in a public key system. Diffie and Hellman invented the Secret Key Exchange (SKE).
Fix a prime p and a generator g of the cyclic group Z∗ₚ, where g and p are publicly known. Alice picks at random an element x ∈ Z∗ₚ₋₁ and Bob picks at random an element y ∈ Z∗ₚ₋₁. Alice calculates
$$a = g^x \mod p$$
and Bob calculates $b = g^y \mod p$. The keys x and y are private to Alice and Bob, respectively. Alice sends Bob a, and Bob sends Alice b. But then
$$a^y = (g^x)^y = g^{xy} = (g^y)^x = b^x \in \mathbb{Z}^*_p.$$
Hence, Alice and Bob can both calculate the same result without a prior meeting to generate a shared key. If Eve wants to calculate $a^y$ or $b^x$, she faces the problem that she knows neither x nor y. To find these numbers she has to compute the discrete logarithm, which is believed to be intractable.
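A toy run of the key exchange with deliberately tiny public parameters (p = 23, g = 5), far too small for real use:

```python
p, g = 23, 5                 # public prime and generator of Z*_p
x, y = 6, 15                 # private picks of Alice and Bob

a = pow(g, x, p)             # Alice broadcasts a = g^x mod p
b = pow(g, y, p)             # Bob broadcasts b = g^y mod p

k_alice = pow(b, x, p)       # b^x = g^(xy) mod p
k_bob = pow(a, y, p)         # a^y = g^(xy) mod p
assert k_alice == k_bob      # both arrive at the same shared secret
print(k_alice)
```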
13 There, more refined mathematical concepts are used for the multiplication and factorization problem. Instead of using multiplication defined on finite integer sets, one uses so-called elliptic curve cryptography; see Sullivan and Collabrini for an introduction.
Hash functions accelerate database lookup by detecting duplicated records in a large file. A hash function is deterministic: the same input always produces the same hash output. The term 'cryptographic' means that the hash function must satisfy certain security, authentication or privacy criteria. First, the time to compute the hash should be short for any message input. Second, reconstructing a message from its hash must be impossible unless one tries all possible combinations - and there are too many combinations. Third, changing the message by only a small amount of information should change the hash value in such a way that the new and the old hash look uncorrelated. As an example, consider the SHA-224 hash of a sentence with and without a final dot:
SHA224(The quick brown fox jumps over the lazy dog) = 0x730e109bd7a8a32b1cb9d9a09aa2325d2430587ddbc0c38bad911525
SHA224(The quick brown fox jumps over the lazy dog.) = 0x619cba8e8e05826e9b8c519c0a5c68f4fb653e8a3d8aa04bb2c8cd4c
Finally, it should be a hard problem to find two different inputs which lead to the same output - the so-called collision resistance. Summarizing, a cryptographic hash function makes it easy to verify that some input data maps to a given hash value; but if the input is unknown, it is difficult to reconstruct it from the hash value. For the proof-of-work in Bitcoin transactions, one has, for example, to compare data of arbitrary size quickly and easily and to be sure that a digitally signed message did not change. Hash functions are not one-way functions in the strict mathematical sense. Historically, popular cryptographic hash functions have had a lifetime of around ten years before they were broken.
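The avalanche example above can be reproduced with Python's hashlib:

```python
import hashlib

for msg in ("The quick brown fox jumps over the lazy dog",
            "The quick brown fox jumps over the lazy dog."):
    print(hashlib.sha224(msg.encode()).hexdigest())
# 730e109bd7a8a32b1cb9d9a09aa2325d2430587ddbc0c38bad911525
# 619cba8e8e05826e9b8c519c0a5c68f4fb653e8a3d8aa04bb2c8cd4c
```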
Protocol - the file storage problem
A client wants to store a file on a server. The file has a name F and data M, and the client wants to retrieve file F later. In a basic protocol:
• Client sends (F, M) to the server
• Client deletes M
• Client later requests F
• Server returns M
What if the server is adversarial and returns M′ instead of M? A simple solution is that the client does not delete M and compares M′ with M - but this requires enough memory to store all of M. Storing only the short hash #(M) and comparing it with #(M′) achieves the same check with minimal memory.
The RSA system allows one to implement digital signatures as follows. Alice wants to sign a document M electronically. She signs M by appending the digital signature DS(M) = f⁻¹(M), where f is Alice's trapdoor function, i.e. only Alice knows the trapdoor information. Anybody can then check the validity of the signature since f(f⁻¹(M)) = M; this also shows that the signature becomes invalid if the message M is changed. In the RSA system, DS(M) = M^{pkA} mod n. Using Alice's public key, anybody can calculate DS(M)^{vkA} mod n. If the result equals M, then the signature must have been created by Alice, who is the only one who knows pkA. Figure 5.33 illustrates the digital signature process for a hashed message.
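A toy sign-and-verify round using the numbers of the earlier RSA example (n = 91, private key 29, public key 5); real systems sign the hash of M, not M itself:

```python
n, pk_A, vk_A = 91, 29, 5     # toy key material from the RSA example

M = 67
DS = pow(M, pk_A, n)          # signature DS(M) = M^pk_A mod n
check = pow(DS, vk_A, n)      # anybody can verify with the public key
print(check == M)             # True: only the holder of pk_A could sign
```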
Figure 5.33: The process of signing a hashed document: hashing the document, signing the hashed document using the private key, broadcasting the document plus the signed hash, and decomposing the broadcast into two pieces: the hashed document and the verification of the signed hash. If the two results agree, then Alice signed the document and the document did not change during broadcasting.
In a numerical example with public key vkA = 17, the Euclidean algorithm yields the private key pkA = 89. Validation using only the public key and the known number n then proves that Alice signed the document.
In the case of Bitcoin, public keys (or addresses) correspond to the identities of Bitcoin users. A Bitcoin user can send a message or transaction from his address by signing it with his private key. In Bitcoin there is no central place which registers and identifies the users; each user registers himself by generating - as often as he wants - a new address. At first glance, this decentralized identity management gives the impression of granting users a high degree of anonymity and privacy. This impression is put into perspective when looking over time: movements are assigned to each address, these movements are visible to all participants, and behind them patterns can be identified using data analytics. Furthermore, at some point a criminal using Bitcoin for money laundering needs to leave the Bitcoin network by exchanging the Bitcoins into, say, dollars. It is there that secret services position their software to reveal the identity of the criminal. One therefore often speaks of Bitcoin as a pseudonymous system.
5.3.6 Blockchain
With the implementation of Bitcoin at the beginning of 2009, something new was created: Bitcoin enables joint accounting among participants who do not trust each other, do not know each other and do not know how many other participants are in the system. The technology that makes this possible is called blockchain and allows a new data management model. The term blockchain refers to the fact that transactions are grouped into blocks and confirmed together. The confirmation in turn attaches the block with the new transactions to a chain of previous blocks and thus incrementally builds up a transaction history. If transactions are not grouped into blocks but the decentralized infrastructure is kept, one speaks of Mutual Distributed Ledger Technology (MDLT). We do not differentiate between MDLT and blockchain in the sequel.
Definition 108. A mutual distributed ledger technology (MDLT) defines ownership (mutual), a technology (distributed servers) and the object (ledger).¹⁴
The basic functionality corresponds to the model of the replicated state machine: participants manage a quantity of data (the state) by holding a copy of the data (a replica) locally and executing operations on it that change the data. The initial state has to be
14 The records in the ledger cover ownership, transactions and the identity of assets. In order to allow for communication between agents, they need to agree (consensus) on the state and authenticity of the ledger.
the same for all participants and the operations are deterministic: any participant who applies the operations in the same order to the initial state will arrive at exactly the same end result. In such a system, consensus is reached when all participants agree on the current state of the data. In the example of Bitcoin, the data is the Bitcoin balance of the individual participants and the operations are transactions between these participants.
Two requirements are both necessary for a blockchain to make sense: decentralization must dominate a centralized architecture, and so must trust. Technological decentralization is a well-known concept. The trust requirement means that trust in a decentralized P2P network is preferred over trust in a central network with, say, third-party validators.
We consider blockchains for money transfer in more detail for the Bitcoin cryptocurrency, which is one of the few up and running blockchain applications.¹⁵ Traditional money transfer using banking services and trust is shown in Figure 5.34. We do not consider how money is generated and how money is represented, but we consider the third control structure, transaction execution. When Alice sends Bob CHF 10, they
15 Rubin's YouTube video is an excellent introduction to the basics. References for this section are Duivestein et al. (2016), Tasca (2016), Aste (2016), Rifkin (2014), Swan (2015), Peters and Panayi (2015), Davidson et al. (2016), UBS (2015), Nakamoto (2008), Franco (2014), Bliss and Steigerwald (2006), Peters et al. (2014), Zyskind et al. (2015), Berentsen and Schär (2018).
Figure 5.34: Alice paying CHF 10 to Bob using the centralized banking system. On the payment level, Alice announces her willingness to pay to her bank A. The bank checks whether Alice possesses CHF 10 and makes sure that there is no double spending. This third-party validation is repeated by the central bank, where the accounts of bank A and bank B are checked.
both use trusted third parties - banks. Alice orders her bank to transfer the money to Bob's bank. Both banks keep the accounts, i.e. the ledgers. Both banks are trusted, so Alice and Bob do not need to know each other. The banks check whether the money can be transferred - the transaction legitimization. The central bank is then a trusted third party acting between the ledgers of the two banks and running its own ledger where the banks' account balances are recorded.
3. Transaction consensus. A central party allows for efficient execution and fixes at each date, in a unique way, the distribution of money in the whole system.
These parts of transaction execution hold for all monetary systems, including cryptocurrencies. Blockchains attempt to change this classical money transfer in three respects:
We describe in principle how this works, leaving aside the many details which matter if one considers an actual implementation of a blockchain. Consider this in more detail for Alice, Bob and Eve, where Alice sends CHF 10 to Bob and Bob sends CHF 5 to Eve. Since there is no trusted third party, one has to assure, for example, that Alice is Alice, that Alice possesses the money, that she did not promise to pay the same CHF 10 to multiple recipients, and that indeed Bob is receiving the money, see Figure 5.35.
Figure 5.35: Left panel: open and distributed open ledger technology. Right panel: validation of transactions. New transactions are grouped into a new block and, after its validation - the consensus work that installs unambiguous asset ownership - the block is added to the existing blockchain. Each block is further marked with a time stamp and a digital fingerprint (a hash, 'K' in the example) of the previous block. This hash identifies a block uniquely, and the verification of the fingerprint can easily be done by any node in the network.
The central open ledger records that Alice indeed has CHF 20 in her account and that she is able to pay CHF 10 to Bob. Both transactions are recorded and linked in time order. If Alice wants to pay CHF 15 to Eve while only CHF 10 are left, the participants see in the open ledger that she does not have enough cash. In a next step, this central ledger is removed by copying it to the servers of all participants (the nodes): a distributed or decentralized open ledger architecture.
Transaction feasibility means that a payment from Alice to John, who is not directly connected to Alice, is possible: Alice broadcasts the payment instructions to her next node, Bob, who broadcasts them to his next node, and so on. There are many paths linking Alice and John, and the payment network also works if some links fail: a decentralized system is more robust than a centralized one. The drawback of the decentralized system is that there are no admission constraints - each node can broadcast any type of information - so each node needs to be able to check the validity of each transaction information.
The consensus mechanism for Bitcoin is called proof-of-work (PoW). Each miner is free to choose the set of transactions which he wants to validate. The miners solve a purely numerical problem unrelated to the block's content (mining); more precisely, they solve a cryptographic puzzle using a trial-and-error approach, indicated by 'K' in the figure. The miner who solves his problem first attaches his proof-of-work to his block and broadcasts it. All other miners can easily verify the correctness of the PoW, and the validated block is added to the blockchain. The PoW requires effort, such as the energy spent by the computers and the investment in hardware. Nakamoto (2008) argues that PoW generates a stable consensus, i.e. a single chain, if miners always take the last solved block as the parent for their next block. Each block under PoW consideration needs to reference an already validated block, to which the new block is linked after consensus is found; this reference is done using a hash value. If a miner changes a past validated block, the hash changes, which leads to inconsistencies in the blockchain. By construction, the participants always consider the longest chain which contains legitimate transactions. Therefore, to cheat, a miner would need to recalculate a whole chain afresh for validation before a single new block is validated by any other miner - which is practically infeasible. Generating a new block takes a miner only a short time. Without restricting this block generation process, validation for consensus would become impossible, since the frequency at which blocks are generated would dominate the speed of propagation in the network. Therefore, the process is slowed down such that on average a new block is mined and verified every ten minutes.
The winner is remunerated for his efforts (generation of new Bitcoins); he takes it all. A PoW authenticates the fact that resources have been spent to solve a cryptographic problem. This defines the economic incentives: the more computing power a miner invests, the higher the likelihood that he will be the first to mine the block. If Alice wants to cheat using a double-spending strategy, she first has to spend resources in order to validate the block containing her fraudulent transactions. PoW validation is a peer-to-peer type consensus mechanism since the validation can be verified as true by all miners: no trust is needed, and no node can simply claim to have found a key without having spent resources, given the ease of verifying a candidate solution.
Summarizing:
• The rules are contained in the Bitcoin protocol, an open-source cryptographic protocol.
• Transactions switch from single third-party trust to distributed ledger trust.
• Unambiguous ownership rights exist at any moment in time due to the consensus mechanism.
• The P2P complete-stranger consensus PoW is the most expensive and slowest consensus mechanism.
• Fork: assume that a miner attaches his mined block not to the last validated block but to the second-to-last one - a fork follows. Miners can then choose to attach validated blocks to the original chain or to the other branch of the fork, and there are competing versions of the ledger. Forks reduce the credibility and reliability of the blockchain. Even if, eventually, all miners agree to attach their blocks to the same chain, the occurrence of a fork is not innocuous. A fork can also occur when some miners adopt a new version of the mining software that is incompatible with the current version. Does the blockchain protocol rule out the occurrence of forks?
We close this section with an analogy, the Coin of Yap problem - a problem which the population of the Yap islands in the Western Pacific Ocean faced. The Yap produced stone money. There were five different sizes of stones, the largest requiring around 20 men to be transported. It was not possible to carry the stones from one island to the next for exchange purposes using canoes. How could the stones be used for payment if they could not be physically exchanged against the goods? The solution was to store the ownership information in the consciousness of the Yap people (the blockchain): the Yap knew who owned the different stone pieces. The stones did not need to be moved when ownership changed, since the public memory records the changes in ownership - there is a societal consensus over ownership. In case of conflict, the stronger clan won. Due to the limited size of the islands and the population, the system costs never became high enough to render the system ineffective.
• Anyone can participate in the protocol and receive, say, Bitcoin as a reward by performing the PoW-based mining operation.
• The mechanism of pouring currency into the system via proof-of-work makes it feasible for anyone (possessing sufficient hashing power) to participate.
• The ledger itself is public, readable and writeable by anyone who possesses Bitcoin.
In a permissioned blockchain, by contrast:
• Producing transactions and/or blocks can only be performed after being authorized by the other nodes.
• In the simplest form the set of nodes is static: the set of nodes implementing the protocol is fixed and determined at the onset of protocol execution.
On-chain assets such as virtual currency are transacted. Since the number of actors is smaller in permissioned blockchains, only a small number of participants need to operate, which makes such networks more scalable than permissionless ones.
Figure 5.36: Emergence of different network topologies (Celent [2015], UBS [2015]).
Since the actors in a permissioned network are not anonymous, the time-consuming and expensive PoW is not needed, and much simpler and faster consensus schemes apply. It is possible to use classical consensus algorithms from the field of distributed computing, such as Paxos or Practical Byzantine Fault Tolerance (PBFT). These protocols are based on polls in which participants vote on the next operation to be applied. This is possible because each participant knows how many votes constitute a majority and when a vote is successful. An example of a permissioned blockchain is Ripple.
The market dynamics for blockchain comprised in 2017 about 300 start-ups worldwide, with more than eighty percent of global banks running blockchain projects (WEF (2017)). 20% of global banks were expected to have a commercial blockchain product by the end of 2017 (IBM (2016)), and global investments in the technology are estimated at USD 1.5 bn (WEF (2017)). The PoW is a costly way to reach consensus: in February 2018, the energy needed to perform the PoW was similar to the total energy consumption of Romania.
An alternative is to select at random a participant on the basis of data in the system, say selected tokens which are linked to an address. The chosen address may make the next proposal for the further development of the blockchain. In such a proof-of-stake system, the probability of being allowed to make the next proposal increases with the tokens of a participant. This eliminates the need for time-consuming proof-of-work calculations, and participants with a greater interest in the continued existence of the system (as they have invested in it) make relatively frequent decisions. However, the implementation of this concept is not easy, as participants may behave strategically to increase their influence in the system, or behave incorrectly. Therefore, most proof-of-stake systems so far use a combination of proof-of-stake and proof-of-work to counter such manipulation attempts, but accordingly retain high energy consumption as a disadvantage.
5.3.8.1 Bitcoin
Consider miners who want to perform the PoW. All blocks in the Bitcoin blockchain have a short string of meaningless data, called a nonce, attached to them. The mining computers are required to search for the right meaningless string such that the block as a whole satisfies a certain arbitrary condition. Specifically, it is required that the SHA-256 hash of the block has a certain number of leading zeros. A miner selects the message M of Alice ready for validation, chooses a random number k, the nonce, and runs all information through the hash, i.e. he calculates #(M + k). If this result is larger than the threshold T, he chooses a new k and continues until #(M + k) < T. Then the miner broadcasts k, and everybody can easily check that the hash is indeed smaller than the threshold level. The nonce is a 32-bit data string. Varying the nonce alone is a trivial task, since 2³² amounts to around 4 billion possibilities, which today can be checked in a few seconds. Therefore, to increase the complexity, the transactions are grouped into a so-called Merkle tree. In a Merkle tree, data blocks are grouped in pairs and the hash of each of these blocks is stored in a parent node. The parent nodes are in turn grouped in pairs and their hashes stored one level up the tree; this continues until the root node is reached.
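A toy nonce search illustrates the principle; the leading-zero condition stands in for the real target T, and the block layout is heavily simplified:

```python
import hashlib

def mine(message: bytes, difficulty: int = 4) -> int:
    """Search a nonce k such that SHA-256(M + k) starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    k = 0
    while True:
        digest = hashlib.sha256(message + str(k).encode()).hexdigest()
        if digest.startswith(prefix):   # plays the role of #(M + k) < T
            return k
        k += 1                          # try the next nonce

nonce = mine(b"Alice pays Bob 10")
print(nonce)
```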
The SHA-256 hash function is used, whose output is a 64-digit hexadecimal string. Consider the hash
000000000000004c296e6376db3a241271f43fd3f5de7ba18986e517a243baa7,
which was the hash of a block ready for the miners in 2013. It uses 16 symbols, the numbers 0 to 9 and the letters a to f. The hash starts with 14 zeros, reflecting the threshold level. The difficulty of the problem is not constant over time: the number of zeros in the header varies in a non-manipulable way. The difficulty is calibrated in such a way that it is possible to find a block in about 10 minutes. The SHA hash goes one way: it has 2²⁵⁶ possible outputs, which one would need to enumerate in order to break the hash, i.e. to calculate the input.
The Bitcoin system is managed differently from a centralized network. How is the management organized such that the system can be improved and deficiencies can be corrected when there is no central party with the power to do so? Changes to the network that are not in the interest of the users can be prevented either by sanctioning such actions or by setting incentives such that for each member the dominant strategy is not to deviate from the existing rules - a kind of Nash equilibrium. It is this game-theoretic concept which is implemented in the Bitcoin system. To allow for changes, a voting system is used in which a predefined majority has to exist before a change is implemented. This democratic rule is complicated, since not all nodes have the same rights and action spaces (miners have an advantage), but other user groups also have the possibility to form coalitions which can then try to enforce their views. In any case, in such an MDLT no one can be forced to follow any decision. If part of the community is not willing to follow a change and decides to use the old code, the system separates into two systems: a fork is realized. In a soft fork, the rules for consensus are stricter than in the original chain, i.e. new ledger entries are also valid under the old system. In a hard fork, the new ledger entries are no longer valid under the rules of the blockchain before the fork happened.
5.3.8.2 Settlement
The time between a buyer and a seller agreeing to exchange a security (trade execution) and the date when the trade is settled (the assets are exchanged) can be 2 or 3 days, depending on the jurisdiction and the type of asset. A longer period between trade execution and settlement raises settlement risk - the risk that one leg of the transaction is completed but not the other - and counterparty risk - the risk that one party defaults on its obligation. Besides the reduction of risk, a decentralized blockchain technology could also reduce the costs of the trade and settlement process.
Trading.
• The investors (buyer and seller) who wish to trade contact their trading member, which places their orders on the exchange.
• The trades are executed on the exchange or any other platform, such as a multilateral trading facility or an organized trading system.
Clearing.
• Clearing members who have access to the clearing house or the central counterparty, and who are also trading members, settle the trades.
• Clearing and settlement can be bilateral, i.e. settled by the parties to each contract. After the GFC, the G20 enforced the switch from bilateral to central counterparty (CCP) clearing for OTC derivatives. A CCP acts as a counterparty for the two parties in the contract. This simplifies the risk management process, as firms now have a single counterparty to their transactions. Through a process termed novation, the CCP enters into bilateral contracts with the two counterparties, and these contracts essentially replace what would have been a single contract in the bilateral clearing case. This also leads to contract standardisation, and there is a general reduction in required risk capital due to multilateral netting of cash and fungible securities. CCP clearing thus transforms the bilateral clearing topology into a centralized, star-shaped one. From a systemic risk perspective, while the riskier bilateral connections are replaced by less risky centralized ones, the major risk concentration is now located in the few CCPs.
Settlement.
• The two custodians, who are responsible for safeguarding the assets, exchange the assets, where the typical instruction is 'delivery versus payment': delivery of the assets only occurs if the associated payment occurs.
Using a blockchain means transforming the centralized CCP topology back into a decentralized one, where there is no need for a CCP. In the trading-clearing-settlement cycle, a consortium blockchain can be used as follows to satisfy the present standards. On the trading level, a consortium of brokers can set up a distributed exchange, where each of them operates a node to validate transactions. The investors still trade through a broker, but the exchange fees can be drastically reduced. On the clearing level, a consortium of clearing members can set up a distributed clearing house, thus eliminating the need for a CCP. Contrary to bilateral clearing, the contract stipulations are administered through a smart contract, which reduces risk management issues. If the securities and money are digitalized, settlement no longer needs custodians with securities depositories; the assets are part of the permissioned blockchain.
Consider banks (the nodes) which search for a technology to record and enforce financial contracts such as cash, derivatives or any other type of product. More precisely, the banks want to record and manage the initiation and the life cycle of financial contracts between two or more parties, grounded in the legal documentation of the contracts and compatible with existing and emerging regulation.
These requirements led to the solution Corda. We state the most important changes compared to the Bitcoin blockchain. First, there are no miners and there is no proof-of-work, since no currency needs to be generated (mining), and due to the mixed private/public association of information no general consensus on the ledger is needed. The advantages are the avoidance of costly mining activities, of a deflationary currency and of a concentration of mining capabilities in a few nodes. Second, Bitcoin transactions can only contain a small amount of data due to the fixed-length data format. This is not sufficient if one considers all the economic, legal and regulatory information in an interest rate swap between two parties. Corda encodes the information of arbitrarily complex financial contracts in contract code - the prose of the allowable operations defined in term sheets is encoded. Corda calls this code a state object. Consider a cash payment from bank A to a company C. The state object contains the legal text describing the issuer, the date, the currency, the recipient, etc., and the codification of this information. The state is transformed into a true transaction if the bank digitally signs the transaction and if it is verified that the state object is not used by another transaction. Hence, there are two types of consensus mechanisms. First, one has to validate the transaction by running the code in the state object to see whether it is successful and to check all required signatures; this consensus is carried out only by the parties engaged in the transaction. In other words, the state object is a digital document which records all information of an agreement between two or more parties. Second, the parties need to be sure that the transaction under consideration is unique; this consensus, which checks the whole existing ledger, is done by an independent third party. Summarizing, the ledger is not globally visible to all nodes. The state objects in the ledger are immutable in the same way as we described for blockchains. Given that not all data is visible to all banks, strong cryptographic hashes are used to identify the different banks and the data.
Why are the leading banks pushing this system? They can all use a single ledger, which makes reconciliation and error fixing in today's individual ledgers a topic of the past. Furthermore, the single ledger does not change the competitive position of the banks in the ledger. The economic rationale, profit and risks of entering into a swap remain within UBS and Goldman Sachs, but the costs and operational risks of the infrastructure are reduced by collaborating to maintain shared records. In other words, while the banks keep the profit and loss from their banking transactions unchanged relative to the present competitive situation, they reduce the technology cost component through cooperation.
Denition 110. Smart contracts are digital contracts allowing terms contingent on de-
centralized consensus that are self-enforcing and tamper-proof through automated execu-
tion.
Vitalik Buterin wrote the Ethereum white paper in 2013. The market capitalization of Ethereum amounted to USD 1 bn in October 2016 and USD 74 bn in December 2017.
What happens if the software of a smart contract has a fault or the logic of the soft-
ware allows someone to use the software in his favour? This was the case in the so-called
Decentralized Anonymous Organization (DAO) Hack. DAO was a form of investor-
directed venture capital fund. It was the biggest crowdfunding experiment in the world
raising USD 150 millions within 21 days. However, on June 17 2016, a hacker exploited a
security bug on the smart contract and transferred USD 50 millions to his own account.
Th cryptocurrency Ether lost 50% of its value on the same day. Since the hacker did
nothing illegal but was just smarter than those who wrote the smart contract code there
was a priori no reason to consider any actions regarding the validity of the transaction.
But many in the community were invested and hence faced personal losses if one would
not o-set the hacker's transaction.
The first alternative was to cancel the transaction and restore the money to the DAO users. The second was to do nothing; then the hacker would keep the USD 50 million and many people invested in the DAO would lose their investment. The cancellation of the transaction, leading to a hard fork, would enable all DAO investors to exchange their tokens at a fixed price, as in a currency reform; they would only need to update their software. The old DAO would still exist on the old Ethereum blockchain, but should die out without investors, and the hacker's tokens would become worthless. But parts of the community refused the update: they saw a violation of the ideals of Ethereum. In protest, they stayed on the old blockchain and baptized it Ethereum Classic. Instead of losing value, the DAO won. The event damaged the reputation of the technology from a security perspective. In addition, the community damaged its own reputation during this period: responsibilities were unclear, blaming started, and it was not possible to find a single solution.
16 Store of value means that money can be reliably saved, stored, and retrieved and that its value remains relatively stable over time. Medium of exchange means that it is used to compare the values of dissimilar objects and as a standard of deferred payment, that is, an accepted way to settle a debt. Unit of account means a standard numerical monetary unit of measurement of the market value of goods, services, and other transactions. Divisibility and fungibility are further characteristics of a unit of account.
People's belief in the value of money is fundamental to any currency: it is not possible to enforce the value of a currency if people do not want to accept it. Figure 5.39 provides an overview of different currencies. Table 5.3 summarizes some features of fiat money, money issued in a permissioned blockchain and money issued in an MDLT.
5.4.3 Bitcoin
First, Bitcoin represents a cryptocurrency.¹⁷ This means a unit of Bitcoin is used to store and transmit value between individuals who believe in this currency. Second, Bitcoin represents a communication medium: all individuals using or creating Bitcoins communicate via the Bitcoin protocol over the internet. The protocol is the code which contains the set of rules used in the Bitcoin system.
At the time of writing, the number of Bitcoin transactions is around 300,000 per day, approximately equal to USD 3 bn at market exchange rates in November 2017, and the market capitalization of Bitcoin by the end of 2017 was USD 261 billion (Source: Blockchain.info). A cryptocurrency combines two main components: a new currency, such as Bitcoin, and a new decentralized payment system - the blockchain.
17 The text follows Antonopoulos (2015), Jogenfors (2016), Aste (2016), Khan Academy (2016), Boehme et al. (2015), Tasca (2016) and BIS (2018). For an economic review see Bank of England (2014) and Boehme et al. (2015).
Figure 5.39: Overview of the different currencies. Source: Bech and Garratt (2017).
Bitcoin has value because people believe in it. If people stop believing in it, and as long as there is no real economic production backing the coin, the value evaporates. Such belief can be created quickly and can vanish just as rapidly. Consider the period after the first Gulf War in the Kurdish region of Iraq: the Kurds used in their areas the Iraqi Swiss dinar.¹⁸ Hence, although a legal tender existed in Iraq, the Saddam dinar, it became worthless in the Kurdish regions. People cannot be forced to believe in a currency. So far we have not compared digital and crypto currencies, see Figure 5.40.
18 'Swiss' because the printing plates were made in Switzerland and stolen.
Figure 5.40: Digital versus crypto currencies.

Digital currencies:
• Every non-physical currency is a digital currency.
• Digital currencies consist of numbers and digits.
• 90% of global currency is digital, most of it fiat currency.
• Online banking, mobile payment, PayPal, Mint and credit cards are based on digital currencies.
• Digital currencies possess a monetary regulatory and institutional setting; they are accepted as legal tender.
• Money generation is mostly done by the inside money mechanism and secondarily by central banks.
• Convertible into cash; payments are not public.
• There is no anonymity.
• 7x24, worldwide payments, no need to know the recipient.
• Banks and other intermediaries act as third-party validators using accounts as ledgers which are centrally stored and not public. Trust is placed in these validators. Errors can be offset, stolen funds are often replaced, and a lost identification for authentication can be replaced by a new one (lost ID card).

Crypto currencies:
• A subset of digital currencies.
• Features: privacy, a distributed mutual ledger for transaction recording, cryptography.
• Ethereum, Bitcoin, Litecoin and more than 1,000 other crypto currencies.
• 99% are without regulatory or institutional backing; most are not considered legal tender.
• A lost cryptographic key for accessing the cryptocurrency, or stolen coins, are not replaced and are lost to the economic system forever, since there is no third party in the system.
• Coins are generated by the mutually distributed ledger technology.
• A fully transparent but fully anonymous payment system.
• 7x24, payments without knowing the recipients; payments are only possible within the peer group which accepts the coin.
• Trust, security and protection, see below.
• High volatility, e.g. the volatility of BTCUSD is around 14 times larger than that of CHFUSD.
by the beginning of 2020, 1.6 mn have been stolen and 5.01 mn have been lost. More
than 35 percent of all coins have been therefore stolen or lost.Reasons for losing the coins
are loss of private keys, operational risks by sending the coins to the wrong address when
people do not use the QR code but type wrongly the address or even by sending the coin
to the genesis block to exchange them. Bitcoins are not fungible: 1 Bitcoin is not always worth 1 Bitcoin. Premiums of up to 15% are paid for freshly mined Bitcoins. The reason is that these Bitcoins certainly do not have a harmful history with their owners and are therefore unproblematic when exchanged for a fiat currency via exchanges or banks. The all-in cost of mining a Bitcoin is about USD 5,600 at the beginning of 2020 at a price of about USD 8,500, which means that a miner's profit is currently almost USD 3,000 per Bitcoin received if it wins the PoW.
The Bank of England (2014) states that the volatility of Bitcoin is 17 times larger than the volatility of the British pound: the use of Bitcoin as a short-term store of value is questionable, although nothing can be inferred about its value as a long-term store of value. The number of transactions of retail clients is used to measure their willingness to accept Bitcoin as a medium of payment. Since this number is not observable, proxy variables such as data from 'My Wallet' are used instead, see Bank of England (2014). The analysis shows that the number of transactions per wallet has been decreasing since 2012 to a value of 0.02 transactions per wallet: most clients buy and hold their Bitcoins instead of using them. Finally, there is little evidence that Bitcoin is used as a unit of account.
Traditional payment systems are safe, cost-effective and scalable, i.e. they handle high volumes. Visa, Mastercard and PayPal handle between 240 and 3'500 transactions per second, while for Bitcoin and Ether the number is around 7-20.19 Bitcoins are so far only cheaper to produce than money in a centralized system because the miners in the crypto-currency system receive new coins as a subsidy for their proof-of-work efforts. Given that the production of new Bitcoins decreases over the next decades, that the energy needed to achieve consensus grows over-proportionally, and if the exchange value of Bitcoin does not stabilize at a large value compared to the USD, the effect of this subsidy diminishes, leading to increasing costs of Bitcoin issuance. Removing centralized trust by using a P2P trustless MDLT is costly in several respects. The energy consumption of the Bitcoin miners equalled in 2018 the total energy consumption of Romania, a nation of 20 million people. Ethereum is also highly energy intensive. It will be of vital importance whether consensus mechanisms other than PoW can be designed and accepted such that the MDLT consumes much less energy.
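To make the role of hashing concrete, the following toy Python sketch illustrates a PoW hash puzzle of the Bitcoin type (a minimal illustration only: real mining uses double SHA-256 over an 80-byte block header against a 256-bit difficulty target; the header string and difficulty values here are made up). The expected number of hash evaluations, and hence the energy, doubles with every additional difficulty bit.

```python
import hashlib

def mine(header: bytes, difficulty_bits: int, max_nonce: int = 10**8):
    """Toy proof-of-work: find a nonce such that SHA-256(header || nonce),
    interpreted as an integer, falls below the difficulty target."""
    target = 1 << (256 - difficulty_bits)
    for nonce in range(max_nonce):
        digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
    return None  # no solution found within max_nonce

# Expected work is about 2**difficulty_bits hashes: energy scales with difficulty.
for bits in (8, 12, 16, 20):
    print(bits, mine(b"toy block header", bits))
```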
The number of hashes drives the energy costs. Aste (2016) estimates that keeping a capital of around USD 10 bn secure in the Bitcoin blockchain requires annual costs of 10%. The reason is the number of hashes generated for the PoW - of the order of 1 bn times 1 bn every second. Given the high transaction costs, most users access their cryptocurrency not directly but via an intermediary such as a crypto-wallet provider or a crypto exchange. That is, the main motivation of Bitcoin - not needing a central third party such as a central bank - ends with users trusting often unregulated third parties. It is then no surprise that
19 Committee on Payments and Market Infrastructures, Statistics on payment, clearing and settlement
systems in the CPMI countries, December 2017; www.bitinfocharts.com; Digiconomist; Mastercard;
PayPal; Visa; BIS calculations.
fraudulent or hacked institutions such as Mt. Gox lead to thefts and zero-recovery losses for the users.
Permissioned crypto currencies often do not face some of the above problems of Bitcoin. The World Food Programme's blockchain-based system handles payments for food aid serving Syrian refugees in Jordan. The unit of account is centrally controlled by the World Food Programme. Using a permissioned version of the Ethereum protocol, the deficits of Ethereum (slow, expensive) were overcome and transaction costs were reduced by 98%, also relative to bank-based alternatives.
Scalability is another limitation since the transaction ledger grows over time. The Bitcoin ledger amounted in 2017 to 170 GB, with a growth of 50 GB in 2017 alone. A simple Fermi-type calculation therefore shows that the network size needed to replace standard currency regimes is out of any feasible range. This concerns not only the storage of data but also the processing capacity needed for transaction verification.
Figure 5.41 shows the market capitalization of Bitcoin, Ripple and Ethereum, the average transaction costs, that Bitcoin mining takes around the 10 minutes it should, and that mining in Ethereum takes much less time, reflecting the proof-of-stake approach. Comparing the number of daily Bitcoin transactions - around 100'000 by the end of 2015 (Coinometrics, Capgemini) - with the number of daily transactions by Visa (212 mio.), MasterCard (93 mio.) and all other traditional entities together, summing up to 340 mio., the Bitcoin share is 0.03% of this total transaction volume.
The algorithm of the Bitcoin protocol defines the supply side. Supply is therefore fixed and inelastic, which is one source of the high price volatility. Since every currency loses value if it fails to be a scarce resource, new Bitcoins are issued in a controlled way. Bitcoins do not represent a claim on anybody, contrary to digital money created by the granting of loans, since each loan creates a deposit position on the borrower's bank account. The demand for and supply of Bitcoins have no physical foundation, and the total supply is limited to the creation of 21 million Bitcoins. Given the rule-based creation process, this amount will be reached around 2140. With this fixed supply side and its diminishing rate of production, Bitcoin is a deflationary currency. Bitcoin miners are in some sense the clearing houses which maintain the book-keeping system and verify the validity of transactions.
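A short Python sketch of the rule-based creation process (using the well-known protocol constants: an initial block subsidy of 50 BTC, halved every 210'000 blocks, one block roughly every ten minutes) shows why the cumulative issuance converges to just below 21 million coins:

```python
# Bitcoin issuance: the block subsidy starts at 50 BTC and halves every 210,000 blocks.
BLOCKS_PER_HALVING = 210_000
SATOSHI = 1e-8                      # smallest currency unit
subsidy, total, halvings = 50.0, 0.0, 0
while subsidy >= SATOSHI:
    total += BLOCKS_PER_HALVING * subsidy
    subsidy /= 2                    # the halving
    halvings += 1
print(f"total supply ~ {total:,.0f} BTC after {halvings} halvings")
# Geometric series: 210,000 * 50 * (1 + 1/2 + 1/4 + ...) -> just below 21 million BTC;
# at ~10 minutes per block the subsidy runs out around the year 2140.
```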
Bitcoin does not have a well-defined governance structure as central banks have. The identity of any participant in the network is, for example, unverified. This contradicts the increasing regulatory and legal fight against money laundering and tax evasion. Prominent in the early days was the use of Bitcoin on the anonymous Silk Road platform, whose main activity was trading narcotics. The U.S. investigation estimated that in the period February 2011 to July 2013, 9.9 million Bitcoin payments were made with an equivalent of USD 214 million. After the demise of Silk Road, an unclear number of successors or competitors actively use Bitcoin. But the initially significant fraction of money inflow into the Bitcoin system from criminal activities - the residual value - has decreased significantly. Tasca (2016) reports that in 2012 black markets and online gambling had a share of around 70% in the Bitcoin income flow; this number collapsed in the following two years to less than 10%. Bitcoin transactions are, contrary to real or electronic payments, strictly irreversible. This property is due to the desire to keep the Bitcoin system at a manageable level. Changing the protocol, as we discussed above, follows a complicated game-theoretic process which can lead to forks and where different types of network members have different rights.
From the risk perspective, the counterparty risk of currency exchanges is critical. Exchanges active in Bitcoin charge transaction fees between 20 and 200 bps. The number of such exchanges is modest since an exchange needs an internet infrastructure which is able to withstand attacks. The rules for setting up an exchange are strict in the U.S. and also, for example, in the UK and Germany. Prominent is the default of the Mt. Gox exchange in Japan in 2014: it reported the loss of 750'000 customer Bitcoins, which amounts to USD 450 million. The counterparty risk of exchanges matters for clients since most convert their electronic currencies into Bitcoins and leave the Bitcoins at the exchange; the exchange acts as a bank. Moore and Christin (2013) estimate that 45 percent of currency exchanges terminated operations. While large exchanges often faced security problems, the reasons for the smaller ones are unknown. Therefore, if an exchange, which in fact acts as a bank holding the Bitcoin accounts of its customers, shuts down, counterparty risk is realized. The loss given default following Moore and Christin (2013) is 46% - only 54% of the closed exchanges reimbursed their customers.
A statement often heard in the financial industry summarizes the discussion. But this statement does not mean that a different coin, based on a more mature blockchain, cannot become an important crypto currency; see the section about Libra below.
Figure ?? gives an overview of different blockchain consensus mechanisms. The figure shows that there is no such thing as 'the' blockchain technology; there are many different types of technologies, each with pros and cons which have to be considered if a specific application is to be implemented.
We start with some market facts. In 2016 the most active miners were located in China, covering around 50% of the total market share (Tasca (2016)), followed by Europe with around 25%. This is also reflected in the traded currency pairs: the traded volume in CNY/BTC is about three times larger than in USD/BTC. This dominance of Chinese activity can also be observed in the number of active Bitcoin clients normalized
by the number of users with direct access to the internet: the number in China is around 5 times larger than the second largest numbers, those of the US and Russia. Bitcoin start-ups raised around USD 1 bn in the three years 2012-2015, with an annual growth rate of 150%. This rate dominates other start-up rates, such as in crowdfunding, lending or banking in general, by a factor of 2-3. If a mining pool gains 51 percent of the computing capacity, it can attack the network by rewriting in principle all blocks and generating a new blockchain. The pool gash.io possessed 45 percent of the mining power on January 9, 2014, and had to appeal to pool members to exit the pool. Summarizing, the mining industry is an oligopoly where the market share of the ten largest miners was between 70% and 80% by the end of 2015 (Tasca (2016)). This raises security concerns since consensus about block transaction verification becomes more risky the fewer miners it takes to form a 51% majority.
A stream of theoretical work focuses on a rational analysis of the system. It treats Bitcoin as a game between competing rational single miners, or pools of miners, which maximize a utility function capturing the incentive structure of the system. The goal is to prove under which conditions Bitcoin achieves a stable game-theoretic equilibrium. Overall, the results are rather pessimistic: unless one imposes strong conditions, attacks on the Bitcoin mining protocol follow, leading for example to forks of the blockchain. Eyal and Sirer (2014), for example, show that the Bitcoin protocol is not incentive-compatible. They show that an attack by colluding miners
leads to a revenue which exceeds their fair revenue share. They propose a modification of the protocol which protects against selfish mining pools. Sompolinsky and Zohar (2013) analyze the implications of high transaction throughput on Bitcoin's security against double-spend attacks. They show that if volume increases, attacks can strengthen to the point of reversing even accepted transactions. They propose a reorganization of the Bitcoin blockchain by new rules - implemented by the Ethereum project - which also affect the expected success outlook of a competing mining pool. Lewenberg et al. (2015) analyze the stability of mining pools. The authors examine the dynamics of pooled mining and how pools should share the rewards when they behave in a cooperative way. Using cooperative game theory, they show that for particular networks under high transaction loads the distribution of the rewards is unstable: some miners have an incentive to switch between pools. These findings are in contrast with the empirical observation that no fork or substantial slowdown attributable to rational attacks has been observed to date.
Given this difference between theory and observations, Badertscher et al. (2018) ask:
How come Bitcoin is not broken using such an attack? Or, stated differently, why does it work and why do majorities not collude to break it?
Why do honest miners keep mining given the plausibility of such attacks?
They use a rational-cryptography framework to capture the economic forces underlying the tension between honest miners and deviating miners, and explain how these forces affect the miners' behavior. They show how the expected revenues of the miners, in combination with a high monetary value of Bitcoin, can explain the fact that Bitcoin is not being attacked in reality even though majority coalitions are in fact possible. Hence, assumptions about the miners' incentives, which depend solely on the costs of and rewards for mining, can substitute for the honest-majority assumption.
5.4.5 Libra
Facebook (FB) published the details of the Libra blockchain in June 2019. Compared to most small Fintech initiatives, the Libra Association was populated by economic and technological giants: Mastercard, PayPal, Visa, Ebay, Uber, Lyft, Spotify, Vodafone and Coinbase, among others. Some of them, such as Mastercard, PayPal or Visa, are financial intermediaries, whereas Facebook is a social network and Vodafone a telecommunications firm. Hence, the Libra Association consisted of around 100 firms from different sectors. Note that during 2019 almost all payment firms withdrew due to the strong political pressure on Libra in the US, see below. The cryptocurrency coins should have low volatility (stable coin) relative to a stable fiat currency to avoid Bitcoin-like volatility. Therefore, Libra is linked to a broad basket of ordinary currencies and low-risk government bonds. The coins should be transferred via Facebook channels to the payment centers PayPal or Visa, traded at Coinbase, stored at Xapo and accepted at Ebay, Uber and Spotify. Summarizing, see Figure 5.43, Libra is a hybrid structure.
Figure 5.43: The structure of Libra. The figure shows the interconnected balance sheets in the economy, which are split into the traditional financial sector backing Libra, the traders linking the P2P economy of Libra with the blockchain (BC), and the financial sector. In the P2P part, end users can act in a P2P way with other end users or with the traders. Adapted from Müller (2019).
The linkage to the financial sector via the reserves stabilizes the currency, which is a main advantage over the otherwise highly volatile crypto currencies. But this link also generates some delicate issues. Libra is linked to the traditional payment system infrastructure, which is old and has to be renewed: it is too costly and too slow, in particular when cross-country payments are considered. Given this link, Libra can be seen as a New USD where the old infrastructure is only partly used but the currency is privately controlled and mined. In this sense Libra can be seen as a wake-up call for the traditional payment infrastructure: either it develops a new system in the near future or the private sector will simply install such a system.
Therefore, political actors are watching Facebook's currency project suspiciously. Some fear that Libra could put systemically important banks under pressure and severely restrict the monetary policy leeway of states. Above all, financial politicians in the USA are critical. Some accuse Libra of endangering national security, posing a risk to cyber security and torpedoing data protection. The fact that FB is the figurehead of Libra is proving to be a heavy burden in the political arena. Many politicians are currently worried about data protection: Facebook has repeatedly trampled on the protection of the data of billions of people in the past, which is why similar behavior can also be expected for Libra.
Critical voices can also be heard in Europe. Some predict that Facebook could become a shadow bank that circumvents regulation. The French Finance Minister called on central banks worldwide to investigate whether the new Facebook currency could be a gateway for money laundering and terrorist financing. FB announced that it has a technological solution up its sleeve for the anonymity problem of its blockchain platform and that identities could be verified. In addition to data protection and system stability, the governments in Europe and the USA also have concrete interests at stake. If Libra were to become a success and its reserve policy increasingly abandoned the traditional currencies, this could considerably reduce the money creation profits (seigniorage) of states. In Switzerland, the Swiss National Bank (SNB) currently distributes one billion francs a year to the Confederation and the cantons. Since the dollar is still the world's reserve currency, seigniorage is particularly important in the USA. US President Donald Trump recently stressed the international supremacy of the dollar and explained that there is only one right currency in the US.
While Libra is facing this opposition, the real problem for the monetary system may lie in China. China is investing heavily in new payment technologies, and while western governments focus on stopping Libra, a Chinese crypto currency can emerge which cannot be stopped in the way Libra can and which is then likely to challenge the USD as the main world currency. The Chinese initiative is based on the country's advanced technological status and its acceptance by the population. A further problem is that Libra can be seen as a derivative of the USD; but then several intricate regulatory questions arise.
Since regulators emphasize 'mass regulation before mass adoption', Libra faces a rough regulatory process: by Libra's own goals it will be of systemic importance and thus has to comply with the highest prudential standards. The central governance and FB's data privacy track record add to the concerns.
Who could use Libra? There are almost 2.5 billion Facebook users. This defines an enormous client potential if Libra were open to retail clients too. Many of these users live in places where there is little trust in the traditional financial and state institutions. If Libra is stable in value preservation, why should these people not use this system, even if the system receives a vast amount of data about the behavior and preferences of its customers?
The Libra code is open source and Facebook created its own blockchain using its own programming language Move. The blockchain is not owned by Facebook but by the association, which is based in Switzerland; hence Facebook relieves itself of its responsibility towards governments and regulators. Each member operates validator nodes (miners) and the fee to become a member is USD 10 mn. This amount times the 100 starting members defines the USD 1 bn backing of the coin. Facebook is just one member. With this structure Facebook cannot be accused of controlling a possible worldwide crypto-currency, and the decentralization makes it easier for Facebook to achieve its actual goal, namely that the currency is used on the Facebook channels. It is intended that Libra, which at the beginning should be a wholesale cryptocurrency, should become open to anybody.
As in a classic blockchain, miners attach transactions bundled into blocks to a read-only database as part of a consensus process. The blockchain uses the best features of other designs such as Ethereum, Ripple and IOTA, among others. The blockchain 'should scale to billions of accounts, require high transaction throughput, low latency and an efficient storage system for high capacity' (Source: Libra White Paper). It is a centralized enterprise, potentially a gigantic systemically relevant fund manager (100% backup) supporting government debt. Even if Libra could technically manage blockchain consensus for many miners, it has no self-interest in going beyond the 100 nodes, as this would dilute the RoE. As is also explicitly stated, Libra is just the starting point: the system should become the basis for future innovations in the financial sector. Note that the programming language Move allows the creation of digital assets and smart contracts in general. Transactions are functions only of the current state of the blockchain and not of historical states. This keeps the option open to prune old transaction data and to enable full nodes to verify transactions even if they do not have the full history, which would reduce the storage problem dramatically. The Ethereum database, which requires the full history, has to date reached between 1 and 2 terabytes in size.
The blockchain is transparent, and users can store their own keys and verify the blockchain. Libra is a permission-free, low-cost digital payment method. Libra is challenging payment service providers outside the association as well as the issuers of fiat money.
Chapter 6
Proofs
We prove Proposition 37:
Proof. To prove the proposition the standard no-arbitrage argument is used. Assume F(t, T) > S(t)·e^{r(T−t)}. We set up a portfolio W as follows. We borrow a money amount S(t) to buy the cheap stock S and go short the more expensive forward for the same amount. Then W(t) = 0. At T, we pay back the loan, sell the stock to fulfil the forward contract obligation and settle the short forward contract, which pays F(t, T) − S(T). The cash balance at T is
$$F(t, T) - S(t)\,e^{r(T-t)} > 0\;.$$
Using such a strategy we start with zero value and end at T with certainty with a positive value - this is an arbitrage, which allows for the construction of a money machine. A similar argument applies for the other inequality.
We prove Proposition 4:
Proof. The proof follows from the fact that the variance of the sum is equal to the sum of the variances since there is no covariance:
$$\sigma_p^2 = \mathrm{var}\Big(\frac{1}{N}\sum_{j=1}^N R_j\Big) = \frac{1}{N^2}\sum_{j=1}^N \mathrm{var}(R_j) \le \frac{Nc}{N^2} = \frac{c}{N}\;.$$
We prove Proposition 5:
Proof. The proof is only slightly more complicated than the former proof, and leads to the result:
$$\sigma_p^2 = \frac{\overline{\mathrm{var}}}{N} + \Big(1 - \frac{1}{N}\Big)\,\overline{\mathrm{cov}}\;.$$
By increasing the number N of assets, the contribution $\overline{\mathrm{var}}/N$ of the average variance can be made arbitrarily small - the portfolio variance is then determined by the average covariance $\overline{\mathrm{cov}}$, which approaches a non-zero value.
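This decomposition is easy to verify numerically. A minimal sketch for an equally weighted portfolio on a randomly generated covariance matrix (all inputs are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (5, 50, 500):
    A = rng.normal(size=(N, N))
    C = A @ A.T / N                       # a random positive-definite covariance matrix
    w = np.full(N, 1.0 / N)               # equal weights 1/N
    direct = w @ C @ w                    # portfolio variance w'Cw
    avg_var = np.mean(np.diag(C))         # average variance
    avg_cov = C[~np.eye(N, dtype=bool)].mean()  # average covariance
    print(N, direct, avg_var / N + (1 - 1 / N) * avg_cov)  # identical numbers
```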
We prove Proposition 7:
Proof. Differentiate both sides of the equation f(tu) = tf(u) with respect to t, apply the chain rule, and choose t = 1. For the converse, let g(t) = f(tu). Since ⟨tu, ∇f(tu)⟩ = f(tu) we have
$$g'(t) = \langle u, \nabla f(tu)\rangle = \frac{1}{t}\,f(tu) = \frac{1}{t}\,g(t)\;.$$
Solving this differential equation for g implies g(t) = g(1)t, i.e. f(tu) = tf(u).
We prove the optimal dynamic investment decision rules of the Merton model 4.13:
Proof. We first split the integral into two parts for small dt:
$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + \int_{t_0+dt}^{T} u(t,c,W)\,dt + f(W(T),T)\right]\;,$$
$$dW_t = g(t,c,W)\,dt + \sigma(t,c,W)\,dB_t\;,\quad W(t_0) = w_0\;. \tag{6.1}$$
Using the Principle of Optimality, the control function in the second integral should be optimal for the problem beginning at t_0 + dt in the state W(t_0 + dt) = w_0 + dW. Hence,
$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + E_{t_0+dt,w_0+dW}\Big[\int_{t_0+dt}^{T} u(t,c,W)\,dt + f(W(T),T)\Big]\right]\;.$$
Optimality implies $E_{t_0+dt,w_0+dW}\big[\int_{t_0+dt}^{T} u(t,c,W)\,dt\big] = J(t_0+dt, w_0+dW)$, i.e.
$$J(t_0, w_0) = \max_c E_{t_0,w_0}\left[\int_{t_0}^{t_0+dt} u(t,c,W)\,dt + J(t_0+dt, w_0+dW)\right]\;. \tag{6.2}$$
We next approximate the second value function since dt is small. This also allows us to assume that the control c is constant over a time interval of length dt. We expand J(t_0 + dt, w_0 + dW) to second order. This looks like a second-order expansion in the state variable - but the square of Brownian motion, (dB)², is linear in time (see the part and appendix on continuous-time finance), i.e. (dW)² = (g(t,c,W)dt + σ(t,c,W)dB)² = σ²dt. The only random component in the above value function expression is therefore the term ∂_wJ dW. Since E[dB] = 0, we get the fundamental PDE
$$0 = \max_c\Big[u(t,c,W) + \partial_t J + g(t,c,W)\,\partial_w J + \tfrac{1}{2}\,\sigma^2(t,c,W)\,\partial_{ww} J\Big]\;. \tag{6.4}$$
Therefore:
1. Taking formally the derivative w.r.t. c in the above PDE gives us optimal decision making c as a function of the unknown value function J.
2. Reinsert this candidate into the fundamental PDE (6.4) and solve the resulting J-equation with the boundary and initial conditions (if any).
3. Use this explicit solution J to obtain the fully specified optimal policy c*_t and the optimally controlled state dynamics W*_t.
Inserting J(t, W) = e^{−rt}V(W) into the fundamental PDE leads, after cancelling the exponential function, to
$$0 = \max_{c,\omega}\Big[\frac{c^a}{a} - rV + \partial_w V\,g + \frac{1}{2}\,\partial_{ww}V\,\sigma^2\Big]\;. \tag{6.5}$$
The wealth dynamics W_t follow from the asset dynamics and the consumption rate. There is a risky asset with dynamics dS/S = µdt + σdB, where the drift and the volatility are constant, and a so-called riskless asset with dynamics dB = rB dt. The growth rate of wealth equals the weighted sum of the asset growth rates minus the consumption rate, i.e.
$$dW/W = \omega\,dS/S + (1-\omega)\,dB/B - (c/W)\,dt\;.$$
The weight ω equals the number of risky assets times their price S divided by total wealth. Inserting the asset dynamics into the wealth growth rate equation gives the final wealth dynamics, and the fundamental PDE becomes
$$0 = \max_{c,\omega}\Big[\frac{c^a}{a} - rV + \big(\omega\mu W + (1-\omega)rW - c\big)\,\partial_w V + \frac{1}{2}\,(\sigma\omega W)^2\,\partial_{ww}V\Big]\;. \tag{6.6}$$
Taking the derivatives w.r.t. the two choice variables and setting them to zero gives the candidate solutions (First Order Conditions):
$$c^* = (\partial_w V)^{\frac{1}{a-1}}\;,\qquad \omega^* = \frac{r-\mu}{\sigma^2 W}\,\frac{\partial_w V}{\partial_{ww}V}\;. \tag{6.7}$$
These candidate optimal choices possess a drawback - they depend on the yet unknown value function. One has to determine the value function V. To achieve this, we reinsert the optimal candidate functions into the fundamental PDE. This gives an equation for the unknown value function V:
$$rV = \frac{1-a}{a}\,(\partial_w V)^{\frac{a}{a-1}} + rW\,\partial_w V - \frac{(r-\mu)^2}{2\sigma^2}\,\frac{(\partial_w V)^2}{\partial_{ww}V}\;. \tag{6.8}$$
This is a highly non-linear equation, but the value function V(W) is proportional to the expected value of c^a. Therefore, a guess is V(W) = αW^a as a candidate solution with α a constant. Testing this guess in the PDE, we see that all terms are proportional to W^a: we can factor out this power function times a complicated function which does not depend on the state variable W. Since this product has to be zero for all W, the complicated function has to be zero, which gives us a value for the constant α, and we obtain in this way a solution for the unknown value function. To carry this out we insert the guess into (6.8):
$$0 = W^a\,\alpha\,\underbrace{\left(\frac{1-a}{a}\,\alpha^{\frac{1}{a-1}} - 1 + ra - \frac{(r-\mu)^2}{2\sigma^2}\,\frac{a}{a-1}\right)}_{=:F(\alpha)}\;,$$
and solving F(α) = 0 for α completes the proof.
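With the guess V(W) = αW^a one has ∂_wV/(W ∂_{ww}V) = 1/(a − 1), so the FOC (6.7) reduces to the constant Merton fraction ω* = (µ − r)/((1 − a)σ²), independent of wealth. A minimal numerical sketch (the parameter values are illustrative, not taken from the text):

```python
# Merton fraction from the FOC (6.7) with the power-utility guess V(W) = alpha * W**a:
# omega* = (r - mu)/(sigma**2 * W) * dV/dW / d2V/dW2 = (mu - r)/((1 - a) * sigma**2)
mu, r, sigma, a = 0.08, 0.02, 0.20, -1.0   # relative risk aversion 1 - a = 2
omega_star = (mu - r) / ((1 - a) * sigma**2)
print(f"optimal constant risky-asset weight: {omega_star:.2f}")  # 0.75 here
```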
We prove the Markowitz mean-variance proposition. With the Lagrangian
$$L = \frac{1}{2}\,\langle\phi, C\phi\rangle + \lambda_1\big(1 - \langle e, \phi\rangle\big) + \lambda_2\big(r - \langle\mu, \phi\rangle\big)\;,$$
the first order conditions
$$0 = \frac{\partial L}{\partial \phi} := \left(\frac{\partial L}{\partial \phi_1}, \frac{\partial L}{\partial \phi_2}, \ldots, \frac{\partial L}{\partial \phi_N}\right)' \tag{6.9}$$
read:
$$0 = C\phi - \lambda_1 e - \lambda_2 \mu \tag{6.10}$$
$$1 = \langle e, \phi\rangle \tag{6.11}$$
$$r = \langle \mu, \phi\rangle\;. \tag{6.12}$$
The first equation gives
$$\phi = \lambda_1 C^{-1} e + \lambda_2 C^{-1}\mu\;.$$
Multiplying this last equation from the left with e and µ, respectively, and using the normalization condition and the return constraint, we get a linear system for the two Lagrange multipliers:
$$1 = \lambda_1\,\langle e, C^{-1}e\rangle + \lambda_2\,\langle e, C^{-1}\mu\rangle\;,\qquad r = \lambda_1\,\langle\mu, C^{-1}e\rangle + \lambda_2\,\langle\mu, C^{-1}\mu\rangle\;, \tag{6.13}$$
i.e. with y := (1, r)' and τ := (λ_1, λ_2)',
$$y = \begin{pmatrix} \langle e, C^{-1}e\rangle & \langle e, C^{-1}\mu\rangle \\ \langle\mu, C^{-1}e\rangle & \langle\mu, C^{-1}\mu\rangle \end{pmatrix}\tau =: A\tau\;. \tag{6.14}$$
If A is invertible we are done, since then y = Aτ can be trivially solved for τ. This determines the Lagrange multipliers λ_i*, and inserting the result into φ* = λ_1*C^{-1}e + λ_2*C^{-1}µ gives the optimal portfolio and proves the proposition. We prove that within the given model the matrix A is invertible, i.e. we claim that det A = ∆ > 0. To prove this we use the Cauchy-Schwarz inequality, i.e. for two arbitrary vectors x, y we have
$$\langle x, y\rangle^2 \le \langle x, x\rangle\,\langle y, y\rangle\;,$$
where the strict inequality holds if the two vectors are independent. To rewrite the determinant in the form needed for the Cauchy-Schwarz inequality, we first have to define the vectors x, y. We use the decomposition C = UU', which always exists for strictly positive definite, symmetric matrices. Using this, we get ⟨e, C^{-1}e⟩ = ⟨x, x⟩ with x := U^{-1}e, where we used
$$\langle x, A'Ax\rangle = \langle Ax, Ax\rangle$$
and properties of the matrix inverse. Proceeding in the same form with the other elements of A and defining ⟨µ, C^{-1}e⟩ = ⟨y, x⟩ with y := U^{-1}µ, the Cauchy-Schwarz inequality implies ∆ = ⟨x, x⟩⟨y, y⟩ − ⟨x, y⟩² > 0. Solving (6.14) then yields
$$\lambda_1^* = (A^{-1}y)_1 = \frac{1}{\Delta}\big(-\langle\mu, C^{-1}\mu\rangle + r\,\langle e, C^{-1}\mu\rangle\big)\;, \tag{6.15}$$
$$\lambda_2^* = (A^{-1}y)_2 = \frac{1}{\Delta}\big(-\langle e, C^{-1}\mu\rangle + r\,\langle e, C^{-1}e\rangle\big)\;. \tag{6.16}$$
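A compact numerical sketch of this closed-form solution (illustrative covariance matrix, expected returns and return target; numpy only):

```python
import numpy as np

# phi* = l1 * C^-1 e + l2 * C^-1 mu, with (l1, l2) solving the 2x2 system (6.14).
C = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.09, 0.02],
              [0.00, 0.02, 0.16]])       # asset covariance matrix
mu = np.array([0.05, 0.08, 0.12])        # expected returns
e, r_target = np.ones(3), 0.09           # budget vector and return constraint

Ci_e, Ci_mu = np.linalg.solve(C, e), np.linalg.solve(C, mu)
A = np.array([[e @ Ci_e,  e @ Ci_mu],
              [mu @ Ci_e, mu @ Ci_mu]])  # the matrix A of (6.14)
l1, l2 = np.linalg.solve(A, [1.0, r_target])
phi = l1 * Ci_e + l2 * Ci_mu
print(phi, phi @ e, phi @ mu)            # weights sum to 1 and hit the return target
```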
Proof. Let φ and ψ be two solutions of the Markowitz portfolio problem. Then they satisfy the linear FOC, and hence any convex combination aφ + (1 − a)ψ also satisfies the FOC. Using that the weights times the Lagrange multipliers add up to one, the weight a follows.
Proof. The VaR_α for the quantile α and a fixed time horizon solves implicitly the inequality
$$P(X \le -\mathrm{VaR}_\alpha) \le \alpha\;.$$
If X ∼ N(µ, σ²), the inequality reads
$$\frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{-\mathrm{VaR}_\alpha} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx \le \alpha\;,$$
or, substituting z = (x − µ)/σ,
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{-\mathrm{VaR}_\alpha - \mu}{\sigma}} e^{-\frac{1}{2}z^2}\,dz \le \alpha\;.$$
The upper limit of the integral depends on α, the mean and the variance. Setting the variance to unity and the mean to zero, for a given α the critical factor k_α, and with it the VaR, follows. For α = 0.01, i.e. a VaR at 99% confidence, numerically solving
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{k_\alpha} e^{-\frac{1}{2}z^2}\,dz \le 0.01$$
gives k_α = −2.33. Increasing the confidence level to 99.9 percent, i.e. α = 0.001, the critical value becomes k_α = −3.09. From
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\frac{-\mathrm{VaR}_\alpha - \mu}{\sigma}} e^{-\frac{1}{2}z^2}\,dz \le \alpha$$
follows
$$\frac{-\mathrm{VaR}_\alpha - \mu}{\sigma} \le k_\alpha\;,$$
i.e. under normality the VaR_α satisfies
$$-\mathrm{VaR}_\alpha \le \sigma k_\alpha + \mu\;,$$
with the binding case
$$-\mathrm{VaR}_\alpha = \sigma k_\alpha + \mu\;.$$
This is the VaR for a fixed time horizon. Calculating, for example, the variance on an annual basis but the VaR on a weekly basis, the square-root rule implies (for µ = 0):
$$-\mathrm{VaR}_\alpha = \sigma k_\alpha\sqrt{T}\;.$$
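The critical factors are just normal quantiles; a minimal check with scipy (the sign convention follows the proof above, and the parameter values in the scaling example are illustrative):

```python
from scipy.stats import norm

for alpha in (0.01, 0.001):
    print(alpha, round(norm.ppf(alpha), 2))   # k_alpha: -2.33 and -3.09

# Square-root-of-time scaling: annual sigma = 16%, mu = 0, weekly horizon T = 1/52.
sigma, T = 0.16, 1 / 52
var_99 = -sigma * norm.ppf(0.01) * T**0.5
print(f"99% one-week VaR: {var_99:.2%} of portfolio value")
```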
Proof. We prove the SML relationship. Form a portfolio consisting of asset i and the market portfolio M, where we invest the fraction of wealth φ in i and 1 − φ in M. The expected rate of return of this portfolio is
$$\mu_\phi = \phi\,\mu_i + (1-\phi)\,\mu_M\;.$$
As a function of φ, the pair (σ_φ, µ_φ) traces out a curve in the risk-return space. The curve cannot cross the CML; this would violate the property that the CML is an efficient boundary of the feasible region. Hence, as φ passes through zero, the curve traced out by (σ_φ, µ_φ) must be tangent to the CML at M. In other words, the slopes of the CML and of the curve must be equal at the point M, which is where φ = 0. The tangency condition implies the SML
$$\mu_i - \mu_0 = \beta_i\,(\mu_M - \mu_0)\;. \tag{6.17}$$
To check this:
$$\frac{d\sigma_\phi}{d\phi}\Big|_{\phi=0} = \frac{\mathrm{cov}(i, M) - \sigma_M^2}{\sigma_M}\;.$$
Next,
$$\frac{d\mu_\phi}{d\sigma_\phi}\Big|_{\phi=0} = \frac{\frac{d\mu_\phi}{d\phi}\big|_{\phi=0}}{\frac{d\sigma_\phi}{d\phi}\big|_{\phi=0}} = \frac{(\mu_i - \mu_M)\,\sigma_M}{\mathrm{cov}(i, M) - \sigma_M^2}\;.$$
Since this slope at φ = 0 should equal the slope of the CML, we have
$$\frac{(\mu_i - \mu_M)\,\sigma_M}{\mathrm{cov}(i, M) - \sigma_M^2} = \frac{\mu_M - R_f}{\sigma_M}\;.$$
Solving for the expected return of asset i proves the claim.
Proof. The proof uses the Separating Hyperplane Theorem. A hyperplane can be written in the form
$$\langle x, a\rangle = d\;.$$
Subspaces of R^n are kernels of linear maps F; the Riesz-Fischer theorem implies F(x) = ⟨x, a⟩ = 0 for the kernel. Since each affine subspace is representable by F(x) = d, we define:
$$U = \Big\{(x, y)\in\mathbb{R}^2\ \Big|\ x > 0\,,\ y \ge \frac{1}{x}\Big\}\;,\qquad V = \Big\{(x, y)\in\mathbb{R}^2\ \Big|\ x > 0\,,\ y \le -\frac{1}{x}\Big\}\;.$$
The sets are disjoint and convex. But they are not compact and therefore they cannot be strictly separated.
Figure 6.1: The hyperplane separates the two convex sets A and B in R². A set is convex if any line with start and end point in the set remains fully in the set.
Let d(C, K) be the shortest distance between C and K. For C compact and K closed, minimizing points x_0 ∈ C, y_0 ∈ K exist. Let H_{x_0} be the hyperplane through x_0 which is perpendicular to the line y_0x_0. We write H_{x_0} as follows:
$$H_{x_0} = \{z\in\mathbb{R}^n\mid \langle y_0 - x_0, z - x_0\rangle = 0\}\;.$$
For x ∈ C define φ(λ) := |y_0 − x_0 − λ(x − x_0)|². This function is continuously differentiable and we have φ(λ) ≥ φ(0) for all λ ∈ [0, 1], since x_0 is closest to y_0. Therefore, φ'(λ) = −2⟨y_0 − x_0, x − x_0⟩ + 2λ⟨x − x_0, x − x_0⟩ and
$$\varphi'(0) = -2\,\langle y_0 - x_0, x - x_0\rangle \ge 0\;,$$
i.e.
$$\langle y_0 - x_0, x - x_0\rangle \le 0\;,\quad\forall x\in C\;,$$
since C is convex. In the same way one shows for H_{y_0} the inequality
$$\langle y_0 - x_0, y - y_0\rangle \ge 0\;,\quad\forall y\in K\;.$$
It follows that H_{x_0} separates the sets C and K, and the same is true for H_{y_0}. Therefore the hyperplane through the midpoint z_0 = (x_0 + y_0)/2 separates the sets strictly.
Proof. ⇒: Let ψ be a vector where all components are strictly positive. We claim that it is a state vector if each attainable payoff V = Pφ satisfies ⟨ψ, V⟩ = ⟨S_0, φ⟩ (we omit the time index T). V = Pφ implies
$$\langle\psi, P\phi\rangle = \langle P'\psi, \phi\rangle = \langle S_0, \phi\rangle\;.$$
Hence, for each attainable payoff V = Pφ the identity ⟨ψ, V⟩ = ⟨S_0, φ⟩ holds. Therefore, if all components of V are positive, ⟨S_0, φ⟩ ≥ 0 also holds, i.e. arbitrage is not possible.
⇐: We set
$$M = \big\{(P\phi, -\langle S_0, \phi\rangle)\ \big|\ \phi\in\mathbb{R}^N\big\}\subset\mathbb{R}^{K+1}\quad\text{and}\quad K = \Big\{x\in\mathbb{R}^{K+1}\ \Big|\ x_i \ge 0\,,\ \sum_i x_i = 1\Big\}\;.$$
M is an augmented space of payoffs: it consists of all payoffs at date T plus the price of the portfolio, −⟨S_0, φ⟩, at time zero. K is a simplex. M is a convex and closed set and K is compact. Since the compact set lies in the positive orthant, the definition of no
arbitrage implies that M and K are disjoint. The Separation Theorem then applies: there exists a vector z ∈ R^{K+1} such that ⟨z, x⟩ < b < ⟨z, y⟩ for all x ∈ M, y ∈ K. Since M is a linear space, these inequalities can only hold if the vector z ∈ M^⊥. But then b > 0. Since also ⟨z, y⟩ > b > 0 for y ∈ K, all components of the vector z are strictly positive. This allows us to define the state price density as ψ_k := z_k/z_{K+1}, and ψ solves S_0 = P'ψ. To prove this, recall that z ∈ M^⊥ and therefore for each strategy vector φ ∈ R^N:
$$0 = \langle z_{K+1}\,\psi, P\phi\rangle - z_{K+1}\,\langle S_0, \phi\rangle = z_{K+1}\big(\langle P'\psi, \phi\rangle - \langle S_0, \phi\rangle\big)\;.$$
Therefore, ⟨P'ψ, φ⟩ = ⟨S_0, φ⟩, i.e. P'ψ = S_0, holds for all strategies φ. This proves the claim.
Theorem 113 (Riesz). Let X be a Hilbert space and p : X → R a linear map. There exists a vector r* ∈ X, the Riesz kernel, such that
$$p(x) = \langle r^*, x\rangle$$
for all x ∈ X.
Proof. We recall some facts from linear algebra and projection geometry first. Let M and M' be subspaces of R^n such that M' is the complement of M. If M is a linear subspace of R^n, we define the orthogonal complement M^⊥:
$$M^\perp = \{x\in\mathbb{R}^n\mid \langle x, y\rangle = 0\;,\ \forall y\in M\}\;.$$
M' is a complement of M iff each vector x ∈ R^n can be written as the sum of two vectors z ∈ M and z' ∈ M', i.e. x = z + z'. In particular, for the orthogonal complement each x decomposes as x = y + y' with y ∈ M and y' ∈ M^⊥. The kernel and the image of a linear map f : R^n → R^m are defined as follows:
$$\ker f := \{x\in\mathbb{R}^n\mid f(x) = 0\}\subset\mathbb{R}^n\quad\text{and}\quad \mathrm{im}\,f := \{y\in\mathbb{R}^m\mid y = f(x)\,,\ x\in\mathbb{R}^n\}\subset\mathbb{R}^m\;.$$
The dimension formula
$$\dim\mathbb{R}^n = \dim\ker f + \dim\mathrm{im}\,f$$
holds. Applied to the linear functional l := p restricted to M this gives
$$\dim M = \dim\ker l + \dim\mathrm{im}\,l = \dim\ker l + 1 = \dim\ker l + \dim(\ker l)^\perp\;.$$
Since the kernel is a subspace, it follows that dim(ker l)^⊥ = 1. Let e ∈ M be a basis of (ker l)^⊥. We decompose a vector y ∈ M as
$$y = y' + \lambda e\;,\quad y'\in\ker l\;,\ \lambda\in\mathbb{R}\;.$$
Taking the inner product with e gives ⟨e, y⟩ = λ⟨e, e⟩ and therefore
$$\lambda = \frac{\langle e, y\rangle}{\langle e, e\rangle}\;.$$
For all y ∈ M we get l(y) = l(y') + λl(e) = λl(e), where we used the linearity of l and that y' ∈ ker l. But this implies
$$l(y) = \lambda\,l(e) = \frac{\langle e, y\rangle}{\langle e, e\rangle}\,l(e) = \langle\tilde e, y\rangle\;,\qquad \tilde e = \frac{l(e)\,e}{\langle e, e\rangle}\;.$$
This proves that each linear functional can be represented in the claimed form by a scalar product. Uniqueness follows by taking two different vectors ẽ and ẽ' and showing that they indeed have to agree.
Proof. To do.
Proof. To do.
Proof. We prove the direction 'SDF implies the expected return representation'. Take an SDF M = a + b'f and consider an asset i with return R_i. The general covariance formula applied to 1 = E[MR] implies
$$E[R_i] = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\,\mathrm{cov}(M, R_i) = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\,b'\,\mathrm{cov}(f, R_i)\;.$$
With the factor regression
$$R_i = \alpha_i + \beta_i' f + \epsilon_i\;,$$
the vector of betas is given by β_i = C_f^{-1} cov(f, R_i), with C_f the factor covariance matrix. Substituting this expression into the above expected return formula for the asset return we get
$$E[R_i] = \kappa + \Lambda'\beta_i\;,$$
where
$$\kappa = \frac{1}{E[M]} - 1\;,\qquad \Lambda = -\frac{1}{E[M]}\,C_f\,b\;.$$
This proves the claim.
To prove the other direction, we assume that E[R_i] = κ + Λ'β_i holds for some scalar κ and some vector Λ for each asset i. We search for a, b such that M = a + b'f is an SDF. Since E[M] = a + b'µ_f, it suffices to have κ = 1/E[M] − 1 and b = −E[M]C_f^{-1}Λ. Choosing
$$b = -\frac{1}{1+\kappa}\,C_f^{-1}\Lambda\;,\qquad a = \frac{1}{1+\kappa}\,\big(1 + \mu_f' C_f^{-1}\Lambda\big)\;,$$
the random variable M = a + b'f is such that for each asset i the equation E[R_i] = κ + Λ'β_i holds. Therefore,
$$E[R_i] = \frac{1}{E[M]} - 1 - \frac{1}{E[M]}\,b'\,\mathrm{cov}(f, R_i)$$
holds too, and M is an SDF.
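A small Monte Carlo sketch of this equivalence (all parameters are made up for illustration): returns are constructed to satisfy E[R_i] = κ + Λ'β_i, the SDF coefficients a, b are chosen as in the proof, and the pricing condition 1 = E[M(1+R_i)] is then checked by simulation.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_f = np.array([0.05, 0.02])                  # factor means
C_f = np.array([[0.04, 0.01], [0.01, 0.02]])   # factor covariance matrix
kappa, Lam = 0.01, np.array([0.4, 0.2])        # beta-pricing parameters

Cf_inv_Lam = np.linalg.solve(C_f, Lam)
b = -Cf_inv_Lam / (1 + kappa)                  # b as in the proof
a = (1 + mu_f @ Cf_inv_Lam) / (1 + kappa)      # a as in the proof

T = 1_000_000
f = rng.multivariate_normal(mu_f, C_f, size=T)
M = a + f @ b                                  # the SDF M = a + b'f

beta = np.array([1.2, -0.3])                   # arbitrary factor loadings of asset i
R = kappa + Lam @ beta + (f - mu_f) @ beta + 0.01 * rng.standard_normal(T)
print(np.mean(M * (1 + R)))                    # ~1: the SDF prices the asset
```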
The proof is taken from Wikipedia. We prove the bias-variance equation ??: rearranging
$$\mathrm{Var}[X] = E[X^2] - \big(E[X]\big)^2$$
gives
$$E[X^2] = \mathrm{Var}[X] + \big(E[X]\big)^2\;.$$
Proof. The following bounds are used over and over in statistical learning theory. For independent random variables X_i with values in [a_i, b_i] and S_n = Σ_i X_i, Hoeffding's inequality states
$$P\big(|S_n - E(S_n)| \ge \epsilon\big) \le 2\,e^{-2\epsilon^2/W_n^2} \tag{6.20}$$
with W_n² = Σ_i (b_i − a_i)². The proof uses a technical lemma and the Chernoff bounding method.
Lemma 115. Let X be a random variable with expected value zero and taking values in the interval [a, b]. For s > 0,
$$E[e^{sX}] \le e^{s^2(b-a)^2/8}\;.$$
Proof of the lemma: by convexity of the exponential function,
$$e^{sx} \le \frac{x-a}{b-a}\,e^{sb} + \frac{b-x}{b-a}\,e^{sa}\;.$$
Taking expectations and using E[X] = 0, with p := −a/(b−a) and u := s(b−a),
$$E[e^{sX}] \le -\frac{a}{b-a}\,e^{sb} + \frac{b}{b-a}\,e^{sa} = \big(1-p+p\,e^{s(b-a)}\big)\,e^{-sp(b-a)} =: e^{g(u)}\;,\qquad g(u) = -pu + \ln\big(1-p+p\,e^u\big)\;.$$
The function g satisfies g(0) = g'(0) = 0 and, taking the derivative twice, g''(u) ≤ 1/4. Taylor's theorem up to second order around zero implies for some c ∈ [0, u] (the first two terms in the series are zero):
$$g(u) = \frac{1}{2}\,u^2 g''(c) \le \frac{u^2}{8} = \frac{s^2(b-a)^2}{8}\;.$$
Let X be a non-negative random variable and ε > 0. The inequality of Markov states
$$P[X \ge \epsilon] \le \frac{E[X]}{\epsilon}\;.$$
Hence for s > 0:
$$P[X \ge \epsilon] = P[e^{sX} \ge e^{s\epsilon}] \le \frac{E[e^{sX}]}{e^{s\epsilon}}\;.$$
The Chernoff method means finding a positive s such that an upper bound on a random expression is minimized:
$$
\begin{aligned}
P(S_n - E[S_n] \ge \epsilon) &\le e^{-s\epsilon}\,E\big[e^{s\sum_i (X_i - E[X_i])}\big] \\
&= e^{-s\epsilon}\,\prod_i E\big[e^{s(X_i - E[X_i])}\big] \\
&\le e^{-s\epsilon}\,\prod_i e^{s^2 (b_i - a_i)^2/8} \\
&= e^{-s\epsilon}\,e^{s^2 \sum_i (b_i - a_i)^2/8} = e^{-2\epsilon^2/W_n^2}\;,
\end{aligned}
$$
by using first the Markov inequality, then the independence of the random variables, then the technical lemma, and finally by choosing s = 4ε/W_n² appropriately. This concludes the proof for S_n − E[S_n]. The same bounds hold for E[S_n] − S_n and hence the proof of the theorem follows.
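A quick Monte Carlo illustration of the bound (i.i.d. uniform variables on [0, 1], so a_i = 0, b_i = 1 and W_n² = n; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, eps, trials = 100, 10.0, 100_000
S = rng.uniform(size=(trials, n)).sum(axis=1)     # trials realizations of S_n
empirical = np.mean(np.abs(S - n / 2) >= eps)     # P(|S_n - E S_n| >= eps)
bound = 2 * np.exp(-2 * eps**2 / n)               # Hoeffding with W_n^2 = n
print(f"empirical {empirical:.5f} <= Hoeffding bound {bound:.5f}")
```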
For the symmetrization step, let R'_emp(f) denote the empirical risk of f on an independent second sample ('ghost sample') with measure P'. Then
$$\chi_{R_{emp}(f) - R(f) \ge \epsilon}\;P'\big(R'_{emp}(f) - R(f) \le \epsilon/2\big) \le P'\big(R_{emp}(f) - R'_{emp}(f) > \epsilon/2\big)\;.$$
By Chebyshev's inequality,
$$P'\big(R'_{emp}(f) - R(f) > \epsilon/2\big) \le \frac{4\,\mathrm{var}(f)}{n\epsilon^2} \le \frac{1}{n\epsilon^2}\;,$$
since random variables with values in the unit interval have a variance of less than 1/4. Putting things together we have:
$$\chi_{R_{emp}(f) - R(f) \ge \epsilon}\,\Big(1 - \frac{1}{n\epsilon^2}\Big) \le P'\big(R_{emp}(f) - R'_{emp}(f) > \epsilon/2\big)\;.$$
Taking expectations w.r.t. the first sample proves the result.
Therefore, ⟨θ*, θ^(k)⟩ grows at least linearly in the number of updates k, and ||θ^(k)||² increases at most linearly in k. We consider the cosine
$$\cos\big(\theta^*, \theta^{(k)}\big) = \frac{\langle\theta^*, \theta^{(k)}\rangle}{\|\theta^{(k)}\|\,\|\theta^*\|} \ge \frac{k\gamma}{\sqrt{k\,r^2}\,\|\theta^*\|}\;.$$
Combining the two bounds shows that the cosine of the angle between θ^(k) and θ* has to increase by a finite increment with each update. Since the cosine is bounded, we can only make a finite number of updates.
Chapter 7
Appendix
AM Firm / Fund Description   AuM (USD bn)   52w   2y   3y   5y (returns in %)
Vanguard S&P 500 ETF 224 20.2 14 10.4 15.7
Vanguard 500 Inx 182 19.6 14.1 10.3 15.6
Vanguard TSM Idx, Adm 138 20.2 14 10.4 15.7
iShares:Core Instl Idx, Inst 133 20.2 14 10.4 15.7
Vanguard S&P 500 123 19.5 14 10.1 15.48
Vanguard TSM Idx;Inv 121 19.6 14.1 na na
Vanguard TSM Idx;Inst+ 116 28.1 12.6 6.7 8.1
Vanguard Tot I Stk, Ins 108 19.6 14.1 10.3 15.6
Vanguard TSM Idx, Inst 92 20.2 14 10.4 15.7
Fidelity Instl Indx, InsP 89 30.8 15.5 13.3 16.8
Vanguard Contrafund 88 28.3 12.7 6.8 8.2
Vanguard Tot I Stk, Ins 86 19.6 14.1 10.3 15.6
Vanguard TSM, Idx, ETF 85 14.3 10.5 7.7 10.8
Vanguard Wellington;Adm 85 3.5 2.75 2.3 1.9
American Tot Bd II, INV 84 24.1 15 12.1 16.4
iShares:MSCI Funds Gro, A 81 27.4 9.8 6.1 8.7
Vanguard EAFE ETF 81 3.7 2.9 2.4 2.0
Vanguard Tot BD, Adm 78 20.2 14 10.4 15.7
American 500 Index, ETF 77 12.7 9.8 6.2 9.6
Fidelity Funds Inc, A 72 20.1 14 10.4 15.7
American 500 Idx, Pr 71 15.08 8.5 4.9 7.9
Dodge Funds CIB, A 68 14.9 15.1 9.6 16.4
Vanguard Cox Stock 65 27.5 11.2 7.3 9.3
Dodge FTSE ETF 65 26.5 11.8 4.51 10
Largest Custodians
Rank Provider Assets under custody USD bn Reference date
1 BNY Mellon 28,300 Sep 30, 2014
2 J.P. Morgan 21,000 Mar 31, 2014
3 State Street 20,996 Mar 31, 2014
4 Citi 14,700 Mar 31, 2014
5 BNP Paribas 9,447 Jun 30, 2014
6 HSBC Securities Services 6,210 Dec 31, 2013
7 Northern Trust 5,910 Sep 30, 2014
8 Societe Generale 4,915 Sep 30, 2014
9 Brown Brothers Harriman 3,800 Mar 31, 2014
10 UBS AG 3,438 Sep 30, 2014
11 SIX Securities Services 3,247 Dec 31, 2013
12 CACEIS 3,200 Dec 31, 2013
References
1. D. Acemoglu, A. Malekian and A. Ozdaglar, Network Security and Contagion, Journal of
Economic Theory 166, 536-585, 2016.
2. A. Acquisti, C. Taylor, and L. Wagman, The Economics of Privacy, Journal of Economic Literature 54(2), 442-492, 2016.
3. Accenture, Digital Business Era: Stretch Your Boundaries, Accenture Technology Vision
2015, 2015.
4. C. Ackermann, R. McEnally and D. Ravenscraft, The Performance of Hedge Funds: Risk,
Return, and Incentives. Journal of Finance, 833-874, 1999.
5. V. Agarwal, N.D. Daniel and N.Y. Naik, Role of Managerial Incentives and Discretion in
Hedge Fund Performance. The Journal of Finance, 64(5), 2221-2256, 2009.
6. A. Agrawal, J. Horton, N. Lacetera and E. Lyons, Digitization and the Contract Labor
Market: A Research Agenda, in A. Goldfarb, S. Greenstein and C. Tucker, Economics of
Digitization: An Agenda. National Bureau of Economic Research, 2013.
7. H. Albrecher, P. Embrechts, D. Filipović, G. W. Harrison, P. Koch, S. Loisel, P. Vanini and J. Wagner, Old-Age Provision: Past, Present, Future. European Actuarial Journal, 6(2), 287-306, 2016.
8. G.S. Amin and H.M. Kat, Hedge Fund Performance 1990 - 2000: Do the 'Money Machines' Really Add Value?, Journal of Financial and Quantitative Analysis, 38(02), 251-274, 2003.
9. M. Andersson, P. Bolton and F. Samama, Hedging Climate Risk, Financial Analysts Jour-
nal, 72(3), pp. 13-32, 2016.
10. R.M. Anderson, S.W. Bianchi and L.R. Goldberg, Determinants of Levered Portfolio Performance, Forthcoming Financial Analysts Journal, UC Berkeley, 2014.
11. R. Anderson and T. Moore, The Economics of Information Security, Science 314, 610-613,
2006.
12. A. Ang, Mean-Variance Investing, Lecture Notes Columbia University, ssrn.com, 2012.
13. A. Ang, Asset Management. A Systematic Approach to Factor Investing, Oxford Univer-
sity Press, 2014.
14. A. Ang, W. Goetzmann, and S. Schaefer, Evaluation of Active Management of the Nor-
wegian GPFG, Norway: Ministry of Finance, 2009. (the Professor's Report)
15. A. Ang, S. Gorovyy and G.B. Van Inwegen, Hedge Fund leverage. Journal of Financial
Economics, 102(1), 102-126, 2011.
16. A. Ang, D. Basu, M. D.Gates and V. Karir, Model Portfolios, ssrn.com, 2018.
17. A. M. Antonopoulos, Mastering Bitcoin, O'Reilly Books, New York, 2015.
18. F. Allen and D. Gale, Financial Markets, Intermediaries and Intertemporal Smoothing, J.
Pol. Econom., 105, 523-546, 1997.
19. P. Artzner, F. Delbaen, J.-M. Eber and D. Heath, Coherent Measures of Risk, Mathematical Finance, 9(3), 203-228, 1999.
20. T. Aste, Blockchain, University College London, Center for Blockchain Technologies,
preprint ssrn.com, 2016.
21. C. S. Asness, Hedge Funds: The (Somewhat Tepid) Defense, AQR, October 24, 2014.
22. C.S. Asness, How Can a Strategy Still Work if Everyone Knows About it? International
Invest Magazine, September, 2015.
23. C.S. Asness and J. Liew, The Great Divide of Market Efficiency, Institutional Investor, March 03, 2014.
24. C.S. Asness, A. Frazzini, R. Israel and T. Moskowitz, Fact, Fiction, and Value Investing, Forthcoming, Journal of Portfolio Management, Fall 2015, 2015.
25. V. Agarwal, N. D. Daniel, and N. Y. Naik, Do Hedge Funds Manage Their Reported
Returns?, Review of Financial Studies, forthcoming, 2011.
26. V. Agarwal and N.Y. Naik, Multi-Period Performance Persistence Analysis of Hedge Funds, Journal of Financial and Quantitative Analysis, 35(03), 327-342, 2000.
27. F. Allen, J. Barth and G. Yago, Fixing the Housing Market: Financial Innovations for
the Future, Wharton School Publishing-Milken Institute Series on Financial Innovations,
Upper Saddle River, NJ: Pearson Education, 2012.
28. F. Allen and G. Yago, Financing the Futures. Market-Based Innovations for Growth.
Wharton School of Publishing and Milken Institute, 2012.
29. G.O. Aragon and J.S. Martin, A Unique View of Hedge Fund Derivatives Usage: Safeguard
or Speculation? Journal of Financial Economics, 105(2), 436-456, 2012.
30. Assenagon Asset Management, 1. Assenagon Derivatetag am See, 2013.
31. M. Avellaneda and D. Dobi, Structural Slippage of Leveraged ETFs, ssrn.com, 2012.
32. D. Avramov, R. Kosowski, N.Y. Naik and M. Teo, Hedge Funds, Managerial Skill, and
Macroeconomic Variables. Journal of Financial Economics, 99(3), 672-692, 2011.
33. Ph. Bacchetta, C. Tille and E. van Wincoop, Self-Fulfilling Risk Panics, American Economic Review 102, 3674-3700, 2013.
34. K. E. Back, Asset Pricing and Portfolio Choice Theory, Oxford University Press, 2010.
35. Bank of England, The Economics of Digital Currencies, Quarterly Bulletin, Q3, 2014.
36. D. H. Bailey, J. M. Borwein, M. L. de Prado and Q. J. Zhu, Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance, Notices of the American Mathematical Society, 61(5), 458-471, 2014.
58. A. Börsch-Supan, K. H. Alcser, Health, Aging and Retirement in Europe: First Results
from the Survey of Health, Ageing and Retirement in Europe. Mannheim: Mannheim
Research Institute for the Economics of Aging (MEA), 2005.
59. A. Börsch-Supan, A. Ludwig, and J. Winter, Ageing, Pension Reform and Capital Flows:
A Multi-Country Simulation Model, Economica 73.292, 625-658, 2006.
60. A. Börsch-Supan, M. Brandt, C. Hunkler, T. Kneip, J. Korbmacher, F. Malter and S. Zuber, Data Resource Profile: The Survey of Health, Ageing and Retirement in Europe (SHARE), International Journal of Epidemiology, dyt088, 2013.
61. C. Badertscher, J. Garay, U. Maurer, D. Tschudi and V. Zikas, But why does it Work?
A Rational Protocol Design Treatment of Bitcoin, In Annual International Conference on
the Theory and Applications of Cryptographic Techniques, Springer, Cham 34-65, 2018.
62. T. Bourgeron, E. Lezmi and T. Roncalli, Robust Asset Allocation for Robo-Advisors,
arXiv, arxiv.org/abs/1902.07449, 2018.
63. M.W. Brandt, Portfolio Choice Problems, in Y. Ait-Sahalia and L.P. Hansen (eds.), Handbook of Financial Econometrics, Volume 1: Tools and Techniques, North Holland, 269-336, 2010.
64. M. Brenner and Y. Izhakian, Asset Prices and Ambiguity: Empirical Evidence, Stern School of Business, Finance Working Paper Series, FIN-11-10, 2011.
65. R. Briand, F. Nielsen and D. Stefek, Portfolio of Risk Premia: A New Approach to Diversification, MSCI Barra Research Insights, 2009.
66. S. Browne, Reaching Goals by a Deadline: Digital Options and Continuous-Time Active
Portfolio Management, Adv. Appl. Prob. 31, 551-557, 1999.
67. S. J. Byun and B.H. Jeon, Momentum Crashes and the 52-Week High, 2018.
68. R. G. Brown, J. Carlyle, I. Grigg and M. Hearn, Corda: An Introduction, squarespace.com,
2016.
69. S.J. Brown, W. Goetzmann, R.G. Ibbotson and S.A. Ross, Survivorship Bias in Perfor-
mance Studies, Review of Financial Studies, 5(4), 553-580, 1992.
70. S.J. Brown, W. Goetzmann and R.G. Ibbotson, Offshore Hedge Funds: Survival and Performance, 1989-95, Journal of Business, 72(1), 1999.
71. S.J. Brown, W. Goetzmann and J.M. Park, Conditions for Survival: Changing Risk and
the Performance of Hedge Fund Managers and CTAs, ssrn.com, 1999.
72. B. Bruder, N. Gaussel, J.-C. Richard and T. Roncalli, Regularization of Portfolio Alloca-
tion, Lyxor White Paper Series, 10, 2013.
73. J. Bruna, Mathematics of Deep Learning, Courant Institute of Mathematical Science,
NYU, 2018.
74. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 2(2), 121-167, 1998.
75. A. Corbellini, Elliptic Curve Cryptography: A Gentle Introduction, webpage of A. Cor-
bellini, 2015.
76. R.J. Caballero, Macroeconomics after the Crisis: Time to Deal with the Pretense-of-
Knowledge Syndrome, Journal of Economic Perspectives, Volume 24, Number 4, Fall, 85
- 102, 2010.
77. R.J. Caballero and A. Krishnamurthy, Collective Risk Management in a Flight to Quality Episode, The Journal of Finance, 63(5), 2195-2230, 2008.
78. C. Camerer, G. Loewenstein, and D. Prelec, Neuroeconomics: How Neuroscience can Inform Economics, Journal of Economic Literature, 9-64, 2005.
79. J.Y. Campbell and L. M. Viceira, Strategic Asset Allocation: Portfolio Choice for Long-Term Investors, books.google.com, 2002.
80. C. Cao, Y. Chen, B. Liang and A.W. Lo, Can Hedge Funds Time Market Liquidity?,
Journal of Financial Economics, 109(2), 493-516, 2013.
81. M.M. Carhart, On Persistence in Mutual Fund Performance, The Journal of Finance, 52(1), 57-82, 1997.
82. Z. Cazalet and T. Roncalli, Style Analysis and Mutual Fund Performance Measurement
Revisited, Lyxor Research Paper, 2014.
83. Y. Chen, Timing Ability in the Focus Market of Hedge Funds, Journal of Investment
Management, 5(2), 66, 2007.
84. Y. Chen, Derivatives Use and Risk Taking: Evidence from the Hedge Fund industry,
Journal of Financial and Quantitative Analysis, 46(04), 1073-1106, 2011.
85. CEM Benchmarking, CEM Toronto, 2014.
86. N. Chatsanga and A.J. Parkes, International portfolio optimisation with integrated cur-
rency overlay costs and constraints. Expert Systems with Applications, 83, 333-349, 2017.
87. P. Cheridito and E. Kromer, Reward-Risk Ratios, Journal of Investment Strategies 3(1), 1-16, 2013.
88. T. Chordia, A. Goyal and A. Saretto, p-hacking: Evidence from Two Million Trading
Strategies, University of Lausanne, preprint, 2017.
89. Y. Choueifaty and Y. Coignard, Toward Maximum Diversification, Journal of Portfolio Management, 35(1), 40, 2008.
90. M.M. Christensen, On the History of the Growth Optimal Portfolio, University Southern
Denmark, Preprint, 2005.
91. J. Cochrane, Asset Pricing, Princeton University Press, 2005.
92. J. Cochrane, The Dog That Did Not Bark: A Defense of Return Predictability, Review of Financial Studies 21(4), 1533 - 1575, 2008.
93. J. Cochrane, Discount Rates, Presidential Address AFA 2010, Journal of Finance, Vol
LXVI, 4, August, 2011.
94. P. Cocoma, M. Czasonis, M. Kritzman and D. Turkington, Facts about Factors. The
Journal of Portfolio Management, 43(5), 55-65, 2017.
95. N. Cuche-Curti, O. Sigrist and F. Boucard, Blockchain: An Introduction, Research and
Policy Notes, Swiss National Bank, 2016.
96. J. Cui, F. De Jong and E. Ponds, Intergenerational Risk Sharing within Funded Pension
Schemes. Journal of Pension Economics and Finance 10.01, 1-29, 2011.
97. C. Culp and J. Cochrane, Equilibrium Asset Pricing and Discount Factors: Overview and
Implications for Derivatives Valuation and Risk Management, Modern Risk Management:
A History. Peter Field, ed. London: Risk Books, 2003.
98. T. Dangl, O. Randl and J. Zechner, Risk Control in Asset Management: Motives and
Concepts, K. Glau et al. (eds), Innovation in Quantitative Risk Management, Springer
Proceedings in Mathematics and Statistics 99, 239-266, 2015.
99. V. DeMiguel, L. Garlappi and R. Uppal, Optimal Versus Naive Diversification: How Inefficient is the 1/N Portfolio Strategy?, Review of Financial Studies, 22(5), 1915-1953, 2009.
100. V. DeMiguel, Y. Plyakha, R. Uppal, G. Vilkov, Improving Portfolio Selection using Option-
Implied Volatility and Skewness, Forthcoming in Journal of Financial and Quantitative
Analysis, 2010.
101. G. De Nard, O. Ledoit, and M. Wolf, Factor Models for Portfolio Selection in Large
Dimensions: The Good, the Better and the Ugly, Working Paper No. 290, 2018.
102. M. L. de Prado, Building Diversified Portfolios that Outperform Out-of-Sample, ssrn.com, May, 2016.
103. L. Deville, Exchange Traded Funds: History, Trading, and Research, Handbook of Finan-
cial Engineering, Zopounidis, Doumpos and Pardalos (eds)., 67-99, 2007.
104. K. Daniel and T. Moskowitz, Momentum Crashes, The Q-Group: Fall Seminar, 2012.
105. K. Daniel and S. Titman, Evidence on the Characteristic of Cross Sectional Variation in
Stock Returns, Journal of Finance 55 (1), 380-406, 1997.
106. Deutsche Bank, Equity Risk Premia, Deutsche Bank London, February, 2015.
107. Deutsche Bank, A New Asset Allocation Paradigm, Deutsche Bank London, July, 2012.
108. F.X. Diebold, A. Hickman, A. Inoue, and T. Schuermann, Converting 1-Day Volatility to
h-Day Volatility: Scaling by Root-h is Worse than You Think, Risk, 11, 104-107, 1998.
109. D. Dobi and M. Avellaneda, Structural Slippage of Leveraged ETFs, Preprint NYU, 2012.
110. J. Dow and S. R. d. C. Werlang, Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio, Econometrica, Vol. 60, No. 1, 197 - 204, 1992.
111. M. Dudler, B. Gmür and S. Malamud, Risk-Adjusted Time Series Momentum, Working
Paper, 2014.
112. S. Duivestein, M. van Doorn, T. van manen, J. Bloem and E. van Ommeren, Design to
Disrupt, Blockchain: Cryptoplatform for a Frictionless Economy, SogetiLabs, 2016.
113. E. Van Duuren, A. Plantinga and B. Scholtens, ESG integration and the investment man-
agement process: Fundamental investing reinvented. Journal of Business Ethics, 138(3),
525-533, 2016.
114. F.R. Edwards and M.O. Caglayan, Hedge Fund Performance and Manager Skill, ssrn.com, 2011.
115. EFAMA, European Fund and Asset Management Association, Annual Figure 2013, 2014.
116. EFAMA, European Fund and Asset Management Association, Annual Figure 2017, 2018.
117. D. Ellsberg, Risk, Ambiguity, and the Savage Axioms, Quarterly Journal of Economics,
75, 643-669, 1961.
118. E.J. Elton and M. J. Gruber, Risk Reduction and Portfolio Size: An Analytical Solution,
Journal of Business: 415-437, 1977.
119. Ernst & Young, What's new? Innovation for Asset Management, 2012 Survey, 2012.
120. Ethereum, www.ethereum.org, 2016.
121. ETF Staff, A Short Course in Currency Overlay, etf.com, April, 1999.
122. I. Eyal and E. G. Sirer, Majority is not Enough: Bitcoin Mining is Vulnerable, International
Conference on Financial Cryptography and Data Security. Springer Berlin Heidelberg,
2014.
123. F. Fabozzi, R. J. Shiller, and R. Tunaru, Hedging Real-Estate Risk, working paper 09-12,
Yale International Center for Finance, 2009.
124. M. Faber, A Quantitative Approach to Tactical Asset Allocation. Journal of Wealth
Management 9 (4), 69 - 79, 2007.
125. E.F. Fama, The Behavior of Stock Market Prices, Journal of Business, 38, 34-101, 1965.
126. E.F. Fama, Efficient Capital Markets: A Review of Theory and Empirical Work, Journal of Finance 25, 383 - 417, 1970.
127. E.F. Fama, Efficient Markets: II, Journal of Finance, 46(5), 1575-1618, 1991.
128. E. F. Fama and J. D. MacBeth, Risk, Return, and Equilibrium: Empirical Tests, Journal
of political economy, 81(3), 607-636, 1973.
129. E.F. Fama and K. R. French, Permanent and Temporary Components of Stock Prices, Journal of Political Economy 96(2), 246 - 267, 1988.
130. E.F. Fama and K.R. French, Disagreement, Tastes, and Asset Prices, Journal of Financial
Economics 83 (3), 667-89, 2007.
131. E.F. Fama and K.R. French, A Five-Factor Asset Pricing Model, Journal of Financial
Economics, 116, 1-22, 2015.
132. B. Fastrich, S. Paterlini and P. Winker, Constructing Optimal Sparse Portfolios Using
Regularization Methods, ssrn.com, 2013.
133. J. D. Fisher, D.M. Geltner, and R.B. Webb, Value Indices of Commercial Real Estate: A Comparison of Index Construction Methods, The Journal of Real Estate Finance and Economics, 9(2), 137-164, 1994.
134. T. Fletcher, Machine Learning for Financial Market Prediction, PhD Thesis University
College London, 2012.
135. A. Frazzini and L. H. Pedersen, Betting Against Beta, Journal of Financial Economics
111.1, 1-25, 2014.
136. G. Frahm and C. Memmel, Dominating estimators for minimum-variance portfolios. Jour-
nal of Econometrics, 159(2), 289-302, 2010.
137. P. Franco, Understanding Bitcoin: Cryptography, Engineering and Economics, John Wiley & Sons, 2014.
138. J. Freire, Massive Data Analysis: Course Overview, NYU School of Engineering, 2015.
139. C.B. Frey and M.A. Osborne, The Future of Employment: How Susceptible are Jobs to Computerisation?, Oxford, September, 2013.
140. W. Fung, D.A. Hsieh, N.Y. Naik and R. Ramadorai, Hedge Funds: Performance, Risk,
and Capital Formation, The Journal of Finance, 63(4), 1777-1803, 2008.
141. W. Fung and D.A. Hsieh, Empirical Characteristics of Dynamic Trading Strategies: The Case of Hedge Funds, Review of Financial Studies, 10(2), 275-302, 1997.
142. W. Gale and R. Levine, Financial Literacy: What Works? How Could it be More Effective?, Financial Security Project, Boston College, 2011.
143. J. Gatheral, Random Matrix Theory and Covariance Estimation, New York, October 3,
2008.
144. M. Gao and J. Huang, Capitalizing on Capitol Hill: Informed Trading by Hedge Fund
Managers, In Fifth Singapore International Conference on Finance, 2011.
145. D.M. Geltner, N. G. Miller, J. Clayton, and P. Eichholtz, Commercial real estate analysis
and investments (Vol. 1, p. 642). Cincinnati, OH: South-western, 2001.
146. C. R. Genovese, A Tutorial on False Discovery Control, Carnegie Mellon University, 2004.
147. D.M. Geltner and J. Fisher, Pricing and Index Considerations in Commercial Real Estate Derivatives, Journal of Portfolio Management, Special Issue: Real Estate, 1 - 21, 2007.
148. E. Gerbl, Robo-Advisors. Kampf um das grosse Geld, Bilanz, 22.10.2019.
149. M. Getmansky, B. Liang, C. Schwarz and R. Wermers, Share Restrictions and Investor
Flows in the Hedge Fund Industry, Working Paper, University of Massachusetts, Amherst,
2015.
150. M. Getmansky, M.P. Lee, and A. Lo, Hedge Funds: A Dynamic Industry In Transition,
NBER, 2015.
151. G. Gigerenzer and D.G. Goldstein, Reasoning the Fast and Frugal Way: Models of Bounded Rationality, in Heuristics: The Foundations of Adaptive Behavior, G. Gigerenzer, R. Hertwig, and T. Pachur (eds.), New York: Oxford University Press, 31-57, 2011.
152. C. Gini, Measurement of Inequality of Incomes, The Economic Journal, 124-126, 1921.
153. P.W. Glimcher and E. Fehr (eds.), Neuroeconomics: Decision Making and the Brain, Academic Press, 2013.
154. Global Sustainable Investment Alliance, 2016 Global Sustainable Investment Review, GSIA
Report, March, 2017.
155. W.N. Goetzmann, J.E. Ingersoll and S.A. Ross, High-water Marks and Hedge Fund Man-
agement Contracts, Journal of Finance 58, 1685 - 1717, 2003.
156. W.N. Goetzmann and A. Kumar, Equity Portfolio Diversification, Review of Finance, Vol. 12, No. 3, 433 - 463, 2008.
157. W.N. Goetzmann and K. Rouwenhorst, The History of Financial Innovation, Carbon Finance Speaker Series at Yale, 2007.
178. S. Hayley, Diversification Returns, Rebalancing Returns and Volatility Pumping, City University London, 2015.
179. J.M. Griffin, Are the Fama and French Factors Global or Country Specific?, Review of Financial Studies, 15(3), 783-803, 2002.
180. S. Gu, B. Kelly and D. Xiu, Empirical Asset Pricing via Machine Learning, Booth School of Business University of Chicago, July 21, 2018.
181. E. Hazan, Theoretical Machine Learning, Princeton University, 2017.
182. G. He and R. Litterman, The Intuition Behind Black-Litterman Model Portfolios, Goldman
Sachs Asset Management Working paper, 1999.
183. R.D. Henriksson and R.C. Merton, On Market Timing and Investment Performance. II. Statistical Procedures for Evaluating Forecasting Skills, Journal of Business, 513-533, 1981.
184. O.C. Herfindahl, Concentration in the Steel Industry, Diss. Columbia University, 1950.
185. U. Herold, Portfolio Construction with Qualitative Forecasts, Journal of Portfolio Man-
agement, Fall 2003, 61-72, 2003.
186. E. Hjalmarsson, Portfolio Diversification Across Characteristics, The Journal of Investing, Vol. 20, No. 4, 2011.
187. S. Holden and J. VanDerhei, 401(k) Plan Asset Allocation, Account Balances, and Loan Activity in 2003, Investment Company Institute, Perspective, Vol. 6, No. 1, 2004.
188. K. Hou, C. Xue, and L. Zhang, Replicating Anomalies, NBER Working Paper No. w23394, National Bureau of Economic Research, 2017.
189. H. Hong and M. Kacperczyk, The Price of Sin: The Effects of Social Norms on Markets, Journal of Financial Economics, 93(1), 15-36, 2009.
190. G. Huberman and Z. Wang, Arbitrage Pricing Theory, Federal Reserve Bank of New York Staff Reports, Staff Report No. 216, 2005.
191. J. Huij and M. Verbeek, On The Use of Multifactor Models to Evaluate Mutual Fund
Performance, Financial Management, 38(1), 75-102, 2009.
192. M. Hulbert, The Prescient are Few, New York Times, July 13, 2008.
193. R.G. Ibbotson, P. Chen and K.X. Zhu, The ABCs of Hedge Funds: Alphas, Betas, and
Costs, Financial Analysts Journal, 67(1), 15-25, 2011.
194. T. Idzorek, A Step-By-Step Guide to the Black-Litterman Model, Incorporating User-Specified Confidence Levels, Working paper, 2005.
195. T. Idzorek, and M. Kowara, Factor-Based Asset Allocation vs. Asset-Class-Based Asset
Allocation, Financial Analysts Journal, Vol. 69 (3), 2013.
196. A. Ilmanen, Expected Returns: An Investor's Guide to Harvesting Market Rewards, Wiley
Finance, 2011.
197. A. Ilmanen and J. Kizer, The Death of Diversification Has Been Greatly Exaggerated, The Journal of Portfolio Management, Vol. 38, No. 3, 2012.
198. Investment Company Institute, Profile of Mutual Fund Shareholders, 2014, ICI Research Report, 2014.
220. W. Kinlaw, M. Kritzman, and D. Turkington, The Divergence of High- and Low-Frequency
Estimation: Causes and Consequences, The Journal of Portfolio Management. Special
40th Anniversary Issue. 2014.
221. W. Kinlaw, M. Kritzman, and D. Turkington, The Divergence of High- and Low-Frequency
Estimation: Implications for Performance Measurement, The Journal of Portfolio Man-
agement, 2015.
222. F. Knight, Risk, Uncertainty, and Profit, New York: Houghton Mifflin, 1921.
223. M.P. Kritzman, Puzzles of Finance: Six Practical Problems and Their Remarkable Solu-
tions, John Wiley, New York, NY, 2000.
224. R. Kunz, Asset Management, DAS in Banking and Finance, SFI, 2014.
225. Y.K. Kwok, Lecture Notes, University of Hong Kong, 2010.
226. C.H. Lanter, Institutional Portfolio Management, Swiss Finance Institute, Asset Manage-
ment Program, 2015.
227. B. Lawler, B. Mossmann, P. Nolan, and A. Ang, Factors and Advisor Portfolios, preprint
SSRN, July 15, 2019.
228. O. Ledoit and M. Wolf, Improved Estimation of the Covariance Matrix of Stock Returns
with an Application to Portfolio Selection, Journal of Empirical Finance, 10(5), 603-621,
2003.
229. O. Ledoit and M. Wolf, The Power of (Non-)Linear Shrinking: A Review and Guide to
Covariance Matrix Estimation, Working Paper University of Zurich No. 323, 2019.
230. W. Lee, Advanced Theory and Methodology of Tactical Asset Allocation, Duke University,
2000.
231. W. Lee and D.Y. Lam, Implementing Optimal Risk Budgeting, The Journal of Portfolio
Management, 28, 1, 73-80, 2001.
232. O. Ledoit and M. Wolf, A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices, Journal of Multivariate Analysis, 88(2), 365-411, 2004.
233. O. Ledoit and M. Wolf, Nonlinear Shrinkage of the Covariance Matrix for Portfolio Selection: Markowitz Meets Goldilocks, Review of Financial Studies, Vol. 30, 2018.
234. B. Lehmann and D.M. Modest, Mutual Fund Performance Evaluation: A Comparison of Benchmarks and Benchmark Comparisons, Journal of Finance, 233-265, June 1987.
235. M. Leippold, Resampling and Robust Portfolio Optimization, Lecture Notes University of
Zurich, 2010.
236. M. Leippold, Asset Management, Lecture Notes University of Zurich, 2011.
237. M. Leippold and R. Rüegg, Fifty Shades of Active and Index Alpha, ssrn.com, 2018.
238. M. Leippold and R. Rüegg, Fama-French Factor Timing: The Long-Only Integrated Approach, University of Zurich, June 29, 2019.
239. E. Levina and R. Vershynin, Partial Estimation of Covariance Matrices, Probability Theory and Related Fields, 153(3-4), 405-419, 2012.
240. S. F. LeRoy and J. Werner, Principles of Financial Economics, Lecture Notes, UC Santa
Barbara and U Minnesota, 2000.
241. J. Lewellen, S. Nagel and J. Shanken, A Skeptical Appraisal of Asset Pricing Tests, Journal of Financial Economics 96, 175-194, 2010.
242. Y. Lewenberg, Y. Bachrach, Y. Sompolinsky, A. Zohar and J. Rosenschein, Bitcoin Mining
Pools: A Cooperative Game Theoretic Analysis, Proceedings of the 2015 International
Conference on Autonomous Agents and Multiagent Systems. International Foundation for
Autonomous Agents and Multiagent Systems, 2015.
243. H. Li, X. Zhang and R. Zhao, Investing in Talents: Manager Characteristics and Hedge
Fund Performance, Journal of Financial and Quantitative Analysis, 46(01), 59-82, 2011.
244. B. Liang, Hedge Funds: The Living and the Dead. Journal of Financial and Quantitative
Analysis, 35(03), 309-326, 2000.
245. C.-Y. Lin, Big Data Analytics, Lecture Notes, Columbia University, 2015.
246. A. Lo, Data-Snooping Biases in Financial Analysis, AIMR Conference Proceedings, Vol. 1994, No. 9, Association for Investment Management and Research, 1994.
247. A. Lo, The Statistics of Sharpe Ratios, Financial Analysts Journal, (58)4, 2002.
248. A. Lo, Efficient Markets Hypothesis, The New Palgrave: A Dictionary of Economics, L. Blume, S. Durlauf, eds., 2nd Edition, Palgrave Macmillan Ltd., 2007.
249. D. Luenberger, Projection Pricing, Stanford University, researchgate.net, 2014.
250. F. Maccheroni, M. Marinacci and D. Ruffino, Alpha as Ambiguity: Robust Mean-Variance Portfolio Analysis, Econometrica, Volume 81, Issue 3, 1075-1113, May, 2013.
251. G. Magnus, The Age of Ageing: Global Demographics, Destinies, and Coping Mechanisms,
First webcast: The Conference Board, 2013.
252. D. Mahringer, W. Pohl and P. Vanini, Structured Products: Performance, Costs and
Investments, SFI White Papers, 2015.
253. S. Maillard, T. Roncalli and J. Teiletche, On the Properties of Equally-Weighted Risk
Contributions Portfolios, ssrn.com 1271972, 2008.
254. B.G. Malkiel, The Efficient Market Hypothesis and Its Critics, Journal of Economic Perspectives, 59-82, 2003.
255. B.G. Malkiel and A. Saha, Hedge Funds: Risk and Return, Financial Analysts Journal, 61(6), 80-88, 2005.
256. L. Martellini and V. Milhau, Factor Investing: A Welfare-Improving New Investment
Paradigm or Yet Another Marketing Fad? EDHEC-Risk Institute Publication, July, 2015.
257. W. Marty, Portfolio Analytics. An Introduction to Return and Risk Measurement, Springer
Texts in Business and Economics (2nd edition), Springer Berlin, 2015.
258. J. F. May, World Population Policies: Their Origin, Evolution, and Impact, Canadian
Studies in Population 39, No. 1 - 2 (Spring/Summer 2012):125 - 34, Dordrecht: Springer,
2012.
259. McKinsey & Company, Looking Ahead in Turbulent Times - Strategic Imperatives for Asset Managers Going Forward, SFI Asset Management Education, R. Matthias, 2015.
260. McKinsey & Company, State of the Industry 2014/15 - A Perspective on Global Asset Management, SFI Asset Management Education, R. Matthias, 2015.
261. P. Mehta, M. Bukov, C.-H. Wang, A.G.R. Day, C. Richardson, C.K. Fisher and D.J. Schwab, A High-Bias, Low-Variance Introduction to Machine Learning for Physicists, Physics Reports, March, 2019.
262. Melbourne Mercer Global Pension Index, Report, 2015.
263. The Memo, Looking for a UK business loan? Amazon might be the answer, 2015.
264. The Millennial Disruption Index, Viacom Media Networks, 2013.
265. MIT, Applied Macro- and International Economics II, Spring 2016, MIT OpenCourseWare,
2016.
266. E. Moritz, The Big Four - werden Amazon, Google, Apple und Facebook die besseren
Banken?, Finance News, 2016.
267. R.C. Merton, Lifetime Portfolio Selection under Uncertainty: The Continuous-Time Case, The Review of Economics and Statistics, 51(3), 247-257, 1969.
268. R.C. Merton, Optimum Consumption and Portfolio Rules in a Continuous-Time Model, Journal of Economic Theory, 3(4), 373-413, 1971.
269. R.C. Merton, An Intertemporal Capital Asset Pricing Model, Econometrica: Journal of the Econometric Society, 867-887, 1973.
270. R.C. Merton, On the Pricing of Corporate Debt: The Risk Structure of Interest Rates, Journal of Finance, 29, 449-470, 1974.
271. A. Meucci, Black-Litterman Approach, Encyclopedia of Quantitative Finance, Wiley Finance, 2010.
272. A. Meucci, Fully Flexible Views: Theory and Practice, ssrn.com library, 2010b.
273. The Millennial Disruption Index, Viacom Media Networks, 2013.
274. P. Milnes, The Top 50 Hedge Funds in the World, hedgethink.com, 2014.
275. T.J. Moskowitz, Y.H. Ooi, and L.H. Pedersen, Time Series Momentum, Journal of Financial Economics, 104(2), 228-250, 2012.
276. J. Müller, Steht uns die Liberalisierung der globalen Währungsordnung bevor?, Presentation SFIRT, November, Zurich, 2019.
277. A.H. Munnell, M.S. Rutledge and A. Webb, Are Retirees Falling Short? Reconciling the Conflicting Evidence, CRR WP 16, November 2014.
278. A.H. Munnell and M. Soto, State and Local Pensions are Different from Private Plans, Center for Retirement Research at Boston College, Number 1, November, 2007.
279. S. Nakamoto, Bitcoin: A Peer-to-Peer Electronic Cash System, 2008.
280. NHGRI Genome Sequencing Program (GSP), www.genome.gov/sequencingcostsdata, 2017.
281. S.V. Nieuwerburgh and R.S.J. Koijen, Financial Economics, Return Predictability, and Market Efficiency, University of Tilburg, Preprint, 2007.
282. R. Novy-Marx and J. D. Rauh, Policy Options for State Pension Systems and their Impact
on Plan Liabilities, Journal of Pension Economics and Finance 10.02: 173-194, 2011.
283. OECD Science, Technology and Industry Scoreboard: Innovation for Growth, Paris, 2013.
284. S. Pafka and I. Kondor, Estimated Correlation Matrices and Portfolio Optimization, Phys-
ica A, 343, 623-634, 2004.
285. S. Pal and T.K.L. Wong, Energy, Entropy, and Arbitrage, arXiv preprint arXiv:1308.5376, 2013.
286. A. Patton, T. Ramadorai and M. Streatfield, Change You Can Believe In? Hedge Fund Data Revisions, Journal of Finance, 2013.
287. L. Pastor, R.F. Stambaugh and L.A. Taylor, Scale and Skill in Active Management, Journal of Financial Economics, 2014.
288. L. Pastor and R.F. Stambaugh, Comparing Asset Pricing Models: An Investment Perspective, Journal of Financial Economics, 56, 335-381, 2000.
289. A.F. Perold and W.F. Sharpe, Dynamic Strategies for Asset Allocation, Financial Analysts Journal, Jan, 16-27, 1988.
290. L.H. Pedersen, Sharpening the Arithmetic of Active Management, Financial Analysts Journal, 74(1), 21-36, 2018.
291. G. W. Peters, E. Panayi and A. Chapelle, Trends in Crypto-Currencies and Blockchain
Technologies: A Monetary Theory and Regulation Perspective, arXiv preprint, 2015.
292. S. Perrin and T. Roncalli, Machine Learning Optimization Algorithms and Portfolio Allo-
cation, preprint, ssrn.com, 2019.
293. E. Podkaminer, Risk Factors as Building Blocks for Portfolio Diversication: The Chem-
istry of Asset Allocation, Investment Risk and Performance, CFA Institute, 2013.
294. PricewaterhouseCoopers, Asset Management 2020: A Brave New World, 2014.
295. PricewaterhouseCoopers, Asset & Wealth Management Revolution: Embracing Exponential Change, 2018.
296. E. Qian, A Mathematical and Empirical Analysis of Rebalancing Alpha, www.ssrn.com, 2014.
297. N. Rab and R. Warnung, Scaling Portfolio Volatility and Calculating Risk Contributions in the Presence of Serial Cross-Correlations, arXiv preprint, q-fin.RM, 2011.
298. M. Rabin, Risk Aversion and Expected-Utility Theory: A Calibration Theorem, Econo-
metrica 68.5, 1281-1292, 2000.
299. T. Ramadorai, Capacity Constraints, Investor Information, and Hedge Fund Returns,
Journal of Financial Economics, 107(2), 401-416, 2013.
300. S. Ramaswamy, Market Structures and Systemic Risks of Exchange-Traded Funds, BIS, 2011.
301. S.C. Rambaud, J.G. Perez, M.A. Granero and J.E. Segovia, Markowitz Model with Euclidean Vector Spaces, European Journal of Operational Research, 196, 1245-1248, 2009.
302. R. Rebonato and A. Denev, Portfolio Management under Stress: A Bayesian Net Approach to Coherent Asset Allocation, Cambridge University Press, Cambridge, 2013.
303. L. M. Rotando and E.O. Thorp, The Kelly Criterion and the Stock Market, The American
Mathematical Monthly, December, 1992.
304. J. Rifkin, The Zero Marginal Cost Society: The Internet of Things, the Collaborative
Commons, and the Eclipse of Capitalism, Palgrave Macmillan Trade, 2014.
305. C. O. Roche, Understanding Modern Portfolio Construction, ssrn.com working paper,
2016.
306. P. Rohner, Seminar Asset Management, University of Zurich, 2014.
307. R. Roll, A Critique of the Asset Pricing Theory's Tests, Journal of Financial Economics, 4, 129-176, 1977.
308. T. Roncalli, Introduction to Risk Parity and Budgeting, Chapman & Hall, Financial Math-
ematics Series, 2014.
309. T. Roncalli, How Machine Learning Can Improve Portfolio Allocation of Robo-Advisors,
swissQuant Conference, 2018.
310. S.A. Ross, The Arbitrage Theory of Capital Asset Pricing, Journal of Economic Theory, 13, 341 - 360, 1976.
311. S. Satchell and A. Scowcroft, A Demystification of the Black-Litterman Model: Managing Quantitative and Traditional Portfolio Construction, Journal of Asset Management, Vol. 1, 2, 138-150, 2000.
312. L.J. Savage, The Foundations of Statistics, Wiley, New York, 1954.
313. W.F. Sharpe, Capital Asset Prices: A Theory of Market Equilibrium under Conditions of Risk, Journal of Finance, 19(3), 425-442, 1964.
314. B. Scherer, Portfolio Construction and Risk Budgeting, Third Edition, Risk Books, 2007.
315. SEC, Mutual Funds: A Guide for Investors, New York, 2008.
316. S. Schaefer, Factor Investing, Lecture at SFI Annual Meeting, 2015.
317. P. Schneider, Generalized Risk Premia, Journal of Financial Economics, forthcoming, 2015.
318. P. Schneider, C. Wagner and J. Zechner, Low Risk Anomalies, Preprint SFI, 2016.
319. C. Shimizu, H. Takatsuji, H. Ono, and K. Nishimura, Structural and temporal changes
in the housing market and hedonic housing price indices: A case of the previously owned
condominium market in the Tokyo metropolitan area. International Journal of Housing
Markets and Analysis, 3(4), 351-368, 2010.
320. J. Siegel, Stocks for the Long Run, McGraw-Hill, New York, NY, 1994.
321. S. Shalev-Shwartz, Introduction to Machine Learning, Lecture Notes, The Hebrew University of Jerusalem, 2016.
322. R.J. Shiller, The Uses of Volatility Measures in Assessing Market Efficiency, Journal of Finance, 36, 291 - 304, 1981.
323. R.J. Shiller, From Efficient Markets Theory to Behavioral Finance, Journal of Economic Perspectives, 17(1), 83 - 104, 2003.
324. R. J. Shiller, Speculative Asset Prices, Cowles Foundation Paper No. 1424, 2014.
325. R.J. Shiller, Market Efficiency and the Role of Finance in Society, Keynote Lecture, EFA 2014, Lugano, 2014.
326. R. J. Shiller and A.N. Weiss, Home Equity Insurance, The Journal of Real Estate Finance
and Economics, 19(1): 21-47, 1999.
327. M. Silver, How to Better Measure Hedonic Residential Property Price Indexes, IMF Working Paper, 2018.
328. A.J. Smola and B. Schölkopf, A Tutorial on Support Vector Regression, Statistics and Computing, 14(3), 199-222, 2004.
329. Y. Sompolinsky and A. Zohar, Secure High-Rate Transaction Processing in Bitcoin, In-
ternational Conference on Financial Cryptography and Data Security. Springer Berlin
Heidelberg, 2015.
330. State Street, The Folklore of Finance, Center for Applied Research, 2014.
331. G.V.G. Stevens, On the Inverse of the Covariance Matrix in Portfolio Analysis, The Journal
of Finance, Vol. 53(5), 1821-1827, 1998.
332. R. Sullivan, A. Timmermann, and H. White, Data-Snooping, Technical Trading Rule Performance, and the Bootstrap, The Journal of Finance, 54(5), 1647 - 1691, 1999.
333. M. Swan, Blockchain: Blueprint for a New Economy, O'Reilly Media, 2015.
334. swissQuant, Customer Retention, Big Data Analytics, 2017.
335. J. Syz, M. Salvi and P. Vanini, Property Derivatives and Index-Linked Mortgages, Journal
of Real Estate Finance and Economics, Vol. 36, No. 1, 2008.
336. J. Syz and P. Vanini, Real Estate, Swiss Finance Institute Annual Meeting, 2008.
337. N. Sullivan, A (Relatively Easy To Understand) Primer on Elliptic Curve Cryptography, Cloudflare blog, 2013.
338. N. Szabo, Formalizing and Securing Relationships on Public Networks, First Monday, 2(9),
1997.
339. P. Tasca, Economic Foundation of the Bitcoin Economy, University College London, Center
for Blockchain Technologies, Blockchain Workshop Zurich, 2016.
340. N. Taleb, The Black Swan. The Impact of the Highly Improbable. New York: Random
House, 2010.
341. J. Teiletche, Risk-Based Investing: Myths and Realities, CFA UK Masterclass, London
June 9th, 2015.
342. J. Teiletche, Active Risk-Based Investing, CQ Asia, Hong Kong, 2014.
343. M. Teo, The Liquidity Risk of Liquid Hedge Funds, Journal of Financial Economics, 100(1),
24-44, 2011.
344. J. Ter Horst and M. Verbeek, Fund Liquidation, Self-Selection, and Look-Ahead Bias in
the Hedge Fund Industry, Review of Finance, 11(4), 605-632, 2007.
345. J. Treynor and K. Mazuy, Can Mutual Funds Outguess the Market?, Harvard Business Review, 44(4), 131-136, 1966.
346. F. Trojani and P. Vanini, A Note on Robustness in Merton's Model of Intertemporal
Consumption and Portfolio Choice, Journal of Economic Dynamics and Control, Vol. 26,
No. 3, 423-435, 2002.
347. J. Tu and G. Zhou, Data-Generating Process Uncertainty: What Difference Does It Make in Portfolio Decisions?, Journal of Financial Economics, 72, 385-421, 2004.
348. S. Tilly and F. Triebel (eds.), Automobilindustrie 1945-2000, Oldenbourg Verlag, München, 2013.
349. UBS, Strategy and Regulation. Impact of Regulation on Strategy and Execution, SFI
Conference on Managing International Asset Management, N. Karrer, 2015.
350. UBS, Distribution Strategies in Action, SFI Conference on Managing International Asset
Management, A. Benz, 2015.
351. R. Vershynin, How Close is the Sample Covariance Matrix to the Actual Covariance Matrix?, Journal of Theoretical Probability, 25(3), 655-686, 2012.
352. The Millennial Disruption Index, Viacom Media Networks, 2013.
353. L. Vignola and P. Vanini, Optimal Decision-Making with Time Diversication, Review of
Finance, 6.1, 1-30, 2002.
354. I. Walter, The Asset Management Industry: Dynamics of Growth, Structure and Performance, edited by Michael Pinedo and Ingo Walter, 2013.
355. J.H. White, Volatility Harvesting: Extracting Return from Randomness, arXiv, November,
2015.
356. World Economic Forum, The Future of Long-term Investing, New York, 2011.
357. World Economic Forum, Future of Financial Services, New York, 2015.
358. World Economic Forum, Beyond Fintech: A Pragmatic Assessment Of Disruptive Potential
In Financial Services, New York, 2017.
359. A. Yeniay and A. Göktas, A Comparison of Partial Least Squares Regression with Other Prediction Methods, Journal of Mathematics and Statistics, Volume 31, 99-111, 2002.
360. A. Zellner and V.K. Chetty, Prediction and Decision Problems in Regression Models from the Bayesian Point of View, Journal of the American Statistical Association, 60, 608-616, 1965.
361. ZKB, Index Methoden, 2013.
362. H. Zou, The Adaptive LASSO and its Oracle Properties, Journal of the American Statistical Association, 101(476), 1418-1429, 2006.
363. G. Zyskind, N. Oz and A. Pentland, Enigma: Decentralized Computation Platform with
Guaranteed Privacy, arXiv preprint, 2015.
Index

Permissioned Protocol, 481
Active Investment and Benchmarking, 299
Active versus Passive
  Sharpe's Arithmetics, 67
Altcoins, 494
Alternative Investments (AIs)
  Insurance-Linked Investments, 102
Arithmetical Relative Return (ARR), 179
Asset Class
  Definition, 13
Asset Management Industry
  Wealth 2020, 17
Asset Management Overview, 14
Asset Pricing
  Absolute Pricing, 261, 413
  Fundamental Asset Pricing Equation, 263, 416
  General Equilibrium, 414
  Good and Bad Times, 264, 417
  Low Volatility Strategies, 282
  Multi Factor Models, 281
  Multi Period, 281
  What Happens if an Investment Strategy is Known to Everyone?, 285
Asset Pricing in Financial Markets, 193
Average Investment Capital (AIC), 182
Benchmark Return, 179
Benchmarking, 69
Beta and Volatility Based Low Risk Anomalies, 282
Beta Pricing Model, 260
Bias-Variance Trade-Off, 431
Bitcoin Protocol, 480
Bitcoin Security, 497
Black-Litterman Model, 329
Black-Scholes Equation, 222
Black-Scholes, Formula for Call, 218
Black-Scholes, Interpretation, 219
Black-Scholes, Interpretation No Arbitrage, 218
Brinson-Hood-Beebower (BHB) Effect, 179
Broken Covered Interest Parity (CIP), 206
Buy-and-hold, static, 164
Call Option, 195
Capital Weighted Index Funds, 91
Capital-Guaranteed Products (CP), 226
CAPM
  Appraisal (Information) Ratio, 376
  Assumption, 372
  Beta Pricing Model, 371
  CML and SML, 374
  Conditional CAPM, 380
Fundamental Law of Active Management, 406
Success of the Active Strategy, 402
TER and Performance, 85
UCITS, 86
FX Forward, 204
Game Theoretic Concept Blockchain, 484
Gamma, 220
General Linear Model, 460
Generalization Error, 430
Geometric Margin, 447
Global AM
  2014-2020, 74
  AM versus Trading, 75
  AM versus Wealth Management, 77
  Demand and Supply Side, 70
  Eurozone, 70
  Global Figures 2007-2014, 72
80 Strategies, 111
i all assets, 299
Incomplete Market, 185
Independent Sample Error, 369
Index Construction, 89
Index Funds and ETFs, 88
Index Sampling, 299
Information Coefficient (IC), 403
Information Ratio IR, 335, 403
Interest Rate Parity
  CIP, 205
  Covered, 204
  Trilemma, 206
  UIP, 206
  Uncovered, 204
Interest Rate Swaps (IRS), 153
Internal Rate of Return (IRR), 182
Interval Error, 369