SSRN 4709397
Abstract
Roger Clarke, PhD, is the former President of Ensign Peak Advisors. Harindra de Silva, PhD,
CFA is a Portfolio Manager at Allspring Global Investments. Steven Thorley, PhD, CFA is an
Emeritus Professor of Finance at the Marriott School of Management.
Individual stock characteristics for investable equity market factors are based on
accounting or statistical measures assigned to each stock. For example, the Small Size
characteristic is typically measured as the log of market capitalization with a negative sign to
capture smallness instead of largeness. The log transformation provides a more normal
distribution of characteristics across stocks but does not address the linearity of stock returns to
the size characteristic. The stock measure for another popular equity market factor, Value, is now
often the inverse P/E ratio or earnings yield. As in the older Fama and French (1993) book-to-
market measure of Value, the motivation for inverting market-to-book is the mathematical
properties of a ratio. Specifically, the P/E ratio for some stocks is undefined due to zero or negative
earnings, and equal changes in high versus low P/E ratios are not economically equivalent. But
beyond simplicity in modeling the return generating process, there is no reason to assume a linear
relationship between either earnings yield or book-to-market and realized security returns.
A dichotomous distinction between positive and negative price momentum has received
attention, but little research has been published about non-linear relationships between security
returns and the Momentum characteristic after controlling for other common factors like Value
and Profitability. For any given factor, does a stock with a scored characteristic of say 2.0 have
twice the average active return as a stock with a score of 1.0? Does a stock with a scored
characteristic of -1.0 have a negative active return of the same magnitude? De Boer (2020), Zhang
(2022), Bollerslev, Patton, and Quaedvlieg (2023), Kagkadis et al. (2023), and Didisheim et al.
(2024) provide background on recent findings of non-linear returns in investment management. This
study provides a systematic examination of non-linearities in the return-to-characteristic
relationship for the pure version of five well-known equity market factors.
Our research data set is the largest one-thousand U.S. stocks each month from 1964 to
2023, with a focus on the last twenty years, 2004 to 2023. Data on characteristics are gathered for
five popular equity market factors that have emerged over time: Value, Momentum, Small Size,
Low Beta, and Profitability. The start date and factor choices come from the availability of
comprehensive return and characteristic data in CRSP and Compustat for at least one thousand
individual stocks. We use trailing annual earnings yield for Value, the logged version of Carhart’s
The Value and Size factors have extensive research records in academic literature, as
introduced by Fama and French (1992), although we use the practitioner definition of earnings
yield for Value instead of book-to-market. The Momentum factor was introduced by Jegadeesh
(1990) and Jegadeesh and Titman (1993) with asymmetric properties discussed in Barroso and
Santa-Clara (2015) and the relationship to other factors explored by Ehsani and Linnainmaa
(2020). The Low Beta anomaly was implicit in early academic research on the CAPM and more
recently popularized by Frazzini and Pedersen (2014). We use Low Beta as a more precise and
portfolio-relevant characteristic of the “low volatility” factor. Novy-Marx (2013) introduced the
Profitability factor into academic literature. Quality as defined by many practitioners is shown by
Hsu, Kalesnik, and Kose (2019) to be primarily dependent on gross margin, so readers interested
in Quality can make inferences from the Profitability factor.
Basic equilibrium economics in Sharpe (1964) and Jensen, Black, and Scholes (1972)
initiated the belief that factors with positive market-relative performance must represent a reward
for taking on systematic risk. In contrast, most financial economists now believe that factors with
a significant positive long-term alpha are informational or behavioral anomalies that have not been or
cannot be arbitraged away; see, for example, Shleifer and Vishny (1997). Others worry that some of
the historical results are the outcome of data mining the factor “zoo” examined in Feng, Giglio, and
Xiu (2020), among others. We mitigate the problem of ex-post data mining by examining factors
that gained popularity prior to the turn of the century, and except for Value, by using the stock
characteristic definitions proposed by the original academic researchers. We focus on the most
recent twenty-year sample, 2004 to 2023, but also report on two earlier twenty-year periods that
show changes in the structure of the anomalies over time.
The five sets of scores in Equation 1, $s_{k,i}$, are standardized characteristics based on earnings yield
(i.e., inverse trailing P/E), log price momentum, negative log market capitalization, negative
trailing 36-month market beta, and prior-year gross-margin. The characteristics are cross-
sectionally standardized each month to weighted mean-zero unit-variance variables as described
in the Technical Appendix. For notational convenience, Equation 1 does not have time subscripts,
t, because the exposure data for each cross-sectional regression is taken at one point in time,
specifically the beginning of each month. One insight from weighted regressions is that the
composition of the securities in the portfolio that generate the factor return can be inferred from
the estimated coefficients. As shown in the Technical Appendix, the estimated coefficients in
Equation 1 represent active (i.e., market differential) returns to optimally constructed single-factor
portfolios. The factor portfolios are fully invested and primarily composed of long security
positions, in contrast to the long/short portfolios used to examine factors in many academic
studies.1
The five pure single-factor portfolios are neutralized with respect to linear and non-linear
exposure to the other four characteristics as discussed in Clarke, de Silva, and Thorley (2017), in
contrast to Fama-French style factors which are only partially neutralized to the size characteristic.
In practice, multi-factor strategies based on the pure single factor portfolios contain little shorting
due to offsets for each security between factors as well as reduced active risk devoted to any single
factor. Thus, multi-factor portfolios are either long-only or can be constrained to be long-only
1
The factor portfolios specified by the regression coefficients in Equation 1 have a one standard
deviation exposure to the factor of interest. The portfolios are commonly known as 120/20 long-
short where the amount of shorted security capitalization varies from about 10 to 30 percent.
Table 1 reports the results of Equation 1 applied to the 240 months from January 2004 to
December 2023 with market returns in excess of the contemporaneous risk-free rate (one month
T-bill) in the first column. The monthly returns are annualized in Table 1 by multiplying the mean
return by 12 and the return standard deviation by the square root of 12. For example, the first
column shows that the average excess market return was 9.33 percent over 20 years, with a
standard deviation of 15.01 percent, giving a Sharpe Ratio (mean divided by standard deviation)
of 0.622. The Sharpe Ratio entries in the other five columns are for active returns in contrast to
the more common Sharpe Ratio definition of average excess return over excess return standard
deviation. The pure active returns in the next five columns have a one-standard deviation exposure
to the factor of interest due to standardized scoring of the security characteristics in Equation 1,
resulting in variation in risk between the factors. For example, the active return standard deviation
of the Value portfolio is 2.56 percent while the Momentum portfolio active return standard
deviation is almost twice as large at 4.75 percent.
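The annualization convention described for Table 1 can be sketched in a few lines of Python. The function name `annualize` and the twelve example returns are illustrative assumptions rather than the study's data, and the use of the sample standard deviation is also an assumption; only the 9.33 and 15.01 percent figures come from the text.

```python
import math
import statistics

def annualize(monthly_returns):
    """Annualize monthly returns: mean times 12, standard deviation times sqrt(12)."""
    mean_annual = statistics.fmean(monthly_returns) * 12
    # ASSUMPTION: sample (n - 1) standard deviation; the text does not specify.
    std_annual = statistics.stdev(monthly_returns) * math.sqrt(12)
    return mean_annual, std_annual, mean_annual / std_annual

# Check of the market Sharpe Ratio quoted in the text for Table 1.
market_sharpe = 9.33 / 15.01
print(round(market_sharpe, 3))  # 0.622

# Hypothetical monthly excess returns in percent, for illustration only.
example = [1.2, -0.8, 2.1, 0.4, -1.5, 0.9, 1.7, -0.3, 0.6, -2.0, 1.1, 0.8]
mean_a, std_a, sharpe = annualize(example)
```

The same function applies to the active return columns, where the ratio of mean to standard deviation is computed on active rather than excess returns.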
2
Panel regressions that simultaneously include all 20*12*1000 = 240 thousand observations using
market capitalization weights have similar results for average active returns with larger t-statistics.
However, the estimated coefficients do not equate to identifiable portfolios and the panel
regressions do not provide a time-series active risk parameter.
The realized active market beta of the pure factor portfolios is reported in the fourth row
of Table 1. For example, the Value portfolio’s active beta of 0.01 is almost zero, meaning the
portfolio’s total realized market beta is 1.00 + 0.01 = 1.01, slightly greater than one. The other
factors also have total betas close to one, except for the Low Beta portfolio which is by design
tilted towards low beta stocks. The market beta of the Low Beta portfolio in Table 1 is 1.00 - 0.25
= 0.75, and the portfolio’s alpha, calculated by active return minus active beta times market return,
is 1.20 percent.3 The other alpha calculations in Table 1 are close to the mean active return for
each factor. For example, the Value alpha is -22 basis points, slightly under the -14 basis point
mean return due to a market beta that is slightly higher than one.
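As a rough check of the alpha arithmetic above, the following sketch applies "active return minus active beta times market return" to the rounded figures quoted in the text. The function name is hypothetical, and because the inputs are rounded to two decimals the result only approximately matches the reported -22 basis points.

```python
def alpha_from_active(active_return, active_beta, market_excess_return):
    """Alpha = active return minus active beta times the market excess return."""
    return active_return - active_beta * market_excess_return

market = 9.33  # annualized market excess return in percent, as quoted for Table 1

# Value: mean active return of -14 basis points and an active beta of about 0.01.
value_alpha = alpha_from_active(-0.14, 0.01, market)
print(round(value_alpha, 2))  # -0.23, close to the reported -0.22 given rounded inputs

# Low Beta: total market beta is one plus the active beta of -0.25.
low_beta_total_beta = 1.00 + (-0.25)
```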
The active risks reported in the second to the last row of Table 1 use the realized market
betas, in other words calculated as the standard deviation of the alpha return. The Information
Ratio for each factor in the final row is alpha divided by active risk. The IR for the Profitability
portfolio is 0.592, about twice the magnitude of the Momentum and Low Beta portfolio IRs of
0.301 and 0.280, respectively. Like the Value portfolio, the alpha of the linear Small Size portfolio
has been slightly negative over the last twenty years, leading to a negative IR. Information Ratios
3
The security return model $r_i = \alpha_i + \beta_i r_M$ and portfolio active weights of $w_{M,i} s_i$ give the
portfolio's alpha as $\alpha_P = r_P - \beta_P r_M$, where $\beta_P = \sum_{i=1}^{N} w_{M,i} s_i \beta_i - 1$ is the portfolio's active beta.
$$IR_P = \sqrt{\sum_{j=1}^{K} IR_j^2} \qquad (2)$$
of 0.729 because of the non-zero correlations between the realized returns reported in Table 2. For
example, Value has a realized return correlation of -0.271 to Profitability, even though both
portfolios come from the same linear regression.
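The square-root-of-sum-of-squares rule in Equation 2 can be checked against the linear Information Ratios quoted in the text. This is a sketch under stated assumptions: the Value IR is not quoted in this section, so it is approximated here from the -22 basis point alpha and the 2.56 percent active return standard deviation, and the function name is hypothetical.

```python
import math

def combined_ir(irs):
    """Square root of the sum of squared Information Ratios (uncorrelated case)."""
    return math.sqrt(sum(ir ** 2 for ir in irs))

# Linear pure-factor IRs quoted in the text for 2004 to 2023.
quoted = {"Profitability": 0.592, "Momentum": 0.301, "Low Beta": 0.280, "Small Size": -0.068}

# ASSUMPTION: the Value IR is approximated as alpha over active return risk.
value_ir_guess = -0.22 / 2.56

approx = combined_ir(list(quoted.values()) + [value_ir_guess])
print(round(approx, 3))  # 0.729
```

Under these assumptions the rule gives about 0.729, the figure that appears in the text. The rule ignores the cross-correlations in Table 2, which is why it is only an approximation of the combined portfolio IR.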
The time-series correlations between returns are lower than those for non-pure factor
portfolios, produced by Equation 1 with just one score set on the right-hand side, but they are not
zero. For example, the return correlation (not shown in Table 2) between non-pure Value and non-
where $s_i$ is the scored characteristic, and $s_i^2$ is based on the squared characteristic. As described
in the Technical Appendix, the squared characteristic is analytically orthogonalized to the linear
score, making them cross-sectionally uncorrelated. The cubed characteristic, $s_i^3$, is jointly
orthogonalized to both the linear score and squared characteristics. The squared and cubed terms
in Equation 3 are rescored after orthogonalization to a weighted mean of zero and weighted
variance of one. The orthogonalization process adds precision to the examination of non-linear
return patterns using cubic regressions because the right-hand side variables would otherwise be
highly correlated, leading to larger correlations in realized portfolio returns. Specifically, the three slope
coefficients on the linear, squared, and cubed terms in Equation 3 are equivalently estimated by three individual univariate
regressions. As with Equation 1, an intercept term, if included, is exactly zero, or equal to the
market portfolio return if total security returns are used on the left-hand side. As with the
regression using just linear characteristics in Equation 1, the coefficients from the 15-variable
regression represent portfolio returns with the security weights in each portfolio being a by-
product.
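The orthogonalization and rescoring steps can be illustrated with a small NumPy sketch. It is not the analytic procedure of the Technical Appendix: as an assumption, it uses weighted least-squares residuals to remove the lower-order terms, and the weights and raw characteristic are simulated. The helper names `wstandardize` and `worthogonalize` are hypothetical.

```python
import numpy as np

def wstandardize(x, w):
    """Rescore x to a weighted mean of zero and a weighted variance of one."""
    x = x - np.sum(w * x)
    return x / np.sqrt(np.sum(w * x ** 2))

def worthogonalize(y, basis, w):
    """Residual of y after a weighted regression on a constant and the basis columns."""
    X = np.column_stack([np.ones_like(y)] + list(basis))
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return y - X @ coef

rng = np.random.default_rng(0)
n = 1000
w = rng.lognormal(size=n)
w = w / w.sum()                      # hypothetical market weights summing to one
raw = rng.standard_normal(n)         # hypothetical raw characteristic

s1 = wstandardize(raw, w)                                    # linear score
s2 = wstandardize(worthogonalize(s1 ** 2, [s1], w), w)       # squared, orthogonal to s1
s3 = wstandardize(worthogonalize(s1 ** 3, [s1, s2], w), w)   # cubed, orthogonal to both

S = np.column_stack([s1, s2, s3])
gram = S.T @ (w[:, None] * S)        # weighted cross-products, close to the identity
```

Because the weighted cross-product matrix is the identity, a joint weighted regression on the three columns gives the same slopes as three separate univariate regressions, which is the equivalence noted in the text.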
Table 3 reports on a regression like Equation 3 with 15 right-hand side variables, where
the five scores are included along with their orthogonalized squared and cubed terms. Table 3
does not include return means and standard deviations to save space but reports on the other rows
in Table 1. For example, the pure linear score coefficient for the Value portfolio has an average
The Low Beta factor in Table 3 has a very large weight on the cubed portfolio of -143
percent, offset by a 50 percent weight on the score portfolio, yielding a combined non-linear factor
portfolio IR of 0.457. The non-linear Profitability portfolio combines -88 percent of the squared
portfolio and -36 percent of the cubed portfolio, offset by 24 percent on the linear score portfolio,
leading to a large combined non-linear Profitability portfolio IR of 0.812. In other words, the non-
linear Profitability portfolio has an extraordinary alpha of 2.45 percent over the last twenty years
at an active risk of just over three percent.
Statistical tests of non-linearity in factor returns can be based on t-statistics for either the
squared or cubed annualized alpha for each factor, computed as IR times the square root of 20
years. For example, the t-statistic for the cubed Momentum portfolio is $0.544 \times 20^{1/2} = 2.4$, and the
t-statistic for the squared Profitability portfolio with a sign change is $0.723 \times 20^{1/2} = 3.2$. But a
complete test of non-linearity is based on the improvement in non-linear versus linear IR for each
factor
$$t\text{-statistic} = \sqrt{\left(IR_{non}^2 - IR_{lin}^2\right) N} \qquad (4)$$
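The two t-statistic calculations can be reproduced with a short sketch. The first function follows the "IR times the square root of 20 years" rule stated in the text. The second is an assumed reading of Equation 4, with the square root taken over the difference in squared IRs times N, so that it has the same units as the first. Function names are hypothetical.

```python
import math

YEARS = 20  # length of the 2004 to 2023 sample

def t_from_ir(ir, years=YEARS):
    """t-statistic of an annualized alpha: Information Ratio times sqrt(years)."""
    return ir * math.sqrt(years)

def t_nonlinear_improvement(ir_non, ir_lin, years=YEARS):
    """ASSUMED reading of Equation 4: sqrt((IR_non^2 - IR_lin^2) * N)."""
    return math.sqrt((ir_non ** 2 - ir_lin ** 2) * years)

print(round(t_from_ir(0.544), 1))  # 2.4, cubed Momentum
print(round(t_from_ir(0.723), 1))  # 3.2, squared Profitability with a sign change

# Profitability: non-linear combined IR of 0.812 versus linear IR of 0.592.
profitability_t = t_nonlinear_improvement(0.812, 0.592)
```

Under this reading the Profitability improvement has a t-statistic of roughly 2.5. That number is a by-product of the assumed formula, not a value reported in the text.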
Table 4 reports time-series correlations between 15 portfolio returns, i.e., the coefficients in
Equation 3 with 15 rather than 3 right-hand side variables. As in Table 2, correlations with
[Figure 1: factor portfolio performance plotted against Active Risk (horizontal axis, 0.0% to 7.0%). Plot labels: Five-Factor, Profitability, Small Size, Value, Linear Low Beta, Linear Value, Linear Small Size.]
The results of the “market plus 15” Fama-Macbeth monthly regressions can be visualized
in several ways, but we start with the market relative performance of the optimal non-linear factor
portfolios. Figure 1 plots the performance of the five combined non-linear factor portfolios in
Small Size has an economically significant IR increase in Figure 1, going from -0.068 as
reported in Table 1, to a non-linear combined portfolio IR of 0.396 as reported in Table 3. The
Low Beta and Profitability non-linear factor portfolios have Information Ratios of 0.457 and 0.812,
respectively, both incrementally better than their linear counterparts, placing the non-linear
Profitability portfolio alpha at about 2.5 percent with an active risk of about 3.0 percent. Figure 1
also plots a multi-factor portfolio created by combining the non-linear factor portfolio returns
weighted by their respective Information Ratios. The five-factor portfolio optimally (i.e., using
return cross-correlations) combines the performance of the five individual non-linear factor
portfolios and has an extraordinary alpha of just under 2.5 percent at an active risk of about 2.0
percent, an Information Ratio of 1.191. In contrast, the optimal linear five-factor portfolio has an
alpha of just under 1.5 percent with the same level of active risk. Although impressive, the actual
Information Ratio of 1.191 for the optimal non-linear five-factor portfolio is slightly less than the
Sharpe’s rule approximation of 1.219 from Equation 2 because of the non-zero cross-correlations
between the 15 return columns reported in Table 4.
The visualizations in Figures 2 and 3 provide more perspective on the nature of the five
linear and non-linear pure factor portfolios. Figure 2 plots the security active weights as a ratio to
market portfolio weight for each security for the five linear factors based on the regression reported
in Table 1. The individual security dots are sized using their market capitalization in mid-year
2023 for perspective on the capitalization-weighted nature of the Fama-Macbeth cross-sectional
regressions. For example, the linear Small Size portfolio has security weights that decline linearly
with the Small Size score using the Information Ratio of -0.068 as reported in Table 1. The largest
two dots on the size characteristic line are for Apple and Alphabet and indicate slight market over-
[Figure 2: Active/Market Weight (vertical axis, -150% to 100%) versus Scored Exposure (horizontal axis, -3.0 to 3.0).]
The vertical axis in Figure 2 is based on 2004 to 2023 fitted non-linear scores, scaled so
that dots above zero are over-weights compared to the market portfolio and dots below zero are
under-weights compared to the market portfolio. The adjustment of the vertical scale employs the
factor portfolio relationship
$$w_{P,i} = w_{M,i}(1 + s_i) \qquad (5)$$
discussed in the Technical Appendix, where $w_{P,i}$ and $w_{M,i}$ are the weights of security i in the factor and
market portfolios, respectively. Specifically, the fitted score is converted into active portfolio weight, $w_{P,i} - w_{M,i}$, divided by
market weight, or $w_{P,i} / w_{M,i} - 1$. The 100 percent level on the vertical axis indicates an active
Figure 2 is designed to display a lot of information about the individual securities including
the cross-sectional distribution of standardized characteristics in mid-year 2023. For example, the
distribution of characteristics for the Profitability factor is close to normal, with few securities
outside the range of -2.0 to 2.0. In contrast, the cross-sectional distribution of the Value
characteristic has many securities outside the two standard deviation range, and the two largest
stocks Apple and Alphabet are just to the left of the middle of the distribution. All five of the
linear factor curves cross at the origin, where active weights are zero, because of capitalization
weighting of the individual regression observations in Equation 1.
[Figure: Active/Market Weight (vertical axis, -150% to 150%) versus Scored Exposure (horizontal axis, -3.0 to 3.0).]
[Figure 4: cumulative alphas by year, 2003 to 2023; vertical axis in percent.]
The next largest cumulative alphas in Figure 4 are for the cubed Momentum portfolio at
about 33 percent, called Moment 3, followed by Beta 1 and Moment 1 each ending at about 20
percent. Large cumulative alphas, positive or negative, are also seen at the bottom of Figure 4,
specifically the Moment 2, Small 3, and Beta 3 portfolios, all ending at about -20 percent. The
associated squared Momentum, cubic Small Size, and cubic Low Beta IRs are -0.306, -0.332 and
-0.346, respectively, as given in Table 3. The cumulative alphas and associated Information Ratios
do not completely convey the combined magnitude of any given non-linear factor portfolio
because of non-zero return correlations between the three parts. For example, the large negative
correlation between the Profit 1 and Profit 2 returns of -0.497 in Table 4 is visually evident in
Figure 4 with the large opposing moves of those two plots late in the calendar year 2008.
We can also use the monthly Fama-Macbeth regression from the 15-variable version of
Equation 3 to draw more accurate non-linear active return patterns than Figure 3. Each month
fitted active returns are generated for hypothetical stocks with scores that range from -2.0 to 2.0
along the factor of interest, with scores fixed at zero for the other four factors. The 240 fitted
return observations from 2004 to 2023 are then averaged over time, with the annualized time-series
return standard deviation measuring risk at each score, and after dividing by the square root of
240, providing a standard error for the average. Figure 5 plots the average active return from a
[Figure 5: Average Return (vertical axis, -10% to 8%) versus Pure Score (horizontal axis, -2.0 to 2.0).]
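The construction of the fitted return curves can be sketched as follows. Everything numerical here is hypothetical: the monthly coefficients are simulated, and the squared and cubed terms are evaluated with Hermite polynomials, which is what orthogonalization and rescoring produce under the simplifying assumption of standard normal scores with equal weights. The standard error is computed in monthly units, dividing by the square root of 240, and then scaled by 12 to match the annualized mean.

```python
import numpy as np

def basis(score):
    """Linear, squared, and cubed terms evaluated at a score.

    ASSUMPTION: scores are standard normal with equal weights, so the
    orthogonalized, rescored squared and cubed terms are Hermite polynomials.
    """
    s = np.asarray(score, dtype=float)
    return np.stack([s, (s ** 2 - 1) / np.sqrt(2), (s ** 3 - 3 * s) / np.sqrt(6)], axis=-1)

rng = np.random.default_rng(1)
months = 240
# Hypothetical monthly coefficients (percent) on the three portfolios of one factor.
coefs = rng.normal(loc=[0.05, -0.03, 0.02], scale=[1.0, 0.6, 0.6], size=(months, 3))

grid = np.linspace(-2.0, 2.0, 9)                  # hypothetical stocks, other scores at zero
fitted = coefs @ basis(grid).T                    # shape (240, 9)
fitted = fitted - coefs @ basis(0.0)[:, None]     # subtract the fitted return at zero

avg_annual = fitted.mean(axis=0) * 12
risk_annual = fitted.std(axis=0, ddof=1) * np.sqrt(12)
std_error = fitted.std(axis=0, ddof=1) / np.sqrt(months) * 12
```

Subtracting the fitted return at a score of zero each month forces both the average active return and the risk to be exactly zero at the origin.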
Figure 6 uses the same 240 fitted returns as Figure 5, but plots active return risk at each
score for the factor of interest, holding the other four scores constant at zero. Active risk is mostly
linear and symmetric on the negative and positive score side for the Value factor, but slightly
higher for positive compared to negative Low Beta scores, and higher on the lower end of the
Momentum, Profitability, and especially the Small Size spectrum. For example, the risks at scores
of 1.0 and -1.0 for Small Size are both about 5 percent, but the risk associated with Mega Cap
4
As in Figure 3, the non-linear fitted return curves from the weighted Fama-Macbeth regressions
do not cross at zero each month due to squared characteristic scores that are adjusted to have a
mean of zero. The fitted return at zero is subtracted along the entire score spectrum each month
so the average active return and risk are exactly zero at the origin in Figures 5 and 6.
[Figure 6: Active Risk (vertical axis, 0% to 20%) versus Pure Score (horizontal axis, -2.0 to 2.0).]
[Figure 7: cumulative alphas by year, 1983 to 2003; vertical axis in percent.]
Figure 7 covers the middle 20-year period with 240 monthly returns from January 1984 to
December 2003. By far the best portfolio performance over these twenty years is for the earnings
yield or Value score, ending with a cumulative alpha of about 60 percent, which translates into an
extraordinary single-factor Information Ratio of about 60/(3×20) = 1.000. Much has been said
about the demise of the Value factor in recent decades, which historically was the best performing
of the five factors but fixing a date for the “death of Value” is subjective. Figure 7 does indicate
[Figure 8: Average Return (vertical axis, -10% to 8%) versus Pure Score (horizontal axis, -2.0 to 2.0).]
Figure 8 plots the average active return to pure score for each of the five factors over the
twenty years from 1984 to 2003, as in Figure 5 for the most recent twenty years. Value and
Momentum are the best performing factors but are both almost linear despite allowing for non-
linearity in the return-to-characteristic relationship. Small Size has two twists, as in the most recent
twenty years, but the shape is essentially flipped vertically so that the curve dips down on the left-
hand side. The non-linear pattern for Profitability is also different than in the most recent twenty
years, with most of the value added coming from stocks with positive Profitability scores rather
than from underweighting those with negative scores, which had on average positive active returns. The
Low Beta factor plot for 1984 to 2003 is quite linear and flat, but Low Beta alpha was strong in
those years after accounting for the downward sloping Low Beta version of the SML for the
traditional CAPM.
[Figure 9: time-series plot by year, 1963 to 1983; vertical axis in percent.]
[Figure 10: Average Return (vertical axis, -10% to 8%) versus Pure Score (horizontal axis, -2.0 to 2.0).]
Figure 10 plots the average active return to pure score for the earliest twenty years from
1964 to 1983, as in Figure 5 for the most recent twenty years. Interestingly, the active factors were
generally more linear back then, except for the Low Beta anomaly, which followed the predicted
downward sloping SML on the low beta side (i.e., large positive Low Beta scores) but not on the
where $s_{k,i}$ is the scored characteristic of stock i for factor k. To first analyze linear relationships,
the k = 2 to 5 applications of Equation 6 only include the earnings yield score, $s_{1,i}$. As in Equation
1, the estimated intercept term in the univariate regressions is exactly zero because of market
capitalization weighting. Figure 11 plots the fitted scores from these four pair-wise regressions on
the vertical axis and earnings yield score on the horizontal axis. The lines are composed of
individual dots for each stock with many outside the -2.0 to 2.0 score range based on the
distribution of the earnings yield characteristic across stocks.
The largest linear relationship in Figure 11 is the negative correlation between earnings
yield and gross margin, indicating that higher accounting gross margins are currently found among
lower earnings yield stocks. On the other hand, earnings yield is positively correlated to the Low
Beta characteristic, indicating that higher yielding stocks currently tend to have lower market
betas. Figure 12 is like Figure 11 but with the right-hand side of Equation 6 including the squared
and cubed terms. The cubed and squared scores can be but are not orthogonalized in Equation 6
because in this section we are interested in overall fit, not the individual coefficients.
[Figure 11: Fitted Exposure (vertical axis) versus Exposure Score (horizontal axis, -3.0 to 3.0).]
[Figure 12: Fitted Exposure (vertical axis, -1.0 to 1.0) versus Exposure Score (horizontal axis, -3.0 to 3.0).]
Figure 13 captures the current linear relationships of gross margin rather than earnings
yield to the other four characteristics. Note that the cross-sectional distribution of gross margin is
positively skewed with a long right tail, as seen by the individual dots on the right-hand side of
Figure 13, but no dots below -2.0. The earnings yield fit is negatively sloped, a reconfirmation of
the negative relationship between these two sets of exposures in Figure 11. The strongest linear
correspondence to gross margin is the -0.335 correlation to the Small Size characteristic, which is the value of the fitted
line at a characteristic score of 1.0.
[Figure 13: Fitted Exposure (vertical axis, -1.0 to 1.0) versus Exposure Score (horizontal axis, -3.0 to 3.0).]
[Figure 14: Fitted Exposure (vertical axis, -1.0 to 1.0) versus Exposure Score (horizontal axis, -3.0 to 3.0).]
[Figure 15: Correlation Coefficient (vertical axis, -0.6 to 0.4) by year, 1963 to 2023.]
Gross margin across stocks typically has a negative correlation to the Small Size
characteristic as shown in Figure 15, meaning that larger stocks tend to have higher financial
accounting gross margins. But the current correlation of almost -0.4 is by far the largest in magnitude in
this entire 60-year history. In other words, Profitability exposure, which is often understood to be
almost the opposite of Value exposure across stocks, is currently more negatively correlated with
Small Size. The correlation of the gross margin characteristic, or any other characteristic, to price
momentum across stocks is erratic because the momentum characteristic definitionally changes
each year for each stock. Figure 15 shows that the correlation of gross margin with the Low Beta
Other charts like Figure 15 could be shown for the linear relationship between, for example,
Small Size and the other four factors, as well as charts for the various non-linear relationships. But
they all lead to the same empirical conclusion that the pair-wise relationships between factor
exposures are both non-linear and dynamic over time. Statements in equity factor research about
one set of characteristics being positively or negatively related to another set should be thought of
as specific to a point in time, not permanent characterizations of market structure. For example,
the more complex non-linear relationships shown in Figure 13 for gross margin also change
dynamically over time, as they do in charts like Figure 13 for the other four characteristics.
where S is an N-by-K matrix of pure scores, WM is a vector of market weights, and B is an N-by-
For all three types of scores, non-pure, linear pure, and fully pure, the weights for factor
portfolio construction are simply,
$$w_{P,i} = w_{M,i} s_i \qquad (8)$$
where $w_{M,i}$ are market weights and $w_{P,i}$ are the active (i.e., market differential) weights on the i = 1
to N securities in the portfolio. Conceptually, the scores in Equation 8 convert market portfolio
weights into active factor portfolio weights that sum to zero rather than one. As discussed in the
Technical Appendix, single-factor portfolios constructed by Equation 8 are mean-variance optimal
under the assumption of a “market plus one active factor” return generating process and
homogeneous idiosyncratic risk. The portfolio returns are calculated each month by the
summation,
$$r_P = \sum_{i=1}^{N} w_{M,i} s_i r_i \qquad (9)$$
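A minimal NumPy sketch of Equations 8 and 9, together with the fully invested form in Equation 5, is given below. The market weights, scores, and returns are simulated and purely illustrative; the variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
w_m = rng.lognormal(size=n)
w_m = w_m / w_m.sum()                       # hypothetical market weights, sum to one

# Hypothetical scores, standardized to weighted mean zero and weighted variance one.
s = rng.standard_normal(n)
s = s - np.sum(w_m * s)
s = s / np.sqrt(np.sum(w_m * s ** 2))

r = rng.normal(0.8, 6.0, size=n)            # hypothetical monthly security returns (percent)

w_active = w_m * s                          # Equation 8: active weights, sum to zero
w_total = w_m * (1.0 + s)                   # Equation 5: fully invested factor portfolio
r_active = np.sum(w_active * r)             # Equation 9: active portfolio return
r_market = np.sum(w_m * r)

shorted = w_total < 0                       # securities with scores below -1
short_fraction = -w_total[shorted].sum()    # capitalization shorted, as a fraction
```

The return on the fully invested portfolio minus the market return equals the active return from Equation 9, and a position is short exactly when its score is below -1.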
The portfolio returns in Equation 9 are active, meaning differential to the market return,
and used to measure factor performance with an additional adjustment for non-unitary market betas
as described in Footnote 3. The factor portfolios constructed by Equation 8 are not long-only
because securities with exposure scores less than -1 are shorted. The construction of strictly long-
only portfolios removes the lower end of exposures, obfuscating an examination of non-linearity
in the return-to-exposure relationship. The portfolios are commonly called 120/20 long/short,
The first column of Table 5 reports the total alpha for the fully pure Value factor portfolio
in each sub-period, followed by characteristic categories that contain securities with exposure
scores delimited by minus one, zero, and positive one. Specifically, the four category returns are
conditional summations that separate the total portfolio active return in Equation 9 into
$$r_{\text{large under-weight}} = \sum_{s_i < -1.0} w_{M,i} s_i r_i \qquad (10)$$
and three other categories where the conditions under the summation are the -1.0 to 0.0 score range
(under-weight category), the 0.0 to 1.0 score range (over-weight category), and the greater than
1.0 (large over-weight category) score range, respectively. The categories are themselves
portfolios with securities weighted by score times market capitalization but have not been adjusted
to be fully invested portfolios. This ensures that the total portfolio alpha is the simple sum of four
category portfolio alphas, a property which would not hold if each portfolio were divided by the
sum of security capitalization weights in that category each month.
We give the four categories factor-specific names borrowed from current investment
management jargon associated with the range of exposures, for example Glamour (scores less than
minus one), Expensive (minus one to zero), Shallow Value (zero to one), and Deep Value (scores
greater than one) for the Value categories. The small -31 basis point Value portfolio alpha in Table
5 for the most recent 2004 to 2023 period is decomposed into similarly small category alphas of
14, -5, 5, and -45 basis points, indicating relative linearity across the Value exposure spectrum.
As previously documented using the Fama-Macbeth cross-sectional regressions in the first section,
nothing along the Value spectrum has significantly outperformed or underperformed the market
portfolio in the last 20 years. While one could create more categories (i.e., quintile or decile
portfolios) with equal capitalization in each, zero is the natural boundary between over- and under-
weighting securities compared to the market portfolio. Similarly, minus one is the boundary
between securities that are not only underweighted but shorted in the full-spectrum single-factor
portfolio.
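The additive category decomposition can be sketched with simulated data. The weights, scores, and returns below are hypothetical, and the treatment of scores that fall exactly on a boundary is an assumption. The last lines check that the four Value category alphas quoted above for 2004 to 2023 sum to the reported total.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
w_m = rng.lognormal(size=n)
w_m = w_m / w_m.sum()                         # hypothetical market weights
s = rng.standard_normal(n)                    # hypothetical scores
r = rng.normal(0.8, 6.0, size=n)              # hypothetical monthly returns (percent)

contrib = w_m * s * r                         # each security's term in the active return

# ASSUMPTION: boundary scores are assigned to the category on their right.
bins = {
    "large under-weight": s < -1.0,
    "under-weight": (s >= -1.0) & (s < 0.0),
    "over-weight": (s >= 0.0) & (s < 1.0),
    "large over-weight": s >= 1.0,
}
category_return = {name: contrib[mask].sum() for name, mask in bins.items()}
total = contrib.sum()

# Value category alphas in basis points, as quoted in the text for 2004 to 2023.
value_categories_bp = [14, -5, 5, -45]
print(sum(value_categories_bp))  # -31
```

Because the categories are not rescaled to be fully invested, their returns add up to the total active return without any further weighting.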
Table 5 also reports active risks and Information Ratios for each category. Note that even
with return-to-exposure linearity, the alpha and active risk numbers are larger in magnitude in the
tail categories, for example Glamour and Deep Value in Table 5, because of greater factor
exposure. The average exposures would be -1.53, -0.46, 0.46, and 1.53, respectively, under a
perfectly normal cross-sectional distribution of characteristics.5 Thus, a general rule of thumb is
a 3-to-1 (i.e., 1.53 compared to 0.46) magnitude in tail versus middle category alphas and active
risks because they have three times more exposure to the factor. For example, the Glamour
category active risk of 1.55 percent is about three times larger than the Expensive category active
risk of 0.53 percent. On the other hand, Information Ratios calculated by alpha divided by active
risk do have directly comparable magnitudes across the categories.
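The average exposures quoted for a normal distribution follow from the mean of a truncated standard normal variable, which can be verified with the standard library alone. The function names are hypothetical, and the sketch treats the cross-section as exactly standard normal.

```python
import math

def pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def mean_between(a, b):
    """Mean of a standard normal variable conditional on a < z < b."""
    return (pdf(a) - pdf(b)) / (cdf(b) - cdf(a))

inf = float("inf")
edges = [-inf, -1.0, 0.0, 1.0, inf]
means = [mean_between(a, b) for a, b in zip(edges[:-1], edges[1:])]
shares = [cdf(b) - cdf(a) for a, b in zip(edges[:-1], edges[1:])]

print([round(m, 2) for m in means])   # [-1.53, -0.46, 0.46, 1.53]
print([round(p, 2) for p in shares])  # [0.16, 0.34, 0.34, 0.16]
```

The ratio of tail to middle exposure is about 3.3, consistent with the 3-to-1 rule of thumb.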
[5] The weighted cross-sectional characteristic distributions vary by factor and month, but have
approximately normal distributions, with total capitalization of about 15 percent in each tail
category and 35 percent in each middle category. The pure earnings yield and pure momentum
characteristics tend to be thin-tailed compared to normal, with about 10 percent and 40 percent in
the tail versus middle categories.
Table 6 next reports on the pure Small Size factor, with the now popular Mega Cap moniker
associated with stocks that have exposure scores below negative one. The other three categories
contain stocks with larger, medium, and smaller capitalization within the largest 1000 stocks in
the U.S. market. What are commonly called small-cap stocks, for example the Russell 2000, are
not included in this study, so smallness is relative within our data set that approximates the Russell
1000. In the first twenty-year period (1964 to 1983) the Small Size factor under this definition
had good performance with an IR of 0.417. However, in the middle twenty years, the IR dropped
to 0.118 due to negative IRs in the middle categories of -0.256 and -0.201. Then in the most recent
20 years (2004 to 2023) shorting the Mega Cap category became quite problematic with an IR of
-0.133, while under-weighting Larger Cap stocks became profitable with an IR of 0.289. The
category results confirm the large cubed term IR of -0.332 in Table 3 and the complex shape of
the non-linear Small Size factor in Figure 3. The Small Size categories show a dramatic shift in
the pattern of non-linearity over the last twenty years compared to 1984 to 2003, with the Mega
Cap IR dropping from 0.249 to -0.133, while the Larger Cap IR rose from -0.256 to 0.289.
Table 6 next reports on the Low Beta factor with the Top Beta category for standardized
exposures less than -1.0, then High Beta, Low Beta, and Bottom Beta for exposures greater than
1.0. The range of realized total market betas goes from about 1.2 on the low end of the Low Beta
exposure range to about 0.8 on the high end of the exposure spectrum. The 36-month historical
beta characteristic range across securities is a bit wider, from about 0.6 to 1.4. In other words,
security betas exhibit the well-documented shrinkage towards one in realized versus predicted
values, popularly known as the Scholes and Williams (1977) correction. The strong IR
Table 6 indicates that unlike Value, the Profitability factor has maintained strong
performance over time, with the total factor IR increasing from 0.520 in the first twenty years, to
0.595, and then 0.661 in the most recent 2004 to 2023 period. Except for the early 1964 to 1983
period, the most consistent observation of non-linearity for the Profitability factor is that IR
performance is concentrated in the tail categories, for example 0.753 and 0.466 in the most recent
twenty years, compared to 0.164 and 0.027 for the middle categories. This pattern of non-
linearity is consistent with the large negative IR of -0.723 for the squared Profitability
characteristic at the bottom of Table 3, and the downward parabolic shape for the non-linear
Profitability factor in Figure 3.
We next explore the impact of linear and non-linear factor portfolio purification using non-
pure, linear pure, and fully pure scores for each factor, where Table 6 was for fully pure scores.
Table 7 reports the total and category Information Ratios in the same format as Table 6, but only
for the last twenty years to save space. The first section of Table 7 reports on portfolios constructed
using simple scored characteristics with lower Information Ratios as reported in Clarke, de Silva,
and Thorley (2017). For example, the total Profitability factor IR in the 2004 to 2023 period drops
from 0.661 in Table 6 to 0.595 in Table 7. Almost all the total factor IRs are lower in the first
section of Table 7 because the non-neutralized factor portfolios are contaminated with exposures
to the other factors, increasing the active risk without an increase in alpha. In some cases, the IRs
are also lower because of declines in alpha due to non-neutralized exposure and thus larger realized
return correlation to other factors.
The multi-factor IR at the end of each total portfolio column in Table 7 is calculated by
Sharpe’s IR rule as given in Equation 2 and measures the combined impact of all five portfolios.
The Information Ratio for the optimal combined five-factor portfolio is only approximate given
some correlation between the realized factor returns. The multi-factor IR increases by 0.777 –
0.724 = 0.053 through neutralizing linear characteristic relationships between factors, with an
additional 0.851 – 0.777 = 0.074 increase by also neutralizing non-linear relationships. Table 7
indicates that most of the increase in the Momentum and Low Beta factor IRs comes from linear
purification, while most of the increase in the Profitability factor IR comes from the non-linear
purification.
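The arithmetic behind the multi-factor comparison can be sketched as follows. Equation 2 is not reproduced in this part of the paper, so the `combined_ir` function assumes the usual form of the rule for uncorrelated active portfolios, in which squared Information Ratios add; that form, and the function name, are assumptions rather than the authors' stated formula. The three multi-factor IRs are the values quoted in the text.

```python
import math

def combined_ir(factor_irs):
    """Combined IR of uncorrelated active portfolios: square root of the sum of squared IRs.

    Assumed form of the IR rule; only approximate when realized factor
    returns are correlated.
    """
    return math.sqrt(sum(ir * ir for ir in factor_irs))

# Multi-factor IRs quoted in the text for 2004 to 2023.
ir_non_pure = 0.724
ir_linear_pure = 0.777
ir_fully_pure = 0.851

gain_from_linear = ir_linear_pure - ir_non_pure        # linear purification
gain_from_nonlinear = ir_fully_pure - ir_linear_pure   # additional non-linear purification

print(f"gain from linear purification:     {gain_from_linear:.3f}")
print(f"gain from non-linear purification: {gain_from_nonlinear:.3f}")

# Illustrative call with made-up IRs (not values from Table 7).
print(f"combined IR of 0.3 and 0.4: {combined_ir([0.3, 0.4]):.3f}")
```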
Four of the five now popular linear factors were explored by academic and practitioner
research after observing realized returns from the 1960s to the 1990s. Indeed, the only factor
characteristic motivated by equilibrium economic theory rather than being discovered ex-post was
market beta, and the classic CAPM prediction turned out to have little empirical backing.
Specifically, higher-beta stocks were not found to have higher average returns, but rather about the
same return as lower-beta stocks, a phenomenon now called the Low Beta anomaly. Similarly,
this study shows that most factors have non-linear returns across characteristics by examining 60
years of return data after the fact. The study does not directly address the issue of out-of-sample
return forecasting, although there is promise based on the strength and persistence of the non-linear
patterns. We have looked at monthly portfolio return forecasting using prior 10-year non-linear
We also do not address performance attribution for multi-factor portfolio strategies in this
paper. Specifically, one could envision a process where the returns for a given month are explained
not just by the performance of linear Value, Profitability, and Size portfolios, but by returns to
category portfolios like Deep Value, Highly Profitable and Mega Cap. Indeed, market observers
frequently summarize daily or monthly market results with references to Mega Caps, which could
be modeled with a mega-cap dummy variable. For factors with material non-linear returns, more
of the realized performance of active strategies is naturally explained by four category returns
rather than one portfolio return. Performance attribution could also be enhanced by the inclusion
of squared and cubed characteristics, as in the monthly Fama-MacBeth cross-sectional regressions
in the first section. While non-linear performance attribution is outside the scope of this paper, we
briefly comment on the traditional academic objective of explaining the cross-section of realized
security returns using R-squared statistics.
Figure 16 plots the rolling 36-month R-squared for all 60 years in this study from Equation
1 with 5 right-hand side variables, as well as R-squared for the market plus 15 version of Equation
3.[6] The linear and cubic R-squared plots are also matched by a plot of R-squared from a LOESS
regression on Equation 1, a methodology that allows for non-parametric forms for non-linear
relationships between security returns and characteristics.[7]
[6] The direct formula for R-squared in observationally weighted cross-sectional regressions is
$$ R^2 = 1 - \frac{\sum_{i=1}^{N} w_i e_i^2}{\sum_{i=1}^{N} w_i r_i^2} $$
where $e_i$ are the residual returns and $w_i$ are the weights. Adjusted R-squared
is often employed in multivariate regressions to account for the number of independent variables.
As a practical matter, the change in degrees of freedom using 15 rather than 5 regressors has little
impact because of the large sample size, N = 1000, with an Effective N due to observational weighting
of about 100 to 200 a month.
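As a rough illustration of the weighted R-squared in this footnote, one minus the weighted sum of squared residuals over the weighted sum of squared returns, the sketch below evaluates it for a toy cross-section. The paper does not define Effective N here, so the inverse-Herfindahl definition in `effective_n` is an assumption, and the three-stock data are hypothetical.

```python
def weighted_r_squared(weights, returns, residuals):
    """R-squared = 1 - sum(w * e^2) / sum(w * r^2), with observation weights w."""
    unexplained = sum(w * e * e for w, e in zip(weights, residuals))
    total = sum(w * r * r for w, r in zip(weights, returns))
    return 1.0 - unexplained / total

def effective_n(weights):
    """Inverse Herfindahl index of the weights (an assumed definition of Effective N)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical three-stock cross-section, for illustration only.
weights = [0.5, 0.3, 0.2]
returns = [0.02, -0.01, -0.025]
residuals = [0.5 * r for r in returns]  # pretend the fit halves every return

r2 = weighted_r_squared(weights, returns, residuals)
n_eff = effective_n(weights)
print(f"weighted R-squared: {r2:.2f}")
print(f"effective N: {n_eff:.2f} out of {len(weights)} stocks")
```

Equal weights give an effective N equal to the number of stocks, while concentrated capitalization weights give a much smaller number, which is the sense in which 1000 stocks can behave like 100 to 200 observations.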
[7] LOESS (locally estimated scatterplot smoothing) allows for a non-polynomial shape in the
regression curves that replace the $c_k$ in Equation 1. LOESS is computationally difficult to
implement in our weighted-observation large-sample context and less known in financial markets
research.
[Figure 16: rolling 36-month R-squared, 1963 to 2023. Vertical axis: R-squared, 0% to 30%; horizontal axis: year.]
Much has been published in the academic literature about increases in explanatory power
by adding additional factor characteristics, going up to 30 or 40 percent in some studies. The
results in Figure 16 suggest that allowing for non-linearity in these five popular factors has a
substantially greater impact than adding one or even several less commonly used factors.
While more complicated than exploiting the linear version of well-known market
anomalies, methodologies for incorporating non-linear analysis in portfolio construction are
provided in the body and Technical Appendix of this study. Other researchers have used squared
and cubed characteristics to explore non-linearity in returns, but our more precise implications
come from the methodological innovation of employing squared and cubed scores that are
orthogonalized with respect to the linear stock characteristic. Non-linear portfolio strategies are
also informed by the factor characteristic categories used later in the paper, motivating dummy
variable or piece-wise linear regression analysis and portfolio construction. The simplest action
for quantitative analysts may be to specifically add cubed log market capitalization and squared
gross margin to their list of otherwise linear stock characteristics.
The motivating research question in this paper was whether stocks with a 2.0 scored characteristic
have on average twice the realized active return as stocks with a 1.0 scored characteristic, and whether
stocks with a scored exposure of -1.0 have a negative average active return of the same magnitude
Separate from the basic finding of non-linear return-characteristic patterns for four popular
factors, this study shows that non-linear as well as linear purification of factor exposures is needed
because of highly significant non-linear pair-wise characteristic-to-characteristic patterns that
change gradually over time. Table 7, which focuses on the last twenty years, shows that the
incremental increase in multi-factor portfolio performance from purging non-linear relationships
is larger than the increase from purging simple linear correlation coefficient patterns. The increase
from non-linear purification is much more significant than that from linear purification for the Profitability
factor.
Our empirical study of non-linear returns illustrates the need for benchmark-anchoring in
cross-sectional stock statistics like exposures, standard deviations, and correlations, as in the well-
known weighted return calculation for a portfolio. Benchmark anchoring motivates the use of
capitalization-weighted rather than equally weighted monthly Fama-MacBeth regressions. Equally
weighted regressions would otherwise produce coefficients that are not optimal portfolios, with
returns that are active with respect to an equally weighted market-wide portfolio rather than the
market benchmark. Equally
weighted regression results are driven by the 80 percent of stock observations that only represent
about 20 percent of market capitalization.
We mention but do not fully explore the issues of non-linear return forecasting and
performance attribution. Commercial risk modelers have already introduced squared and cubed
characteristics for some factors, but non-linear performance attribution has only been heuristically
With some hesitation about further populating the “factor zoo” we note that the only factors
identified and emphasized by academic researchers in the 1990s were those with a long track
record of linear performance. For example, a stock characteristic with a downward or upward
parabolic average return pattern, and thus strong non-linear long-term performance, would not have been
identified given the prevailing use of linear research methodologies. An important extension of
our research on U.S. stock returns will be return forecasting methods based on persistent non-
linear patterns within factor portfolio spectrums. Other questions require the extension of non-
linear factor return analysis outlined in this paper to international equity markets. The CRSP and
Compustat databases only include about half the global public equity market. We anticipate that
other researchers will apply our methodologies including weighted Fama-MacBeth regressions
with orthogonalized squared and cubed characteristics to European and Asian markets. Is a non-
linear pure Value factor still “alive” in non-U.S. markets? Do the high performing pure
Profitability and Low Beta factors have the same non-linear properties in global markets?
Benchmark Anchoring
This paper uses the “benchmark anchoring” concept that cross-sectional references to the
market or other portfolios should be weighted by the size of the constituent securities within the
portfolio. Most investors understand that the portfolio return is a capitalization-weighted average
of the security returns but may not employ weighted statistics in other contexts. Another example
of the need for weighted statistics in portfolio theory is that the equally weighted average market
beta (which includes second moment and correlation statistics) across stocks is only approximately
one. The capitalization-weighted beta across the stocks that comprise the market portfolio is
exactly one.
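The beta example can be checked numerically. The sketch below simulates returns for five hypothetical stocks with fixed capitalization weights, defines the market as their weighted sum, and compares the capitalization-weighted and equally weighted averages of the estimated betas. The weights, return parameters, and helper names are all illustrative assumptions.

```python
import random

random.seed(0)

# Hypothetical capitalization weights for five stocks (sum to one, held fixed).
weights = [0.40, 0.25, 0.15, 0.12, 0.08]
n_stocks, n_months = len(weights), 60

# Simulated monthly returns: one row per month, one column per stock.
returns = [[random.gauss(0.01, 0.05) for _ in range(n_stocks)]
           for _ in range(n_months)]

# The market return is the capitalization-weighted average of security returns.
market = [sum(w * r for w, r in zip(weights, row)) for row in returns]

def covariance(x, y):
    """Sample covariance of two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

market_var = covariance(market, market)
betas = [covariance([row[i] for row in returns], market) / market_var
         for i in range(n_stocks)]

cap_weighted_beta = sum(w * b for w, b in zip(weights, betas))
equal_weighted_beta = sum(betas) / n_stocks

print(f"capitalization-weighted beta: {cap_weighted_beta:.6f}")
print(f"equally weighted beta:        {equal_weighted_beta:.6f}")
```

The capitalization-weighted average equals one up to floating-point error because covariance is linear in the weighted sum that defines the market, whereas the equally weighted average has no such constraint.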
$$ w_{TPi} = w_{Mi} (1 + s_i) \tag{A1} $$
where $w_{Mi}$ is the market weight of the security. The scores are benchmark-anchored with a
Linear in Exposure
Our main empirical research question is where along the spectrum of exposures the average
active return to well-known factors originates. While we and others have examined this question
for the low beta and price momentum anomalies, little has been done and published with respect
to the linearity of the other factors using capitalization-weighting and pure factor methodologies.
The term “non-linear” can have a lot of meanings, but here we specifically ask whether a change
in score from say -1.0 to 0.0, has the same impact on average active return as a change in score
from say 0.0 to +1.0.
We are not examining the “long side” and “short side” of the factor exposure spectrum.
The methodology is primarily long-only since deviations in security weights are relative to
benchmark weights. The basic weight formula in Equation A1 over-weights and under-weights
securities compared to the market benchmark. Using Equation A1, the active (difference to
benchmark) security weight, $w_{Pi}$, is the simple product,
$$ w_{Pi} = w_{Mi} s_i . \tag{A3} $$
where $f(s_i)$ is some non-linear function of $s_i$ with the property that $\sum_{i=1}^{N} w_{Mi} f(s_i) = 0$. Few total
security weights in combined multi-factor linear or non-linear portfolios are negative at reasonable
levels of active risk (e.g., 3 percent) suggesting that short selling is rare in actual application.
The realized active (i.e., market differential) factor portfolio return is the sum-product,
$$ r_P = \sum_{i=1}^{N} w_{Mi} s_i r_i . \tag{A5} $$
where the $r_i$ are security active returns. We decompose the active return for a given portfolio in
and three similar equations where the condition is for scores in the -1 to 0, 0 to 1, and greater than
1 range. The ranges are generically labeled the large under-weight, under-weight, over-weight,
and large over-weight categories.
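The category decomposition can be sketched in a few lines: each security contributes market weight times score times active return to the factor portfolio return, and the contributions are grouped by score range. The five-stock data are hypothetical, and the handling of scores exactly equal to -1, 0, or 1 is an assumption since the boundary convention is not stated here.

```python
CATEGORIES = ("large under-weight", "under-weight", "over-weight", "large over-weight")

def category_label(score):
    """Map a score to one of the four ranges (boundary handling is an assumption)."""
    if score < -1.0:
        return CATEGORIES[0]
    if score < 0.0:
        return CATEGORIES[1]
    if score <= 1.0:
        return CATEGORIES[2]
    return CATEGORIES[3]

def decompose_active_return(mkt_weights, scores, active_returns):
    """Split the sum of w_M * s * r into four category contributions."""
    contributions = {label: 0.0 for label in CATEGORIES}
    for w, s, r in zip(mkt_weights, scores, active_returns):
        contributions[category_label(s)] += w * s * r
    return contributions

# Hypothetical data: weighted mean score and weighted mean active return are both zero.
mkt_weights = [0.30, 0.25, 0.20, 0.15, 0.10]
scores = [-1.1, -0.2, 0.3, 0.8, 2.0]
active_returns = [-0.010, 0.004, 0.002, -0.003, 0.0205]

contributions = decompose_active_return(mkt_weights, scores, active_returns)
total = sum(w * s * r for w, s, r in zip(mkt_weights, scores, active_returns))

for label, value in contributions.items():
    print(f"{label:>18}: {value:+.5f}")
print(f"{'total':>18}: {total:+.5f}")
```

In this toy example the stock with a score below -1 has a negative active return, so its category contribution is positive, and the four contributions add back to the total factor portfolio active return.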
Note that the scores in the large underweight category shown in Equation A6 are less than
-1.0, so they represent the short sells. Thus, the total factor portfolio in Equation A5 is often called
a 120/20 long-short portfolio, where the amount of shorting varies from almost zero to as high as
40 percent. If the security active returns, ri , in the large underweight category are on average
negative, then the category return in Equation A6 will be positive, generating a positive
contribution to the total portfolio active return.
where $r_M$ is the market portfolio return, $\alpha_i$ is the security-specific “alpha” component of returns,
and $\epsilon_i$ is the residual return. We qualify the term alpha because return models typically have a
security-specific market-beta multiplier in front of the market return. In expectation, the residuals
are uncorrelated with each other and with the market return, distributed with zero mean and a
security-specific (i.e., heterogenous) variance of $\sigma_i^2$. Internal consistency for the forecasted values
for the alpha component dictates that they also have a market-weighted sum of zero,
$$ \sum_{i=1}^{N} w_{Mi} \alpha_i = 0 . \tag{A8} $$
Technically speaking, Equation A8 is an ex-ante condition on the rationality of the security return
forecasts, separate from the ex-post concept that the realized value of the residuals has a cross-
sectional weighted sum of zero.
Under the model for security returns in Equation A7, the Markowitz (1952) mean-variance
optimal active (i.e., market differential) weight on each security is,
i =1 i2
where the non-subscripted parameter $\sigma_A$ is the targeted level of active portfolio risk. Equation A9
$$ w_{Pi} = \frac{w_{Mi} s_i}{\sigma_i} \, \frac{\sigma_A}{\sqrt{\sum_{i=1}^{N} w_{Mi}^2 s_i^2}} \tag{A10} $$
and the expected portfolio return (i.e., sum product of expected security returns and weights) is
$$ \alpha_P = IC \sqrt{BR} \, \sigma_A . \tag{A11} $$
Equation A11 is known as the fundamental law of active portfolio management, where point-in-
time breadth (BR) is defined by one over the radical in Equation A10. Under the assumption of
homogeneous idiosyncratic risk, the optimal security weights in Equation A10 are the product of
market weights times scores, multiplied by a constant,
$$ w_{Pi} = w_{Mi} s_i \, \frac{\sqrt{BR} \, \sigma_A}{\sigma} . \tag{A12} $$
The conditions for the mean-variance optimality of the factor portfolio defined by Equation A3
are 1) a market-plus-one-factor security return generating process, 2) homogeneous idiosyncratic
risk, and 3) an unspecified level of active risk.
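A small numerical check can make the role of the active-risk target concrete. The sketch assumes our reading of Equation A10, in which the active weight is (market weight times score over residual volatility) times target risk over the square root of the sum of squared (market weight times score), and it assumes uncorrelated residuals so that active risk is the square root of the sum of squared (weight times volatility). The data and function names are hypothetical.

```python
import math

def optimal_active_weights(mkt_weights, scores, resid_vols, target_risk):
    """Active weights under the assumed reading of Equation A10."""
    radical = math.sqrt(sum((w * s) ** 2 for w, s in zip(mkt_weights, scores)))
    return [(w * s / v) * target_risk / radical
            for w, s, v in zip(mkt_weights, scores, resid_vols)]

def active_risk(active_weights, resid_vols):
    """Active risk with uncorrelated residual returns."""
    return math.sqrt(sum((a * v) ** 2 for a, v in zip(active_weights, resid_vols)))

# Hypothetical inputs, for illustration only.
mkt_weights = [0.30, 0.25, 0.20, 0.15, 0.10]
scores = [-1.1, -0.2, 0.3, 0.8, 2.0]
target = 0.03  # 3 percent active risk

# Heterogeneous residual risk: realized active risk should equal the target.
hetero_vols = [0.20, 0.25, 0.30, 0.35, 0.40]
w_hetero = optimal_active_weights(mkt_weights, scores, hetero_vols, target)
risk_hetero = active_risk(w_hetero, hetero_vols)

# Homogeneous residual risk: weights are market weight times score times a constant.
homo_vol = 0.30
w_homo = optimal_active_weights(mkt_weights, scores, [homo_vol] * 5, target)
multipliers = [a / (w * s) for a, w, s in zip(w_homo, mkt_weights, scores)]
breadth = 1.0 / sum((w * s) ** 2 for w, s in zip(mkt_weights, scores))

print(f"active risk with heterogeneous vols: {risk_hetero:.4f}")
print(f"common multiplier with homogeneous vols: {multipliers[0]:.4f}")
print(f"sqrt(breadth) * target / vol:            {math.sqrt(breadth) * target / homo_vol:.4f}")
```

With homogeneous risk every security shares the same multiplier, so the portfolio is proportional to market weight times score, which is the sense in which the level of active risk is left unspecified.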
We first derive a transformation of squared score that is orthogonal to linear score and has weighted
zero mean and unit variance. With some algebra it can be shown that
$$ s2_i = \frac{s_i^2 - \mathrm{Skew} \, s_i - 1}{\left( \mathrm{Kurt} - \mathrm{Skew}^2 - 1 \right)^{1/2}} \tag{A13} $$
has these properties. For example, for a perfectly normal distribution with Skew = 0 and Kurt = 3,
Equation A13 simplifies to
$$ s2_i = \frac{1}{\sqrt{2}} \left( s_i^2 - 1 \right) . \tag{A14} $$
Equation A13 has a zero mean because the first term in the numerator goes to 1 when weighted,
the second term goes to zero when weighted, and the third term is -1. Similarly, the denominator
is constructed to ensure unit variance of s2i.
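The three properties of the squared-score transformation can be verified numerically. The sketch below standardizes a skewed hypothetical characteristic to weighted zero mean and unit variance, applies the transformation (squared score minus Skew times score minus one, divided by the square root of Kurt minus Skew squared minus one), and checks the weighted mean, variance, and cross-moment with the linear score. The simulated data and helper names are assumptions for illustration.

```python
import math
import random

def wmean(weights, values):
    """Weighted mean; weights are assumed to sum to one."""
    return sum(w * v for w, v in zip(weights, values))

def standardize(weights, raw):
    """Scores with weighted zero mean and weighted unit variance."""
    mean = wmean(weights, raw)
    stdev = math.sqrt(wmean(weights, [(x - mean) ** 2 for x in raw]))
    return [(x - mean) / stdev for x in raw]

def orthogonal_squared(weights, scores):
    """Squared-score transformation with weighted third and fourth moments."""
    skew = wmean(weights, [s ** 3 for s in scores])
    kurt = wmean(weights, [s ** 4 for s in scores])
    denom = math.sqrt(kurt - skew ** 2 - 1.0)
    return [(s * s - skew * s - 1.0) / denom for s in scores]

random.seed(1)
n = 200
# Hypothetical capitalization weights and a right-skewed raw characteristic.
caps = [math.exp(random.gauss(0.0, 1.0)) for _ in range(n)]
weights = [c / sum(caps) for c in caps]
raw = [math.exp(random.gauss(0.0, 0.5)) for _ in range(n)]

s = standardize(weights, raw)
s2 = orthogonal_squared(weights, s)

mean_s2 = wmean(weights, s2)
var_s2 = wmean(weights, [x * x for x in s2])
cross = wmean(weights, [a * b for a, b in zip(s, s2)])

print(f"weighted mean of s2:      {mean_s2:+.2e}")
print(f"weighted variance of s2:  {var_s2:.6f}")
print(f"weighted cross-moment with s: {cross:+.2e}")
```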
The cubed score transformation that is orthogonal to just the linear score is a bit more
complicated to solve algebraically but can be shown to be
The cubed score transformation that is jointly orthogonal to both the linear score and the squared
transformation s2i involves substantial algebra, but can be solved as
For example, for a normal distribution with Skew = 0, Kurt = 3, Hyper = 0, and Tail = 15,
Equations A15 and A16 both simplify to
$$ s3_i = \frac{1}{\sqrt{6}} \left( s_i^3 - 3 s_i \right) . \tag{A18} $$
In other words, deviations from a normal cross-sectional distribution of score create much of the
complexity of the orthogonal squared and cubed transformations in Equations A13 and A16
compared to Equations A14 and A18.
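The normal-distribution special cases can be checked without the closed-form expressions. The sketch below does not implement Equations A15 or A16; it substitutes a weighted Gram-Schmidt orthogonalization of the score, squared score, and cubed score, which should satisfy the same zero-mean, unit-variance, and orthogonality conditions. Applied to a finely discretized standard normal, the results are compared with (score squared minus one) over root two and (score cubed minus three times score) over root six. The grid and function names are illustrative assumptions.

```python
import math

def wdot(weights, x, y):
    """Weighted inner product; weights are assumed to sum to one."""
    return sum(w * a * b for w, a, b in zip(weights, x, y))

def gram_schmidt_powers(weights, scores):
    """Weighted-orthonormal versions of s, s^2, s^3 (each also orthogonal to a constant)."""
    basis = [[1.0] * len(scores)]  # the constant has unit weighted norm
    for power in (1, 2, 3):
        v = [s ** power for s in scores]
        for b in basis:
            proj = wdot(weights, v, b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(wdot(weights, v, v))
        basis.append([vi / norm for vi in v])
    return basis[1], basis[2], basis[3]

# Discretized standard normal: grid points weighted by the normal density.
grid = [-8.0 + 0.01 * k for k in range(1601)]
density = [math.exp(-0.5 * z * z) for z in grid]
weights = [d / sum(density) for d in density]

s1, s2, s3 = gram_schmidt_powers(weights, grid)

normal_s2 = [(z * z - 1.0) / math.sqrt(2.0) for z in grid]
normal_s3 = [(z ** 3 - 3.0 * z) / math.sqrt(6.0) for z in grid]

gap_s2 = max(abs(a - b) for a, b in zip(s2, normal_s2))
gap_s3 = max(abs(a - b) for a, b in zip(s3, normal_s3))

print(f"largest gap to the squared normal form: {gap_s2:.2e}")
print(f"largest gap to the cubed normal form:   {gap_s3:.2e}")
```

The same routine applied to actual weighted scores would produce transformations that depend on the higher moments, which is where the additional terms in the general equations come from.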
Taking the Value central moments in Table A1 as an example, the weighted cross-sectional
correlation between the score and squared score is 0.343. The correlation between the score and
cubed score is much higher at 0.732, and the correlation between squared and cubed scores is
0.480. Alternatively, the three weighted correlations using the squared and cubic transformations
in Equations 1 and 4 are exactly zero. Thus, the slope coefficients in a weighted multivariate
Ang, Andrew, Robert Hodrick, Yuhang Xing, and Xiaoyan Zhang. 2006. “The Cross-Section of
Volatility and Expected Returns.” Journal of Finance 61 (1): 259–99.
Arnott, Robert, Vitali Kalesnik, and Juhani Linnainmaa. 2023. “Factor Momentum.” The Review
of Financial Studies 36 (8): 3034–3070.
Asness, Clifford, Andrea Frazzini, Ronen Israel, Tobias Moskowitz, and Lasse Pedersen. 2018.
“Size Matters, If You Control Your Junk.” Journal of Financial Economics 129 (3): 479-
509.
De Boer, Sanne. 2020. “Nonlinear Factor Attribution.” Journal of Investment Consulting 20 (1):
21-29.
Barroso, Pedro, and Pedro Santa-Clara. 2015. “Momentum Has Its Moments.” Journal of
Financial Economics 116 (1): 111–20.
Bollerslev, Tim, Andrew Patton, and Rogier Quaedvlieg. 2023. “Granular Betas and Risk Premium
Functions.” NBER and European Central Bank working paper.
Carhart, Mark. 1997. “On Persistence in Mutual Fund Performance.” Journal of Finance 52 (1):
57-82.
Cederburg, Scott, Michael O’Doherty, Feifei Wang, and Xuemin Yan. 2019. “On the Performance
of Volatility Managed Portfolios.” Journal of Financial Economics, forthcoming.
Clarke, Roger, Harindra de Silva, and Steven Thorley. 2017. “Pure Factor Portfolios and
Multivariate Regression Analysis.” Journal of Portfolio Management 43 (3): 16-31.
Clarke, Roger, Harindra de Silva, and Steven Thorley. 2020. “Risk Management and Optimal
Combination of Equity Market Factors.” Financial Analysts Journal 76 (3): 57-79.
Didisheim, Antoine, Shikun Ke, Bryan Kelly, and Semyon Malamud. 2024. “Complexity in
Factor Pricing Models.” Swiss Finance Institute Research Paper No. 23-19.
Ehsani, Sina, and Juhani Linnainmaa. 2020. “Factor Momentum and the Momentum Factor.”
Journal of Finance forthcoming.
Fama, Eugene, and Kenneth French. 1992. “The Cross-Section of Expected Stock Returns.”
Journal of Finance 47 (2): 427–65.
Fama, Eugene, and Kenneth French. 1993. “Common Risk Factors in the Returns on Stocks and
Bonds.” Journal of Financial Economics 33 (1): 3-56.
Fama, Eugene, and James MacBeth. 1973. “Risk, Return, and Equilibrium: Empirical Tests.”
Journal of Political Economy 81 (3): 607-636.
Frazzini, Andrea, and Lasse Pedersen. 2014. “Betting Against Beta.” Journal of Financial
Economics 111 (1): 1-25.
Grinold, Richard. 1989. “The Fundamental Law of Active Management.” Journal of Portfolio
Management 15 (3): 30–37.
Hsu, Jason, Vitali Kalesnik, and Engin Kose. 2019. “What is Quality?” Financial Analysts
Journal 75 (2): 44-61.
Israel, Ronen, Kristoffer Laursen, and Scott Richardson. 2021. “Is Systematic Value Investing
Dead?” Journal of Portfolio Management 47 (2): 38-62.
Jegadeesh, Narasimhan, and Sheridan Titman. 1993. “Returns to Buying Winners and Selling
Losers: Implications for Stock Market Efficiency.” Journal of Finance 48 (1): 65–91.
Jensen, Michael C., Fischer Black, and Myron Scholes. 1972. “The Capital Asset Pricing Model:
Some Empirical Tests.” Studies in the Theory of Capital Markets, edited by Michael C.
Jensen, 79–121.
Kagkadis, Anastasios, Harald Lohre, Ingmar Nolte, Sandra Nolte (Lechner), and Nikolaos Vasilas.
2023. “Power Sorting.” SSRN working paper, August.
Novy-Marx, Robert. 2013. “The Other Side of Value: The Gross Profitability Premium.” Journal
of Financial Economics 108 (1): 1-28.
Moskowitz, Tobias, and Mark Grinblatt. 1999. “Do Industries Explain Momentum?” Journal of
Finance 54 (4): 1249-1290.
Scholes, Myron, and Joseph Williams. 1977. “Estimating Betas from Nonsynchronous Data.”
Journal of Financial Economics 5 (3): 309-327.
Shleifer, Andrei, and Robert Vishny. 1997. “The Limits of Arbitrage.” Journal of Finance 52
(1): 35-55.
Sharpe, William. 1964. “Capital Asset Prices: A Theory of Market Equilibrium under Conditions
of Risk.” Journal of Finance 19 (3): 425–42.
Treynor, Jack, and Fischer Black. 1973. “How to Use Security Analysis to Improve Portfolio
Selection.” Journal of Business 46 (1): 66–86.