Foreign Exchange Forecasting Via Machine Learning
Christian González Rojas, Molly Herman
B. Fundamental Variables Dataset

The fundamental variables dataset uses the monthly closing price of the USDMXN currency pair as our target variable. We use 27 features that describe the macroeconomic conditions of both the US and Mexico between March 1990 and October 2018. The additional features considered in this dataset are detailed in Table II.

TABLE II
FUNDAMENTAL FEATURES: MONTHLY DATASET

Type               Country  Variables
Economic Activity  Mexico   IP, Industrial Production; Trade Balance (Exports - Imports)
                   US       IP, Industrial Production; Trade Balance (Exports - Imports)
Labor Market       US       Unemployment; Non-farm Payroll
Prices             Mexico   CPI, Consumer Price Index; PPI, Producer Price Index
                   US       CPI, Consumer Price Index; PPI, Producer Price Index
Debt               Mexico   National Debt
                   US       National Debt
Sentiment          US       PMI, Purchasing Managers Index; Investor Sentiment
Other              Mexico   M2 Money Supply
                   US       M2 Money Supply

C. Data Processing

Almost all data processing is identical in both datasets. We first split the data into a 60% train set, a 20% validation set, and a 20% test set. These subsets are taken sequentially in order to keep the time-series nature of the data and to guarantee that our algorithms train exclusively on past data.

To translate our problem into a classification problem, we introduce the Signal_t variable, which we set to 1 if the USDMXN was higher tomorrow than today. That is:

    Signal_t = 1 if USDMXN_{t+1} - USDMXN_t ≥ 0, and 0 otherwise.

We also perform data processing on the features. In particular, we standardize every covariate using the mean and standard deviation of the training set.

For the fundamentals dataset, covariates are lagged by an additional period. This approximates the fact that it is extremely rare to obtain real-time macroeconomic data. By lagging the features by one month we ensure we are not peeking into the future by including unpublished data.

IV. FRAMEWORKS AND MODELS

A. Frameworks

First, we perform binary classification on the Signal_t variable we constructed in the data processing step. This essentially transforms what is initially a continuous-variable problem into a classification task.

In a second exercise, we use ML algorithms to construct point forecasts for our raw continuous target variable, USDMXN_t. We then construct an estimated long/short signal by computing:

    Signal-hat_t = 1 if USDMXN-hat_{t+1} - USDMXN_t ≥ 0, and 0 otherwise.

Both strategies yield a binary signal output that we can execute as a trading strategy.

B. Models

The performance of different machine learning algorithms is tested for each framework. In particular, we considered:

1) Logistic/Linear Regression: We use logistic and linear regression as our benchmark models.

2) Regularized Logistic/Linear Regression: We consider L1 and L2 regularization applied to logistic and linear regression, which allows us to reduce overfitting on the validation set. The hyperparameter λ, which penalizes large coefficients, is tuned using the validation set accuracy.

3) Support Vector Machines/Regression (SVM/SVR): It is highly likely that fitting FX dynamics requires a non-linear boundary. SVM/SVR with a Gaussian kernel provide the flexibility to generate a non-linear boundary as a result of the infinite-dimensional feature vector generated by the kernel.

4) Gradient Boosting Classifier/Regression (GBC/GBR): Tree-based models allow us to capture complex interactions between the variables. Unlike Random Forests, which require bootstrapping, GBC allows us to keep the time-series structure of the data while considering non-linearities. Note that GBC and GBR are considered only for the market variables dataset, due to the division of work between the authors (see Section IX).

5) Neural Networks (NN): Neural networks can model complex relationships between input features, which could improve the forecasting performance. We consider fully-connected networks. The architecture is shown in Fig. 1.

Fig. 1. NN architecture. (*Second hidden layer only for the market variables model.)

Gu et al. (2018) show that shallow learning outperforms deeper learning in asset pricing applications. We follow this result and only consider shallow architectures. In particular, we use a network with two hidden layers for the market variables dataset and a neural net with one hidden layer for the fundamentals dataset.
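The shallow architectures just described can be sketched with an off-the-shelf fully-connected classifier. The sketch below is illustrative only: it uses scikit-learn's MLPClassifier on synthetic data, and the layer widths and L2 strength are assumed values, not the paper's tuned hyperparameters. MLPClassifier offers no dropout, so an L2 penalty stands in for that regularization here.

```python
# Illustrative shallow network in the spirit of Fig. 1, on synthetic data.
# Layer widths (32, 16) and alpha are assumptions, not the tuned values.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 300
X = rng.standard_normal((n, 27))  # 27 standardized covariates
y = (X[:, 0] + 0.1 * rng.standard_normal(n) > 0).astype(int)  # toy long/short labels

# Sequential 60/20/20 split preserves the time-series ordering
tr, va = int(0.6 * n), int(0.8 * n)

# Two hidden layers (market variables variant); ReLU hidden units plus a
# sigmoid output with logistic loss are MLPClassifier's defaults for a
# binary target. The L2 penalty (alpha) replaces dropout in this sketch.
clf = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    alpha=1e-3, max_iter=2000, random_state=0)
clf.fit(X[:tr], y[:tr])
val_acc = clf.score(X[tr:va], y[tr:va])   # tune hyperparameters on this
test_acc = clf.score(X[va:], y[va:])      # report once at the end
```

In the actual pipeline, X and y would be the standardized market or fundamental covariates and the Signal_t labels rather than synthetic draws.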
variables dataset and a neural net with one hidden layer for The results provide evidence that market variables have a
the fundamentals dataset. stronger forecasting power than fundamentals when it comes
to classifying long/short signals. The largest test accuracy
Our choice for loss depends on the framework. We se-
(56.0%) for the market variables was obtained by the SVM,
lect logistic loss for classification and mean squared error
while the maximum test accuracy (44.9%) is achieved by
for the continuous target variable problem. We choose the
logistic regression for the fundamentals data.
proper activations in the same fashion: sigmoid is used for
classification, while ReLU is used for the continuous target There is, however, an important caveat when interpreting
variable. Finally, we use dropout or activation regularization the results. Being a measurement of the fraction of pre-
to avoid overfitting. dictions that we can correctly forecast, accuracy does not
differentiate between true positives and true negatives. A
V. H YPERPARAMETER T UNING successful trading strategy should exploit true positives and
All model parameters are tuned using the validation set. true negatives, while minimizing false positives and false
We use accuracy as our performance evaluation in the negatives.
binary classification model and mean squared error in the
continuous target variable model. The resulting parameters To discern between these cases, Fig. 2 shows the confusion
are detailed in Table III. matrix for the SVM model in the market variables dataset.
The plot suggests a bad performance on the classification of
TABLE III short signals, as well as a prevalence of long predictions.
S ELECTED PARAMETERS
Market Fundamentals
Model
Train Validate Test Train Validate Test
Logistic 62.5 55.2 53.0 67.8 39.1 44.9
Lasso 59.1 58.8 53.6 58.5 53.6 34.8
Ridge 60.1 61.8 54.2 59.0 53.6 37.7
SVM 59.1 60.0 56.0 65.4 53.6 40.6
NN 69.7 56.4 54.2 65.5 55.1 40.6
GBC 81.9 52.1 48.2 Fig. 3. Conditional density of 3-month Mexican T-Bills
Note: Best performance on test set marked in red.
3
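The caveat about accuracy can be made concrete with a toy calculation (the counts below are hypothetical, not the paper's SVM results): a model that almost always predicts long can post a respectable accuracy while its true negative rate, the fraction of short signals it actually catches, collapses.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating 1 = long and 0 = short."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    return tp, fp, fn, tn

# Hypothetical example: 60% of true signals are long, and the model
# predicts long in 96 of 100 periods.
y_true = np.array([1] * 60 + [0] * 40)
y_pred = np.array([1] * 96 + [0] * 4)

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)  # looks passable in isolation
tn_rate = tn / (tn + fp)            # but most short signals are missed
```

Here accuracy is 0.64 while the true negative rate is only 0.10, which is exactly the failure mode the confusion matrix exposes.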
B. Continuous Experiments

Table V presents the statistical performance of every model for the continuous target framework applied to the market variables and the fundamentals datasets.

TABLE V
CONTINUOUS TARGET: ACCURACY (%)

          Market                   Fundamentals
Model     Train  Validate  Test    Train  Validate  Test
Linear    65.3   65.9      58.8    54.5   55.9      50.0
Lasso     63.2   67.1      57.0    50.5   63.2      52.9
Ridge     63.6   67.1      60.0*   52.0   52.9      50.0
SVR       67.3   56.7      58.2    55.9   45.6      54.5*
NN        79.2   54.9      60.0*   65.2   45.6      54.4
GBR       73.9   50.6      56.4    -      -         -
Note: Best performance on test set marked with *.

The outperformance of the continuous variable target with respect to the binary classification models is significant. The improvement in accuracy between the best performing models is around 7% in the market variables test set and around 21% in the fundamentals test set. All continuous target models outperform the binary classification models in terms of accuracy, and all market-variables models outperform the fundamentals models.

Given the bad results of the confusion matrix for the binary classification problem, we explore the results of the continuous experiments. Fig. 4 shows the confusion matrix of the best performing model in terms of accuracy on the market variables data for the continuous variable framework, Ridge regression.

Fig. 4. Confusion matrix of the Ridge model on the market variables data

It is easy to observe that the change with respect to the binary classification model is dramatic. From a 4% true negative rate obtained by the best model for binary classification, this new continuous target framework yields a 59% rate. This is obtained at the expense of a lower true positive rate. However, the true positive rate still yields a reasonable performance of 61%.

VII. ECONOMIC PERFORMANCE

Very successful statistical performance on long/short signals does not imply positive economic performance. This is an inherent problem in directional forecasts. A profitable investment strategy requires algorithms that correctly predict the direction of very large movements in the price of the asset. In our case, if an algorithm correctly predicts most small changes but misses large jumps in the exchange rate, it is very likely to produce negative economic performance upon execution. This issue has been previously assessed in the literature by Kim, Liao, and Tornell (2014).

Therefore, to assess the economic performance of our models, we compute the cumulative profits generated by executing the ML-generated strategy in the test set. The implemented strategy is simple: we start with enough cash in MXN to buy one unit of USD. We then execute the following for every time t:

    Strategy_t = Long 1 USD if Signal-hat_t = 1; Short 1 USD if Signal-hat_t = 0.

At the end of every period, the position is closed, profits are cashed in, and the strategy is repeated. Finally, we use a long-only strategy as our benchmark for economic performance.

A. Binary Classification

Fig. 5 plots the cumulative profits of executing the binary classification algorithms on the market variables dataset as a trading strategy.

Fig. 5. USD cumulative profits of the market variables dataset

The statistically best performing model corresponds to the economically most profitable specification. However, it is important to note that this positive result is mostly driven by a single correct bet made between weeks 725 and 750. All other strategies produce profits that are equal to or worse than the long-only benchmark.

These results can be explained by the models' bad confusion-matrix performance. Due to the very low true negative rate of most models, all specifications stay close to the long-only benchmark, and the departures are a consequence of a few correct or incorrect short bets.
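The execution loop of Section VII can be sketched as follows. The one-unit sizing and the per-period close-and-repeat accounting follow the description above; the price path and signals are made-up toy values, and the function name is our own.

```python
def cumulative_profits(usdmxn, signals):
    """Backtest the one-unit strategy: at each t, go long 1 USD if the
    predicted signal is 1, short 1 USD if it is 0; close the position at
    t+1, cash in the profit, and repeat."""
    profits = []
    total = 0.0
    for t in range(len(usdmxn) - 1):
        move = usdmxn[t + 1] - usdmxn[t]  # P&L of holding 1 USD for one period
        total += move if signals[t] == 1 else -move
        profits.append(total)
    return profits

# Toy price path and signals (illustrative only, not the paper's test set)
rates = [18.0, 18.5, 18.2, 18.9, 19.1]
model_signals = [1, 0, 1, 1]      # model's long/short calls
long_only = [1] * 4               # benchmark: always long USD

pnl_model = cumulative_profits(rates, model_signals)
pnl_bench = cumulative_profits(rates, long_only)
```

Comparing `pnl_model` against `pnl_bench` period by period reproduces the kind of cumulative-profit curves shown in Fig. 5 and Fig. 6.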
B. Continuous Variable Target

Fig. 6 plots the cumulative profits of executing the continuous variable target algorithms on the market variables dataset as a trading strategy.

VIII. VARIABLE IMPORTANCE

Fig. 7. Variable importance for ridge regression on the market variables dataset under the continuous target framework

It is no surprise that fixed income variables are the most relevant features. The result is consistent with the idea that the exchange rate is closely related to interest rates, as explained by the Uncovered Interest Rate Parity condition widely studied in economics.

Finally, another interesting insight is that the USDMXN reacts strongly to global and emerging-market (EM) fixed income indicators. In theory, the bilateral exchange rate should react strongly to the interest rate differential between the two countries. We believe the observed result provides evidence of investor behavior. As documented in recent years by Bloomberg (2015), The Wall Street Journal (2017a) and The Financial Times (2018), the high liquidity of the Mexican Peso has allowed it to serve as a hedge for long EM positions. Our results are consistent with these findings.

IX. CONTRIBUTIONS

The team worked on the same problem but used different datasets. The contributions to this work were as follows:

Christian González Rojas was in charge of data collection, data processing, algorithm selection and algorithm implementation on the market variables dataset for both the continuous and the binary framework. He decided to consider GBC/GBR as an additional model to further test the value of nonlinear relationships. He was also responsible for writing the CS229 poster and the CS229 final report. His data and code can be found at this link.
Molly Herman worked on data collection, data processing and algorithms for the fundamentals dataset. She was responsible for modifying the CS229 poster to create an alternative version for the CS229A presentation and was in charge of writing her own final report for CS229A.

The division of work for the poster and the final report was done to provide deeper insight on the results to which each author contributed the most.
REFERENCES

Amat, C., Michalski, T., & Stoltz, G. (2018). Fundamentals and exchange rate forecastability with simple machine learning methods. Journal of International Money and Finance, 88, 1-24.

Bloomberg. (2015). Why Traders Love to Short the Mexican Peso.

Connor, G., & Korajczyk, R. A. (1988). Risk and return in an equilibrium APT: Application of a new test methodology. Journal of Financial Economics, 21(2), 255-289.

Fan, J., Liao, Y., & Wang, W. (2016). Projected principal component analysis in factor models. Annals of Statistics, 44(1), 219-254.

Gu, S., Kelly, B. T., & Xiu, D. (2018). Empirical Asset Pricing via Machine Learning. Chicago Booth Research Paper, No. 18-04.

Hryshko, A., & Downs, T. (2004). System for foreign exchange trading using genetic algorithms and reinforcement learning. International Journal of Systems Science, 35(13-14), 763-774.

Kelly, B., Pruitt, S., & Su, Y. (2018). Characteristics are covariances: A unified model of risk and return. Journal of Financial Economics, Forthcoming.

Kim, Y. J., Liao, Z., & Tornell, A. (2014). Speculators' Positions and Exchange Rate Forecasts: Beating Random Walk Models. Working Paper.

Lettau, M., & Pelger, M. (2018). Factors that fit the time series and cross-section of stock returns. Working Paper.

Ramakrishnan, S., Butt, S., Chohan, M. A., & Ahmad, H. (2017). Forecasting Malaysian exchange rate using machine learning techniques based on commodities prices. In 2017 International Conference on Research and Innovation in Information Systems (ICRIIS) (pp. 1-5).

The Financial Times. (2018). Mexico's Peso remains the bellwether for Emerging Markets.

The Wall Street Journal. (2017a). The Mexican Peso: A Currency in Turmoil.

The Wall Street Journal. (2017b). The Quants Run Wall Street Now.