A Simple Machine Learning Model to Trade SPY

[Figure: trading signal]

As part of my ongoing research in quantitative trading, I have created a strategy that uses a simple machine learning model to trade SPY. The focus here was not on creating a strategy with alpha but rather on building a framework, both in my mind and in code, for developing more advanced models in the future.

1. Does SPY Exhibit Short-Term Mean Reversion or Momentum?

Examining whether SPY exhibits short-term mean reversion or momentum is the central idea in this strategy. If negative returns tend to precede positive returns (or positive returns precede negative returns), that suggests that mean reversion exists. If positive returns tend to precede positive returns (or negative returns precede negative returns), that suggests that momentum exists.

In other words, short-term mean reversion or momentum exists if there is some relationship between the daily return of SPY and the lagged daily return of SPY over various time periods.
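As a minimal sketch of how such lagged returns can be built in R: the prices below are simulated rather than real SPY data, and `lag_by` is a hypothetical helper, not something taken from the code for this post.

```r
# Simulated closing prices stand in for real SPY data (assumption).
set.seed(1)
close_price <- 100 * cumprod(1 + rnorm(50, mean = 0.0003, sd = 0.01))

# Simple daily return: today's close over yesterday's close, minus one.
daily_return <- c(NA, close_price[-1] / close_price[-length(close_price)] - 1)

# Hypothetical helper: shift a vector back by k days, padding the start with NA.
lag_by <- function(x, k) c(rep(NA, k), x[seq_len(length(x) - k)])

returns <- data.frame(
  daily_return      = daily_return,
  daily_return_lag1 = lag_by(daily_return, 1),
  daily_return_lag2 = lag_by(daily_return, 2)
)

# Correlation between today's return and the one-day lagged return.
# A negative value would hint at mean reversion, a positive one at momentum.
cor(returns$daily_return, returns$daily_return_lag1, use = "complete.obs")
```

On random data like this the correlation should be close to zero; the question in this post is whether real SPY returns behave any differently.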

Below I plot the daily returns of SPY versus the lagged daily return of SPY over periods of one day to nine days. The top left plot shows the relationship between the daily return of SPY versus the one-day lagged daily return of SPY. The blue line in the plots is a smoother to aid in detecting patterns in the data.

Scatterplots of SPY daily returns versus the lagged daily return over various time periods.

So is there any evidence of mean reversion or momentum? The interesting plots are the first two, which show the one-day and two-day lagged returns. They suggest weak evidence of some short-term mean reversion, especially after a large negative return. There are relatively more extreme observations in the top left quadrant of the plots than in the other quadrants. These observations represent instances where SPY has gone down by a lot and then corrected upwards the following day. The smoother has a slight negative slope, which is consistent with weak short-term mean reversion.

Why might this happen? The narrative is that occasionally broad-based and panic-induced selling occurs which exhausts all the short-term selling pressure, so the market corrects the following day.

2. Training a Logistic Regression Model

We have established that there might be some relationship between SPY daily returns and the one-day and two-day lagged daily returns, but is this enough to build a predictive model? I decided to treat this problem as a classification problem to keep things simple — I am only interested in predicting the direction of future returns (positive or negative) and not the magnitude.

The machine learning method I decided to use is logistic regression, a simple learning method that is often tried before more flexible ones.

The training set consists of SPY observations from 2000 to 2013. The following 10 models were trained on it, each identical to the one before it but with one additional lagged return as a predictor. Models 3 through 9 are not shown for brevity.

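# In each formula below, the name of the final predictor did not survive in
# this listing. `daily_return_lag0` is a hypothetical stand-in for it
# (presumably the most recent known daily return); substitute the real column.
m01 <- glm(daily_return_sign ~ 
             daily_return_lag0, 
           data = SPY_training, 
           family = binomial)
m02 <- glm(daily_return_sign ~ 
             daily_return_lag1 + 
             daily_return_lag0, 
           data = SPY_training, 
           family = binomial)
m10 <- glm(daily_return_sign ~ 
             daily_return_lag1 + 
             daily_return_lag2 + 
             daily_return_lag3 + 
             daily_return_lag4 + 
             daily_return_lag5 + 
             daily_return_lag6 + 
             daily_return_lag7 + 
             daily_return_lag8 + 
             daily_return_lag9 + 
             daily_return_lag0, 
           data = SPY_training, 
           family = binomial)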
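
Since the ten models differ only in how many lagged returns they include, they could also be built in a loop. The sketch below is one way to do that with `reformulate()`, not the approach used in the original code. The data is simulated, and the column names `ret_lag1` to `ret_lag10` and the data frame `toy` are hypothetical.

```r
set.seed(42)
n <- 500

# Hypothetical predictor names; ten columns of random "returns" (not SPY data).
predictors <- paste0("ret_lag", 1:10)
toy <- as.data.frame(matrix(rnorm(n * 10, sd = 0.01), ncol = 10,
                            dimnames = list(NULL, predictors)))

# Response: 1 for a positive day, 0 for a negative day (simulated).
toy$daily_return_sign <- rbinom(n, 1, 0.54)

# Model k uses the first k predictors.
models <- lapply(seq_along(predictors), function(k) {
  glm(reformulate(predictors[1:k], response = "daily_return_sign"),
      data = toy, family = binomial)
})
names(models) <- sprintf("m%02d", seq_along(models))

# Number of coefficients per model: the intercept plus k slopes.
sapply(models, function(m) length(coef(m)))
```

Keeping the models in a named list makes it easy to score all of them with a single `sapply()` call later on.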

3. Assessing Model Accuracy

Logistic regression models output the probability that an observation belongs to each class (in our case, whether the daily return will be positive or negative), but it is up to the practitioner to choose a sensible probability threshold for assigning the observation to a class. The choice of threshold is a trade-off between the number of true positives you want and the number of false positives you can tolerate. This post on StackExchange helped me understand this concept more clearly.

One logical cutoff is 50%: if the probability that the observation belongs to a class is greater than 50%, assign the observation to that class. Unfortunately, I am unable to use this decision rule here because the model predicts, for almost all the observations, that the probability of a positive daily return is greater than 50%. In fact, the predicted probabilities are clustered around 54%, which is also the share of positive daily returns in the data.

Simply put, the models were not able to predict with much confidence given the predictors available to them, so the predicted probabilities of a positive daily return had a very narrow range centered on the observed share of positive returns in the data. As such, I set a probability threshold of around 54% to assign observations to a predicted positive return or a predicted negative return.
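
To illustrate the thresholding, here is a minimal sketch on simulated data. The variables `lag1`, `up`, and `fit` are hypothetical, and the weak dependence on `lag1` is an assumption chosen to mimic a barely predictive model. It relies on one general property of logistic regression with an intercept: on the training data, the fitted probabilities average out to the observed share of positive outcomes, which is why they cluster around the base rate when the predictors carry little information.

```r
set.seed(7)
n <- 1000
lag1 <- rnorm(n, sd = 0.01)

# Simulated response with a very weak dependence on lag1 (assumption).
up <- rbinom(n, 1, plogis(0.16 - 5 * lag1))
fit <- glm(up ~ lag1, family = binomial)

# Predicted probability of a positive return for each observation.
p_up <- predict(fit, type = "response")
range(p_up)   # a narrow band around the base rate

# Use the observed share of positive days as the cutoff instead of 0.5.
threshold <- mean(up)
predicted_class <- ifelse(p_up > threshold, "positive", "negative")
table(predicted_class)
```

With a 0.5 cutoff nearly every observation here would be classified as positive; cutting at the base rate splits the predictions into two usable groups.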

The models were evaluated on a test set that consists of SPY observations from 2014 to present. I decided to use the accuracy rate to assess the models, which is simply the percentage of observations that the model predicted correctly. Here is how the models performed.

   model accuracy
1    m01   0.5204
2    m02   0.5302
3    m03   0.5432
4    m04   0.5383
5    m05   0.5106
6    m06   0.5155
7    m07   0.5334
8    m08   0.5318
9    m09   0.5188
10   m10   0.5253

4. Choosing the Model and Constructing Trading Signal

These might seem like good results since the models have accuracy over 50% until you remember that a classifier that would simply predict a positive daily return for every observation would have an accuracy of 54% (since 54% of the time, the daily return of SPY is positive).

Unsurprisingly, predicting future returns is hard and these models are bad. But let’s take a closer look at model m03, which has an accuracy of 54.3%. Here is a table of actual daily returns versus predicted daily returns.

                  negative_predict   positive_predict
negative_actual                160                120
positive_actual                160                173

When model m03 predicted negative returns, it was right 160 times and wrong 160 times with an accuracy of 50%. When model m03 predicted positive returns, however, it was right 173 times and wrong 120 times with an accuracy of 59%. This is pretty promising, so let’s use model m03 and only trade when it predicts a positive return.
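
The arithmetic behind those figures can be checked directly from the table. This is a small sketch using the four counts above; the object names are my own.

```r
# Confusion matrix for m03: rows are actual classes, columns are predictions.
cm <- matrix(c(160, 160, 120, 173), nrow = 2,
             dimnames = list(actual    = c("negative", "positive"),
                             predicted = c("negative", "positive")))

# Overall accuracy: correct predictions over all predictions.
overall_accuracy <- sum(diag(cm)) / sum(cm)
round(overall_accuracy, 4)         # 0.5432

# Hit rate for each predicted class: correct calls over calls made.
hit_rate <- diag(cm) / colSums(cm)
round(hit_rate, 4)                 # negative 0.5000, positive 0.5904

# Accuracy of a classifier that always predicts a positive return.
always_positive <- sum(cm["positive", ]) / sum(cm)
round(always_positive, 4)          # 0.5432
```

On these counts the always-positive baseline works out to the same 333/613 as m03's overall accuracy, so whatever edge the model has sits in the hit rate of its positive predictions rather than in overall accuracy.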

The logic for constructing the trading signal is to use the 54% probability threshold to assign observations to a predicted positive return or negative return, go fully long when the predicted class is a positive return, and be flat when the predicted class is a negative return.
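
In code, that rule might look like the sketch below. The probabilities and returns are made-up numbers for illustration, not model output or SPY data.

```r
threshold <- 0.54

# Hypothetical predicted probabilities of a positive return on the next day.
p_up <- c(0.551, 0.538, 0.542, 0.529, 0.547)

# Hypothetical realized returns on those next days.
next_day_return <- c(0.004, -0.006, 0.010, -0.002, 0.003)

# 1 = fully long, 0 = flat.
signal <- ifelse(p_up > threshold, 1, 0)
signal                      # 1 0 1 0 1

# The strategy earns the market return only on days it is long.
strategy_return <- signal * next_day_return
strategy_return             # 0.004 0.000 0.010 0.000 0.003
```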

5. Assessing Strategy Performance

Here is the equity curve. It turns out that this simple strategy isn’t complete garbage and has actually outperformed the buy-and-hold return of SPY, at least over this test set. The strategy performed extremely well from late 2015 to present. Adding in transaction costs, however, would cause this strategy to underperform the SPY because there are a large number of trades.

[Figure: equity curve]
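
For readers who want to see how an equity curve and the effect of transaction costs can be computed, here is a rough sketch. Everything in it is simulated: the returns, the random signal, and the `cost_per_trade` figure are all assumptions, so the output says nothing about the actual strategy.

```r
set.seed(123)
n <- 250

# Simulated daily returns and a random long/flat signal (assumptions).
ret    <- rnorm(n, mean = 0.0004, sd = 0.01)
signal <- rbinom(n, 1, 0.5)

# Equity curves starting from 1, before costs.
buy_hold_equity <- cumprod(1 + ret)
strategy_equity <- cumprod(1 + signal * ret)

# A trade happens whenever the position changes (starting from flat).
trades <- abs(diff(c(0, signal)))

# Hypothetical cost per position change, as a fraction of equity.
cost_per_trade <- 0.0005
net_return <- signal * ret - trades * cost_per_trade
net_equity <- cumprod(1 + net_return)

sum(trades)   # number of position changes over the period
tail(cbind(buy_hold_equity, strategy_equity, net_equity), 1)
```

A signal that flips often pays the cost often, which is why a strategy with many trades can look fine before costs and poor after them.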

Here is the SPY closing price with a mapped trading signal color gradient. Blue indicates times when the model was fully long and black indicates times when the model was flat. The plot shows that the strategy tends to go long as the market dips and hopes for a correction which exploits the short-term mean reversion explored earlier in this post.

[Figure: trading signal]

The code for this post comprises 218 lines of R and can be found on my GitHub.


  2. Hey, can you explain more about what models m01, m02, ..., m10 are? Are you trying to predict the direction for future days? Like m01 for tomorrow’s direction, m02 for the day after tomorrow’s direction, and m10 for 10 days out, etc.?

    • Sure, I was struggling a little bit with the language, so I’m not surprised that it’s not clear. Suppose that it’s the end of the trading day. You know the daily return for the current day, the day before, and the day before that. Using these three daily returns, the model attempts to predict the daily return for the next day. That’s what model m03 is.

      More formally, the models attempt to predict the daily return of day t = 0 using the daily returns of day t = -1, t = -2, …, t = -10.

  3. mark leeds

    Hi: I tried to do something similar a long time ago (I also had another class called “don’t trade”) and I found something that maybe
    goes on with your strategy? If you use the 50 percent threshold approach then, even if you get more right than wrong, it’s
    the values of the returns that drive it. At least that’s what I found. A very interesting and well done post.

    Your approach may have more possibilities because mine was trying to do classification intraday. My guess (or should I say “inference”) is that those returns are waaay too noisy to ever work with that approach. So, it never worked out and I moved
    on. You may have more success. I would check including “don’t trade” also (which then means you need nnet because
    it’s multinomial). I think I did 40 and 60, with “don’t trade” in between. It may not have been exactly 40 and 60 but you get the idea. I never saw this blog before so I’m gonna subscribe. Very nicely done and it brought back some (good and bad) memories so thanks.

    • Thanks, that’s a good idea for the classification. It should be possible to optimize the probability thresholds using cross validation or out of sample testing.

  4. mark leeds

    One other suggestion: Since the returns clearly mattered in my case, I still tried to invent some kind of objective function in order to decide what to get into and what not to get into. (I was dealing intraday, so there could be many choices because there are a lot of stocks.) It’s somewhat vague in my memory but I think I used the probabilities as some kind of proxy for returns and then created a linear program that tried to maximize the sum of the p_i’s subject to the portfolio being market neutral. That didn’t help much either because the proxy isn’t all that great. But my point is that I think there needs to be a way to quantify the probabilities and use them to decide how to either A) enter or B) position size when the probability is more extreme than just 39 or 61 (using the 40-60 approach). Good luck.

    • Right, the probability should be a measure of confidence, and the further the probability is from 50%, the more confidence the model has in its prediction. So that should mean a stronger signal (a bigger position). For the purposes of this blog post, I just decided to let the signal take on values of only fully long or flat, but I like the idea of varying the position size / signal strength depending on the confidence.
