Saturday, April 23, 2011

The many facets of linear regression

Many years ago, a portfolio manager asked me in a phone interview: "Do you believe that linear or nonlinear models are more powerful in building trading models?" Being a babe-in-the-woods, I did not hesitate in answering "Nonlinear!" Little did I know that this is the question that separates the men from the boys in the realm of quantitative trading. Subsequent experiences showed me that nonlinear models have mostly been unmitigated disasters in terms of trading profits. As Max Dama said in a recent excellent article on linear regression: "...when the signal to noise ratio is .05:1, ... there’s not much point in worrying about [higher order effects]". One is almost certain to overfit a nonlinear model to non-recurring noise.


Until recently, I have used linear regression mainly in finding hedge ratios between two instruments in pair trading, or more generally in finding the weightings (in number of shares) of individual stocks in a basket in some form of index arbitrage. Of course, others have found linear algebra useful in principal component analysis and more generally factor analysis as well. But thanks to a number of commenters on this blog as well as various private correspondents, I have begun to apply linear regression more directly in trading models.


One way to directly apply linear regression to trading is to use it in place of moving averages. Using a moving average implicitly assumes that there is no trend in a price series, i.e. that the mean of the prices will remain the same. This of course may not be true. So using linear regression to project the current equilibrium price is sometimes more accurate than just setting it equal to a moving average. I have found that in some cases, this equilibrium price results in better mean-reverting models: e.g. short an instrument when its current price is way above the equilibrium price. Of course, one can also use linear regression in a similar way in momentum models: e.g. if the current price is way above the equilibrium price, consider this a "breakout" and buy the instrument.
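To make the comparison concrete, here is a minimal Python sketch (not the code behind any actual strategy) that fits price = a + b*t on a rolling window and takes the fitted value at the newest bar as the "equilibrium" price, next to a simple moving average over the same window. The function names, the 20-bar lookback, and the toy price series are all assumptions chosen for illustration.

```python
import numpy as np

def rolling_lr_endpoint(prices, lookback):
    """Fit price = a + b*t on each rolling window; return the fitted value at the newest bar."""
    prices = np.asarray(prices, dtype=float)
    t = np.arange(lookback, dtype=float)          # t = 0, 1, ..., lookback-1 within each window
    out = np.full(prices.shape, np.nan)           # NaN until a full window is available
    for end in range(lookback, len(prices) + 1):
        window = prices[end - lookback:end]
        b, a = np.polyfit(t, window, 1)           # polyfit returns slope first, then intercept
        out[end - 1] = a + b * t[-1]              # regression "equilibrium" price at the current bar
    return out

def rolling_sma(prices, lookback):
    """Simple moving average over the same window, for comparison."""
    prices = np.asarray(prices, dtype=float)
    out = np.full(prices.shape, np.nan)
    for end in range(lookback, len(prices) + 1):
        out[end - 1] = prices[end - lookback:end].mean()
    return out

# Toy example (made-up data): a perfectly linear uptrend of 0.5 per bar
prices = 100.0 + 0.5 * np.arange(50)
lookback = 20
lr = rolling_lr_endpoint(prices, lookback)
sma = rolling_sma(prices, lookback)

# On a pure trend the regression endpoint tracks the price, while the SMA
# lags by slope * (lookback - 1) / 2 = 0.5 * 9.5 = 4.75
print(prices[-1] - lr[-1], prices[-1] - sma[-1])
```

A mean-reverting rule would then compare the current price with `lr` instead of `sma`; how far "way above" is remains a separate choice that this sketch does not address.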


Max in his article referenced above also pointed out a more sophisticated version of linear regression, commonly called "weighted least squares regression" (WLS). WLS is to linear regression what exponential moving average (EMA) is to simple moving average (SMA): it gives more weight to recent data points. Indeed I have found that EMA often gives better results than SMA in trading. However, so far I have not found WLS to be better than simple least squares. Max also referenced an article which establishes the equivalence between weighted least squares and the Kalman filter. Now the Kalman filter is a linear model that is very popular among quantitative traders. The nice feature of the Kalman filter is that there are very few free parameters: the model will adapt itself to the means and covariances of the input time series gradually. Furthermore, it can do so one step at a time (or in technical jargon, using an "online" algorithm): i.e., there is no need to separate the data into "training" and "test" sets, and no need to define a "lookback" period, unlike moving averages. It makes use of "hidden states" much like Hidden Markov Models (HMM), but unlike HMM, the Kalman filter is faithfully linear.
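As an illustration of the EMA analogy, the following Python sketch fits price = a + b*t by weighted least squares with exponentially decaying weights. The half-life parameterization and the name `wls_line` are my assumptions; the weighting scheme in Max's article may differ.

```python
import numpy as np

def wls_line(prices, halflife):
    """Fit price = a + b*t with exponentially decaying weights; returns (a, b)."""
    y = np.asarray(prices, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    age = (n - 1) - t                      # 0 for the newest bar, n-1 for the oldest
    w = 0.5 ** (age / halflife)            # weight halves every `halflife` bars (assumed scheme)
    X = np.column_stack([np.ones(n), t])   # columns: intercept, time
    XtW = X.T * w                          # same as X' W with W = diag(w)
    a, b = np.linalg.solve(XtW @ X, XtW @ y)   # weighted normal equations
    return a, b

# Made-up series: flat for 100 bars, then rising by 1 per bar
n = 200
t = np.arange(n)
prices = 100.0 + np.maximum(0, t - 99)

a_wls, b_wls = wls_line(prices, halflife=10)
b_ols = np.polyfit(t, prices, 1)[0]        # ordinary least squares slope over the full sample
print(b_ols, b_wls)
```

On this toy series the weighted fit picks up the recent slope much more closely than the full-sample OLS fit, at the price of one more free parameter (the half-life).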


I haven't used Kalman filter much myself, but I would welcome any comments from our readers on its usage. Also, if you know of other ways to use linear regression in trading, do share with us here!



54 comments:

Damian said...

I've often heard people refer to the Kalman filter as a T3 moving average - but I've not seen one coded up that didn't include a lookback period.

Here's an implementation in Amibroker - curious what you think.

http://www.wisestocktrader.com/indicators/240-t3-function-include-afl

Ernie Chan said...

Hi Damian,
Thanks for the link. I think the implementation of the T3 Kalman filter is too complicated and ad-hoc. The "nonlinear" in "... T3 is a six-pole non-linear Kalman filter ..." is precisely what most of us want to avoid.

In the Kalman implementation referenced by Max (http://www2.econ.iastate.edu/tesfatsi/FLSTemporalDataMining.GMontana2009.pdf), there is a parameter that controls how fast the regression parameter is allowed to change. This can be viewed as some kind of lookback parameter, since the faster it is allowed to change, the shorter the effective lookback period is.
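A minimal Python sketch of this idea, assuming a random-walk model for the regression coefficients: `delta` plays the role of the parameter that controls how fast the coefficients may change, and `Ve` is an assumed measurement noise variance. Both names and default values are illustrative, and this is not the exact algorithm of the paper linked above.

```python
import numpy as np

def kalman_regression(x, y, delta=1e-4, Ve=1e-3):
    """Online estimate of y_t = slope_t * x_t + intercept_t via a Kalman filter."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.column_stack([x, np.ones(n)])      # observation vector at each bar: [x_t, 1]
    Vw = delta / (1.0 - delta) * np.eye(2)    # state noise: larger delta = faster adaptation
    beta = np.zeros(2)                        # state estimate: [slope, intercept]
    P = np.zeros((2, 2))                      # state covariance
    betas = np.empty((n, 2))
    for t in range(n):
        R = P + Vw                            # predicted state covariance
        err = y[t] - X[t] @ beta              # one-step-ahead forecast error
        Q = X[t] @ R @ X[t] + Ve              # forecast error variance
        K = R @ X[t] / Q                      # Kalman gain
        beta = beta + K * err                 # state update
        P = R - np.outer(K, X[t]) @ R         # covariance update
        betas[t] = beta
    return betas

# Made-up noise-free data whose true slope jumps from 2 to 3 halfway through
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=2000)
true_slope = np.where(np.arange(2000) < 1000, 2.0, 3.0)
y = true_slope * x + 1.0
betas = kalman_regression(x, y)
print(betas[999], betas[-1])
```

There is no explicit window here: each bar updates the estimate in place, and raising `delta` shortens the effective lookback.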

Ernie

Damian said...

Yes a very different thing....pretty interesting. Thanks for the link.

Matthew said...

The Kalman filtering approach is a really important concept. There are two things to keep in mind. First, the filter uses a model to predict the system's next state, so you still have to choose between linear and non-linear modeling even after deciding to apply such a Kalman filter.

Also be aware that the mathematical underpinnings of the Kalman filter assume continuous, normally distributed variables. Items which are strikingly hard to come by in trading.

Mr.LoL said...

Hi Ernest:

I am pretty new in this quant world, I have been reading your posts and I have a question if you don't mind...

What is the point of focusing on HFT when using pair trading and holding them for some days seems to be so much easier?

I mean that you could get 10%? 20% maybe? using that approach, and you just have to worry about the mathematical model itself, not the implementation, slippage, etc.

I also assume that the shorter the timeframe, the greater the randomness. Is this right? Does it really pay off in terms of reward? Could you give us some approximation here comparing those two ways of trading?

Ernie Chan said...

Hi LoL,
The advantage of HFT is that the Sharpe ratio is typically much higher than in overnight trading, which allows you to use more leverage, and that in turn allows you to have much higher returns.

I do not believe that randomness increases with trading frequency. In fact, I find the opposite to be true, as short time scales prevent extraneous events from disrupting the model.

Ernie

Anonymous said...

Hi Ernie,

Dumb me, I just noticed I missed the seasonal trade on RBOB (+$3381 per RBOB contract this year)...

Did you remember to trade it?

Ernie Chan said...

Hi Anon,
Good to hear RBOB is still working!
No, I didn't trade either: I am focusing on higher frequency trading these days.
Ernie

Anonymous said...

Can you elaborate on how you use linear regression in place of moving averages? What's the dependent variable and what's the regressor? Thanks.

Shaun Overton said...

Hi Ernie,

This is a decidedly non-mathematical approach, but I've taken to resetting my moving averages whenever a bar > 2 standard deviations forms. The MA period grows linearly until the next volatility event.

Click on the "Resetting Moving Average" to get the indicator code for MT4 or NinjaTrader.

Ernie Chan said...

Anon,
In using LR instead of MA, the time variable t=1,2,3... is the independent variable, and the price is the dependent one.
Ernie

Ernie Chan said...

Hi Shaun,
That's an interesting approach and it does make sense. Thanks for sharing!
Ernie

Wei said...

Hi Ernie,

Great article and thanks for sharing your thoughts on linear regression and other technical methods. There are a couple of important aspects worth pointing out:

1. OLS and WLS require specifying hyperparameters like the length of the lookback window, with expanding or rolling windows being popular choices. The coefficients, however, are sensitive to the size of the window: too slow to adapt if the window is too long, high sampling error if it is too short. I have encountered situations where the hedge ratio changes its sign as the data sample rolls forward, which is complete nonsense and purely an artifact of LR properties and sampling errors. It led to a breakdown of the regression-based trading model, but I consider this a fortunate revelation as I had long worried about the arbitrariness of selecting a hyperparameter without clear economic justification. One could "optimize" the window size to get the best backtest results but the problem is that tomorrow is a different day. That leads to the second and more serious problem with linear regression.

2. OLS, and even WLS, disregard the intertemporal structure of the time series data. Max claims that WLS solves this problem, but my experience has been that it makes no significant difference, and you seem to agree. Again, one faces the problem of deciding what kind of weights and decay rate to be applied. Yet another hyperparameter, to be decided not on economic grounds.

3. Kalman filter solves these problems to a large extent, and it works well with discrete data (unlike what one commenter claimed). It's also simple and efficient to implement, but it's not a free lunch. To use it, you need a model specification, and there is no off-the-shelf way of doing that. It's completely up to your creativity and understanding of the trading problem. Of course, it has its own set of issues too, but at least you can frame it in economic terms, because hopefully you have created a model specification based on sensible economics.

4. Last but not least, on whether high frequency microstructure is more or less noisy. It depends on the market and assets you are looking at. For an asset with high intraday volatility, you might be better off using low frequency data. Almost by definition, high volatility is an indication of high degree of noise around the "true" fundamental value.

Ernie Chan said...

Hi Wei,
Thanks for your thoughtful comments on OLS, WLS, KF, and noise.

In my experience, hedge ratios do not vary too much based on the lookback period. Perhaps that's because I focus only on ETF pairs and they are pretty stable. I am surprised though to hear that even WLS has such a sensitive dependence, as the weights are supposed to smooth this out.

With regard to choosing the right model for KF, I stick to Occam's razor as usual! But yes, if you know a bit about the economics of the trade, it would be a big help, though usually I am clueless until after-the-fact.

With regard to noise -- for a mean-reversion trader, more noise means more profit opportunities! (We assume that the noise is mean-reverting.) So if intraday trading is noisy in that sense, intraday trading is therefore very profitable. The noise we don't like is the type that does not mean-revert: for e.g. those created by exogenous corporate/economic/political events.

Ernie

K said...

Hi Ernest,

I just started my own blog for my personal usage, can I have your permission to add your link to my blog?

Thank you.

Regards
Kenneth

Ernie Chan said...

Hi Kenneth,
Sure, please feel free to link to my blog.
Ernie

Anonymous said...

Ernie,

This may sound like a strange question to ask on a blog called, "Quantitative Trading", but have you ever evaluated the quant-oriented daytrading methods discussed on this blog in comparison to plain old value investing and security analysis? That method can be summarized as follows: gain a detailed understanding of a stock or bond security through research, buy when the price is much lower than what it's really worth (i.e., intrinsic value), and sell as it approaches intrinsic value.

The reason I ask such a fundamental question is that my background is very technical and mathematical, much like yours. I have a Ph.D. in Electrical Engineering and have a background in things like linear regression and kalman filters. But after evaluating all investment methods I was aware of, the simple non-quant idea of buying securities at a deep discount to fair value still makes the most sense to me.

I do enjoy your blog - don't get me wrong. I was just wondering if you ever thought about this more fundamental question.

Thanks,
aagold

Ernie Chan said...

Anon,
Both approaches are equally valid.

The value investing approach usually implies a long holding period. However, value investing is not antithetical to a quantitative approach. In fact, many people call Ben Graham the first "quant". For e.g. factor models utilize many fundamental and economic indicators in order to determine the fair value of a stock.

What people usually have in mind as algorithmic or quantitative trading typically occurs at a higher frequency. At such frequency, fundamental information becomes less important.

Value investing typically has low Sharpe ratio and large drawdown, but it has very high capacity. High frequency algorithmic trading has the opposite characteristics.

An ideal hedge fund should encompass both approaches, but few managers have equally excellent skills in both, not even Jim Simons.

Ernie

Ronnie said...

Hi Ernie!

Sorry for the off topic..

Could you explain which way is the better for a new trader to start?

Imagine you have not a lot of capital, pair trading often involves buying-selling contracts that are quite big for the small money, so the way to do this would be over-leveraging which is dangerous.

My question is: Where should a trader with 20k for example look at? forex? commodities? stocks?

Would pair trading be ok for this? Or should the trader go for other options like volatility trading or such?

Thanks in advance.

Ernie Chan said...

Hi Ronnie,
FX and futures are the best areas for a trader with small capital base to start, due to the small margin requirement. Of course, that assumes that you have good strategies in those areas!

Pair trading ETFs is pretty easy and safe, but as you said, it requires a good bit of capital to make a living.

Ernie

Anonymous said...

Hi Ernie

Roughly how much do you have to spend on setting up the infrastructure (co-location) of an HFT business? The hardware seems pretty expensive. The setup cost seems too much for a retail trader.

Kat

Ernie Chan said...

Hi Kat,
Hardware is not expensive. Any server of about $5K will do. What's expensive is what your broker will charge for the ongoing colocation expense: at least $2K / month.

None of these matter if your HFT strategy actually works!
Ernie

Anonymous said...

Hi Ernie

HFT seems quite a profitable strategy for a fund with small capital. I heard that some of the banks embedded their trading strategies in a microchip to gain extra speed. Sounds like everyone keeps on investing in hardware to front-run the other traders. What software language do you use to implement the HFT, Matlab?

Dave

Ernie Chan said...

Hi Dave,
I hesitate to call my strategies HFT: I can certainly tolerate latency of a few seconds.

While the high turnover of HFT does allow a small fund to use its small capital base very efficiently, the infrastructure cost for a true HFT strategy is beyond most small funds.

Yes, I implement all my strategies in Matlab.

Ernie

Issy said...

hi Ernie, what's your thought on measuring divergence between the price and an oscillator such as RSI?

Given that the divergence is done on the swings; and not on the raw data points; is Linear Regression a good candidate?

Issy

Ernie Chan said...

Hi Issy,
I am not exactly sure what you mean by "divergence is done on the swing, and not the raw data points". Could you please elaborate?
Ernie

GTji said...

I suspect that a linear regression average may cause your system to be overly curve fitted. What's your opinion on that?

Suny said...

I have a question on the OLS function by Spatial Econometrics. I used that function as suggested by Ernie in his book. Somehow the hedge ratio (or beta) of the regression comes out different from when I run it with the glmfit function in Econometric toolbox. The results from OLS in Spatial Econometrics and REGRESS in Matlab come out the same but different from glmfit. I have tested with simple Excel regression and a SAS function. Those numbers agree with glmfit. I am just wondering what makes the difference here. Am I missing something? Thanks

Ernie Chan said...

Suny,
Have you made sure that no offset was used in the regression fit in all cases?
Ernie

Anonymous said...

Hi Ernie - quick question if you don't mind the time. Appreciate your time as always. I'm wondering how you set up the regression in place of the MA:

> In using LR instead of MA, the time variable t=1,2,3... is the independent variable, and the price is the dependent one.

Price = a + b * t

1) Are you using intercept or is it better to leave it out?

2) The time variable t - are you using just an integer for t that increments by 1 as you move forward in time?

3) I imagine you are doing a rolling-window regression, similar to how a moving average rolls forward based on the window period selected.

Greatly appreciate the insights.

Shal

Ernie Chan said...

Anon,
1) Intercept is needed here because prices do not go to zero at an arbitrary t=0.

2) t can increment by 1 at every bar.

3) For ordinary OLS, a rolling window is needed. For WLS or Kalman Filter, we don't need rolling window.
Ernie

Anonymous said...

Thanks Ernie for the feedback on the regression setup. Here is a simple way to do this in R if anyone wants to fiddle around.

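library(quantmod)
# Download daily AAPL prices (start date in ISO format)
getSymbols("AAPL", from = "2003-01-01")

# Regress the closing price on a bar counter t = 1, 2, 3, ...
price <- as.numeric(Cl(AAPL))
t <- seq_along(price)
results <- lm(price ~ t)

summary(results)
plot(t, price, type = "l")

# Overlay the fitted line (intercept and slope from the regression)
abline(results)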

Shal

Boris said...

Hi Ernie,
you mentioned you used LR also in basket arbitrage trading. I am using it in forex basket arbitrage trading with R where the regressand is EURUSD. Can you tell me how you are deriving the lot sizes from the calculated coefficients ?

Ernie Chan said...

Hi Boris,
I assume you are regressing A.USD, B.USD, ..., against EUR.USD, so that all independent variables are denominated in USD? If so, then the regression coefficients are the lot sizes.
Ernie

Boris said...

Hi Ernie, thanks for the feedback. I am using mixed pairs xxxUSD and USDxxx like USDJPY and USDCHF. Deposit currency is USD.

Ernie Chan said...

Hi Boris,
In that case, you have to first convert all the pairs to X.USD, do the LR, obtain the lot size, and then convert the lot size back to USD.X.
Ernie

Boris said...

Thanks for the feedback. It sounds good. I am thinking about another approach: normalization of the currency pairs should also be achieved by dividing each quote by its related pip value per lot (calculated from ticksize and tickvalue), correct?

Ernie Chan said...

Boris,
Yes, as long as each point move represents the same dollar amount, you can run your LR on any price series.
Ernie

Unknown said...

Hi Ernie,

I have one question regarding LR. I am using them as moving averages, for example a 21-day LR and a 63-day LR. I will be looking out for crossovers, and also for price crossing them from either above or below. My question is: what would be a better filter for trend identification, and how can I avoid the whipsaws we often see in MA crossovers?

Ernie Chan said...

Unknown,
Different lookback is optimal for different time series. If you are looking for trend, you should check the correlation coefficient of various lookback periods with a holding period and see which one is optimal for your time series.
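One way to read this suggestion, sketched in Python under my own assumptions: for each (lookback, holding period) pair, correlate the return over the past `lookback` bars with the return over the next `holddays` bars. The function name, the default of sampling every max(lookback, holddays) bars to limit overlap between samples, and the simulated prices are all illustrative choices, not taken from the reply above.

```python
import numpy as np

def lookback_holding_corr(prices, lookback, holddays, step=None):
    """Correlation between past `lookback`-bar returns and future `holddays`-bar returns."""
    p = np.asarray(prices, dtype=float)
    if step is None:
        step = max(lookback, holddays)             # assumed spacing to reduce sample overlap
    now = p[lookback:-holddays]                    # price at each decision bar t
    past_ret = now / p[:-(lookback + holddays)] - 1.0   # return from t - lookback to t
    fut_ret = p[lookback + holddays:] / now - 1.0       # return from t to t + holddays
    return np.corrcoef(past_ret[::step], fut_ret[::step])[0, 1]

# Simulated prices whose returns follow an AR(1) with coefficient 0.5 (made-up data)
rng = np.random.default_rng(1)
eps = rng.standard_normal(5000)
r = np.zeros(5000)
for i in range(1, 5000):
    r[i] = 0.5 * r[i - 1] + eps[i]
prices = 100.0 * np.exp(np.cumsum(0.001 * r))

for lookback in (1, 5, 20):
    for holddays in (1, 5, 20):
        c = lookback_holding_corr(prices, lookback, holddays)
        print(lookback, holddays, round(c, 3))
```

A large positive correlation for some pair hints at a trend at that horizon, a negative one at mean reversion; with real data the statistical significance of each entry should be checked before trusting the pick.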
Ernie

sg said...

By the way, what does it mean by "independent variable" and "dependent variable" in this context?

Ernie Chan said...

Hi sg,
For pair trading, you can arbitrarily pick any one price series as independent variable, and the other as dependent. However, it is a good idea to try both permutations.
Ernie

RM said...

Hi Ernie,

(Stats grad student who just started following your blog here) -- I wanted to comment on the use of OLS and WLS. For those who may not know, WLS is a special case of Generalized Least Squares (GLS) when we have no autocorrelation in the model errors (in other words all the off-diagonal terms in the covariance matrix are zero). GLS outperforms OLS (among all other linear unbiased estimators) in terms of efficiency when there is heteroskedasticity (non-constant variance) and/or autocorrelation in the error terms by essentially weighting observations according to the magnitude of the model errors. If you use GLS/WLS and choose the weights according to time periods instead of giving relatively larger weights to observations with smaller errors and giving less weight to the ones with larger errors, then you will indeed get some funky results which I think explains why you weren't getting better results using WLS over OLS. If there is strong evidence of heteroskedasticity and/or autocorrelation (usually at least autocorrelation in financial time series) then WLS/GLS should give you better results than OLS.

BTW - Great blog. I'm just recently getting into computational finance and I'm enjoying your blog along with all the comments. It's been very helpful.

RM said...

Hi all,

One more comment on the use of linear regression. Anonymous asked if he should leave out the intercept term in his model
price = a + b*t which would give us a different model of
price = b*t
which would force the regression line to go through the origin. For purposes of interpretation, we would want to keep the intercept term, since exclusion of the intercept would imply a price of 0 at t=0. However, for forecasting purposes it really does not matter. Given that it's hardly much extra work to run both models, I would suggest trying both and comparing the models using cross-validated root mean squared error.


Also, regarding my previous post on why we would want to use generalized or weighted least squares: after seeing the specific regression equation, I felt compelled to add my two cents. To get the best results out of our linear regression of
price = a + b*t, I would suggest the following method.

1. Test our variables for non-stationarity

2. Use a stationary transformation (differencing) on any non-stationary variables

3. Try lagging the variables in the regression equation

4. Using the time series plots and autocorrelation function plots etc., we can estimate the number of lags and order of differencing we should use. We'll get a regression equation that looks something like

price(t) = a + price(t-1) + b1*((t-1)-(t-2)) + b2*(price(t-1)-price(t-2)) for a single order of differencing and a one period lag. After we're happy with our stationary transformations we can run ols then check if we still have heteroskedasticity and/or autocorrelation then go from there, using gls/wls as needed.
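The motivation for steps 1 and 2 can be seen in a small Python sketch on simulated data (an assumption; no real prices are used). It compares the lag-1 autocorrelation of the residuals from regressing a random-walk price on t with that of the demeaned first differences. It is only a rough diagnostic, not a formal stationarity test, and the helper name `lag1_autocorr` is made up.

```python
import numpy as np

def lag1_autocorr(e):
    """Sample lag-1 autocorrelation of a series (after removing its mean)."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    return (e[1:] @ e[:-1]) / (e @ e)

rng = np.random.default_rng(2)
n = 2000
price = 100.0 + np.cumsum(rng.standard_normal(n))   # simulated random-walk prices
t = np.arange(n, dtype=float)

# Regression on levels: price = a + b*t
b, a = np.polyfit(t, price, 1)
resid_levels = price - (a + b * t)

# Regression on first differences: price(t) - price(t-1) = constant + error
dprice = np.diff(price)
resid_diff = dprice - dprice.mean()

print(lag1_autocorr(resid_levels))   # close to 1: strongly autocorrelated errors
print(lag1_autocorr(resid_diff))     # close to 0: roughly white-noise errors
```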

Ernie Chan said...

Hi RM,
Thank you for your detailed comment and insights!

I am not sure what you are referring to when you said "choose the weights according to time periods". What we did is to give more weight to more recent data, which is generally deemed to be more relevant to the current market condition. Is that bad?

Ernie

RM said...

Hi Ernie,

I apologize, I should have asked you for details before commenting on anything. If you are interested in forecasting and wish to extrapolate the model several periods forward out of sample, then indeed the use of wls with more heavily weighted recent data is perfectly fine. On the other hand, if the goal were prediction - whether your data, a test subset of your data, validation on another dataset, etc - then using the weights in that manner would be arbitrary and definitely not advisable.

Although I think we could conjure up some cases where if the weights were too heavily weighted towards recent data, then we would run into some problems. If I have time this weekend I might look at some time series price data and give a few examples on all these procedures and problems. I think another commenter here alluded to this fact that the weights we choose for wls are a nuisance parameter we might want to somehow average out or avoid altogether.

In fact, if we follow the correct procedure for testing non-stationarity, using the acf plots, etc., then we get a pretty good estimate at the order of differencing and number of lagged variables we should use in our regression equation. These estimates of the order of differencing and number of lags are just indicating how much "memory" our variables contain. If we know how much memory is in fact contained in our variables, then we know what we should include in our regression equation which will be the transformed version of the data that gives us only white noise for errors and not the problematic error structure which breaks classical assumptions that justify the use of ols to begin with, thereby leaving us without a need for wls and its extra nuisance parameters - the weights.

Too long; didn't read version -- If we correctly transform the data by differencing and using lagged variables, we get a stationary series with only white noise as errors and should therefore use ols and ignore wls

hammy said...

When we find the hedge ratio using the ordinary least squares method and then apply the augmented Dickey-Fuller test to the spread (including the hedge ratio), is it possible that the ADF test says it's not cointegrated? If yes, why is that so?

Ernie Chan said...

hammy,
Just because one uses OLS to find hedge ratio doesn't guarantee that the resulting time series is stationary. For e.g., the R^2 of the OLS can be close to zero and the fit very poor.
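A Python sketch on simulated data can illustrate the point. It uses a plain Dickey-Fuller regression with a constant and no lagged-difference terms, which is a simplification of the augmented test and my own choice. The helper names and both simulated pairs are assumptions.

```python
import numpy as np

def df_tstat(series):
    """t-statistic of rho in  dy_t = c + rho * y_{t-1} + e_t  (no augmentation lags)."""
    y = np.asarray(series, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ coef
    sigma2 = resid @ resid / (len(dy) - 2)        # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # OLS coefficient covariance
    return coef[1] / np.sqrt(cov[1, 1])

def ols_spread(x, y):
    """Residual of the OLS fit y = a + b*x, i.e. the spread built with hedge ratio b."""
    b, a = np.polyfit(x, y, 1)
    return y - (a + b * x)

rng = np.random.default_rng(3)
n = 1000
x = 50.0 + np.cumsum(rng.standard_normal(n))          # a random walk
y_coint = 2.0 * x + 1.0 + rng.standard_normal(n)      # tied to x by stationary noise
y_indep = 80.0 + np.cumsum(rng.standard_normal(n))    # an unrelated random walk

t_coint = df_tstat(ols_spread(x, y_coint))
t_indep = df_tstat(ols_spread(x, y_indep))
print(t_coint, t_indep)
```

OLS returns a hedge ratio for both pairs, but only the first spread gives a strongly negative statistic. Because the spread comes from an estimated regression, the critical values are more negative than the standard Dickey-Fuller ones, so published tables for residual-based cointegration tests should be consulted rather than any threshold implied here.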
Ernie

Stefan Martinek said...

The backtest of the linear regression strategy applied on the large portfolio is here: http://www.oxfordstrat.com/trading-strategies/linear-regression/

Jacob Seltz said...

Hi guys, Read the book and am now programming a Kalman filter. The issue I'm having, and I'm not sure where it is in my code (I'm coding in Mathematica) is that my Intercept term in Beta is staying very low when it shouldn't be. I'm still getting a grip on the Kalman State updates.

Without going into more detail or providing code (for now) I'm wondering if anyone else experienced the same issue in their implementation.. perhaps they had a matrix operation incorrect.

Note: I'm using a simple Linear Regression model where Beta represents a slope and intercept.

Jacob Seltz said...

I believe my State covariance update was not proper. In mathematica I had to make sure to use an Outer Product with K * x[[t]]... which looks like :
Outer[Times,x[[t]],K]

Making the Covariance update:

P = R - Outer[Times,x[[t]],K].R where t is the iterator.

Chicago said...

Instead of trying both products as the dependent variable, try an orthogonal regression (total least squares) approach. It adds value by not assigning regression errors to just one product, but distributes them on an orthogonal basis.
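For readers who want to try this, here is a minimal Python sketch that takes the total least squares slope from the first principal direction of the centered data and compares it with the two OLS fits. The helper name `tls_slope` and the simulated series are assumptions; note also that, unlike OLS, the orthogonal fit depends on the units in which the two price series are expressed.

```python
import numpy as np

def tls_slope(x, y):
    """Slope of the orthogonal (total least squares) fit of y against x."""
    data = np.column_stack([x, y]).astype(float)
    data -= data.mean(axis=0)                      # center both series
    _, _, vt = np.linalg.svd(data, full_matrices=False)
    vx, vy = vt[0]                                 # direction of largest variance
    return vy / vx

# Simulated pair (made-up data): y is roughly 2*x plus noise
rng = np.random.default_rng(4)
x = 50.0 + np.cumsum(rng.standard_normal(500))
y = 2.0 * x + 5.0 + 3.0 * rng.standard_normal(500)

b_yx = np.polyfit(x, y, 1)[0]          # OLS with y as the dependent variable
b_xy = np.polyfit(y, x, 1)[0]          # OLS with x as the dependent variable
b_tls = tls_slope(x, y)

# Expressed as dy/dx, the orthogonal slope lies between the two OLS answers
print(b_yx, b_tls, 1.0 / b_xy)
```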
