Monday, May 11, 2015

Equity Ranking Backtest with Python/Pandas

I have been looking at equities a bit of late. I am particularly interested in ranking a universe of equities for “low frequency” manual trading on a weekly or monthly basis.

Every period I would rank each name on a bunch of different factors, then invest in the highest ranked ones for that period.

I was initially working in R but the code grew unwieldy, and I wanted a second opinion on my approach, so I took the time to reimplement it in Python using pandas.

Setup


For each symbol in our universe, we load the raw data and generate the information used for ranking. If we have 5 names, we end up with 5 dataframes.

Then we combine those dataframes into one big dataframe, and iterate through month by month, selecting the symbols that meet our ranking criteria. From those selected, we equally weight and sum the next period returns.
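As a rough sketch of that setup (not the linked code), the per-symbol frames can be built and stacked like this. The symbols AAA and BBB, the prices, and the helper name make_symbol_frame are all made up for illustration, and returns are assumed to be simple close-to-close month-end returns.

```python
import pandas as pd

def make_symbol_frame(prices, sym):
    """Build one per-symbol frame from a month-end price series."""
    out = pd.DataFrame(index=prices.index)
    out["cpr"] = prices / prices.shift(1) - 1   # current period return
    out["npr"] = out["cpr"].shift(-1)           # next period return
    out["sym"] = sym
    return out

# Synthetic month-end prices for two hypothetical symbols (not real data).
dates = pd.to_datetime(["2014-12-31", "2015-01-31", "2015-02-28", "2015-03-31"])
prices = {
    "AAA": pd.Series([100.0, 110.0, 99.0, 108.9], index=dates),
    "BBB": pd.Series([50.0, 50.0, 55.0, 44.0], index=dates),
}

# One frame per symbol, then one big frame with repeated dates in the index.
frames = [make_symbol_frame(p, sym) for sym, p in prices.items()]
df = pd.concat(frames).sort_index(kind="mergesort")
df.index.name = "Date"
print(df)
```

With two symbols and four months, the combined frame has eight rows, two per date.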

One thing that is really cool about the pandas dataframe is that it allows multiple rows with the same index.

This makes it easy to get the data for the month under consideration. We just pass the month to the indexer and get the subset of data for that month, e.g.

>>> df.ix['2015-02']
                 cpr       npr       avg   over  sym
Date                                               
2015-02-28  0.043302 -0.062449 -0.038914  False  DBC
2015-02-28 -0.025028  0.008524  0.006130   True  IEF
2015-02-28  0.056838 -0.014239  0.005434   True  VEU
2015-02-28 -0.037434  0.017171  0.015900   True  VNQ
2015-02-28  0.055832 -0.011697  0.009236   True  VTI

[5 rows x 5 columns]
>>> 

In this example there are 5 symbols, and we see the ranking information for February 2015.
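Note that .ix has since been removed from pandas (as of version 1.0); on a current install the same month lookup can be done with .loc. A small sketch, with made-up symbols and numbers:

```python
import pandas as pd

# Two months, two hypothetical symbols; each date appears once per symbol.
idx = pd.to_datetime(["2015-01-31", "2015-01-31", "2015-02-28", "2015-02-28"])
df = pd.DataFrame(
    {"cpr": [0.01, -0.02, 0.03, 0.04], "sym": ["AAA", "BBB", "AAA", "BBB"]},
    index=idx,
)
df.index.name = "Date"

# Partial string indexing: "2015-02" matches every row dated in February 2015.
feb = df.loc["2015-02"]
print(feb)
```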

Another option would be to use hierarchical indexing, with a sub-index for each month, but this way worked for my needs and I think is quite clean and simple.
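For comparison, here is one way the hierarchical version might look, assuming a two-level (Date, sym) index. This is my guess at a reasonable layout, with made-up symbols and numbers.

```python
import pandas as pd

idx = pd.to_datetime(["2015-01-31", "2015-01-31", "2015-02-28", "2015-02-28"])
df = pd.DataFrame(
    {"cpr": [0.01, -0.02, 0.03, 0.04], "sym": ["AAA", "BBB", "AAA", "BBB"]},
    index=idx,
)
df.index.name = "Date"

# Move sym into the index so every (Date, sym) pair is a unique key.
mi = df.set_index("sym", append=True)

# Cross-section for one month-end date; the result is indexed by sym.
feb = mi.xs(pd.Timestamp("2015-02-28"), level="Date")
print(feb)
```

The index is now unique, at the cost of a slightly more involved lookup than the plain month string.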

If anyone knows an equivalent in R that is as clean and easy to work with for multiple time series I would love to hear about it. 

Code Notes


The demo code does a simple backtest of the GTAA/Relative Strength trend-following system using ETFs.

I have stripped it down to the basics so hopefully it is easy to understand. Load the data, generate the dataframe with the info we want, make a combined data frame, then go through month by month.

The ranking is done by filtering out names under their 10-month moving average, then selecting the top n based on average 3-month return.
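A sketch of that selection step, applied to the February 2015 rows shown earlier. I am assuming here that the “over” column flags names above their 10-month moving average and that “avg” is the average 3-month return; the function name select_top and the choice of n=2 are just for illustration.

```python
import pandas as pd

def select_top(month_df, n):
    """Drop names below their moving average, then keep the n best by avg."""
    eligible = month_df[month_df["over"]]
    ranked = eligible.sort_values("avg", ascending=False)
    return ranked.head(n)["sym"].tolist()

# The February 2015 rows from the output earlier in the post.
feb = pd.DataFrame(
    {
        "avg": [-0.038914, 0.006130, 0.005434, 0.015900, 0.009236],
        "over": [False, True, True, True, True],
        "sym": ["DBC", "IEF", "VEU", "VNQ", "VTI"],
    },
    index=pd.to_datetime(["2015-02-28"] * 5),
)

picks = select_top(feb, 2)
print(picks)
```

DBC is filtered out by the moving average test, and VNQ and VTI have the two highest “avg” values among the rest.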

The “cpr” column is the current period return, and the “npr” column is the next period return, which is the return realized if we select a given security for that month.
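Putting the pieces together, the month-by-month loop might look something like the sketch below. The data is synthetic, the function name backtest is hypothetical, and I am assuming each pick gets a fixed 1/n weight, with any unfilled slots sitting in cash at zero return. The linked code may handle that case differently.

```python
import pandas as pd

def backtest(df, n):
    """Per month: pick the top n eligible names, equal weight their npr."""
    monthly = {}
    for date, month in df.groupby(level=0):
        eligible = month[month["over"]]
        picks = eligible.sort_values("avg", ascending=False).head(n)
        # Assumption: fixed 1/n weight per slot, empty slots earn nothing.
        monthly[date] = picks["npr"].sum() / n
    return pd.Series(monthly)

# Synthetic combined frame: two months, three hypothetical symbols.
idx = pd.to_datetime(["2015-01-31"] * 3 + ["2015-02-28"] * 3)
df = pd.DataFrame(
    {
        "npr": [0.02, 0.10, -0.04, 0.06, 0.02, 0.50],
        "avg": [0.03, 0.05, 0.01, 0.02, 0.04, 0.00],
        "over": [True, False, True, True, True, False],
        "sym": ["AAA", "BBB", "CCC"] * 2,
    },
    index=idx,
)

result = backtest(df, n=2)
print(result)
```

In January the picks are AAA and CCC (BBB is below its moving average), and in February they are BBB and AAA.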

The data is just ETF data from Yahoo, which I have put up here. Code is here.

I found Python for Data Analysis a very useful book when working with pandas.

Tuesday, March 24, 2015

Simulation and relative performance

There have been some nice posts on randomness in the last week or so, in particular here and here.

I would like to look at how we can use simulations to get a better understanding of how some aspect of a trading system holds up relative to a bunch of random trades.

In this example, I look at entries on weekly data for SPY. The entry signal is to buy if the previous week closed down.

Over the time frame (2005-2014, about 10 years), it was long about 44% of the time, and out the rest.
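To make the entry rule concrete, here is a small Python sketch (not the linked code). It assumes “closed down” means the weekly close was below the prior weekly close; the prices and the function name down_week_signal are made up.

```python
import pandas as pd

def down_week_signal(close):
    """True for weeks we hold long: the previous week closed down."""
    weekly_ret = close / close.shift(1) - 1
    # Shift by one so this week's position uses only last week's return.
    return weekly_ret.shift(1) < 0

# Synthetic weekly closes (not SPY data).
close = pd.Series([100.0, 98.0, 101.0, 99.0, 100.0])
signal = down_week_signal(close)

# Fraction of weeks spent long, comparable to the 44% figure above.
frac_long = signal.mean()
print(signal.tolist(), frac_long)
```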

In the simulation function, we generate random entry signals that will see us long about the same amount of time.

We track some metrics of system performance, in this case total return, average trade return and accuracy (i.e. how often a buy signal was correct).
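A sketch of how the simulation and metrics could be set up in Python. The names metrics and simulate are hypothetical, the returns are randomly generated rather than SPY data, and I am assuming each long week counts as one trade and that total return is compounded. Random signals are drawn independently per week so the long fraction is only approximately matched.

```python
import numpy as np

def metrics(returns, signal):
    """Total return, average trade return and accuracy for the long weeks."""
    trades = returns[signal]
    return {
        "total_return": float(np.prod(1.0 + trades) - 1.0),
        "avg_trade": float(trades.mean()),
        "accuracy": float((trades > 0).mean()),
    }

def simulate(returns, frac_long, n_sims, seed=0):
    """Score n_sims random signals that are long about frac_long of the time."""
    rng = np.random.default_rng(seed)
    results = []
    while len(results) < n_sims:
        signal = rng.random(len(returns)) < frac_long
        if signal.any():  # skip the (unlikely) draw with no long weeks
            results.append(metrics(returns, signal))
    return results

# Synthetic weekly returns, roughly ten years long (not real data).
returns = np.random.default_rng(42).normal(0.002, 0.02, 520)
sims = simulate(returns, frac_long=0.44, n_sims=200)
sim_totals = np.array([s["total_return"] for s in sims])
print(sim_totals.mean())
```

The same metrics function can be applied to the real system signal, which gives the value to compare against the simulated distribution.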

I then use ggplot to make some density plots of the simulation metrics, marking the mean of the simulation results in red and the corresponding system metric in blue.

It looks like this


I basically want to see the blue line far away from the red line. In this case it seems fairly decent. You can also generate p-values from the simulation data.
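One common way to get such a p-value is the share of simulated values at least as good as the system's, with one added to the numerator and denominator so the estimate is never exactly zero. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
import numpy as np

def empirical_p_value(sim_values, observed):
    """One-sided Monte Carlo p-value: share of simulations >= observed."""
    sim_values = np.asarray(sim_values, dtype=float)
    return (np.sum(sim_values >= observed) + 1) / (len(sim_values) + 1)

# Toy example: one of four simulated values beats the observed metric.
p = empirical_p_value([0.1, 0.2, 0.3, 0.4], observed=0.35)
print(p)
```

A small p-value here means few random entry schedules did as well as the system on that metric.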

For comparison, here is a daily system that is long if the previous close was above the 200 day simple moving average.


We can see there’s not a lot of difference between the moving average results and just entering randomly. (Note the accuracy metric has a different x-axis scale than the previous plot).
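For reference, the moving average rule can be written in a few lines. This is a sketch with a hypothetical function name; the toy example uses a 3-day window so the result is easy to check by hand, where the system above uses 200.

```python
import pandas as pd

def sma_signal(close, window=200):
    """Long today if yesterday's close was above yesterday's SMA."""
    sma = close.rolling(window).mean()
    # Shift by one day so the position only uses the previous close.
    return (close > sma).shift(1, fill_value=False)

# Synthetic daily closes (not SPY data).
close = pd.Series([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 5.0])
signal = sma_signal(close, window=3)
print(signal.tolist())
```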

I use a similar idea for putting risk or open trade management ideas through their paces, seeing how well they hold up when managing random entries.

Code is up here. Thanks for reading.