Sunday, September 22, 2013

Setting up a historical database


One of the first steps in setting up an automated trading platform is testing your candidate strategies. This process is called backtesting. Since backtesting is a whole science in itself, I will discuss it in later posts. For now I will concentrate on creating a historical data set.

Data source

The criteria I had in mind when searching for a historical data feed were:

  1. Cost (preferably free)
  2. Reliability
  3. Ease of access
  4. Daily resolution availability

After some research I found that Yahoo! and Google both provide services that satisfy most of my conditions. Both offer free daily prices going back sufficiently long periods. In the end I decided to use Yahoo! rather than Google for two reasons:

  1. Yahoo! prices already include Adjusted Close values. Of course the adjustment can be done manually as well, but I prefer to minimize the number of errors I can introduce into this process.
  2. There seem to be many more resources on the net about Yahoo! data than about Google data. This is a serious consideration, since it means more examples of other people's hands-on experience to learn from.

Data Reliability

The fact that the data is absolutely free naturally raises some healthy concerns regarding its reliability. While looking for information on this topic, I discovered that Yahoo! uses data provided by CSIData (officially stated on their site here).
There are a few discussions on that topic that I have found helpful:

It looks like some people have experienced issues with Yahoo! data, but the issues don't seem too serious (or they are irrelevant in my case). For now I'm not really concerned, and I think the data will be good enough for my initial proof-of-concept phase. If the need arises in the future, I will review this decision and switch to a different source.

Getting the data

Clearly, the amount of data to be used for backtesting must be substantial. However, Yahoo! only allows downloading one symbol at a time. Fortunately there is a quite simple way to "hack" Yahoo! and obtain the data programmatically in the form of CSV files, which can then be easily parsed.
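To give an idea of how little parsing is involved, here is a minimal Python sketch (not the code of my downloader). It assumes the file starts with a header row and uses the columns Date, Open, High, Low, Close, Volume and Adj Close. The function name parse_history and the sample rows are made up purely for illustration.

```python
import csv
import io
from datetime import datetime

# Made-up sample rows, only to illustrate the assumed column layout.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2010-01-28,10.00,10.50,9.80,10.20,1000000,10.20
2010-01-27,9.90,10.10,9.70,10.00,1200000,10.00
"""


def parse_history(text):
    """Parse CSV text into a list of dicts, sorted oldest first.

    Hypothetical helper: assumes the header names used in SAMPLE_CSV.
    """
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": datetime.strptime(rec["Date"], "%Y-%m-%d").date(),
            "open": float(rec["Open"]),
            "high": float(rec["High"]),
            "low": float(rec["Low"]),
            "close": float(rec["Close"]),
            "volume": int(rec["Volume"]),
            "adj_close": float(rec["Adj Close"]),
        })
    # Sort explicitly so the result does not depend on the file's row order.
    rows.sort(key=lambda r: r["date"])
    return rows


rows = parse_history(SAMPLE_CSV)
for row in rows:
    print(row["date"], row["adj_close"])
```

Sorting by date up front means the rest of a backtest can simply walk the list from the oldest bar to the newest.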

There are plenty of sites with very detailed explanations of how it's done, and it boils down to modifying this string (for the YHOO symbol in this example):

 http://ichart.finance.yahoo.com/table.csv?s=YHOO&d=0&e=28&f=2010&g=d&a=3&b=12&c=1996&ignore=.csv  

where:
s = Symbol
a = startMonth-1
b = startDay
c = startYear
d = endMonth-1
e = endDay
f = endYear
g = d/w/m (daily/weekly/monthly)
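The parameter mapping above can be sketched in a few lines of Python. This is only an illustration, not the code of my downloader: the function name build_history_url is made up, and the interval code is passed through to the g parameter without any validation.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://ichart.finance.yahoo.com/table.csv"


def build_history_url(symbol, start, end, interval="d"):
    """Build the CSV download URL for one symbol (hypothetical helper)."""
    if start > end:
        raise ValueError("start date must not be after end date")
    params = [
        ("s", symbol),
        ("d", end.month - 1),    # end month is zero-based
        ("e", end.day),
        ("f", end.year),
        ("g", interval),         # resolution code, passed through as-is
        ("a", start.month - 1),  # start month is zero-based
        ("b", start.day),
        ("c", start.year),
        ("ignore", ".csv"),
    ]
    return BASE_URL + "?" + urlencode(params)


# Same range as the example string: April 12, 1996 to January 28, 2010.
url = build_history_url("YHOO", date(1996, 4, 12), date(2010, 1, 28))
print(url)
```

The easiest thing to get wrong is the month offset: January is 0 and December is 11 for both the start and the end date, while the day and year are used as they are.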

You can trigger a file download just by pasting such a link into your browser. However, I have created a simple downloader application which can retrieve whole sets of historical data. The code is available here as an open-source Google Code project. Detailed usage information is available on the project site wiki.

The app has a simple command-line interface with no GUI, but this should be enough. It is written in Perl under Linux. I have not tested it on Windows yet, but I see no reason why it shouldn't work once all the required modules are installed.

The application consists of 2 main parts:

  1. Obtaining and updating symbols list per exchange.
  2. Obtaining/updating historical data per stock or per exchange.
