Sunday, September 22, 2013

Setting up a historical database


One of the first steps in setting up an automated trading platform is testing your candidate strategies. This process is called backtesting. Since backtesting is a whole science in itself, I will discuss it in later posts. For now I will concentrate on creating a historical data set.

Data source

The parameters I had in mind when searching for a historical data feed were:

  1. Cost (preferably free)
  2. Reliability
  3. Ease of access
  4. Daily resolution availability

After some research I found that Yahoo! and Google both provide services that satisfy most of my conditions. Both provide free prices at daily resolution for sufficiently long periods. In the end I decided to use Yahoo! rather than Google, for two reasons:

  1. Yahoo! prices already include Adjusted Close values. The adjustment can of course be done manually, but I prefer to minimize the number of errors I can introduce into this process.
  2. There seem to be many more resources on the net about Yahoo! data than about Google data. This is a serious consideration, since it means more first-hand experience to learn from.
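To give an idea of what adjusting prices "manually" would involve, here is a minimal Python sketch that back-adjusts closing prices for a single stock split. The function name, the bar layout and the sample prices are all made up for illustration. Yahoo!'s Adjusted Close also accounts for dividends, which this sketch ignores, so it is not a reproduction of their method.

```python
def back_adjust_for_split(bars, split_date, ratio):
    """Divide every close dated before the split by the split ratio.

    bars: list of (iso_date_string, close) tuples (hypothetical layout).
    split_date: ISO date string of the first day traded at the post-split price.
    ratio: new shares per old share, e.g. 2.0 for a 2-for-1 split.
    """
    adjusted = []
    for day, close in bars:
        # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
        if day < split_date:
            close = close / ratio
        adjusted.append((day, close))
    return adjusted


# Made-up prices around a 2-for-1 split on 2013-01-04.
bars = [("2013-01-02", 100.0), ("2013-01-03", 102.0), ("2013-01-04", 51.5)]
adjusted = back_adjust_for_split(bars, "2013-01-04", 2.0)
print(adjusted)
```

Even this simple case needs correct split dates and ratios from somewhere, which is exactly the kind of error source that ready-made adjusted values avoid.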

Data Reliability

The fact that the data is absolutely free naturally raises some healthy concerns about its reliability. While looking for information on this topic, I discovered that Yahoo! uses data provided by CSIData (officially stated on their site here).
There are a few discussions on this topic that I found helpful:

It looks like some people have experienced issues with Yahoo! data, but the issues don't seem too serious (or they are irrelevant in my case). I'm not really concerned about this at the moment, and I think the data will be good enough for my initial proof-of-concept phase. If the need arises in the future, I will review the options and switch to a different source.

Getting the data

Clearly, the amount of data used for backtesting must be substantial. However, Yahoo! only allows downloading one symbol at a time. Fortunately, there is a fairly simple way to "hack" Yahoo! and obtain the data programmatically as CSV files, which can then be easily parsed.

There are plenty of sites with very detailed explanations of how it's done, and it comes down to modifying this string (for the YHOO symbol in this example):

 http://ichart.finance.yahoo.com/table.csv?s=YHOO&d=0&e=28&f=2010&g=d&a=3&b=12&c=1996&ignore=.csv  

where:
s = Symbol
a = startMonth-1
b = startDay
c = startYear
d = endMonth-1
e = endDay
f = endYear
g = d/w/m (daily/weekly/monthly)
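The parameter mapping above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular downloader: the function name `build_history_url` is made up, and the snippet only builds the link without sending any request, since the endpoint may not be served forever. Note the zero-based months.

```python
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://ichart.finance.yahoo.com/table.csv"


def build_history_url(symbol, start, end, interval="d"):
    """Build a historical-data link for one symbol (hypothetical helper)."""
    params = [
        ("s", symbol),
        ("d", end.month - 1),    # end month, zero-based
        ("e", end.day),
        ("f", end.year),
        ("g", interval),         # data resolution, "d" for daily
        ("a", start.month - 1),  # start month, zero-based
        ("b", start.day),
        ("c", start.year),
        ("ignore", ".csv"),
    ]
    return BASE_URL + "?" + urlencode(params)


url = build_history_url("YHOO", date(1996, 4, 12), date(2010, 1, 28))
print(url)
```

For the dates April 12, 1996 to January 28, 2010 this prints exactly the example link shown above.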

So you will trigger a file download just by pasting such a link into your browser. However, I have created a simple downloader application that can retrieve whole sets of historical data. The code is available here as an open-source project on Google Code. Detailed usage information is available on the project site wiki.

The app has a simple command-line interface with no GUI, but this should be enough. It is written in Perl under Linux. I have not yet tested it on Windows, but I see no reason why it wouldn't work once all the required modules are installed.

The application consists of 2 main parts:

  1. Obtaining and updating the symbol list per exchange.
  2. Obtaining/updating historical data per stock or per exchange.
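Once a file is downloaded, parsing it takes only a few lines. Below is a minimal Python sketch (not the Perl application itself). It assumes the file starts with the header line Date,Open,High,Low,Close,Volume,Adj Close and lists the newest day first. Both the layout and the sample rows are assumptions made for illustration, and the prices are invented.

```python
import csv
import io

# Invented sample in the assumed layout; not real market data.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2010-01-28,10.00,10.50,9.80,10.20,1000,10.20
2010-01-27,9.90,10.10,9.70,10.00,1200,10.00
"""


def parse_history(csv_text):
    """Turn CSV text into a list of dicts sorted from oldest to newest."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for rec in reader:
        rows.append({
            "date": rec["Date"],
            "open": float(rec["Open"]),
            "high": float(rec["High"]),
            "low": float(rec["Low"]),
            "close": float(rec["Close"]),
            "volume": int(rec["Volume"]),
            "adj_close": float(rec["Adj Close"]),
        })
    # ISO dates sort correctly as strings; oldest first suits backtesting loops.
    rows.sort(key=lambda r: r["date"])
    return rows


rows = parse_history(SAMPLE_CSV)
for row in rows:
    print(row["date"], row["adj_close"])
```

Sorting oldest-first is a design choice: a backtest usually walks forward in time, so it is convenient to store the rows in that order from the start.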

Saturday, September 21, 2013

Automated Trading - Why this blog?

It has been a while since I first became interested in automated/algorithmic trading. Clearly, this topic is widely discussed on the net, and the amount of related information is overwhelming (which is both an advantage and a disadvantage). Starting from scratch demands a huge effort, so I decided to examine the possible options before investing a significant amount of time in learning and committing to a specific topic. For more than a year I just tried to absorb and process some of the widely available information and decide where to start.

I've always assumed that no one who actually knows what he's doing will share his knowledge. That's why I am very skeptical about all the "I will show you how to get rich by trading X" style books, and why I try not to follow their "recommendations" directly. However, I am also not 100% sure that absolutely nothing can be learned from these books if one picks the information carefully, a bit here and a bit there. That said, I do believe that a fair amount of useful knowledge can be extracted by cross-validating the information and by conducting my own experiments.

As a result I have decided to give it a try and attempt to find the truth my way, making all the errors and learning all the lessons on my own skin.
However, to make good progress and be productive, I must commit. That's why I have started this blog. Hopefully people who find themselves lost in the world of automated trading will be interested in my work here and will read my posts, comment, advise and offer their own views on things.

In this blog I will include absolutely everything I do, from setting up the historical database and choosing a backtesting platform to choosing and testing actual strategies. I will raise questions and try to provide educated answers. I will write only about things I have personally done. Nothing will be taken for granted.


To summarize, these are the few decisions I have made regarding my work:
  • I will keep things simple. As I have already mentioned, the amount of information (and private opinions) available on the net is enormous. Processing this information and drawing educated conclusions from it might be the biggest challenge in this task. By keeping things simple I mean:
    • I will not attempt to learn any complex financial math to optimize my strategies. People devote many years of their lives to studying this topic, and it doesn't seem feasible to gain any practical knowledge quickly.
    • I will not focus too much on prettying up the code I write or reuse. As long as it works, that's fine with me. Of course, if someone wants to take part and make things better, that will always be welcome.
  • Everything will be tested and checked. As I have already mentioned, I will not make any assumptions. I will test and check every theory and practice. I will also use open-source software so that its functionality is transparent.
  • I will try to avoid any paid software and services as long as possible.
  • For now I am going to restrict my work to daily strategies as the highest resolution. This means that I am going to use only EOD (end-of-day) data to make my decisions. I made this decision for several reasons, mainly because such strategies seem easier to follow and execute in the early stages. In contrast, HFT (high-frequency trading) demands a very specific set of knowledge and tools that I do not possess at the moment. It also requires a different, and usually much more expensive, set of data.
  • I will assume that readers have some basic computer skills. Well, since the whole topic is very technology-centered anyway, it would be hard (if not impossible) to avoid that. If I write any code, I will publish it as open source and link to it from this blog.
  • I will make changes to already published posts. Since many of my posts will include steps and/or listings, I will update them instead of creating new posts. So if you are referencing any of my posts, you might want to check back once in a while to see if they have new info. Moreover, if I find out that any of the information I have presented is incorrect, inaccurate or incomplete, I will update it in place.

And of course:
I AM A PRIVATE PERSON, NOT A FINANCIAL ADVISER. ALL INFORMATION IN THIS BLOG IS MY OWN UNDERSTANDING ONLY! IF YOU DECIDE TO USE THIS INFORMATION, YOU DO SO AT YOUR OWN RISK.