How can I go about applying machine learning algorithms to stock markets?
I am not sure whether this question fits here.
I have recently begun reading and learning about machine learning. Can someone shed some light on how to go about it, or share their experience and a few basic pointers on how to start applying it to data sets and see some results? How ambitious does this sound?
Also, please mention the standard algorithms that should be tried or looked at while doing this.
A good starting point is this blog: http://epchan.blogspot.com/ The author has also written a very good book on the subject: http://books.google.de/books?id=HPKCPQAACAAJ&dq=quantitative+trading&hl=en&ei=6lRITZ_lEc6eOpej5IAF&sa=X&oi=book_result&ct=result&resnum=1&ved=0CDgQ6AEwAA
I honestly don't think that this question fits here. See http://meta.quant.stackexchange.com/questions/11/are-help-me-develop-this-strategy-questions-in-scope.
@Shane: Actually it does (OK, it depends on the definition of machine learning), but e.g. in chapter 7 there are a lot of references to machine learning, data mining, PCA, etc...
...and it puts it in perspective, e.g. trading costs, risk management etc. - very good esp. for beginners because data analysis is of course only part of the story.
It makes reference, but there is nothing concrete (so far as I can recall). Chan's analysis is almost all basic time series modelling (e.g. cointegration). I agree with your latter point: it does put everything together so it's useful as a general beginner reference.
Before you start, check this out first: http://www.priceactionlab.com/Blog/2012/06/fooled-by-randomness-through-selection-bias/
Shane, great answer below, but I also think this is a great question for this site, since I'm sure every quant here has pondered this at some time. Unlike the 'develop strategy' link, this is more generic and widely helpful (judging from the votes, too).
@Downvoter: Is there a reason for the downvote? How can I improve the answer? Thank you
There seems to be a basic fallacy that someone can come along and learn some machine learning or AI algorithms, set them up as a black box, hit go, and sit back while they retire.
My advice to you:
Learn statistics and machine learning first, then worry about how to apply them to a given problem. There is no free lunch here. Data analysis is hard work. Read "The Elements of Statistical Learning" (the pdf is available for free on the website), and don't start trying to build a model until you understand at least the first 8 chapters.
Once you understand the statistics and machine learning, you need to learn how to backtest and build a trading model, accounting for transaction costs, etc., which is a whole other area.
After you have a handle on both the analysis and the finance, then it will be somewhat obvious how to apply it. The entire point of these algorithms is trying to find a way to fit a model to data and produce low bias and variance in prediction (i.e. that the training and test prediction error will be low and similar). Here is an example of a trading system using a support vector machine in R, but just keep in mind that you will be doing yourself a huge disservice if you don't spend the time to understand the basics before trying to apply something esoteric.
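The point about training and test error being "low and similar" can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not anything from the book or the linked example: the data are synthetic (a sine curve plus Gaussian noise), and the model, the helper names (`make_data`, `knn_predict`, `mse`), and the choices of k are all assumptions made for the sake of the example.

```python
import math
import random

def make_data(n, rng):
    """Noisy samples of a smooth signal: y = sin(x) + Gaussian noise (synthetic)."""
    xs = [rng.uniform(0.0, 6.0) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, 0.5) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x, k):
    """Predict y at x as the mean of the k nearest training targets."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errs = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errs) / len(errs)

rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)

results = {}
for k in (1, 15):
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    results[k] = (train_err, test_err)
    print(f"k={k:2d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

With k=1 the model memorizes the training set, so its training error is essentially zero while its test error is much larger (high variance). With k=15 the two errors should sit much closer together, which is the behaviour you want before trusting a model on new data.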
Just to add an entertaining update: I recently came across this master's thesis: "A Novel Algorithmic Trading Framework Applying Evolution and Machine Learning for Portfolio Optimization" (2012). It's an extensive review of different machine learning approaches compared against buy-and-hold. After almost 200 pages, they reach the basic conclusion: "No trading system was able to outperform the benchmark when using transaction costs." Needless to say, this does not mean that it can't be done (I haven't spent any time reviewing their methods to see the validity of the approach), but it certainly provides some more evidence in favor of the no-free-lunch theorem.
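To make the role of transaction costs concrete, here is a rough Python sketch (standard library only) of a backtest that reports gross and net returns next to buy-and-hold. It is not the thesis's method: the prices are simulated from i.i.d. Gaussian returns, the moving-average rule is just a placeholder strategy, and the 10 bps cost, the window lengths, and all function names are hypothetical.

```python
import random

def simulate_prices(n, rng, start=100.0, vol=0.01):
    """Synthetic prices from i.i.d. Gaussian returns (an assumption, not market data)."""
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(0.0, vol)))
    return prices

def sma_positions(prices, fast=5, slow=20):
    """Position decided at the close of day t: 1 (long) if fast SMA > slow SMA, else 0."""
    positions = []
    for t in range(len(prices)):
        if t + 1 < slow:
            positions.append(0)  # not enough history yet
            continue
        fast_avg = sum(prices[t + 1 - fast:t + 1]) / fast
        slow_avg = sum(prices[t + 1 - slow:t + 1]) / slow
        positions.append(1 if fast_avg > slow_avg else 0)
    return positions

def backtest(prices, positions, cost_per_trade):
    """Compound daily returns; the position from day t-1 earns the return of day t.

    cost_per_trade is a proportional cost charged whenever the position changes.
    """
    equity = 1.0
    trades = 0
    prev_pos = 0
    for t in range(1, len(prices)):
        pos = positions[t - 1]  # decided yesterday, so no look-ahead
        if pos != prev_pos:
            equity *= 1.0 - cost_per_trade
            trades += 1
        daily_ret = prices[t] / prices[t - 1] - 1.0
        equity *= 1.0 + pos * daily_ret
        prev_pos = pos
    return equity - 1.0, trades

rng = random.Random(42)  # fixed seed so the run is repeatable
prices = simulate_prices(1000, rng)
positions = sma_positions(prices)

gross, trades = backtest(prices, positions, cost_per_trade=0.0)
net, _ = backtest(prices, positions, cost_per_trade=0.001)  # 10 bps, hypothetical
buy_and_hold = prices[-1] / prices[0] - 1.0

print(f"trades={trades}  gross={gross:.2%}  net={net:.2%}  buy&hold={buy_and_hold:.2%}")
```

Every position change multiplies equity by (1 - cost), so a rule that trades often can lose a large part of its gross return even at small per-trade costs. Using the previous day's position for each day's return is a deliberate choice to avoid look-ahead bias.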
As a shameless plug, I recently started a guided tour of the above book on my blog if you want to follow along (http://www.statalgo.com/2011/01/29/esl-the-elements-of-statistical-learning/). I will be reproducing the major analysis from the book using R.
Thanks for the advice. To be frank, in some way, I was someone who was trying to do what you mentioned at the start!
@zubinmehta Thanks for admitting it. :) I guessed as much from your question. If that were possible, there would be a lot of rich people out there doing it. But it's much more of a black hole than you would hope. And once you understand how to do the analysis, applying it in a specific domain (e.g. finance) follows naturally.
I want to know why there's such a vast sea of machine learning people working at prop firms on LinkedIn if it doesn't work? Isn't this good evidence that it does work persistently in *some* markets at *some* frequencies?
@Jase As one of the authors of the mentioned master's thesis, I can quote my own work and say: "If anyone actually achieves profitable results there is no incentive to share them, as it would negate their advantage." Although our results might lend support to the efficient-market hypothesis, they don't preclude the existence of systems that work. It might be like probability theory: "It is speculated that breakthroughs in the field of probability theory has happened several times, but never shared. This [could be] due to its practical application in gambling." Then again, maybe this is all modern alchemy.
@Andre - I agree that a system with astounding results would probably not be made public. You also can't just start using ML. I am at 80% predictive accuracy using time-recurrent neural networks, and am now incorporating lagged intermarket data that has been reduced non-linearly. I use a lot of stability analysis and swarm intelligence methods as well. It takes more than a decade to fully code this, fetch stock data automatically, etc., when using .NET (not an interpreted language like R, Matlab, etc.)
I very much appreciate this answer. But once you have developed an algorithm, how can you apply it to real stock markets in "real time"? I mean, is there a forex app to which you can apply your algorithm? Maybe even a sandbox or something?
@user5626 Neural nets are linear regressions. Multilayer neural nets are polynomial regressions.
As someone who works in the space, anyone claiming over 80% accuracy is just spewing garbage.
@Shane I'm a computer programmer. Could you recommend a source where I can learn more about stocks and investing strategies, and about which features should be used in ML?