Debunking Finance data science myths

An actuary and a computer scientist debunk popular finance data science myths.

When I first started working with models, the phenomenon that fascinated me the most was changes in stock market prices. This was in an era when great fortunes were made and lost trading financial securities without the benefit of internet boards, chat rooms, encrypted conversations, zero latency networks or instant news. A grapevine of professional and amateur investors would exchange notes about what was driving the trend for a given security, build their own models and trade. I saw two generations of elders bet the market with their limited capital. One made enough to last three generations. We still talk about him with awe. Everyone else lost their collective wardrobe.

It was perfectly normal behavior for a twenty-year-old computer scientist in the late eighties to look at neural networks as a way to forecast stock market prices. I had already built models to detect lines, characters and patterns as a final year computer science student. Forecasting stock market price changes would be just another exercise in pattern recognition. Luckily for me, the project I took on in my final year failed because I ran out of time, attention and bandwidth. But for years I wondered whether the code I wrote would actually have worked if I had tried really hard. Did I actually have the magic sauce in my hands? Did I walk away from it all?

Twenty-five years ago I was young and a zealot when it came to the power of models and computer science to solve problems. I believed. Today, when I look back at those days, the only sentiment I can find is amusement.

At the intersection of the financial world and the data science world, I see three themes repeat again and again.  Three finance data science myths that just refuse to die a natural death.

  1. Data science being quoted as the savior for Dr. Alexander Elder’s technical analysis models for making money.
  2. Data science and algorithms quoted as the reason behind the money being made with High Frequency Trading (HFT) models.
  3. Risk models and value at risk being quoted as the primary reason for the financial services crash of 2008-2009. The reality, though, is very different. See “Why doesn’t bank regulation work?” and “Is Capital Adequacy the right measure?”

So if you are interested in making money from trading financial securities, should you invest in a move towards data science? There are two answers to this question. The short, quick and easy version is no. The long, detailed, analysis-driven version is also no.

Disappointed? Please read on.

I have spent my entire professional life working with data. Financial securities and investments have been an interest ever since I was taught to read the daily market closing report in the yellow pages as a teenager. My specialization as an actuary is investments and risk management. I have built and sold risk models for over a decade and a half. I have taught the subject to both undergraduate and graduate students while trading equities and options on the side. I have both made money and burnt my fingers running a substantial fixed income portfolio (made money), equities (made and lost money) and options (lost money, poof).

A word about data.

Trading desks and investors consume an enormous amount of data. Yes, there are trends and patterns in the stream. Yes, you can identify and dig them out. Yes, it helps to have the tools and the infrastructure. Yes, I would rather have data and the analysis. But no, data and data science alone are not going to make you money. And no, while the premise suggested in The Fear Index by Robert Harris sounds plausible, I think we are a few years away from it all.

You make the most money by going against the market, especially when the market is wrong. You make it by a fanatical commitment and dedication to process and discipline. You make it by following frameworks and strategies that you understand in markets that you have followed for a lifetime. You make it by identifying trends before they become trends. You don’t make it by purchasing a platform that lets you analyze data.

Models don’t work. Process and discipline do.

Dr. Alexander Elder’s technical analysis guide is a framework, a charting toolset for traders who have learnt to work with charting signals. It is not news. It is respected old school, a framework that traces its roots all the way back to the 17th century. Data science cuts down the mechanics and reduces the overhead, but it can’t take credit for the science or the voodoo magic behind technical analysis. It is Dr. Elder’s commitment to process and discipline that made it possible for him to do his magic, and that is the reason why a book or software will not be able to do the same for you.

From Buffett to Taleb and Victor Niederhoffer, the message is the same. Figure out the process, focus on the process, don’t ignore the process. Process and discipline trump all other shades.

But what about neural networks? My first love, you ask?

Financial markets are a complex organism that cannot be effectively modeled by a statistical regression model, which is what neural networks essentially are. We have tried to model the same securities with more advanced calculus and failed. Despite literature, papers, presentations and analysis to the contrary, we only understand how markets behave under normal operating conditions. We don’t understand what they do when they are stressed or in panic mode. We understand them well enough to build models that are plausible and approximately right, but when we put a little mileage or pressure on the same models they break. As long as the future follows the past, our models stay on track. But the minute market behavior transitions into a new, previously unseen trend, our models fall apart.
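To make that last point concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the synthetic price path, the break at day 200, the noise level and the helper names (`fit_line`, `mean_abs_error`, `make_prices`). It fits a plain least squares trend line to the early history, then scores the same line on later data from the old trend and on data from a trend it has never seen.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def mean_abs_error(a, b, xs, ys):
    """Average absolute gap between the fitted line and the observed values."""
    return sum(abs(a + b * x - y) for x, y in zip(xs, ys)) / len(xs)

def make_prices(seed=7):
    """Hypothetical price path: a steady uptrend that reverses at day 200."""
    rng = random.Random(seed)
    days = list(range(300))
    prices = []
    for t in days:
        if t < 200:
            level = 100 + 0.5 * t          # the trend the model gets to see
        else:
            level = 200 - 1.5 * (t - 200)  # a new, previously unseen trend
        prices.append(level + rng.gauss(0, 2))
    return days, prices

days, prices = make_prices()

# Calibrate on the first 150 days only.
a, b = fit_line(days[:150], prices[:150])

# Out of sample, but the future still follows the past.
err_same_regime = mean_abs_error(a, b, days[150:200], prices[150:200])

# Out of sample, after the market has changed its behavior.
err_new_regime = mean_abs_error(a, b, days[200:], prices[200:])

print(f"error while the old trend holds: {err_same_regime:.2f}")
print(f"error after the trend changes:   {err_new_regime:.2f}")
```

The model is not wrong about the past. It is simply answering a question the market has stopped asking.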

My credentials are not that impressive, but Nassim Nicholas Taleb made the same argument long before I caught on to it. Markets and models underestimate the tails of the true distribution. Simulations work for operational processes and queuing theory but are bad proxies for financial price forecasting. Approximately right is good enough for production processes, not good enough for large leveraged financial bets.
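A rough illustration of what underestimating the distribution looks like, under assumptions that are entirely mine: daily returns are drawn from a Student-t distribution with 3 degrees of freedom (a common stand-in for fat tails, not a claim about any real market), scaled by an arbitrary 1%. The sketch then does what a naive model does: it measures the mean and standard deviation, assumes a normal distribution and asks how often a move beyond four standard deviations should occur.

```python
import math
import random

def student_t3(rng):
    """One draw from a Student-t with 3 degrees of freedom (assumed fat-tailed proxy)."""
    z = rng.gauss(0, 1)
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(3))
    return z / math.sqrt(chi2 / 3)

rng = random.Random(42)
returns = [0.01 * student_t3(rng) for _ in range(100_000)]

# What the naive model measures from the data.
n = len(returns)
mu = sum(returns) / n
sigma = math.sqrt(sum((r - mu) ** 2 for r in returns) / n)

# What a normal distribution says about moves beyond 4 standard deviations
# (two-sided tail probability of a standard normal).
normal_tail_prob = math.erfc(4 / math.sqrt(2))

# What the fat-tailed sample actually delivered.
observed_tail_freq = sum(abs(r - mu) > 4 * sigma for r in returns) / n

print(f"normal model expects: {normal_tail_prob:.6f}")
print(f"sample delivered:     {observed_tail_freq:.6f}")
```

The gap between the two numbers is the part of the distribution the normal model never priced. For a production line that gap is a rounding error. For a leveraged book it is the whole story.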

We don’t have a good enough model for modeling the future. There you go, that is the issue. We don’t know how to model future behavior for a dynamic organism such as a financial market. We can guess, identify signals, combine drivers, bring in an intelligent analysis, plot price ranges and predict direction. But we can’t guarantee traction.

Let me restate what I have just said above.

We don’t have good models for effectively predicting future stock market prices. We can guess, we can trend, we can suggest a range, but we cannot predict prices. Period. 

The only way to beat the future is to get there first.

But then how do you explain the money made by high frequency trading? I don’t have to. Michael Lewis did a phenomenal job with Flash Boys, his indictment of the HFT industry. It is essentially the same argument: latency and lag times. Get to markets before investors, buyers and price information do. Beat the participants with speed. Legally front run buyers by anticipating their larger buy orders. Get to the future before the rest of the world arrives at it.

If you haven’t read Flash Boys, see the Michael Lewis and IEX Churchill Club meeting footage to catch up on HFT and market challenges.

How do you get to the future before the world arrives at it? You get there by taking a step back and identifying trends before they become trends. Or, even better, by creating and feeding trends before the world discovers them. You get there by identifying relationships and following them through to the source, so that you can understand how a given relationship will hold or behave under a range of possible future scenarios. You have to build the model from the ground up. Do the hard work using relationships. Push them, test them and break them. First build your understanding of the behavior you are trying to model.

Here finally is where data science steps in and is useful.

A tool that can help you identify trends and relationships is enormously useful. If the focus of your analysis is identifying drivers and factors and their future behavior, you are on the right track. Focus on asking the right questions and identifying the right signals. Not one signal, but a collection based on your understanding of how prices supposedly change and transition over time. Without the right questions, you can’t identify the right signals. Without understanding, you can’t ask the right questions.

What holds true for performance and returns also holds true for risk. If you can’t accurately predict prices, you can’t accurately model price risk. Once again, process and discipline work better than just technology and black boxes.
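The same failure shows up in a risk number. The sketch below is a toy, with every figure assumed for illustration: 1,000 calm days with 1% daily volatility, followed by 1,000 stressed days with 3% daily volatility, both drawn from a normal distribution. The hypothetical helper `historical_var` computes a simple historical 99% value at risk from the calm window. We then count how often the stressed window breaches it.

```python
import random

def historical_var(returns, confidence=0.99):
    """Simple historical VaR: the loss exceeded on roughly (1 - confidence) of days."""
    losses = sorted(-r for r in returns)
    index = min(int(confidence * len(losses)), len(losses) - 1)
    return losses[index]

rng = random.Random(2008)

# Assumed regimes: calm markets used for calibration, stressed markets that follow.
calm = [rng.gauss(0, 0.01) for _ in range(1000)]
stressed = [rng.gauss(0, 0.03) for _ in range(1000)]

var_99 = historical_var(calm)

# A 99% VaR should be breached on about 1% of days if the future follows the past.
breach_rate = sum(-r > var_99 for r in stressed) / len(stressed)

print(f"99% VaR calibrated on calm days: {var_99:.4f}")
print(f"breach rate on stressed days:    {breach_rate:.2%}")
```

The calculation is fine. The calibration window is the problem, and no amount of tooling tells you in advance that the window has expired.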

Remember relationships, not prices. Understanding, not forecasting. Process and discipline, not models. That is the future of data science. Everything else is just snake oil.