When data is misleading

This is part 1 of an examination of the various ways we can predict the future, along with their strengths and weaknesses. Part 2, which focuses on incentive structures and limiting factors, is available here. Part 3, which focuses on my critique of Hegelian Dialectics, is available here, and the conclusion in part 4 is available here.

The turkey problem

Nassim Taleb describes a very basic but utterly fundamental problem in his book The Black Swan, which he calls the turkey problem. A turkey on a farm is fed every day by a farmer, who takes care of any problem it might have. If this turkey were to apply the usual data-driven methods that we are told are "rational" and robust, it would conclude, with each day that passes, that its safety is more and more guaranteed, because it keeps collecting data which confirms that hypothesis.
Of course, we all know the turkey is only fed because the farmer ultimately wants to kill it for its meat, which eventually happens. From the point of view of the data-collecting turkey, that event is a total surprise; from ours, it was expected all along.
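
To make the mechanism concrete, here is a minimal sketch in Python. The updating rule (Laplace's rule of succession) and the thousand-day horizon are my own illustrative assumptions, not anything Taleb specifies; any reasonable data-driven rule produces the same shape.

```python
# The turkey problem as naive updating: after n safe days, estimate the
# probability that tomorrow is also safe. Laplace's rule of succession,
# (successes + 1) / (trials + 2), stands in for any data-driven rule.

def p_safe_tomorrow(safe_days: int) -> float:
    return (safe_days + 1) / (safe_days + 2)

for day in (1, 10, 100, 999):
    print(f"day {day:4d}: P(safe tomorrow) = {p_safe_tomorrow(day):.4f}")

# day    1: P(safe tomorrow) = 0.6667
# day   10: P(safe tomorrow) = 0.9167
# day  100: P(safe tomorrow) = 0.9902
# day  999: P(safe tomorrow) = 0.9990
#
# If the farmer comes on day 1000, the model is at its most confident
# at exactly the moment it is most wrong.
```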

Now, someone might argue that the correct way to do science is to try to disprove the hypotheses you come up with, which is true in an ideal world, and might work in the case of the turkey problem, but what about the messy, confusing world of human beings that we live in? This philosophy works well when you have all the time and funding in the world to consider tons of hypotheses, but what about someone trying to make sense of the economy we live in, to figure out what they should spend their time and money on?
The reality is that we have limited time, and as such can only explore so many theories; more importantly, we want predictive power from our theories. A pile of discarded theories is useful for academia as a whole, because it marks pathways that aren't worth pursuing, but for an individual who wants to know what to do with their finite time, reaching the conclusion that a certain theory is wrong isn't all that useful.

The other problem is that, of course, in our messy, complex world, things aren't black and white. Nutrition, for instance, is a field riddled with competing theories, for the simple reason that human bodies are incredibly different from one another. At least in that case, you can experiment with a bunch of different diets without too much risk, and see for yourself which ones work and which ones don't.
But what about investing advice? It often takes many years for those ideas to be verified, and the feedback may only arrive in the form of a global financial crisis. And because the economy and the factors that drive it are so complex, it's difficult to even tell whether an investment worked because the ideas behind it are genuinely solid, or simply because the world happened to favor them during the period you tried them in.

This is why I consider "data-driven" methods to form one category in this piece, with the general underlying idea that the future will roughly behave like the past, even if the parameters we use might change.
Obviously, all methods for making predictions about the future are driven to some degree by things which happened in the past, which means that in some sense everything is "data-driven". But I specifically mean the methods which adopt some form of scientific or rationalist aesthetic, such as the models used in quantitative finance, or the basic idea of Bayesian updating.

Data in an echo chamber

There is something unsettling about data if you think about what it's supposed to achieve. Data is supposed to be our anchor to Reality: the more of it we collect, the more we are supposed to understand of Reality and of ourselves. And yet what happens in practice? There are two main failure modes from what I can tell: echo chambers on one hand, because evidently collecting more data inside one doesn't make you more in tune with Reality, but rather makes you more confident in the assumptions of those around you; and schizophrenia on the other hand, which I will discuss in Part 3.

Echo chambers exist because, fundamentally, the self-informed self can only see its own projections of Reality, meaning it will surround itself with other people who view Reality in roughly the same way, and, more importantly, who will not challenge its blindness and selfishness.
Add to that the modern phenomenon by which the information we are presented with changes based on our own preferences, all in order to make us spend more and more time on the same platform, and it's not too difficult to see that echo chambers have become the norm, not the exception.

From what I can tell, information has always been distorted in some way ever since we left the scale of hunter-gatherer tribes, because it was never neutral; but the times we live in have taken this dynamic to an absurd degree, because there are now so many complex systems interacting with one another, in ways totally opaque to us, each of which has to prioritize its own survival over anything else.
Which is to say that academia, despite its stated mission of being an institution that brings us closer to Truth, is primarily concerned with maintaining itself, a selfish drive which is then echoed by the people working in it, who are incentivized to do whatever they can to further their own careers instead of doing the work of clean epistemology for other people.

We can add whatever methods we want to examine Reality, but if the incentive structures point towards selfishness rather than wholeness, then the result will always be shoddy. Phenomena such as p-hacking, cherry-picking, or HARKing (hypothesizing after the results are known) are all statistical mistakes, but they are rooted in a selfish drive which is beyond statistics and science.
What this means is that the problems of academia, and more broadly of collective epistemology, run deeper than anything a method can solve. It's not a problem of method or even intelligence; it's a problem of incentive structures in an atomized world.
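
To see how little it takes for that selfish drive to manufacture a result, here is a minimal simulation of p-hacking through multiple comparisons. The numbers of projects and hypotheses are arbitrary assumptions of mine; the only fact the sketch relies on is that under a true null hypothesis, p-values are uniformly distributed.

```python
# p-hacking sketch: test many hypotheses, report only the "best" one.
# N_STUDIES and N_TESTS are arbitrary illustrative choices.
import random

random.seed(42)
N_STUDIES = 10_000  # simulated research projects
N_TESTS = 20        # hypotheses tried per project, all of them false

lucky = 0
for _ in range(N_STUDIES):
    # Under a true null hypothesis, a p-value is uniform on [0, 1],
    # so we can draw each test's p-value directly.
    p_values = [random.random() for _ in range(N_TESTS)]
    if min(p_values) < 0.05:  # publish only the winning test
        lucky += 1

print(f"projects with a 'significant' finding: {lucky / N_STUDIES:.1%}")
# Expected rate: 1 - 0.95**20 ≈ 64%, even though nothing real was found.
```

No fraud is needed at any single step; the selection of what to report does all the work, which is exactly why this is a problem of incentives rather than of statistical technique.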

This is why those who believe that a top-down information ecology led by scientists and other experts would somehow cure our inability to understand what is going on in our times are sorely mistaken. Insanity can manifest as the refusal to engage with anything factual, as conspiracy theorists show us, but it can also manifest as someone who can only look at one fact after another, isolated from everything else. The emergence of conspiracy theories is what I will examine in part 2, and the schizoid self, the one whose worldview is informed by a kaleidoscope of shattered bits, in part 3.


Links and tags

Prediction     Statistics     Echochamber     History

2025-12-22