The Problem of Big Data: An Exploration Into Its Epistemology

The promise that big data and artificial intelligence will change society is an exciting one. The Economist declared on its cover that data had replaced oil as the world's most valuable resource. Bigger and bigger sets of data, matched with greater computing power, are bound to bring new discoveries and change society. Big data will bring society closer to the utopia we all want. Or will it?

Before we start envisioning technological utopias built on artificial intelligence and big data, an important question must be addressed. Are we masking age-old philosophical problems with a shiny new technological wrapper? Should society look to technologists and scientists for what the future holds, or to the great epistemologists of the Enlightenment? Have we distanced ourselves so far from philosophy that we are now shaping society without a foundation of knowledge?

I was one of those individuals who fell into the technology trap and truly believed that big data and artificial intelligence would transform everything. In many ways they will. But then I came to the harsh realization that the data was wrong about the future or, better stated, that the data was incomplete. At that point, I had to take a step back and re-evaluate my worldview.

The winding path I took eventually led to epistemology. Once I was there, it was obvious that I needed to start from the foundation of knowledge to truly understand what to do with the promises of all these new technologies. It was only when I stepped into the shoes of great philosophers like Pascal, Kant, Descartes, Hume, Mill and Locke that I started to realize that technology neither changes the nature of knowledge nor is all-knowing. Many of the answers to my questions were debated hundreds of years ago by these great minds.

As I dove deeper, I realized that big data and artificial intelligence are nothing more than an extension of empiricism and inductive reasoning. At its core, the problem of big data is the problem of induction. For centuries we have used empirical evidence and scientific theory to advance society, and also to fool ourselves into thinking that they give us the full picture of, and the only path to, truth.

The problem of induction has long been illustrated by the black swan, and the illustration still holds today. The image has roots in second-century Roman poetry and is still being expanded on in modern times by derivatives traders.

“A rare bird in the lands and very much like a black swan.” Juvenal, 2nd-century Roman poet (the black swan was presumed not to exist at the time)


“No amount of observations of white swans can allow the inference that all swans are white, but the observation of a single black swan is sufficient to refute that conclusion.” John Stuart Mill, British Philosopher


“If the past, by bringing surprises, did not resemble the past previous to it (what I call the past’s past), then why should our future resemble our current past?” Nassim Taleb, Derivatives Trader

Many great minds over the years have approached this important problem in epistemology. The deep history of thought on this ancient problem illustrates the importance of epistemology in society regardless of technological advances.
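The asymmetry Mill describes can be made concrete with a few lines of Python. This is only a minimal sketch: the observation list and the function name `all_white_so_far` are made up for illustration and do not come from any real dataset or library. It shows that ten thousand confirming observations merely fail to contradict the claim "all swans are white", while one contrary observation refutes it.

```python
def all_white_so_far(observations):
    """Return True if every swan observed so far is white."""
    return all(colour == "white" for colour in observations)


# Hypothetical observation log: a long run of white swans.
observations = ["white"] * 10_000

# The claim is consistent with the data, but that is not a proof:
# the check says nothing about swans we have not yet seen.
supported = all_white_so_far(observations)

# A single contrary observation is enough to refute the universal claim.
observations.append("black")
refuted = not all_white_so_far(observations)

print(supported, refuted)
```

However large the list grows, the first check can only ever report "not yet contradicted". More data makes the claim feel safer without changing what kind of claim it is.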

We must shy away from what C.S. Lewis called “chronological snobbery”, a term for the trap later generations fall into when they assume they are wiser than earlier generations simply because their age is more technologically advanced.

When it comes to describing the limitations of technology, there is no better example than our financial markets and economies. For decades we have been promised controlled economies and consistent returns through econometrics and comprehensive modeling, more recently aided by machine learning. However, the results have been unpredictable and dramatic cycles, and it seems that the more elaborate the models and the more PhDs we throw at them, the more severe the outcomes.

A great example of the problem of big data is Long-Term Capital Management, a hedge fund staffed by some of the brightest minds in economics and finance, from Nobel Prize-winning economists to world-renowned traders. Their arbitrage models, built to find inefficiencies in securities prices, were still no match for the unpredictable nature of complex markets. Even these brilliant minds could not properly assess risk in a complex system, and they badly underestimated the downside of their overleveraged positions. The result was a near-collapse of the global financial markets.

The subprime mortgage crisis followed roughly ten years later and caught nearly the entire world off guard. Uniquely structured derivatives were built to mitigate risk but instead nearly collapsed the global economy. The largest banks and governments in the world were caught up in one of the biggest financial collapses in modern history. Modern technologies and models were no match for the exponential risk that can be created in complex systems.

Unfortunately, the canaries in the coal mine are on the fringe of society: the rebellious philosophers, the heterodox economists, and the free-thinking entrepreneurs. They tell society what it does not want to hear, and they are shunned until the next market crash makes them seem prophetic.

It’s at the point of disaster that we start to explain away the risk in hindsight and call, in reactionary fashion, for more regulation and new models. All of this will fall short of making any real difference, since it relies on historical empirical evidence to try to prevent unseen future events. And around the merry-go-round we go. Taleb labeled this the Ludic Fallacy: the misuse of games to model real-life situations. Taleb and many philosophers know it is impossible to have all the relevant data in hand, which renders models and games useless in complex systems with unknown variables.

This is not to say that we should ignore or fully discount the power of technology and data. We should embrace technology with eyes wide open. Without the right philosophical approach, we are bound to trust the output of new technology uncritically, which has been shown to lead to many unfortunate outcomes. Our philosophical and technological minds must work together to fully understand the potential and the limitations of our technological advances. Only then can we truly embrace what technology and data can and cannot do for our society.