Strange Big Data: Can We Really Predict Everything?

Anyone who knows anything about Big Data understands that, at the end of the day, it all boils down to guesswork. The reason we accumulate so much data, and store it, and try to glean insights from it, is so we can guess what happens next. So far, much of the emphasis has been on how Big Data can help big business to make more money, but experts are just beginning to explore the much wider impact that this knowledge could have on the world…

Predicting Outbreaks of Disease

One of the most tantalizing possibilities to emerge recently is the ability to predict future events, such as outbreaks of disease, by studying Big Data on past events.

To test this theory, researchers from Microsoft Research and the Technion-Israel Institute of Technology collaborated to develop software that crunches data from the New York Times’ archives, as well as Wikipedia and other sources. The experts focused on predicting outbreaks of disease, riots, and events causing significant numbers of deaths, and found that they could successfully predict such events with an accuracy rate of between 70% and 90%.

In one example, the research reveals how experts were able to spot a relationship between droughts in Africa and Asia and subsequent outbreaks of cholera. In 1973, the NYT reported a drought in Bangladesh, followed in 1974 by a report of a cholera epidemic hitting the region. The pattern repeated exactly ten years later, when the NYT reported another drought in 1983, followed by a second cholera outbreak in 1984.
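
The pattern described above can be sketched as a simple lagged count: for each drought report, check whether a cholera report follows within a year. This is only an illustration of the idea, not the researchers' software. The function name followed_within and its parameters are assumptions made for this sketch, and the only data used are the four NYT reports mentioned in this article.

```python
from typing import List, Tuple

# The four New York Times reports mentioned in the article: (year, event type).
REPORTS: List[Tuple[int, str]] = [
    (1973, "drought"),
    (1974, "cholera"),
    (1983, "drought"),
    (1984, "cholera"),
]


def followed_within(
    reports: List[Tuple[int, str]],
    cause: str,
    effect: str,
    max_lag_years: int = 1,
) -> Tuple[int, int]:
    """Hypothetical helper: return (hits, total), where `total` is the number
    of `cause` reports and `hits` is how many of them were followed by an
    `effect` report between 1 and `max_lag_years` years later."""
    cause_years = [year for year, kind in reports if kind == cause]
    effect_years = {year for year, kind in reports if kind == effect}
    hits = sum(
        any(year + lag in effect_years for lag in range(1, max_lag_years + 1))
        for year in cause_years
    )
    return hits, len(cause_years)


hits, total = followed_within(REPORTS, "drought", "cholera")
print(f"{hits} of {total} drought reports were followed by a cholera report within a year")
```

Two data points are far too few to support a real forecast, which is why the approach described here depends on decades of archived news rather than a handful of reports.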

Eric Horvitz, of Microsoft Research, and Kira Radinsky, a PhD student at Technion-Israel Institute of Technology, said that the research shows that “alerts about a downstream risk of cholera could have been issued nearly a year in advance”, which would have given authorities close to 12 months to prepare for the outbreak.

Horvitz and Radinsky point out that research has been done in this field before, but in the past it has always been retrospective, studying the events leading to disease epidemics, rather than trying to predict when the next one may happen.

The two researchers say that their software can also be used to check predictions in cases where an outbreak of disease is suspected:

“It can be valuable to identify situations where there is a significantly lower likelihood of an event than expected by experts based on the large set of observations and feeds being considered in an automated manner.”

Predicting Legal Cases

Lawyers are routinely asked “can I win this case?”, and “what’s it going to cost me?”, by their clients, and up until now all they’ve had to go on is their gut instinct when answering such questions. But it might not be long before legal professionals will be able to answer these questions with a much higher degree of accuracy.

A recent article explores how legal scholars, computer science engineers, and commercial companies are increasingly turning to Big Data, building up databases of legal history and developing algorithms that help them to predict case outcomes.

TyMetrix, a subsidiary of Wolters Kluwer Corporate Legal Services, is just one of the companies attempting to capitalize on the emerging ‘legal Big Data’ niche, developing a mobile app that serves up the average legal costs of various law firms in the US, according to the type of case. The idea is to help clients manage their legal costs, but it goes further than that: users can adjust different variables of their case, run ‘what-if’ scenarios, and predict how the outcome would be affected if they decided to spend less on legal costs.

Estimating costs isn’t quite the same as predicting outcomes of court cases, but that too could become possible in the not-too-distant future. Fantasy SCOTUS was originally developed just for fun, as a web-based fantasy league in which participants try to predict the outcomes of Supreme Court decisions. But since its launch in 2009, the site has accumulated a vast database of crowd-sourced analysis and opinions on Supreme Court cases, to the point where the site’s founder, Josh Blackman, says he could be onto something much bigger.

In an academic paper published last year, Blackman suggests combining this data with publicly available court records, then developing an algorithm and decision engine to predict the outcomes of regular court cases:

“It would be quite conceivable for a bot to crawl through all of the filings in Pacer . . . and develop a comprehensive database of all aspects of how each court works,” writes Blackman.

Predicting the Entire World?

As if second-guessing the justice system weren’t tricky enough, try this one on for size. In what has to be the single most ambitious Big Data project ever conceived, a group of preeminent scientists is attempting to build what it calls a “knowledge collider” that will essentially simulate the entire world in an effort to predict societal fluctuations such as economic bubbles, social unrest, disease epidemics and more.

The Living Earth Simulator is the brainchild of FuturICT, an organization comprising some of the world’s most famous computer science centers and high-power computing installations. The aim of the project is to correlate massive quantities of Big Data from what it calls a Planetary Nervous System (PNS), comprising real-time news and social media sources as well as existing information from various installations around the world, and to combine all of this into the biggest melting pot of Big Data ever seen. Using this data, scientists believe that they’ll be able to identify the hidden laws and tacit agreements that govern our society, generating useful knowledge that can be applied to issues such as energy, communications, economics, crime, corruption, health, migration and crisis management, amongst other things.

Naturally, the Living Earth Simulator is going to be a massive undertaking. Scientists envisage supercomputers from all over the world collaborating on the project, sharing CPU time and correlating the data into a new, open-source Global Participatory Platform which anyone can draw on for insights. While fundraising efforts are still ongoing, the project has backing from a number of major tech companies, including Microsoft Research, Yahoo Research, and IBM, not to mention the European Union, which has donated $1 billion towards the endeavor.

As epic as this all sounds, the Living Earth Simulator highlights one of the most fundamental problems facing humanity today. We might have developed a vast understanding of the physical universe around us, but we still struggle to understand ourselves, and consequently a truly just and fair society has always been beyond our grasp. Maybe, just maybe, the Living Earth Simulator will help us to understand the sociology and psychology behind human nature, and with that knowledge we can build a much better world for future generations.

Learn more about the project at FuturICT