UPDATED 12:32 EDT / MARCH 24 2014

Google Flu Trends: A case of Big Data gone bad?

Well, it seemed like a good idea at the time, but sadly Google Flu Trends has turned out to be a prime example of what can go wrong when you read too much into your Big Data…

When Google Flu Trends first kicked off back in 2009, the search giant thought it was being rather crafty and had hit on a foolproof way to track outbreaks of influenza around the world. The idea was simple enough – if you get sick, you’ll likely search on Google for information, such as how to treat your illness. Google decided to track these searches and use the data to try to predict flu epidemics even before medical authorities like the Centers for Disease Control and Prevention (CDC) could.

But Google Flu Trends has proven to be anything but accurate, and in fact is often cited for predictions of flu cases that are way off the mark. In 2010, a year after it was born, researchers at the University of Washington stated that the site was around 25 percent less accurate than the CDC. More recently, in 2013, an article in Nature said that Google Flu Trends was overestimating influenza cases by around 50 percent.

So what went wrong with Google Flu Trends? Apparently, Google relied too much on simple searches, and hit what the authors of a new report in Science call a “Big Data hubris”:

“‘Big data hubris’ is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.”

Google thought that with its unparalleled, swarming mass of data, it could outsmart everyone, but in fact it was guilty of making one of the most stupid assumptions of all time. One of the main problems is that a lot of people probably don’t even know what “the flu” is. The vast majority of doctors’ office visits for flu-like symptoms turn out to be caused by other viruses. The CDC tracks these visits under “influenza-like illness” (ILI) for exactly that reason. To illustrate, the CDC reports that in the most recent week for which data is available, only 8.8% of specimens tested positive for influenza.

Google Flu Trends ILI Estimates Compared to CDC Estimates

Image courtesy of UMV.edu

So what Google Flu Trends has is just an enormous database of misinformation – and a huge pile of drivel which will only produce more drivel.

At the start of “The Parable of Google Flu: Traps in Big Data Analysis”, the study references Nature.com’s report last year, highlighting how Google Flu Trends predicted more than double the number of doctor visits for ILI. The study goes on to claim:

“Although not widely reported until 2013, the new Google Flu Trends has been persistently overestimating flu prevalence for a much longer time.”

This is far from what Google’s data scientists set out to do. When Flu Trends was first announced in a Nature article in 2009, they insisted that “…we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day.”

But within just a few months of announcing the project, it made its first glaring error, missing the 2009 Swine Flu pandemic that was caused by a new strain of H1N1 flu. The failures continued from there, with study authors David Lazer, Ryan Kennedy, Gary King and Alessandro Vespignani showing that Google Flu Trends overestimated flu prevalence in 100 of 108 weeks starting in August 2011.

Hardly surprising really, when between 80 and 90 percent of people who think they have flu don’t actually have it – it’s not exactly a statistic that should give us confidence in their internet searches.

Google Flu might continue to insist it’s doing a great job, but if the authors of today’s study are to be believed, you’ll be much better off sticking with the CDC’s judgment instead.

So here’s a lesson learned. Big Data can be good, but only if you remember to separate the ‘good’ data from the ‘bad’ data.

photo credit: Eneas via photopin cc
