UPDATED 06:15 EST / APRIL 09 2014

Big Data under attack: Can we really trust it?

Following years of hype about the potential of Big Data, several prominent academics and authors have cast a shadow of doubt upon the concept, questioning just how useful this kind of in-depth data analysis really is. While they don’t dismiss Big Data’s usefulness entirely, the suggestion is we need to rethink exactly what we can and can’t rely on it for.

The kerfuffle started when Nature revealed how Google Flu Trends had missed the mark, massively over-reporting peak flu levels throughout 2013. The journal Science took the investigation further, stating that Google over-estimated the prevalence of flu in 100 out of 108 weeks from August 2011, with some of its estimates being almost double those of the Centers for Disease Control and Prevention (CDC).

Too much ‘junk’ data?

The authors of the Science article took Google to task for its almost total dependence on Big Data, blaming what they called “Big Data hubris”, defined as “the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.”

The problem lies in how Big Data is gathered: most of it is not the output of instruments designed to produce reliable and valid data for scientific analysis, say the authors. In the case of Google Flu Trends, Google draws its data from searches on the internet. Basically, anyone who searches for a “flu remedy” must have the flu, at least according to Google’s thinking, but of course that isn’t always the case.

Whilst acknowledging that Big Data does have potential, the authors warn that having data in vast quantities doesn’t mean “one can ignore foundational issues of measurement”. In addition, they warn that popular sources like Facebook and Twitter are vulnerable to manipulation and so can’t be trusted entirely.

The authors were soon joined by another Big Data critic, Kaiser Fung, a prominent author and statistician, who labeled Google Flu Trends’ efforts an “epic fail” and added that, in his view, “data validity is being consistently overstated.”

Small data distortions = Big Data headaches

The Financial Times also had a dig when it published an article, “Big data: are we making a big mistake?”, on March 28, arguing that in the “absence of theory”, our reliance on correlations is inevitably fragile. Because Big Data is so large and messy, it often conceals misleading biases, which can lead analytical tools to draw the wrong conclusions.

“There are a lot of small data problems that occur in big data,” notes David Spiegelhalter. “They don’t disappear because you’ve got lots of stuff. They get worse.”

Most recently, Big Data was the subject of a stinging op-ed that appeared in the New York Times last Sunday. In it, two New York University professors took a number of shots at Big Data, saying that “many tools based on Big Data can be easily gamed”, and that large datasets have a tendency to produce correlations that are merely spurious. Another problem is that Big Data analysis tools can create an “echo-chamber effect”, distorting data in much the same way as Google Translate will distort words and sentences when translating from one language to a second, then a third, and then back into the first language.

Nobody’s suggesting that we do away with Big Data and go back to how things were before. Rather, these warnings serve as a reminder that while Big Data can certainly be useful, it’s never a good idea to put all your eggs in one basket. In some instances we may find that there’s no substitute for good old-fashioned scientific sampling, research and analysis.

photo credits: infocux Technologies via photopin cc; Horrortaxi via photopin cc; Simon & His Camera via photopin cc
