UPDATED 09:08 EDT / DECEMBER 02 2014

Researchers cite major flaws in common social media measurement techniques

Social media offers a treasure trove of real-time data about consumer behavior and sentiment across every imaginable segment of society, but that information loses its value without proper context. And that is proving the downfall of all too many analytics projects today, according to a study from two computer scientists recently published in the journal Science.

Jürgen Pfeffer of Carnegie Mellon University and Derek Ruths of McGill University in Montreal argue that data from social networks is often analyzed without accounting for the many ways online communities differ from the general population. One of the most obvious factors that slips under the radar is demographics: the authors cite Pinterest as a network where sentiment is shaped primarily by young female users, whose views often don’t align with those of other audiences.

Similar biases exist on other networks, which leaves it to the analyst to correct for the discrepancies among the different groups on a given site. Then there’s the matter of tracking opinion when only a partial view of user sentiment is available. Facebook stands out in particular: it offers no “Dislike” button to balance Likes, which diminishes the weight of negative opinions. The burden of providing context once again falls on the shoulders of the analyst handling the data.
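As an illustration of one such correction (a hypothetical sketch with invented numbers, not a method prescribed in the study), an analyst can reweight per-group sentiment so that each demographic counts in proportion to its share of the target population rather than its share of the platform, a standard survey technique known as post-stratification:

```python
# Hypothetical post-stratification sketch: reweight per-group sentiment so each
# demographic counts in proportion to its share of the target population.

# Observed average sentiment and sample share on the platform (made-up numbers)
platform = {
    "women_18_34": {"sentiment": 0.72, "sample_share": 0.55},
    "men_18_34":   {"sentiment": 0.41, "sample_share": 0.20},
    "all_35_plus": {"sentiment": 0.35, "sample_share": 0.25},
}

# Share of each group in the population of interest (e.g. from census data)
population_share = {
    "women_18_34": 0.15,
    "men_18_34":   0.15,
    "all_35_plus": 0.70,
}

# Naive estimate: average sentiment over the platform sample as-is
naive = sum(g["sentiment"] * g["sample_share"] for g in platform.values())

# Post-stratified estimate: weight each group's sentiment by its population share
adjusted = sum(platform[k]["sentiment"] * population_share[k] for k in platform)

print(f"naive platform estimate:   {naive:.2f}")
print(f"population-adjusted value: {adjusted:.2f}")
```

With these made-up figures the platform-weighted number comes out well above the population-weighted one, which is exactly the kind of gap the researchers warn gets overlooked.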

Another less obvious factor is the large presence of spammers and bots on social networks, which is often sizable enough to distort behavioral measurements and undermine the accuracy of studies. The authors attribute the difficulty of addressing that and the other inconsistencies to a lack of visibility into how the publicly available data streams relied on so heavily for social media analytics, in academia and beyond, filter information. But some of the blame also rests with practitioners themselves, who Pfeffer and Ruths note often don’t take the necessary measures to ensure the quality of their data.
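To give a flavor of the kind of data-quality measure that implies (a simplified sketch with invented thresholds, not a real bot detector or anything described in the paper), an analyst might screen out accounts whose behavior looks automated before computing any statistics:

```python
from dataclasses import dataclass

@dataclass
class Account:
    handle: str
    posts_per_day: float      # average posting rate
    duplicate_ratio: float    # fraction of posts that are near-duplicates
    followers: int
    following: int

def looks_automated(a: Account) -> bool:
    """Crude heuristic flags (hypothetical cutoffs): very high posting rates,
    heavily duplicated content, or extreme follow imbalances are treated as
    signals of automation."""
    if a.posts_per_day > 100:
        return True
    if a.duplicate_ratio > 0.8:
        return True
    if a.following > 5000 and a.followers < 50:
        return True
    return False

accounts = [
    Account("human_user", 4, 0.05, 300, 280),
    Account("promo_bot", 250, 0.95, 12, 8000),
]

# Keep only accounts that do not trip any of the automation flags
humans = [a for a in accounts if not looks_automated(a)]
print([a.handle for a in humans])  # ['human_user']
```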

Specifically, the authors point out that social media research frequently surveys seemingly straightforward topics such as political opinion, which masks the complexity of the sampling process and often leads to results whose accuracy is greatly exaggerated. They suggest incorporating more lessons on mitigating bias from fields such as statistics and machine learning into analytics workflows, and setting generally higher standards for handling information.
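The kind of exaggeration the authors describe is easy to reproduce in a toy simulation (entirely hypothetical numbers, not drawn from the paper): when one side of an issue is more likely to post, even an enormous platform sample produces a tight-looking but badly wrong estimate of public opinion.

```python
import random
random.seed(42)

# Hypothetical scenario: 45% of the true population supports a policy,
# but supporters are three times as likely to post about it on the platform.
TRUE_SUPPORT = 0.45
POST_PROB = {True: 0.30, False: 0.10}   # chance a person posts, by stance

population = [random.random() < TRUE_SUPPORT for _ in range(200_000)]
posters = [stance for stance in population if random.random() < POST_PROB[stance]]

observed = sum(posters) / len(posters)
# Standard error of a proportion: the large sample makes the wrong number look precise
se = (observed * (1 - observed) / len(posters)) ** 0.5

print(f"true support:      {TRUE_SUPPORT:.1%}")
print(f"platform estimate: {observed:.1%} ± {1.96 * se:.1%}")
```

Here the platform estimate lands around 70 percent with a sub-percentage-point margin of error, while the true figure is 45 percent, which is the trap of reporting precision without accounting for who is actually in the sample.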

