UPDATED 07:30 EDT / OCTOBER 02 2014

Open Data’s dirty secret: poor quality and potential for abuse

This week the government released data on more than 4.4 million medical payment records in line with its Open Payments initiative, and it’s already come in for some strong criticism from the media.

It’s a reminder that open data, while free, is usually far from perfect.

The controversial release, which is required under the Affordable Care Act, has been criticized by the likes of Forbes, NPR and The Wall Street Journal as vague and confusing. The records show more than $3.5 billion in payments made to doctors by device companies and pharmaceutical manufacturers, yet there are two main issues with the data.

First, it offers no context. The records don’t show whether the payments represent legitimate financial relationships or conflicts of interest, says Fierce Health IT. Second, one-third of the payment records submitted last year have been omitted due to problems with the data that could lead to mistaken identification, according to NPR. Moreover, ProPublica reports that 64 percent of the current data doesn’t specify which hospital or doctor received the money.

“The release could be viewed two ways: as a detailed view of the underbelly of U.S. medicine, or a flawed, sloppy release of partial information that will confuse rather than elevate understanding,” suggests Politico in an opinion piece.

In reality, it’s probably somewhere between the two. There might be quality problems with the data, but ProPublica’s article still reveals some noteworthy insights into public health spending trends.

This also isn’t the first time an open data release has been criticized. When Germany’s open data portal came online, it was quickly slammed by critics who realized they had to specifically request many of the data sets – and were sometimes charged for the privilege. Australia’s effort fared little better, being described as “patchy and transitional” by the country’s own Information Commissioner. Meanwhile, India’s open data is looking somewhat sparse, according to DNA India’s Shyamanuja Das, who complained that it “has just about 115 datasets; what’s worse, all those datasets are from only 11 government department/agencies.”

These incidents highlight the first fundamental flaw with open data: raw data is often incomplete or inaccessible, and therefore unusable.

The second flaw is the potential for misuse, as illustrated by the recent release of historical trip and fare data from New York City taxis. A Freedom of Information request made by blogger Chris Whong yielded details of 173 million trips made by the yellow cabs, including the driver’s ID, GPS coordinates of both pick-up and drop-off locations, trip times and passenger numbers.

But it didn’t take long for someone to decipher the drivers’ IDs. Then someone else revealed how it was possible to match celebrities’ journeys with their drivers. That led to the discovery that some of those people were picked up outside of the city’s strip joints.
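To see why deciphering obscured IDs can be so quick, consider a minimal sketch. It rests on assumptions that are not in the reporting above: that each ID is a short numeric string (six digits here, a hypothetical format) and that it was obscured by replacing it with an unsalted MD5 hash. The function names are illustrative only. Under those assumptions, anyone can hash every possible ID in advance and look the released values up in the resulting table.

```python
import hashlib


def pseudonymize(driver_id: str) -> str:
    """Hypothetical scheme: replace an ID with its unsalted MD5 digest."""
    return hashlib.md5(driver_id.encode("ascii")).hexdigest()


def build_reverse_table(digits: int = 6) -> dict[str, str]:
    """Hash every possible ID of the assumed format and map digest -> ID."""
    table = {}
    for n in range(10 ** digits):
        candidate = f"{n:0{digits}d}"  # zero-padded, e.g. "000042"
        table[pseudonymize(candidate)] = candidate
    return table


# A made-up ID, obscured the way the released data is assumed to be.
released = pseudonymize("483921")

# One million hashes is a trivial amount of work on a laptop.
table = build_reverse_table()
recovered = table.get(released)
print(recovered)
```

The weakness in this sketch is the small set of possible inputs, not the hash function itself: when every candidate can be enumerated, hashing alone hides nothing.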

Clearly the potential for misuse applies to almost every data set, no matter what efforts are made to anonymize it. As analyst Alistair Croll pointed out back in 2012, “Big Data is our generation’s civil rights issue, and we don’t know it.”

Maybe we will find out the hard way.

photo credit: European Parliament via photopin cc

