UPDATED 07:00 EDT / JULY 25 2014

Big, bad data: What’s the big deal over data quality?

Big Data is rapidly expanding, and the nature of this data is more complex and varied than ever before. The concept brings with it a new world of insights and possibilities for organizations, but it also poses many challenges, especially when it comes to the quality and management of data.

Every single byte of data has potential value if an organization can collect, analyze and use that data to gain insights, but care needs to be taken to manage and exploit this data. The term “bad data” might sound a little bit melodramatic, but the fact remains that if your data is corrupted or incorrect, your business is going to suffer because of it.

After all, even the most powerful Big Data analytics tools are only as useful as the data they crunch, and intelligence built on bad data can often be worse than no analysis at all. Nevertheless, many companies willingly make important decisions or formulate objectives based on data they know is flawed or incomplete.

The big cost of bad data

It’s hard to put a finger on the cost of bad data, but recent research from Experian Data Quality shows that inaccurate data affects the bottom line of some 88 percent of organizations, causing them to lose as much as 12 percent of their revenue. That lost revenue comes from wasted resources, wasted marketing spend and wasted staff time.

In fact, the true cost of bad data may be even higher. According to Experian, 28 percent of companies surveyed said bad data caused problems delivering email, hurting their customer service. Another 21 percent said they had suffered reputational damage as a result of bad data.

Image source: Experian Data Quality

“Bad data as a result of outdated or incomplete data, or duplicate, incorrect entries impacts the ability to make quality decisions for the business, as well as provide enhanced and personalized customer service and experience,” said Ben Parker, Principal Technologist at Guavus, in an interview. “The business cost of bad data may be as high as 10-25% of an organization’s revenue,” he adds, citing this report.

Where does bad data come from?

According to Dennis Moore, the senior vice president and general manager of Informatica’s MDM business, the main problem lies in the fragmentation of business-critical data that occurs as organizations grow.

“There is no big picture because it’s scattered across applications, including on premise applications (such as SAP, Oracle and PeopleSoft) and cloud applications (such as Salesforce, Marketo, and Workday),” writes Moore on the Informatica blog. “But it gets worse. Business-critical data changes all the time…. Business-critical data becomes inconsistent, and no one knows which application has the most up-to-date information. This costs companies money. It saps productivity and forces people to do a lot of manual work outside their best-in-class processes and world-class applications.”

Fixing bad data

If your organization relies on Big Data, bad data is certainly going to cost you. So the question is, what can businesses do to avoid making key decisions based on inaccurate facts, and thereby avoid exposing themselves to increased risk?

Dr. Mark Whitehorn, a business intelligence consultant and senior lecturer at the School of Computing at the University of Dundee, writes that there are three approaches to solving the bad data conundrum:

“1. Fix the existing data in the transactional systems.
2. Improve the applications so no more bad data can be entered.
3. Tolerate bad data in the source systems and fix it when we perform the extract, transform and load (ETL) process to move it into the data warehouse.”

All well and good, but each approach has its limitations. Fixing your existing data does nothing to stop more bad data from entering the system later, and improving your applications to eliminate bad data is, in Dr. Whitehorn’s words, “often impossible”. The third option is probably the most effective remedy, and it is typically carried out with a tool that has “good built-in mechanisms for cleansing data”.
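To make that third approach concrete, here is a minimal sketch of ETL-stage cleansing in Python. It is not any vendor’s tool; the record fields and the rules applied (trimming whitespace, normalizing email addresses, rejecting malformed ones, de-duplicating) are illustrative assumptions.

```python
import re

# Hypothetical customer records pulled from a transactional source during ETL.
RAW_RECORDS = [
    {"id": 1, "name": "  Jane Doe ", "email": "JANE@EXAMPLE.COM"},
    {"id": 2, "name": "Jane Doe", "email": "jane@example.com"},   # duplicate of id 1
    {"id": 3, "name": "Bob Smith", "email": "not-an-email"},      # fails validation
]

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def cleanse(records):
    """Normalize, validate and de-duplicate records before loading them."""
    seen_emails = set()
    clean, rejected = [], []
    for rec in records:
        # Normalize: trim whitespace and lower-case the email address.
        name = rec["name"].strip()
        email = rec["email"].strip().lower()
        # Validate: reject records whose email fails a basic format check.
        if not EMAIL_RE.match(email):
            rejected.append(rec)
            continue
        # De-duplicate on the normalized email address.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        clean.append({**rec, "name": name, "email": email})
    return clean, rejected

if __name__ == "__main__":
    loaded, errors = cleanse(RAW_RECORDS)
    print(f"loaded {len(loaded)} records, rejected {len(errors)}")
```

A real pipeline would log the rejected records and feed them back to the source systems rather than silently dropping them, but the shape of the work is the same: bad data is caught at the boundary, before it reaches the warehouse.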

Another method is to use a relatively new technique called “streaming analytics” to clean bad data as it arrives in the system.

“To prevent too much big data from accumulating, organizations can leverage streaming analytics,” says Parker. “Streaming analytics collects and reduces the large array of real-time data in motion from multiple sources into a consistent set of usable data—the right and clean data on which to perform analysis.”
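As a rough illustration of that idea (not Guavus’ product), the sketch below applies the same kind of checks to events as they arrive, so only well-formed, de-duplicated records reach the analytics layer; the event fields here are hypothetical.

```python
from typing import Iterable, Iterator

def clean_stream(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only well-formed, de-duplicated events as they arrive."""
    seen_ids = set()
    for event in events:
        event_id = event.get("event_id")
        value = event.get("value")
        # Drop events missing required fields or carrying non-numeric values.
        if event_id is None or not isinstance(value, (int, float)):
            continue
        # Suppress exact duplicates replayed by an upstream source.
        if event_id in seen_ids:
            continue
        seen_ids.add(event_id)
        yield {"event_id": event_id, "value": float(value)}

if __name__ == "__main__":
    incoming = [
        {"event_id": "a1", "value": 42},
        {"event_id": "a1", "value": 42},      # replayed duplicate
        {"event_id": "a2", "value": "bad"},   # malformed value
        {"event_id": "a3", "value": 7.5},
    ]
    print(list(clean_stream(incoming)))
```

In a production stream the set of seen IDs would grow without bound, so real systems bound the de-duplication window by time or use a probabilistic structure, but the principle is the one Parker describes: reduce the raw feed to a consistent, usable set before analysis.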

At the end of the day, data quality is the key enabler when it comes to getting actionable insights from Big Data. Bad data can lead to big problems, creating a mess of conflicting information that leads to poor decision making, disappointing results and other negative consequences.

For organizations that wish to leverage Big Data, the onus is on them to ensure that their data quality initiatives are up to speed.

photo credits: byrev via Pixabay.com; CJ Isherwood via photopin cc
