Big, bad data: What’s the big deal over data quality?
Big Data is rapidly expanding, and the nature of this data is more complex and varied than ever before. The concept brings with it a new world of insights and possibilities for organizations, but it also poses many challenges, especially when it comes to data quality and management.
Every single byte of data has potential value if an organization can collect, analyze and use it to gain insights, but that data also needs to be carefully managed. The term "bad data" might sound a little melodramatic, but the fact remains that if your data is corrupted or incorrect, your business is going to suffer for it.
After all, even the most powerful Big Data analytics tools are only as useful as the data they crunch, and intelligence built on bad data can often be worse than no analysis at all. Nevertheless, many companies willingly make important decisions or formulate objectives based on data they know is flawed or incomplete.
The big cost of bad data
It’s hard to put a finger on the cost of bad data, but recent research from Experian Data Quality shows that inaccurate data affects the bottom line of some 88 percent of organizations, causing them to lose as much as 12 percent of their revenue. This lost revenue is attributed to wasted resources, wasted marketing spend and wasted staff time.
In fact, the true cost of bad data may be even higher. According to Experian, 28 percent of companies surveyed said bad data caused problems delivering email, hurting their customer service. Another 21 percent said they had suffered reputational damage as a result of bad data.
Image source: Experian Data Quality
“Bad data as a result of outdated or incomplete data, or duplicate, incorrect entries impacts the ability to make quality decisions for the business, as well as provide enhanced and personalized customer service and experience,” said Ben Parker, Principal Technologist at Guavus, in an interview. “The business cost of bad data may be as high as 10-25% of an organization’s revenue,” he adds, citing this report.
Where does bad data come from?
According to Dennis Moore, the senior vice president and general manager of Informatica’s MDM business, the main problem lies in the fragmentation of business-critical data that occurs as organizations grow.
“There is no big picture because it’s scattered across applications, including on premise applications (such as SAP, Oracle and PeopleSoft) and cloud applications (such as Salesforce, Marketo, and Workday),” writes Moore on the Informatica blog. “But it gets worse. Business-critical data changes all the time…. Business-critical data becomes inconsistent, and no one knows which application has the most up-to-date information. This costs companies money. It saps productivity and forces people to do a lot of manual work outside their best-in-class processes and world-class applications.”
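To make that fragmentation concrete, here is a minimal sketch of one "survivorship" rule (most recently updated value wins) for reconciling two conflicting copies of the same customer record. The field names, sample records and merge rule are illustrative assumptions, not a description of how Informatica or any specific MDM product works:

```python
# Hypothetical illustration: field names and records are invented,
# not taken from any of the applications mentioned above.
crm_record = {"customer_id": "C-1001", "email": "j.smith@example.com",
              "phone": "555-0100", "updated": "2023-04-01"}
billing_record = {"customer_id": "C-1001", "email": "jsmith@example.com",
                  "phone": "555-0199", "updated": "2024-02-15"}

def reconcile(records):
    """Keep the most recently updated value for each field -- one simple
    'survivorship' rule among many that an MDM process might apply."""
    merged = {}
    for rec in sorted(records, key=lambda r: r["updated"]):
        merged.update({k: v for k, v in rec.items() if v})
    return merged

print(reconcile([crm_record, billing_record]))
```

In a real master data management setup, the merge rules are usually far richer (source-system trust scores, fuzzy matching on names and addresses), but the principle of resolving conflicting copies into a single golden record is the same.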
Fixing bad data
If your organization relies on Big Data, bad data is certainly going to cost you. So the question is: what can businesses do to avoid making key decisions based on inaccurate information, and thereby avoid exposing themselves to increased risk?
Dr. Mark Whitehorn, a business intelligence consultant and senior lecturer at the School of Computing at the University of Dundee, writes that there are three approaches to solving the bad data conundrum:
“1. Fix the existing data in the transactional systems.
2. Improve the applications so no more bad data can be entered.
3. Tolerate bad data in the source systems and fix it when we perform the extract, transform and load (ETL) process to move it into the data warehouse.”
All well and good, but each approach has its limitations. Fixing your existing data does nothing to stop more bad data from entering the system later, and improving your applications to eliminate bad data is, in Dr. Whitehorn’s words, “often impossible”. That leaves the third option, which is probably the most effective remedy, typically handled by a tool with “good built-in mechanisms for cleansing data”.
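As a rough illustration of what that third option looks like in practice, the Python sketch below cleans a small extract before it is loaded into the warehouse: it drops rows missing a key, removes exact duplicates, coerces a numeric field and filters obviously invalid email addresses. The column names and rules are assumptions chosen for illustration, not taken from any particular ETL tool:

```python
import pandas as pd

# Hypothetical source extract -- the columns and cleansing rules below
# are illustrative assumptions, not a reference to a specific product.
raw = pd.DataFrame({
    "customer_id": ["C-1001", "C-1001", "C-1002", None],
    "email": ["j.smith@example.com", "j.smith@example.com", "bad-address", "a@b.com"],
    "revenue": ["1200", "1200", "950", "300"],
})

clean = (
    raw.dropna(subset=["customer_id"])        # drop rows missing a key field
       .drop_duplicates()                      # remove exact duplicate rows
       .assign(revenue=lambda df: pd.to_numeric(df["revenue"], errors="coerce"))
       .loc[lambda df: df["email"].str.contains("@", na=False)]  # crude email check
)

# 'clean' is what would then be loaded into the data warehouse.
print(clean)
```

Real ETL cleansing is usually rule-driven and far more elaborate, but the pattern is the same: tolerate bad data at the source and scrub it during the transform step.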
Another method is to use a relatively new technique called “streaming analytics” to clean bad data as it arrives in the system.
“To prevent too much big data from accumulating, organizations can leverage streaming analytics,” says Parker. “Streaming analytics collects and reduces the large array of real-time data in motion from multiple sources into a consistent set of usable data—the right and clean data on which to perform analysis.”
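Here is a minimal sketch of that idea in Python: each incoming event is validated and normalized as it arrives, and malformed or incomplete records are dropped before they can accumulate. The event schema and field names are assumptions for illustration only, not a reference to Guavus’ platform:

```python
import json

def clean_stream(raw_events):
    """Minimal sketch of cleaning data in motion: validate and normalize
    each event as it arrives rather than letting bad records pile up.
    The event schema here is an assumption for illustration only."""
    for line in raw_events:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue                      # drop malformed records
        if not event.get("user_id"):
            continue                      # drop events missing a key field
        event["email"] = event.get("email", "").strip().lower()
        yield event                       # pass clean events downstream

# Example: a few simulated raw events, one malformed and one incomplete.
raw = [
    '{"user_id": "U1", "email": " J.Smith@Example.com "}',
    '{broken json',
    '{"email": "no-id@example.com"}',
]
print(list(clean_stream(raw)))
```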
At the end of the day, data quality is the key enabler when it comes to getting actionable insights from Big Data. Bad data can lead to big problems, creating a mess of conflicting information that leads to poor decision making, disappointing results and other negative consequences.
For organizations that wish to leverage Big Data, the onus is on them to ensure that their data quality initiatives are up to speed.
photo credits: byrev via Pixabay.com; CJ Isherwood via photopin cc