UPDATED 08:30 EST / DECEMBER 18 2017

BIG DATA

Survey shows data quality is now a top challenge for analytics projects

Finding professionals with advanced data processing knowledge isn’t as pronounced a challenge as it once was for enterprises, according to Syncsort Inc.’s fourth annual survey of the analytics landscape.

The provider, which sells software that helps large companies tap the information spread out across their internal systems, polled about 200 technology professionals and executives for the study. The participants ranged from data scientists to senior leaders such as chief information officers. They all work at firms that have adopted or are considering adopting Hadoop and Spark, the two best-known frameworks for enterprise-scale data analytics.

In Syncsort’s previous three surveys, the biggest challenge highlighted by respondents was a lack of needed technical talent. That changed in 2017, when maintaining data quality took over the top spot after 40 percent of the participants named the task as a major struggle for their organizations.

There are several potential factors that could explain the shift. One is that enterprises are analyzing data from a growing variety of systems, which makes it trickier to harmonize the records and ensure everything is consistent.

According to Syncsort, 69 percent of the survey participants reported that their companies are pulling data from relational stores into the internal analytics environment. This finding wouldn’t stand out too much under normal circumstances. What makes it notable is that the figure represents a 6 percent increase over last year, a period in which NoSQL databases and cloud-based sources rose noticeably as well.

The trend is reflected in how Hadoop and Spark are being applied. Syncsort found that 70 percent of respondents perceive ETL (extract, transform, load) as among the most attractive use cases, which is not surprising given how companies are increasingly combining data from disparate systems. For comparison, ETL racked up only 53 percent in the 2016 survey.

Predictive analytics, meanwhile, emerged as the runner-up this year with 63 percent while stream processing followed in third place. Both use cases require a steady supply of fresh data. However, Syncsort found that 75 percent of respondents have difficulty keeping their analytics environment in sync with upstream information sources.

Companies are nonetheless realizing a return on investment. One particularly notable but usually overlooked benefit that Syncsort identified in the study relates to capacity planning. It found that bringing data from isolated systems together in one place often enables organizations to gain a better understanding of infrastructure requirements and optimize accordingly.

Image: Unsplash
