UPDATED 08:30 EDT / DECEMBER 18 2017

BIG DATA

Survey shows data quality is now a top challenge for analytics projects

Finding professionals with advanced data processing knowledge isn’t as pronounced a challenge as it once was for enterprises, according to Syncsort Inc.’s fourth annual survey of the analytics landscape.

The provider, which sells software that helps large companies tap the information spread out across their internal systems, polled about 200 technology professionals and executives for the study. The participants ranged from data scientists to senior leaders such as chief information officers. They all work at firms that have adopted or are considering adopting Hadoop and Spark, the two best-known frameworks for enterprise-scale data analytics.

In Syncsort’s previous three surveys, the biggest challenge highlighted by respondents was a lack of needed technical talent. That changed in 2017, when maintaining data quality took over the top spot after 40 percent of the participants named the task as a major struggle for their organizations.

There are several potential factors that could explain the shift. One is that enterprises are analyzing data from a growing variety of systems, which makes it trickier to harmonize the records and ensure everything is consistent.

According to Syncsort, 69 percent of the survey participants reported that their companies are pulling data from relational stores into the internal analytics environment. This finding wouldn’t stand out too much under normal circumstances. What makes it notable is that the figure represents a 6 percent increase over last year, a period in which NoSQL databases and cloud-based sources rose noticeably as well.

The trend is reflected in how Hadoop and Spark are being applied. Syncsort found that 70 percent of respondents perceive ETL (extract, transform, load) as among the most attractive use cases, which is not surprising given how companies are increasingly combining data from disparate systems. For comparison, ETL racked up only 53 percent in the 2016 survey.

Predictive analytics, meanwhile, emerged as the runner-up this year with 63 percent while stream processing followed in third place. Both use cases require a steady supply of fresh data. However, Syncsort found that 75 percent of respondents have difficulty keeping their analytics environment in sync with upstream information sources.

Yet companies are nonetheless realizing a return on investment. One particularly notable, but usually overlooked, benefit that Syncsort has identified as part of the study relates to capacity planning. It found that bringing data from isolated systems together in one place often enables organizations to gain a better understanding of infrastructure requirements and optimize accordingly.

Image: Unsplash
