UPDATED 07:43 EDT / AUGUST 07 2014

Gartner drains ‘data lakes’ concept in new report

A new report from Gartner, Inc. calls into question the concept of “Data Lakes,” or large repositories of unstructured data from a range of sources that can be used for analytics.

The study, called The Data Lake Fallacy: All Water and Little Substance, notes that while many vendors have signed on to the data lake concept, few companies agree on a definition of what data lakes are or the value they provide.

Data lakes are marketed as enterprisewide data management platforms for analyzing disparate sources of data in their native formats, wrote Gartner’s Nick Heudecker. “This eliminates the up-front costs of data ingestion, like transformation. Once data is placed into the lake, it’s available for analysis by everyone in the organization.”

But co-author Andrew White pointed out that while data lakes might benefit certain parts of an organization, no one has yet realized the value proposition of enterprisewide data management.

The analysts write that data lakes help to solve two key problems: they eliminate data silos, and they address the problem of how to analyze data stored in different formats. But data lakes aren’t without risks, including the lack of an underlying mechanism to maintain them and the absence of metadata. These problems can eventually lead to what Gartner terms a “data swamp,” where it becomes impossible to carry out any kind of accurate analysis.

The authors said there are other risks with data lakes too, including access control and security considerations. Data may also be restricted by regulatory or privacy requirements, and just dumping it into a lake could lead to legal exposure.

Gartner instead advises enterprises to focus on “semantic consistency and performance in upstream applications and data stores”, rather than trying to consolidate all of their data in a lake.

In an interview with Application Development Trends, Jack Norris of MapR Technologies, Inc. said data lake adoption was being driven by the cost, efficiency and agility of Hadoop.

“Gartner is rightly pointing out that not all Big Data and Hadoop solutions provide the performance, security and data protection capabilities that customers need,” Norris said.

Nevertheless, Gartner’s analysts don’t write off data lakes altogether. “The question your organization has to address is this: Do we encourage one-off, independent analysis of information in silos or a data lake, bringing said data together, or do we formalize to a degree that effort, and try to sustain the value-generating skills we develop?” White said.

Data lakes are likely to appeal to an organization that prefers the first scenario, but those that want to consolidate information should move beyond data lakes to focus on building a more robust data warehouse.

photo credit: http://heatherbuckley.co.uk via photopin cc
