UPDATED 12:00 EST / JULY 01 2016

NEWS

Pentaho offers blueprint to keep data lakes from becoming swamps

Pentaho Corp. is throwing information-addled big data teams a lifeline with a blueprint that helps them create a stable and repeatable process for ingesting big data into Hadoop data lakes.

“Filling the Data Lake” is a framework and process for untangling the web of incompatible data that many big data projects must contend with. Ventana Research Inc. has estimated that organizations deploying Hadoop projects spend 46 percent of their time preparing data for analysis or reviewing the quality and consistency of data, rather than actually using it.

“It’s very easy to get raw data into Hadoop,” said Chuck Yarbrough, senior director of solutions marketing at Pentaho. “The problem is when you have lots of data sets where not all files are the same.” For example, financial institutions often load thousands of CSV files that contain similar data but are formatted with different columns and metadata.

A Forrester Research Inc. consulting report commissioned by Pentaho found that more than half of firms using Hadoop blend together 50 or more distinct data sources to enable analytics capabilities, and about one-third blend 100 or more data sources.

“When you dump data into Hadoop you don’t get a nice clean data lake; you get a data swamp,” Yarbrough said.

Pentaho says that by following its blueprint, organizations can reduce dependence on hard-coded data ingestion procedures, manage a changing array of data sources, establish repeatable processes at scale and maintain control and governance along the way.
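To make the idea of moving away from hard-coded ingestion concrete, here is a minimal sketch of a metadata-driven approach to the CSV problem described above. It is not Pentaho's blueprint or any Pentaho API; the source names, column names and the `SOURCE_MAPPINGS` table are all hypothetical, chosen only to show how a per-source mapping can replace a separate ingestion routine for every file layout.

```python
import csv
import io

# Hypothetical metadata: maps each source's own column names to a shared schema.
# Supporting a new source means adding an entry here, not writing new code.
SOURCE_MAPPINGS = {
    "bank_a": {"acct": "account_id", "amt": "amount", "dt": "date"},
    "bank_b": {"AccountNumber": "account_id", "Value": "amount", "TxnDate": "date"},
}

TARGET_COLUMNS = ["account_id", "amount", "date"]


def normalize(source_name, csv_text):
    """Read one CSV and return its rows renamed to the shared target schema."""
    mapping = SOURCE_MAPPINGS[source_name]
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for record in reader:
        # Start with every target column empty so all rows have the same shape.
        row = {target: None for target in TARGET_COLUMNS}
        for source_col, value in record.items():
            target = mapping.get(source_col)
            if target is not None:  # columns without a mapping are dropped
                row[target] = value
        rows.append(row)
    return rows


# Two files with similar data but different column names, order and extras.
file_a = "acct,amt,dt\n1001,25.00,2016-06-30\n"
file_b = "TxnDate,AccountNumber,Value,Branch\n2016-07-01,2002,13.50,NYC\n"

combined = normalize("bank_a", file_a) + normalize("bank_b", file_b)
for row in combined:
    print(row)
```

In this sketch both files end up with the same three columns, and the extra `Branch` column in the second file is dropped because no mapping is defined for it. A real pipeline would also need type conversion, validation and governance checks, which are left out here.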

Pentaho has created four other blueprints related to optimizing big data projects. Visitors must fill out a registration form in order to receive the information.

Photo by Ed Dunens via Flickr CC
