NEWS
Pentaho Corp. is throwing information-addled big data teams a lifeline with a blueprint that helps them create a stable and repeatable process for ingesting big data into Hadoop data lakes.
“Filling the Data Lake” is a framework and process for untangling the web of incompatible data that many big data projects must contend with. Ventana Research Inc. has estimated that organizations deploying Hadoop projects spend 46 percent of their time preparing data for analysis or reviewing the quality and consistency of data, rather than actually using it.
“It’s very easy to get raw data into Hadoop,” said Chuck Yarbrough, senior director of solutions marketing at Pentaho. “The problem is when you have lots of data sets where not all files are the same.” For example, financial institutions often load thousands of CSV files that contain similar data but are formatted with different columns and metadata.
A Forrester Research Inc. Consulting report commissioned by Pentaho found that more than half of firms using Hadoop blend together 50 or more distinct data sources to enable analytics capabilities, and about one-third blend 100 or more data sources.
“When you dump data into Hadoop you don’t get a nice clean data lake; you get a data swamp,” Yarbrough said.
Pentaho says that by following its blueprint, organizations can reduce dependence on hard-coded data ingestion procedures, manage a changing array of data sources, establish repeatable processes at scale and maintain control and governance along the way.
Pentaho has created four other blueprints related to optimizing big data projects. Visitors must fill out a registration form to receive them.