Pentaho offers blueprint to keep data lakes from becoming swamps
Pentaho Corp. is throwing information-addled big data teams a lifeline with a blueprint that helps them create a stable and repeatable process for ingesting big data into Hadoop data lakes.
“Filling the Data Lake” is a framework and process for untangling the web of incompatible data that many big data projects must contend with. Ventana Research Inc. has estimated that organizations deploying Hadoop projects spend 46 percent of their time preparing data for analysis or reviewing the quality and consistency of data, rather than actually using it.
“It’s very easy to get raw data into Hadoop,” said Chuck Yarbrough, senior director of solutions marketing at Pentaho. “The problem is when you have lots of data sets where not all files are the same.” For example, financial institutions often load thousands of CSV files that contain similar data but are formatted with different columns and metadata.
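To make the CSV problem concrete, here is a minimal Python sketch of metadata-driven ingestion, in which a lookup table rather than per-file code decides how each file's columns map onto one target schema. It is an illustration only and not Pentaho's blueprint: the `COLUMN_ALIASES` table, the field names and both helper functions are hypothetical.

```python
import csv
import io

# Hypothetical metadata: each target field lists the header names that
# different source files are assumed to use for it.
COLUMN_ALIASES = {
    "account_id": {"account_id", "acct", "account number"},
    "amount": {"amount", "amt", "value"},
    "date": {"date", "txn_date", "posted"},
}


def build_header_map(header):
    """Return {target_field: source_column} for one file's header row."""
    normalized = {name.strip().lower(): name for name in header}
    mapping = {}
    for target, aliases in COLUMN_ALIASES.items():
        for alias in aliases:
            if alias in normalized:
                mapping[target] = normalized[alias]
                break
    return mapping


def normalize_rows(text):
    """Parse CSV text and re-key every row to the target schema."""
    reader = csv.DictReader(io.StringIO(text))
    mapping = build_header_map(reader.fieldnames or [])
    missing = set(COLUMN_ALIASES) - set(mapping)
    if missing:
        # Fail loudly rather than load a file the metadata does not cover.
        raise ValueError(f"unmapped fields: {sorted(missing)}")
    return [
        {target: row[source] for target, source in mapping.items()}
        for row in reader
    ]


# Two files with the same content model but different headers and ordering.
file_a = "acct,amt,txn_date\n1,10.5,2016-01-01\n"
file_b = "Date,Account Number,Value\n2016-01-02,2,7\n"
rows = normalize_rows(file_a) + normalize_rows(file_b)
```

Supporting a new file layout then means adding aliases to the table rather than writing another hard-coded loader.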
A Forrester Research Inc. Consulting report commissioned by Pentaho found that more than half of firms using Hadoop blend together 50 or more distinct data sources to enable analytics capabilities, and about one-third blend 100 or more data sources.
“When you dump data into Hadoop you don’t get a nice clean data lake; you get a data swamp,” Yarbrough said.
Pentaho says that by following its blueprint, organizations can reduce dependence on hard-coded data ingestion procedures, manage a changing array of data sources, establish repeatable processes at scale and maintain control and governance along the way.
Pentaho has created four other blueprints related to optimizing big data projects. Visitors must fill out a registration form in order to receive the information.
Photo by Ed Dunens via Flickr CC