UPDATED 14:35 EDT / SEPTEMBER 27 2016


How not to drown in your data lake | #BigDataNYC

One of Hadoop’s best-known concepts is the data lake, a storage repository that holds a large amount of raw data until it is needed. The challenge for companies is how best to access and analyze that data. If you’re not careful, your data lake can become a data swamp, where your data is underutilized or mismanaged.

Chuck Yarbrough, senior director of Solutions Marketing and Management at Pentaho, A Hitachi Group Company, joined Dave Vellante (@dvellante) and George Gilbert (@ggilbert41), cohosts of theCUBE, from the SiliconANGLE Media team, during BigDataNYC 2016, held at the Mercantile Annex in New York, NY. Yarbrough talked about Pentaho’s new tools for helping Hadoop users safely access and navigate the waters of their data lakes.

New release for Pentaho

Vellante kicked off the discussion by asking Yarbrough about what’s new at Pentaho.

“We just announced a big data enhancement release … [it] includes a whole bunch of things added into the platform to make Hadoop easier … additional SQL/Spark enhancement [to enable] that data pipeline,” said Yarbrough. The release’s metadata injection capability, designed to onboard multiple data types faster, lets data engineers dynamically generate PDI (Pentaho Data Integration) transformations at runtime.

Because security can be a challenge with Hadoop, Pentaho is expanding its Hadoop data security integration to promote better big data governance and protect clusters from intruders. The additions include enhanced integration with Kerberos, the network authentication protocol, for secure multi-user authentication, and Apache Sentry integration to enforce rules that control access to specific Hadoop data assets.

Hydrating and managing the data lake

Gilbert pointed out that many clients look at their pipelines to see what data is being consumed and brought into (or “hydrating”) the data lake. They then want to analyze and operationalize that data; how can they do that?

“Getting data into the data lake, that’s easy. … What people are asking for is, ‘Help me reduce the insanity, get a handle on what we do.’ [For example,] in the financial services space, [a client] had a problem where they needed to onboard data into the lake quickly and efficiently. … You would have to create a data transformation for every file. … With metadata, it will apply to the transformation, and land the data in exactly the form you need in Hadoop,” explained Yarbrough.
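Yarbrough’s point is that a single metadata-driven template can stand in for a hand-built transformation per file. The Python sketch below illustrates that general pattern only, under stated assumptions; it is not Pentaho’s metadata injection API, and the names `FieldSpec`, `build_transformation`, `METADATA`, `onboard` and the two file layouts are hypothetical. Each incoming file is described by metadata (source column, target column, type conversion), and the row transformation is generated from that metadata at runtime.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical metadata record: how one raw column maps onto the landed table.
@dataclass
class FieldSpec:
    source: str                      # column name in the raw file
    target: str                      # column name in the landed table
    cast: Callable[[str], object]    # type conversion applied to the raw string


def build_transformation(specs: List[FieldSpec]) -> Callable[[Dict[str, str]], Dict[str, object]]:
    """Generate a row-level transformation from metadata at runtime."""
    def transform(row: Dict[str, str]) -> Dict[str, object]:
        # Rename and convert only the columns the metadata describes.
        return {s.target: s.cast(row[s.source]) for s in specs}
    return transform


# Hypothetical metadata: two file layouts share one template; only this table differs.
METADATA: Dict[str, List[FieldSpec]] = {
    "trades.csv": [FieldSpec("TradeID", "trade_id", int), FieldSpec("Amt", "amount", float)],
    "accounts.csv": [FieldSpec("AcctNo", "account_id", str), FieldSpec("Opened", "opened_on", str)],
}


def onboard(filename: str, text: str) -> List[Dict[str, object]]:
    """Parse a CSV payload and reshape it using the metadata registered for its file name."""
    reader = csv.DictReader(io.StringIO(text))
    transform = build_transformation(METADATA[filename])
    return [transform(row) for row in reader]
```

Onboarding a new file layout then means adding a `METADATA` entry rather than writing a new transformation; for example, `onboard("trades.csv", "TradeID,Amt\n7,12.5\n")` returns `[{"trade_id": 7, "amount": 12.5}]`.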

On managing the data lake, Yarbrough added: “My team looks at how customers use our products, and how our products fit into the entire ecosystem. … Then we look for what’s repeatable, and what we can deliver as a solution, quicker, faster, cheaper than [a] client building it themselves.”

Watch the complete video interview below, and be sure to check out more of SiliconANGLE and theCUBE’s coverage of BigDataNYC 2016.

