UPDATED 15:05 EST / JULY 20 2017

BIG DATA

Cloudera aims to change the way data is engineered

Developing an accurate data science model is a challenging process on its own. Scaling the model from a development environment to a production cluster presents another set of operational challenges that Cloudera Inc. aims to address with two new product offerings: Data Science Workbench and Altus.

Mark Grover (pictured, left), software engineer at Cloudera Inc., explained some of the operational challenges in data science. “There is this dichotomy, as a data scientist. I want to have the latest and greatest tools, the latest version of Python, the latest notebook kernel. … However, on the other side of this dichotomy, the [information technology] world wants to make sure all tools are compliant and data is secure,” he said.

Grover and colleague Jennifer Wu (pictured, right), director of cloud management at Cloudera, spoke with David Goad (@davidgoad) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during Spark Summit in San Francisco, California. They discussed Cloudera’s new product offerings. (* Disclosure below.)

A seamless production experience

The disconnect between a typical data scientist’s working environment and a production cluster is exactly what Data Science Workbench aims to alleviate.

“Data Science Workbench runs on the same cluster that is being managed by Cloudera Manager … it allows you to move your development machine learning algorithms from your Data Science Workbench to production much easier because it’s all running on the same hardware and system,” Grover explained.

Altus, a new platform for consuming data science services, publicly launched just two weeks ago. Wu explained how Altus is changing the data science developer experience.

“It is a platform as a service offering designed to leverage the agility and scale of cloud, and make a very easy-to-use experience to expose Cloudera capacity for data engineering type of workloads,” she said. “They’ll be able to do things like [extract, transform and load], large-scale data processing, productionized machine learning workflows in the cloud.”

The product's focus has been on improving the end-user experience for data scientists.

“We wanted to abstract away the cloud and cluster operations and make the end user experience very easy. So jobs and workloads are first-class objects; you can do things like submit jobs, clone jobs, troubleshoot jobs. We wanted to make this very easy for the data engineering end user,” Wu concluded.

Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017.

(* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)

Photo: SiliconANGLE
