Cloudera aims to change the way data is engineered
Developing an accurate data science model is a challenging process on its own. Scaling the model from a development environment to a production cluster presents another set of operational challenges that Cloudera Inc. aims to address with two new product offerings: Data Science Workbench and Altus.
Mark Grover (pictured, left), software engineer at Cloudera Inc., explained some of the operational challenges in data science. “There is this dichotomy, as a data scientist. I want to have the latest and greatest tools, the latest version of Python, the latest notebook kernel. … However, on the other side of this dichotomy, the [information technology] world wants to make sure all tools are compliant and data is secure,” he said.
Grover and colleague Jennifer Wu (pictured, right), director of cloud management at Cloudera, spoke with David Goad (@davidgoad) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media’s mobile livestreaming studio, during Spark Summit in San Francisco, California. They discussed Cloudera’s new product offerings. (* Disclosure below.)
A seamless production experience
The disconnect between a typical data scientist’s working environment and a production cluster is exactly what Data Science Workbench aims to alleviate.
“Data Science Workbench runs on the same cluster that is being managed by Cloudera Manager … it allows you to move your development machine learning algorithms from your Data Science Workbench to production much easier because it’s all running on the same hardware and system,” Grover explained.
Altus, a new platform for consuming data science services, publicly launched just two weeks ago. Wu explained how Altus is changing the data science developer experience.
“It is a platform as a service offering designed to leverage the agility and scale of cloud, and make a very easy-to-use experience to expose Cloudera capacity for data engineering type of workloads,” she said. “They’ll be able to do things like [extract, transform and load], large-scale data processing, productionized machine learning workflows in the cloud.”
The product's focus has been on improving the end-user experience for data engineers.
“We wanted to abstract away the cloud and cluster operations and make the end user experience very easy. So jobs and workloads are first-class objects; you can do things like submit jobs, clone jobs, troubleshoot jobs. We wanted to make this very easy for the data engineering end user,” Wu concluded.
Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of Spark Summit 2017. (* Disclosure: Databricks Inc. sponsored this Spark Summit 2017 segment on SiliconANGLE Media’s theCUBE. Neither Databricks nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)