UPDATED 16:02 EST / JUNE 14 2017

BIG DATA

Hortonworks finally takes data stream processing to heart

After years of presentations that focused on how to analyze, enhance and even expand views of data once it landed in the cluster, Hortonworks Inc. finally admitted that it had conveniently glossed over how to build the process that streams that data in to begin with.

With this week's announcement of Streaming Analytics Manager (SAM) as part of Hortonworks DataFlow 3.0, the company took a major step toward giving business analysts the ability to create streaming applications without writing a single line of code.

The new streaming data tool was demonstrated during today's keynote at DataWorks Summit in San Jose, California, by Joseph Witt, senior director of engineering at Hortonworks, and George Job Vetticaden, Hortonworks' vice president of product management and emerging products.

“Before today, we just hand-waved at how to do stream processing,” Witt said.

SAM changes that dynamic. In response to concerns that building streaming analytics needed to become easier, Hortonworks has introduced a tool with a simple drag-and-drop interface for assembling applications that process data in real time.

“We’ve shielded a lot of hairy details away from the developer. It’s not just easier, but quite fun,” Vetticaden said.

SAM includes a schema registry that lets applications interact with each other across streaming engines such as Apache NiFi, which automates the flow of data between systems, and Apache Storm, an open-source distributed real-time computation system. In this morning's DataWorks Summit keynote, the two Hortonworks executives built a sample application that visualized data streams from a fleet of trucks while predicting which vehicles and drivers would exceed the speed limit on a particular route.

“These are predictive analytics that work without writing any code,” Vetticaden said.

Yahoo uses Hive at massive scale

The keynote session also offered a look at how various Apache Hadoop-based tools are being used to address critical needs in the enterprise. (Apache Hadoop is an open-source software framework for storing, processing and analyzing big data.)

Sumeet Singh, senior director for cloud and big data platforms at Yahoo Inc., described how the company relies on Apache Hive, a data warehouse software project built on top of Hadoop, to process half a billion records per database query.

“Apache Hive is one of the predominant technologies that we’ve been shaping,” Singh said.

Singh said that Yahoo has introduced GPU and high-memory servers to help integrate machine learning into its operations. Over the past two years, the company has also been running Caffe, a deep learning framework, and TensorFlowOnSpark, which brings TensorFlow programs onto Apache Spark clusters.

“Open source is big for us,” Singh added.

The presentations from the Yahoo and Hortonworks executives underscored the growing influence of data science in the enterprise, as companies look for simplicity and a return on their information technology investment. This is leading to more focus on how to frame the big data conversation and what tools, like Hortonworks’ SAM, make the most sense.

"You don't monetize the data," said Bill Schmarzo, chief technology officer for the big data practice at Dell EMC, Dell Technologies Inc.'s infrastructure group. "You're going to monetize the insights that come from the data."

Schmarzo, who spoke at the DataWorks keynote session this morning, teaches a class in Silicon Valley on how to get business people to think like data scientists. “It’s not about technology; it’s about business models,” he said.

Schmarzo challenged the gathering to better understand the economic value of data and create business models with analytics to deliver real results to the bottom line. Business executives live by the “four M’s,” which are “make me more money,” he said.


