Apache Spark, the open-source analytics engine, is going mainstream in the data-driven enterprise. As prominent industries move into the Internet of Things market and adopt machine learning to capitalize on data, Spark ML — a uniform set of high-level application programming interfaces for building and tuning practical machine learning pipelines — gives companies a way to build real-time streaming solutions that deliver fast, advanced analytics and business-driving insights.
“We are going to be focused on how to use structured streaming for machine learning. I think that is really interesting, because stream learning is something that people want to do but aren’t yet doing in production. So it’s always fun to talk to people before they’ve built their systems,” said Holden Karau (pictured), principal software engineer at IBM Corp.
Karau, a Spark committer and noted authority on the platform, met with Jeff Frick (@JeffFrick) and George Gilbert (@ggilbert41), co-hosts of theCUBE, SiliconANGLE Media's mobile live streaming studio, during the BigData SV event in San Jose, California. (*Disclosure below.)
Machine learning: What is happening at the edge?
IoT and machine learning are consuming the technology industry, and Apache Spark's structured streaming is making an impact in this space. Karau noted, however, that certain aspects of Spark are not meant to be pushed out to the edge.
“Structured streaming for today, latency wise, is probably not something I would use [for IoT and real-time streaming]. It’s in the sub-second range, which is nice, but it’s not what you want for live surveying of decisions — like for your car. It’s just not going to be feasible,” Karau said.
She maintained that there is potential for it to become faster and spoke about renewed interest in MLlib local, an effort to let models trained with Spark's scalable machine learning library be pushed out and applied on edge devices without a cluster.
“I think for these IoT devices, it makes a lot more sense to do the predictions on the device itself,” Karau said.
Karau explained that trained models are only megabytes in size and do not require a cluster to make predictions, so using the cluster to train the models and pushing prediction out to the edge node is a reasonable pattern. Rather than using Spark itself to distribute the model, she recommends other tools for that step.
“Spark is not very well suited to large amounts of internet traffic, but it is well-suited to the training. With MLlib local, it will be able to provide both sides, and the copy part is left to whoever is doing the work,” Karau advised.
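The split Karau describes — heavyweight training off-device, lightweight prediction on-device — can be sketched without any Spark dependency, since the model artifact that ships to the edge is just a small bundle of parameters. The following is a minimal, hypothetical Python illustration (the function names and JSON payload format are assumptions for the sketch, not part of MLlib local's actual API):

```python
import json

# --- "Cluster" side: train a tiny linear model ---
# (stand-in for training a Spark ML pipeline on a cluster)
def train_linear_model(xs, ys):
    """Fit y = a*x + b by ordinary least squares on toy data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return {"slope": slope, "intercept": intercept}

model = train_linear_model([1, 2, 3, 4], [2, 4, 6, 8])

# The exported model is just a few numbers -- kilobytes, not a cluster.
payload = json.dumps(model)  # what would be copied to the edge device

# --- "Edge" side: load the model and predict locally, no cluster needed ---
def predict(serialized_model, x):
    m = json.loads(serialized_model)
    return m["slope"] * x + m["intercept"]

print(predict(payload, 10))  # -> 20.0
```

The "copy part" Karau mentions is the `payload` hand-off in the middle: how those bytes reach the device is left to whatever deployment tooling the team already uses.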
The reason for moving the models to the edge is to improve latency. The question that many people are asking is: Will there be a different programming model at the edge?
“I don’t think the answer is finished yet, but I think the work is being done to make it look the same. … Spark has done a really good job of making things look very similar on single node cases to multi-node cases, and I think we can bring the same things to machine learning,” she said.
At IBM, open-source work on Spark is underway to simplify and improve how other programming languages interoperate with the platform. Karau pointed out that Java is easy to use with Spark, but the aim of the project is to provide more comfortable experiences in other languages to increase adoption.
Predicting that the tools of the future will resemble today's tools but with more options, Karau expects the overall experience to become simpler.
“The main thing that we are lacking right now is good documentation — and good books and good resources for people to figure out how to use these tools,” she said.
Watch the complete video interview below, and be sure to check out more of SiliconANGLE’s and theCUBE’s coverage of BigData SV 2017. (*Disclosure: Some segments on SiliconANGLE Media’s theCUBE are sponsored. Sponsors have no editorial control over content on theCUBE or SiliconANGLE.)