

Apache Spark expanded its lead as the preferred candidate for the open-source community’s new flagship analytics engine this week with the release of a landmark update that drastically improves processing speeds across every supported workload type. Much of that increase is due to an overhaul of the underlying execution scheme that has been in the works for several quarters.
Like most of the other leading analytics technologies developed under the umbrella of the Apache Software Foundation, Spark is written mainly in Java, which comes with an abstraction layer that removes the need for the programmer to worry about the nuances of how their code is executed. The project’s backers have given up some of that convenience to squeeze more performance out of the underlying hardware.
Spark now circumvents the native Java mechanism for managing data in memory, using its own specialized format that saves space and reduces the overhead the abstraction layer expends on figuring out which data is no longer needed and can be deleted. But that still doesn’t fully accommodate every workload, which is why the engine takes over code execution entirely for some of its more advanced components.
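To see why a specialized in-memory format helps, here is a minimal pure-Python sketch using the standard `struct` module. It is illustrative only and not Spark’s actual layout: a fixed-width packed row holds the same values in far fewer bytes than boxed objects, and packed bytes create no extra work for the garbage collector. The row schema here is hypothetical.

```python
import struct
import sys

# Hypothetical row: (id: int64, score: float64, flag: int8).
# "<" means little-endian with no alignment padding, so the row
# is a predictable 8 + 8 + 1 = 17 bytes.
ROW_FORMAT = "<qdb"

def pack_row(row_id, score, flag):
    """Serialize a row into a fixed-width byte string."""
    return struct.pack(ROW_FORMAT, row_id, score, flag)

def unpack_row(data):
    """Deserialize the packed bytes back into a tuple."""
    return struct.unpack(ROW_FORMAT, data)

packed = pack_row(42, 0.99, 1)
print(len(packed))         # 17 bytes of payload
print(unpack_row(packed))  # (42, 0.99, 1)

# The equivalent tuple of boxed Python objects costs much more memory,
# and each object must be tracked by the garbage collector.
boxed = (42, 0.99, 1)
print(sys.getsizeof(boxed) + sum(sys.getsizeof(x) for x in boxed))
```

The same principle, applied at cluster scale, is what lets the engine sidestep much of the bookkeeping described above.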
Standing out in particular are the data management functions that Spark borrows from the world of relational databases, which are implemented in a dedicated component that allows business analysts to carry out analytics using familiar structured queries. As an added bonus, the new release makes it possible to visualize the execution paths of those queries in order to identify ways to improve response times.
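The workflow this enables, running a structured query and then inspecting its execution plan to find ways to speed it up, can be tried in miniature without a Spark cluster. The sketch below uses SQLite’s `EXPLAIN QUERY PLAN` purely as a runnable stand-in for the idea; the table and query are invented for illustration.

```python
import sqlite3

# Build a tiny in-memory table to query against.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0), ("east", 75.0)])

query = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# Each row of the plan describes one step of the execution path.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step)

# Adding an index changes the plan -- exactly the kind of insight
# a plan visualization is meant to surface.
conn.execute("CREATE INDEX idx_region ON sales(region)")
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```

Spark’s new visualization presents the same information graphically in its web UI rather than as text rows.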
Spark 1.5 also targets a more mathematically oriented audience with the addition of expanded support for the R statistical modelling language, which is likewise aimed at enabling users to employ syntax they already know. Except instead of structured queries, the integration aims to enable the creation of machine learning algorithms like the kind used in recommendation systems and several other popular use cases for the engine.
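For readers unfamiliar with what a recommendation algorithm actually does, here is a toy item-to-item sketch in plain Python. It is a deliberately simplified stand-in: Spark’s machine learning libraries implement far more scalable techniques, and the ratings data here is invented.

```python
from math import sqrt

# Toy user -> item -> rating data, invented for illustration.
ratings = {
    "alice": {"matrix": 5.0, "inception": 4.0},
    "bob":   {"matrix": 4.0, "inception": 5.0, "up": 2.0},
    "carol": {"up": 5.0, "inception": 1.0},
}

def cosine(item_a, item_b):
    """Cosine similarity between two items over users who rated both."""
    common = [u for u in ratings
              if item_a in ratings[u] and item_b in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][item_a] * ratings[u][item_b] for u in common)
    na = sqrt(sum(ratings[u][item_a] ** 2 for u in common))
    nb = sqrt(sum(ratings[u][item_b] ** 2 for u in common))
    return dot / (na * nb)

def recommend(user):
    """Rank unseen items by similarity to the items the user rated."""
    seen = ratings[user]
    candidates = {i for u in ratings for i in ratings[u]} - set(seen)
    return sorted(candidates,
                  key=lambda i: sum(cosine(i, s) * r
                                    for s, r in seen.items()),
                  reverse=True)

print(recommend("alice"))  # → ['up']
```

The point of the R integration is that statisticians can express this kind of logic in the syntax they already use, while Spark handles distributing the computation.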
Another fast-rising application for Spark that often goes hand in hand with machine learning is stream processing, which is also receiving a boost in the form of reliability improvements and a new throttling feature meant to prevent clusters from ingesting more data than they can handle. That’s useful for dealing with sudden input spikes that can potentially compromise the service levels of a deployment if left unchecked.
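The general idea behind such throttling can be sketched with a classic token bucket: records are accepted only as fast as tokens refill, so a sudden burst is rejected (or queued) rather than overwhelming the consumer. This is a generic illustration of the technique, not Spark code; in Spark 1.5 the feature is exposed as configuration (e.g. the `spark.streaming.backpressure.enabled` setting) rather than something users implement themselves.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (illustrative sketch)."""

    def __init__(self, rate, capacity):
        self.rate = float(rate)          # tokens added per second
        self.capacity = float(capacity)  # burst ceiling
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_consume(self, n=1):
        """Accept n records if tokens are available, else reject."""
        now = time.monotonic()
        # Refill proportionally to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

# Simulate a burst of 1000 records against a small bucket: only
# roughly the burst capacity gets through; the rest are rejected.
bucket = TokenBucket(rate=100, capacity=10)
accepted = sum(bucket.try_consume() for _ in range(1000))
print(accepted)
```

A streaming engine applies the same cap at the ingestion boundary, which is what keeps an input spike from compromising service levels downstream.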
But as big an improvement as the update represents, it’s still only the tip of the iceberg of what’s to come now that IBM Corp. has allocated a billion dollars and several thousand engineers to accelerating the development of Spark. One of the first additions in the pipeline is a library called SystemML that is derived from Watson and automatically optimizes machine learning algorithms for fast execution.