UPDATED 08:00 EDT / JUNE 15, 2015

NEWS

Databricks updates Spark with support for R and Python 3

Databricks has announced a major update to Apache Spark, the popular cluster computing framework for data analytics, adding support for the R statistical programming language in an effort to make life easier for data scientists.

As well as adding support for Python 3, Apache Spark 1.4 lets R users work directly on large datasets via the new SparkR API. With more than two million users worldwide, R is one of the most popular programming languages designed specifically for predictive analytics and statistical computing.
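For Python users, the upgrade means existing PySpark programs can now run under a Python 3 interpreter. A minimal sketch of what that looks like (the file path is hypothetical; pointing the PYSPARK_PYTHON environment variable at a Python 3 binary selects the interpreter):

```python
# wordcount.py -- runs under Python 3 when PYSPARK_PYTHON=python3 is set
from pyspark import SparkContext

sc = SparkContext(appName="py3-wordcount")

# Count words in a (hypothetical) text file spread across the cluster.
counts = (sc.textFile("hdfs:///data/corpus.txt")
            .flatMap(lambda line: line.split())
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))

print(counts.take(10))  # print() as a function works in both Python 2 and 3
sc.stop()
```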

“Because SparkR uses Spark’s parallel engine underneath, operations take advantage of multiple cores or multiple machines, and can scale to data sizes much larger than standalone R programs,” Patrick Wendell, a software engineer at Databricks, wrote in a blog post.

SparkR is an R package, first developed at UC Berkeley's AMPLab, that provides a frontend from R to Apache Spark. By tapping Spark's distributed computation engine, users can now run large-scale data analysis workloads straight from the R shell, Wendell added.

Besides R support, Spark 1.4 also brings new capabilities to the DataFrame API, including window functions in Spark SQL and in the DataFrame library that let users compute statistics over ranges of rows.
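In PySpark, for instance, the new window support looks roughly like this (the data and column names are invented for illustration):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import functions as F
from pyspark.sql.window import Window

sc = SparkContext(appName="window-demo")
sqlContext = SQLContext(sc)

# Hypothetical sales data: (category, revenue) pairs.
df = sqlContext.createDataFrame(
    [("books", 100), ("books", 250), ("books", 300),
     ("toys", 80), ("toys", 120)],
    ["category", "revenue"])

# A window covering rows in the same category, ordered by revenue,
# spanning from one row before to one row after the current row.
w = Window.partitionBy("category").orderBy("revenue").rowsBetween(-1, 1)

# Compute a moving average of revenue over that window for each row.
df.select("category", "revenue",
          F.avg("revenue").over(w).alias("moving_avg")).show()
```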

“In addition, we have also implemented many new features for DataFrames, including enriched support for statistics and mathematical functions – random data generation, descriptive statistics and correlations, and contingency tables – as well as functionalities for working with missing data,” Wendell continued.
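Several of those additions can be sketched in PySpark as follows (the DataFrames and column names are invented for illustration):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import functions as F

sc = SparkContext(appName="df-stats-demo")
sqlContext = SQLContext(sc)

# Random data generation: columns of uniform and normal random numbers.
df = sqlContext.range(0, 1000).select(
    "id",
    F.rand(seed=42).alias("uniform"),
    F.randn(seed=7).alias("normal"))

df.describe().show()                        # descriptive statistics
print(df.stat.corr("uniform", "normal"))    # correlation of two columns

# Contingency tables and missing data on a hypothetical DataFrame.
people = sqlContext.createDataFrame(
    [("alice", "ny"), ("bob", None), ("alice", "sf")], ["name", "city"])
people.stat.crosstab("name", "city").show() # counts for each name/city pair
people.na.fill({"city": "unknown"}).show()  # replace missing values
```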

“To make DataFrame operations execute quickly, this release also ships the initial pieces of Project Tungsten, a broad performance initiative which will be a central theme in Spark’s upcoming 1.5 release. Spark 1.4 adds improvements to serializer memory use and options to enable fast binary aggregations.”

Wendell revealed that the machine learning pipelines API, which was first introduced in Spark 1.2 and allows users to run complex workflows involving multiple steps, is now stable and production-ready. According to Wendell, with the new release the Python API has attained parity with the Java and Scala interfaces. Beyond that, the pipelines gain a range of new feature transformers such as OneHotEncoder, RegexTokenizer and VectorAssembler, plus new algorithms including tree models and linear models.
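Sketched in PySpark, a minimal pipeline built from those pieces might look like this (the training data and column names are invented for illustration):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.ml import Pipeline
from pyspark.ml.feature import RegexTokenizer, HashingTF
from pyspark.ml.classification import LogisticRegression

sc = SparkContext(appName="pipeline-demo")
sqlContext = SQLContext(sc)

# Invented training data: (text, label) pairs.
training = sqlContext.createDataFrame([
    ("spark is fast", 1.0),
    ("the weather is nice", 0.0),
    ("spark scales out", 1.0),
], ["text", "label"])

# Each stage's output column feeds the next stage's input column.
tokenizer = RegexTokenizer(inputCol="text", outputCol="words", pattern="\\s+")
hashingTF = HashingTF(inputCol="words", outputCol="features")
lr = LogisticRegression(maxIter=10)

# A Pipeline chains the stages into a single multistep workflow.
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
model = pipeline.fit(training)  # fits every stage in one call

# The fitted model applies all stages when scoring new data.
test = sqlContext.createDataFrame([("is spark fast",)], ["text"])
model.transform(test).select("text", "prediction").show()
```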

Spark 1.4 also adds visual debugging and monitoring utilities designed to help users understand how applications are running in Spark. A new application timeline viewer, for example, shows the completion of stages and tasks inside a running application, while another new tool provides a visual representation of the underlying computation graph, tied directly to the metrics of physical execution. The same feature also lets users track the latency and throughput of data streams.

Image credit: ClkerFreeVectorImages via Pixabay.com
