UPDATED 00:57 EST / JUNE 07 2016

Microsoft taps into Apache Spark to drive its Big Data & analytics services

Microsoft is making what it claims is an “extensive commitment” to the Apache Spark Big Data processing engine, launching several new offerings out of preview and into general release.

The move is the latest in Microsoft’s embrace of open-source technologies, a trend that emerged only in the last couple of years, after CEO Satya Nadella took over the company. Microsoft made its announcements yesterday at the Spark Summit 2016 developer conference in San Francisco.

Microsoft launched Spark for Azure HDInsight in preview last year, promising it would be the “best environment to run Apache Spark” with a managed service in the cloud. Now, Spark for Azure HDInsight has been made generally available, together with a fully managed Spark service from Hortonworks Inc. that’s been “hardened for the enterprise and made simpler for you to use,” the company said.

In addition, Microsoft announced that R Server for HDInsight will become generally available later this summer. Currently in preview, R Server for HDInsight includes Spark integration for both the on-premises and cloud versions. The “R” refers to the R programming language, which is used for statistical computing and predictive analytics. Microsoft has been one of the leading proponents of R since its acquisition of Revolution Analytics in 2015.

Microsoft previously said it’s planning to integrate the commercial R distribution into SQL Server 2016 via SQL Server R Services.

Microsoft also announced that R Server for Hadoop on-premises installations will be made generally available in June. R Server for Hadoop on-premises will support both Microsoft R and native Spark execution frameworks, the company said.

“Combining R Server with Spark gives users the ability to run R functions over thousands of Spark nodes letting you train your models on data 1000x larger and 100x faster than was possible with open source R and nearly 2x faster than Spark’s own MLLib,” according to Microsoft’s blog post.

Last but not least, Microsoft said it has expanded the previously announced Spark support in Power BI to cover Spark Streaming scenarios.

While many consider Apache Spark a competitor to Hadoop, Microsoft’s announcements indicate that the company sees the two platforms as more complementary than competitive. Even so, its embrace of Spark hasn’t stopped Microsoft Research from pursuing a new, potentially competing project called Prajna, an open-source distributed analytics platform designed for building cloud services that make use of Big Data analytics.

Image credit: Patrick Foto ;) via Flickr.com
