UPDATED 12:01 EDT / JULY 28 2014

Big Data needs drive R as a powerful, enterprise-ready language

As Big Data continues to gain enterprise adoption, the programming languages used to define schemas and build Big Data analysis algorithms are rushing to keep up. As a result, the open source statistical language R has become a go-to skill for Big Data scientists and developers, and its popularity is soaring relative to other languages and skills.

Combined with Big Data tools, the R language gives users deep statistical capabilities for handling large data sets, conducting statistical analysis, and rendering data-driven visualizations. R is particularly widely used in finance, pharmaceuticals, media and marketing, where it helps guide data-driven business decisions.

The popularity of R has grown significantly in recent years. A 2013 survey of data mining professionals conducted by Rexer Analytics indicated that the R programming language is by far the most popular statistical analysis tool, with 70% of respondents saying they use it at least occasionally. Developers interested in learning more about R can look into training on the subject to get a better grasp of its use in the Big Data paradigm.

In the enterprise market, numerous companies and projects have emerged to harness R and bring it to Big Data scientists and business users alike. These projects and tools include Microsoft’s Azure Machine Learning cloud platform, IBM’s Big R, Teradata Aster R, Oracle R Enterprise, Pivotal’s PivotalR, and SAP’s R integration for HANA.

Azure Machine Learning is a game changer with R

Microsoft last month announced the launch of its new platform, Azure Machine Learning (ML), which is dedicated to cloud-based predictive analytics on large volumes of data. The Azure ML cloud service allows scientists and developers to integrate predictive analytics into their applications.

Notably, Microsoft is providing APIs and templates based on the R language. Azure ML supports more than 300 R packages and lets users assemble a model suited to their needs out of existing pieces rather than building something from scratch. This ease of implementation makes machine learning accessible to a larger number of practitioners with various backgrounds, including people who are not data scientists.

Microsoft says the Azure ML platform can be used to predict future trends in applications such as search engines, online recommendations, ad targeting, virtual assistants, demand forecasting, fraud detection, spam filters and more.

IBM integration with Big R

IBM InfoSphere BigInsights Big R is a library of functions that provides end-to-end integration with the R language and InfoSphere BigInsights. Big R can be used for comprehensive data analysis on the InfoSphere BigInsights server, lowering some of the complexity of manually writing MapReduce jobs.

This integration makes it easier to write and execute R programs that operate on big data. Using Big R, an R user can explore, transform, and analyze data hosted in a BigInsights cluster using familiar R syntax and paradigms.
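To make the point about MapReduce complexity concrete, here is a minimal sketch in Python (not R, and not Big R's actual API). It computes an average per group twice: once with hand-written map, shuffle and reduce phases, and once as a single high-level expression. The data, function names and field names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (region, sale amount). Not taken from the article.
records = [("east", 10.0), ("west", 4.0), ("east", 20.0), ("west", 6.0)]

def map_phase(rows):
    """Emit (key, value) pairs, as a mapper would."""
    for region, amount in rows:
        yield region, amount

def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's list of values into a single average."""
    return {key: sum(values) / len(values) for key, values in groups.items()}

def mean_by_region_mapreduce(rows):
    """The manual pipeline: map, then shuffle, then reduce."""
    return reduce_phase(shuffle_phase(map_phase(rows)))

def mean_by_region_highlevel(rows):
    """The same computation written as one declarative expression."""
    keys = {region for region, _ in rows}
    return {k: mean(a for r, a in rows if r == k) for k in keys}

if __name__ == "__main__":
    print(mean_by_region_mapreduce(records))
    print(mean_by_region_highlevel(records))
```

Both functions return the same per-region averages. The difference is how much plumbing the analyst has to write and maintain, which is the burden that libraries of this kind aim to reduce.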

Teradata Aster R

The rapid adoption of R and its proven value mean that organizations looking to drive new revenue-generating insights should make R a part of their predictive analytics strategy. Teradata, the analytic data platforms company, recently introduced Teradata Aster R, which extends the power of open source R analytics by lifting its memory and processing limitations.

Teradata Aster R gives analysts an enterprise-ready business analytics solution that is highly scalable, reliable and easy to use, and that can process massive amounts of data at high speed to meet each company’s analytical needs. The platform delivers the power of R analytics to the enterprise. To support R analysts, Teradata offers the familiar R language and tools, massive processing power, and a rich set of analytics. In addition, analysts have access to an immense volume of integrated data from multiple sources.

Teradata Aster R runs on a high-performance computing platform and inherits its security and data management capabilities. Its analytics components include Teradata Aster R Library, Teradata Aster R Parallel Constructor, and Teradata Aster SNAP Framework Integration.

Oracle R Enterprise

Oracle R Distribution is Oracle’s free distribution of open source R. The database company also offers Oracle R Enterprise, which integrates R with Oracle Database. Oracle R Enterprise works primarily by introducing overloaded variants of many R data types, so that R code can operate on data held in the database.

The company also offers Oracle Big Data Connectors that facilitate interaction and data exchange between a Hadoop cluster and Oracle Database. Oracle R Connector for Hadoop is a set of R packages that supports the interface between a local R environment, Oracle Database, and Hadoop.

Oracle’s strategy with R Enterprise is to provide in-database analytics capabilities for its widely adopted enterprise RDBMS and for its Exadata appliance.
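The idea of overloading data types so that analysis runs inside the database can be sketched in a few lines. The example below is in Python with SQLite, not R with Oracle Database, and the `DbColumn` class is a hypothetical illustration rather than anything from Oracle R Enterprise. It shows the general technique: arithmetic operators on a proxy object build up a SQL expression, and only the final aggregate is pulled back to the client.

```python
import sqlite3

class DbColumn:
    """Hypothetical proxy for a numeric column that lives in a database table."""

    def __init__(self, conn, table, expr):
        self.conn = conn
        self.table = table
        self.expr = expr  # SQL expression built up so far

    def __mul__(self, factor):
        # Overloaded operator: extend the SQL expression instead of computing locally.
        return DbColumn(self.conn, self.table, f"({self.expr} * {float(factor)})")

    def __add__(self, offset):
        # Same idea for addition; the numeric operand is coerced to float.
        return DbColumn(self.conn, self.table, f"({self.expr} + {float(offset)})")

    def mean(self):
        # Only the aggregate crosses the connection; the rows stay in the database.
        sql = f"SELECT AVG({self.expr}) FROM {self.table}"
        return self.conn.execute(sql).fetchone()[0]

def demo():
    """Build a tiny in-memory table and average a transformed column in-database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?)", [(10.0,), (20.0,), (30.0,)])
    amount = DbColumn(conn, "sales", "amount")
    return (amount * 2 + 1).mean()

if __name__ == "__main__":
    print(demo())
```

The analyst writes what looks like ordinary arithmetic on a local object, while the work is translated into a query and executed where the data lives. That is the appeal of in-database analytics for large tables.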

R for Big Data with PivotalR

PivotalR is a package that enables users of R to interact with the Pivotal (Greenplum) Database as well as Pivotal HD and HAWQ for Big Data analytics. PivotalR is an R library with a familiar user interface that enables data scientists to perform in-database and in-Hadoop computations.

Pivotal positions HAWQ as the key differentiating technology that makes Pivotal HD the world’s most powerful Hadoop distribution. Alongside support for the R language, it offers Dynamic Pipelining, a world-class query optimizer, horizontal scaling, SQL compliance, interactive querying, deep analytics, and support for common Hadoop formats.

SAP integrates R with HANA

SAP has integrated R with its in-memory database HANA, which it positions as a modern platform for mobile, analytics, data services and cloud integration services. SAP HANA works with R through Rserve, a package that enables communication with an R server.

The data exchange between SAP HANA and R is very efficient because both use a column-oriented storage style. SAP’s strategy for integrating HANA with R is to provide a modern platform for all applications, enabling customers to truly innovate and transform their businesses in the cloud. The solutions include a comprehensive set of prepackaged rapid-deployment solutions that aim to automate deployment and simplify the journey to the cloud.
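As a generic illustration of why a shared column-oriented layout helps data exchange, the Python sketch below ships each column of a small table as one contiguous, typed byte buffer and rebuilds it on the other side with no per-row work. It is not SAP’s actual transfer protocol. The table contents and the function names `send_column` and `receive_column` are assumptions made for the example.

```python
from array import array

# Hypothetical table with two numeric columns; values are illustrative only.
rows = [(1, 2.5), (2, 3.5), (3, 4.0)]

def to_columns(row_store):
    """Pivot a row-oriented table into one typed array per column."""
    ids = array("q", (r[0] for r in row_store))     # 64-bit signed integers
    prices = array("d", (r[1] for r in row_store))  # double-precision floats
    return {"id": ids, "price": prices}

def send_column(col):
    """Serialize a whole column as its type code plus one contiguous byte buffer."""
    return col.typecode, col.tobytes()

def receive_column(typecode, payload):
    """Rebuild the column on the receiving side without touching individual rows."""
    col = array(typecode)
    col.frombytes(payload)
    return col

columns = to_columns(rows)
received = {name: receive_column(*send_column(col)) for name, col in columns.items()}

if __name__ == "__main__":
    print(received)
```

When both sides already hold data column by column, the pivot in `to_columns` is unnecessary and each column can be handed over as a single block, which is the kind of saving the paragraph above alludes to.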

Contributors: Kyt Dotson and Saroj Kar.

