UPDATED 16:22 EDT / DECEMBER 24 2013

Hadoop rises and becomes stronger with native R programming support

Illustration by Michael Capozzola

Hadoop just gained a new tool. Revolution Analytics (RA) has updated its statistical analysis package, Revolution R Enterprise 7 (RRE 7), so that Hadoop platform users can also work in the R language.

Integrating R with Hadoop should make big data analysis considerably more productive. The primary benefit is the ability to run statistical functions on data held in Hadoop; better visualization of the results is an additional benefit.

The aim is for R, through its Hadoop support, to find greater adoption among business users.

According to Revolution Analytics, R-based data analysis can be done more quickly by analyzing the data within the node in which it resides, rather than moving it somewhere else to be analyzed. With Revolution R Enterprise, Hadoop users can shorten the model development lifecycle, build hundreds of models for diverse data sets, create ensemble models to achieve better fidelity and improve the lift of predictive models, run large-scale, compute-intensive simulations at speed, build recommendation engines using machine learning algorithms, and integrate analytics directly into enterprise applications.
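The idea of analyzing data where it resides can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not Revolution R Enterprise code; the `Summary` class, the helper functions and the sample blocks are all hypothetical. Each block stands in for data held on one node, and only three small numbers per block are combined centrally.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Sufficient statistics for one block of numbers (hypothetical helper)."""
    n: int = 0
    total: float = 0.0
    total_sq: float = 0.0


def summarize(block):
    # Would run on the node that holds the block; only the summary leaves it.
    return Summary(len(block), sum(block), sum(x * x for x in block))


def merge(a, b):
    # Combining two summaries is just element-wise addition.
    return Summary(a.n + b.n, a.total + b.total, a.total_sq + b.total_sq)


def mean_and_variance(summary):
    # Population variance, computed from the merged sums.
    mean = summary.total / summary.n
    variance = summary.total_sq / summary.n - mean * mean
    return mean, variance


# Hypothetical data blocks, one per node.
blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]

combined = Summary()
for block in blocks:
    combined = merge(combined, summarize(block))

mean, variance = mean_and_variance(combined)
```

The raw values never have to be gathered in one place, which is the property the vendor's claim relies on. A production system would likely use a numerically more stable variance update.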

In addition, RRE 7 offers a number of new algorithms and processes, including a collection of models for building so-called Decision Forests, a machine learning technique for predicting future results. There are also new ways to visualize decision trees, which are said to improve the representation of complex relationships and correlations within a data set.
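To give a rough sense of how a decision forest makes predictions, here is a heavily simplified sketch in Python. It is not the RRE 7 implementation: it assumes single-split trees ("stumps") trained on bootstrap samples and combined by majority vote, and every function name and data point is made up for illustration.

```python
import random
from collections import Counter


def fit_stump(rows, labels):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw row indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    # Each tree votes; the most common label wins.
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: the first feature separates the two classes.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [7.0, 4.0], [8.0, 9.0], [9.0, 2.0]]
labels = [0, 0, 0, 1, 1, 1]

forest = fit_forest(rows, labels)
low = predict_forest(forest, [0.5, 6.0])
high = predict_forest(forest, [10.0, 6.0])
```

Real decision forests grow deeper trees and usually sample features as well as rows; the sketch keeps only the bootstrap-and-vote structure.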

The new release of Revolution R Enterprise integrates with Cloudera's Hadoop distributions, CDH3 and CDH4, as well as with Hortonworks Data Platform 1.3.

Machine learning to facilitate prediction

Revolution Analytics hopes that integrating R with Hadoop and Teradata databases will extend use of the language to service managers. The company has designed a new workflow interface that requires no knowledge of how specific algorithms are implemented. This avoids the need to write code in Java or another language in order to run R on the Hadoop platform.

The new R library includes statistical analysis and predictive algorithms commonly used for data processing, data sampling, descriptive statistics, statistical tests, data visualization, simulation, and machine learning models. R can also analyze a complete data set, not just a subset or sample, which brings the language in line with the way enterprise data warehouses (EDWs) operate.

Analytics perspective on Big Data

R is popular among statisticians and data analysts because the language is easy to learn, even for people without programming skills. Data from Revolution Analytics shows that R is used by about 2 million people. R provides mechanisms for organizing data, creating spreadsheets and processing records visually.

Hadoop clusters are not only a data storage mechanism; they also contain dozens, sometimes hundreds, of CPUs. If data scientists can apply R to them for predictive models, they gain a computational powerhouse. The language lets data scientists take full advantage of Hadoop's computational capacity without having to worry about the plumbing.

Big data developers can use R's many data analysis tools to refine and extract information: separating the signal from the noise, quantifying unstructured text, extracting measurements from social network graphs, and so on.

R can also scale for execution on Hadoop through in-database execution, parallelized user code, parallelized algorithms, multi-core processing, multi-threaded execution, memory management and fast math libraries.

R is a statistical tool that can handle linear and non-linear modeling, time series analysis, classification and clustering models. Results can be presented in various graphical formats. R has grown in popularity because it offers tools not always found in traditional software systems. RRE 7 has a library of algorithms that can run in parallel across a number of network nodes, in the same way that Hadoop handles large amounts of data.
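As a hedged illustration of how a model fit can run in parallel over pieces of a data set, the Python sketch below fits a straight line by least squares. It does not reflect how RRE 7 is actually built: worker threads stand in for cluster nodes, and the function names and sample chunks are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_stats(chunk):
    """Per-chunk sums needed for a one-predictor least-squares fit."""
    n = len(chunk)
    sx = sum(x for x, _ in chunk)
    sy = sum(y for _, y in chunk)
    sxx = sum(x * x for x, _ in chunk)
    sxy = sum(x * y for x, y in chunk)
    return n, sx, sy, sxx, sxy


def fit_line(chunks, workers=3):
    # Each chunk is summarized independently, here by a thread per chunk.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_stats, chunks))
    # Add the partial sums column by column, then solve the normal equations.
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*partials))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical (x, y) chunks that all lie on the line y = 2x + 1.
chunks = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0)],
    [(4.0, 9.0)],
]

slope, intercept = fit_line(chunks)
```

Because the per-chunk sums simply add up, the fitted line is the same as it would be if all the points were processed in a single pass on one machine.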

