R Language Tops the Charts As Most Preferred Language for Data Science and Big Data Analytics


Demonstrating the potential of big data technologies requires expertise from different areas. Data science, data mining, and big data analytics are some of the specialist fields that bring together the diverse skills needed to work with big data technologies, products, and services and to optimize a company's operations. Among those skills are the languages an analyst knows, so when KDnuggets released its survey of languages and skills, we reviewed the results.

Data visualization is an essential skill for every web analyst and data scientist. Data science demands a number of additional skills, most of which cannot be learned in a short time. It requires a strong general knowledge of statistics, such as Bayesian statistics, linear regression, and logarithmic regression; knowledge of algebra and linear algebra; natural language processing; predictive analytics (based on machine learning); and, most importantly, knowledge of tools such as R, Python, SQL, and other programming languages.

KDnuggets has published its annual poll of top languages for analytics, data mining, and data science, and, just as in the two prior years, R ranks as the most popular. Based on responses from over 700 voters, R's share of usage grew 16% compared to the 2012 poll. Python and SQL follow R in the ranking.

“The most popular languages continue to be R (used by 61% of KDnuggets readers), Python (39%), and SQL (37%). SAS is stable at around 20%. The highest growth was for Pig/Hive/Hadoop-based languages, R, and SQL, while Perl, C/C++, and Unix tools declined,” says the report.

Among the most common languages, the largest relative increases in share of usage were for Pig Latin/Hive/other Hadoop-based languages, with 19% growth (from 6.7% in 2012 to 8.0% in 2013); R, with 16% growth; and SQL, with 14% growth. Conversely, the languages with the largest declines in share of usage were Lisp/Clojure (77% down), Perl (50% down), Ruby (41% down), C/C++ (35% down), UNIX shell/awk/sed (25% down), and Java (22% down).
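The growth figures above are relative changes in share of usage, not percentage-point differences. As a minimal sketch of that arithmetic, the Python snippet below recomputes the Hadoop-based languages figure from the two shares quoted in the poll. The function name `relative_change` is our own and purely illustrative; it is not part of the KDnuggets report.

```python
def relative_change(old_share: float, new_share: float) -> float:
    """Return the relative change from old_share to new_share, in percent."""
    if old_share == 0:
        raise ValueError("old_share must be non-zero")
    return (new_share - old_share) / old_share * 100.0


# Shares of usage quoted above for Pig Latin/Hive/other Hadoop-based languages.
share_2012 = 6.7  # percent of respondents in the 2012 poll
share_2013 = 8.0  # percent of respondents in the 2013 poll

growth = relative_change(share_2012, share_2013)
point_change = share_2013 - share_2012

print(f"Relative growth: {growth:.0f}%")
print(f"Absolute change: {point_change:.1f} percentage points")
```

In other words, a rise of 1.3 percentage points on a small base of 6.7% is what produces the 19% relative growth reported for these languages.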

Ben Podgursky, a software engineer at LiveRamp, recently shared statistics on the GitHub community, saying that ActionScript is associated with the highest average household income at $108,119.47, followed by XSLT ($106,199.19), Java ($103,179.39), Groovy ($102,650.86), Objective-C ($101,801.60), and ColdFusion ($101,536.70). Puppet ($87,589.29) and Haskell ($89,973.82) were at the bottom of the list.

Much like Linux, R has had a rather slow but steady evolution. R was created when a couple of university professors wanted an open source system that could work on big data that was being parallel processed, and it really took off in the academic community, beginning with research projects. Today, R is being used with parallel processing, server clusters, Hadoop, and other cloud technologies.

The mix of skills in database query languages, statistics, predictive and advanced analytics, programming, business intelligence, and cognitive science makes R such a popular language among developers. Today R can scale for Hadoop execution, in-database execution, parallelized user code, parallelized algorithms, multi-core processing, multi-threaded execution, memory management, and fast math libraries.

At the same time, Python has been used for building massive web applications and for scientific computing, as well as for data structuring, manipulation, querying, analysis, and visualization in highly quantitative domains such as finance, oil and gas, physics, and signal processing. It has powered much of Google’s internal infrastructure. According to the TIOBE Index, Python is the 8th most popular programming language, and it is the third most commonly used language on the Internet’s largest code repository (GitHub), ahead of Perl, Ruby, and JavaScript.