Imagine Romeo Montague calling up to Juliet Capulet on her balcony, reciting lines such as “What R through yonder Windows breaks?” and wooing the daughter of the rival family with sweet nothings about statistical data analysis and “Big Data, Big Analytics.” Microsoft SQL Server and Oracle Database vie for market share, and some people consider Microsoft and the Linux community to be rivals. So when Microsoft joined forces with Google, Oracle and others in June 2015 on a collaborative project under the Linux Foundation, some people expressed confusion. Let’s clear up the realities of this seemingly star-crossed love affair.
The nonprofit Linux Foundation created the R Consortium to facilitate collaboration between the preexisting R Foundation and several companies, including Microsoft. To better understand the significance of this development, it is useful to first understand what R is, then to relate Microsoft’s interest in it.
What is R?
R is a programming language for statistical data analysis. R was first created in 1993, at a time when Excel was still quite young, Visio was just being born and neither PowerPivot nor Excel Services for SharePoint was even a glimmer in anyone’s eye. As a programming language, R is particularly adept at working with arrays and matrices, which are essentially structured collections of numbers. Statistical analysis and graphical rendering have been strengths of R for over two decades. Many data scientists, as well as scientists and engineers in many disciplines, use numerical and statistical languages such as MATLAB, Octave, Freemat, Scilab, IDL and R, as well as general-purpose programming languages like Python and FORTRAN, for number crunching. MATLAB is a proprietary language and platform, and some open-source languages like Octave are largely language-compatible with MATLAB. In contrast, R is a mature open-source language with implementations available for Linux, Mac OS X and Windows. R is maintained by the nonprofit R Foundation.
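As a small illustration of that strength with arrays and matrices, here is a minimal sketch that uses only base R. The sales figures and the region and month labels are made up for the example.

```r
# Hypothetical sample data: monthly sales for two regions
sales <- matrix(c(10, 12, 15,
                  20, 18, 25),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("north", "south"),
                                c("jan", "feb", "mar")))

# Vectorized arithmetic touches every element without an explicit loop
sales_with_tax <- sales * 1.1

# Built-in summaries across rows and columns
region_totals <- rowSums(sales)
month_means <- colMeans(sales)

print(region_totals)
print(month_means)
```

The point of the sketch is that whole-matrix operations such as `sales * 1.1` and `rowSums()` replace the nested loops that many general-purpose languages would need.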
Statistical Processing in Modern Times
Can a 22-year-old language be used for modern Big Data, Online Analytical Processing (OLAP) and hybrid on-premises, private cloud and public cloud systems? Yes. The R community of developers and users has been growing steadily over those years, and it is estimated that over 2 million people are using R. Throughout that time, the R Foundation has maintained the language with extensions, fixes and new features.
One of the key players in the R community since 2007 has been Revolution Analytics, a company that was acquired by Microsoft in April 2015. Revolution Analytics offers two key products centered around R:
- Revolution R Open (RRO) – free R distribution that includes several enhancements
- Revolution R Enterprise (RRE) – paid platform with workstation, server (Windows, Red Hat Linux, SUSE Linux), Hadoop (Cloudera, Hortonworks, MapR) and Enterprise Data Warehouse (Teradata) distributions
Both Revolution R platforms are 100 percent R compatible, allowing cross-platform reuse of R code. These R distributions can be scaled to support statistical data processing, predictive modeling and machine learning projects.
The R Consortium was founded with the R Foundation as a core member, and Microsoft and RStudio as Platinum members. TIBCO is a Gold member, and Silver members include Alteryx, Hewlett-Packard, Mango Solutions, Google, Ketchum Trading and Oracle. These members are depicted in the pie chart that was generated using the following R code:
# Slice weights by tier: core (4), Platinum (3), Gold (2), Silver (1)
pie.RConsortium <- c(4, rep(3, 2), 2, rep(1, 6))
names(pie.RConsortium) <- c("R Foundation", "Microsoft", "RStudio", "TIBCO",
                            "Alteryx", "HP", "Mango", "Google", "Ketchum", "Oracle")
# Draw the chart with one rainbow color per member
pie(pie.RConsortium, col = rainbow(length(pie.RConsortium)))
title(main = "R-Consortium Members", cex.main = 1.8, font.main = 1)
title(xlab = "(Collaborative Project of Linux Foundation)",
      cex.lab = 0.8, font.lab = 3)
This brief example includes static data. Certainly, the dynamic results of an SQL query or web service could be processed using R.
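As a hedged sketch of that idea, the base R code below builds the pie-chart input from tabular data instead of hard-coded vectors. The CSV text stands in for whatever a real SQL query or web service would return. The column names `member` and `tier`, the shortened member list and the `tier_weight` lookup are assumptions chosen for illustration.

```r
# Stand-in for a query or web-service result (hypothetical columns)
csv_text <- "member,tier
R Foundation,Core
Microsoft,Platinum
RStudio,Platinum
TIBCO,Gold
Google,Silver
Oracle,Silver"

members <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Assumed slice weight per membership tier, mirroring the static example
tier_weight <- c(Core = 4, Platinum = 3, Gold = 2, Silver = 1)

# Look up each member's weight by tier, then label slices by member name
slices <- tier_weight[members$tier]
names(slices) <- members$member

pie(slices, col = rainbow(length(slices)))
```

Because the slices are derived from the rows of `members`, adding or removing a member in the data source changes the chart without touching the plotting code.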
R and Microsoft
Upon acquiring Revolution Analytics, Microsoft made a strong commitment to the R community. Joseph Sirosh, Corporate Vice President, Information Management & Machine Learning at Microsoft, wrote on the Machine Learning blog that the company is “investing in enabling more customers to use advanced analytics within Microsoft data platforms on-premises, in hybrid cloud environments and on Microsoft Azure. It’s been just over two months since Microsoft acquired Revolution Analytics and together, we are realizing the vision of empowering enterprises, R developers and data scientists to more easily and cost effectively build applications and analytics solutions at scale. Since the acquisition, we announced that R will be shipped in SQL Server 2016.” As a Platinum member of the R Consortium, Microsoft has expressed an interest in integrating R into other Microsoft products in the future. David Smith at Revolution Analytics maintains a blog that covers recent activities.
R and You
If you work as a data scientist, use Big Data platforms, or want to be ready for SQL Server 2016 and Revolution R offerings, there are a number of training resources available. Some of the training options include:
Have you been using R? If so, what are some of the features you consider the most powerful? If you have not yet used R, are you ready to take the leap? Let me know if you have any questions.