Data Science 101: SparkR – Interactive R Programs at Scale

As interest in Apache Spark continues to accelerate, the SparkR project has been attracting increasing attention as well. The industry website Inside Big Data recently published an article on SparkR. The article includes a video by AMPLab’s own Shivaram Venkataraman.

SparkR began as a class project by several students in our graduate-level computing systems course. An early version was released on the AMPLab GitHub repo and was quickly picked up by a number of users and developers. Clearly, there is demand for a scalable implementation of the popular R statistical language. The article cited above also points out that combining “R” with Spark’s “RDD”s leads to “R2D2”, a naming opportunity that we in the lab somehow missed!