As interest in Apache Spark continues to accelerate, the SparkR project has been attracting increasing attention as well. The industry website Inside Big Data recently published an article on SparkR, which includes a video by AMPLab's own Shivaram Venkataraman.
SparkR began as a class project by several students in our graduate-level computing systems course. An early version was released on the AMPLab GitHub repo and was very quickly picked up by a number of users and developers. Clearly there is demand for a scalable implementation of the popular R statistical language. The article cited above also points out that combining "R" with Spark's "RDD"s leads to "R2D2", a naming opportunity that we in the lab somehow missed!