CACM November 2016: “Apache Spark: A Unified Engine for Big Data Processing”

The November 2016 issue of CACM features an overview paper on Apache Spark written by Spark contributors from AMPLab and Databricks. The paper, “Apache Spark: A Unified Engine for Big Data Processing” by Matei Zaharia, Reynold S. Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph Gonzalez, Scott Shenker, and Ion Stoica, describes the history of Spark and the main ideas behind it, and gives an overview of the basic RDD model, higher-level programming interfaces, and some Spark applications.

Although Spark is not the issue's cover story (see below), the paper comes with a bonus video by Dr. Matei Zaharia.

For this issue, CACM opted for a “COSMO-style” cover, either as a (reasonable) attempt at humor or an (unrealistic) attempt to increase sales at supermarket checkout lines. The cover article, co-authored by Berkeley colleague Christos Papadimitriou, is also a good read.