Apache Spark and Java 8: The Big Data Team for 2015

AMPLab was an early adopter of the Scala language for serious systems work. This was a risky bet at the time, for two reasons. First, compared to other popular languages such as Java and C++, Scala had a fairly small following, and its functional style of programming was not really in the mainstream for systems programmers. Second, since Scala runs on top of the JVM, there were concerns about performance.

Roll the clock forward to 2015, and many of the performance concerns about JVM-based languages have been mitigated. For example, a 100 TB sort world record was recently set by a Spark-based system running on the public cloud. Furthermore, a number of the key functional programming concepts from Scala have been added to the Java language in the Java 8 release.
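To give a rough sense of what those functional additions look like in practice, here is a minimal sketch of a word count written with Java 8 lambdas and the Stream API. It uses only the standard library, not Spark itself; the class name WordCount, the method countWords, and the sample input are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class; not part of Spark or BDAS.
final class WordCount {

    // Splits each line on whitespace, then counts how often each word occurs.
    // A TreeMap is used so the result is sorted by word.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                // Lambda: turn each line into a stream of its words.
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // Lambda: drop empty tokens produced by leading whitespace.
                .filter(word -> !word.isEmpty())
                // Group identical words and count each group.
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Illustrative input only.
        List<String> lines = Arrays.asList("spark and java", "java and scala");
        countWords(lines).forEach((word, count) ->
                System.out.println(word + ": " + count));
    }
}
```

The chained flatMap, filter, and grouping steps will look familiar to anyone who has used Scala collections, which is the kind of overlap between the two languages described above.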

These developments have proven to be quite beneficial to AMPLab’s Berkeley Data Analytics Stack (BDAS) and its community of users, and should lead the way to even wider adoption in the future. A recent article on the Datanami Big Data tech blog describes this evolution and predicts that the combination of Apache Spark (the core of BDAS) and Java 8 will become even more prevalent in the coming year.