Reynold Xin, a Shark and Spark SQL developer and sort world record holder, was recently interviewed by Datanami. In the interview, Xin discusses plans for fundamental changes to Spark's core execution engine, which he and fellow AMPLab alum Josh Rosen described on the Databricks blog. A recent performance study from the AMPLab and practical experience from Spark users both indicate that in many Spark deployments, CPU (not network or storage) is increasingly the limiting factor in performance, particularly in some important machine learning workloads.
Project Tungsten, as the effort is called, consists of a set of low-level enhancements to Spark, including off-heap memory management, cache-aware data structures, code generation, and better use of modern processor architecture features and GPUs. The Datanami article can be found here.