Back in June, Patrick Wendell posted a first set of results in a “Big Data Benchmark” for large-scale query engines. A lot has happened in the space since then, so we have updated those results by re-running the tests on the latest versions of the previously tested systems (Redshift, Impala, Spark, and Hive) and adding numbers for the Tez (Stinger) system.
While all the systems examined in the previous iteration of the benchmark show some performance improvement, the gains are generally modest for the simple queries used in this benchmark. The results for Tez/Stinger (which is currently in a “preview” release) show a roughly 2x performance improvement over the Hadoop MapReduce-based version of Hive for the benchmark queries.
The results can be found at the AMPLab Big Data Benchmark Page, along with some details of how the experiments were run, pointers and code for running the benchmarks yourself, and the various disclaimers and provisos that should accompany all performance benchmarking efforts. That page also links to the previous results for comparison.