We Don’t Know Enough to Make a Big Data Benchmark Suite – An Academia-Industry View

This position paper draws on an unprecedented empirical analysis of seven production workloads of MapReduce, an important class of big data systems. The main lesson is that we know very little about real-life use cases of big data systems. Without real-life empirical insights, both vendors and customers often hold incorrect assumptions about their own workloads. Scientifically speaking, we are not yet ready to declare anything worthy of the label “big data benchmark.” Nonetheless, we should encourage further measurement, exploration, and development of stopgap tools.

Author: Yanpei Chen
Publication Date: May 2012
Conference: Workshop on Big Data Benchmarking
Download PDF: mainWBDB2012