This position paper draws on an unprecedented empirical analysis of seven production workloads of MapReduce, an important class of big data systems. The main lesson we learned is that we know very little about real-life use cases of big data systems. Without real-life empirical insights, both vendors and customers often hold incorrect assumptions about their own workloads. Scientifically speaking, we are not yet ready to declare anything worthy of the label “big data benchmark.” Nonetheless, we should encourage further measurement, exploration, and development of stopgap tools.
National Science Foundation
Expeditions in Computing