Cluster computing applications like MapReduce and Dryad transfer massive amounts of data between their computation stages. These transfers can have a significant impact on job performance, accounting for more than 50% of job completion times. Despite this impact, there has been relatively little work on optimizing the performance of these data transfers, with networking researchers traditionally focusing on per-flow traffic management. We address this limitation by proposing a global management architecture and a set of algorithms that (1) improve the transfer times of common communication patterns, such as broadcast and shuffle, and (2) allow scheduling policies at the transfer level, such as prioritizing a transfer over other transfers. Using a prototype implementation, we show that our solution improves broadcast completion times by up to 4.5× compared to the status quo in Hadoop. We also show that transfer-level scheduling can reduce the completion time of high-priority transfers by 1.7×.
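The transfer-level scheduling idea above can be illustrated with a minimal sketch: instead of letting individual flows compete, a controller allocates shared link bandwidth to whole transfers by priority. This is an illustrative toy, not the paper's implementation; the `Transfer` class, `allocate_bandwidth` function, and the strict-priority-with-proportional-tie-breaking policy are assumptions made for the example.

```python
# Toy sketch of transfer-level (rather than per-flow) scheduling on one
# shared link of fixed capacity. All names here are illustrative.
from dataclasses import dataclass

@dataclass
class Transfer:
    name: str
    priority: int    # lower value = higher priority
    demand: float    # bandwidth the transfer could use (e.g., Gbps)

def allocate_bandwidth(transfers, capacity):
    """Strict-priority allocation: higher-priority transfers are served
    first; transfers at the same level split what remains in proportion
    to their demand."""
    alloc = {t.name: 0.0 for t in transfers}
    remaining = capacity
    # Walk priority levels from most to least important.
    for prio in sorted({t.priority for t in transfers}):
        group = [t for t in transfers if t.priority == prio]
        total_demand = sum(t.demand for t in group)
        if total_demand <= remaining:
            # Everyone at this level gets full demand.
            for t in group:
                alloc[t.name] = t.demand
            remaining -= total_demand
        else:
            # Split the leftover capacity proportionally to demand.
            for t in group:
                alloc[t.name] = remaining * t.demand / total_demand
            remaining = 0.0
    return alloc

# A high-priority shuffle gets its full demand; a lower-priority
# broadcast receives only the leftover capacity.
alloc = allocate_bandwidth(
    [Transfer("shuffle-A", 0, 8.0), Transfer("broadcast-B", 1, 6.0)],
    capacity=10.0)
# alloc["shuffle-A"] == 8.0, alloc["broadcast-B"] == 2.0
```

Under this policy, finishing the high-priority transfer sooner comes directly at the expense of lower-priority ones, which is the kind of trade-off a per-flow mechanism cannot express.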