BlowFish: Dynamic Storage-Performance Tradeoff in Data Stores

This paper presents BlowFish, a distributed data store that admits a smooth tradeoff between storage and performance for point queries. What makes BlowFish unique is its ability to navigate along this tradeoff curve efficiently at fine-grained time scales with low computational overhead.

Achieving a smooth and dynamic storage-performance tradeoff enables a wide range of applications. We apply BlowFish to several such applications from real-world production clusters: (i) as a data recovery mechanism during failures: in practice, BlowFish requires 5.4x lower bandwidth and 2.5x lower repair time compared to state-of-the-art erasure codes, while reducing the storage cost of replication from 3x to 1.9x; and (ii) data stores with spatially-skewed and time-varying workloads (e.g., due to object popularity and/or transient failures): we show that navigating the storage-performance tradeoff achieves higher system-wide utility (e.g., throughput) than selectively caching hot objects.