Time-Evolving Graph Processing at Scale

Time-evolving graph-structured big data arises naturally in many application domains such as social networks and communication networks. However, existing graph processing systems lack support for efficient computations on dynamic graphs.

In this paper, we represent most computations on time evolving graphs into (1) a stream of consistent and resilient graph snapshots, and (2) a small set of operators that manipulate such streams of snapshots. We then introduce GraphTau, a time-evolving graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphTau quickly builds fault-tolerant graph snapshots as each small batch of new data arrives. GraphTau achieves high performance and fault tolerant graph stream processing via a number of optimizations. GraphTau also unifies data streaming and graph streaming processing. Our preliminary evaluations on two representative datasets show promising results. Besides performance benefit, GraphTau API relieves programmers from handling graph snapshot generation, windowing operators and sophisticated differential computation mechanisms.