The guest speaker is Joseph Gonzalez from Carnegie Mellon University (CMU).

While high-level data-parallel frameworks like MapReduce (Hadoop) dramatically simplify the design of large-scale data processing systems, they do not naturally express the asynchronous, iterative graph computation found in many advanced machine learning techniques. Recent frameworks like Pregel, Giraph, and Pegasus have begun to simplify the design and implementation of large-scale graph algorithms by implementing the classic Bulk Synchronous Parallel (BSP) computational model. However, none of these frameworks target the more expressive asynchronous computational model needed for a wide variety of popular graph-structured machine learning algorithms. To fill this critical void, we developed the GraphLab framework, which naturally expresses asynchronous graph computation while ensuring data consistency and achieving a high degree of parallel performance.
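To make the asynchronous model concrete: GraphLab programs are written as update functions over a vertex's neighborhood, driven by a dynamic scheduler rather than global synchronous rounds. The following is a minimal single-threaded sketch of that style; the toy graph, the PageRank-like update, and the queue-based scheduler are illustrative choices of mine, not GraphLab's actual API.

```python
from collections import deque

# Toy directed graph (hypothetical): vertex -> list of out-neighbors.
graph = {0: [1, 2], 1: [2], 2: [0]}
in_nbrs = {v: [u for u in graph if v in graph[u]] for v in graph}

# Vertex data: PageRank-style rank values.
rank = {v: 1.0 for v in graph}
DAMP, TOL = 0.85, 1e-6

def update(v):
    """GraphLab-style update function: read the neighborhood, write the
    vertex, and return which neighbors to reschedule if the value moved."""
    new = (1 - DAMP) + DAMP * sum(rank[u] / len(graph[u]) for u in in_nbrs[v])
    changed = abs(new - rank[v]) > TOL
    rank[v] = new
    return graph[v] if changed else []

# Asynchronous execution: a scheduler drains a work queue, applying updates
# in place so later updates immediately see earlier results.
queue, queued = deque(graph), set(graph)
while queue:
    v = queue.popleft()
    queued.discard(v)
    for w in update(v):
        if w not in queued:
            queue.append(w)
            queued.add(w)
```

Because only vertices whose neighborhoods actually changed are rescheduled, work concentrates where the computation has not yet converged, which is the behavior BSP rounds cannot express.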

In this talk I will demonstrate how the MapReduce and Bulk Synchronous Parallel models of computation can lead to highly inefficient parallel learning systems. I will then introduce the GraphLab framework and explain how it addresses these critical limitations while retaining the advantages of a high-level abstraction. I will show how the GraphLab abstraction can be used to build efficient, provably correct versions of several popular sequential machine learning algorithms. Finally, I will present scaling results in both the multi-core and cloud settings.
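The inefficiency claim can be illustrated on a toy fixed-point problem: under a BSP-style schedule every update reads only the previous round's values, while an asynchronous (in-place) schedule lets information propagate along the graph within a single sweep. The chain recurrence and tolerance below are my own illustrative choices, not examples from the talk.

```python
# Fixed-point problem on a chain: x[0] = C, x[i] = C + 0.5 * x[i-1].
N, C, TOL = 50, 1.0, 1e-8

def sync_sweeps():
    """BSP-style iteration: double-buffered, reads last round's values."""
    x = [0.0] * N
    sweeps = 0
    while True:
        new = [C] + [C + 0.5 * x[i - 1] for i in range(1, N)]
        sweeps += 1
        if max(abs(a - b) for a, b in zip(new, x)) < TOL:
            return sweeps
        x = new

def async_sweeps():
    """Asynchronous iteration: in-place updates within each sweep."""
    x = [0.0] * N
    sweeps = 0
    while True:
        delta = 0.0
        for i in range(N):
            new = C + (0.5 * x[i - 1] if i else 0.0)
            delta = max(delta, abs(new - x[i]))
            x[i] = new
        sweeps += 1
        if delta < TOL:
            return sweeps
```

Because the synchronous version propagates information only one hop per round, it needs many more sweeps than the in-place schedule to reach the same tolerance, which is the flavor of inefficiency the talk attributes to MapReduce and BSP for iterative learning.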