Concurrency Control for Machine Learning

Many machine learning (ML) algorithms iteratively transform some global state (e.g., model parameters or variable assignment) giving the illusion of serial dependencies between each operation. However, due to sparsity, exchangeability, and other symmetries, it is often the case that many, but not all, of the state-transforming operations can be computed concurrently while still preserving serializability: the equivalence to some serial execution where individual operations have been reordered.

This opportunity for serializable concurrency forms the foundation of distributed database systems. In this project, we implement updates in ML algorithms as concurrent transactions in a distributed database. As a result, we achieve high scalability while maintaining the semantics and theoretical properties of original serial algorithm.

ML algorithms that have been implemented using concurrency control include non-parametric clustering, correlation clustering, submodular maximization, and sparse convex optimization.

AMP Lab – UC Berkeley

Concurrency Control for Machine Learning

Projects

Tags