KeystoneML is a research project exploring techniques to simplify the construction of large-scale, end-to-end machine learning pipelines.

KeystoneML is designed around the principles of composability and modularity. It provides a rich set of operators, including featurizers for images, text, and speech, as well as general-purpose statistical and signal processing tools, including large-scale linear solvers. The software also includes several example pipelines that reproduce state-of-the-art academic results on public data sets.

KeystoneML is open-source software built on Apache Spark. You can find more information and examples on the project webpage, and contribute to the code on GitHub.


“Building Large Scale Machine Learning Applications with Pipelines,” Spark Summit, 2015

“KeystoneML: Building Large Scale Machine Learning Pipelines on Apache Spark,” Strata NYC, 2015