At the AMPLab, we are constantly looking for ways to improve the performance and user experience of large scale advanced …
SparkNet
Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this …
CoCoA: A Framework for Distributed Optimization
A major challenge in many large-scale machine learning tasks is to solve an optimization objective involving data that is distributed …
KeystoneML
KeystoneML is a research project exploring techniques to simplify the construction of large-scale, end-to-end machine learning pipelines. KeystoneML is designed around …
Splash: Efficient Stochastic Learning on Clusters
Splash is a general framework for parallelizing stochastic learning algorithms (SGD, Gibbs sampling, etc.) on multi-node clusters. It consists of a …
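One common pattern for parallelizing SGD — run it independently on each worker's data shard and then combine the local models — can be sketched in plain Python. This is a toy one-shot model-averaging sketch on a least-squares problem, not Splash's actual API (Splash runs on Spark and uses a more refined reweighted combination); all names and the objective here are illustrative:

```python
import random

def local_sgd(shard, w, lr=0.1, epochs=20):
    """Plain SGD for one-dimensional least squares (y ≈ w * x) on one worker's shard."""
    for _ in range(epochs):
        for x, y in shard:
            grad = 2 * (w * x - y) * x   # gradient of (w*x - y)^2 in w
            w -= lr * grad
    return w

# Toy data: y = 3x plus a little noise, partitioned across 4 "workers".
random.seed(0)
data = [(i / 40, 3 * (i / 40) + random.gauss(0, 0.01)) for i in range(1, 41)]
shards = [data[i::4] for i in range(4)]

# Each worker runs SGD independently (the parallelizable step);
# the local models are then combined by simple averaging,
# landing near the true slope 3.
w = sum(local_sgd(shard, w=0.0) for shard in shards) / len(shards)
```

In a real cluster the per-shard calls would run on separate machines, with only the scalar (or vector) models communicated back for averaging.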
GraphX: Large-Scale Graph Analytics
Increasingly, data-science applications require the creation, manipulation, and analysis of large graphs ranging from social networks to language …
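The vertex-centric, message-passing style of graph computation that systems like GraphX support can be illustrated with a miniature PageRank. GraphX itself exposes a Scala/Spark API (including a Pregel operator); this toy graph and plain-Python loop are purely illustrative:

```python
# Toy PageRank in the message-passing style: each vertex sends
# rank / out_degree along its out-edges, then updates its own rank.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = {v: 1.0 for v in graph}

for _ in range(30):
    msgs = {v: 0.0 for v in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            msgs[dst] += rank[src] / len(dsts)
    # Standard damped update with damping factor 0.85.
    rank = {v: 0.15 + 0.85 * msgs[v] for v in graph}

# "c" receives the most links, so it ends with the highest rank;
# "d" receives none, so it ends with the lowest.
```

Vertex-centric frameworks distribute exactly this loop: vertices and their messages are partitioned across machines, and each iteration becomes a bulk-synchronous superstep.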
Concurrency Control for Machine Learning
Many machine learning (ML) algorithms iteratively transform some global state (e.g., model parameters or variable assignments), giving the illusion of …
MLbase: Distributed Machine Learning Made Easy
Implementing and consuming Machine Learning techniques at scale are difficult tasks for ML Developers and End Users. MLbase is a platform …
BLB: Bootstrapping Big Data
The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving very large …
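The classic bootstrap the excerpt describes fits in a few lines of plain Python. This is an illustrative sketch only — BLB's contribution is precisely avoiding the full-size resamples this naive version creates by working on many small subsamples instead:

```python
import random
import statistics

random.seed(0)
data = [random.gauss(10, 2) for _ in range(200)]

def bootstrap_se(data, estimator, n_boot=500):
    """Classic bootstrap: resample with replacement, recompute the estimator,
    and use the spread of the replicates to assess its quality."""
    reps = []
    for _ in range(n_boot):
        sample = [random.choice(data) for _ in data]
        reps.append(estimator(sample))
    return statistics.stdev(reps)

# Bootstrap standard error of the sample mean; analytically this is
# roughly sigma / sqrt(n) = 2 / sqrt(200) ≈ 0.14 for this data.
se = bootstrap_se(data, statistics.mean)
```

Note that each resample is the same size as the original dataset — cheap here, but prohibitive when the data itself is terabytes, which is the regime BLB targets.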
DFC — Divide-and-Conquer Matrix Factorization
Divide-Factor-Combine (DFC) is a parallel divide-and-conquer framework for noisy matrix factorization problems, e.g., matrix completion and robust matrix factorization. DFC …
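The divide-factor-combine idea can be illustrated on a toy rank-1 problem in plain Python. This is a minimal sketch under strong assumptions — exact rank-1 data, two column blocks, power iteration as the base factorizer — not the DFC algorithm itself:

```python
def rank1_factor(block):
    """Power iteration for the leading singular pair of a small block (list of rows)."""
    m, n = len(block), len(block[0])
    v = [1.0] * n
    for _ in range(50):
        u = [sum(block[i][j] * v[j] for j in range(n)) for i in range(m)]
        norm_u = sum(x * x for x in u) ** 0.5
        u = [x / norm_u for x in u]
        v = [sum(block[i][j] * u[i] for i in range(m)) for j in range(n)]
    return u, v  # block ≈ outer(u, v)

# Divide: split a rank-1 matrix M = outer(a, b) column-wise into two sub-problems.
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0, 7.0]
M = [[ai * bj for bj in b] for ai in a]
blocks = [[row[:2] for row in M], [row[2:] for row in M]]

# Factor: each block is factored independently (the parallelizable step).
u0, v0 = rank1_factor(blocks[0])
u1, v1 = rank1_factor(blocks[1])

# Combine: sign-align the second block's left factor with the first,
# then concatenate the column factors to rebuild the full matrix.
s = 1.0 if sum(x * y for x, y in zip(u0, u1)) > 0 else -1.0
v_full = v0 + [s * x for x in v1]
M_hat = [[u0[i] * v_full[j] for j in range(len(v_full))] for i in range(len(u0))]
```

Because every block of a rank-1 matrix shares the same column space, the independently computed factors agree up to sign, and the combined reconstruction matches M; DFC's analysis extends this intuition to noisy, higher-rank settings.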