Trip Report from the NIPS Big Learning Workshop


A few weeks ago, I went to the Big Learning workshop at NIPS, held in Spain. The workshop brought together researchers in large-scale machine learning, an area near and dear to the AMP Lab's goal of integrating Algorithms, Machines, and People to tame big data, and it featured a lot of interesting work. There were about ten invited talks and ten paper presentations. I myself gave an invited talk on Spark, our framework for large-scale parallel computing; the talk won a runner-up best presentation award.

The topics presented ranged from FPGAs that accelerate vision algorithms in embedded devices, to GPU programming, to cloud computing on commodity clusters. For me, some highlights included the talk on training the Kinect pose recognition algorithm with DryadLINQ, a job that ran on several thousand cores and had to overcome substantial fault-tolerance and I/O challenges; and the GraphLab presentation from CMU, which discussed many interesting applications implemented in their asynchronous programming model. Daniel Whiteson from UC Irvine also gave an extremely entertaining talk on the role of machine learning in the search for new subatomic particles.
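For readers unfamiliar with GraphLab's model, the core idea is that computation is expressed as update functions on individual vertices, and a scheduler runs those updates asynchronously, re-running them only where the graph is still changing. Below is a single-threaded sketch of that dynamic-scheduling idea in Scala, using PageRank as the update function. This is just an illustration of the concept, not GraphLab's actual API; the graph representation, tolerance, and damping factor here are my own choices.

```scala
import scala.collection.mutable

// Illustration only (not GraphLab's API): vertex-centric updates driven by
// a dynamic scheduler, in the spirit of GraphLab's asynchronous engine.
// Each update recomputes one vertex's PageRank from its in-neighbors and
// re-schedules its out-neighbors only if the value changed noticeably.
object VertexScheduler {
  def pagerank(outEdges: Map[Int, Seq[Int]],
               damping: Double = 0.85, tol: Double = 1e-6): Map[Int, Double] = {
    val rank = mutable.Map(outEdges.keys.map(_ -> 1.0).toSeq: _*)
    // Invert the edge map once so each update can read its in-neighbors.
    val inEdges = outEdges.toSeq
      .flatMap { case (u, vs) => vs.map(v => v -> u) }
      .groupBy(_._1)
      .map { case (v, pairs) => v -> pairs.map(_._2) }
    val queue  = mutable.Queue(outEdges.keys.toSeq: _*)
    val queued = mutable.Set(outEdges.keys.toSeq: _*)

    while (queue.nonEmpty) {
      val v = queue.dequeue(); queued -= v
      val incoming = inEdges.getOrElse(v, Seq.empty)
      val updated = (1 - damping) +
        damping * incoming.map(u => rank(u) / outEdges(u).size).sum
      if (math.abs(updated - rank(v)) > tol) {
        rank(v) = updated
        // This vertex changed, so its out-neighbors' inputs are now stale:
        // put them back on the schedule.
        for (w <- outEdges(v) if !queued(w)) { queue.enqueue(w); queued += w }
      }
    }
    rank.toMap
  }
}
```

GraphLab's actual engine runs this loop with many workers pulling from the scheduler concurrently, which is where the consistency models discussed in the talk come in.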

One of the groups I was happy to see represented was the Scala programming language team from EPFL. Scala features prominently as a high-level language for parallel computing. We use it in our lab in the Spark programming framework, as well as in SCADS, our scalable key-value store. It's also used heavily in the Pervasive Parallelism Lab at a certain school across the bay. It was good to hear that the Scala team is working on new features that will make the language easier to use as a DSL for parallel computing, making it simpler to build highly expressive programming tools such as Spark in Scala.
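To give a flavor of what Scala buys here, the following is a minimal log-mining sketch against Spark's Scala API. The input path is a placeholder, and I am using the package names from the open-source Apache Spark releases:

```scala
import org.apache.spark.SparkContext

// A minimal interactive-style Spark job: load a log file as a distributed
// collection of lines, cache the error lines in cluster memory, and run
// several queries over them without re-reading the input.
object LogMiner {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "log-miner")
    val lines  = sc.textFile("hdfs://namenode/path/to/logs") // placeholder path
    val errors = lines.filter(_.contains("ERROR")).cache()
    println("errors:   " + errors.count())
    println("timeouts: " + errors.filter(_.contains("timeout")).count())
    sc.stop()
  }
}
```

The closures passed to filter are ordinary Scala functions, which is exactly the kind of expressiveness that the Scala team's DSL-oriented features aim to strengthen.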

The AMP Lab was also represented by John Duchi, who presented a new algorithm for stochastic gradient descent on non-smooth problems, the first parallelizable approach for this class of problems, and by Ariel Kleiner and Ameet Talwalkar, who presented the Bag of Little Bootstraps, a scalable bootstrap algorithm based on subsampling. It's certainly neat to see two successes in parallelizing very disparate statistical algorithms just one year into the AMP Lab.
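To make the subsampling idea concrete, here is a small sketch of the Bag of Little Bootstraps estimating the standard error of a sample mean. The parameter choices (a subsample size of n^0.6, 20 subsamples, 100 resamples each) are illustrative assumptions on my part rather than the paper's prescriptions:

```scala
import scala.util.Random

// A rough sketch of the Bag of Little Bootstraps for the standard error
// of the sample mean. s, r, and the subsample exponent 0.6 are
// illustrative choices, not prescriptions from the paper.
object BagOfLittleBootstraps {
  private val rng = new Random(42)

  private def mean(xs: Array[Double]) = xs.sum / xs.length

  private def stddev(xs: Array[Double]) = {
    val m = mean(xs)
    math.sqrt(xs.map(x => (x - m) * (x - m)).sum / (xs.length - 1))
  }

  def stdErrOfMean(data: Array[Double], s: Int = 20, r: Int = 100): Double = {
    val n = data.length
    val b = math.pow(n, 0.6).toInt // each subsample is much smaller than n
    val assessments = (1 to s).map { _ =>
      // Draw a subsample of b points without replacement.
      val sub = rng.shuffle(data.toSeq).take(b).toArray
      // Each resample behaves like n draws from the subsample. (A real
      // implementation draws multinomial weights instead, so the
      // estimator only ever touches the b distinct points.)
      val estimates = Array.fill(r) {
        mean(Array.fill(n)(sub(rng.nextInt(b))))
      }
      stddev(estimates) // per-subsample quality assessment
    }
    assessments.sum / s // average the assessments across subsamples
  }
}
```

Because each subsample contains only b distinct points, the s outer iterations are embarrassingly parallel and each worker holds far less than the full dataset, which is what makes the method scale.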

In summary, the workshop showcased a very diverse set of ideas and made it clear that big learning is a hot field; it was the biggest workshop at NIPS this year. In the future, as users gain experience with the various programming models and the best algorithms for each problem type are found, we expect to see some consolidation of these ideas into unified stacks of composable tools. Designing and building such a stack is one of the main goals of our lab.