MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

The classical mixture of Gaussians model is related to K-means via small-variance asymptotics: as the covariances of the Gaussians tend to zero, the negative log-likelihood of the mixture of Gaussians model approaches the K-means objective, and the EM algorithm approaches the K-means algorithm. Kulis & Jordan (2012) used this observation to obtain a novel K-means-like algorithm from a Gibbs sampler for the Dirichlet process (DP) mixture. We instead consider applying small-variance asymptotics directly to the posterior in Bayesian nonparametric models. This framework is independent of any specific Bayesian inference algorithm, and it has the major advantage that it generalizes immediately to a range of models beyond the DP mixture. To illustrate, we apply our framework to the feature learning setting, where the beta process and Indian buffet process provide an appropriate Bayesian nonparametric prior. We obtain a novel objective function that goes beyond clustering to learn (and penalize new) groupings for which we relax the mutual exclusivity and exhaustivity assumptions of clustering. We demonstrate several other algorithms, all of which are scalable and simple to implement. Empirical results demonstrate the benefits of the new framework.
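
The small-variance limit described above can be checked numerically on a toy case. The following is a minimal sketch, not the derivation used in the paper: it assumes an equal-weight mixture with a shared spherical covariance sigma2 * I, drops the Gaussian normalizing constants, and uses hypothetical helper names (`responsibilities`, `scaled_neg_log_lik`). Under those assumptions, the E-step responsibilities should harden into a nearest-mean assignment, and the negative log-likelihood scaled by 2 * sigma2 should approach the squared distance to the closest mean, which is the per-point K-means cost.

```python
import math


def squared_distances(x, means):
    """Squared Euclidean distance from point x to each mean."""
    return [sum((xi - mi) ** 2 for xi, mi in zip(x, m)) for m in means]


def responsibilities(x, means, sigma2):
    """E-step posteriors for an equal-weight mixture with covariance sigma2 * I.

    Assumption: all components share the same weight and the same spherical
    covariance, so the normalizing constants cancel.
    """
    sq = squared_distances(x, means)
    d0 = min(sq)  # subtract the smallest distance for numerical stability
    weights = [math.exp(-(d - d0) / (2.0 * sigma2)) for d in sq]
    total = sum(weights)
    return [w / total for w in weights]


def scaled_neg_log_lik(x, means, sigma2):
    """-2 * sigma2 * log sum_k exp(-||x - mu_k||^2 / (2 * sigma2)).

    Normalizing constants and mixture weights are dropped (an assumption of
    this sketch); they do not affect the limit of the scaled quantity.
    """
    sq = squared_distances(x, means)
    d0 = min(sq)
    log_sum = -d0 / (2.0 * sigma2) + math.log(
        sum(math.exp(-(d - d0) / (2.0 * sigma2)) for d in sq)
    )
    return -2.0 * sigma2 * log_sum


means = [(0.0, 0.0), (3.0, 0.0)]  # two illustrative cluster centers
x = (1.0, 0.0)                    # closer to the first center

for sigma2 in (10.0, 1.0, 0.01):
    r = responsibilities(x, means, sigma2)
    cost = scaled_neg_log_lik(x, means, sigma2)
    print(f"sigma2={sigma2:6.2f}  responsibilities={r}  scaled NLL={cost:.4f}")

print("K-means cost for x:", min(squared_distances(x, means)))
```

With a large variance the point is shared almost evenly between the two components; as sigma2 shrinks, the first responsibility tends to one and the scaled negative log-likelihood tends to the squared distance to the nearest center.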