Bayesian Bias Mitigation for Crowdsourcing

Biased labelers are a systemic problem in crowdsourcing, and a
comprehensive toolbox for handling their responses is still being
developed. A typical crowdsourcing application can be divided into
three steps: data collection, data curation and learning. At present
these steps are often treated separately. We present Bayesian Bias
Mitigation for Crowdsourcing (BBMC), a Bayesian model that unifies all
three. Our model describes each labeler as being influenced by a
number of shared factors. This captures bias in a population of
labelers more flexibly than present methods and allows us to combine
data curation and learning into a single computation. Active learning
integrates data collection into learning but is generally considered
infeasible for Gibbs sampling inference. We propose a general
approximation strategy for Markov chains to efficiently quantify the
effect a perturbation has on the stationary distribution and
specialize it to active learning. Experiments show that BBMC outperforms
a number of common heuristics.
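
To make concrete what it means to quantify the effect of a perturbation on a stationary distribution, the sketch below uses the classical first-order perturbation identity for finite ergodic Markov chains. It is not the approximation strategy proposed here; it is only a small illustration under stated assumptions. The transition matrix P, the perturbation E (rows summing to zero), and the helper names stationary and first_order_shift are all hypothetical. With Z = (I - P + 1 pi)^{-1}, the exact relation pi' - pi = pi' E Z gives the first-order estimate pi' ≈ pi + pi E Z.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of an ergodic row-stochastic matrix P."""
    n = P.shape[0]
    # Solve pi (I - P) = 0 together with sum(pi) = 1 by least squares.
    A = np.vstack([(np.eye(n) - P).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def first_order_shift(P, E):
    """First-order estimate of the change in the stationary distribution
    when P is replaced by P + E (each row of E must sum to zero)."""
    n = P.shape[0]
    pi = stationary(P)
    # Fundamental matrix Z = (I - P + 1 pi)^{-1} of the unperturbed chain.
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return pi @ E @ Z

# Hypothetical 3-state chain and a small perturbation of its first row.
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
E = 0.001 * np.array([[-1.0, 1.0, 0.0],
                      [0.0, 0.0, 0.0],
                      [0.0, 0.0, 0.0]])

pi = stationary(P)
shift = first_order_shift(P, E)   # estimate without re-solving the chain
exact = stationary(P + E) - pi    # reference: re-solve the perturbed chain
```

The estimate reuses quantities computed once for the unperturbed chain, so many candidate perturbations can be scored without re-solving for each one. In a sampling setting such as Gibbs sampling, pi and Z are not available in closed form and would have to be approximated, which is where the difficulty lies.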