Large Scale Estimation in Cyberphysical Systems using Streaming Data: a Case Study with Smartphone Traces

Controlling and analyzing cyberphysical and robotic systems is increasingly becoming a Big Data challenge. We study the case of predicting drivers’ travel times in a large urban area from sparse GPS traces. We present a framework that accommodates a wide variety of traffic distributions and distributes the computation across a cluster to achieve low latency. Our framework is built on Discretized Streams, a recently proposed approach to stream processing at scale. We demonstrate the usefulness of Discretized Streams with a novel algorithm for estimating vehicular traffic in urban networks. Our online EM algorithm can estimate traffic on a very large city network (the San Francisco Bay Area) by processing tens of thousands of observations per second, with a latency of a few seconds.