Energy Debugging with Carat Enters Beta

[Figure: the Carat Action List]

Carat is a new research project in the AMP Lab that aims to detect energy bugs—app behavior that is consuming energy unnecessarily—using data collected from a community of mobile devices. Carat provides users with actions they can take to improve battery life (and the expected improvements).

Carat collects usage data on devices (we care about privacy), aggregates these data in the cloud, performs a statistical analysis using Spark, and reports the results back to users. In addition to the Action List shown in the figure, the app empowers users to dive into the data, answering questions like “How does my energy use compare to similar devices?” and “What specific information is being sent to the server?”
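To make the pipeline concrete, here is a minimal sketch of the kind of cloud-side aggregation involved, written in plain Python rather than Spark; the sample format and field names are illustrative assumptions, not Carat's actual schema.

```python
from collections import defaultdict

# Hypothetical sample format: each device periodically reports which apps are
# running and the battery discharge rate observed since the previous sample.
# Field names here are illustrative, not Carat's actual schema.
samples = [
    {"device": "d1", "apps": ["AppA", "AppB"], "drain_pct_per_hour": 12.0},
    {"device": "d2", "apps": ["AppB"],         "drain_pct_per_hour": 5.5},
    {"device": "d3", "apps": ["AppA"],         "drain_pct_per_hour": 11.0},
]

# Group discharge-rate observations by app: the first step before any
# statistical comparison of "with app" vs. "without app" behavior.
rates_by_app = defaultdict(list)
for s in samples:
    for app in s["apps"]:
        rates_by_app[app].append(s["drain_pct_per_hour"])

for app, rates in rates_by_app.items():
    print(app, sum(rates) / len(rates))
```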

The key insight of our approach is that we can acquire implicit statistical specifications of what constitutes “normal” energy use under different circumstances. This idea of statistical debugging has been applied to correctness and performance bugs, but this is the first application to energy bugs. The project faces a number of interesting (and sometimes distinguishing) technical challenges, such as accounting for sampling bias, reasoning with noisy and incomplete information, and providing users with an experience that rewards them for participating.
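As a rough illustration of the statistical idea (not Carat's actual analysis), one can compare the discharge rates observed while an app is running against those observed without it, and flag the app when the "with" distribution is shifted toward faster drain by more than sampling error would explain. A minimal sketch, assuming simple normal-approximation confidence intervals and made-up numbers:

```python
import math

def mean_and_ci(rates, z=1.96):
    """Mean discharge rate and a normal-approximation 95% confidence half-width."""
    n = len(rates)
    mean = sum(rates) / n
    var = sum((r - mean) ** 2 for r in rates) / max(n - 1, 1)
    return mean, z * math.sqrt(var / n)

def looks_like_energy_bug(rates_with_app, rates_without_app):
    """Flag the app if its 'with' interval lies entirely above the 'without' interval."""
    m_with, h_with = mean_and_ci(rates_with_app)
    m_without, h_without = mean_and_ci(rates_without_app)
    return (m_with - h_with) > (m_without + h_without)

# Illustrative numbers only (% battery per hour).
with_app = [12.0, 13.5, 11.8, 12.9, 14.1, 12.4]
without_app = [6.1, 5.8, 7.0, 6.4, 5.9, 6.6]
print(looks_like_energy_bug(with_app, without_app))  # True for this toy data
```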

We need your help testing our iOS implementation and gathering some initial data! If you have an iPhone or iPad with iOS 5.0 or later and are willing to give us a few minutes of your time, please click here.

Getting It All from the Crowd

What does a query result mean when the data comes from the crowd? This is one of the fundamental questions raised by CrowdDB, a hybrid human/machine database system developed here in the AMP lab. For example, consider what could be thought of as the simplest query: SELECT * FROM table. If tuples are being provided by the crowd, how do you know when the query is complete? Can you really get them all?

In traditional database systems, query processing is based on the closed-world assumption: all data relevant to a query is assumed to reside in the database. When the data is crowdsourced, we find that this assumption no longer applies; existing information can always be extended further by asking the crowd. However, in our current work we show that it is possible to understand query results in the “open world” by reasoning about query completeness and the cost-benefit tradeoff of acquiring more data.

Consider asking workers on a crowdsourcing platform (e.g., Amazon Mechanical Turk) to provide items in a set (one at a time). As you can imagine, answers arriving from the crowd follow a pattern of diminishing returns: initially there is a high rate of arrival for previously unseen answers, but as the query progresses the arrival rate of new answers begins to taper off. The figure below shows an example of this curve when we asked workers to provide names of the US States.
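The curve itself is easy to compute from the raw answer stream: for each incoming answer, record how many distinct answers have been seen so far. A small sketch (the answer stream is made up for illustration):

```python
def species_accumulation(answers):
    """Return, for each prefix of the answer stream, the number of unique answers seen."""
    seen = set()
    curve = []
    for a in answers:
        seen.add(a.strip().lower())  # simple normalization; real crowd data needs more cleaning
        curve.append(len(seen))
    return curve

stream = ["California", "Texas", "california", "Ohio", "Texas", "Maine"]
print(species_accumulation(stream))  # [1, 2, 2, 3, 3, 4]
```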

[Figure: average species accumulation curve from the US States experiment. Number of unique answers seen vs. total number of answers.]

This behavior is well known in fields such as biology and statistics, where this type of figure is called the Species Accumulation Curve (SAC). The analysis is part of the species estimation problem, whose goal is to estimate the number of distinct species from observations of species in the locale of interest. We apply these techniques in the new context of crowdsourced queries by drawing an analogy between observed species and worker answers from the crowd. It turns out that the estimation algorithms sometimes fail due to crowd-specific behaviors, such as some workers providing many more answers than others (“streakers vs. samplers”). We address this by designing a heuristic that reduces the impact of overzealous workers. We also demonstrate a heuristic to detect when workers are consulting the same list on the web, which is helpful if the system wants to switch to another data-gathering regime such as webpage scraping.
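For a flavor of the estimation machinery, the sketch below uses Chao1, one standard estimator from the species estimation literature (shown purely for illustration; it is not necessarily the variant used in the paper). The estimate is driven by how many answers have been seen exactly once (f1) and exactly twice (f2):

```python
from collections import Counter

def chao1_estimate(answers):
    """Bias-corrected Chao1 estimate of the number of distinct items:
    S_hat = S_obs + f1*(f1 - 1) / (2*(f2 + 1)),
    where f1 and f2 count items observed exactly once and exactly twice."""
    counts = Counter(answers)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Toy answer stream: 7 distinct items observed, estimate suggests a few remain unseen.
stream = ["CA", "TX", "TX", "OH", "ME", "NY", "CA", "WA", "FL", "FL"]
print(round(chao1_estimate(stream), 1))  # 8.5
```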

Species estimation techniques provide a way to reason about query results despite being in the open world. For queries with a bounded result size, we can form a progress estimate as answers arrive by predicting the cardinality of the result set. Of course, some sets are very large or contain items that few workers would think of or find (the long tail), so it does not make sense to try to predict set size. For these cases, we propose a pay-as-you-go approach that directly considers the benefit of asking the crowd for more answers.
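For the bounded case, a progress indicator can be as simple as the ratio of unique answers seen so far to the estimated result-set cardinality. A hedged sketch, again using the illustrative Chao1 estimate rather than the paper's exact method:

```python
from collections import Counter

def progress_estimate(answers):
    """Estimated fraction of a bounded result set collected so far,
    based on a Chao1 cardinality estimate (illustrative choice)."""
    counts = Counter(answers)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    estimated_total = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    return min(s_obs / estimated_total, 1.0) if estimated_total else 0.0

stream = ["CA", "TX", "TX", "OH", "ME", "NY", "CA", "WA", "FL", "FL"]
print(f"{progress_estimate(stream):.0%}")  # ~82% for this toy stream
```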

For more details, please check out the paper.