National Science Foundation
Expeditions in Computing
AMPLab Publications
- KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics
- ActiveClean: Interactive Data Cleaning For Statistical Modeling
- Kira: Processing Astronomy Imagery Using Big Data Technology
- SparkR: Scaling R Programs with Spark
- MLlib: Machine Learning in Apache Spark
- ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning (Demonstration Paper)
- PrivateClean: Data Cleaning and Differential Privacy
- Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
- Embarrassingly Parallel Time Series Analysis for Large Scale Weak Memory Systems
- Clamshell: Scaling Up Crowds for Low Latency Data Labeling
- SampleClean: Fast and Reliable Analytics on Dirty Data
- Scientific Computing Meets Big Data Technology: An Astronomy Use Case
- Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views
- Wisteria: Nurturing Scalable Data Cleaning Infrastructure
- Automating Model Search for Large Scale Machine Learning
- Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity
- Spark SQL: Relational Data Processing in Spark
- Rethinking Data-Intensive Science Using Scalable Analytics Systems
- The missing piece in complex analytics: Low latency, scalable model management and serving with Velox
- Coordination Avoidance in Database Systems
- A Partitioning Framework for Aggressive Data Skipping
- Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning
- The Power of Choice in Data-Aware Cluster Scheduling
- GraphX: Graph Processing in a Distributed Dataflow Framework
- A Methodology for Learning, Analyzing, and Mitigating Social Influence Bias in Recommender Systems
- A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data
- Fine-grained Partitioning for Aggressive Data Skipping
- GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
- PLANET: Making Progress with Commit Processing in Unpredictable Environments
- Scaling the Mobile Millennium System in the Cloud
- MLI: An API for Distributed Machine Learning
- GraphX: A Resilient Distributed Graph System on Spark
- PBS at Work: Advancing Data Management with Consistency Metrics (Demo)
- Leveraging Transitive Relations for Crowdsourced Joins
- Generalized Scale Independence Through Incremental Precomputation
- RTP: Robust Tenant Placement for Elastic In-Memory Database Clusters
- Fast and Interactive Analytics over Hadoop Data with Spark
- MDCC: Multi-Data Center Consistency
- Shark: SQL and Rich Analytics at Scale
- CrowdQ: Crowdsourced Query Understanding
- MLbase: A Distributed Machine-learning System
- Crowdsourced Enumeration Queries (Best Paper Award)
- CrowdER: Crowdsourcing Entity Resolution
- Probabilistically Bounded Staleness for Practical Partial Quorums
- Shark: Fast Data Analysis Using Coarse-grained Distributed Memory (Best Demo Award)
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (Best Paper Award)
- PIQL: Success-Tolerant Query Processing in the Cloud
- Crowdsourcing Applications and Platforms: A Data Management Perspective (Tutorial)
- CrowdDB: Query Processing with the VLDB Crowd (Best Demo Award)
- The SCADS Director: Scaling a Distributed Storage System Under Stringent Performance Requirements
- CrowdDB: Answering Queries with Crowdsourcing
- Hybrid In-Database Inference for Declarative Information Extraction
- Spark: Cluster Computing with Working Sets