Michael Franklin

Director

http://www.cs.berkeley.edu/~franklin/
franklin@cs.berkeley.edu

AMPLab Publications

KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics
ActiveClean: Interactive Data Cleaning For Statistical Modeling
Kira: Processing Astronomy Imagery Using Big Data Technology
SparkR: Scaling R Programs with Spark
MLlib: Machine Learning in Apache Spark
ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning (Demonstration Paper)
PrivateClean: Data Cleaning and Differential Privacy
Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics
Embarrassingly Parallel Time Series Analysis for Large Scale Weak Memory Systems
Clamshell: Scaling Up Crowds for Low Latency Data Labeling
SampleClean: Fast and Reliable Analytics on Dirty Data
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views
Wisteria: Nurturing Scalable Data Cleaning Infrastructure
Automating Model Search for Large Scale Machine Learning
Feral Concurrency Control: An Empirical Investigation of Modern Application Integrity
Spark SQL: Relational Data Processing in Spark
Rethinking Data-Intensive Science Using Scalable Analytics Systems
The Missing Piece in Complex Analytics: Low Latency, Scalable Model Management and Serving with Velox
Coordination Avoidance in Database Systems
A Partitioning Framework for Aggressive Data Skipping
Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning
The Power of Choice in Data-Aware Cluster Scheduling
GraphX: Graph Processing in a Distributed Dataflow Framework
A Methodology for Learning, Analyzing, and Mitigating Social Influence Bias in Recommender Systems
A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data
Fine-grained Partitioning for Aggressive Data Skipping
GraphX: Unifying Data-Parallel and Graph-Parallel Analytics
PLANET: Making Progress with Commit Processing in Unpredictable Environments
Scaling the Mobile Millennium System in the Cloud
MLI: An API for Distributed Machine Learning
GraphX: A Resilient Distributed Graph System on Spark
PBS at Work: Advancing Data Management with Consistency Metrics (Demo)
Leveraging Transitive Relations for Crowdsourced Joins
Generalized Scale Independence Through Incremental Precomputation
RTP: Robust Tenant Placement for Elastic In-Memory Database Clusters
Fast and Interactive Analytics over Hadoop Data with Spark
MDCC: Multi-Data Center Consistency
Shark: SQL and Rich Analytics at Scale
CrowdQ: Crowdsourced Query Understanding
MLbase: A Distributed Machine-learning System
Crowdsourced Enumeration Queries (Best Paper Award)
CrowdER: Crowdsourcing Entity Resolution
Probabilistically Bounded Staleness for Practical Partial Quorums
Shark: Fast Data Analysis Using Coarse-grained Distributed Memory (Best Demo Award)
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (Best Paper Award)
PIQL: Success-Tolerant Query Processing in the Cloud
Crowdsourcing Applications and Platforms: A Data Management Perspective (Tutorial)
CrowdDB: Query Processing with the VLDB Crowd (Best Demo Award)
The SCADS Director: Scaling a Distributed Storage System Under Stringent Performance Requirements
CrowdDB: Answering Queries with Crowdsourcing
Hybrid In-Database Inference for Declarative Information Extraction
Spark: Cluster Computing with Working Sets