AMP Lab – UC Berkeley



Tag Archives: Big Data

Time-Evolving Graph Processing at Scale

Anand Padmanabha Iyer, Li Erran Li, Tathagata Das, Ion Stoica
Graph Data-management Experiences & Systems (GRADES), Jun. 2016.
Tags: Big Data, graph, graph analytics, graph processing

CoCoA: A Framework for Distributed Optimization

A major challenge in many large-scale machine learning tasks is to solve an optimization objective involving data that is distributed …

Tags: Big Data, cocoa, distributed machine learning, Optimization

Succinct on Apache Spark: Queries on Compressed RDDs

Posted on November 5, 2015 by Rachit Agarwal

tl;dr Succinct is a distributed data store that supports a wide range of point queries (e.g., search, count, range, random …

Tags: amp, Big Data, compression, query processing, range queries, scalability, search, spark, Succinct

Succinct: Enabling Queries on Compressed Data

Web applications and services today collect, store and analyze an immense amount of data. As data sizes continue to grow, the …

Tags: Big Data, compression, ElasticSearch, key-value store, NoSQL, search, spark, Succinct

KeystoneML

KeystoneML is a research project exploring techniques to simplify the construction of large scale, end-to-end, machine learning pipelines. KeystoneML is designed around …

Tags: Big Data, Declarative ML, distributed machine learning, Machine Learning

Splash: Efficient Stochastic Learning on Clusters

Splash is a general framework for parallelizing stochastic learning algorithms (SGD, Gibbs sampling, etc.) on multi-node clusters. It consists of a …

Tags: Big Data, distributed machine learning, spark, stochastic algorithm

Velox: Models in Action

To support complex data-intensive applications such as personalized recommendations, targeted advertising, and intelligent services, the data management community has focused …

Tags: Big Data, Machine Learning, serving

The Power of Choice in Data-Aware Cluster Scheduling

Shivaram Venkataraman, Aurojit Panda, Ganesh Ananthanarayanan, Michael Franklin, Ion Stoica
OSDI'14, Oct. 2014.
Tags: Approximate Query Processing, Big Data, dataflow, scheduling

GraphX: Graph Processing in a Distributed Dataflow Framework

Joseph Gonzalez, Reynold Xin, Ankur Dave, Dan Crankshaw, Michael Franklin, Ion Stoica
OSDI, Oct. 2014.
Tags: Big Data, dataflow, Graphs, Graphx, query processing, spark

A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data

Jiannan Wang, Sanjay Krishnan, Michael Franklin, Ken Goldberg, Tim Kraska, Tova Milo
SIGMOD, Jun. 2014.
Tags: Big Data, crowdsourcing, Data Cleaning, query processing, Sampling

Fine-grained Partitioning for Aggressive Data Skipping

Liwen Sun, Michael Franklin, Sanjay Krishnan, Reynold Xin
ACM SIGMOD, Jun. 2014.
Tags: algorithms, Big Data, data warehouse, databases, partitioning, physical database design, spark

Alluxio (formerly Tachyon), a Memory Speed Virtual Distributed Storage System

As datasets continue to grow, storage and networking pose the most challenging bottlenecks for many workloads. To address the bottleneck, …

Tags: Alluxio, BDAS, Big Data, hadoop, storage

GraphX: Large-Scale Graph Analytics

Increasingly, data-science applications require the creation, manipulation, and analysis of large graphs ranging from social networks to language …

Tags: Big Data, distributed machine learning, graph analytics, Graphs, social networks, spark

Concurrency Control for Machine Learning

Many machine learning (ML) algorithms iteratively transform some global state (e.g., model parameters or variable assignment) giving the illusion of …

Tags: Big Data, concurrency control, distributed machine learning

A General Bootstrap Performance Diagnostic

Ariel Kleiner, Ameet Talwalkar, Sameer Agarwal, Ion Stoica, Michael Jordan
ACM KDD 2013, Aug. 2013.
Tags: Big Data, BlinkDB, Bootstrap, Error Bars

GraphX: A Resilient Distributed Graph System on Spark

Reynold Xin, Joseph Gonzalez, Michael Franklin, Ion Stoica
GRADES (SIGMOD workshop), Jun. 2013.
Tags: Big Data, graph

Generalized Scale Independence Through Incremental Precomputation

Michael Armbrust, Eric Liang, Tim Kraska, Armando Fox, Michael Franklin, David Patterson
ACM SIGMOD Conference, Jun. 2013.
Tags: Big Data, Materialized Views, PIQL, SCADS, scale independence

Fast and Interactive Analytics over Hadoop Data with Spark

Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica
Usenix ;login:, Aug. 2012.
Tags: BDAS, Big Data, hadoop, spark

Shark: SQL and Rich Analytics at Scale

Reynold Xin, Joshua Rosen, Matei Zaharia, Michael Franklin, Scott Shenker, Ion Stoica
ACM SIGMOD Conference, Jun. 2013.
Tags: Big Data, spark, SQL, Warehouse

MLbase: Distributed Machine Learning Made Easy

Implementing and consuming Machine Learning techniques at scale are difficult tasks for ML Developers and End Users. MLbase is a platform …

Tags: Big Data, distributed machine learning

AMP Camp Follow-up and Preview of What’s Next

Posted on September 21, 2012 by Andy Konwinski

In August, we hosted the first AMP Camp “Big Data bootcamp” and it was a huge success, with a sold-out …

Tags: Big Data

DNA Processing Pipeline

Another effort related to genomics underway at the AMP Lab involves developing a variant calling pipeline. Variant calling is the …

Tags: amp, application, Big Data, genomics

DNA Sequence Alignment with SNAP

As the cost of DNA sequencing continues to drop faster than Moore’s Law, there is a growing need for tools …

Tags: amp, application, Big Data, genomics, snap

Cancer Tumor Genomics: Fighting the Big C with the Big D

It may have been true once that expertise in computer science was needed only by computer scientists. But Big Data …

Tags: amp, application, Big Data, genomics

A Scalable Bootstrap for Massive Data

Ariel Kleiner, Ameet Talwalkar, Purna Sarkar, Michael Jordan
Journal of the Royal Statistical Society, Series B, Dec. 2011.
Tags: Big Data

Shark: Fast Data Analysis Using Coarse-grained Distributed Memory (Best Demo Award)

Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Scott Shenker, Ion Stoica
Demonstration Paper, ACM SIGMOD Conference, May. 2012.
Tags: BDAS, Best Paper Award, Big Data, Shark, spark, SQL

Real Life Datacenter Workloads

How do we ensure that AMP Lab works on important and immediate problems? One of many ways is to look …

Tags: Big Data, Datacenters, dryad, hadoop, storage, workload

Traffic jams, cell phones and big data

Posted on January 18, 2012 by Timothy Hunter

(With contributions from Michael Armbrust, Leah Anderson and Jack Reilly) It is well known that big data processing is becoming …

Tags: Big Data, mobilemillennium, spark

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (Best Paper Award)

Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica
NSDI 2012, Apr. 2012.
Tags: BDAS, Best Paper Award, Big Data, spark, storage

BLB: Bootstrapping Big Data

The bootstrap provides a simple and powerful means of assessing the quality of estimators. However, in settings involving very large …

Tags: Big Data, distributed machine learning

Divide-and-Conquer Matrix Factorization

Lester Mackey, Ameet Talwalkar, Michael Jordan
Neural Information Processing Systems (NIPS), Jan. 2012.
Tags: Big Data, matrix factorization


Copyright © 2021 AMPLab