AMP Lab – UC Berkeley

National Science Foundation
Expeditions in Computing

Tag Archives:

MLlib: Machine Learning in Apache Spark

Xiangrui Meng, Joseph Bradley, Burak Yavuz, Evan Sparks, Shivaram Venkataraman, Davies Liu, Jeremy Freeman, DB Tsai, Manish Amde, Sean Owen, Doris Xin, Reynold Xin, Michael Franklin, Reza Zadeh, Matei Zaharia, Ameet Talwalkar
Journal of Machine Learning Research, 17 (34), Apr. 2016.
Tags: Machine Learning, MLlib, spark

SparkNet

Training deep networks is a time-consuming process, with networks for object recognition often requiring multiple days to train. For this …

Tags: deep learning, distributed machine learning, Machine Learning, spark

SparkNet: Training Deep Networks on Spark

Philipp Moritz, Robert Nishihara, Ion Stoica, Michael Jordan
International Conference on Learning Representations (ICLR), May 2016.
Tags: deep learning, distributed machine learning, Machine Learning, spark

Succinct on Apache Spark: Queries on Compressed RDDs

Posted on November 5, 2015 by Rachit Agarwal

tl;dr Succinct is a distributed data store that supports a wide range of point queries (e.g., search, count, range, random …

Tags: amp, Big Data, compression, query processing, range queries, scalability, search, spark, Succinct

Succinct: Enabling Queries on Compressed Data

Web applications and services today collect, store and analyze an immense amount of data. As data sizes continue to grow, the …

Tags: Big Data, compression, ElasticSearch, key-value store, NoSQL, search, spark, Succinct

Splash: Efficient Stochastic Learning on Clusters

Splash is a general framework for parallelizing stochastic learning algorithms (SGD, Gibbs sampling, etc.) on multi-node clusters. It consists of a …

Tags: Big Data, distributed machine learning, spark, stochastic algorithm

Spark SQL: Relational Data Processing in Spark

Michael Armbrust, Reynold Xin, Yin Huai, Davies Liu, Joseph K. Bradley, Xiangrui Meng, Tomer Kaftan, Michael Franklin, Ali Ghodsi, Matei Zaharia
ACM SIGMOD Conference 2015, May 2015.
Tags: Catalyst, Dataframes, JSON, Optimization, query processing, Shark, spark, SQL

Rethinking Data-Intensive Science Using Scalable Analytics Systems

Frank Austin Nothaft, Matt Massie, Timothy Danford, Zhao Zhang, Uri Laserson, Carl Yeksigian, Jey Kottalam, Michael Franklin, Anthony Joseph, David Patterson
ACM SIGMOD Conference, May 2015.
Tags: genomics, spark

Discretized Streams: Fault-Tolerant Streaming Computation at Scale

Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Ion Stoica, Scott Shenker
SOSP, Nov. 2013.
Tags: D-streams, spark, spark streaming

GraphX: Graph Processing in a Distributed Dataflow Framework

Joseph Gonzalez, Reynold Xin, Ankur Dave, Dan Crankshaw, Michael Franklin, Ion Stoica
OSDI, Oct. 2014.
Tags: Big Data, dataflow, Graphs, Graphx, query processing, spark

Fine-grained Partitioning for Aggressive Data Skipping

Liwen Sun, Michael Franklin, Sanjay Krishnan, Reynold Xin
ACM SIGMOD, Jun. 2014.
Tags: algorithms, Big Data, data warehouse, databases, partitioning, physical database design, spark

GraphX: Large-Scale Graph Analytics

Increasingly, data-science applications require the creation, manipulation, and analysis of large graphs ranging from social networks to language …

Tags: Big Data, distributed machine learning, graph analytics, Graphs, social networks, spark

ADAM: Genomics Formats and Processing Patterns for Cloud Scale Computing

Matt Massie, Frank Austin Nothaft, Chris Hartl, Christos Kozanitis, Andre Schumacher, Anthony Joseph, David Patterson
Dec. 2013.
Tags: genomics, spark

Fast and Interactive Analytics over Hadoop Data with Spark

Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica
USENIX ;login:, Aug. 2012.
Tags: BDAS, Big Data, hadoop, spark

Shark: SQL and Rich Analytics at Scale

Reynold Xin, Joshua Rosen, Matei Zaharia, Michael Franklin, Scott Shenker, Ion Stoica
ACM SIGMOD Conference, Jun. 2013.
Tags: Big Data, spark, SQL, Warehouse

Shark: Fast Data Analysis Using Coarse-grained Distributed Memory (Best Demo Award)

Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Scott Shenker, Ion Stoica
Demonstration Paper, ACM SIGMOD Conference, May 2012.
Tags: BDAS, Best Paper Award, Big Data, Shark, spark, SQL

Traffic jams, cell phones and big data

Posted on January 18, 2012 by Timothy Hunter

(With contributions from Michael Armbrust, Leah Anderson and Jack Reilly) It is well known that big data processing is becoming …

Tags: Big Data, mobilemillennium, spark

Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing (Best Paper Award)

Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Murphy McCauley, Michael Franklin, Scott Shenker, Ion Stoica
NSDI 2012, Apr. 2012.
Tags: BDAS, Best Paper Award, Big Data, spark, storage

Spark – Lightning-Fast Cluster Computing

Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and …

Tags: cluster, hadoop, mesos, spark


Copyright © 2021 AMPLab