BDAS, the Berkeley Data Analytics Stack, is an open source software stack that integrates software components being built by the AMPLab to make sense of Big Data.
BDAS consists of the components shown below. Components shown in Blue or Green are available for download now. Click on a title to go to that project’s homepage.
Layer | Components |
In-house Apps | |
Access and Interfaces | Spark Streaming, SampleClean, G-OLA, SparkR, GraphX, Splash, MLBase, Velox, BlinkDB, MLPipelines, SparkSQL, MLlib |
Processing Engine | Apache Spark (Core) |
Storage | Succinct, Alluxio (formerly Tachyon), HDFS, S3, Ceph |
Resource Virtualization | Apache Mesos, Hadoop YARN |
Legend: AMPLab Initiated | Spark Community | 3rd Party | In Development |
Beyond BDAS, the AMPLab has released other software components useful for processing data:
- AMPCrowd: A RESTful web service for sending tasks to human workers on crowd platforms like Amazon’s Mechanical Turk. Used by the SampleClean project for context-heavy data cleaning tasks.
Roadmap
BDAS will continue to evolve over the life of the AMPLab project, as existing components mature and new ones are added.
Community
- Software project Meetups – Help organize monthly developer meetups around BDAS components. Check out the Spark/Shark meetup group, the Mesos meetup group, and the Alluxio meetup group.
- AMP Camp “Big Data Bootcamp” – Two days packed full of software system intros, demos and hands-on exercises. The camp aims to bring practitioners with no prior experience up to speed, writing real code with advanced algorithms.
- Support – Unlike many research software prototypes that never see production use, we support BDAS software components by actively monitoring and responding on developer and user mailing lists.