Software

BDAS, the Berkeley Data Analytics Stack, is an open source software stack that integrates software components being built by the AMPLab to make sense of Big Data.

BDAS consists of the components shown below. Components shown in blue or green are available for download now. Click on a title to go to that project’s homepage.

In-house Apps: Cancer Genomics, Energy Debugging, Smart Buildings

Access and Interfaces:
    Spark Streaming, SampleClean, G-OLA, SparkR, GraphX, Splash, MLBase, Velox
    BlinkDB, MLPipelines
    SparkSQL, MLlib

Processing Engine: Apache Spark (Core)

Storage:
    Succinct
    Alluxio (formerly Tachyon)
    HDFS, S3, Ceph

Resource Virtualization: Apache Mesos, Hadoop YARN

Legend: AMPLab Initiated, Spark Community, 3rd Party, In Development

Beyond BDAS, the AMPLab has released other software components useful for processing data:

AMPCrowd: A RESTful web service for sending tasks to human workers on crowd platforms like Amazon’s Mechanical Turk. Used by the SampleClean project for context-heavy data cleaning tasks.

Roadmap

BDAS will continue to evolve over the life of the AMPLab project, as existing components mature and new ones are added.

Community

Software project meetups – Help organize monthly developer meetups around BDAS components. Check out the Spark/Shark meetup group, the Mesos meetup group, and the Alluxio meetup group.
AMP Camp “Big Data Bootcamp” – Two days packed with software system intros, demos, and hands-on exercises. It aims to bring practitioners with no prior experience up to speed, writing real code with advanced algorithms.
Support – Unlike many research software prototypes that never see production use, we support BDAS software components by actively monitoring and responding on developer and user mailing lists.