Spark – Lightning-Fast Cluster Computing

Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly, much faster than disk-based systems such as Hadoop MapReduce allow. To make programming faster, Spark integrates with the Scala programming language, letting you manipulate distributed datasets like local collections. You can also use Spark interactively from the Scala interpreter to query big data.
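The in-memory workflow described above can be sketched with Spark's RDD API. This is a minimal sketch that assumes a later Apache Spark release (classes under `org.apache.spark`) is on the classpath. The object `InMemoryQuerySketch`, its `run` helper and the synthetic log lines are hypothetical illustrations, not part of Spark. The sketch builds a dataset once, marks it with `cache()`, and then runs two queries against the in-memory copy using collection-style `filter` and `count` calls.

```scala
// A minimal sketch, assuming an Apache Spark release (package org.apache.spark)
// is on the classpath. It runs in local mode, so no cluster is required.
import org.apache.spark.{SparkConf, SparkContext}

object InMemoryQuerySketch {
  // Hypothetical helper: builds a dataset of log-like lines, caches it,
  // and runs two queries over it. Returns (errorCount, warnCount).
  def run(sc: SparkContext): (Long, Long) = {
    // Synthetic data standing in for a file you would load with sc.textFile(...)
    val lines = sc.parallelize(
      (1 to 1000).map { i =>
        if (i % 10 == 0) s"ERROR $i"
        else if (i % 4 == 0) s"WARN $i"
        else s"INFO $i"
      }
    )
    // cache() marks the RDD to be kept in memory once it is first computed
    val cached = lines.cache()
    // The first action computes the dataset and populates the cache
    val errors = cached.filter(_.startsWith("ERROR")).count()
    // Later queries can reuse the in-memory partitions instead of rebuilding the input
    val warns = cached.filter(_.startsWith("WARN")).count()
    (errors, warns)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("InMemoryQuerySketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (errors, warns) = run(sc)
      println(s"errors=$errors warns=$warns")
    } finally {
      sc.stop()
    }
  }
}
```

In local mode, `main` should print `errors=100 warns=200`. In the interactive `spark-shell`, a `SparkContext` is already available as `sc`, so you can type the body of `run` directly at the prompt.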

More details and downloads can be found on the Spark homepage.

AMP Lab – UC Berkeley
