An article posted recently on the IBM developerWorks website gives a nice overview of the Spark programming model. The article, entitled “Spark, an alternative for fast data analytics” by M. Tim Jones, is available at http://www.ibm.com/developerworks/library/os-spark/. The article summary is below:
Summary: Although Hadoop captures the most attention for distributed data analytics, there are alternatives that provide some interesting advantages to the typical Hadoop platform. Spark is a scalable data analytics platform that incorporates primitives for in-memory computing and therefore exercises some performance advantages over Hadoop’s cluster storage approach. Spark is implemented in and exploits the Scala language, which provides a unique environment for data processing. Get to know the Spark approach for cluster computing and its differences from Hadoop.