The demonstration of AMPLab’s Shark query processing system was given the “Best Demo” Award at the SIGMOD 2012 Conference held in Phoenix this week.
The demo proposal was entitled “Shark: Fast Data Analysis Using Coarse-grained Distributed Memory” by Cliff Engle, Antonio Lupher, Reynold Xin, Matei Zaharia, Michael Franklin, Scott Shenker and Ion Stoica.
At the conference, Messrs Engle, Lupher, Xin, and Zaharia showed Shark and Spark running a k-means clustering algorithm on 100 EC2 nodes over 16 months of Wikipedia logs. The demo included some nice visualizations showing the activity on the cluster nodes and also took the bold approach of showing the top search topic clusters over Wikipedia without censoring them (you can imagine).
The live demonstration showed the scalability and sophistication of Shark as well as its ability to address some of the real limitations of the popular Hive query system.
More info on the demo is at: http://amplab.cs.berkeley.edu/2012/05/20/shark-jumps-big-data-at-sigmod/
This makes for back-to-back Best Demo awards for AMPLab projects at the major database systems research conferences (following the award for CrowdDB at VLDB 2011), an unprecedented accomplishment.
Special congratulations go to Reynold Xin, who was a ringleader for both of these successful efforts.