@bigdata: O’Reilly Strata Blog on Shark Analytics

New post by Ben Lorica on the Strata Big Data Blog: Real-time queries & analytics for Big Data: Shark is 100X faster than Hive for SQL, & 100X faster than Hadoop for machine-learning.

http://strata.oreilly.com/2012/11/shark-real-time-queries-and-analytics-for-big-data.html

Summary

Impala and Shark are interactive SQL systems for Hadoop. A new paper shows Shark offers speedups that are comparable to those observed in MPP databases. In addition to being 100X faster than Hive for SQL, Shark is a framework that is 100X faster than Hadoop for (iterative) machine-learning algorithms.