The ReadWrite Blog recently featured a story on Baidu’s use of AMPLab-born software, Alluxio together with Spark SQL, for running fast SQL queries over petabyte-scale, globally distributed databases. The article is a short interview with Baidu Senior Architect Shaoshan Liu, who describes the deployment. One interesting quote from the article:
“With the system deployed, we measured its performance using a typical Baidu query. Using the original Hive system, it took more than 1,000 seconds to finish a typical query. With the Spark SQL-only system, it took 300 seconds. But using our new Alluxio and Spark SQL system, it took about 10 seconds. We achieved a 100-fold increase in speed […]”
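To put the quoted numbers side by side, here is a minimal sketch of the speedup arithmetic. It treats the rounded figures from the quote as exact ("more than 1,000 seconds" becomes 1000, "about 10 seconds" becomes 10), and the `speedup` helper and dictionary labels are names made up for this illustration.

```python
# Query times from the quote, in seconds. These are the rounded figures
# given in the interview, treated as exact for the sake of the arithmetic.
query_seconds = {
    "Hive": 1000,
    "Spark SQL only": 300,
    "Alluxio + Spark SQL": 10,
}


def speedup(baseline_seconds, new_seconds):
    """Return how many times faster the new system is than the baseline."""
    return baseline_seconds / new_seconds


combined = query_seconds["Alluxio + Spark SQL"]

# Compare the combined system against each of the two earlier setups.
for baseline in ("Hive", "Spark SQL only"):
    factor = speedup(query_seconds[baseline], combined)
    print(f"Alluxio + Spark SQL vs. {baseline}: {factor:.0f}x faster")
```

With these inputs, the 100-fold figure is the ratio against the original Hive system; against the Spark SQL-only system the same arithmetic gives a 30-fold improvement.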