An AMP Blab about some recent system conferences – Part 3: Hadoop World 2011


I recently had the pleasure of visiting Portugal for SOSP/SOCC, and New York for Hadoop World. Below are some bits that I found interesting. This is the personal opinion of an AMP Lab grad student – in no way does it represent any official or unanimous AMP Lab position.

Part 3: Hadoop World 2011

Not exactly a research conference, Hadoop World is a multi-track industry convention hosted by Cloudera, an enterprise Hadoop vendor, and draws various companies with some stake in the Hadoop community. This year’s Hadoop World saw some 1500 attendees, including Hadoop vendors, Hadoop users, executives from various companies, vendors building on top of Hadoop, people looking to learn more about Hadoop, and of course, a small contingent of researchers. I believe Hadoop World is a good place for researchers to get a state-of-the-industry view of the big data, big systems space.

One theme is that Hadoop has really become “mainstream” and moved well beyond its initial use cases in supporting e-commerce-type services. The convention agenda included talks from household names beyond typical high-tech industries. The talks also had audiences in ripped jeans and flip-flops sitting next to others in pressed three-piece suits, indicating the present diversity of the community, and perhaps pointing to opportunities for multi-disciplinary collaboration in the near future.

Accel Partners announced a $100M “Big data fund” to accelerate innovation in all layers of the “big data stack”. This should be of interest to entrepreneurial-minded students within the Lab.

Another theme is that Hadoop is still waiting for a “killer app”. One keynote speaker dubbed 2012 “the year of apps”. In other words, the Hadoop infrastructure is now mature enough to be “enterprise ready”; therefore innovation should now focus on using Hadoop to derive business value.

Also, the “data scientist” role is gaining prominence. Jeff Hammerbacher pioneered this role at Facebook. Companies across many industries are looking for similarly skilled people to make sense of the data deluge that’s happening everywhere. This role requires some combination of expertise in computer science, statistics, social science, natural science, business, and other skills. AMP Lab is rooted in computer science and statistics and, depending on individual students’ interests, is also reasonably literate in social science, natural science, and business. I certainly found it motivational to see the countless ways that the Lab’s expertise can be applied to create business value, help improve the quality of life, and even discover new knowledge.

NetApp and Cloudera announced a partnership in providing the NetApp Open Solution for Hadoop running on Cloudera Distribution including Apache Hadoop. It’s great to see increased collaboration between our industry partners beyond knowledge sharing through the Lab.

I gave a joint talk on “Hadoop and Performance” with Todd Lipcon, our colleague from Cloudera. The talk was well received, and folks are looking forward to our imminent release of the “Cloudera Hadoop workload suite”. One could say that typical enterprises should focus either on profit (monetary and societal) or on arguing that “my-performance-is-better”. Thus, it remains the academic community’s responsibility and opportunity to develop scientific design and performance evaluation methodologies.

No travel notes this time.