An AMP Blab about some recent system conferences – Part 3: Hadoop World 2011

I recently had the pleasure of visiting Portugal for SOSP/SOCC, and New York for Hadoop World. Below are some bits that I found interesting. This is the personal opinion of an AMP Lab grad student – in no way does it represent any official or unanimous AMP Lab position.

Part 3: Hadoop World 2011

Not exactly a research conference, Hadoop World is a multi-track industry convention hosted by Cloudera, an enterprise Hadoop vendor, and draws various companies with some stake in the Hadoop community. This year’s Hadoop World saw some 1500 attendees, including Hadoop vendors, Hadoop users, executives from various companies, vendors building on top of Hadoop, people looking to learn more about Hadoop, and of course, a small contingent of researchers. I believe Hadoop World is a good place for researchers to get a state-of-the-industry view of the big data, big systems space.

One theme is that Hadoop has really become “mainstream” and has moved well beyond its initial use cases in supporting e-commerce-type services. The convention agenda included talks from household names beyond typical high-tech industries. The talks also had audiences in ripped jeans and flip-flops sitting next to others in pressed three-piece suits, indicating the present diversity of the community, and perhaps pointing to opportunities for multi-disciplinary collaboration in the near future.

Accel Partners announced a $100M “Big data fund” to accelerate innovation in all layers of the “big data stack”. This should be of interest to entrepreneurial-minded students within the Lab.

Another theme is that Hadoop is still waiting for a “killer app”. One keynote speaker dubbed 2012 “the year of apps”. In other words, the Hadoop infrastructure is now mature enough to be “enterprise ready”; therefore innovation should now focus on using Hadoop to derive business value.

Also, the “data scientist” role is gaining prominence. Jeff Hammerbacher pioneered this role at Facebook. Companies across many industries are looking for similarly skilled people to make sense of the data deluge that’s happening everywhere. This role requires some combination of expertise in computer science, statistics, social science, natural science, business, and other skills. AMP Lab is rooted in computer science and statistics and, depending on individual students’ interests, is also reasonably literate in social science, natural science, and business. I certainly found it motivational to see the countless ways that the Lab’s expertise can be applied to create business value, help improve the quality of life, and even discover new knowledge.

NetApp and Cloudera announced a partnership in providing the NetApp Open Solution for Hadoop running on Cloudera Distribution including Apache Hadoop. It’s great to see increased collaboration between our industry partners beyond knowledge sharing through the Lab.

I gave a joint talk on “Hadoop and Performance” with Todd Lipcon, our colleague from Cloudera. The talk was well received, and folks are looking forward to our imminent release of the “Cloudera Hadoop workload suite”. One could say that the focus of typical enterprises should be either profit (monetary and societal), or arguing that “my-performance-is-better”. Thus, it remains the academic community’s responsibility and opportunity to develop scientific design and performance evaluation methodologies.

No travel notes this time.

An AMP Blab about some recent system conferences – Part 2: SOCC 2011

Part 2: Symposium on Cloud Computing (SOCC) 2011

This year marks the second iteration of the conference. SOCC has certainly established itself as a noteworthy conference that brings together diverse computer system specialties. The proceedings are available through ACM. Perhaps SOCC will become a stand-alone venue next year, instead of being co-located with SIGMOD (last year) or SOSP (this year).

AMP Lab is fortunate to have inherited many members from its predecessor RAD Lab, which made some contributions in highlighting cloud computing as an important technology trend and emerging research area. The numerous SOCC papers on MapReduce optimizations and key-value stores continue the research paths that RAD Lab helped identify regarding MapReduce schedulers and scale-independent storage.

The program committee awarded three “papers of distinction”: 1. Pesto: Online Storage Performance Management in Virtualized Datacenters, 2. Opportunistic Flooding to Improve TCP Transmit Performance in Virtualized Clouds, 3. PrIter: A Distributed Framework for Prioritized Iterative Computations. I especially liked the TCP paper – the authors actually modified the TCP kernel, a painful task per my own past experience.

Our AMP Lab colleagues presented two talks – Improving Per-Node Efficiency in the Datacenter with New OS Abstractions (Barret Rhoden, Kevin Klues, David Zhu, and Eric Brewer), and Scaling the Mobile Millennium System in the Cloud (Timothy Hunter, Teodor Moldovan, Matei Zaharia, Justin Ma, Samy Merzgui, Michael Franklin, Pieter Abbeel, and Alexandre Bayen). Both went very well.

One train of thought appeared several times: how do the system improvements demonstrated over artificial benchmarks translate to real-life situations? Folks from different organizations raised this point during Q&A for several papers, with the response being the familiar lament regarding the shortage of large-scale system traces available to academia. This prompted our friend John Wilkes from Google to give a 1-slide impromptu presentation highlighting the imminent public release of some large-scale Google cluster traces, and inviting researchers to work with Google. I felt it helpful to do a 1-slide impromptu follow-up presentation highlighting that AMP Lab has access to large-scale system traces from several different organizations, inviting researchers to work with AMP Lab and our industrial partners, and of course thanking our Google colleagues John Wilkes, Joseph L. Hellerstein, and others for their guidance on our early efforts to understand large-scale system workloads.

Portugal travel note 2: Consider taking in the stunning sunset at Castelo de Sao Jorge, set against the 25 de Abril Bridge across the River Tejo, with the Cristo Rei Statue lit by bright light on the opposite side of the River. Walking about the medieval Castle in semi-darkness is a unique and almost haunting experience, provided you can muster the courage and the night-vision. Or just head to the Bairro Alto historical neighborhood and stuff yourself on fantastic local food.

An AMP Blab about some recent system conferences – Part 1: SOSP 2011

Part 1: Symposium on Operating System Principles (SOSP) 2011

A very diverse and high quality technical program, as expected. You can find the proceedings and talk slides/videos at http://sigops.org/sosp/sosp11/current/index.html.

One high-level reaction I have from the conference is that AMP Lab’s observe-analyze-act design loop positions us well to identify emerging technology trends, and to design systems with high impact under real-life scenarios. Our industry partnerships would also allow us to address engineering concerns beyond the laboratory, thus expediting bilateral knowledge transfer between academia and industry.

One best-paper award went to “A File is Not a File: Understanding the I/O Behavior of Apple Desktop Applications”, authored by our friends from Univ. of Wisconsin, Professors Andrea and Remzi Arpaci-Dusseau, as well as their students. The paper did an elaborate study of Apple laptop file traces, and found many pathological behaviors. For example, a file “write” actually writes a file multiple times, and a file “open” touches a great number of seemingly unrelated files.

Another best-paper award went to “Cells: A Virtual Mobile Smartphone Architecture” from Columbia. This study proposes and implements “virtual phones”, the same idea as virtual machines, for example running a “work phone” and a “home phone” on the same physical device. The talk highlight was a demo of two versions of the Angry Birds game running simultaneously on the same phone.

The audience-choice best presentation award went to “Atlantis: Robust, Extensible Execution Environments for Web Applications”, a joint work between MSR and Rutgers. The talk very humorously surveyed the defects of current Internet browsers, and proposed an “exokernel browser” architecture in which web applications have the flexibility to define their own execution stack, e.g. markup languages, scripting environments, etc. As expected, the talk catalyzed very entertaining questioning from companies with business interests in the future of web browsers.

Also worthy of highlighting – the session on Security contained three papers, all three with Professor Nickolai Zeldovich on the author list, and all three of high quality. I have not done a thorough historical search, but I’m sure it’s rare that a single author manages to fill a complete session at SOSP.

There was also a very lively discussion on ACM copyright policies during the SIGOPS working dinner. I personally believe it’s vital that we find policies that balance concerns about upholding the quality of research, preserving the strength of the research community, and facilitating the sharing of cutting-edge knowledge and insights.

My own talk on “Design Implications for Enterprise Storage Systems via Multi-Dimensional Trace Analysis” went very well. This is a study that performs an empirical analysis on large-scale enterprise storage traces, identifies different workloads, and discusses design insights specifically targeted at each workload. The rigorous trace analysis allows us to identify simple, threshold-based storage system optimizations, with high confidence that the optimizations bring concrete benefit under realistic settings. Big thank you to everyone at AMP Lab and our co-authors at NetApp for helping me prepare the talk!

Lisbon travel note 1: If history/food is dear to your heart, you will find it worthwhile to visit the Jerónimos Monastery, and try the Pasteis de Nata sold nearby. This is THE authentic egg tart, which originated at the Monastery, and is very good for a mid-day sugar high. I had too many – I felt too happy after eating the first 10, lost count of how many more I ate, and skipped lunch and dinner for that day.