Tuesday, September 15, 2009

Things learned from P2P'09

Besides the many excellent technical papers presented at the P2P'09 conference in Seattle, I was particularly impressed with the three keynote speeches. Videos of these talks are now available on the conference website (I was sitting at the back, so I can't be spotted). I'd just like to blog a few lines here to remind myself of what was communicated:

1. Ian Clarke, creator of the Freenet Project, kicks off with his recent work on Swarm, a mid-layer infrastructure for writing scalable distributed applications. The one point I took away from his presentation, besides the colorful and animated slides, is the argument that if Facebook, Twitter and other large Web applications had used better tools during development, they could have saved development time and rested assured that their applications would scale. Such infrastructure is particularly useful if adopted by Cloud providers. As details are abstracted away from developers, they can concentrate on making applications with more practical value.

Also, this is the talk in which I first heard of MapReduce. A few hours of reading and following links on Wikipedia reveal that it is a very popular programming paradigm used by Google to distribute its computation loads. Even though database experts aren't very fond of it, MapReduce looks easy to use and achieves great parallelism by mapping the problem into sub-problems, distributing those sub-problems to servers, and eventually aggregating the results with calls to Reduce functions. A distributed programming course using MapReduce is also available online.
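To make the map/distribute/reduce flow above concrete, here is a toy in-process sketch of the canonical word-count example. This is my own illustration of the paradigm, not Google's actual API; the function names and the single-machine "shuffle" step are mine.

```python
from functools import reduce
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit an intermediate (word, 1) pair for each word."""
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    """Group intermediate pairs by key, as the framework would
    before handing each group to a reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: aggregate all counts emitted for one word."""
    return key, reduce(lambda a, b: a + b, values)

documents = ["the quick brown fox", "the lazy dog", "the fox"]
intermediate = chain.from_iterable(map(map_phase, documents))
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts["the"])  # 3
```

In a real deployment the map and reduce calls run on different servers and the shuffle moves data over the network, but the programmer still writes only the two small functions above.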


2. Roger Barga from Microsoft makes his contribution to the Cloud computing theme in an engaging talk. He starts by describing how High Performance Computing (HPC) and Cloud Computing are essentially twins separated at birth. Cloud Computing, on the other hand, enjoys explosive growth due to interest from industry. He paints a rather ironic picture: a medium-size data center offers more computation power than the world's three largest academic clusters put together. One of Microsoft's focuses at the moment (besides churning out worse and worse versions of Windows - not his words) is developing a new programming paradigm and language to take full advantage of the Cloud. The argument for this endeavour is that current offerings are virtual-machine based, with customers having full control over their VMs. This does not scale well. An abstraction with which developers only see data, and need not concern themselves with how many or which VMs they have, is more scalable and preferable. He ends the talk with the observation that Cloud computing is transforming science. More specifically, the history of science started with Experimental Science (observe, then hypothesize), moved to Theoretical Science (the likes of quantum physics, Einstein et al), then Computational Science (weather modeling, etc). Now, science is in the Data-Intensive age, with more and more data coming from protein-folding projects, the LHC, etc. And the enabler is the Cloud.
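The data-over-VMs argument can be sketched in a few lines. Everything here is hypothetical (my names, not Microsoft's): the developer describes a computation on a dataset, and the runtime, a trivial in-process stand-in for the cloud, is free to decide partitioning and placement behind the `apply` call.

```python
class DataSet:
    """A minimal data-centric abstraction: the developer sees only
    data and per-record functions, never individual VMs."""

    def __init__(self, records):
        self._records = list(records)

    def apply(self, fn):
        # The runtime could split self._records across any number of
        # machines here; that choice is invisible to the developer.
        return DataSet(fn(r) for r in self._records)

    def collect(self):
        return list(self._records)

squares = DataSet(range(5)).apply(lambda x: x * x).collect()
print(squares)  # [0, 1, 4, 9, 16]
```

Contrast this with the VM-based model, where the same computation would require the developer to provision machines, partition the data, and address each VM explicitly.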

3. Ken Birman from Cornell is the last keynote speaker. He argues against a current change in practice adopted by the industry: embracing inconsistency, that is, forgetting about consistency algorithms and simply learning to live with inconsistent data. Ken continues by introducing his Gossip protocols, which could provide an answer to the consistency problem in the cloud. It is important to preserve consistency, because things like security cannot be achieved without it. Maybe this is our last hope for a stable and secure Cloud?
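The appeal of gossip is how quickly an update spreads with only local, randomized communication. Below is a toy push-gossip simulation, my own sketch of the general idea rather than Ken Birman's specific protocol: each round, every node that has the update forwards it to one peer chosen uniformly at random, and the update reaches all n nodes in roughly O(log n) rounds.

```python
import random

def gossip_rounds(n, seed=0):
    """Simulate push gossip among n nodes; return the number of
    rounds until every node has received the update."""
    rng = random.Random(seed)
    informed = {0}          # node 0 starts with the update
    rounds = 0
    while len(informed) < n:
        rounds += 1
        # Every informed node pushes to one random peer per round.
        for node in list(informed):
            informed.add(rng.randrange(n))
    return rounds

print(gossip_rounds(1000))
```

Running this for n = 1000 finishes in a couple dozen rounds, which hints at why gossip-based dissemination scales so well to cloud-sized systems.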