Tuesday, April 29, 2008

Acrobat compatibility problem with PDFExpress


PDFExpress is the online PDF validation tool that checks whether your PDF file is correctly formatted so that it can be published by IEEE CS and viewed online through IEEE Xplore. The trouble is that the error I got back said "Acrobat version is less than 5.0". My innocent guess was that it had something to do with the compatibility settings used when I generated the PDF file. I usually run:
  • latex file.tex
  • dvipdf file.dvi
and rely on these magic commands to generate something that can be opened by Acrobat.

PDFExpress offers an alternative: you can submit your source files and rely on the system to generate a PDF file it likes. Sounds ideal. I tried that and failed miserably: somehow all the figure captions had gone missing. I must stress that all pictures were included in a properly structured, zipped source file. And don't bother asking for Technical Help on the site; they responded with some incomprehensible instructions that only made me more nervous.

The solution that worked for me is drawn from this link on creating PDFs with ps2pdf. The steps are:
  • latex file.tex
  • dvips file.dvi -o file.ps
  • ps2pdf -dCompatibilityLevel=1.4 -dOptimize=true -dEmbedAllFonts=true file.ps
The compatibility levels can be 1.2, 1.3 and 1.4, which correspond to Acrobat versions 3.0, 4.0 and 5.0 respectively. Now submit the PDF file to PDFExpress, and you will get a nice notification saying that your much-loved paper has passed the check. Hurray!
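
Before uploading, it can be handy to check which compatibility level a PDF actually declares. Below is a minimal sketch in Python that reads the `%PDF-x.y` header at the start of the file and compares it with level 1.4. The function names are my own, and I am assuming that the header is what the "Acrobat version" error refers to; PDFExpress may well check other things (fonts, for instance) that this does not cover.

```python
import re

# Mapping stated above: PDF compatibility level -> Acrobat version.
ACROBAT_FOR_PDF = {"1.2": "3.0", "1.3": "4.0", "1.4": "5.0"}


def pdf_header_version(data):
    """Return the 'x.y' string from a leading '%PDF-x.y' header, or None."""
    match = re.match(rb"%PDF-(\d\.\d)", data[:16])
    return match.group(1).decode("ascii") if match else None


def meets_acrobat5(data):
    """True if the header declares PDF 1.4 (Acrobat 5.0) or newer."""
    version = pdf_header_version(data)
    if version is None:
        return False
    major, minor = (int(part) for part in version.split("."))
    return (major, minor) >= (1, 4)
```

To use it on a real file, read the first few bytes in binary mode, e.g. `meets_acrobat5(open("file.pdf", "rb").read(16))`.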

Thursday, April 24, 2008

When good guys do bad things

The recent USENIX conference NSDI has been slashdotted a lot. One very interesting paper describes how people pollute Stormnet, one of the biggest botnets, and bring it to its knees.

One thing I learned from this paper is that the current Storm botnet is organized as a structured P2P overlay. It implements a very simple yet effective overlay called Kademlia. Once infected, the bot machine joins the Kademlia network and participates in a publish/subscribe protocol. The general operations of this gang are as follows:
  1. An attacker initially publishes commands into the network, using keys that he and all the bots in the network know how to generate. As a matter of fact, the key can be based on the current date and time, plus some predefined randomness. Credit to the researchers for reverse-engineering the malware and discovering this process.
  2. Bots periodically search for the keys they know. They find the published commands, then download and execute them blindly.
  3. Not many details are given about the commands, but I presume they can contain a further set of commands, so that the bots themselves can later publish commands into the network. That way, the attacker is well covered.
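
To make steps 1 and 2 concrete, here is a minimal sketch in Python of how a publisher and a bot could independently arrive at the same keys, and how a Kademlia-style lookup picks the nodes closest to a key by XOR distance. This is not the malware's actual algorithm: the hash function (MD5), the 128-bit ID size, the use of the date only, and the `slots` parameter standing in for the "predefined randomness" are all my own assumptions for illustration.

```python
import hashlib
from datetime import date

ID_BITS = 128  # assumed ID size; MD5 conveniently yields 128 bits


def rendezvous_keys(day, slots=32):
    """Derive the keys for a given day (hypothetical scheme).

    Publisher and bots both run this, so they agree on the keys
    without ever talking to each other directly.
    """
    keys = []
    for slot in range(slots):
        seed = "{}:{}".format(day.isoformat(), slot).encode("ascii")
        keys.append(int.from_bytes(hashlib.md5(seed).digest(), "big"))
    return keys


def xor_distance(a, b):
    """Kademlia's distance metric between two IDs."""
    return a ^ b


def closest_nodes(key, node_ids, k=3):
    """Return the k node IDs closest to key under XOR distance."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]
```

The attacker stores a command at the nodes returned by `closest_nodes` for one of today's keys; a bot computing `rendezvous_keys` for the same day searches the same region of the ID space and finds it.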
If I were the botnet creator, I would use the Tapestry overlay rather than Kademlia. It is more suitable and probably performs better for this type of publish/subscribe protocol. On the other hand, Kademlia is probably the easiest one to implement, and its source code is available.

The most interesting point is the proposal (already deployed) of launching Sybil attacks against the network. I have spent quite some time reading papers on Sybil attacks. All of them, like me, label Sybil attacks as a threat to P2P networks, and the work focuses on finding a way to mitigate them. This work sheds new light on the field by looking at the attacks from a different angle. If the network is full of bad participants, taking it down becomes the moral thing to do, and a Sybil attack is a great tool for it. The paper implements all sorts of attacks mentioned in the literature in an attempt to render the botnet useless. The attacker will have the same difficulty finding us as we have in finding him. The attack attempts to partition the network so that bots cannot discover the commands that were published.
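
As a rough illustration of why Sybils are so effective in a Kademlia-style overlay, here is a minimal Python sketch. It is not the procedure from the paper: the 128-bit ID size, the `free_bits` parameter and all function names are my own assumptions. The idea is simply that if fake nodes choose IDs that share a long prefix with a target key, they end up closer (by XOR distance) than nearly any honest node, so a lookup for that key is answered by Sybils only.

```python
import random

ID_BITS = 128  # assumed ID size


def xor_distance(a, b):
    """Kademlia's distance metric between two IDs."""
    return a ^ b


def k_closest(key, node_ids, k):
    """The k IDs a lookup for key would converge on."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


def make_sybil_ids(key, count, free_bits=16, rng=None):
    """Pick `count` distinct IDs that match key except in the low free_bits bits."""
    rng = rng or random.Random()
    prefix = (key >> free_bits) << free_bits
    lows = rng.sample(range(1 << free_bits), count)
    return [prefix | low for low in lows]


def captured_fraction(key, honest, sybils, k):
    """Fraction of the k closest nodes to key that are Sybils."""
    sybil_set = set(sybils)
    closest = k_closest(key, honest + sybils, k)
    return sum(1 for n in closest if n in sybil_set) / k
```

With a thousand honest nodes at random IDs and only twenty Sybils placed this way, `captured_fraction` for the targeted key comes out as 1.0 in practice, because an honest node would need to match the first 112 bits of the key by chance to compete.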

Having said that, many (including me) are working on how to fight Sybil attacks. Apparently, the attacker can use these techniques to fend off the current attempts. At the moment, the paper reports that the authors were able to introduce thousands of Sybils into the network using a single machine. However, due to Kademlia's routing redundancy, there is still room for improvement. It also surprises me a bit that you can launch that many Sybils from a single machine. This indicates that the current implementation of the botnet malware is still not robust enough, for it allows thousands of connections from a few IP addresses.

Anyway, a Sybil attack is just like a gun. It does bad things when under the bad guys' control, but it can help the good guys in certain circumstances. For the time being, though, the bad things it can cause seem to outnumber the good things it can bring.

Java to open source and the story of software licenses

Sun Microsystems is currently removing the last hurdles on the way to freeing its famous Java platform.

The story reminds me of the days when I first used Linux. Discovering and experimenting with free, open-source software was great. However, one thing that bugged me was that none of the Linux distributions bundled Sun's Java Runtime Environment. Instead, there was a weird pre-installed Java that took me a while to learn how to get rid of. Only later did I find out that this was because of the license incompatibility between Linux and Java. In 2006, Sun announced that it had picked the GPL license for Java once the source was opened. This move makes it possible for the next releases of some Linux distributions to bundle the Java Runtime Environment automatically. For developers, it saves a few clicks, a few commands and a bit of bandwidth for downloading the JRE. For normal users (addressed to Ubuntu users :)), it means no need to hack your Firefox to run Java applets (as "yesterday" as that sounds).

The news got me to find out what exactly the license issue was in the first place. Here, I write down my notes on my "findings" about open-source software, the GPL license and the BSD license (all thanks go to Wikipedia, whose license is also GPL-compatible):

1. Open source software vs free software: the former means you (the user) have access to the source code of the software. The latter first of all requires the source code to be open. It further defines licenses regarding distribution of the software. It is at this point that the free-software community divides into two camps: copycenter and copyleft.

2. Copyleft or GPL vs Copycenter or BSD:
The General Public License (GPL) is labelled copyleft, as its terms and conditions go the opposite way from a proprietary, copyright license. The most important requirement of the GPL license (besides being free of charge) is this: if A is GPL licensed, any derivatives of A must be licensed under licenses that are not more restrictive than the GPL. For example, you download GPL-licensed software A written by a person P (the copyleft holder), together with its source code. You modify A to produce new software B. You can keep B for yourself. But if you decide to distribute B, it must be GPL licensed, i.e. people who download B must be able to get the source code and make changes freely. If you do not comply, P can sue you for copyleft violation.

The GPL seems to take all the rights away from the person who wrote the software. The Berkeley Software Distribution (BSD) license sits in the middle of the two extremes, copyright and copyleft. It is therefore labelled copycenter. The main difference from the GPL regarding distribution is that any derivatives of your software can be distributed under any other license. An example: I write software A and distribute it under the BSD license. You get the software, modify it and turn it into B. Now you can distribute B in any way you like; in particular, you can turn it into proprietary software and make people pay for it, and I won't get a dime. This difference makes BSD even freer than the GPL, for the GPL forces the derivative to be distributed under the GPL as well.

3. Now back to the Java licensing. Linux distributions are GPL'ed, which requires that all software bundled with them is GPL'ed. Sun's JRE license is not GPL compatible; it has several restrictions on the distribution of derivatives. Consequently, to get Java, we used to have to download it from the Sun webpage and agree to all the terms and conditions that we never bothered to read.

After all this, one thing is still on my mind. MPlayer is a great media player for Linux, and it's GPL licensed. Why do none of the Linux distributions I know of bundle it? Why make users go through the painful process of downloading (then installing) it off the Web? Until then, have a beautiful and predictable (weather-wise) day.

Wednesday, April 23, 2008

Google App Engine - the next best thing?

I haven't had time to catch up with recent news, but I am so glad that I eventually did.

In "recent" news, Google announced its new Google App Engine. It immediately caused a commotion not only in the geek world, but also in the business world. A great review of the new product can be found here.

Long story short:
1. Google is so proud of its extremely scalable infrastructure that it now offers us a piece of it. Basically, it provides you with an SDK with which you can develop your next best, breakthrough, world-changing Web application. The SDK can be downloaded and installed on your local machine. You then develop and test your application locally, with the SDK simulating the Web server.

2. After testing, you decide to (or have to) host your next best thing with Google. Good news though: the giant offers up to 500 MB of web space for free. Your application will now behave exactly as it does on your local machine. Your mom, dad and grandparents can now access it.

3. The beauty of Google App Engine is that once you upload your application to Google, it has the same scalability as Google possesses at the moment. It means that millions of Internet users (as if I have ever had more than 10 friends?) can access your application at once without wrongly cursing the Internet for being so slow. In addition, you can rest assured about any Distributed Denial of Service attack on your application, should you ever make an enemy of a hacker. For start-ups or even already established companies that struggle with bringing their business online, it seems to come at just the right time.

Sounds too good to be true? The review I mentioned earlier brilliantly pointed out the hidden catches of this new product.
1. It's all about lock-in. Once you get addicted and have written all your applications with this, it may prove difficult to switch to another platform. Or, to look at it another way, you definitely have to rewrite the "legacy" code (which is everything) of your current application if moving to Google's platform sounds ideal to you.
2. If this infrastructure provides fully featured back-ends (databases, etc.), which I suppose it does, the prominent question is what will happen to your data. Google does not have the best reputation for its practices in gathering and handling private data.
3. The long-term effect of the lock-in strategy appears more worrying. Let's say your application REALLY is the next best thing. You already rely on Google to make it extremely scalable. Moving away from it may not be the best move, as the performance of your application won't get any better. And when Google comes to offer you a take-over deal, it is almost definitely an offer you cannot refuse. Should you decide to sell your application to another party, that third party may think twice about all the cost and effort of moving away from Google.
4. This is probably not a catch, but I personally think that this product may encourage bad practices in developing scalable applications. Think of how many years of research effort have been put into studying distributed systems and how to make them scalable. Now with this, developers will not feel the need to design a better, sustainable application.

My conclusion is that it is a great move by Google, and in my opinion probably a killer of Amazon's cloud. It vividly illustrates the current move towards a Software as a Service paradigm. Next best things really are just around the corner.