Tuesday, August 31, 2010

The most expensive dictionary yet!

It's been announced that the Oxford English Dictionary (OED) is moving online, and the hundred-pound print version will be discontinued for good.
Most interestingly, the price tag is set at around £250 a year per subscription. That is £2.5K over 10 years, which can be compared with the £1K print version and the £150 PC/Mac CD version. How can that not leave you thinking?

First, I want to say something about print versus online or CD-based dictionaries. I have long held a hypothesis, drawn from personal experience, that people remember words better when using a printed dictionary than when doing online or software-based look-ups. One reason, though I cannot say for certain as it would require a degree in cognitive science, might be that a word looked up in a printed dictionary is stored in your memory together with spatial information: the page the word is on, the way the word and its explanations are printed, and the words near it. The main idea is this: there is more contextual information stored in your memory, in comparison with online/software-based look-ups. And the more contextual, high-level information there is, the more efficiently your memory stores and retrieves it.

Second, as a non-native English speaker (I didn't even start learning English properly until I was 18), I have had a high level of exposure to numerous kinds of English dictionaries, both English-Vietnamese and English-English: the tiny paperback ones, the large hardback ones, the e-dictionaries, and the PC/Mac-friendly ones. Here is a priority list of things that a typical non-native English speaker uses a dictionary for:
  • To look up the (most common) meanings of the words. The simpler the explanation and the more examples there are, the better.
  • To look up phrases and idioms whose meanings are completely different from the ones constructed by threading the meanings of the individual words together.
  • To look up synonyms and antonyms.
  • For nouns, to find out whether they are countable or uncountable.
  • To check the pronunciations
My understanding of the extra feature that the OED offers, besides its comprehensive coverage of every English word ever used, is:
  • To find out the origins of words and how they are used in different (literary) contexts.
In other words, (1)-(5) answer the what questions about a word, while (6) addresses the why question. How many of us, who aren't linguists or writers, are concerned with the latter? Not many. In science, we use English as a way to communicate, i.e. we only need to agree with each other on the meanings of words, not to know why they are what they are. Having said that, we all need people who can master and manipulate the meanings of words. We need poetry. We need art. And those people need things like the OED.

So, is the price tag justified? The publisher can argue that the subscription buys users the benefit of fast, convenient, access-it-everywhere look-ups. Furthermore, subscribers always have the most up-to-date version. And thousands of trees, both for making paper and bookshelves, will be saved. My opinion on this is: yes, go and buy it, but only if you (1) can afford it or (2) are a writer/poet/journalist/linguist. For the rest of us, I would recommend two options that adequately satisfy requirements (1)-(5):
  1. Cambridge Advanced Learner's Dictionary, printed version or the CD version for PC and Mac.
  2. Macmillan free online dictionary: http://www.macmillandictionary.com/
Have a great day!

Saturday, August 21, 2010

Reading and the Internet

An interesting article discusses a new book called “The Shallows”, whose thesis is that the Internet has altered our brains, for the worse.


The author, Nicholas Carr, has previously written an article in The Atlantic asking whether Google (and the Internet bandwagon) is making us stupid. The main argument, which recurs in the book, is that reading on the Internet induces multitasking - which is not necessarily a good thing - and content skimming.

With regard to the latter, Carr is clearly a strong advocate of the idea of “slow reading”, i.e. reading books the old way. Slow reading, I agree, brings out the contemplative, reflective thinking of readers. The idea is roughly that by immersing oneself in a single book for days, the chance is high that one would stop for a moment to think, to relate and find connections with real life, or simply to appreciate the writing, the plot, the art delivered by the text. A premise of slow reading is total concentration, total immersion in the book. One could imagine the act of slow reading as a continuous line of thought over a period of time. This is in stark contrast to the typical way people read on the Internet over the same period of time, which could be described as ad-hoc, discontinuous, segmented lines of thought.

How do we read on the Internet? We mostly do our reading on the web. Besides it being unnatural for the eyes, we never *just* read. We multitask: we listen to music, we check out pop-up links, we check email, Facebook, Twitter, ... and the list keeps growing. Sadly, reading has become merely one of those tasks, and I feel a chill down my spine when I think about the possibility of reading becoming a secondary task, i.e. we just happen to read while doing something else.

I am not jumping to the conclusion that the age of Internet reading is bad for future generations, or even for us. First, it is currently impossible for science to quantify what is good and what is bad for our children. Second, the debates on the effect of the information age on our culture are still going strong, with people from both camps refusing to come to a consensus. In the arts camp, authors like Carr have started using scientific evidence demonstrating the change in our brain structure as we are exposed to the Internet. Such results are interesting, but merely of scientific interest, and scientists cannot say whether the changes are bad or good. The wise used to say that science only provides evidence and facts, not opinions. In the science camp, cognitive and neural scientists usually claim long-term benefits of online role-playing games for children. Contrary to the common belief that video games are bad for children, those role-playing games encourage children to learn and practice good social behavior, improve their communication skills, and allow them to explore possibilities that their parents’ generation did not have. While scientists are still struggling to quantify the effects of technology on the youth, there is certainly strong evidence supporting the thesis that growing up with technology is a good thing for children.

Finally, my personal take on this is not to blame the technology, which by its nature has bad sides and good sides. It’s up to us to decide how to make the best of them. Consequently, parents, teachers and politicians play important roles in shaping our future. As for reading, I consider myself lucky to have been born before the Internet, able to observe and get the best of both worlds. I use the Internet to skim around ideas and topics I don’t know about. I use the Internet to find scientific articles at a speed unimaginable just over a decade ago. However, textbooks, article print-outs and non-fiction are still my main resources for studying. I’m still building up my collection of novels, still indulging myself in the minds of novelists, in imaginary worlds full of fantastic characters, in the emotions brought by books - those that remain unreachable by technology.

Sunday, May 02, 2010

Cannot vs can not

I found a great article discussing how cannot differs from can not or can't. I only recently had this mistake pointed out to me.

The full article is here (also read the Further Discussion part):

In summary:

cannot (or can't): not able to do something. E.g. I cannot swim.
can not: able to do it, but choosing not to. For example: I can not study (I am able to study, but I choose not to).

Tuesday, April 27, 2010

List of useful Linux commands

For myself:

1. tar zcf output.tar.gz Folder
Archive the folder into a .tar.gz file

2. ls -lt
Listing, sorted by modification time

3. tar ztf file
See tar file content

4. find directory -name '*key'
Finding all files ending with 'key'

5. Ctrl+Z, fg, bg
Suspend the current process; bring a process to the foreground or background, respectively

6. a2ps -o output.ps ascii_file
Convert the ASCII file into PostScript, for better printing

Thursday, April 15, 2010

Notes on Java Threads and Concurrency

I was taught Threads & Concurrency five years ago, and it was one of the most interesting and useful modules I have ever taken. But considering how much time has elapsed, one should be forgiven for forgetting some of what he learned. In fact, it is always a good thing to forget something and then re-learn it years later, so you can look at it from a different angle and probably gain more insight. In this post, I aim to jot down some fundamentals about threads and concurrency (in Java) to remind my future self, so that he won't make stupid mistakes when writing highly concurrent programs.

1. Concurrency problems (which lead to synchronization methods) arise from scenarios where an object O is shared by multiple working threads. As discussed later, the Thread.sleep() method only puts the current thread to sleep; it has no synchronization semantics whatsoever, because it has nothing to do with shared objects.

The analogy I use here (taken from a Java book) is the phone booth, being the common resource shared by multiple threads (users eager to use the phone booth).

There are two properties we want regarding the shared objects:
  1. At most one thread can access the object at a time, to avoid concurrent reading and writing, for example. In the phone booth analogy, this property means that no more than one person is using the booth at a time.
  2. Threads can communicate with each other about the state of the object. In particular, if the booth is broken, we want all users to wait until another thread (the maintenance officer) fixes it. Once finished, the maintenance officer announces to all waiting people that the booth is now fixed and people can start making calls again. We prefer this to the scenario where everyone attempts to use the booth, one after another, even though it is known that the booth is still broken. That scenario is less efficient.
2. Each object, namely O, has an implicit monitor lock, called L. In the phone booth analogy, the lock is the phone booth's door. Once one user enters and locks it, the phone booth becomes occupied and others cannot enter until it is open again and free.

3. We use the synchronized keyword on methods implemented in O in order to serialize access to O. For example, the following declaration

synchronized void methodA()

will have the following effects:
  1. Before entering this method, the lock L must be obtained.
  2. If L is not available, the caller blocks until it is.
  3. Once L is obtained, the method is executed. At the end, L is released.
So in the phone booth analogy, methodA() could represent a user wanting to make a call. First, he checks whether the booth is available (by checking that the door is not locked). He waits until it is available, then enters, locks it, makes a call, gets out and leaves the door open. Notice that if more than one person is waiting to use the booth, they may have to compete (using a social protocol, for example) to decide who gets the booth when it becomes available.

We can see that the access to the phone booth is serialized and no more than one person can use it at a time.
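The points above can be sketched in a few lines of Java. This is an illustrative toy (the class and method names are mine, not from any library): because makeCall() is synchronized on the booth's implicit lock, the counter increments are serialized and no update is lost.

```java
public class PhoneBooth {
    private int callsMade = 0;

    // Only one thread can execute this at a time; other threads block
    // on the booth's implicit monitor lock until it is released.
    public synchronized void makeCall() {
        callsMade++;
    }

    public synchronized int getCallsMade() {
        return callsMade;
    }

    public static void main(String[] args) throws InterruptedException {
        PhoneBooth booth = new PhoneBooth();
        Thread[] callers = new Thread[4];
        for (int i = 0; i < callers.length; i++) {
            callers[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) booth.makeCall();
            });
            callers[i].start();
        }
        for (Thread t : callers) t.join();
        System.out.println(booth.getCallsMade()); // always 4000: no lost updates
    }
}
```

Without the synchronized keyword, the increments from different threads could interleave and the final count would often be less than 4000.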

4. Java supports the Lock interface, whose most commonly used implementation is ReentrantLock. One can attempt to get the lock and subsequently release it using the tryLock() and unlock() methods, respectively. The differences between using this and the synchronized keyword are as follows:
  1. The synchronized keyword accesses the implicit lock associated with the object. A ReentrantLock object is explicit.
  2. The tryLock() method is non-blocking, in the sense that it returns true or false immediately depending on whether the lock is available. This is in contrast to the implicit lock, where obtaining the lock (by calling a synchronized method) blocks until the lock becomes available.
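A minimal sketch of the explicit lock, using the real java.util.concurrent.locks.ReentrantLock class (the scenario itself is illustrative):

```java
import java.util.concurrent.locks.ReentrantLock;

public class TryLockDemo {
    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();

        // tryLock() returns immediately: true if the lock was acquired.
        if (lock.tryLock()) {
            try {
                System.out.println("got the lock");
            } finally {
                lock.unlock(); // always release in a finally block
            }
        }

        // The lock is reentrant: the same thread can acquire it again,
        // which increments an internal hold count.
        lock.lock();
        System.out.println(lock.tryLock()); // true (reentrant acquisition)
        lock.unlock();
        lock.unlock();                      // one unlock per acquisition
        System.out.println(lock.isLocked()); // false
    }
}
```

Note that unlike the implicit lock, an explicit Lock is not released automatically when the method exits, which is why the unlock() call belongs in a finally block.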
5. The wait/notify/notifyAll mechanisms are used for the 2nd property of concurrency described at the beginning. In particular:

wait():
  1. It releases the current lock, and the remaining code after the call is not executed. This means that the calling method must have obtained the lock, i.e. the wait() method must be called from inside a synchronized method. Other threads trying to get the lock can now acquire it.
  2. The current thread is put into a waiting queue Q (different from the queue of threads competing for the object's lock).
  3. This thread will be awakened by a notification and removed from Q. Once removed, it enters the queue competing for the object's lock. If it then gets the lock, the remaining code after the wait() call is executed.
notify(): awakens an arbitrarily chosen thread from Q of the current object. Notice that if more than one thread is waiting, only one of them is notified. In the phone booth analogy, the maintenance officer only tells one waiting caller that the booth is fixed; the others are left waiting in vain.

notifyAll(): awakens all threads in Q. As a consequence, they all wake up and compete for the lock before executing the remainder of their code (after the wait() calls). Under normal circumstances, always use notifyAll() instead of notify(), as with the latter the wrong thread could be awakened. Because different threads may wait for different conditions, it is always advisable to surround the wait() call with a while loop checking for the right condition. More specifically:

while (!condition)
    wait();

// do other things once the condition holds

Using notifyAll(), all waiting threads wake up and compete for the lock. If the wrong one gets the lock, it will see that its condition is still not satisfied and go back to waiting.
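Putting wait(), notifyAll() and the guarding while loop together, here is a minimal sketch of the booth-repair scenario (the class and method names are illustrative):

```java
public class BoothRepair {
    private boolean fixed = false;

    // A caller waits until the booth is fixed. wait() must be called
    // while holding the lock, and is guarded by a while loop so that
    // the condition is re-checked after every wake-up.
    public synchronized void waitUntilFixed() throws InterruptedException {
        while (!fixed) {
            wait(); // releases the lock while waiting
        }
        // lock re-acquired here; the booth is known to be fixed
    }

    // The maintenance officer fixes the booth and announces it to
    // every waiting caller, not just one.
    public synchronized void markFixed() {
        fixed = true;
        notifyAll();
    }

    public static void main(String[] args) throws InterruptedException {
        BoothRepair booth = new BoothRepair();
        Thread caller = new Thread(() -> {
            try {
                booth.waitUntilFixed();
                System.out.println("making a call");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        caller.start();
        Thread.sleep(100); // give the caller time to start waiting
        booth.markFixed();
        caller.join();
    }
}
```

The while loop also protects against spurious wake-ups, where a thread can return from wait() without having been notified at all.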

6. A common pitfall is using Thread.sleep() to deal with concurrency problems. The Java specification gives no synchronization semantics to this method: it simply puts the current thread to sleep, nothing else. A very good example demonstrating the pitfall is as follows:

while (!this.done) { /* busy-wait */ }

and assume that another thread will change this.done at some point. The problem is that the above code could loop forever, because Java is not required to load a fresh value of this.done from memory. This means Java could reuse an old value from its cache when checking the condition, so the change made by the other thread may never be noticed.
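The standard fix for this particular visibility problem is to declare the flag volatile, which forces every read to observe the latest write from other threads. A minimal sketch (the class name is illustrative):

```java
public class DoneFlag {
    // volatile guarantees that reads of this field see the most
    // recent write by any thread, so the loop below must terminate.
    private volatile boolean done = false;

    public static void main(String[] args) throws InterruptedException {
        DoneFlag f = new DoneFlag();
        Thread worker = new Thread(() -> {
            while (!f.done) {
                // busy-wait; without volatile this read could be served
                // from a cached value forever
            }
            System.out.println("saw done = true");
        });
        worker.start();
        Thread.sleep(50);
        f.done = true; // this write becomes visible to the worker
        worker.join();
    }
}
```

Note that volatile only fixes visibility; for compound read-modify-write operations you still need synchronized blocks or an explicit lock.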

Thursday, November 12, 2009

Interesting article on types

What a coincidence! Just when the students are pulling their hair out trying to grasp the concept of interfaces in Java, I came across this well-written, comprehensive article explaining types, data abstraction and polymorphism. These concepts are presented rather theoretically, but they are quite easy to understand as the authors relate them to features in programming languages.

Highly recommended!

The link is:


Tuesday, September 15, 2009

Things learned from P2P'09

Besides the many excellent technical papers presented at the P2P'09 conference in Seattle, I am particularly impressed by the 3 keynote speeches. Videos of these talks are now available on the website (I was sitting at the back, so I can't be spotted in them). I'd just like to blog a few lines here to remind myself of what was communicated:

1. Ian Clarke, creator of the Freenet Project, kicks off with his recent work on SWARM, a mid-layer infrastructure for writing scalable distributed applications. The one point I took away from his presentation, besides the colorful and animated slides, is the argument that if Facebook, Twitter and other large web applications had used better tools during development, they could have saved development time and rested assured that their applications would scale. Such infrastructure is particularly useful if adopted by Cloud providers: as details are abstracted away from developers, they can concentrate on making applications with more practical value.

Also, this is the talk in which I first heard of MapReduce. A few hours of reading and following links on Wikipedia reveal that it is a very popular programming paradigm used by Google for distributing its computational load. Even though database experts aren't very fond of it, MapReduce looks easy to use and achieves great parallelism by mapping the problem into sub-problems, distributing the sub-problems to servers, and eventually aggregating the results with calls to Reduce functions. A distributed programming course using MapReduce is also available online.
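As a toy illustration of the paradigm (plain Java on one machine, not Google's distributed implementation; names are mine), here is a word count where the map phase splits each document into words and the reduce phase sums the counts per word:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WordCount {
    // Map phase: split each document into words.
    // Reduce phase: group by word and sum the occurrences.
    static Map<String, Long> count(List<String> docs) {
        return docs.stream()
                   .flatMap(doc -> Arrays.stream(doc.split(" ")))
                   .collect(Collectors.groupingBy(w -> w, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the cat", "the dog", "cat and dog");
        // counts: the=2, cat=2, dog=2, and=1 (map iteration order unspecified)
        System.out.println(count(docs));
    }
}
```

In the real system, the map tasks run on many servers in parallel and the grouped intermediate pairs are shipped to reduce tasks; the stream pipeline above only mimics that data flow in a single process.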

2. Roger Barga from Microsoft makes his contribution to the Cloud computing theme in an engaging talk. He starts by describing how High Performance Computing (HPC) and Cloud Computing are essentially twins separated at birth. Cloud Computing, on the other hand, enjoys explosive growth due to interest from industry. He paints a rather ironic picture: a medium-size data center offers more computation power than the world's three largest academic clusters put together. One of Microsoft's focuses at the moment (besides churning out worse and worse versions of Windows - these are not his words) is developing a new programming paradigm and language to take full advantage of the Cloud. The argument for this endeavour is that the current implementation is virtual-machine based, where customers have full control over VMs, and this does not scale well. An abstraction in which developers only see data and need not concern themselves with how many and which VMs they have is more scalable and preferable. He ends the talk with the observation that Cloud computing is transforming science. More specifically, the history of science started with Experimental Science (observe, then hypothesize), moved to Theoretical Science (the likes of quantum physics, Einstein et al.), then Computational Science (weather modeling, etc.). Now, science is in the data-intensive age, with more and more data coming from protein folding, the LHC, etc. And the enabler is the Cloud.

3. Ken Birman from Cornell is the last keynote speaker. He discusses the current change in practice adopted by the industry: embracing inconsistency, i.e. forgetting about consistency algorithms and learning how to live with inconsistency instead. Ken continues by introducing his gossip protocol, which could provide an answer to the consistency problem in the cloud. It is important to preserve consistency, because things like security cannot be achieved without it. Maybe this is our last hope for a stable and secure Cloud?