Random thoughts by Ceki Gülcü

Friday, June 05, 2009

Biased Locking in Java SE 6.0

As a result of the ensuing discussion in response to this blog entry, it is now pretty clear that the issue described herein is not specific to Java SE 6. Please do not reach any conclusions without reading the entire post, including the comments, and especially the comments.

Joern Huxhorn recently informed logback developers that locking in Java SE 6.0 was unfair. Indeed, Java documentation regularly mentions that one should not make assumptions about Java's thread scheduler. For example, the last paragraph of "Thread Priority on the Solaris™ Platform" states:

Likewise, no assumptions should be made about the order in which threads are granted ownership of a monitor or the order in which threads wake in response to the notify or notifyAll method. An excellent reference for these topics is Chapter 9, "Threads," in Joshua Bloch's book "Effective Java Programming Language Guide."

If you actually bother to read the book, in Item 51, Joshua Bloch writes:

When multiple threads are runnable, the thread scheduler determines which threads get run and for how long. Any reasonable JVM implementation will attempt some sort of fairness when making this determination.

While Joshua Bloch's statement above is specifically about the JVM scheduler, I think that the notion of "some sort of fairness" applies to locks as well. The general principle is the same.

Here is a small Java application called LockingInJava which illustrates the locking behavior in question.

public class LockingInJava implements Runnable {

  static int THREAD_COUNT = 5;
  static Object LOCK = new Object();
  static Runnable[] RUNNABLE_ARRAY = new Runnable[THREAD_COUNT];
  static Thread[] THREAD_ARRAY = new Thread[THREAD_COUNT];

  private int counter = 0;

  public static void main(String args[]) throws InterruptedException {
    printEnvironmentInfo();
    execute();
    printResults();
  }

  public static void printEnvironmentInfo() {
    System.out.println("java.runtime.version = "
        + System.getProperty("java.runtime.version"));
    System.out.println("java.vendor          = "
        + System.getProperty("java.vendor"));
    System.out.println("java.version         = "
        + System.getProperty("java.version"));
    System.out.println("os.name              = "
        + System.getProperty("os.name"));
    System.out.println("os.version           = "
        + System.getProperty("os.version"));
  }

  public static void execute() throws InterruptedException {
    for (int i = 0; i < THREAD_COUNT; i++) {
      RUNNABLE_ARRAY[i] = new LockingInJava();
      THREAD_ARRAY[i] = new Thread(RUNNABLE_ARRAY[i]);
    }
    for (Thread t : THREAD_ARRAY) {
      t.start();
    }
    // let the threads run for a while
    Thread.sleep(10000);
    
    for (int i = THREAD_COUNT - 1; i >= 0; i--) {
      THREAD_ARRAY[i].interrupt();
    }
    Thread.sleep(100); // wait a moment for termination, too lazy for join ;)
  }

  public static void printResults() {
    for (int i = 0; i < RUNNABLE_ARRAY.length; i++) {
      System.out.println("runnable[" + i + "]: " + RUNNABLE_ARRAY[i]);
    }
  }

  public void run() {
    for (;;) {
      synchronized (LOCK) {
        counter++;
        try {
          Thread.sleep(10);
        } catch (InterruptedException ex) {
          break;
        }
      }
    }
  }

  public String toString() {
    return "counter=" + counter;
  }
}

When run under Sun's JDK version 1.6.0_11 on a 64bit Dual Core AMD Opteron running Linux, here is what gets printed on the console.

java.runtime.version = 1.6.0_11-b03
java.vendor          = Sun Microsystems Inc.
java.version         = 1.6.0_11
os.name              = Linux
os.version           = 2.6.25-gentoo-r6
runnable[0]: counter=1002
runnable[1]: counter=0
runnable[2]: counter=0
runnable[3]: counter=0
runnable[4]: counter=0

Notice how only the first thread gets any work done. All the other threads are completely starved while waiting for the lock. The same application run with JDK 1.5 or older will have much more uniform results. Some threads will get access to the lock more often than others, but all threads will get some access. With JDK 1.6, access to locks is dispensed in a biased fashion. In other words, the thread currently holding the lock will always be favored over other competing threads. Sun calls it biased locking. It purportedly makes better use of CPU capabilities.

As for the argument about no guarantees offered about fairness, while true in letter, it is invalid in spirit. The argument is too punctilious and formalistic. Any developer reading the documentation will naturally assume that while there are no guarantees, the synchronization mechanism will not actively impede fairness.

Biased locking, as introduced in JDK 1.6, will affect thousands of unsuspecting applications. The fact that the bias is intentional does not make it less damaging. It's still a bug.
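For code that really needs waiting threads to be served roughly in arrival order, one option is an explicit lock created with a fairness policy. The following is only a minimal sketch of that idea, not a fix for the behavior described above: the class name FairLockingSketch, the run method and its parameters are made up for illustration, and java.util.concurrent.locks.ReentrantLock with its fair-ordering constructor is used instead of synchronized. Fair locks generally cost throughput, so this is a trade-off rather than a free improvement.

```java
import java.util.concurrent.locks.ReentrantLock;

class FairLockingSketch {

  // Passing true asks for a fair ordering policy: the longest-waiting thread is favored.
  static final ReentrantLock LOCK = new ReentrantLock(true);

  // Hypothetical helper: runs threadCount workers for about millis ms
  // and returns how many work units each one completed.
  static int[] run(int threadCount, long millis) throws InterruptedException {
    final int[] counters = new int[threadCount];
    Thread[] threads = new Thread[threadCount];
    for (int i = 0; i < threadCount; i++) {
      final int id = i;
      threads[i] = new Thread(() -> {
        while (!Thread.currentThread().isInterrupted()) {
          LOCK.lock();
          try {
            Thread.sleep(10);   // simulate work while holding the lock
            counters[id]++;     // count only completed work units
          } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // restore the flag so the loop exits
          } finally {
            LOCK.unlock();
          }
        }
      });
    }
    for (Thread t : threads) {
      t.start();
    }
    Thread.sleep(millis);       // let the workers compete for a while
    for (Thread t : threads) {
      t.interrupt();
    }
    for (Thread t : threads) {
      t.join();                 // join also makes the counters visible to this thread
    }
    return counters;
  }

  public static void main(String[] args) throws InterruptedException {
    int[] counters = run(5, 2000);
    for (int i = 0; i < counters.length; i++) {
      System.out.println("runnable[" + i + "]: counter=" + counters[i]);
    }
  }
}
```

With a fair lock, each worker should get a comparable share of the lock, although the exact counts depend on the machine and the JVM.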

Saturday, May 23, 2009

The Earth having an indigestion

We humans share a very flawed sense of proportion. We tend to consistently underestimate the relevance of certain events and overestimate others. Social psychologists refer to the general phenomenon as cognitive bias.

Environmentalists stay awake at night worrying about global warming. One of the worst-case scenarios of global warming involves the release of methane, a greenhouse gas 21 times more potent than CO2, currently trapped under the sea. Recent studies reveal that thawing lakes of the northern hemisphere are releasing methane at an alarming rate. The more Earth temperatures increase, the more frozen lakes and permafrost thaw, the more methane is released into the atmosphere, the bigger the greenhouse effect, the more Earth temperatures rise. This vicious cycle, if unchecked, could wipe out most rainforests, destroy the fertility of many soils and leave the Arctic ice-free even in midwinter. Entire regions will become uninhabitable. Depending on where you live, your country might enter into a war with a neighboring country over fresh water resources.

Studies of ice cores from Greenland and Antarctica show that increases in methane concentrations have occurred in the past. According to the anthropic principle, given that life exists on Earth, especially in the form of Homo sapiens, conditions on Earth must be amenable to carbon-based life forms, including us. However, as anyone involved in the stock market knows, past performance is no guarantee of future results. Indeed, the Earth has sustained life for millions of years; however, that is no proof that it will necessarily continue to do so in the future.

How is it that we don't seem to worry more about the environment? Is it because the problem is too big to lay on our frail shoulders? Is it because we lack proof about the impending catastrophe? Is it because of insufficient media coverage? Is it because we subconsciously know that, like our own death, Earth's demise is inevitable, so we don't wish to waste time worrying about it? What are the cognitive biases that we need to overcome in order to react more effectively?

Monday, May 11, 2009

Burrowing Animal as defined in R&A Golf Rules

The R&A takes its name from The Royal and Ancient Golf Club of St Andrews, which has continuous records dating back to its foundation in 1754. The first thing that struck me about the R&A rules booklet was its length, over 200 pages (French edition). The rules are presented in a dense and legalistic language. For example, here is the definition of "Burrowing Animal":

A "burrowing animal" is an animal (other than a worm, insect or the like) that makes a hole for habitation or shelter, such as a rabbit, mole, groundhog, gopher or salamander.

Note: A hole made by a non-burrowing animal, such as a dog, is not an abnormal ground condition unless marked or declared as ground under repair.

According to the Rules (25-I), if your ball falls into a hole dug by a rabbit, you are entitled to relief (lift your ball and drop it outside the hole), but not if the hole is dug by a dog. A hole is a hole is a hole, but apparently not in Golf.

What about the groundhog? Clearly, since the groundhog lives in the burrow, it must be considered as a burrowing animal. So, if your ball falls into a hole dug by a groundhog you are entitled to relief according to rule 25-I.

Now, let us consider the case of the biological family of the Suidae to which pigs, hogs, and in particular wild boars belong. Although common in many regions of the world, including France, the wild boar became extinct in Great Britain and Ireland by the 17th century. Surprisingly enough, wild boar are known to dig holes for shelter and thus are burrowing animals in the sense of the Rules, even if no member of the Suidae family is mentioned in the definition of burrowing animal above, presumably because there are no wild boars in Scotland where R&A is located.

Having established that the wild boar is a burrowing animal, consider the case of a golf course nuzzled by wild boar in search of food, a common occurrence in France as attested by a google image search for "boar golf terrain" (in French). The damage caused by wild boars in search for food can be rather extensive. I have seen areas over 50 square meters damaged by wild boar as if ploughed through by a tractor.

Given the sheer size of the ground nuzzled by wild boar, it is almost certain that your ball will eventually fall into nuzzled ground. The question is whether you are entitled to relief according to rule 25-I. You might argue that wild boar are burrowing animals and consequently rule 25-I applies. However, it might also be counter-argued that wild boar are not widely known to be burrowing animals. Moreover, wild boar plough the terrain in search of food and not shelter. Let us just say that the applicability of 25-I is questionable in case of damage caused by wild boar.

Some Golf rules can be combined together to make a deliciously confusing cocktail. Consider the case of Rule 3-3. It states that "in stroke play, if a competitor is doubtful of his rights or the correct procedure during the play of a hole, he may, without penalty, complete the hole with two balls."

The rules do not mention the case of multiple invocations of rule 3-3. Can a player complete a hole with 3, 4 or even more balls and still comply with the rules?

Given that there is no upper bound to the number of provisional balls that a player can play (Rule 27-2), given a recurring ambiguity associated with provisional balls, e.g. your provisional ball falls into ground dug up by wild boar, in theory (ignoring time restrictions) there is no limit to the number of balls with which a player may complete a hole. There are of course physical limits.

According to Appendix III, the weight of a ball must not be greater than 46g. Assuming that all matter on Earth is transformed into golf balls, and a giant player, e.g. Atlas, capable of carrying the equivalent of the Earth in golf balls, a Titan could complete a hole with 10^26 balls, that is roughly the equivalent of the debt (in Indian Rupees) we are leaving to the next generation.
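For the skeptical reader, the order of magnitude can be checked in a few lines. This is just a back-of-the-envelope sketch: the class and method names are made up, and the Earth's mass of roughly 5.97 x 10^24 kg is a figure I am bringing in from outside the Rules.

```java
class GolfBallCountSketch {

  // Approximate mass of the Earth in kilograms (assumed value, not from the Rules).
  static final double EARTH_MASS_KG = 5.97e24;

  // Maximum ball weight quoted above: 46 g, expressed in kilograms.
  static final double BALL_MASS_KG = 0.046;

  // Number of maximum-weight golf balls with the same total mass as the Earth.
  static double ballsPerEarth() {
    return EARTH_MASS_KG / BALL_MASS_KG;
  }

  public static void main(String[] args) {
    System.out.printf("balls per Earth = %.2e%n", ballsPerEarth());
  }
}
```

The result lands a little above 10^26, which agrees with the figure quoted above.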

Thursday, April 23, 2009

Selling a car in Switzerland

Like most people, I have an uncanny tendency to be generous in criticism and frugal in praise. However, I think that the vehicle registry system in Switzerland, compared to neighboring countries, is so simple and well designed that it merits a few paragraphs of kudos.

Every car in Switzerland needs to be registered with the Vehicle Registration Service (VRS). As proof of such registration, the car owner gets a card colored in gray, called the "gray card". If and when the police stop a car, they always ask for the driver's driving license and the vehicle's gray card. The VRS will deliver a gray card for a given vehicle if and only if the vehicle is insured for damages caused to third parties. Anyone can register any car; all you need is an insurance certificate. In particular, you don't need proof of ownership.

License plates are the personal property of the owner, with a lifecycle distinct from that of the vehicle. For example, when you unregister a vehicle by depositing the gray card at the VRS, you can keep the license plates! Moreover, your registration is still valid until midnight of the same day. Why does this matter?

Well, it matters because it allows for a very smooth procedure for selling or purchasing cars. Given a signed sales contract, as the owner or a representative of the owner, you can unregister a vehicle at the VRS, deliver the car to its new owner located anywhere within a day's driving range, pocket the agreed sum of money, unscrew the license plates from the car and drive home (in another vehicle). At no time is there overlap in the responsibility of the car owners. The fact that anyone in possession of the gray card can unregister a vehicle simplifies things to a large degree. A friend or your friendly neighborhood garage can perform all registration operations at the VRS on your behalf.

Considering the complexity of the issue, and the broken practices of other countries, Switzerland is a shining and rare example of an efficient bureaucracy. I mean efficient in the sense that it efficiently renders a service to the public. Other bureaucracies seem to be efficient at perpetuating their own existence.

Tuesday, January 06, 2009

Indoctrination and anger

The front page of every single European newspaper carries news about Israeli military operations in Gaza. At best the titles talk about a military "offensive" and at worst about "crimes against humanity".

While most reasonable people will grant Israel the right of self-defense, at least in principle, there seems to be an emerging consensus about the disproportionality of Israel's reaction. Undeniably, the victims on the Palestinian side vastly outnumber the Israeli victims. Without trying to minimize the loss of human life, innocent or not, illegitimate as well as legitimate use of force is usually disproportionate. Let us not forget that each of the rockets fired at Israeli cities is nothing less than an unsuccessful murder attempt. When nations are under serious threat, they react with whatever force they have got. Had the Russians been subjected to the same threats as Israel, there would be no Gaza to talk about. It would have been leveled to the ground as was the case in Chechnya and more recently but to a lesser extent in Georgia.

There are those who assert that the recent violence plants the seeds of hatred for future generations. But as Bret Stephens points out, how is a two-state solution supposed to come about by allowing Hamas to rule half of a presumptive Palestinian state?

Still in "The Wall Street Journal", Zeev Maghen writes about the difference between hate and indoctrination. Of the thousands of articles on the Middle East I have read in the past 20 years, it is probably one of the best, chillingly so.

Monday, December 22, 2008

Is Apache a meritocracy?

Yes, it is, but in my humble opinion, not consistently.

Apache defines itself as a meritocracy. At the same time, the foundation's mission statement includes references to "collaborative software development". If I had to reduce the ASF into a single word, it would be collaboration.

The ASF is exemplary in the way it welcomes newcomers. Users are encouraged to post questions, however trivial, and developers are encouraged to make contributions, however minimal. The ASF culture is about openness and collaboration. Participation, even critical, is welcomed.

In his description of the ASF, Lars Eilebrecht talks about the chain of merit. A person is first a user of software developed by a given Apache project, then becomes a committer, and later Project Management Committee (PMC) member, and usually after several years of devoted service, an elected member of the ASF. Thus, the "ideal" career path at Apache is user, committer, PMC member and ASF member.

In the majority of cases, a project committer is also a PMC member. So, committer status is accompanied by voting rights. The said voting rights include veto power. If for any given decision, a committer votes -1 (a veto), then a 3/4 majority is required to override that veto. Given that a -1 vote effectively shuts down any proposal, most committers are very reluctant to use their veto rights. However, some committers, not realizing the destructive power of their veto, make cavalier use of it.

Certain projects have an imbalanced committer structure, with one highly active developer, usually the project founder, accompanied by several less active committers. In such a project, given the 3/4 majority rule, it is entirely possible for a new and highly self-confident committer to bring a project to its knees by simply vetoing (or hinting at veto) new proposals. This deadlock can easily become permanent if the dissenting new committer can gain the support of just one other committer.
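The arithmetic behind that deadlock is easy to sketch. The snippet below is purely illustrative and rests on assumptions of mine: that every committer votes, that "3/4 majority" means at least three quarters of all committers, and that the class and method names are made up. It is not a description of any official ASF voting procedure.

```java
class VetoSketch {

  // Assumption: a veto is overridden when at least 3/4 of all committers
  // support the proposal, and everyone who is not dissenting supports it.
  static boolean vetoOverridden(int committers, int dissenters) {
    int supporters = committers - dissenters;
    return 4 * supporters >= 3 * committers;   // integer form of supporters/committers >= 3/4
  }

  public static void main(String[] args) {
    // One vetoing committer plus one ally, in projects of various sizes.
    for (int committers = 3; committers <= 8; committers++) {
      System.out.println(committers + " committers, 2 dissenters: overridden = "
          + vetoOverridden(committers, 2));
    }
  }
}
```

Under these assumptions, two dissenting committers are enough to block any proposal in a project with seven committers or fewer.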

Had the Apache model been a completely fair meritocracy, a very active developer would have more voting power than a much less active developer. However, if the active developer dares to mention that he (or she) is more active, then, according to the currently prevalent ASF culture, he will lose a great deal of credibility. At the present time, hinting at "leadership" or extra merit is a socially unacceptable attitude at Apache. In other words, within the bounds of a single project and its associated group of committers, not only is Apache not a meritocracy, it involuntarily promotes a situation which can be described as "representation without taxation", a tongue-in-cheek reference to the slogan emblematic of the American Revolution.

In defense of the Apache model, I should observe that it is very simple. Treating everyone equally is the simplest representation model imaginable, and in case you had any doubts, simplicity is a highly desirable characteristic to have. The perfect equality in addition to veto powers gives everyone ample representation power, thus encouraging participation. As a committer, you cannot reasonably say "Why should I vote? My vote does not count." Nevertheless, the fact that a committer's representation powers do not increase proportionally with his contributions is cause for concern.

Do you think you can come up with a better model? If you do, I welcome your proposals describing social mechanisms where one's representation powers are adjusted according to one's contributions.

Tuesday, November 18, 2008

Interest in logback

One of the niceties of open-source development is that once in a while people will make valuable contributions. By valuable contribution, I mean code that shows intimate knowledge and understanding of the software. In my estimation, in the log4j project, we got one valuable contribution every semester, roughly speaking. To my astonishment, in the logback project, log4j's successor, we get a valuable contribution every two weeks, again roughly speaking.

I really can't explain it. Log4j has more users than logback. So it is definitely not the size of the user-base. You have two similar projects. They have similar code bases and similar architecture. Log4j is much better known, yet it is logback that gets the more frequent contributions.

Logback is newer, and has not reached 1.0 status, so perhaps developers feel more confident that their input will be acted upon. Also, logback is functionally richer so it may be more attractive to contributors. It may also be just luck, although given the nature of independent random processes I doubt it.

Tuesday, August 19, 2008

HTH or HTT?

Assuming we've got a long random string composed of the letters H and T, and only those two letters, which of the patterns "HTH" or "HTT" (overlap not allowed) do you think will occur more frequently?

For example, if overlap is not allowed, in the string "HTHTHTTH", both "HTH" and "HTT" occur once. However, if we allow for overlap, then "HTH" occurs twice and "HTT" once.

So, the question is, which of the patterns "HTH" or "HTT" (without overlap) do you think will occur more frequently? Will "HTH" occur at a higher, the same or lower frequency than "HTT"?

You must give an answer.

To find out the answer run the Simulation application written in Java.

This question is an adaptation of the question asked by Peter Donnelly in his TED presentation on "How juries are fooled by statistics."
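For readers who would rather experiment on their own, the counting rule described above can be sketched as follows. This is not the Simulation application mentioned earlier, only a minimal stand-in: the class name PatternCountSketch, the methods count and randomFlips, the string length and the seed are all arbitrary choices made for illustration.

```java
import java.util.Random;

class PatternCountSketch {

  // Counts occurrences of pattern in s, scanning left to right.
  // Without overlap, the scan resumes after the end of each match;
  // with overlap, it resumes one character after the start of each match.
  static int count(String s, String pattern, boolean allowOverlap) {
    int count = 0;
    int from = 0;
    while (true) {
      int idx = s.indexOf(pattern, from);
      if (idx < 0) {
        return count;
      }
      count++;
      from = allowOverlap ? idx + 1 : idx + pattern.length();
    }
  }

  // Builds a random string of the given length made only of 'H' and 'T'.
  static String randomFlips(int length, long seed) {
    Random random = new Random(seed);
    StringBuilder sb = new StringBuilder(length);
    for (int i = 0; i < length; i++) {
      sb.append(random.nextBoolean() ? 'H' : 'T');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String flips = randomFlips(1_000_000, 42L);
    System.out.println("HTH (no overlap): " + count(flips, "HTH", false));
    System.out.println("HTT (no overlap): " + count(flips, "HTT", false));
  }
}
```

As a sanity check, calling count on the example string "HTHTHTTH" reproduces the figures given above: one "HTH" and one "HTT" without overlap, two "HTH" and one "HTT" with overlap.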

Friday, August 01, 2008

Subversion+Eclipse on Linux

I usually develop on Windows on a fairly fast PC. However, our continuous integration server runs on Linux. For some reason, some tests were failing intermittently on the integration server. Now, since the integration server was heavily used by other processes and the tests in question were time-sensitive, the next step was to fix whatever bug there was by developing on the integration server. This just shortens the patch+test cycle.

Installing Eclipse Ganymede on this Linux machine was a breeze. Everything just worked out of the box. Well, not everything. SVN integration was missing. Installing Subclipse did not work nor did Subversive, at least not until I stumbled upon Basilikk's blog entry on Eclipse 3.4 Ganymede and Subclipse 1.4.

The critical paragraph was:

The second alternative is to also install the SVNKit adapter. This immediately causes Subclipse to work, there is no need to uninstall the JavaHL adapter. Downside of this of course is that you are using a beta version of the adapter. Hopefully there will be a final version of this adapter and an update to Subclipse 1.4.x to include a stable SVNKit soon.

In short, selecting the "SVNKit Adapter" during the usual installation of Subclipse worked like a charm.

Tuesday, July 29, 2008

Android, the next big thing

While clients all over the world are lining up to buy iPhones, the next big hit may come from Google, in the form of Android. The iPhone is a tightly controlled platform whereas the Android platform is open. This difference may be deemed too geeky to make a difference. I beg to differ.

When people realize that they can make VoIP calls using nearby WiFi networks for free, or VoIP-to-PSTN calls for a 20th of the current price, they'll go berserk.

There are two ingredients missing for this scenario to become reality. First, there must be an Android phone that people can buy off the shelf. Second, people need to start sharing their WiFi networks. This is not happening, or not nearly fast enough. Our minds are imprisoned by the notion that bits are scarce. Bob Frankston argues why this is the case but should not be.

Thursday, July 17, 2008

Putting a face on evil

Samir Kuntar is the perpetrator of a truly gruesome act of violence committed almost 30 years ago. He was released yesterday, on July 16, 2008 as part of a prisoner swap between Israel and Hezbollah. While the negotiations leading to his release are the subject of controversy, I am also quite concerned and shocked by the hero's welcome he received in Lebanon. The guy murdered a little girl with the butt of his rifle, not exactly the kind of person you'd want to root for, especially not in a mass rally.

For me, he represents the face of evil. His nazi salute of the crowd (see photo) just clinches it.

Taking a step back, Sun Tzu said "All warfare is based on deception." This general principle also applies to the Arab-Israeli conflict. Since 1974, the political and psychological aspects of the conflict have had tremendous importance, perhaps even more than purely military considerations. Appearing as murderous fanatics on world TV is probably the last thing you'd want to do in order to win the hearts and minds of the wider public. What may seem like a Hezbollah victory today may be viewed very differently tomorrow. This reminds me of a Zen Master story mentioned in the movie "Charlie Wilson's War".

Do you know the story of Zen Master and the little boy?

There was a little boy. On his 14th birthday, he gets a horse. Everybody in the village says "how wonderful, the boy got a horse" and the Zen Master says "we'll see." Two years later the boy falls off the horse and breaks his leg. Everybody in the village says "how terrible" and the Zen master says "we'll see." Then, a war breaks out, and all the young men have to go and fight. The boy can't because his leg is all messed up. Everybody in the village says "how wonderful" and the Zen Master says...

Thursday, July 10, 2008

The iPhone this, the iPhone that

The new iPhone will be available for sale in Switzerland starting tomorrow. The local press is abuzz with the news. They claim that the iPhone is easy to use, the icons highly legible, the screen wider than the competition, its 3G technology allowing high-speed access to the internet. The iPhone this, the iPhone that.

But, no mention of Skype+iPhone integration. As a reminder, I carry around a telephone to transmit and receive sound, more precisely the sound of voice, mine and the person talking to me. Call and get called, without budgeting a second rent, is what I want from my phone. The ubiquitous Wi-Fi, plus Skype/iPhone combination would give me just that.

Unfortunately, it is not possible to run Skype over the iPhone. Sigh.

If you are looking for a start up idea, I've got one for you: build a phone based on Linux (or Windows), specifically designed to run Skype. I'd buy one, so would the rest of the planet.

Tuesday, July 08, 2008

Fighting spam with open-source tools

Being on many publicly accessible mailing lists exposes one's email address to spammers. So it happens that I receive a hefty amount of spam, i.e. 400 to 500 messages per day. Believe me when I say that pressing the "del" key for 15 minutes is not a good way to start a day.

Over the years I have tried several methods of varying complexity in order to cope with the scourge of spam.

We host our own email server. It uses Postfix. Postfix supports three strategies for filtering content. Initially, the "before queue, external" strategy seemed the most attractive to me. However, most of the examples on the web are based on the "after queue, external" strategy. So I decided to follow recipes found on the web for integrating SpamAssassin and Postfix. Actually, the basic integration example found on the SpamAssassin wiki is an excellent start. Once you've got the basic version working, a simple variation thereof gives you the ability to quarantine messages. If you are not satisfied by just following recipes but would like to understand how the various pieces fit together, then you should read "Fighting Spam and SpamAssasin and Postfix".

Being essentially a rule engine, SpamAssassin can be integrated with other spam-fighting tools such as DCC and Razor. Given that we use Gentoo as our Linux distribution, for us this was as easy as issuing the following two commands as root:

emerge mail-filter/razor
emerge mail-filter/dcc

I then had to change the following two lines in the /etc/spamassassin/local.cf file, from:

use_razor2  0
use_dcc  0

to:

use_razor2  1
use_dcc  1

SpamAssassin assigns a score to each message it filters. SpamAssassin was configured to consider messages with scores of 6.0 or above as spam. Such messages have their subject lines modified to contain "**** SPAM ****" as a prefix. SpamAssassin, or any spam-fighting tool, may incorrectly identify a legitimate message as spam. Such occurrences are called false positives. They must be avoided, unless you don't mind losing legitimate correspondence.

To reduce the probability of false positives, we adopted a two-pronged strategy. Messages with high spam scores, 9.0 and above, are quarantined in a special directory accessible only to the system administrator. They are not delivered to the mailbox of users. However, messages scoring "low", between 6.0 and 9.0, are still delivered to the user, with "**** SPAM ****" added to the subject line. With most email clients, e.g. Thunderbird, it is rather easy to create a filtering rule which automatically moves messages with a subject line containing "**** SPAM ****" to a special SPAM folder. We thus give the user an opportunity to double check low-scoring spam messages before deleting them.
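The two-pronged policy boils down to a pair of thresholds. The sketch below restates it in Java purely to make the decision rule explicit; it is not how the policy is actually wired into Postfix and SpamAssassin, and the class, enum and method names are hypothetical.

```java
class SpamRoutingSketch {

  enum Action { DELIVER, TAG_AND_DELIVER, QUARANTINE }

  // Thresholds taken from the text above.
  static final double TAG_THRESHOLD = 6.0;
  static final double QUARANTINE_THRESHOLD = 9.0;
  static final String SPAM_PREFIX = "**** SPAM **** ";

  // Decides what happens to a message given its spam score.
  static Action route(double score) {
    if (score >= QUARANTINE_THRESHOLD) {
      return Action.QUARANTINE;        // kept away from user mailboxes
    }
    if (score >= TAG_THRESHOLD) {
      return Action.TAG_AND_DELIVER;   // delivered, but marked in the subject line
    }
    return Action.DELIVER;             // delivered untouched
  }

  // Returns the subject line the user would see for a delivered message.
  static String subjectFor(double score, String subject) {
    return route(score) == Action.TAG_AND_DELIVER ? SPAM_PREFIX + subject : subject;
  }

  public static void main(String[] args) {
    double[] scores = {1.2, 6.0, 8.9, 9.0, 15.3};
    for (double score : scores) {
      System.out.println(score + " -> " + route(score));
    }
  }
}
```

The point of the middle band is that a mistake there costs little: a legitimate message scoring between 6.0 and 9.0 still reaches the user, only with a modified subject.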

With this configuration, in the last 24 hours SpamAssassin has quarantined 700 messages as spam (scores of 9.0 or above) for our whole site. My own various mailboxes received 50 low-scoring spam messages. Since such messages are filtered automatically, I am not distracted by them. Only 30 messages reached my mailbox in pristine form, of which 20 were legitimate and 10 were spam. This is such a huge improvement over the 20 spam to one legitimate message ratio we had previously.

For the next couple of weeks I will be looking at the high-scoring quarantined messages to check whether legitimate messages were mistakenly identified as spam (false positives). So far, I am happy to report that there were no false positives in the 700 high-scoring messages nor in the 50 low-scoring messages.

Friday, June 27, 2008

Does Gentoo make sense?

When I mention to colleagues in the IT industry that compiling packages before installing them on a computer is a good thing, they either give me a blank look or an ever so slight smirk. What is the point of wasting many hours waiting for some package to compile instead of fetching the binaries from a repository and having it installed in seconds?

Come to think of it, I am actually writing this while waiting for Gentoo to upgrade GCC from 3.4.4 to 4.3.1. It may not sound like much but it's actually a big deal. GCC is probably the package that takes the longest to build, on the order of two hours, even on recent dual-core 64bit machines.

Portage, Gentoo's package management system, when installing a package, say X, will fetch X's sources from some repository, and then build X from the sources. For example, if package X was written in C, it will compile the sources and then link the resulting binary files into an executable program. As mentioned previously, this process of building from sources can take from minutes to several hours depending on the package and its dependencies. Note that if package X requires packages A, B and C, and B requires D and E, and D requires F, Portage will build A, B, C, D, E and F in the correct order.
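What "the correct order" means can be illustrated with a small depth-first traversal that emits each package only after everything it requires. This is a generic topological-ordering sketch applied to the X, A to F example above, with made-up class and method names; I am not claiming it is how Portage itself resolves dependencies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BuildOrderSketch {

  // Returns an order in which target and its dependencies can be built,
  // every package appearing after the packages it requires.
  static List<String> buildOrder(String target, Map<String, List<String>> requires) {
    List<String> order = new ArrayList<>();
    visit(target, requires, new HashSet<>(), new HashSet<>(), order);
    return order;
  }

  private static void visit(String pkg, Map<String, List<String>> requires,
      Set<String> done, Set<String> inProgress, List<String> order) {
    if (done.contains(pkg)) {
      return;                                  // already scheduled
    }
    if (!inProgress.add(pkg)) {
      throw new IllegalStateException("circular dependency at " + pkg);
    }
    for (String dep : requires.getOrDefault(pkg, Collections.emptyList())) {
      visit(dep, requires, done, inProgress, order);
    }
    inProgress.remove(pkg);
    done.add(pkg);
    order.add(pkg);                            // all requirements are already in the list
  }

  // The dependency graph from the example: X needs A, B, C; B needs D, E; D needs F.
  static Map<String, List<String>> example() {
    Map<String, List<String>> requires = new HashMap<>();
    requires.put("X", Arrays.asList("A", "B", "C"));
    requires.put("B", Arrays.asList("D", "E"));
    requires.put("D", Arrays.asList("F"));
    return requires;
  }

  public static void main(String[] args) {
    System.out.println(buildOrder("X", example())); // prints [A, F, D, E, B, C, X]
  }
}
```

Other valid orders exist; the only constraint is that F precedes D, that D and E precede B, and that A, B and C precede X.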

Clearly, building from sources is much slower than fetching the binary package. But building from sources will implicitly check that the required dependencies for the package under construction are available. If X requires A, B, C, D, E and F, directly or indirectly, and any of those six packages is missing, then X won't compile and hence will not install. Thus, if Portage is able to install X, then you can be fairly confident that it is installed correctly on your system. Of course, you would still need to configure X according to your needs, but as far as the binaries of X and its dependencies are concerned, you are reasonably safe.

Contrast it with installing binary packages. You can never be sure that you are not missing a library or that an installed library does not have a conflicting version. Conceptually, Gentoo vs. Ubuntu is analogous to compiled and statically typed languages, e.g. C++ or Java, versus interpreted and dynamically typed languages, e.g. Python or Ruby.

Interpreted and dynamically typed languages enjoy a shorter development cycle but are somewhat more brittle whereas compiled and statically typed languages have a slower development cycle but are often deemed more reliable.

Another analogy would be an RDBMS enforcing data integrity constraints e.g. MySQL+InnoDB versus an RDBMS ignoring data integrity constraints, e.g. MySQL+MyISAM.

As it stands, Portage is still building GCC.

Monday, April 07, 2008

The pomodoro technique

The pomodoro technique has been recently presented by Matteo Vaccari and Federico Gobbo at XPDays Benelux.

Essentially, it consists of dividing your workday into uninterrupted chunks of 25 minutes, plus 5 minute pauses. You set a by setting a kitchen timer to go off in 25 minutes and do whatever work you need to get done without letting yourself being interrupted by external or internal (yourself) distractions.

It's a very simple, helpful and enjoyable technique.

Testable by design

In the first few months after beginning to systematically unit test components, most developers I have encountered will have qualms about modifying the component under test (CUT) so that it is easier to test. In a typical scenario, the component under test will need to expose class members for testing purposes. However, exposing members goes against the information hiding principle. Thus, an otherwise well-intentioned developer may go to great lengths to perform the tests without exposing new members, or alternatively may expose said members but feel guilty about compromising the information hiding principle.

It takes some experience to realize that exposing such members is a low price to pay for increased testability. At some later stage, you may even begin considering tests as an important force driving the design of your components.

Testable components expose more of their internal state to the outside world. They also tend to be more modular. For example, if a CUT requires a database connection, then the developer might modify it so that it accepts a DAO, and then inject a mock of the DAO backed by a mock database, allowing the CUT to be tested independently of the real database.
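As a rough illustration of the DAO idea, the sketch below passes the DAO to the component through its constructor and substitutes a hand-written in-memory stand-in during testing. All names (UserDao, Greeter, InMemoryUserDao) are hypothetical and are not taken from logback or any other project; a mocking library could play the same role as the in-memory class.

```java
import java.util.HashMap;
import java.util.Map;

public class TestableByDesignSketch {

  // Hypothetical DAO abstraction; a production implementation would query the database.
  interface UserDao {
    String findNameById(int id);
  }

  // The component under test receives its DAO instead of opening a connection itself.
  static class Greeter {
    private final UserDao dao;

    Greeter(UserDao dao) {
      this.dao = dao;
    }

    String greet(int id) {
      String name = dao.findNameById(id);
      return name == null ? "Hello, stranger" : "Hello, " + name;
    }
  }

  // Hand-written in-memory stand-in for the database-backed DAO.
  static class InMemoryUserDao implements UserDao {
    private final Map<Integer, String> rows = new HashMap<>();

    void put(int id, String name) {
      rows.put(id, name);
    }

    @Override
    public String findNameById(int id) {
      return rows.get(id);
    }
  }

  public static void main(String[] args) {
    InMemoryUserDao dao = new InMemoryUserDao();
    dao.put(1, "Alice");
    Greeter greeter = new Greeter(dao);
    System.out.println(greeter.greet(1)); // prints "Hello, Alice"
    System.out.println(greeter.greet(2)); // prints "Hello, stranger"
  }
}
```

The price is one extra constructor parameter on the component; in exchange, its behavior can be checked without any database being present.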

Persistence, concurrency, distribution and transaction support are accepted to be wide-reaching and fundamental aspects of a component. For instance, a component designed for persistence may differ substantially from the same component without persistence support. In a similar vein, a component designed with testability in mind will show different characteristics than its less testable counterpart. In particular and as mentioned above, testable components tend to be much more modular. It follows that modularity is a corollary of testability.

It should be noted that it takes significant effort to develop and maintain test code. In the logback project, we spend about 30% of our time on documentation, at least 50% maintaining tests, with only the remaining 20% of time spent on the actual code. Writing good test code requires real skill.

Monday, March 31, 2008

Obsessed by Gentoo

After my previous article about Gentoo, I had the audacity to update the "mailman" package from version 2.1.5 to 2.1.9. While the mailman code itself had not changed much, Gentoo's way of structuring the installed package had changed. It took me about 4 hours of uninterrupted work to get our mailing lists working again.

At this stage, you may think that I had learned my lesson, and would not embark on any new adventures. I had not yet realized my luck so far and would soon pay the price of my temerity.

On Torino, another production machine running Gentoo, my subsequent attempts at upgrading X11 and Apache resulted in complete failures. Notwithstanding my relentless attempts, Torino can no longer run X11, due to a suspected gcc/glibc versioning problem.

Updating Apache from 2.0.x to 2.2.x had an interesting twist of its own. As with mailman, Gentoo's packaging structure for Apache changed between 2.0.x and 2.2.x. More specifically, the directives for specifying which modules were included in the server changed. Fortunately, there were instructions on migrating from the old to the new structure. It took me about 2 hours to understand that the migration instructions had a little bug. Dashes within package names had to be written with underscore characters. After that tweak, the new build of the Apache server included all the required modules.

For our web-applications, we rely heavily on reverse proxying, that is on Apache's mod_proxy module. This module witnessed substantial enhancements between Apache 2.0.x and 2.2.x. Given that Torino is a production server, I had only a limited number of hours to perform the migration. At about 1 AM Monday morning, manually reverting to Apache 2.0.x was the only remaining option.

As I understand it, Gentoo supports the installation of only a single version of any given application package. It does not support the simultaneous installation of several versions of the same package. In the Apache case, it would have been nice to have Apache versions 2.0.x and 2.2.x installed side by side. Alternatively, it would have been acceptable if Gentoo allowed me to revert to an older version of Apache. However, it seems that Gentoo supports only one path for updates, i.e. upgrades.

In conclusion, while Gentoo's package management mechanism is pretty fantastic, it still does not allow for seamless upgrades. Others have made similar observations.

Wednesday, March 26, 2008

Fascinated by Gentoo

Gentoo is a Linux meta-distribution where new packages are compiled on your machine before they are installed. We chose Gentoo about three years ago because it was well documented and also the only distribution that supported our AMD 64-bit machines.

In the last three years, we never felt the need to perform regular updates. However, yesterday I noticed that on one particular machine, the log files were getting very large. Switching to syslog-ng instead of the good-ol syslogd package seemed the right thing to do. However, since we never upgraded the platform, the view of the available packages, i.e. the portage tree on the host, was too old. Thus, the installation of syslog-ng failed. The portage tree needed to be upgraded. By the way, portage is Gentoo's installation framework.

Thus, I updated the portage tree by issuing an "emerge --sync" command. However, in the meantime the package description format had changed so that the version of portage on the host could not read the updated portage tree. It appeared as if the whole portage tree was corrupt. Thus, a chicken-and-egg situation emerged. I could not install the latest version of portage because my portage tree was unreadable by the current portage software.

Anyway, once I realized what was going on, I copied over an older version of the portage tree from a backup, installed a new version of portage and then updated to the latest portage tree.

Even after this relatively troublesome event, I still love Gentoo and the stability it provides. Our Linux systems just work without a reboot for years on end. The latest experience notwithstanding, it's usually very easy to install or update new packages.

More generally, dependency management is one of the key features of any serious software platform. For instance, Maven, a Java build tool, is becoming increasingly popular, imho mainly because it helps with dependency management.

Sunday, January 06, 2008

How efficient is a Prius?

I received my brand new Toyota Prius three weeks ago. Technologically, it seems like quite a remarkable car. I say seems because I don't really know much about cars. I can't really tell whether the electric motor is anything more than a marketing gimmick, especially considering its ridiculously low range (about 1 km).

Obviously, the Prius is noteworthy not because of its powerful engine -- its accelerations won't ever rivet you to your seat -- but because it is supposed to use little gas. Anyway, after three weeks of driving, today it alerted me through a blinking square on the dashboard that it needed a refill. When I brought it to the gas station, her odometer was showing 670 km. Her tank was full after drinking 41.25 liters of unleaded gasoline, which brings me to my main point.

Assuming the gas tank was completely full when I got it (which is a somewhat iffy proposition), my Prius consumed an average of 6.1 liters per 100 kilometers. By the way, 6.1 l/100km is equivalent to 38.56 MPG (US gallons). This result is considerably worse than the mileage advertised by Toyota, i.e. 4.4 liters per 100 kilometers, or 53.46 MPG. Nevertheless, the Prius still emerges as being more fuel efficient than most other cars.
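For readers who want to redo the arithmetic, here is a small sketch of the two conversions involved. The class and method names are made up for illustration; the constants are the standard definitions of the US gallon (3.785411784 liters) and the mile (1.609344 km). Note that the unrounded ratio 41.25 / 670 comes out closer to 6.16 l/100km, so the MPG figure computed from the raw measurements is slightly lower than the one obtained from the rounded 6.1.

```java
public class FuelEconomy {

  static final double LITERS_PER_US_GALLON = 3.785411784;
  static final double KM_PER_MILE = 1.609344;

  // Fuel consumption in liters per 100 km for a given amount of fuel and distance.
  static double litersPer100Km(double liters, double km) {
    return liters / km * 100.0;
  }

  // Converts liters per 100 km into miles per US gallon.
  static double toMpgUs(double litersPer100Km) {
    double miles = 100.0 / KM_PER_MILE;                      // 100 km expressed in miles
    double gallons = litersPer100Km / LITERS_PER_US_GALLON;  // fuel used over that distance
    return miles / gallons;
  }

  public static void main(String[] args) {
    // First tank: 41.25 liters over 670 km.
    double consumption = litersPer100Km(41.25, 670);
    System.out.printf("%.2f l/100km = %.2f MPG%n", consumption, toMpgUs(consumption));
    // Figures quoted in the text, starting from the rounded values.
    System.out.printf("6.1 l/100km = %.2f MPG%n", toMpgUs(6.1));
    System.out.printf("4.4 l/100km = %.2f MPG%n", toMpgUs(4.4));
  }
}
```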

Of course, a single measurement is not necessarily representative, especially considering that those 670 km included a drive to a nearby mountain. After googling for a few minutes, I stumbled upon a US government page showing the MPG obtained by other drivers. My MPG happens to be worse than the average shown on that page.

Update

The second time around, I measured 810 km for 42.02 liters of unleaded gasoline, or 5.2 liters per 100 km (45.3 MPG). This result is very much aligned with the average reported by other drivers. It could probably be improved, as the 810 km included a trip to a nearby mountain.

The difference from the first result can be explained by the fact that on the highway I now drive a little below the authorized limit, at 110 km/h instead of 120 km/h.

Tuesday, December 04, 2007

XP Days conference

Here are a few lines describing my impressions of the XPDays Benelux conference that took place in Belgium a few weeks ago.

Organization of the conference
With 115 participants, and 4 parallel sessions, the conference had a friendly and personal atmosphere. It was also very well organized. At the beginning of each day, the presenters had 60 seconds to stand up and "sell" their session. This made it easier to choose among the 4 parallel sessions.

Product owner
In one of the hands-on sessions, we learned how important it was to have a product owner (PO) closely involved in the project. XP and Scrum talk about "customer on site". This point was also mentioned by other participants in informal chats. It became clear that having a readily accessible PO, someone capable of deciding on and prioritizing the product feature set, made a big difference.

Retrospectives
In my humble opinion, retrospectives are one of the most powerful ideas from the XP/agile world. Basically, it means that the team members take the time to reflect on the various processes and improve upon them. Retrospectives happen frequently, differentiating them from project post-mortems. At the end of the first day of the conference, the organizers had a retrospective on the conference itself, improving it on the fly.

TDD (test driven development)
An excellent development practice, but one which can end up warping your mind. I thought I had been practicing TDD for some time, but apparently not well enough in the opinion of the purists. Supposedly, you have to make a consistent effort to come up with the tiniest possible change to the implementation that is barely sufficient to make the tests pass. It made me feel like my mind was in shackles. Apparently, you get used to it. I hope I never do.

In other sessions, I learned that tests can be considered as a specification. As such, the test phase is more akin to design.

To write maintainable tests, you can start by asking yourself whether by (only) reading the test code one can come up with the solution, i.e. implementation. Once you do that, you can start viewing the test code as the origin of the implementation.

Teams
Teams need time to gel. The arrival or the departure of a member will disturb the team dynamic. Some people talk about a new team after any change to the team. It may sound extreme but I think there is some truth to it.

Agility and co.
I was surprised to discover that agile methods require a lot of discipline. XP and Scrum define detailed procedures that some people follow religiously. The no-compromise/no-prisoners-taken/all-or-nothing approach of certain participants seemed disturbingly martial, on the verge of the intolerant.

Having said that, there are many excellent ideas brewing in the Agile world. Next time you stumble upon an XPDays conference in your neighborhood, I'd recommend that you attend.