Monday, December 22, 2008

Is Apache a meritocracy?

Yes, it is, but in my humble opinion, not consistently.

Apache defines itself as a meritocracy. At the same time, the foundation's mission statement includes references to "collaborative software development". If I had to reduce the ASF into a single word, it would be collaboration.

The ASF is exemplary in the way it welcomes newcomers. Users are encouraged to post questions, however trivial, and developers are encouraged to make contributions, however minimal. The ASF culture is about openness and collaboration. Participation, even critical, is welcomed.

In his description of the ASF, Lars Eilebrecht talks about the chain of merit. A person is first a user of software developed by a given Apache project, then becomes a committer, later a Project Management Committee (PMC) member, and usually, after several years of devoted service, an elected member of the ASF. Thus, the "ideal" career path at Apache is user, committer, PMC member and ASF member.

In the majority of cases, a project committer is also a PMC member. So, committer status is accompanied by voting rights. Said voting rights include veto power. If for any given decision, a committer votes -1 (a veto), then a 3/4 majority is required to override that veto. Given that a -1 vote effectively shuts down any proposal, most committers are very reluctant to use their veto rights. However, some committers, not realizing the destructive power of their veto, make cavalier use of it.

Certain projects have an imbalanced committer structure, with one highly active developer, usually the project founder, accompanied by several less active committers. In such a project, given the 3/4 majority rule, it is entirely possible for a new and highly self-confident committer to bring a project to its knees by simply vetoing (or hinting at a veto of) new proposals. This deadlock can easily become permanent if the dissenting new committer can gain the support of just one other committer.

Had the Apache model been a completely fair meritocracy, a very active developer would have more voting power than a much less active developer. However, if the active developer dares to mention that he (or she) is more active, then, according to the currently prevalent ASF culture, he will lose a great deal of credibility. At present, hinting at "leadership" or extra merit is a socially unacceptable attitude at Apache. In other words, within the bounds of a single project and its associated group of committers, not only is Apache not a meritocracy, it involuntarily promotes a situation which can be described as "representation without taxation", a tongue-in-cheek reference to the slogan emblematic of the American Revolution.

In defense of the Apache model, I should observe that it is very simple. Treating everyone equally is the simplest representation model imaginable, and in case you had any doubts, simplicity is a highly desirable characteristic to have. Perfect equality, in addition to veto powers, gives everyone ample representation power, thus encouraging participation. As a committer, you cannot reasonably say "Why should I vote? My vote does not count." Nevertheless, the fact that a committer's representation powers do not increase proportionally with his contributions is cause for concern.

Do you think you can come up with a better model? If you do, I welcome your proposals describing social mechanisms where one's representation powers are adjusted according to one's contributions.

Tuesday, November 18, 2008

Interest in logback

One of the niceties of open-source development is that once in a while people will make valuable contributions. By valuable contribution, I mean code that shows intimate knowledge and understanding of the software. By my estimate, in the log4j project, we got one valuable contribution every semester, roughly speaking. To my astonishment, in the logback project, log4j's successor, we get a valuable contribution every two weeks, again roughly speaking.

I really can't explain it. Log4j has more users than logback. So it is definitely not the size of the user-base. You have two similar projects. They have similar code bases and similar architecture. Log4j is much better known, yet it is logback that gets the more frequent contributions.

Logback is newer, and has not reached 1.0 status, so perhaps developers feel more confident that their input will be acted upon. Also, logback is functionally richer so it may be more attractive to contributors. It may also be just luck, although given the nature of independent random processes I doubt it.

Tuesday, August 19, 2008

HTH or HTT?

Assuming we've got a long random string composed of the letters H and T, and only those two letters, which of the patterns "HTH" or "HTT" (overlap not allowed) do you think will occur more frequently?

For example, if overlap is not allowed, in the string "HTHTHTTH", both "HTH" and "HTT" occur once. However, if we allow for overlap, then "HTH" occurs twice and "HTT" once.

So, the question is, which of the patterns "HTH" or "HTT" (without overlap) do you think will occur more frequently? Will "HTH" occur at a higher, the same or lower frequency than "HTT"?

You must give an answer.

To find out the answer run the Simulation application written in Java.

This question is an adaptation of the question asked by Peter Donnelly in his TED presentation on "How juries are fooled by statistics."
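
For readers who prefer to experiment, here is a minimal sketch of the counting rule described above. It is not the Simulation application mentioned in this post; the class name PatternCount, the method names, the seed and the string length are all assumptions chosen for illustration. After a match, the scan either moves forward by one character (overlap allowed) or jumps past the whole pattern (overlap not allowed).

```java
import java.util.Random;

// Illustrative sketch only: names, seed and string length are assumptions.
public class PatternCount {

    // Counts occurrences of pattern in text, scanning left to right.
    // Without overlap, the scan resumes just past each match.
    static int count(String text, String pattern, boolean allowOverlap) {
        int count = 0;
        int i = 0;
        while (i + pattern.length() <= text.length()) {
            if (text.startsWith(pattern, i)) {
                count++;
                i += allowOverlap ? 1 : pattern.length();
            } else {
                i++;
            }
        }
        return count;
    }

    // Builds a random string of n letters, each H or T with equal probability.
    static String randomFlips(int n, long seed) {
        Random random = new Random(seed);
        StringBuilder sb = new StringBuilder(n);
        for (int k = 0; k < n; k++) {
            sb.append(random.nextBoolean() ? 'H' : 'T');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String flips = randomFlips(1_000_000, 42L);
        System.out.println("HTH (no overlap): " + count(flips, "HTH", false));
        System.out.println("HTT (no overlap): " + count(flips, "HTT", false));
    }
}
```

On the example string "HTHTHTTH" this counting rule gives one "HTH" and one "HTT" without overlap, and two "HTH" with overlap, in line with the example above.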

Friday, August 01, 2008

Subversion+Eclipse on Linux

I usually develop on Windows on a fairly fast PC. However, our continuous integration server runs on Linux. For some reason, some tests were failing intermittently on the integration server. Now, since the integration server was heavily used by other processes and the tests in question were time-sensitive, the next step was to fix whatever bug there was by developing on the integration server. This just shortens the patch+test cycle.

Installing Eclipse Ganymede on this Linux machine was a breeze. Everything just worked out of the box. Well, not everything. SVN integration was missing. Installing Subclipse did not work nor did Subversive, at least not until I stumbled upon Basilikk's blog entry on Eclipse 3.4 Ganymede and Subclipse 1.4.

The critical paragraph was:
The second alternative is to also install the SVNKit adapter. This immediately causes Subclipse to work, there is no need to uninstall the JavaHL adapter. Downside of this of course is that you are using a beta version of the adapter. Hopefully there will be a final version of this adapter and an update to Subclipse 1.4.x to include a stable SVNKit soon.

In short, selecting the "SVNKit Adapter" during the usual installation of Subclipse worked like a charm.

Tuesday, July 29, 2008

Android, the next big thing

While customers all over the world are lining up to buy iPhones, the next big hit may come from Google, in the form of Android. The iPhone is a tightly controlled platform whereas the Android platform is open. This difference may be deemed too geeky to make a difference. I beg to differ.

When people realize that they can make VoIP calls using nearby WiFi networks for free, or VoIP-to-PSTN calls for a 20th of the current price, they'll go berserk.

There are two ingredients missing for this scenario to become reality. First, there must be an Android phone that people can buy off the shelf. Second, people need to start sharing their WiFi networks. This is not happening, or not nearly fast enough. Our minds are imprisoned by the notion that bits are scarce. Bob Frankston argues why this is the case but should not be.

Thursday, July 17, 2008

Putting a face on evil

Samir Kuntar is the perpetrator of a truly gruesome act of violence committed almost 30 years ago. He was released yesterday, on July 16, 2008 as part of a prisoner swap between Israel and Hezbollah. While the negotiations leading to his release are the subject of controversy, I am also quite concerned and shocked by the hero's welcome he received in Lebanon. The guy murdered a little girl with the butt of his rifle, not exactly the kind of person you'd want to root for, especially not in a mass rally.

For me, he represents the face of evil. His Nazi salute of the crowd (see photo) just clinches it.

Taking a step back, Sun Tzu said "All warfare is based on deception." This general principle also applies to the Arab-Israeli conflict. Since 1974, the political and psychological aspects of the conflict have had tremendous importance, perhaps even more than purely military considerations. Appearing as murderous fanatics on world TV is probably the last thing you'd want to do in order to win the hearts and minds of the wider public. What may seem like a Hezbollah victory today may be viewed very differently tomorrow. This reminds me of a Zen Master story mentioned in the movie "Charlie Wilson's War".

Do you know the story of Zen Master and the little boy?

There was a little boy. On his 14th birthday, he gets a horse. Everybody in the village says "how wonderful, the boy got a horse" and the Zen Master says "we'll see." Two years later the boy falls off the horse and breaks his leg. Everybody in the village says "how terrible" and the Zen master says "we'll see." Then, a war breaks out, and all the young men have to go and fight. The boy can't because his leg is all messed up. Everybody in the village says "how wonderful" and the Zen Master says...

Thursday, July 10, 2008

The iPhone this, the iPhone that

The new iPhone will be available for sale in Switzerland starting tomorrow. The local press is abuzz with the news. They claim that the iPhone is easy to use, the icons highly legible, the screen wider than the competition, its 3G technology allowing high-speed access to the internet. The iPhone this, the iPhone that.

But, no mention of Skype+iPhone integration. As a reminder, I carry around a telephone to transmit and receive sound, more precisely the sound of voice, mine and the person talking to me. Call and get called, without budgeting a second rent, is what I want from my phone. The ubiquitous Wi-Fi, plus Skype/iPhone combination would give me just that.

Unfortunately, it is not possible to run Skype over the iPhone. Sigh.

If you are looking for a start-up idea, I've got one for you: build a phone based on Linux (or Windows), specifically designed to run Skype. I'd buy one, and so would the rest of the planet.

Tuesday, July 08, 2008

Fighting spam with open-source tools

Being on many publicly accessible mailing lists exposes one's email address to spammers. So it happens that I receive a hefty amount of spam, i.e. 400 to 500 messages per day. Believe me when I say that pressing the "del" key for 15 minutes is not a good way to start a day.

Over the years I have tried several methods of varying complexity in order to cope with the scourge of spam.

We host our own email server. It uses Postfix. Postfix supports three strategies for filtering content. Initially, the "before queue, external" strategy seemed the most attractive to me. However, most of the examples on the web are based on the "after queue, external" strategy. So I decided to follow recipes found on the web for integrating SpamAssassin and Postfix. Actually, the basic integration example found on the SpamAssassin wiki is an excellent start. Once you've got the basic version working, a simple variation thereof gives you the ability to quarantine messages. If you are not satisfied by just following recipes but would like to understand how the various pieces fit together, then you should read "Fighting Spam and SpamAssasin and Postfix".

Being essentially a rule engine, SpamAssassin can be integrated with other spam-fighting tools such as DCC and Razor. Given that we use Gentoo as our Linux distribution, for us this was as easy as issuing the following two commands as root:
emerge mail-filter/razor
emerge mail-filter/dcc

I then had to change the following two lines in the /etc/spamassassin/local.cf file, from:
use_razor2  0
use_dcc 0
to:
use_razor2  1
use_dcc 1

SpamAssassin assigns a score to each message it filters. SpamAssassin was configured to consider messages with scores of 6.0 or above as spam. Such messages have their subject lines modified to contain "**** SPAM ****" as a prefix. SpamAssassin, or any spam-fighting tool, may incorrectly identify a legitimate message as spam. Such occurrences are called false positives. They must be avoided, unless you don't mind losing legitimate correspondence.

To reduce the probability of false positives, we adopted a two-pronged strategy. Messages with high spam scores, 9.0 and above, are quarantined in a special directory accessible only to the system administrator. They are not delivered to the mailbox of users. However, messages scoring "low", between 6.0 and 9.0, are still delivered to the user, with "**** SPAM ****" added to the subject line. With most email clients, e.g. Thunderbird, it is rather easy to create a filtering rule which automatically moves messages with a subject line containing "**** SPAM ****" to a special SPAM folder. We thus give the user an opportunity to double check low-scoring spam messages before deleting them.
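
The two-pronged policy can be summarized in a few lines of code. The sketch below only illustrates the decision rule; it is not how SpamAssassin or Postfix is configured, and the class SpamPolicy, the Action enum and the method names are hypothetical. The two thresholds are the ones quoted above.

```java
// Illustrative sketch of the scoring policy only; all names are hypothetical.
public class SpamPolicy {

    enum Action { DELIVER, TAG_SUBJECT, QUARANTINE }

    static final double TAG_THRESHOLD = 6.0;        // scores from here on are treated as spam
    static final double QUARANTINE_THRESHOLD = 9.0; // scores from here on never reach the user

    // Maps a spam score to what happens to the message.
    static Action classify(double score) {
        if (score >= QUARANTINE_THRESHOLD) {
            return Action.QUARANTINE;
        }
        if (score >= TAG_THRESHOLD) {
            return Action.TAG_SUBJECT;
        }
        return Action.DELIVER;
    }

    // Prefix added to the subject of low-scoring spam that is still delivered.
    static String tagSubject(String subject) {
        return "**** SPAM **** " + subject;
    }

    public static void main(String[] args) {
        double[] scores = {2.5, 6.0, 8.9, 9.0, 14.2};
        for (double score : scores) {
            System.out.println(score + " -> " + classify(score));
        }
    }
}
```

Keeping the two thresholds apart is the point of the design: only the middle band costs the user any attention, and only the top band risks a false positive going unseen.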

With this configuration and in the last 24 hours, SpamAssassin has quarantined 700 messages as spam (scores of 9.0 or above) for our whole site. My own various mailboxes received 50 low-scoring spam messages. Since such messages are filtered automatically, I am not distracted by them. Only 30 messages reached my mailbox in pristine form, of which 20 were legitimate and 10 were spam. This is a huge improvement over the 20 spam to one legitimate message ratio we had previously.

For the next couple of weeks I will be looking at the high-scoring quarantined messages to check whether legitimate messages were mistakenly identified as spam (false positives). I am happy to report that there were no false positives in the 700 high-scoring messages nor in the 50 low-scoring messages.

Friday, June 27, 2008

Does Gentoo make sense?

When I mention to colleagues in the IT industry that compiling packages before installing them on a computer is a good thing, they either give me a blank look or an ever so slight smirk. What is the point of wasting many hours waiting for some package to compile instead of fetching the binaries from a repository and having them installed in seconds?

Come to think of it, I am actually writing this while waiting for Gentoo to upgrade GCC from 3.4.4 to 4.3.1. It may not sound like much but it's actually a big deal. GCC is probably the package that takes the longest to build, on the order of two hours, even on recent dual-core 64-bit machines.

Portage, Gentoo's package management system, when installing a package, say X, will fetch X's sources from some repository, and then build X from the sources. For example, if package X was written in C, it will compile the sources and then link the resulting binary files into an executable program. As mentioned previously, this process of building from sources can take from minutes to several hours depending on the package and its dependencies. Note that if package X requires packages A, B and C, and B requires D and E, and D requires F, Portage will build A, B, C, D, E and F in the correct order.
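
To make "in the correct order" concrete, here is a minimal sketch of dependency ordering using a depth-first traversal, in which a package is emitted only after everything it requires. This illustrates the general technique and is not Portage's actual resolver; the class BuildOrder and its methods are hypothetical, and the code assumes Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; this is not how Portage is implemented.
public class BuildOrder {

    // Returns the packages needed for target, dependencies first.
    static List<String> buildOrder(Map<String, List<String>> deps, String target) {
        List<String> order = new ArrayList<>();
        visit(target, deps, new HashSet<>(), new HashSet<>(), order);
        return order;
    }

    private static void visit(String pkg, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(pkg)) {
            return; // already scheduled
        }
        if (!inProgress.add(pkg)) {
            throw new IllegalStateException("Dependency cycle at " + pkg);
        }
        for (String dep : deps.getOrDefault(pkg, List.of())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(pkg);
        done.add(pkg);
        order.add(pkg); // every dependency of pkg is already in the list
    }

    public static void main(String[] args) {
        // The example from the text: X needs A, B, C; B needs D, E; D needs F.
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("X", List.of("A", "B", "C"));
        deps.put("B", List.of("D", "E"));
        deps.put("D", List.of("F"));
        System.out.println(buildOrder(deps, "X")); // prints [A, F, D, E, B, C, X]
    }
}
```

Note that F is built before D, and D and E before B, even though the text lists the packages alphabetically; any order that respects the "requires" relation is a correct one.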

Clearly, building from sources is much slower than fetching the binary package. But, building from sources will implicitly check that the required dependencies for the package under construction are available. If X requires A, B, C, D, E and F, directly or indirectly, and any of those six packages is missing, then X won't compile and hence will not install. Thus, if Portage is able to install X, then you can be fairly confident that it is installed correctly on your system. Of course, you would still need to configure X according to your needs, but as far as the binaries of X and its dependencies are concerned, you are reasonably safe.

Contrast this with installing binary packages. You can never be sure that you are not missing a library or that an installed library does not have a conflicting version. Conceptually, Gentoo vs. Ubuntu is analogous to compiled and statically typed languages, e.g. C++ or Java, versus interpreted and dynamically typed languages, e.g. Python or Ruby.

Interpreted and dynamically typed languages enjoy a shorter development cycle but are somewhat more brittle whereas compiled and statically typed languages have a slower development cycle but are often deemed more reliable.

Another analogy would be an RDBMS enforcing data integrity constraints e.g. MySQL+InnoDB versus an RDBMS ignoring data integrity constraints, e.g. MySQL+MyISAM.

As it stands, Portage is still building GCC.

Monday, April 07, 2008

The pomodoro technique

The pomodoro technique has been recently presented by Matteo Vaccari and Federico Gobbo at XPDays Benelux.

Essentially, it consists of dividing your workday into uninterrupted chunks of 25 minutes, plus 5-minute pauses. You set a kitchen timer to go off in 25 minutes and do whatever work you need to get done without letting yourself be interrupted by external or internal (yourself) distractions.

It's a very simple, helpful and enjoyable technique.

Testable by design

In the first few months after beginning to systematically unit test components, most developers I have encountered will have qualms about modifying the component under test (CUT) so that it is easier to test. In a typical scenario, the component under test will need to expose class members for testing purposes. However, exposing members goes against the information hiding principle. Thus, an otherwise well-intentioned developer may go to great lengths to perform the tests without exposing new members, or alternatively may expose said members but feel guilty about compromising the information hiding principle.

It takes some experience to realize that exposing such members is a low price to pay for increased testability. At some later stage, you may even begin considering tests as an important force driving the design of your components.

Testable components expose more of their internal state to the outside world. They also tend to be more modular. For example, if a CUT requires a database connection, then the developer might modify it so that it accepts a DAO, and then inject a mock of the DAO in tests, allowing the CUT to be tested independently of the database.
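
As a minimal sketch of that idea, the component below receives its DAO through the constructor, so a test can hand it an in-memory stand-in instead of a database-backed implementation. All names here (UserDao, ContactFormatter, InMemoryUserDao) are hypothetical, and the stand-in is written by hand rather than generated by a mocking library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Illustrative sketch only; every type below is hypothetical.
public class TestableDesign {

    // The seam: the component depends on this interface, not on a database.
    interface UserDao {
        Optional<String> findEmail(String userName);
    }

    // Component under test (CUT): the DAO is injected through the constructor.
    static class ContactFormatter {
        private final UserDao dao;

        ContactFormatter(UserDao dao) {
            this.dao = dao;
        }

        String contactLine(String userName) {
            return dao.findEmail(userName)
                    .map(email -> userName + " <" + email + ">")
                    .orElse(userName + " <unknown>");
        }
    }

    // Test double: answers from a map, so no database is needed.
    static class InMemoryUserDao implements UserDao {
        private final Map<String, String> emails = new HashMap<>();

        InMemoryUserDao with(String userName, String email) {
            emails.put(userName, email);
            return this;
        }

        @Override
        public Optional<String> findEmail(String userName) {
            return Optional.ofNullable(emails.get(userName));
        }
    }

    public static void main(String[] args) {
        UserDao dao = new InMemoryUserDao().with("alice", "alice@example.com");
        ContactFormatter cut = new ContactFormatter(dao);
        System.out.println(cut.contactLine("alice"));
        System.out.println(cut.contactLine("bob"));
    }
}
```

The extra constructor parameter is exactly the kind of exposed member discussed above: a small concession in information hiding that buys a component which can be tested in isolation.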

Persistence, concurrency, distribution and transaction support are accepted to be wide-reaching and fundamental aspects of a component. For instance, a component designed for persistence may differ substantially from the same component without persistence support. In a similar vein, a component designed with testability in mind will show different characteristics than its less testable counterpart. In particular and as mentioned above, testable components tend to be much more modular. It follows that modularity is a corollary of testability.

It should be noted that it takes significant effort to develop and maintain test code. In the logback project, we spend about 30% of our time on documentation, at least 50% maintaining tests, with only the remaining 20% of time spent on the actual code. Writing good test code requires real skill.

Monday, March 31, 2008

Obsessed by Gentoo

After my previous article about Gentoo, I had the audacity to update the "mailman" package from version 2.1.5 to 2.1.9. While the mailman code itself had not changed much, Gentoo's way of structuring the installed package had changed. It took me about 4 hours of uninterrupted work to get our mailing lists working again.

At this stage, you may think that I had learned my lesson, and would not embark on any new adventures. I had not yet realized my luck so far and would soon pay the price of my temerity.

On Torino, another production machine running Gentoo, my subsequent attempts at upgrading X11 and Apache resulted in complete failures. Notwithstanding my relentless attempts, Torino can no longer run X11, due to a suspected gcc/glibc versioning problem.

Updating Apache from 2.0.x to 2.2.x had an interesting twist of its own. As with mailman, Gentoo's packaging structure for Apache changed between 2.0.x and 2.2.x. More specifically, the directives for specifying which modules were included in the server changed. Fortunately, there were instructions on migrating from the old to the new structure. It took me about 2 hours to understand that the migration instructions had a little bug. Dashes within package names had to be written with underscore characters. After that tweak, the new build of the Apache server included all the required modules.

For our web-applications, we rely heavily on reverse proxying, that is on Apache's mod_proxy module. This module witnessed substantial enhancements between Apache 2.0.x and 2.2.x. Given that Torino is a production server, I had only a limited number of hours to perform the migration. At about 1 AM Monday morning, manually reverting to Apache 2.0.x was the only remaining option.

As I understand it, Gentoo supports the installation of only a single version of any given application package. It does not support the simultaneous installation of several versions of the same package. In the Apache case, it would have been nice to have Apache 2.0.x and 2.2.x installed side by side. Alternatively, it would have been acceptable if Gentoo allowed me to revert to an older version of Apache. However, it seems that Gentoo supports only one path for updates, i.e. upgrades.

In conclusion, while Gentoo's package management mechanism is pretty fantastic, it still does not allow for seamless upgrades. Others have made similar observations.

Wednesday, March 26, 2008

Fascinated by Gentoo

Gentoo is a Linux meta-distribution where new packages are compiled on your machine before they are installed. We chose Gentoo about three years ago because it was well documented and also the only distribution that supported our AMD 64-bit machines.

In the last three years, we never felt the need to perform regular updates. However, yesterday I noticed that on one particular machine, the log files were getting very large. Switching to syslog-ng instead of the good-ol syslogd package seemed the right thing to do. However, since we never upgraded the platform, the view of the available packages, i.e. the portage tree on the host, was too old. Thus, the installation of syslog-ng failed. The portage tree needed to be upgraded. By the way, portage is Gentoo's installation framework.

Thus, I updated the portage tree by issuing an "emerge --sync" command. However, in the meantime the package description format had changed so that the version of portage on the host could not read the updated portage tree. It appeared as if the whole portage tree was corrupt. Thus, a chicken-and-egg situation emerged. I could not install the latest version of portage because my portage tree was unreadable by the current portage software.

Anyway, once I realized what was going on, I copied over an older version of the portage tree from a backup, installed a new version of portage and then updated to the latest portage tree.

Even after this relatively troublesome event, I still love Gentoo and the stability it provides. Our Linux systems just work without a reboot for years on end. The latest experience notwithstanding, it's usually very easy to install or update new packages.

More generally, dependency management is one of the key features of any serious software platform. For instance, Maven, a Java build tool, is becoming increasingly popular, imho mainly because it helps with dependency management.

Sunday, January 06, 2008

How efficient is a Prius?

I received my brand new Toyota Prius three weeks ago. Technologically, it seems like quite a remarkable car. I say seems because I don't really know much about cars. I can't really tell whether the electrical engine is anything more than a marketing gimmick, especially considering its ridiculously low autonomy (about 1km).

Obviously, the Prius is noteworthy not because of its powerful engine -- its accelerations won't ever rivet you to your seat -- but because it is supposed to use little gas. Anyway, after three weeks of driving, today it alerted me through a blinking square on the dashboard that it needed a refill. When I brought it to the gas station, her odometer was showing 670 km. Her tank was full after drinking 41.25 liters of unleaded gasoline, which brings me to my main point.

Assuming the gas tank was completely full when I got it (which is a somewhat iffy proposition), my Prius yielded an average of 6.1 liters per 100 kilometers. By the way, 6.1 l/100km is equivalent to 38.56 MPG (US gallons). This result is considerably worse than the mileage advertised by Toyota, i.e. 4.4 liters per 100 kilometers, i.e. 53.46 MPG. Nevertheless, the Prius still emerges as being more fuel efficient than most other cars.
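
For anyone wanting to redo the arithmetic, here is a minimal sketch of the two conversions involved. The class and method names are assumptions made for illustration; the constants are the standard definitions of the US gallon (3.785411784 liters) and the mile (1.609344 km). With the figures above, 41.25 liters over 670 km comes to roughly 6.16 l/100km before rounding.

```java
// Illustrative sketch only; class and method names are hypothetical.
public class FuelEconomy {

    static final double LITERS_PER_US_GALLON = 3.785411784;
    static final double KM_PER_MILE = 1.609344;

    // Fuel consumption in liters per 100 km for a given fill-up.
    static double litersPer100Km(double liters, double km) {
        return liters / km * 100.0;
    }

    // Converts liters per 100 km to miles per US gallon.
    static double toMpgUs(double litersPer100Km) {
        double milesPer100Km = 100.0 / KM_PER_MILE;
        double gallonsPer100Km = litersPer100Km / LITERS_PER_US_GALLON;
        return milesPer100Km / gallonsPer100Km;
    }

    public static void main(String[] args) {
        double measured = litersPer100Km(41.25, 670.0);
        System.out.println("Measured l/100km: " + measured);
        System.out.println("6.1 l/100km in MPG: " + toMpgUs(6.1));
        System.out.println("4.4 l/100km in MPG: " + toMpgUs(4.4));
    }
}
```

Since MPG is inversely proportional to l/100km, the conversion boils down to dividing a constant of about 235.2 by the consumption figure.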

Of course, a single measurement is not necessarily representative, especially considering that those 670 km included a drive to a nearby mountain. After googling for a few minutes, I stumbled upon a US-governmental page showing the MPG obtained by other drivers. My MPG happens to be worse than the average shown on that page.

Update


The second time around, I measured 810 km for 42.02 liters of unleaded gasoline, or 5.2 liters per 100 km (45.3 MPG). This result is very much aligned with the average reported by other drivers. It could probably be improved, as the 810 km included a trip to a nearby mountain.

The difference from the first result can be explained by the fact that on the highway I now drive a little below the authorized limit, at 110 km/h instead of 120 km/h.