Random thoughts by Ceki Gülcü: 2006

Thursday, December 21, 2006

Release procedures

While lurking on the Maven dev mailing list, I came across a proposal for a release procedure. The proposal is 3 pages long, and in the ensuing discussion the developers most heavily involved in the project, what one might call the "doers", seemed to give it a lukewarm welcome.

There is a growing need to increase productivity of development teams, an industrialization of sorts. However, this need has to be balanced with the imperatives of creativity. Procedures affecting the day to day lives of developers need to be pragmatic and low cost. The time at the disposal of a productive developer is a scarce and expensive resource. As such, I am surprised to see Apache, an open-source icon, indulge in heavy-handed procedures. Again, it's only a proposal, and hopefully it won't be accepted in its current form.

"Release early, release often" has been one of the core mantras of open-source development. The ability to release frequently brings a level of reactivity highly appreciated by OS users. A multitude of procedures which inexorably hamper reactivity, need to be weighed against the purported benefits.

Of course, not every procedure is bad. Development teams need to be coordinated, with each developer assigned clearly defined tasks. Scrum attempts to deal with this problem. Scrum from the trenches gives an easy to read description of the approach advocated by Scrum aficionados.

Friday, December 15, 2006

Migrate from log4j to logback in seconds

It may sound insignificant, but we migrated our most important applications in production from log4j 1.3 over to logback version 0.7. We did this without changing a single line of code in our applications, merely by replacing the file log4j.jar with log4j-bridge.jar in the relevant WEB-INF/lib directories.

Log4j-bridge intercepts calls to log4j's Logger class and transparently redirects them to logback.
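The redirection idea can be sketched in a few lines of Java. This is only an illustration, not the actual log4j-bridge source: the names Backend, RecordingBackend and BridgedLogger are hypothetical. The real bridge presumably works by supplying classes under log4j's own names, so that existing client code links against them unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the logging backend (logback in the post). */
interface Backend {
    void log(String loggerName, String level, String message);
}

/** Backend that records events in memory so the redirection can be observed. */
class RecordingBackend implements Backend {
    final List<String> events = new ArrayList<>();

    @Override
    public void log(String loggerName, String level, String message) {
        events.add(level + " " + loggerName + " - " + message);
    }
}

/**
 * Mimics the shape of a log4j-style Logger class (hypothetical name).
 * Client code keeps calling getLogger(), debug() and info(); every call
 * is forwarded to whatever backend is installed.
 */
class BridgedLogger {
    private static Backend backend = new RecordingBackend();

    private final String name;

    private BridgedLogger(String name) {
        this.name = name;
    }

    static void setBackend(Backend newBackend) {
        backend = newBackend;
    }

    static BridgedLogger getLogger(String name) {
        return new BridgedLogger(name);
    }

    void debug(Object message) {
        backend.log(name, "DEBUG", String.valueOf(message));
    }

    void info(Object message) {
        backend.log(name, "INFO", String.valueOf(message));
    }
}
```

The point of the design is that callers never see the backend; only the class behind the familiar method names changes.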

I am thrilled to see that for our real-world applications previously using log4j, the migration process to logback was quick and painless.

Friday, November 10, 2006

Continuum running and configured in 20 minutes

Having been nagged by Gump for ages, I've been reluctant to use a continuous integration system, at least until a few days ago. Notwithstanding my conservative attitude, colleagues have patiently and convincingly explained that having an automated system building and testing my projects was a good thing. Taking their word, I installed Continuum in about 5 minutes and had it configured for the SLF4J and logback projects in about 15, most of which were spent entering the correct "scm" incantations in the relevant Maven2 project (pom.xml) files.

I am still not completely sold on the idea of continuous integration (CI). As I understand it, in practice, Continuum will check out the latest sources from the source repository, build and run the tests on the CI machine, and notify our team if anything goes wrong. Already at this early stage, Continuum feels like a new member of our team. The question is whether this new member is worth the maintenance. However, from the little experience gained in the last few days, Continuum seems to do what it is supposed to do without getting in the way. A new build is done only if the contents of the source repository change, and notifications are sent only when the latest build results differ from the previous one.
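The two rules in the last sentence are easy to state as code. The sketch below is my own model of that policy, not Continuum's implementation; BuildPolicy and its method names are hypothetical, and the choice to always notify on the very first build is an assumption.

```java
/** Hypothetical model of the build/notify policy described above. */
class BuildPolicy {
    private String lastBuiltRevision;   // null until the first build
    private Boolean lastBuildSucceeded; // null until the first build

    /** A build is needed only when the repository revision has changed. */
    boolean shouldBuild(String currentRevision) {
        return !currentRevision.equals(lastBuiltRevision);
    }

    /**
     * Records a build result and reports whether the team should be notified,
     * i.e. whether the outcome differs from the previous one.
     * Assumption: the very first build always triggers a notification.
     */
    boolean recordAndShouldNotify(String revision, boolean succeeded) {
        boolean outcomeChanged =
                lastBuildSucceeded == null || lastBuildSucceeded.booleanValue() != succeeded;
        lastBuiltRevision = revision;
        lastBuildSucceeded = succeeded;
        return outcomeChanged;
    }
}
```

With this policy a long run of green builds stays silent, and the team hears from the CI machine only when a build breaks or gets fixed.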

In short, once you've sold your soul to M2, continuous integration via Continuum is a piece of cake.

Saturday, November 04, 2006

Solution to the Maven2 version number problem

In the past few days, I've ranted profusely about the difficulty of changing version numbers of modules in Maven2. As things stand currently, when a module references its parent, it must explicitly state the parent's version in hard-coded form. Once that reference is in place, the natural thing to do is to define the current module's version by the version of the parent. The downside is that, when the parent version changes, the change must be reflected in all child modules. If your project makes simultaneous releases of all its modules in one sweep, as many projects seem to do, then you must manually change the version number of each module by changing the version number of the parent reference in each module. This is a time-consuming and error-prone process at best.

I recently experimented with a solution to the above problem. It's now part of the SLF4J project (which has 10 or so modules).

The idea is to declare the version number for the whole project as a property, namely "aversion" (pun intended), in the parent pom. The parent pom's own version number can be anything as long as it ends with "SNAPSHOT".

Here is an excerpt from SLF4J parent pom:

<project>
  ...
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>SLF4J</name>
  <properties>
    <aversion>1.1.0-RC0</aversion>
  </properties>
  ....
</project>

Child modules' version is specified via the ${aversion} property. Children's reference to their parent's version is hard coded. However, since the parent pom's version is a SNAPSHOT, child modules will see the changes in the parent pom. In particular, if the parent pom changes the value of ${aversion}, the children will see the change.

Here is the pom.xml file for the slf4j-api module.

<project>
  <parent>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-parent</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
 
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>${aversion}</version>
  <packaging>jar</packaging>
  <name>SLF4J API Module</name>
  ...
</project>

Unless I've missed something, this hack seems to work just fine. I would be interested to know whether there is a downside to it.

Friday, October 27, 2006

Version numbers in Maven

I am probably the 50th person ranting about lack of inheritance of version numbers in Maven. Those who use Maven 2.0.4 will know what I am talking about. Maven developers plan to bring a solution with the next release, i.e. 2.1. Whatever the solution, it can't be worse than using the release plug-in, the officially recommended approach at present time. The release-plugin approach is both total crap and an insult to intelligence.

Nevertheless, I am fairly happy with Maven 2. Our 3 month investment is finally starting to pay off. If Maven were a car, it would safely take you from place to place as long as you did not switch on the radio. You see, the radio feature in Maven is not meant to be actually used. It is there for show only. As soon as you attempt to tune in to some music, the exhaust will sound off a loud bang and your car will need to be towed to the nearest garage for maintenance.

If you know of a good solution to the version number problem, please share your wisdom with us, mere mortals.

Friday, October 20, 2006

logback: a worthy successor to log4j?

As you may have already heard, I have been working on a new project
called logback, intended as a worthy successor of log4j.

On the 5th of December, I will be presenting (in French) the top 10
reasons for migrating your projects to logback. Issues such as
migration strategy, new APIs, SLF4J and Joran will be
discussed. Emphasis will be given to practical aspects and a live demo
rather than relatively theoretical considerations. This free-entry
event is organized by Hortis.

For those who may not be able to attend my presentation, here is a brief summary:

- Logback is an improved version of log4j.
- Given that logback is built on top of SLF4J, you can switch to another logging system at will.
- The new Joran configuration API sits at the core of logback.
- The Status API for increased resilience. Logback's status API enables internal error reporting in a simple yet powerful way without adding complexity.
- Documentation: already good and getting better by the day.
- Filtering API. If you can imagine it, logback can filter it.
- Marker objects to color log statements for highly-specialized processing.
- Access module: easy integration with access logs generated by Tomcat or Jetty.
- JMX: You can configure logback at runtime using MBeans.
- TDD: logback has been developed test first. Moreover, logback is available for use, today.

The full presentation is also available. Anyway, I hope you will be able to attend.

Friday, October 13, 2006

Repeated configuration with Joran

Developers have frequently expressed the need to output log files based on arbitrary runtime criteria such as by client, by task, etc.

Given all the flexibility offered by logback, writing such an appender should be easy. Let us call this new appender, MultiAppender. In principle, all MultiAppender needs to do is to create a new file as necessary according to the evaluation of incoming logging events. A configuration snippet might look like:

<appender class="ch.qos.logback.core.MultiAppender">
  <fileNameCalculator class="ch.qos.logback.core.Calculator">
    <fileNamePattern>/some/path/%exp{userid}.log</fileNamePattern>
    <expression name="userid">mdc.get("userid")</expression>
  </fileNameCalculator>
</appender>

Thanks to Joran, logback's powerful configuration API, we can deal with unknown configuration elements such as fileNameCalculator, expression and so forth. It's a slam dunk for logback, or is it?

Although Joran can deal with arbitrary configuration instructions, it can do so only once. Suppose we changed the requirements so that MultiAppender acted as a multiplexer of appenders: instead of writing to different files, it would delegate to a fully-fledged appender chosen according to various criteria. MultiAppender would then need to configure a complete appender repeatedly.
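To make the "repeatedly" part concrete, here is a minimal sketch of such a multiplexer in plain Java. It does not use logback's real interfaces: SketchAppender and MultiplexingAppender are hypothetical names, and the factory function stands in for whatever would re-run the configuration of a complete appender.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical minimal appender contract; not logback's real interface. */
interface SketchAppender {
    void append(String message);
}

/**
 * Routes each event to a delegate appender chosen by a key (e.g. a user id).
 * The factory is invoked once for every new key, which is why the
 * configuration of a complete appender has to be repeatable.
 */
class MultiplexingAppender {
    private final Function<String, SketchAppender> factory;
    private final Map<String, SketchAppender> delegates = new HashMap<>();

    MultiplexingAppender(Function<String, SketchAppender> factory) {
        this.factory = factory;
    }

    void append(String key, String message) {
        // Create and configure the delegate on first use, then reuse it.
        delegates.computeIfAbsent(key, factory).append(message);
    }

    int delegateCount() {
        return delegates.size();
    }
}
```

Each call to the factory corresponds to one more pass over the same fragment of configuration, which a configure-once system cannot provide.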

We are in the process of refactoring Joran so that it can be invoked repeatedly on parts of a configuration file. To my knowledge Joran is the only configuration system offering this capability (but I might be wrong.)

In a completely unrelated project, the same need of repeatedly configuring components came up. In this other project, we need to configure a tester, an object performing one or more tests. We create a tester, configure it, invoke its test methods, collect the results, and when done, throw the tester away to start all over again a few minutes later. We leveraged the unique capabilities of Joran to provide this particular lifecycle. Joran, part of logback-core, is a generic configuration system that you can use in your own projects to great effect.

Do ping me if you need further info.

Friday, September 08, 2006

Naked without eclipse

I recently had to make a few changes to a Java project. The computer at my disposal, a Gentoo Linux AMD64 machine, did not have Eclipse installed. So I turned to my reliable editor/IDE, i.e. Emacs. Even though Emacs was my IDE for several years in the past, I felt astonishingly naked without Eclipse. It is only when a tool is withdrawn that one realizes how much comfort it brings and that one can't live without it.

Tuesday, September 05, 2006

Migrating to Maven 2

In the last few weeks I had to dabble in Maven, either migrating existing projects or creating new ones. The experience has been conclusive. As long as I adhere to Maven's philosophy, I am able to get things accomplished.

For instance, after the experience gained in mavenizing logback, migrating SLF4J to Maven2 from Ant has taken only a few hours. The resulting project structure is imho a little easier to understand and to maintain.

As many users have observed in the past, Maven has a number of rough edges. For instance, the archetype plug-in is not well documented and can behave strangely at times.

One peeve I have is with the site plugin, which has a nasty habit of stripping attributes from XML elements in xdoc files. I don't know why it decides that style classes that you painstakingly added to your document are unworthy of its consideration.

I find dependency scopes, a core feature of Maven, quite confusing. For instance, the article "Introduction to the Dependency Mechanism" mentions that the provided scope is only available on the compilation class path, while I know for a fact that it is also available on the test class path. Moreover, the same article gingerly refers to several class paths without defining their meaning. Interestingly enough, the table defining the transitivity of indirectly referred projects is not properly labelled because, as mentioned previously, the xdoc plug-in has a dubious habit of stripping element attributes or even whole elements it does not like.

All the criticism aside, I am beginning to get the hang of Maven, albeit screaming and shouting. If someone tells you that Maven is a breeze, don't believe them. Build management is not an easy problem and does not suddenly become easy with Maven.

Friday, July 28, 2006

Logback version 0.1 just out the door

Sebastien Pennec recently announced the release of logback version 0.1. As the version number suggests, the logback framework still needs a lot of work. Nevertheless, we had to start somewhere...

For the next release, Sebastien and I will be concentrating on documentation and other essential features. The goal is to reach a maturity level such that developers can actually start using logback in their code.

Those skittish about committing to an unknown product can mitigate the risks by using SLF4J API, a fairly robust logging abstraction, directly supported by logback.

Monday, July 10, 2006

Maven -- whether you like it or not

As a relatively experienced Ant user, I usually try to avoid Maven, a build system intended to automate mundane and time consuming tasks such as project packaging and site generation.

Apparently, site generation is one of the most appreciated features of Maven. Given that the default site generated by Maven looks nice but otherwise is rather useless, you need to spend a considerable amount of time to tame the results.

In the foreword of the book "Maven, A developer's notebook", Jason van Zyl states that "Maven is an incredibly boring technology". He goes on to say that if you use Maven, your development infrastructure will be so coherent, predictable and reproducible that you won't even think about it anymore.

Although his promise is alluring, I don't think it is fulfilled. For one, Maven is highly intrusive. It will heavily impact the structure of your project. For example, converting existing projects, even small ones, to Maven can be a painful experience. You need to adopt the Maven worldview to benefit from it. As far as Maven is concerned, it's ma-ven or the highway.

Don't get me wrong, Maven tackles the task of building projects, a very hard task to say the least. Your development infrastructure may indeed benefit from Maven. However, don't expect results without a heavy investment. Maven will cost you dearly, but so do Ant scripts. As discussed on the jakarta commons mailing lists, the question is which approach costs less. On BSF, Alan Gutierrez claims that Maven is a drag on productivity. Howard Lewis Ship explains in his blog the reasons why he has moved away from Maven. He also states (see comments section in his blog) that Maven 2 is a big improvement over Maven 1.

Henri Yandel, who can be trusted to know a thing or two on multi-project builds, replies that Maven equals standardisation equals easy-to-understand. I think that Henri summarizes the issue pretty well when he writes:

Slightly off-topic, if applying Maven to an Ant project, I would recommend trying to simplify your build system/code structure, don't hack Maven to fit the system you had in place for the Ant project. If you can't be standard-ish, stay in Ant - it's the right place for lots of customery.

As an alternative, you could mitigate the cost of dependency management (and only dependency management) with a little-known tool called Ivy, which integrates well with Ant.

Tuesday, May 30, 2006

C#, first impressions

After several years of developing exclusively in Java, I recently ventured into the .Net/C# world. At first glance, C# is eerily similar to Java. C# is not exactly Java, but almost.

As for the development environment, I was expecting Visual Studio to be spiffier than Eclipse or IDEA. I don’t think that is the case. Surprisingly, the automated doc support, the equivalent of Java’s javadoc, appears to be a second class citizen. Given that javadocs are such an essential component of the Java platform, it was a little unsettling to see the relative “lack” of support for automated doc generation in C#.

As far as the language is concerned, here is what I like in C#:
- switch statements accept strings as their input variable,
- checked arithmetic operations
- delegates
- verbatim strings
- partial types
- and did I mention delegates?

On the other hand, there are many things in C# that bother me:
- automated doc generation
- the PascalCase convention for methods and fields instead of camelCase
- the continued madness of operator overloading,
- structs
- the optional nature of exceptions
- "using" directive
- three different equality primitives, surpassing even JavaScript in incoherence.

Given the similarity between the languages, none of the above really matters if the development tools are not up to par. The Java world is blessed with excellent tools like Eclipse or IDEA. However, Visual Studio could hardly be qualified as shabby.

Given that the two environments are roughly equivalent, I believe that we are witnessing a game of endurance between two camps: Microsoft, a company with deep pockets, on one side, and a large number of heterogeneous actors, from the single open source developer up to behemoths like IBM, on the other.

Thursday, April 27, 2006

VMWare and Gentoo

I've been using Gentoo on my Linux servers for the past several years, and it's a very nifty piece of software/distribution. Although compiling all the required packages can be a bit slow, the installation process is very well documented. I can testify that installing Gentoo takes less than 24 hours: the working day for a basic system (including X windows) and the night to compile KDE. The other components, such as JDK, Apache, Tomcat (or Resin) take another hour or two.

Normally, it should be even quicker than that. The aforementioned figures are based on my latest experience installing Gentoo on a virtual PC on my Windows XP laptop using VMWare. Running Linux on my laptop allows testing software that runs comfortably only on Linux.

As you might have guessed, I am extremely happy with the results. If you occasionally need to perform testing on a Linux server, I recommend running one with VMWare+Gentoo combination.

Tuesday, April 25, 2006

1st chapter of Tapestry in Action

I just finished reading the first chapter of "Tapestry in Action" by Howard M. Lewis Ship. I really like the style of the author. The introductory chapter clearly explains the conceptual differences between Struts, a servlet controlled framework, and Tapestry, a component controlled framework. The author also does a very good job of pointing out the weaknesses in Struts.

After 4 years of Struts, I am a little scared of the different programming model. Nevertheless, I can't wait to read the rest of the book to see if Tapestry lives up to its promise.

Monday, April 24, 2006

XML ain't a programming language

I happen to consider Spring a pretty nifty framework. Actually, many of the developers I know consider it nifty and I don't have any reasons to object. It is well-documented and rich in functionality.

After spending a day reading the documentation, I came to the conclusion that I had not understood it well enough and resolved to reread it at a later time.

The ability to configure a large application at deployment time through configuration files is hardly a new concept. Every Servlet container does it. Log4j does it. Logback does it, as does any serious framework with an emphasis on flexibility and extensibility. Logback uses the Joran package for its configuration parsing. Joran is derived from Jakarta Digester. Although amazingly powerful in certain aspects, like Jakarta Digester, it is not a general bean definition package.

IOC frameworks offer functionality beyond Digester derived configurators, in particular the ability to resolve circular bean dependencies. You can configure many if not hundreds of beans using Spring. Unfortunately, this will cause your beans.xml to balloon to gargantuan proportions.

It seems that Spring allows a beans.xml file to be broken into smaller chunks by importing bean definitions from another file. I find it far more difficult to read bean definition files in XML than Java code. The difficulty increases exponentially with the size of the bean definition file. However, according to the documentation, the list of imported files must precede bean definitions in the importing file – a rather severe and unexpected restriction to say the least.

My impression is that IOC frameworks such as Spring need to offer support for a very large set of tools to be useful. Sucking in ever larger parts of the Java universe, it reinvents the Java language in XML. However, XML does not make a fun programming language.

I wonder if it is possible to deliver the promise of IOC non-intrusively and without resorting to large XML files.

Interestingly enough, unit testing presents a similar set of problems. However, as a larger and larger portion of the Java universe gets mocked, we get farther and farther away from the spirit of unit testing. Although configuring unit tests is sometimes a non-trivial task, as far as I can tell, unit testing does not seem plagued by large config files in XML.

Wednesday, April 05, 2006

Race condition at the post office

Although only the fourth or fifth largest in Switzerland, the city of Lausanne currently possesses the biggest post office in the country. This post office is located at Place St-Francois at the center of the city. When you enter the post office, you are greeted by two ticketing machines, one at each entry of the office. After pressing a button, the machine spits out a ticket with a number printed on it. You wait until your number is displayed on one of the large and clearly visible digital panels hung at opposite ends of the office. When it's your turn, the panel chimes and then displays your number next to a letter, 'A' through 'R', symbolizing one of the 18 counters, each manned by a clerk.

My ticket read 811, with an estimated waiting time of 5 minutes. During those five minutes I got distracted and nearly missed the call for my number by counter 'N'. Actually, after several unanswered calls to my number, the clerk at counter 'N' gave up on 811 and called the next customer, in this case customer number 815. During the 20 or 30 seconds in which I failed to respond other counters summoned numbers 812, 813 and 814. Fortunately for me, I arrived a second before customer 815, and the clerk had already started serving my requests. Thus, ticket number 815 was consumed, with its owner stuck behind counter 'N' until I was done. There was no way for the clerk, who became aware of the problem, to resuscitate ticket 815 so that its owner could be more quickly served by one of the 17 other counters.

How would you design a ticketing service which would be simple to use for the clerks and still avoid race conditions?
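One possible answer, sketched in Java purely as an illustration: keep track of which ticket each counter has called, and let the clerk put that ticket back at the head of the queue with a single action. TicketDesk and its methods are hypothetical names of mine; nothing here describes the actual system at the post office.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical ticketing desk in which a called ticket can be released again. */
class TicketDesk {
    private int nextNumber;
    private final Deque<Integer> waiting = new ArrayDeque<>();
    private final Map<Character, Integer> calledBy = new HashMap<>();

    TicketDesk(int firstNumber) {
        this.nextNumber = firstNumber;
    }

    /** The ticketing machine: hands out the next number and queues it. */
    synchronized int takeTicket() {
        int number = nextNumber++;
        waiting.addLast(number);
        return number;
    }

    /** A counter calls the next waiting ticket; returns -1 if nobody is waiting. */
    synchronized int callNext(char counter) {
        Integer number = waiting.pollFirst();
        if (number == null) {
            return -1;
        }
        calledBy.put(counter, number); // replaces any earlier, unanswered call
        return number;
    }

    /**
     * Puts the ticket currently called by this counter back at the head of
     * the queue, so that the next free counter picks it up.
     */
    synchronized boolean release(char counter) {
        Integer number = calledBy.remove(counter);
        if (number == null) {
            return false;
        }
        waiting.addFirst(number);
        return true;
    }
}
```

In the scenario above, the clerk at counter 'N' would release ticket 815 as soon as the late owner of 811 turned up, and 815 would be called by whichever counter became free next.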

After thinking more about this problem, I am beginning to think that contrary to the consequences of race conditions occurring in entirely automated systems, humans cope extremely well with infrequent race conditions -- or at least in this case we did. It was spontaneously obvious for all participants that customer 815 needed to wait patiently until the disrupting client (yours truly) was served. Thus, I tend to think that in the presence of unintentional mistakes the current system is just fine as it is.

However, a malevolent customer could reproduce the aforementioned race condition at will. In an easier and totally devastating attack, the customer, or should I say foe, could ask the ticketing machine for n tickets instead of a single ticket.

It is infinitely harder to design systems capable of thwarting malevolent participants than systems dealing with distracted but otherwise benign users.

Wednesday, March 15, 2006

Fiction disguised as hands-on experience

I am not a big fan of Hani Suleiman. The feeling is apparently mutual. In response to my recent post about SLF4J's first release on TSS, Hani replied with what he claims are facts based on his hands-on experience.

Don't get me wrong, criticism is OK and even welcome. Accepting criticism is part of the open development process. It's fiction disguised as fact that I object to.

First claim:

Most implementations are forks of log4j and are written by Ceki, none of which are usable.

This one is easy to debunk. JDK 1.4 logging, x4juli and Simple-Log are developed by many developers, other than myself. He also claims that none of the SLF4J implementations are usable, a claim which is in contradiction with his own writings that follow a few lines below.

Except that many apps that depend on log4j blow up with nlog4j, due to slightly different method signatures than what they expect. So you can't use nlog4j if you use any of the apps that use log4j.

This claim, as stated above, is false, except for the 6 weeks between June 28th and August 16th in 2005.

NLOG4J is a drop in replacement for log4j. Code previously compiled with log4j will run fine with NLOG4J. Moreover, the same code will compile fine against NLOG4J (without any changes). Unfortunately, due to signature changes in some important logging methods, compilation against NLOG4J is sticky. Software compiled against NLOG4J will require NLOG4J to run. Hani's claim about incompatibility with log4j-compiled code is false. However, it was valid for 6 weeks, starting with the release of NLOG4J version 1.2.14 on June 28th 2005, ending with the release of NLOG4J 1.2.16 on August 28th, when the problem was fixed.

Perhaps Hani did really try out NLOG4J during those 6 weeks. Nevertheless, as a person voicing such strident criticism, he would have been well advised to check his claims against newer versions of the software, which were available for several months at the time he wrote his comments.

Hani's comments about JDK14 indicate that he has still not assimilated the need to switch logging systems.

Hani is not sure why "Simple" differs from System.err.println(). There are few reasons. "Simple" only prints messages for levels INFO or higher. Its output also contains more information than what System.err.println() provides. Most importantly, "Simple" is just one implementation of SLF4J API. You could switch to a different implementation in a matter of seconds.

According to Hani, "SimpleLog" looks sane probably because it's not produced by yours truly. I'll let you be the judge of that assertion.

LogBack: Seems to promise much, complex and big, but development seems to have stalled. Written by....yep, you guessed it.

Big, complex, development stalled? Hani's ability to slap qualifiers on software is mystifying, especially on software he has never seen. It's like pretending to have test-driven next year's car before it is out. Trust Hani to tell you all about the 2007 models a year before everyone else.

x4juli: A port of log4j to jdk14 API. Why anyone would use slf4j api backed by x4juli which is backed by jdk14 logging instead of jdk14 logging is a mystery that I suspect no one will ever solve.

Boris Unkel offers a reasonable rebuttal to this claim. Boris is assuming that Hani actually had the courtesy to learn about what x4juli does, which I suspect was not the case.

It's also often unclear with what needs to be deployed. In some cases you need two jars (the slf4j api and the impl), in some cases the impl contains the api.

Admittedly, this is a valid observation. For the sake of consistency, it might be better to have the API and the implementation in separate jar files, at least for the non-trivial implementations. Point well-taken.

In short, do yourself a favor and avoid this stuff. Stick to log4j and you will avert much sadness.

That's not unreasonable advice for stand-alone applications, but embedded components cannot afford to impose log4j on host applications.

The emerging pattern from Hani's comments is that he fabricates stories based on his limited understanding of the subject matter under the guise of hands-on experience. Hands-on experience with a product comes with using a product for more than 10 seconds, not just casually browsing through its documentation.

Thursday, March 09, 2006

SLF4J 1.0 (final) is finally out

After 11 months of gestation, SLF4J version 1.0 (final) is finally out the door. For those who have not heard of it, SLF4J (Simple Logging Facade for Java) acts as a facade for various logging APIs, allowing the end-user to plug in the desired implementation at deployment time. A gradual migration path away from Jakarta Commons Logging (JCL) is also supported.

SLF4J does not rely on any special class loader machinery. In fact, the binding between SLF4J and a given logging API implementation is performed statically at compile time. Each binding is hardwired to use one and only one specific logging API implementation. Thus, SLF4J suffers from none of the class loader problems or memory leaks observed with other approaches.
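The principle of static binding can be sketched as follows. This is a simplified illustration and not the SLF4J source: FacadeLogger, ConsoleLogger, StaticBinder and FacadeLoggerFactory are hypothetical names, and info() returns the formatted line only so that the behaviour is easy to observe.

```java
/** Facade interface seen by client code (hypothetical, not the SLF4J API). */
interface FacadeLogger {
    String info(String message);
}

/** One concrete implementation, packaged together with its binder. */
class ConsoleLogger implements FacadeLogger {
    private final String name;

    ConsoleLogger(String name) {
        this.name = name;
    }

    @Override
    public String info(String message) {
        String line = "INFO " + name + " - " + message;
        System.out.println(line);
        return line;
    }
}

/**
 * The binder is referenced directly by name from the factory and is
 * hardwired to exactly one implementation. Choosing another implementation
 * means deploying another jar containing a different StaticBinder; no class
 * loader lookup or runtime discovery takes place.
 */
final class StaticBinder {
    private StaticBinder() {
    }

    static FacadeLogger create(String name) {
        return new ConsoleLogger(name);
    }
}

/** The factory client code calls; it simply delegates to the binder. */
final class FacadeLoggerFactory {
    private FacadeLoggerFactory() {
    }

    static FacadeLogger getLogger(String name) {
        return StaticBinder.create(name);
    }
}
```

Because the factory refers to the binder like any other class, the choice is resolved by ordinary linking against whatever is on the class path.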

SLF4J also includes support for Marker objects, a feature which hopefully will be widely used as newer logging systems become available.

Shall we go JDK 1.5?

Alex Karasulu of Apache Directory project recently floated this question in their development mailing lists. The ensuing discussion was open and interesting with many of the critical questions raised quickly and clearly. Certain users were concerned about JDK 1.5 support in Websphere, or lack thereof.
Notwithstanding my personal reservations about generics, JDK 1.5 introduces truly useful language features. The enhanced loop feature makes it much easier to write algorithmic code riddled with various loops. Covariant return types allow for more meaningful factory methods, opening the door for powerful class structures. Type safe enums, varargs and static imports can lead to more elegant code. Even the more controversial generics are not hard to use.
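For readers who have not yet tried them, here is a small sampler of the features just mentioned. The class and method names (Jdk5Features, Shape, Circle and so on) are made up for the illustration.

```java
import static java.lang.Math.max; // static import

import java.util.List;

class Jdk5Features {
    /** Type-safe enum. */
    enum Level { DEBUG, INFO, WARN }

    static class Shape {
        Shape copy() {
            return new Shape();
        }
    }

    static class Circle extends Shape {
        /** Covariant return type: the override narrows Shape to Circle. */
        @Override
        Circle copy() {
            return new Circle();
        }
    }

    /** Varargs combined with the enhanced for loop. */
    static int sum(int... values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    /** Generics: no casts and no explicit Iterator when walking the list. */
    static int longest(List<String> words) {
        int longest = 0;
        for (String word : words) {
            longest = max(longest, word.length()); // max comes from the static import
        }
        return longest;
    }
}
```

A caller can write Circle c = new Circle().copy(); without a cast, which is what makes covariant returns attractive for factory-style methods.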

Probably any developer who cares about the design of his (or her) API will want to take advantage of the new JDK 1.5 language features. However, recent reports claim that only 20% of users have switched to JDK 1.5, with 20% still using JDK 1.3, and the remaining 60% JDK 1.4.

During the discussion someone mentioned retroweaver as a way to bridge the language gap. Retroweaver allows class files compiled with JDK 1.5 to be retrofitted to run under JDK 1.4. The techniques used in bringing about this miracle are nicely described in the tool documentation. As Trustin Lee observed, retrofitting JDK 1.5 language features has no bearing on methods or classes new in JDK 1.5.

Adopting JDK 1.5 language features but not new classes or methods will force the developer to manually check for the use of disallowed methods/classes. We tried a similar approach in log4j with mixed results. Although some developers paid attention to JDK compatibility rules, others did not.

In a nutshell, aiming for JDK 1.4 compatibility while still using JDK 1.5 language features is likely to be messy, especially if the development team is composed of heterogeneous people. Besides, who could blame Apache Directory developers for wanting to migrate to JDK 1.5 when JDK 1.6 looms just behind the horizon?

Tuesday, February 28, 2006

Problems on Algorithms

While 99.9% of Sudoku puzzles can be solved using relatively simple techniques, certain very difficult Sudoku puzzles require more advanced methods. One of these techniques consists of searching the puzzle for hidden sets, i.e. couples, triples or quadruples.
Again, puzzles which can only be resolved by searching for hidden sets occur very infrequently. Nevertheless, any self-respecting Sudoku solver must implement these more advanced techniques. Sudoku-Grok is no exception to this rule.

After half a day's work, I managed to devise an algorithm to detect hidden couples. I thought that in principle the same algorithm could be used to detect hidden triples and quadruples. After all, it worked perfectly for couples, and the set size was just a parameter supplied to the algorithm. It was not hard to change the set size to three or four instead of two. Unfortunately, my approach assumed that each set was complete.

When the set size is two, the problem consists of detecting two sets of size two. My approach could detect three sets of size three, or four sets of size four. It so happens that when the set size is larger, say three, hidden sets occur as three sets of size three but also of size two. Similarly, hidden quadruples occur as four sets of size four but also of size three or two.

The problem was evidently more complicated than I thought initially. After several hours of unfruitful toil trying to find the proper recursive algorithm, it occurred to me that the problem could be approached as a choice of r elements among 9 elements, where r is 2, 3 or 4, the size of the desired hidden set.

Once I had realized that the problem could be expressed in terms of simple combinatorics, I turned to my copy of the book "Problems on Algorithms" by Ian Parberry. Lo and behold, page 120 of the book had an outline of a recursive algorithm generating all the combinations of r elements chosen among n. It was expressed on four (4) lines of pseudo-code which was just what I needed to get my hidden set problem solved. As it turns out, many sophisticated Sudoku techniques (but not all) can be implemented in terms of a choice of r elements in a set of n, where n is usually 9.
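To make the idea concrete, here is a minimal sketch of a recursive routine that enumerates every choice of r elements among n. It is not the pseudo-code from the book, just one common way of writing such a recursion; the class name Combinations and the methods choose and collect are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class Combinations {

    /** Returns every way of choosing r elements among 0..n-1, each as an ascending int[]. */
    public static List<int[]> choose(int n, int r) {
        List<int[]> result = new ArrayList<>();
        if (r < 0 || r > n) {
            return result; // nothing to choose
        }
        collect(n, r, 0, new int[r], 0, result);
        return result;
    }

    // current[0..filled-1] holds the elements picked so far;
    // 'next' is the smallest element still allowed.
    private static void collect(int n, int r, int next, int[] current,
                                int filled, List<int[]> result) {
        if (filled == r) {
            result.add(current.clone()); // one complete combination
            return;
        }
        // Stop early enough to leave room for the remaining picks.
        for (int i = next; i <= n - (r - filled); i++) {
            current[filled] = i;
            collect(n, r, i + 1, current, filled + 1, result);
        }
    }

    public static void main(String[] args) {
        // Number of candidate hidden sets of size 2, 3 and 4 among 9 elements.
        for (int r = 2; r <= 4; r++) {
            System.out.println("C(9," + r + ") = " + choose(9, r).size());
        }
    }
}
```

For n = 9 this yields 36, 84 and 126 combinations for r = 2, 3 and 4 respectively, few enough to test each one as a candidate hidden set.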

I can hear you reaching for your keyboard to order a copy of "Problems on Algorithms". The bad news is that this book is out of print. Now for the good news: you can get it from Ian Parberry's web-site for free. Enjoy.

The triumph of obscenity

It appears that Hani Suleiman of bile blog fame has been elected to the Executive Committee of the Java Community Process. I find it hard to believe that the author of such extreme vulgarity and obscenity should be proposed for, and even worse retained in, a position of influence in the Java community.

Given that the election process looks fair and square, the blame does not lie with Sun Inc. but with the electors, in this case the JCP membership.

As the number of ignored standards proposed by the JCP increases year over year, the inverse can be said about its influence. Thus, I don't think Hani can do much harm in his new position. However, his mere election to a position of leadership constitutes social endorsement of obscenity. There is something not quite right about socially promoting individuals who make a habit of egregiously insulting their peers.

I can only hope that the electors will have the wisdom to vote differently the next time his seat is up for election, sometime in 2008.

Saturday, February 25, 2006

Enhanced for Loop

The enhanced for loop introduced in JDK 1.5 makes it easier to loop through arrays and collections. However, like some of the other features added later in the evolution of the Java language, the enhanced for loop can behave unexpectedly.

For one, initializing array members within the enhanced for loop does not really initialize them. Thus, the following code


  String[] sa = new String[2];
  for (String s : sa) {
    // assigns to the loop variable only; the array element is unchanged
    s = "hello";
  }
  for (String s : sa) {
    System.out.println(s);
  }

prints "null" twice instead of "hello". Despite this pitfall, the enhanced for loop is a welcome addition.
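When the goal really is to initialize the array, a classic indexed loop does the job, because it writes into the array slot rather than into a copy of the reference. The sketch below is only an illustration; the class name ForLoopInit and the helper fill are invented for it.

```java
public class ForLoopInit {

    /** Creates an array of the given size and sets every element to 'value'. */
    public static String[] fill(int size, String value) {
        String[] sa = new String[size];
        for (int i = 0; i < sa.length; i++) {
            sa[i] = value; // writes into the array slot itself
        }
        return sa;
    }

    public static void main(String[] args) {
        // Reading through the enhanced for loop is fine; only writing is the trap.
        for (String s : fill(2, "hello")) {
            System.out.println(s);
        }
    }
}
```

The standard library's java.util.Arrays.fill does the same thing in a single call.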

Notwithstanding the purportedly increased safety they bring to the language, I can't say the same about generics. Java generics seem so much weaker and more confusing than generics in the Ada language, where they are a core language feature, not an awkward afterthought.

Generics seem like one of those half-baked features capable of making the life of programmers miserable. With more experience, I hope my initial concerns about generics will reveal themselves to be unjustified.

Premature optimisation

Donald Knuth once wrote that premature optimization was the root of all evil. As an immediate corollary, Knuth's principle implies that a programmer should not attempt to optimize code that does not need it. I think that is one of the most valuable pieces of advice that can be given to a software developer.

Of course, giving advice is cheap; following it is the harder part. For instance, I have this bad habit of trying to evaluate the computational cost of my code and almost unconsciously avoiding constructs that seem too expensive even when they are elegant. In various developer circles the current consensus calls for optimizing code that is invoked very frequently and not worrying about optimising rarely exercised code.

As the founder and developer of the log4j, slf4j and LOGBack projects, I usually feel compelled to write "optimised" code for them. Logging epitomizes the type of code that gets exercised frequently. However, a recent incident showed me that most of what one would call "justified optimisation" is often a big waste of developer resources.

Here I am, trying to develop software to generate Sudoku puzzles. Before generating puzzles, one has to be able to generate random Sudoku boards. Obviously, the problem is intensely computational. I wake up at 7:00 AM bursting with energy and enthusiasm to implement (in Java) the random Sudoku board generation algorithm imagined the previous day. The algorithm is relatively complicated but I am confident that I will have it running by 10:00 AM. The prediction part should avoid wasting time exploring dead-end branches and significantly increase the speed of board generation.

At around 9:30, an initial version is ready for testing. I give it a whirl. What do you know? It completes in the blink of an eye but unfortunately the result is not quite right. Oh, it's an indexing problem! I tinker with the code and give it another try. This time the algorithm stops half-way through the process. While trying to understand the results, I start riddling the code with System.out.println statements with the intention of removing them when the code is fixed. (You usually do not want to leave logging statements in number-crunching code. If you are not going to leave them, you should not add them in the first place. Hence, the use of System.out.println instead of logging.)

Anyway, lunch time comes and passes by as quickly as it arrives. At about 6:00 PM I am completely exhausted and my stupid algorithm is still not working. At that time, I decide to implement the algorithm in a simpler way, without any qualms about speed. Dan Rice has a nice description of it in his weblog. It consists of about 50 lines of recursive code. After 20 minutes I get it running with apparently flawless results. The simple algorithm is fast too. On my PC, it takes about 10 microseconds to generate a random board. Ten microseconds is quick, certainly quick enough for my Sudoku needs.
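For the curious, a simple recursive generator can be sketched roughly as follows. This is neither my actual code nor Dan Rice's, only a plain backtracking illustration of the "no qualms about speed" approach; the class name RandomBoard and its methods are invented for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class RandomBoard {

    /** Returns a completely filled 9x9 board obeying the row, column and box rules. */
    public static int[][] generate(Random random) {
        int[][] board = new int[9][9]; // 0 marks an empty cell
        if (!fill(board, 0, random)) {
            throw new IllegalStateException("an empty board can always be filled");
        }
        return board;
    }

    // Fills cells in row-major order starting at 'cell'; returns false on a dead end.
    private static boolean fill(int[][] board, int cell, Random random) {
        if (cell == 81) {
            return true; // every cell has a digit
        }
        int row = cell / 9;
        int col = cell % 9;
        List<Integer> digits = new ArrayList<>();
        for (int d = 1; d <= 9; d++) {
            digits.add(d);
        }
        Collections.shuffle(digits, random); // random order gives random boards
        for (int d : digits) {
            if (isAllowed(board, row, col, d)) {
                board[row][col] = d;
                if (fill(board, cell + 1, random)) {
                    return true;
                }
            }
        }
        board[row][col] = 0; // undo and let the caller try its next digit
        return false;
    }

    /** True if 'digit' does not already appear in the row, column or 3x3 box of the cell. */
    public static boolean isAllowed(int[][] board, int row, int col, int digit) {
        int boxRow = row - row % 3;
        int boxCol = col - col % 3;
        for (int i = 0; i < 9; i++) {
            if (board[row][i] == digit || board[i][col] == digit
                    || board[boxRow + i / 3][boxCol + i % 3] == digit) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int[] row : generate(new Random())) {
            StringBuilder line = new StringBuilder();
            for (int d : row) {
                line.append(d).append(' ');
            }
            System.out.println(line.toString().trim());
        }
    }
}
```

There is no prediction or pruning beyond the basic validity check, which is precisely the point: the straightforward version is easy to get right.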

Lesson learned. And just in time for supper.