Tuesday, February 28, 2006

Problems on Algorithms

While 99.9% of Sudoku puzzles can be solved using relatively simple techniques, certain very difficult Sudoku puzzles require more advanced methods. One of these techniques consists of searching the puzzle for hidden sets, i.e. couples, triples or quadruples.
Again, puzzles which can only be solved by searching for hidden sets occur very infrequently. Nevertheless, any self-respecting Sudoku solver must implement these more advanced techniques. Sudoku-Grok is no exception to this rule.

After half a day's work, I managed to devise an algorithm to detect hidden couples. I thought that in principle the same algorithm could be used to detect hidden triples and quadruples. After all, it worked perfectly for couples, and the set size was just a parameter supplied to the algorithm. It was not hard to change the set size to three or four instead of two. Unfortunately, my approach assumed that each set was complete.

When the set size is two, the problem consists of detecting two candidate sets of size two. My approach could likewise detect three sets of size three, or four sets of size four. It so happens that when the set size is larger, say three, hidden sets occur across three candidate sets of size three but also of size two: a hidden triple on the candidates {1,2,3} does not require each of the three cells to contain all three candidates, since a cell may hold only two of them. Similarly, hidden quadruples occur across four candidate sets of size four, but also of size three or two.

The problem was evidently more complicated than I had initially thought. After several hours of unfruitful toil trying to find the proper recursive algorithm, it occurred to me that the problem could be approached as a choice of r elements among 9, where r is 2, 3 or 4, the size of the desired hidden set.

Once I had realized that the problem could be expressed in terms of simple combinatorics, I turned to my copy of the book "Problems on Algorithms" by Ian Parberry. Lo and behold, page 120 of the book had an outline of a recursive algorithm generating all the combinations of r elements chosen among n. It was expressed in four (4) lines of pseudo-code, which was just what I needed to get my hidden set problem solved. As it turns out, many sophisticated Sudoku techniques (but not all) can be implemented in terms of a choice of r elements in a set of n, where n is usually 9.
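For illustration, here is a sketch of such a recursive combination generator in Java. The code and the names are mine, not Parberry's pseudo-code: each call picks one of the remaining elements as the next member of the combination and recurses on the elements that follow it.

import java.util.ArrayList;
import java.util.List;

public class Combinations {

    // Print every combination of r elements chosen among 0..n-1.
    // 'chosen' holds the elements selected so far; 'next' is the
    // smallest element still available for selection.
    static void combine(int n, int r, int next, List<Integer> chosen) {
        if (chosen.size() == r) {             // a complete combination
            System.out.println(chosen);
            return;
        }
        for (int i = next; i < n; i++) {
            chosen.add(i);                    // choose i
            combine(n, r, i + 1, chosen);     // recurse on the elements after i
            chosen.remove(chosen.size() - 1); // un-choose i and try the next one
        }
    }

    public static void main(String[] args) {
        // all 36 ways of choosing 2 candidates among 9
        combine(9, 2, 0, new ArrayList<Integer>());
    }
}

With n fixed at 9, the same routine enumerates the candidate couples, triples and quadruples simply by changing r.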

I can hear you reaching for your keyboard to order a copy of "Problems on Algorithms". The bad news is that this book is out of print. Now for the good news: you can get it from Ian Parberry's web-site for free. Enjoy.

The triumph of obscenity

It appears that Hani Suleiman of bile blog fame has been elected to the Executive Committee of the Java Community Process. I find it hard to believe that the author of such extreme vulgarity and obscenity should be nominated for, and even worse elected to, a position of influence in the Java community.

Given that the election process looks fair and square, the blame does not lie with Sun Microsystems but with the electors, in this case the JCP membership.

As the number of ignored standards proposed by the JCP increases year over year, its influence declines correspondingly. Thus, I don't think Hani can do much harm in his new position. However, his mere election to a position of leadership constitutes social endorsement of obscenity. There is something not quite right about socially promoting individuals who make a habit of egregiously insulting their peers.

I can only hope that the electors will have the wisdom to vote differently the next time his seat is up for election, sometime in 2008.

Saturday, February 25, 2006

Enhanced for Loop

The enhanced for loop introduced in JDK 1.5 makes it easier to loop through arrays and collections. However, like some of the other features added later in the evolution of the Java language, the enhanced for loop can behave unexpectedly.

For one, initializing array members within the enhanced for loop does not really initialize them, because the loop variable is only a copy of each element. Thus, the following code

String[] sa = new String[2];
for (String s : sa) {
    s = "hello"; // assigns only to the local variable s, not to the array element
}
for (String s : sa) {
    System.out.println(s);
}

prints "null" twice instead of "hello". Despite of this pitfall, the enhanced for loop is a welcome addition.

Notwithstanding the purportedly increased safety they bring to the language, I cannot call generics an equally welcome addition. Java generics seem so much weaker and more confusing than generics in the Ada language, where they are a core language feature, not an awkward afterthought.

Generics seem like one of those half-baked features capable of making programmers' lives miserable. With more experience, I hope my initial concerns about generics will prove to be unjustified.
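To give just one small example of the kind of surprise I have in mind (a toy example of my own, unrelated to any particular project): type parameters are erased at compile time, so two differently parameterized lists end up with the very same class at runtime.

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<String>();
        List<Integer> numbers = new ArrayList<Integer>();
        // Prints "true": both objects are plain ArrayList at runtime;
        // the type parameters exist only at compile time.
        System.out.println(strings.getClass() == numbers.getClass());
    }
}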

Premature optimisation

Donald Knuth once wrote that premature optimization was the root of all evil. As an immediate corollary, Knuth's principle implies that a programmer should not attempt to optimize code that does not need it. I think that is one of the most valuable pieces of advice that can be given to a software developer.

Of course, giving advice is cheap; following it is the harder part. For instance, I have this bad habit of trying to evaluate the computational cost of my code and almost unconsciously avoiding constructs that seem too expensive, even when they are elegant. In various developer circles, the current consensus calls for optimising code that is invoked very frequently and not worrying about rarely exercised code.

As the founder and developer of the log4j, slf4j and LOGBack projects, I usually feel compelled to write "optimised" code for them. Logging epitomizes the type of code that gets exercised frequently. However, a recent incident showed me that most of what one would call "justified optimisation" is a big waste of developer resources.
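By "optimised" logging code I mean things like the familiar guard idiom, which avoids building a log message that will be discarded when the corresponding level is disabled. The class and method below are made up for illustration; the idiom itself is standard log4j practice.

import org.apache.log4j.Logger;

public class SolverStep {

    private static final Logger logger = Logger.getLogger(SolverStep.class);

    void examine(int[] candidates) {
        // The isDebugEnabled() test skips the string concatenation
        // entirely when the DEBUG level is turned off.
        if (logger.isDebugEnabled()) {
            logger.debug("Candidates: " + java.util.Arrays.toString(candidates));
        }
    }
}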

Here I am, trying to develop software to generate Sudoku puzzles. Before generating puzzles, one has to be able to generate random Sudoku boards. Obviously, the problem is intensely computational. I wake up at 7:00 AM bursting with energy and enthusiasm to implement (in Java) the random Sudoku board generation algorithm I had imagined the previous day. The algorithm is relatively complicated, but I am confident I will have it running by 10:00 AM. Its prediction part should avoid wasting time exploring dead-end branches and significantly increase the speed of board generation.

At around 9:30, an initial version is ready for testing. I give it a whirl. What do you know? It completes in the blink of an eye, but unfortunately the result is not quite right. Oh, it's an indexing problem! I tinker with the code and give it another try. This time the algorithm stops half-way through the process. While trying to understand the results, I start riddling the code with System.out.println statements, with the intention of removing them once the code is fixed. (You usually do not want to leave logging statements in number-crunching code. If you are not going to leave them, you should not add them in the first place. Hence the use of System.out.println instead of logging.)

Anyway, lunchtime comes and goes as quickly as it arrives. At about 6:00 PM I am completely exhausted and my stupid algorithm is still not working. At that point, I decide to implement the algorithm in a simpler way, without any qualms about speed. Dan Rice has a nice description of it in his weblog. It consists of about 50 lines of recursive code. After 20 minutes I get it running with apparently flawless results. The simple algorithm is fast too: on my PC, it takes about 10 microseconds to generate a random board. Ten microseconds is quick, certainly quick enough for my Sudoku needs.
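I will not reproduce Dan Rice's code here, but the gist of the simple approach is plain recursive backtracking: visit the 81 cells in order, try the digits 1 to 9 in random order in each cell, and back up as soon as no digit fits. A rough sketch along those lines (my own code, not Dan Rice's) looks like this:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BoardGenerator {

    // Fill the 9x9 board starting at cell (0..80); return true on success.
    static boolean fill(int[][] board, int cell) {
        if (cell == 81) {
            return true;                      // every cell filled: done
        }
        int row = cell / 9;
        int col = cell % 9;
        List<Integer> digits = new ArrayList<Integer>();
        for (int d = 1; d <= 9; d++) {
            digits.add(d);
        }
        Collections.shuffle(digits);          // random order yields a random board
        for (int d : digits) {
            if (isLegal(board, row, col, d)) {
                board[row][col] = d;
                if (fill(board, cell + 1)) {
                    return true;
                }
                board[row][col] = 0;          // dead end: undo and try the next digit
            }
        }
        return false;                         // no digit fits: backtrack
    }

    // True if digit d may be placed at (row, col) without clashing
    // with its row, its column or its 3x3 box.
    static boolean isLegal(int[][] board, int row, int col, int d) {
        for (int i = 0; i < 9; i++) {
            if (board[row][i] == d || board[i][col] == d) {
                return false;
            }
        }
        int boxRow = (row / 3) * 3;
        int boxCol = (col / 3) * 3;
        for (int r = boxRow; r < boxRow + 3; r++) {
            for (int c = boxCol; c < boxCol + 3; c++) {
                if (board[r][c] == d) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] board = new int[9][9];
        fill(board, 0);
        for (int[] row : board) {
            StringBuilder sb = new StringBuilder();
            for (int d : row) {
                sb.append(d).append(' ');
            }
            System.out.println(sb);
        }
    }
}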

Lesson learned. And just in time for supper.