Friday, June 27, 2008
Does Gentoo make sense?
Come to think of it, I am actually writing this while waiting for Gentoo to upgrade GCC from 3.4.4 to 4.3.1. It may not sound like much, but it's actually a big deal. GCC is probably the package that takes the longest to build, on the order of two hours, even on recent dual-core 64-bit machines.
Portage, Gentoo's package management system, when installing a package, say X, will fetch X's sources from some repository, and then build X from the sources. For example, if package X was written in C, it will compile the sources and then link the resulting binary files into an executable program. As mentioned previously, this process of building from sources can take from minutes to several hours depending on the package and its dependencies. Note that if package X requires packages A, B and C, and B requires D and E, and D requires F, Portage will build A, B, C, D, E and F in the correct order.
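For the curious, here is a minimal sketch of the idea, in Java rather than Python (the language Portage is actually written in), using the hypothetical package names from the example above. This is not Portage's actual algorithm, just a depth-first topological sort that builds dependencies before dependents:

import java.util.*;

// A toy dependency graph using the package names from the example above.
// Real package managers also handle versions, cycles and parallel builds.
public class BuildOrder {
    static Map<String, List<String>> deps = Map.of(
        "X", List.of("A", "B", "C"),
        "B", List.of("D", "E"),
        "D", List.of("F"),
        "A", List.of(), "C", List.of(), "E", List.of(), "F", List.of());

    // Depth-first: build a package's dependencies before the package itself.
    static void build(String pkg, Set<String> built) {
        if (built.contains(pkg)) return;            // already built, skip
        for (String dep : deps.get(pkg)) build(dep, built);
        System.out.println("building " + pkg);      // fetch, compile and link here
        built.add(pkg);
    }

    public static void main(String[] args) {
        build("X", new HashSet<>()); // prints A, F, D, E, B, C, then X
    }
}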
Clearly, building from sources is much slower than fetching a binary package. But building from sources will implicitly check that the required dependencies for the package under construction are available. If X requires A, B, C, D, E and F, and any of those six packages is missing, then X won't compile and hence will not install. Thus, if Portage is able to install X, then you can be fairly confident that it is installed correctly on your system. Of course, you would still need to configure X according to your needs, but as far as the binaries of X and its dependencies are concerned, you are reasonably safe.
Contrast this with installing binary packages. You can never be sure that you are not missing a library, or that installed libraries do not have conflicting versions. Conceptually, Gentoo vs. Ubuntu is analogous to compiled and statically typed languages, e.g. C++ or Java, versus interpreted and dynamically typed languages, e.g. Python or Ruby.
Interpreted and dynamically typed languages enjoy a shorter development cycle but are somewhat more brittle, whereas compiled and statically typed languages have a slower development cycle but are often deemed more reliable.
Another analogy would be an RDBMS enforcing data integrity constraints, e.g. MySQL+InnoDB, versus an RDBMS ignoring data integrity constraints, e.g. MySQL+MyISAM.
As it stands, Portage is still building GCC.
Monday, April 07, 2008
The pomodoro technique
Essentially, it consists of dividing your workday into uninterrupted chunks of 25 minutes, plus 5-minute pauses. You set a kitchen timer to go off in 25 minutes and do whatever work you need to get done without letting yourself be interrupted by external or internal (i.e. self-generated) distractions.
It's a very simple, helpful and enjoyable technique.
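Just to make the rhythm concrete, here is a toy sketch in Java of the work/pause cycle; the technique itself requires nothing more than a kitchen timer:

public class Pomodoro {
    public static void main(String[] args) throws InterruptedException {
        final long WORK = 25 * 60 * 1000;  // the 25-minute uninterrupted chunk
        final long PAUSE = 5 * 60 * 1000;  // the 5-minute pause
        while (true) {
            System.out.println("Work, no interruptions allowed.");
            Thread.sleep(WORK);
            System.out.println("Ding! Take a 5-minute pause.");
            Thread.sleep(PAUSE);
        }
    }
}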
Testable by design
It takes some experience to realize that exposing such members is a low price to pay for increased testability. At some later stage, you may even begin considering tests as an important force driving the design of your components.
Testable components expose more of their internal state to the outside world. They also tend to be more modular. For example, if a CUT (component under test) requires a database connection, then the developer might modify it so that it accepts a DAO, and then inject a mock DAO in tests, allowing the CUT to be tested independently of the database.
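Here is a minimal sketch of what I mean, with hypothetical names (UserDao, UserService); the CUT depends on a DAO interface instead of a raw database connection, so a test can inject a mock:

// The DAO interface abstracts away the database.
interface UserDao {
    String findName(int id); // a real implementation would query the database
}

// The CUT accepts a DAO via its constructor (dependency injection).
class UserService {
    private final UserDao dao;
    UserService(UserDao dao) { this.dao = dao; }
    String greet(int id) { return "Hello, " + dao.findName(id); }
}

// The test injects a mock DAO; no database is needed. Run with "java -ea".
class UserServiceTest {
    public static void main(String[] args) {
        UserDao mockDao = id -> "Alice"; // canned answer instead of a query
        UserService cut = new UserService(mockDao);
        assert cut.greet(42).equals("Hello, Alice");
        System.out.println("test passed");
    }
}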
Persistence, concurrency, distribution and transaction support are generally accepted as wide-reaching and fundamental aspects of a component. For instance, a component designed for persistence may differ substantially from the same component without persistence support. In a similar vein, a component designed with testability in mind will show different characteristics than its less testable counterpart. In particular, and as mentioned above, testable components tend to be much more modular. It follows that modularity is a corollary of testability.
It should be noted that it takes significant effort to develop and maintain test code. In the logback project, we spend about 30% of our time on documentation and at least 50% maintaining tests, with only the remaining 20% spent on the actual code. Writing good test code requires real skill.
Monday, March 31, 2008
Obsessed by Gentoo
At this stage, you may think that I had learned my lesson and would not embark on any new adventures. I had not yet realized how lucky I had been so far, and would soon pay the price of my temerity.
On Torino, another production machine running Gentoo, my subsequent attempts at upgrading X11 and Apache resulted in complete failures. Notwithstanding my relentless attempts, Torino can no longer run X11, due to a suspected gcc/glibc versioning problem.
Updating Apache from 2.0.x to 2.2.x had an interesting twist of its own. As with mailman, Gentoo's packaging structure for Apache changed between 2.0.x and 2.2.x. More specifically, the directives for specifying which modules were included in the server changed. Fortunately, there were instructions on migrating from the old to the new structure. It took me about 2 hours to understand that the migration instructions had a little bug: dashes within package names had to be written as underscore characters. After that tweak, the new build of the Apache server included all the required modules.
For our web-applications, we rely heavily on reverse proxying, that is on Apache's mod_proxy module. This module witnessed substantial enhancements between Apache 2.0.x and 2.2.x. Given that Torino is a production server, I had only a limited number of hours to perform the migration. At about 1 AM Monday morning, manually reverting to Apache 2.0.x was the only remaining option.
As I understand it, Gentoo supports the installation of only a single version of any given application package. It does not support the simultaneous installation of multiple versions of the same package. In the Apache case, it would have been nice to have versions 2.0.x and 2.2.x installed simultaneously. Alternatively, it would have been acceptable if Gentoo allowed me to revert to an older version of Apache. However, it seems that Gentoo supports only one path for updates, i.e. upgrades.
In conclusion, while Gentoo's package management mechanism is pretty fantastic, it still does not allow for seamless upgrades. Others have made similar observations.
Wednesday, March 26, 2008
Fascinated by Gentoo
In the last three years, we never felt the need to perform regular updates. However, yesterday I noticed that on one particular machine the log files were getting very large. Switching to syslog-ng instead of the good ol' syslogd package seemed the right thing to do. However, since we had never upgraded the platform, the view of the available packages, i.e. the Portage tree on the host, was too old. Thus, the installation of syslog-ng failed. The Portage tree needed to be upgraded. By the way, Portage is Gentoo's installation framework.
Thus, I updated the Portage tree by issuing an "emerge --sync" command. However, in the meantime the package description format had changed, so that the version of Portage on the host could not read the updated Portage tree. It appeared as if the whole Portage tree was corrupt. Thus, a chicken-and-egg situation emerged: I could not install the latest version of Portage because my Portage tree was unreadable by the current Portage software.
Anyway, once I realized what was going on, I copied over an older version of the Portage tree from a backup, installed a new version of Portage and then updated to the latest Portage tree.
Even after this relatively troublesome event, I still love Gentoo and the stability it provides. Our Linux systems just work without a reboot for years on end. The latest experience notwithstanding, it's usually very easy to install or update new packages.
More generally, dependency management is one of the key features of any serious software platform. For instance, Maven, a Java build tool, is becoming increasingly popular, imho mainly because it helps with dependency management.
Sunday, January 06, 2008
How efficient is a Prius?
Obviously, the Prius is noteworthy not because of its powerful engine -- its acceleration won't ever rivet you to your seat -- but because it is supposed to use little gas. Anyway, after three weeks of driving, today it alerted me through a blinking square on the dashboard that it needed a refill. When I brought it to the gas station, her odometer was showing 670 km. Her tank was full after drinking 41.25 liters of unleaded gasoline, which brings me to my main point.
Assuming the gas tank was completely full when I got it (which is a somewhat iffy proposition), my Prius yielded an average of 6.1 liters per 100 kilometers. By the way, 6.1 l/100km is equivalent to 38.56 MPG (US gallons). This result is considerably worse than the mileage advertised by Toyota, i.e. 4.4 liters per 100 kilometers, or 53.46 MPG. Nevertheless, the Prius still emerges as more fuel efficient than most other cars.
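For those who want to check the arithmetic, here is the computation in Java form. The constant 235.215 converts between l/100km and MPG (US); it is 100 × 3.785411784 / 1.609344, from the liters-per-US-gallon and kilometers-per-mile factors:

public class Mileage {
    public static void main(String[] args) {
        double liters = 41.25, km = 670;       // measured at the pump
        double lPer100km = liters / km * 100;  // ~6.16, i.e. the 6.1 above, give or take rounding
        double mpg = 235.215 / lPer100km;      // the same figure in MPG (US)
        System.out.printf("%.1f l/100km = %.2f MPG%n", lPer100km, mpg);
    }
}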
Of course, a single measurement is not necessarily representative, especially considering that those 670 km included a drive to a nearby mountain. After googling for a few minutes, I stumbled upon a US government page showing the MPG obtained by other drivers. My MPG happens to be worse than the average shown on that page.
Update
The second time around, I measured 810 km for 42.02 liters of unleaded gasoline, or 5.2 liters per 100 km (45.3 MPG). This result is very much aligned with the average reported by other drivers. It could probably be improved, as the 810 km included a trip to a nearby mountain.
The difference with the first result can be explained by the fact that on the highway I now drive a little below the authorized limit, at 110 km/h instead of 120 km/h.
Tuesday, December 04, 2007
XP Days conference
Organization of the conference
With 115 participants, and 4 parallel sessions, the conference had a friendly and personal atmosphere. It was also very well organized. At the beginning of each day, the presenters had 60 seconds to stand up and "sell" their session. This made it easier to choose among the 4 parallel sessions.
Product owner
In one of the hands-on sessions, we learned how important it is to have a product owner (PO) closely involved in the project. XP and Scrum talk about the "customer on site". This point was also mentioned by other participants in informal chats. It became clear that having a readily accessible PO, someone capable of deciding on and prioritizing the product feature set, makes a big difference.
Retrospectives
Retrospectives are, in my humble opinion, one of the most powerful ideas to come out of the XP/agile world. Basically, the team members take the time to reflect on their various processes and improve upon them. Retrospectives happen frequently, which differentiates them from project post-mortems. At the end of the first day of the conference, the organizers held a retrospective on the conference itself, improving it on the fly.
TDD (test driven development)
An excellent development practice, but one which can end up warping your mind. I thought I had been practicing TDD for some time, but apparently not well enough in the opinion of the purists. Supposedly, you have to make a consistent effort to come up with the tiniest possible change to the implementation, barely sufficient to make the tests pass. It made me feel like my mind was in shackles. Apparently, you get used to it. I hope I never do.
In other sessions, I learned that tests can be considered a specification. As such, the testing phase is more akin to design.
To write maintainable tests, you can start by asking yourself whether, by reading the test code alone, one could come up with the solution, i.e. the implementation. Once you do that, you can start viewing the test code as the origin of the implementation.
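One way to picture this: below is a tiny, hypothetical Java test that reads as a specification; from the assertions alone you could reconstruct what the stack under test must do (run with "java -ea" to enable assertions):

import java.util.ArrayDeque;
import java.util.Deque;

public class StackSpecTest {
    public static void main(String[] args) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        assert stack.pop() == 2; // last in, first out
        assert stack.pop() == 1;
        assert stack.isEmpty();  // popping every element empties the stack
        System.out.println("specification satisfied");
    }
}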
Teams
Teams need time to gel. The arrival or departure of a member will disturb the team dynamic. Some people speak of a new team after any change in membership. It may sound extreme, but I think there is some truth to it.
Agility and co.
I was surprised to discover that agile methods require a lot of discipline. XP and Scrum define detailed procedures that some people follow religiously. The no-compromise/take-no-prisoners/all-or-nothing approach of certain participants seemed disturbingly martial, verging on the intolerant.
Having said that, there are many excellent ideas brewing in the Agile world. Next time you stumble upon an XPDays conference in your neighborhood, I'd recommend that you attend.
Sunday, September 02, 2007
Yet another choice
I am inclined to agree with Dion, but for different reasons. Writing a good abstraction layer for two or more distinct systems takes serious effort. I'd go as far as declaring the task impossible unless the systems in question are very similar or the owners of these systems unconditionally submit to the authority of the abstraction layer.
In the case of log4j and java.util.logging (JUL), Jakarta commons-logging (JCL) was able to abstract the underlying APIs, albeit only partially, because their core APIs are similar both conceptually and structurally. However, JCL was not able to abstract the parts below the core API. For example, JCL does not offer any help with respect to the configuration of the underlying logging system. SLF4J fares only a little better, in that it offers abstractions for both MDC and Marker, in addition to the core logging API.
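For illustration, here is what those SLF4J abstractions look like in application code (the class name and messages are made up); note that configuration still belongs to the underlying logging system:

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.slf4j.Marker;
import org.slf4j.MarkerFactory;

public class Slf4jExample {
    private static final Logger logger = LoggerFactory.getLogger(Slf4jExample.class);
    private static final Marker CONFIDENTIAL = MarkerFactory.getMarker("CONFIDENTIAL");

    public static void main(String[] args) {
        MDC.put("user", "alice");                    // per-thread diagnostic context
        logger.info("Session started");              // the core logging API
        logger.warn(CONFIDENTIAL, "Salary changed"); // a marker-decorated statement
        MDC.remove("user");
    }
}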
JDBC can be cited as a counter-example: a successful abstraction layer. However, it is successful insofar as the RDBMS providers submit to the authority of the JDBC specification. They all go out of their way to implement a driver compatible with the latest version of the JDBC specification. Moreover, RDBMS applications already share a similar structure by way of SQL.
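The point is best seen in code: the snippet below (with a made-up URL and query) is vendor-neutral, and only the JDBC URL selects the underlying driver:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcExample {
    public static void main(String[] args) throws Exception {
        // Swap this URL for another vendor's and the rest stays unchanged.
        String url = "jdbc:mysql://localhost/test";
        try (Connection con = DriverManager.getConnection(url, "user", "pass");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT name FROM users")) {
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
        }
    }
}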
When the systems differ substantially, it is nearly impossible to bridge the gap. Is there an abstraction layer bridging relational and OO databases? I think not. The relational/OO impedance mismatch gave birth to major development efforts. Take Hibernate for instance. Would you dream of writing Hibernate as a weekend project?
So why did JCL, with all its warts, catch on like wildfire? Because JCL provides a convenient answer to the log4j vs. JUL dilemma faced by the authors of most Java libraries. The dilemma does not exist in other languages because there is usually one predominant logging system per language. In Java, we have log4j getting most of the mindshare, with JUL looming in the background, not much used but not ignorable either -- hence the dilemma.
Anyway, Dion has a point. We, in the J2EE community, do indeed waste too much time dabbling in secondary matters such as logging, but we only do so because we have the luxury of choice. We can choose between log4j, logback or JUL as our logging system. We can choose between Ant, Ivy or Maven for our builds. We can choose between Eclipse, IDEA and Netbeans for our IDE. We can choose between JSF, Tapestry, Spring, Struts or Wicket as our web-application framework.
Making choices takes time and effort, but it also exerts a powerful attraction on our psyche. When presented with the choice, programmers (to the extent that we programmers can be assimilated to humans) will prefer a situation where we can choose between multiple options over one where we are presented with only a single option.
Java presents us with more choices than any other language, probably because it is also the most successful language in history. Of course, you already know that successful does not necessarily mean best.
Anyway, I am quite happy to see SLF4J being so widely adopted.
Friday, June 29, 2007
GIT vs Subversion
In this particular presentation, I found Linus to be opinionated and rather unconvincing. He is extremely critical of CVS and Subversion. While GIT may be well adapted to Linux's development model, I believe Subversion gets the job done in other environments.
Martin Tomes, in his comments about GIT, nails the point. GIT and Subversion aim at different development models. While not perfect, the classical (centralized) model works well in both large and small projects, open-source or not.
The GIT project publishes a detailed albeit biased comparison between GIT and Subversion. The comparison makes a convincing case on why GIT offers better support for merges. The same page also mentions that the user interface for Subversion is better.
Monday, June 11, 2007
Selling YAGNI
YAGNI tends to sell well with developers. It prunes needless work. However, with customers who ask for features, the YAGNI principle does not sit quite as well. People in general do not appreciate having their decisions questioned, and YAGNI can be reduced to one question: "Do you really need this feature?" The answer is often yes, forcing the skeptic in me to repeat the question, perhaps in a modified form. Most people, customers included, do not like to be challenged, especially if done with some insistence.
Pruning requirements to mere essentials takes both work and courage. In the eyes of the customer, the alternative, i.e. asking for potentially useless features, may often look both easier and less risky.
I try to use the argument advocated on the c2 wiki: the feature implemented now, in anticipation, may be radically different from the feature actually needed in the future.
So how do you apply the YAGNI principle in a real-world environment? What are the arguments that may sway your customers or fellow developers?
Tuesday, May 29, 2007
Evolving a popular API
Take Tapestry for example. It has evolved over seven years and five iterations to become what it is today. Unfortunately, some of these iterations were not backward compatible, thus purportedly negatively impacting Tapestry's adoption rate.
Offering a painless migration path to users may be a necessary element in keeping your existing user base, but as any developer who has attempted to preserve 100% backward compatibility will tell you, such an ambitious goal will quickly begin to consume eons of your time.
Unless you are Microsoft or some other entity with serious resources, you will need to make a choice between 100% compatibility and innovation. In my experience, you can't both improve your design and keep 100% (absolute) compatibility.
However, if you aim a little lower than 100%, you can keep evolving your API without severely impacting your existing users. Most APIs have parts intended for internal use and other, more public parts intended for use by the wider public. Changes to the internal parts may affect a handful of users, say one out of every thousand. In contrast, changes to the public API will affect every user.
If such a distinction makes sense for your API, confine your incompatible changes to the internal part of your API. As mentioned earlier, these changes may affect a small proportion of your users, which may still number in the hundreds. Nevertheless, causing discomfort to a tiny minority of your users is still much better than a dead, i.e. non-evolving, API.
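Here is a sketch of what this looks like in practice, with hypothetical names; the convention is that anything under an "internal" package may change incompatibly between releases, while the rest is the stable public surface:

package org.example.api; // public: incompatible changes here affect every user

public interface MessageFormatter {
    // The stable contract. Implementations would live in, say,
    // org.example.internal.*, where incompatible changes are fair game and
    // affect only the few users who reached below the public surface.
    String format(String pattern, Object... args);
}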
Friday, May 18, 2007
Dell delivers (not!)
We signed the order on the 30th of April 2007 and paid for it on the 3rd of May. Tracking the order on Dell's web site, we noticed that the order was not being processed. I contacted the sales person to inquire about the order. She said that, as far as she could tell, no payment had been received and that she needed proof of payment to look into the matter. After sending her proof of payment, it took another day for the accounting team to match our payment with our order. Nevertheless, with the payment glitch fixed, the laptop went into preproduction on the 9th, was finished the next day and shipped by UPS on the 11th, with expected delivery on Wednesday the 16th of May.
Lo and behold, we received it on the announced date, at around 11 AM. I was quite excited to receive this new laptop as a replacement for my older Inspiron 5100 (also from Dell). After 4 years of good and loyal service, my old companion still works nicely, but it weighs a hefty 3.5kg (7.7lbs). Since I have to schlep it on foot for about an hour each work day, 1.5kg (3.3lbs) less weight on my back is something I was looking forward to.
Opening the package, all the components were there. Unfortunately, instead of weighing 2.0kg (4.4lbs), my Latitude D620 weighs 2.5kg (5.5lbs), a 25% difference compared to my order and Dell's own specifications. When I contacted the sales person, she proposed to sell me a 4-cell battery, purportedly lighter than the 6-cell battery I currently had. Unconvinced, I asked to speak to her manager and somehow got disconnected. Sigh.
The second time I called, I was put in contact with a customer service representative who, recognizing the problem, promised to replace my laptop with a model of my choice. Needless to say, I was quite impressed by Dell's generous offer. Too good to be true: she called an hour later reneging on her previous offer, under a completely bogus pretext. Let me cut a long story short by saying that there is a limit to the amount of bull this particular customer (yours truly) was willing to put up with.
How can Dell hope to retain customers when what they deliver only approximates what they advertise? One of the customer support people at Dell went as far as acknowledging that Dell understates their laptops' weight to increase sales and that other vendors play the same dubious game. One thing is for sure: we won't be buying another Dell product anytime soon.
Friday, March 02, 2007
Reading XFire code
The various SOAP and WS-* related specifications have the reputation of being tricky and difficult to understand. The latest project I am involved in requires a relatively deep understanding of WS-*. One way to gain understanding of a specification is to closely study an implementation of it. Spurred by my previous pleasant experience with it, XFire happens to be the implementation of choice.
From what I can tell, the code is a pleasure to read and it feels like it is the result of fairly good design.
Monday, February 26, 2007
Founders at work
After reading the first 3 chapters of "Founders at Work" by Jessica Livingston, I can't help but recommend this book. Compared to many other books, where fluff in the narrative ends up diluting the content, the direct language of the various founders is both refreshing and inspirational. Each story is filled with unsophisticated yet brilliant ideas, each resembling a small gem.
Friday, February 09, 2007
SLF4J and logback gaining traction
We are not at the same levels of popularity as commons-logging or log4j. Nevertheless, it is very encouraging to see users responding favorably to our work. It feels like the early days of log4j, and that's pretty damn exciting.
Thursday, February 08, 2007
Advantage of open source
More importantly, the API of the closed-source product, while very similar and accomplishing the *identical* task, felt awkward. I guess that bouncing ideas off users and listening to what they have to say makes a real difference in the end.
Although clearly at a commercial disadvantage, an open-source project has a structural advantage in creating a better product. Of course, for really large products where the combined efforts of dozens of programmers are needed for prolonged periods, closed source remains a valid alternative.
Thursday, December 21, 2006
Release procedures
There is a growing need to increase the productivity of development teams, an industrialization of sorts. However, this need has to be balanced against the imperatives of creativity. Procedures affecting the day-to-day lives of developers need to be pragmatic and low cost. The time at the disposal of a productive developer is a scarce and expensive resource. As such, I am surprised to see Apache, an open-source icon, indulge in heavy-handed procedures. Again, it's only a proposal, and hopefully it won't be accepted in its current form.
"Release early, release often" has been one of the core mantras of open-source development. The ability to release frequently brings a level of reactivity highly appreciated by OS users. A multitude of procedures which inexorably hamper reactivity, need to be weighed against the purported benefits.
Of course, not every procedure is bad. Development teams need to be coordinated, with each developer assigned clearly defined tasks. Scrum attempts to deal with this problem. Scrum from the trenches gives an easy-to-read description of the approach advocated by Scrum aficionados.
Friday, December 15, 2006
Migrate from log4j to logback in seconds
We have migrated applications in production from log4j 1.3 over to logback version 0.7. We have done this without changing a single line of code in our applications, but by merely replacing the file log4j.jar with log4j-bridge.jar in the relevant WEB-INF/lib directories.
Log4j-bridge intercepts calls to log4j's Logger class and transparently redirects them to logback.
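To make this concrete, here is a made-up application class written against the log4j API; the code compiles and runs unchanged, but with log4j-bridge.jar on the classpath instead of log4j.jar, the calls end up in logback:

import org.apache.log4j.Logger;

public class Wombat {
    // Plain log4j API, exactly as the application was originally written.
    private static final Logger logger = Logger.getLogger(Wombat.class);

    public void mate() {
        logger.info("mating season has begun"); // served by logback at runtime
    }
}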
I am thrilled to see that for our real-world applications previously using log4j, the migration process to logback was quick and painless.
Friday, November 10, 2006
Continuum running and configured in 20 minutes
I am still not completely sold on the idea of continuous integration (CI). As I understand it, in practice, Continuum will check out the latest sources from the source repository, build and run the tests on the CI machine, and notify our team if anything goes wrong. Already at this early stage, Continuum feels like a new member of our team. The question is whether this new member is worth the maintenance. However, from the little experience gained in the last few days, Continuum seems to do what it is supposed to do without getting in the way. A new build is performed only if the contents of the source repository change, and notifications are sent only when the latest build result differs from the previous one.
In short, once you've sold your soul to M2, continuous integration via Continuum is a piece of cake.
Saturday, November 04, 2006
Solution to the Maven2 version number problem
I recently experimented with a solution to the above problem. It's now part of the SLF4J project (which has 10 or so modules).
The idea is to declare the version number for the whole project as a property, namely "aversion" (pun intended), in the parent pom. The parent pom's own version number can be anything as long as it ends with "SNAPSHOT".
Here is an excerpt from SLF4J parent pom:
<project>
  ...
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-parent</artifactId>
  <version>1.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>SLF4J</name>
  <properties>
    <aversion>1.1.0-RC0</aversion>
  </properties>
  ...
</project>
Child modules' versions are specified via the ${aversion} property. Each child's reference to its parent's version is hard-coded. However, since the parent pom's version is a SNAPSHOT, child modules will see changes in the parent pom. In particular, if the parent pom changes the value of ${aversion}, the children will see the change.
Here is the pom.xml file for the slf4j-api module.
<project>
  <parent>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-parent</artifactId>
    <version>1.0-SNAPSHOT</version>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-api</artifactId>
  <version>${aversion}</version>
  <packaging>jar</packaging>
  <name>SLF4J API Module</name>
  ...
</project>
Unless I've missed something, this hack seems to work just fine. I would be interested to know whether there is a downside to it.