Thursday, May 06, 2010

Encounters in Groovy I

You can take almost any .java file, rename it to .groovy, and the result will be valid Groovy. Compared to other contenders in the new-Java space such as Scala or Clojure, this syntactic backward compatibility is undoubtedly an important, and possibly a decisive, advantage for Groovy.

Let me demonstrate. Here is a very simple unit test measuring the performance of a trivial arithmetic operation.
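// file MyTest.java
package ch.qos;

import org.junit.Test;

public class MyTest {
    static int LEN = 100 * 1000;

    @Test
    public void smoke() {
        // let the JVM warm up
        loop();
        loop();
        double result = loop();
        System.out.println("Average duration per operation: " + result + " nanoseconds");
    }

    // Returns the average duration, in nanoseconds, of one loop iteration.
    double loop() {
        long start = System.nanoTime();
        double sum = 0;
        for (int i = 0; i < LEN; i++) {
            sum += i * 1.0;
        }
        long end = System.nanoTime();
        // In Java, long / int is integer division, so the average is truncated
        // to a whole number of nanoseconds. In Groovy, / on integers yields a
        // BigDecimal, so the Groovy version reports a fractional value.
        return (end - start) / LEN;
    }
}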

Running the above test yields:
Average duration per operation: 3.0 nanoseconds
As mentioned earlier, renaming "MyTest.java" to "MyTest.groovy" results in a valid Groovy class. JetBrains IntelliJ IDEA, which offers good Groovy support, lets me run "MyTest.groovy" like any other JUnit test. Here is the result:
Average duration per operation: 843.09 nanoseconds
Lo and behold, the same code runs about 280 times slower when compiled as a Groovy class than as its Java counterpart. If I were blogging for a sensation-driven news organization with an anti-Groovy agenda, I would now prematurely proclaim the death of Groovy and stop writing.
As I neither work for a sensation-driven news organization nor have an anti-Groovy agenda, I will try to put the preceding results into perspective.
The code generated by Groovy works on objects instead of primitive types. For example, the 'i < LEN' check is performed by invoking the compareLessThan() method of the ScriptBytecodeAdapter class, part of the Groovy runtime, which takes Object arguments. I suspect that Groovy's dynamic nature forces it to invoke methods flexible enough to deal with untyped objects, instead of emitting the simpler bytecode that the HotSpot compiler is masterful at optimizing, but that's just my hunch.
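To get a feel for why working on objects costs so much, the following Java sketch runs the same loop twice: once on primitives, as javac compiles MyTest.java, and once with every value boxed and compared through a generic method taking Object arguments. The class and the helper boxedLessThan are hypothetical; the helper only mimics the shape of a runtime call such as compareLessThan(Object, Object) and is not the code Groovy actually generates.

```java
import java.util.function.DoubleSupplier;

// Hypothetical illustration of boxed, Object-based arithmetic versus
// primitive arithmetic; this is NOT the bytecode Groovy actually generates.
class BoxedVsPrimitive {
    static final int LEN = 100 * 1000;

    // Hypothetical stand-in for a runtime helper that compares two untyped
    // values, loosely modeled on a compareLessThan(Object, Object) call.
    @SuppressWarnings("unchecked")
    static boolean boxedLessThan(Object left, Object right) {
        return ((Comparable<Object>) left).compareTo(right) < 0;
    }

    // Every value lives in an Object: the counter and the sum are unboxed
    // for each operation and re-boxed after each update.
    static double boxedLoop() {
        Object sum = 0.0;
        for (Object i = 0; boxedLessThan(i, LEN); i = (Integer) i + 1) {
            sum = (Double) sum + (Integer) i * 1.0;
        }
        return (Double) sum;
    }

    // The same computation on primitives, as in MyTest.java.
    static double primitiveLoop() {
        double sum = 0;
        for (int i = 0; i < LEN; i++) {
            sum += i * 1.0;
        }
        return sum;
    }

    // Times one call of the given loop and returns nanoseconds per iteration.
    static double nanosPerIteration(DoubleSupplier loop) {
        long start = System.nanoTime();
        loop.getAsDouble();
        return (double) (System.nanoTime() - start) / LEN;
    }

    public static void main(String[] args) {
        // warm up both versions, as in the original test
        for (int k = 0; k < 2; k++) {
            boxedLoop();
            primitiveLoop();
        }
        System.out.println("boxed:     " + nanosPerIteration(BoxedVsPrimitive::boxedLoop) + " ns/op");
        System.out.println("primitive: " + nanosPerIteration(BoxedVsPrimitive::primitiveLoop) + " ns/op");
        System.out.println("same sum:  " + (boxedLoop() == primitiveLoop()));
    }
}
```

Both loops compute the same sum; only the representation of the values differs, so any timing gap comes from boxing and the generic comparison. Actual numbers depend on the JVM and machine.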

We can actually improve the performance of the loop by using more idiomatic Groovy. Modifying the iteration from
for (int i = 0; i < LEN; i++) {
    sum += i * 1.0;
}
to
for (i in 1..LEN) {
    sum += i * 1.0;
}
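brings the average duration per operation down from 843 to 675 nanoseconds (1..LEN is an inclusive range, so the loop still performs LEN iterations). By typing the loop variable as double and dropping the i*1.0 conversion, we can improve performance drastically. Here is the modified iteration: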
for (double i in 1..LEN) {
    sum += i;
}
Surprisingly enough, this last optimization brings the average duration down to 50 nanoseconds, a 17-fold improvement over the initial non-idiomatic version running at 843 nanoseconds per operation.
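One plausible contributor to the large gap between the last two versions is that, in Groovy, a decimal literal such as 1.0 is a java.math.BigDecimal rather than a double, so i*1.0 performs BigDecimal arithmetic on every iteration. The Java sketch below isolates that cost, assuming the literal behaves like new BigDecimal("1.0"): one loop multiplies through BigDecimal, the other only adds doubles. The class and method names are hypothetical, and the sketch does not reproduce Groovy's runtime dispatch, so its timings will not match the figures above.

```java
import java.math.BigDecimal;

// Sketch, assuming the Groovy literal 1.0 behaves like new BigDecimal("1.0").
class DecimalLiteralCost {
    static final int LEN = 100 * 1000;
    static final BigDecimal ONE_POINT_ZERO = new BigDecimal("1.0");

    // Approximates "sum += i*1.0" with a BigDecimal 1.0: each iteration obtains
    // a BigDecimal for i, multiplies it, and converts the product back to double.
    static double bigDecimalLoop() {
        double sum = 0;
        for (int i = 1; i <= LEN; i++) {
            sum += BigDecimal.valueOf(i).multiply(ONE_POINT_ZERO).doubleValue();
        }
        return sum;
    }

    // Approximates "for (double i in 1..LEN) { sum += i }": plain double additions.
    static double doubleLoop() {
        double sum = 0;
        for (int i = 1; i <= LEN; i++) {
            double d = i; // int-to-double widening, no object allocation
            sum += d;
        }
        return sum;
    }

    public static void main(String[] args) {
        // warm up both versions before measuring
        for (int k = 0; k < 2; k++) {
            bigDecimalLoop();
            doubleLoop();
        }
        long t0 = System.nanoTime();
        double a = bigDecimalLoop();
        long t1 = System.nanoTime();
        double b = doubleLoop();
        long t2 = System.nanoTime();
        System.out.println("BigDecimal: " + (double) (t1 - t0) / LEN + " ns/op");
        System.out.println("double:     " + (double) (t2 - t1) / LEN + " ns/op");
        System.out.println("same sum:   " + (a == b));
    }
}
```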

We are still far from the 3 nanoseconds achieved by the initial .java version of the code. Perhaps the code can be optimized further to close the gap with the original .java version.

Groovy is indeed slower than Java in tight loops. However, the performance of most applications is I/O bound, so the practical performance impact of Groovy may be largely offset by the developer productivity gains it offers.
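How much a tight-loop slowdown matters overall depends on how much of the run time is spent in such loops. A back-of-the-envelope estimate in the spirit of Amdahl's law assumes that only a fraction f of the run time is affected and that the remaining, I/O-bound part runs equally fast in Groovy and Java; the overall slowdown is then (1 - f) + f * k, where k is the tight-loop slowdown. The class name and the fractions used below are hypothetical, not measurements.

```java
// Back-of-the-envelope estimate; the fractions used in main are hypothetical.
class OverallSlowdown {
    // Overall slowdown when a fraction f of the run time is slowed down by a
    // factor k and the remaining (1 - f), e.g. time waiting on I/O, is unaffected.
    static double overallSlowdown(double f, double k) {
        return (1 - f) + f * k;
    }

    public static void main(String[] args) {
        double k = 280; // worst-case tight-loop slowdown measured above
        double[] fractions = {0.0001, 0.001, 0.01}; // hypothetical values
        for (double f : fractions) {
            System.out.printf("f = %.4f -> overall slowdown %.2fx%n", f, overallSlowdown(f, k));
        }
    }
}
```

With k = 280, the worst case measured above, f = 0.001 gives an overall slowdown of about 1.28x, while f = 0.01 already gives about 3.79x, so the estimate is only reassuring when very little time is spent in such loops.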

In conclusion, while blindly converting .java files to .groovy may cause a catastrophic degradation in performance, a more selective migration can yield significantly better code without a serious performance penalty.
