Within a few days of my editorial suggesting that Java 7 be adopted quickly, news began to leak out that there were showstopper bugs in the Java 7 HotSpot compiler. I'll get into the defects shortly, but what really turned up the heat was Oracle's decision to ship the compiler aware that the known defects would cause one of two types of errors: hang the program or silently generate incorrect results. Given that Java 7 took five years to see light, it seems to me and many others that Oracle could have waited a bit longer to fix the bug before releasing the software. To a large extent, there is a feeling in the Java community that Oracle does not understand Java (despite the company's earlier acquisition of BEA). That may or may not be, but I would have expected it to understand enterprise software enough not to ship a compiler with defects that hang a valid program.
The problem, from what is known so far, derives from a command-line optimization switch on the Java compiler. This switch incorrectly optimized loops, resulting in the various reported errors. In Java 7, this switch is on by default, while it was off by default in previous releases. Regardless of the state of the switch, the resulting optimizations were not tested sufficiently.
This is a curious problem, because compilers are one of the most demonstrably easy products to test. Text file, easily parsed binary file out. Or earlier in the compilation process: text file in, AST out. The easy generation of input and the simple validation of output make it possible to create literally tens of thousands of regression tests that can explore every detail of the generated code in an automated fashion. These tests are known to be especially important in the case of optimizations because defects in optimized code are far more difficult for developers to locate and identify. The implicit contract by the compiler is that going from debug code during development to optimized code for release does not change functionality. Consequently, optimizations must be tested extra carefully.
But even if Oracle's in-house testing was not complete, I have to wonder why they were not testing the code on some of the large open-source codebases currently available. One program that reported the fatal bug was Apache Solr, which most developers would agree is a high profile, open source project. Projects such as Solr provide almost ideal test beds: a large code base that is widely used. Certainly, Oracle might not cotton to writing UATs and other tests to validate what the compiler did with the Solr code. But, in fact, it didn’t have to write a test at all. It simply needed to run the package and the SIGSEGV segmentation fault would occur.
I have to hope that this event will be a sharp lesson to Oracle to begin using the large codebases at its disposal as a fruitful proving ground for its tools. While the sloppiness I've discussed is disturbing, it's made worse by the fact that the same defects can be found in Java 6. The reason they suddenly show up now is that the optimization switch is off by default on Java 6, while on in Java 7. This suggests that Sun's testing was no better than Oracle's. (And given that much of the JDK team at Oracle is the same team that was at Sun, this is no surprise.) The crucial difference is that Oracle knew about the bugs prior to release and went ahead with the release anyway, while there is no evidence Sun was aware of the problems.
Oracle's decision was political, not technical. And here Oracle needs to really reassess its commitment to its users. Is Java a sufficiently important enterprise technology that shipping showstopper bugs will no longer be permitted? The long-term future of Java, the language, hangs in the balance.
Update: The defects will be fixed in Java 7 Update 2. Clearly, we suggest waiting until that release ships to pilot any migration to this release.