Friday, May 22, 2009

Auto-formatting source code

Code formatting is a sensitive subject. There are countless articles, blog posts, and other texts available on the subject. Everyone who's worked with other programmers knows that most programmers have strong opinions about how they want their code formatted: tabs vs. spaces, 2, 4, or 8 columns of indentation, where to put the braces, etc.

There are several studies (see Death to the Space Infidels! for example) indicating that consistent code formatting improves team productivity. The hypothesis is that it is easier to read and understand code which follows certain rules. Atwood quotes a study on chess players which compared their ability to remember the layout of a chess board. When the pieces were laid out as they might be in a game, the experts' memories were far superior to the novices', but when the pieces were arranged randomly, there was little difference. The same type of idea seems to hold true for programmers as well.

I've written quite a lot of Java in Eclipse lately, and have become very fond of the "auto-format on save" feature introduced in Eclipse 3.4 (I think). The idea is that given a fairly detailed set of formatting rules, your editor (Eclipse in this case) formats your code automatically before saving. When I first tried it, I thought that it would be annoying and intrusive, but the feature turned out to be very nice. So nice that I now get annoyed when stuck in an editor which doesn't do this for me.

I started thinking about it, and realized that code formatting in this respect is a lot like code generation. Source code is just another representation of your program. It is different from the compiled code (or any other representation) because it is intended for human consumption and processing, while the compiled code is designed to be executed by hardware. Compiler writers spend lots and lots of time making the compiler generate good code (which usually means "fast", but could also mean "small", or a combination of the two). In the same way, source code needs to be properly formatted for it to be efficient, i.e. easy for programmers to read and edit. This is a task which in very many cases can be handled automatically by the editor (or an add-on program).

The Java editor in Eclipse does this very well. Compilation and source code formatting are performed incrementally, so there is no noticeable delay while typing. However, there are a few drawbacks. One problem is with code which is commented out using block comments. Since the editor doesn't recognize it as code, it is formatted as ordinary text, making it unreadable. I usually work around this using "if (false) { … }". The other problem is that the formatter has a few blind spots. Complex boolean expressions are not indented in any structured way, and certain patterns of chained method calls are wrapped at the line width instead of being aligned one call per line. If you use the Builder design pattern, you'll know what I mean.
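To make the two drawbacks a bit more concrete, here is a small sketch. Everything in it is made up for illustration: the class FormatterDemo, the Request type and its builder are hypothetical names, not taken from any real project. It shows the one-call-per-line layout I would like the formatter to keep for a builder chain, and the "if (false)" trick, which keeps disabled code visible to the editor as code so it still gets formatted (and compiled).

```java
public class FormatterDemo {

    // Hypothetical value class with a builder, used only to illustrate a chained call.
    static final class Request {
        final String host;
        final int port;
        final boolean secure;

        private Request(Builder b) {
            this.host = b.host;
            this.port = b.port;
            this.secure = b.secure;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            private String host = "localhost";
            private int port = 80;
            private boolean secure = false;

            Builder host(String host) {
                this.host = host;
                return this;
            }

            Builder port(int port) {
                this.port = port;
                return this;
            }

            Builder secure(boolean secure) {
                this.secure = secure;
                return this;
            }

            Request build() {
                return new Request(this);
            }
        }
    }

    static String describe() {
        // The layout I would like to keep: one chained call per line, aligned on the dot.
        Request r = Request.builder()
                           .host("example.org")
                           .port(8443)
                           .secure(true)
                           .build();

        // Workaround for commented-out code: the block never runs, but the
        // editor still treats it as Java and formats it like any other code.
        if (false) {
            r = Request.builder().host("unused.invalid").build();
        }

        return (r.secure ? "https" : "http") + "://" + r.host + ":" + r.port;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "https://example.org:8443"
    }
}
```

One side effect of the "if (false)" approach is that the disabled code must still compile, which a block comment never requires. Depending on the situation that is either a nuisance or a nice safety net.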
