Friday, October 31, 2008

Ruby performance bug

I got a bug report the other day about a piece of Ruby code which was executing much slower than it really should. It's basically a piece of code which unpacks a zip archive using rubyzip. Since I like TDD, I started by writing a test case which would exhibit the behavior: create a zip archive with 500 smallish files (around 100 bytes each). Unpacking this took on the order of 10 seconds (as a baseline, unpacking the same zip file using "unzip" takes 0.03 seconds). There are a lot of other things going on (YAML parsing, for example), so we don't need to be on par with unzip, but 10 seconds is way too much.

I managed to narrow down the problem to a function which is called to print the name of each file as it is being unpacked. Removing the call to this function cuts the time from 10 seconds to 3 seconds. But why? The code looks essentially like this:

if @@current_obj != @obj
  # do stuff
  @@current_obj = @obj
end

If I comment out the @@current_obj = @obj line, things go fast (~3 seconds), and if I keep the assignment, it takes ~10 seconds. Surely a simple assignment can't take (10 - 3)/500 = 14 ms?

It turned out that the real problem was the comparison. My guess is that Ruby attempts to make a deep comparison (@obj isn't very large, but making a deep comparison might take some time). This also explains the timing: when I removed the assignment at the end of the if-statement, @@current_obj was never set, so the comparison was always made with nil, which should be very fast (and constant).

So, I changed the comparison to
    
if @@current_obj.object_id != @obj.object_id
  # do stuff
  @@current_obj = @obj
end

which prevents Ruby from making a deep comparison.
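To illustrate the difference, here is a minimal sketch. The Entry class is a hypothetical stand-in for @obj, not the real class from the code in question, and it assumes the class defines its own == that walks the data. By default Object#== only compares identity, so a slow != suggests that the class, or something it wraps such as an Array, Hash or Struct, overrides ==. Since != simply negates ==, it pays the same cost. Comparing object_id, or calling equal?, never looks at the content.

```ruby
require "benchmark"

# Hypothetical stand-in for @obj: a class whose == compares all of its data.
class Entry
  attr_reader :fields

  def initialize(fields)
    @fields = fields
  end

  # Value equality: walks every field, so the cost grows with the data.
  def ==(other)
    other.is_a?(Entry) && fields == other.fields
  end
end

a = Entry.new(Array.new(10_000) { |i| "field #{i}" })
b = Entry.new(a.fields.map(&:dup)) # same content, but a different object

# != calls == and negates the result, so it compares all 10,000 strings.
deep = Benchmark.realtime { 500.times { a != b } }

# equal? (like comparing object_id) only checks whether both are the same object.
identity = Benchmark.realtime { 500.times { !a.equal?(b) } }

puts format("!= via ==: %.4f s, equal?: %.4f s", deep, identity)
```

Note that the two checks answer different questions: a and b above are "equal" according to ==, but they are not the same object. The identity check is only a safe replacement when "is this the very same object as last time?" is what the code really wants to know.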

Friday, October 24, 2008

Merging in Subversion

Now I've done my second Subversion merge using the new 1.5 mergeinfo, and both times I've hit the same problem when trying to "reintegrate" the branch. Subversion complains:
   Some revisions have been merged under it that have not been merged into the reintegration target; merge them first, then retry.
This post on CollabNet's Subversion blog helped me out. The problem boils down to the fact that there is "subtree" mergeinfo in the branch and/or trunk which prevents the merge from completing. Usually you can safely delete the subtree mergeinfo:
   svn propdel svn:mergeinfo <subtree> -R
I honestly don't understand why this issue arises or why Subversion is not able to resolve it. Although having mergeinfo is much better than not having it, I can't help feeling that this is a kludge to handle a feature which Subversion wasn't at all designed for.

Tuesday, October 7, 2008

New shiny colors

Time for some new colors on this blog...

The 500-mile email

Very funny anecdote on mysteriously failing emails: The case of the 500-mile email. (It's from 2002, but it was the first time I heard about it.)

Microsoft bashing day (again)

For some strange reason, I still get surprised at how bad Microsoft is at designing user-interfaces. And now I'm not even talking about the graphical ones. The command-line has long been viewed at Microsoft as something inherently evil: everything must have a graphical interface to be deemed usable; command-line tools are by definition user-hostile. Which, of course, is not true. Badly designed command-line tools are user-hostile, just as badly designed graphical user-interfaces are user-hostile. And Microsoft is pretty good at doing both.

Microsoft does not lack good interface designers. Where they really put in the effort, the result is sometimes really good. For example, I like the new Office-toolbar and the interactive display of keyboard shortcuts (even if it's a little annoying to have to relearn the entire user-interface; every single command seems to have been moved), and the new dialogs with big descriptive buttons instead of just yes/no buttons. Also, I like the new Google Chrome. No wait, wrong company.

But the command-line tools seldom get any attention (with the possible exception of PowerShell, which I haven't tried yet). When will we get a new Windows console, for example? With a real font selection dialog, proper resizing, etc.?

The tool which actually caused my blood pressure to exceed the "must-blog-about-it" point is CACLS. CACLS is a tool for displaying or modifying the ACLs for files and folders. I wanted to use it to be able to remove a write-protected file (a file checked out by svn with the svn:needs-lock property), and I wanted to do it in an automated test.

Guess what: CACLS will ask the user for confirmation when changing permissions, and there is no option to turn it off. What is the BLOODY POINT of having a command-line tool which cannot be used from a script or .bat file? The workaround is apparently to do "echo Y | cacls ...", but still. How hard can it be to add a single flag to switch the question on or off?
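For an automated test written in Ruby, the same "echo Y |" trick can be done by feeding the answer to the tool's standard input. The following is only a sketch: run_with_answer is a hypothetical helper name, and since the real cacls arguments are not shown here, a tiny Ruby child process plays the part of the prompting tool. It also assumes the tool reads its answer from standard input, which is what the echo workaround relies on.

```ruby
require "open3"
require "rbconfig"

# Hypothetical helper: run a command and answer its confirmation prompt by
# writing the answer to its standard input, like "echo Y | command".
# Returns the command's output and whether it exited successfully.
def run_with_answer(command, answer = "Y")
  output, status = Open3.capture2(*command, stdin_data: "#{answer}\n")
  [output, status.success?]
end

# Stand-in for an interactive tool (NOT cacls itself): it asks for
# confirmation and reports what it decided.
prompting_tool = [
  RbConfig.ruby, "-e",
  'print "Are you sure (Y/N)? "; puts(STDIN.gets.to_s.strip == "Y" ? "changed" : "aborted")'
]

output, ok = run_with_answer(prompting_tool)
puts output
puts "exit status ok: #{ok}"
```

Because the child gets its input from a pipe that is closed after the answer is written, it cannot sit waiting for a keypress, so the test run does not hang on the prompt.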

Now I'll go back to work and try to figure out how to prevent interactive dialogs blocking my automated build.