Friday, October 31, 2008

Ruby performance bug

I got a bugreport the other day about a piece of Ruby code which was executing much slower than it really should. It's basically a piece of code which unpacks a zip-archive using rubyzip. Since I like TDD, I started by writing a test-case which would exhibit the behavior: create a zip-archive with 500 smallish files (around 100 bytes). Unpacking this took in the order of 10 seconds (as a baseline, unpacking the same zipfile using "unzip" takes .03 seconds). There are a lot of other things going on (YAML-parsing, for example), so we don't need to be on par with unzip, but 10 seconds is way too much.

I managed to narrow down the problem to a function which is called to print the name of each file as it is being unpacked. Removing the call to this function cuts the time from 10 seconds to 3 seconds. But why? The code looks essentially like this:

if @@current_obj != @obj
# do stuff
@@current_obj = @obj
end

If I comment out the @@current_obj = @obj line, things go fast (~3 seconds), and if I keep the assignment, it takes ~10 seconds. Surely a simple assigment can't take (10 - 3)/500 = 14 msecs?

If turned out that the real problem was the comparison. My guess is that Ruby attempts to make a deep comparison (@obj isn't very large, but making a deep comparison might take some time). Also, when I removed the assignment at the end of the if-statement, the comparison was always made with nil, which should be very fast (and constant).

So, I changed the comparison to
    
if @@current_obj.object_id != @obj.object_id
# do stuff
end

which prevents Ruby from making a deep comparison.

No comments: