All shiny and new

12 February 2009

UPDATE 2.0: You really did see the update below, right? You’re getting Charlie all worried with your enthusiasm for Rubinius.

UPDATE: Ahem, you should probably also read “This is NOT cold fusion.” No, it’s not April 1st. Sorry about that. Are you still excited? Read on!

It’s a pattern I’m fairly familiar with now. Evan will be pondering an issue with Rubinius. I’ll catch wind of it when he starts asking questions of smart people, reading academic CS papers and other implementations’ code, and tossing out some “what if…” questions. Next thing you know, he’s frenetically churning out code. Suddenly, Rubinius is much better, and in this case, faster.

Well, it’s happened again and the preliminary results are outstanding. A couple of weeks ago, Evan began coding some changes to the way the Rubinius bytecode interpreter works. He changed the stackless execution architecture, which implemented an optimized kind of spaghetti stack, to use the C stack more directly and naturally. This better enables the CPU optimizations of the past dozen years to work. It also significantly simplifies the code for our FFI, C-API for C extensions, JIT, and for potentially leveraging LLVM much more effectively. This change also brings native threads, and a much better GC for the mature generation is also in the works.
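To make the stackless versus stackful distinction a bit more concrete, here is a toy sketch in Ruby. It is not Rubinius code: `Frame`, `factorial_stackless`, and `factorial_stackful` are names made up for illustration, and a real VM does far more than this. The first version keeps each activation as a heap object linked to its caller (the spaghetti stack idea) and steps through them in a loop; the second simply lets the host's call stack hold the activations.

```ruby
# Toy illustration only; none of these names come from Rubinius.

# Stackless style: every activation is a heap object that points at its
# caller, and a driver loop walks the frames without host recursion.
Frame = Struct.new(:n, :parent, :result)

def factorial_stackless(n)
  frame = Frame.new(n, nil, nil)

  # "Call" phase: allocate linked frames until the base case is reached.
  frame = Frame.new(frame.n - 1, frame, nil) while frame.n > 1
  frame.result = 1

  # "Return" phase: follow the parent links, handing results upward.
  while frame.parent
    caller_frame = frame.parent
    caller_frame.result = caller_frame.n * frame.result
    frame = caller_frame
  end

  frame.result
end

# Stackful style: the host language's own call stack holds each activation.
def factorial_stackful(n)
  n <= 1 ? 1 : n * factorial_stackful(n - 1)
end

puts factorial_stackless(10)
puts factorial_stackful(10)
```

Both methods compute the same value. The difference is only in where the activation records live, which is the part of the architecture the change described above is about.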

Now, for some details. Again, these results are preliminary. There is still a lot of breakage on the stackfull branch but MSpec is already running and many of the CI specs run. I’ll be getting a new CI set in place today and we’ll get the remaining breakage fixed quickly (don’t ya just love those specs).

Here are some numbers for compiling and running the String specs.

First, on the Rubinius master branch:

    Finished in 25.829773 seconds

    69 files, 763 examples, 5632 expectations, 0 failures, 0 errors

Now, on the Rubinius stackfull branch:

    Finished in 5.834874 seconds

    69 files, 754 examples, 5563 expectations, 6 failures, 19 errors

Here are the numbers for running after the specs have been compiled.

Again, on the master branch:

    Finished in 5.101799 seconds

    69 files, 763 examples, 5632 expectations, 0 failures, 0 errors

And now the stackfull branch:

    Finished in 1.564942 seconds

    69 files, 754 examples, 5563 expectations, 6 failures, 19 errors

I’ll let that sink in a bit…

The numbers for Hash with compilation are similar.

Master:

    Finished in 5.379050 seconds

    48 files, 195 examples, 425 expectations, 0 failures, 0 errors

Stackfull:

    Finished in 1.295544 seconds

    48 files, 193 examples, 421 expectations, 0 failures, 0 errors

That’s right, approaching 2 times faster (revised down from the “between 4.1 and 4.4” originally stated here; see the UPDATE above). And we are just getting started. The significant GC changes are not in yet. We are not yet doing any significant optimizations in the compiler, no profile-directed optimizations at runtime, and our nascent JIT is not hooked up by default. As I said at the outset, these optimizations are made easier by this architecture change.
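For anyone who wants to check the arithmetic, here is a small Ruby sketch that divides the master timing by the stackfull timing for each run printed above. The `timings` table and the `speedup` helper are just illustrative names. These are raw ratios of the numbers in this post and nothing more: the two branches did not run exactly the same number of examples, and the UPDATE at the top points to a correction, so don't read them as a benchmark result.

```ruby
# Timings in seconds, copied from the spec runs above: [master, stackfull].
timings = {
  "String, compile and run" => [25.829773, 5.834874],
  "String, precompiled"     => [5.101799, 1.564942],
  "Hash, compile and run"   => [5.379050, 1.295544],
}

# How many times longer master took than stackfull for the same spec run.
def speedup(master, stackfull)
  master / stackfull
end

timings.each do |label, (master, stackfull)|
  puts format("%-24s %.2fx", label, speedup(master, stackfull))
end
```

The three ratios come out at roughly 4.4, 3.3, and 4.2.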

While I’m breaking the news, Evan deserves the credit for the architecture decisions and generally being courageous enough to try and learn (some would say fail) and try again. Some have doubted that the lofty goals Rubinius has set are realistic. Doubters, have a seat.

If you want to try this at home, clone the Rubinius GitHub repository and do the following:

    $ git branch --track stackfull origin/stackfull
    $ git checkout stackfull
    $ rake build
    $ bin/mspec ci core/string

Thanks to Engine Yard for trusting in Evan’s excellent judgment and system architecture talents and in all our hard work even if it doesn’t look immediately relevant. The path is clear. The goods are in the truck and they will be delivered.