Caveat Lector

20 February 2009

I really did double-check this time and I won’t be making any wild claims here. Sorry to disappoint.

We’re going to be running Antonio’s Ruby Benchmark Suite daily to track our progress on performance in Rubinius. The current RBS is a bit of a beast so I imported the files into the Rubinius repository and did some refactoring. You can read the details and up-vote that if you’d like to see this merged back.

Now, for some baseline RBS results. If you want to follow along at home, here’s what I did. I generated these by running the rake bench task using the VM option (see the benchmark/utils/README in the Rubinius repository) for Rubinius on the stackfull and master branch and for MRI using the version installed on Debian lenny, 1.8.7p22. The system is a dual Intel® Xeon™ CPU 2.40GHz. Then I ran the rake bench:to_csv task, imported the CSV file into Google Docs, added the comparison columns and colors, and exported to PDF.

Here’s what I got. The green is faster, the red is slower. The reported time is the minimum time recorded in five “iterations” of each benchmark per input. The maximum time allowed to run five iterations is 300 seconds, or an average of 60 seconds per iteration.

A few notes about these numbers:

  • We’re still fixing the breakage on the stackfull branch, so it is not surprising, for instance, that all the thread benchmarks errored out. The new native thread support is not 100% done.
  • There are a couple speed regressions on the stackfull branch, most minor. We’ll fix those.
  • Most of the benches do run on the stackfull branch.
  • On most of the benches that run slower in stackfull than MRI, we’re 2x or less slower than MRI.
  • We are a lot faster than MRI on quite a few benchmarks.
  • Rubinius on either branch does quite well relative to MRI on benches that MRI times-out on for certain inputs.

Perhaps the biggest point about the stackfull branch is that we haven’t done much optimization at all. Evan’s been coding in the basic new interpreter architecture, fixing the GC interaction, adding the native threading. We’re fixing breakage now so we can get this merged into the master branch. The JIT is not hooked up. The new GC work is not done. There is no inlining. In other words, there is lots of head room. And that is the key point. You can’t just “make it faster”. Architecture is crucial. Since RailsConf 2008, we’ve been working hard to lay the architectural foundations. With those (and the switch away from stackless), we can start focusing on the real dynamic language optimizations.

While the benchmarks tell part of the story, there’s another part that is even more interesting IMO. And this is the part that got me so excited I, um, well I just got excited

The two biggest pieces of Ruby software that we most often run are the Rubinius compiler and the RubySpecs. The RubySpecs are much more “real-world” than these benchmarks. Here are the results of two complete CI runs on master and stackfull. Note that we are not quite running all the basic CI specs on stackfull, but we’ll figure in that difference in our calculations below.

First, on master:

  $ bin/mspec ci --gc-stats
  rubinius 0.10.0 (ruby 1.8.6) (f4c5576c4 12/31/2009) [i686-apple-darwin9.6.0]

  Finished in 131.248169 seconds

  1430 files, 6927 examples, 23006 expectations, 0 failures, 0 errors

  Time spent in GC: 51.6s (39.3%)

And then on stackfull:

  $ bin/mspec ci --gc-stats
  rubinius 0.11.0-dev (ruby 1.8.6) (e7b6a2d56 12/31/2009) [i686-apple-darwin9.6.0]

  Finished in 66.357996 seconds

  1349 files, 6298 examples, 21344 expectations, 0 failures, 0 errors

  Time spent in GC: 12.7s (19.1%)

Let’s calculate how we do in expectations per second:

  $ irb
  >> master = 23006 / 131.248169
  => 175.286254850534
  >> stackfull = 21344 / 66.357996
  => 321.649255351232
  >> stackfull / master
  => 1.83499416782851

So, compiling and running the specs is about 1.8 times faster on stackfull. This is upside down from the normal results. Normally, we do better on the micro benchmarks and see that invert on “macro” benchmarks. On the RBS benches, stackfull is not 1.8 times faster than master. If I average the “x Master” column, I get 1.39.

There was something else in those spec run numbers I wanted to talk about… oh yeah, GC stats. We have a very simple GC timer stat right now. I’m going to be adding a few more stats. But what we see here is that the overall percentage of time spent in GC drops by half in stackfull. Even so, 19% is too much time to spend in GC. We expect to drop that by half again. Basically, leaning more on structures alloca'd on the C stack reduces a lot of pressure on the GC.

Some would toss out that it’s not hard to be faster than MRI. Perhaps. But it is an accomplishment to write a reasonably good VM, garbage collector, compiler, and Ruby standard library without importing anyone else’s code. And, lest we forget, that is two VM’s in about 27 months of a public project.

Some would also question the sanity of writing a VM and garbage collector when crazy smart people do things like that. Well, crazy smart people write papers that reasonably smart people can read and understand. From the benchmark result above, that is working pretty well.

Here’s the point: Don’t ever let anyone tell you that something is a bad idea. Make your own decisions. We probably wouldn’t have Ruby itself if Matz fretted over whether Larry Wall or Adele Goldberg were smarter than he. My most recent favorites in this space: Factor, Clojure, and yes, tinyrb.

We’re working frantically to get the stackfull branch breakages fixed and the branch merged back into master. Feel free to poke around and ask questions.