Dear glob

17 September 2007

At some point, you likely needed to grab some files and do something with them. You probably had a vague recollection about Dir[] from reading the Pickaxe book. To refresh your memory, you crack it open and fire up irb. Ah, easy…


  irb(main):001:0> Dir['spec/**/*_spec.rb'].each do |file|
  irb(main):002:1* puts file
  irb(main):003:1> end

There’s no denying that Ruby makes this task dead simple. So simple, in fact, that you probably don’t think twice about how that nifty Dir[] method does its work. That is, unless you’re trying to implement it.

During the Rubinius sprint, I realized that having this in Rubinius would enable me to make our continuous integration spec runner much better. Now, I know that there’s a glob function in C that provides behavior similar to what you get in the shell using * and ? to match file names. There’s also a function fnmatch that wraps up some of that magic. No problem. We’ve got this nifty foreign-function interface (FFI) that Evan has graciously provided. Evan recommended I take that route first. Yep, took all of 10 minutes to hook everything up.

Of course, it wouldn’t be that interesting were this the end of the story. It’s not. Our fnmatch specs were mostly passing, but when I looked into the failing ones, I discovered something that I’d probably tried to shield my psyche from: Ruby implements its own fnmatch and glob functions. And the word ‘implements’ doesn’t really give you any idea of the pain and suffering involved. Do take a peek:

It doesn’t take but a minute to see that Ola Bini’s Java code is far more readable than the MRI source. But both are daunting, to say the least. So, I’ve decided to take a different route.
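Judging from the method that follows, the different route is to translate the glob pattern into a Regexp and let the regexp engine do the matching. Stripped of flags, bracket expressions, and escapes, the idea looks roughly like the sketch below. The names simple_glob_to_regexp and simple_fnmatch are hypothetical and exist only for illustration; they are not part of Rubinius or MRI.

```ruby
# Hypothetical helper (not Rubinius code): turn a glob into an anchored Regexp.
# Only `*` and `?` are treated as wildcards; brackets, escapes and flags are ignored.
def simple_glob_to_regexp(pattern)
  source = pattern.each_char.map do |ch|
    case ch
    when '*' then '.*'              # any run of characters, including none
    when '?' then '.'               # exactly one character
    else Regexp.escape(ch)          # every other character matches itself
    end
  end.join
  Regexp.new("\\A#{source}\\z")     # the whole path must match
end

# Hypothetical wrapper: true when the glob matches the entire path.
def simple_fnmatch(pattern, path)
  simple_glob_to_regexp(pattern).match?(path)
end

puts simple_fnmatch('*_spec.rb', 'dir_spec.rb')
puts simple_fnmatch('c?t', 'cat')
puts simple_fnmatch('c?t', 'cart')
```

The three calls should print true, true, and false. Everything this sketch leaves out (the flags, bracket expressions, backslash escapes, and the special treatment of leading dots and path separators) is what makes the full method so much longer.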

  def self.fnmatch(pattern, path, flags=0)
    pattern = StringValue(pattern).dup
    path = StringValue(path).dup
    # Decode the flag bits into booleans.
    escape = (flags & FNM_NOESCAPE) == 0
    pathname = (flags & FNM_PATHNAME) != 0
    nocase = (flags & FNM_CASEFOLD) != 0
    period = (flags & FNM_DOTMATCH) == 0
    # Glob metacharacters and their regexp replacements.
    subs = { /\*{1,2}/ => '(.*)', /\?/ => '(.)', /\{/ => '\{', /\}/ => '\}' }

    # A leading dot in path needs a leading dot in pattern unless FNM_DOTMATCH is set.
    return false if path[0] == ?. and pattern[0] != ?. and period
    pattern.gsub!('.', '\.')
    # Split out bracket expressions (they may nest) so they are translated separately.
    pattern = pattern.split(/(?<pg>\[(?:\\[\[\]]|[^\[\]]|\g<pg>)*\])/).collect do |part|
      if part[0] == ?[
        part.gsub!(/\\([*?])/, '\1')
        part.gsub(/\[!/, '[^')
      else
        subs.each { |p,s| part.gsub!(p, s) }
        if escape
          part.gsub(/\\(.)/, '\1')
        else
          part.gsub(/(\\)([^*?\[\]])/, '\1\1\2')
        end
      end
    end.join

    # Anchor the translated pattern and match it against the path.
    re = Regexp.new("^#{pattern}$", nocase ? Regexp::IGNORECASE : 0)
    m = re.match path
    if m
      return false unless m[0].size == path.size
      if pathname
        # With FNM_PATHNAME, a wildcard may not match a '/'.
        return false if m.captures.any? { |c| c.include?('/') }

        a = StringValue(pattern).dup.split '/'
        b = path.split '/'
        return false unless a.size == b.size
        return false unless a.zip(b).all? { |ary| ary[0][0] == ary[1][0] }
      end
      return true
    else
      return false
    end
  end

This code is only passing 80% of our existing specs for File.fnmatch?, so the jury is still out. And I’m sure someone can make this much better. The lesson for me is that 1) Ruby’s implementation is typically not accessible (I already knew that), and 2) writing Ruby code is a good way to handle tough problems.

But then, you already knew that. ;)

UPDATE: I’ve changed the code here to reflect our current version. It’s now passing 100% of the existing specs.
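For a feel of the behavior those specs pin down, here is a small sketch that checks MRI's own File.fnmatch against results given in the Ruby documentation, one pair of calls per flag the method above reads. The checks table and its layout are my own illustration, not the actual Rubinius specs, and the case-sensitivity rows assume a platform where matching is case sensitive by default (Linux, for instance).

```ruby
# Illustrative table (not the Rubinius specs): arguments and documented result.
checks = [
  [['*',   '.profile'],                     false], # leading dot needs an explicit match
  [['*',   '.profile', File::FNM_DOTMATCH], true],  # ...unless FNM_DOTMATCH is set
  [['c*t', 'c/a/b/t'],                      true],  # by default * crosses '/'
  [['*',   '/', File::FNM_PATHNAME],        false], # FNM_PATHNAME stops wildcards at '/'
  [['cat', 'CAT'],                          false], # case sensitive by default
  [['cat', 'CAT', File::FNM_CASEFOLD],      true],  # FNM_CASEFOLD ignores case
  [['\a',  'a'],                            true],  # backslash escapes the next character
  [['\a',  '\a', File::FNM_NOESCAPE],       true],  # FNM_NOESCAPE makes backslash literal
]

checks.each do |args, expected|
  actual = File.fnmatch(*args)
  status = actual == expected ? 'ok  ' : 'FAIL'
  puts "#{status} File.fnmatch(#{args.map(&:inspect).join(', ')}) => #{actual}"
end
```

Any reimplementation, whether regexp-based or not, has to reproduce each of these rows, which is why the flag handling takes up so much of the method.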