Dear glob
At some point, you likely needed to grab some files and do something with them. You probably had a vague recollection about Dir[] from reading the Pickaxe book. To refresh your memory, you crack it open and fire up irb. Ah, easy…
irb(main):001:0> Dir['spec/**/*_spec.rb'].each do |file|
irb(main):002:1* puts file
irb(main):003:1> end
There’s no denying that Ruby makes this task dead simple. So simple, in fact, that you probably don’t think twice about how that nifty Dir[] method does its work. That is, unless you’re trying to implement it.
During the Rubinius sprint, I realized that having Dir[] in Rubinius would let me make our continuous integration spec runner much better. Now, I know that there’s a glob function in C that provides behavior similar to what you get in the shell when you use * and ? to match file names. There’s also a function, fnmatch, that wraps up some of that magic. No problem. We’ve got this nifty foreign-function interface (FFI) that Evan has graciously provided, and Evan recommended I take that route first. Yep, it took all of 10 minutes to hook everything up.
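For reference, here is roughly what those wildcards and the four fnmatch flags do when you call MRI’s File.fnmatch. This is a small sketch for poking at in irb, not part of the Rubinius code; the file names are made up, and the hash name is only there for illustration.

```ruby
# Behavior of MRI's File.fnmatch for * and ?, plus the four FNM_* flags.
# The paths are invented examples; nothing here touches the file system.
results = {
  # * matches any run of characters, ? matches exactly one.
  star:     File.fnmatch('*_spec.rb', 'glob_spec.rb'),                        # => true
  question: File.fnmatch('gl?b', 'glob'),                                     # => true

  # A leading dot is not matched by a wildcard unless FNM_DOTMATCH is set.
  hidden:   File.fnmatch('*', '.profile'),                                    # => false
  dotmatch: File.fnmatch('*', '.profile', File::FNM_DOTMATCH),                # => true

  # Without FNM_PATHNAME, * happily crosses a '/'. With it, it does not.
  slash:    File.fnmatch('*.rb', 'spec/glob_spec.rb'),                        # => true
  pathname: File.fnmatch('*.rb', 'spec/glob_spec.rb', File::FNM_PATHNAME),    # => false

  # FNM_CASEFOLD ignores case.
  exact:    File.fnmatch('cat', 'CAT'),                                       # => false
  casefold: File.fnmatch('cat', 'CAT', File::FNM_CASEFOLD),                   # => true

  # A backslash escapes the next character unless FNM_NOESCAPE is set.
  escaped:  File.fnmatch('\?', '?'),                                          # => true
  noescape: File.fnmatch('\?', '?', File::FNM_NOESCAPE)                       # => false
}

results.each { |name, matched| puts "#{name}: #{matched}" }
```

Those are the same four flags any replacement for fnmatch has to honor.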
Of course, it wouldn’t be that interesting were this the end of the story. It’s not. Our fnmatch specs were mostly passing, but when I looked into the failing ones, I discovered something that I’d probably tried to shield my psyche from. Ruby implements its own fnmatch and glob functions. And the word ‘implements’ doesn’t really give you any idea of the pain and suffering involved. Do take a peek:
- The Ruby source (stable 1.8.6)
- The JRuby source
It doesn’t take but a minute to see that Ola Bini’s Java code is far more readable than the MRI source. But both are daunting, to say the least. So, I’ve decided to take a different route.
def self.fnmatch(pattern, path, flags=0)
  pattern = StringValue(pattern).dup
  path = StringValue(path).dup
  # Decode the flag bits. Note that escape and period are on by default.
  escape = (flags & FNM_NOESCAPE) == 0
  pathname = (flags & FNM_PATHNAME) != 0
  nocase = (flags & FNM_CASEFOLD) != 0
  period = (flags & FNM_DOTMATCH) == 0
  # Glob metacharacters and the regexp fragments that replace them.
  subs = { /\*{1,2}/ => '(.*)', /\?/ => '(.)', /\{/ => '\{', /\}/ => '\}' }

  # A leading dot in the path must be matched by a literal dot in the pattern.
  return false if path[0] == ?. and pattern[0] != ?. and period
  pattern.gsub!('.', '\.')
  # Split out the (possibly nested) bracket expressions so they can be
  # translated separately from the rest of the pattern.
  pattern = pattern.split(/(?<pg>\[(?:\\[\[\]]|[^\[\]]|\g<pg>)*\])/).collect do |part|
    if part[0] == ?[
      # Inside brackets: unescape \* and \?, and turn [! into [^.
      part.gsub!(/\\([*?])/, '\1')
      part.gsub(/\[!/, '[^')
    else
      subs.each { |p,s| part.gsub!(p, s) }
      if escape
        # Drop the backslash in front of each backslash-escaped character.
        part.gsub(/\\(.)/, '\1')
      else
        # Double each backslash that precedes anything other than * ? [ ].
        part.gsub(/(\\)([^*?\[\]])/, '\1\1\2')
      end
    end
  end.join

  re = Regexp.new("^#{pattern}$", nocase ? Regexp::IGNORECASE : 0)
  m = re.match path
  if m
    # ^ and $ anchor to lines, so make sure the whole path matched.
    return false unless m[0].size == path.size
    if pathname
      # With FNM_PATHNAME, no wildcard capture may contain a '/'.
      return false if m.captures.any? { |c| c.include?('/') }

      a = StringValue(pattern).dup.split '/'
      b = path.split '/'
      return false unless a.size == b.size
      return false unless a.zip(b).all? { |ary| ary[0][0] == ary[1][0] }
    end
    return true
  else
    return false
  end
end
This code is only passing 80% of our existing specs for File.fnmatch?, so the jury is still out. And I’m sure someone can make this much better. The lesson for me is that 1) Ruby’s implementation is typically not accessible (I already knew that), and 2) writing Ruby code is a good way to handle tough problems.
But then, you already knew that. ;)
UPDATE: I’ve changed the code here to reflect our current version. It’s now passing 100% of the existing specs.
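If you want the core idea of the fnmatch listing without the flags, it boils down to translating a glob into an anchored Regexp. Here is a stripped-down sketch of just that step. The helper name simple_glob_to_regexp is hypothetical and not part of Rubinius; it assumes a pattern containing only *, ? and literal characters, so no brackets, no escapes and no flags. It also anchors with \A and \z rather than ^ and $, which sidesteps the extra size check the full version needs.

```ruby
# Hypothetical helper, NOT the Rubinius implementation: translate a simple
# glob (only *, ? and literal characters) into an anchored Regexp.
def simple_glob_to_regexp(glob)
  body = glob.each_char.map do |ch|
    case ch
    when '*' then '.*'              # any run of characters
    when '?' then '.'               # exactly one character
    else Regexp.escape(ch)          # everything else matches literally
    end
  end.join
  # \A and \z anchor to the whole string, not to individual lines.
  Regexp.new("\\A#{body}\\z")
end

re = simple_glob_to_regexp('spec/*_spec.rb')
puts re.match?('spec/glob_spec.rb')    # true
puts re.match?('spec/glob_spec.rbx')   # false
```

Everything hard about the real thing (bracket expressions, escapes, leading dots, path separators) is exactly what this sketch leaves out.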