Ian Dickinson - 6 months ago
Ruby Question

Ruby gsub regex unexpected behaviour

I thought I knew regexes pretty well, but this has me puzzled:

irb(main):016:0> source = "/foo/bar"
=> "/foo/bar"
irb(main):017:0> source.gsub( /[^\/]*\Z/, "fubar" )
=> "/foo/fubarfubar"


As far as I can tell, /[^\/]*\Z/ has a unique expansion to match bar and therefore should result in /foo/fubar. I can't see at all why I get fubarfubar as the replacement.

The replacement works if I call sub rather than gsub, so it's not a question of working around the problem but rather uncovering my misunderstanding of gsub.

Answer

You need to use sub as you only need to replace once at the end of the string:

source.sub( /[^\/]*\Z/, "fubar" )
       ^^^

See the IDEONE demo

The problem is most probably with the way gsub collects matches. Your pattern can match an empty string, so once bar has been matched, the empty string at the very end of the input is treated as a second match. This is not only a Ruby issue; many other languages and regex engines behave similarly.

So, actually, this is what is happening:

  • [^\/]*\Z matches bar, and gsub replaces it with fubar.
  • The regex index is now at the end of the string. Nothing is left to consume, but Ruby still treats that position as a valid place to attempt a match.
  • [^\/]*\Z matches the empty string there, and gsub adds another fubar.
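
The steps above can be checked directly. This is a minimal sketch, not part of the original answer: the variable names (pattern, all_matches, seen, result) are made up for illustration. It uses scan to list every match and the block form of gsub to record where each match starts.

```ruby
source  = "/foo/bar"
pattern = /[^\/]*\Z/

# scan returns every match that gsub would go on to replace
all_matches = source.scan(pattern)

# Record the start offset and text of each match as gsub walks the string
seen = []
result = source.gsub(pattern) do |matched|
  seen << [Regexp.last_match.begin(0), matched]
  "fubar"
end

p all_matches  # ["bar", ""]
p seen         # [[5, "bar"], [8, ""]]
p result       # "/foo/fubarfubar"
```

The second entry is the zero-length match at offset 8, the end of the string, which is where the extra fubar comes from.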

If you need to use gsub, replace the * quantifier, which allows matching 0 characters, with +, which requires at least 1 occurrence of the quantified subpattern, so that 0-length strings are never matched:

source.gsub( /[^\/]+\Z/, "fubar" )
                   ^
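
As a quick side-by-side, here is a small sketch comparing the three variants discussed here. The variable names and the trailing-slash input "/foo/" are assumptions added for illustration, not part of the original question.

```ruby
source = "/foo/bar"

with_star = source.gsub(/[^\/]*\Z/, "fubar")  # "/foo/fubarfubar" (extra empty match)
with_plus = source.gsub(/[^\/]+\Z/, "fubar")  # "/foo/fubar"
with_sub  = source.sub(/[^\/]*\Z/, "fubar")   # "/foo/fubar"

# Hypothetical edge case: the input ends with a slash
trailing      = "/foo/"
plus_trailing = trailing.gsub(/[^\/]+\Z/, "fubar")  # "/foo/" (no match, unchanged)
sub_trailing  = trailing.sub(/[^\/]*\Z/, "fubar")   # "/foo/fubar"
```

Note that the two fixes are not interchangeable when the last path segment is empty: + leaves the string untouched, while sub with * still appends the replacement. Which one is right depends on what you want for such inputs.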

The rule of thumb: Avoid regexps that match empty strings inside Regex replace methods!
