David Chan - 2 months ago
Ruby Question

Ruby regex inconsistency

I'm trying to convert to Roman numerals using gsub with back references, and I've discovered a strange inconsistency.

$ ruby -v
ruby 2.2.5p319 (2016-04-26 revision 54774) [x86_64-linux]

$ irb
irb(main):001:0> BASES = {
irb(main):002:1* 1000 => 'M',
irb(main):003:1* 500 => 'D',
irb(main):004:1* 100 => 'C',
irb(main):005:1* 50 => 'L',
irb(main):006:1* 10 => 'X',
irb(main):007:1* 5 => 'V',
irb(main):008:1* 1 => 'I'
irb(main):009:1> }
=> {1000=>"M", 500=>"D", 100=>"C", 50=>"L", 10=>"X", 5=>"V", 1=>"I"}
irb(main):010:0> BASE_KEYS = BASES.keys
=> [1000, 500, 100, 50, 10, 5, 1]
irb(main):011:0> rom = 'IIII'
=> "IIII"


Above is the setup.

Below, I am trying to find any character that repeats 4 times and replace it with one of that character followed by one of the next BASE character.

e.g. IIII => IV

irb(main):012:0> rom.gsub(/((.)\2{3})/,
irb(main):013:1* "#{
irb(main):014:0> BASES[BASE_KEYS.select.with_index{ |bk, i|
irb(main):015:2> BASES[BASE_KEYS[i]] == $2
irb(main):016:2> }.first]
irb(main):017:0> }
irb(main):018:1" #{BASE_KEYS.select.with_index{ |bk, i|
irb(main):019:1> BASES[BASE_KEYS[i]] == $2
irb(main):020:1> }.first}
irb(main):021:1" #{
irb(main):022:0> BASES[BASE_KEYS.select.with_index{|bk, i|
irb(main):023:2> BASES[BASE_KEYS[i+1]] == $2
irb(main):024:2> }.first]
irb(main):025:0> }
irb(main):026:1" #{BASE_KEYS.select.with_index{ |bk, i|
irb(main):027:1> BASES[BASE_KEYS[i+1]] == $2
irb(main):028:1> }.first}
irb(main):029:1" "
irb(main):030:1> )
=> "\n \n I\n 1\n "


So I get the wrong answer (the output includes debug info for more insight). Running exactly the same code a second time:

irb(main):031:0> rom.gsub(/((.)\2{3})/,
irb(main):032:1* "#{
irb(main):033:0> BASES[BASE_KEYS.select.with_index{ |bk, i|
irb(main):034:2> BASES[BASE_KEYS[i]] == $2
irb(main):035:2> }.first]
irb(main):036:0> }
irb(main):037:1" #{BASE_KEYS.select.with_index{ |bk, i|
irb(main):038:1> BASES[BASE_KEYS[i]] == $2
irb(main):039:1> }.first}
irb(main):040:1" #{
irb(main):041:0> BASES[BASE_KEYS.select.with_index{|bk, i|
irb(main):042:2> BASES[BASE_KEYS[i+1]] == $2
irb(main):043:2> }.first]
irb(main):044:0> }
irb(main):045:1" #{BASE_KEYS.select.with_index{ |bk, i|
irb(main):046:1> BASES[BASE_KEYS[i+1]] == $2
irb(main):047:1> }.first}
irb(main):048:1" "
irb(main):049:1> )
=> "I\n 1\n V\n 5\n "
irb(main):050:0>


Admittedly my regex code is barely comprehensible, but why do I get a different result on the second invocation of the same code?

irb(main):050:0> rom
=> "IIII"


Notice that rom has not changed.

Answer

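Your code reads $2 before the regex is evaluated. The replacement string is an ordinary method argument, so it is interpolated before gsub is called. On the first call $2 is still nil, which is why the first two fields come out empty. That call then leaves $2 set to "I", so the second call happens to work as intended. Use a block instead of a string: the block is evaluated after each match.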

From the String#gsub documentation: In the block form, the current match string is passed in as a parameter, and variables such as $1, $2, $`, $&, and $' will be set appropriately. The value returned by the block will be substituted for the match on each call.
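The timing difference can be seen in isolation with a much smaller replacement. This is only a sketch: backref_demo is a made-up helper name, and it is wrapped in a method so that $2 starts out as nil (the match variables are local to the current method).

```ruby
# Hypothetical helper, not part of the question's code.
def backref_demo(rom)
  pattern = /((.)\2{3})/

  # String form: the argument is built BEFORE gsub runs, so $2 is
  # whatever the previous match in this method left behind (nil here).
  first = rom.gsub(pattern, "(#{$2.inspect})")

  # Same call again: $2 now holds the value set by the first gsub.
  second = rom.gsub(pattern, "(#{$2.inspect})")

  # Block form: the block runs AFTER each match, so $2 is current.
  block = rom.gsub(pattern) { "(#{$2.inspect})" }

  [first, second, block]
end

first, second, block = backref_demo('IIII')
puts first   # (nil)
puts second  # ("I")
puts block   # ("I")
```

The first and second calls are textually identical yet return different strings, which is the same effect seen in the irb session.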

This is consistent:

rom.gsub(/((.)\2{3})/) { |s|
  "#{
     BASES[BASE_KEYS.select.with_index{ |bk, i|
       BASES[BASE_KEYS[i]] == $2
     }.first]
    }
    #{BASE_KEYS.select.with_index{ |bk, i|
       BASES[BASE_KEYS[i]] == $2
     }.first}
    #{
     BASES[BASE_KEYS.select.with_index{|bk, i|
       BASES[BASE_KEYS[i+1]] == $2
     }.first]
    }
    #{BASE_KEYS.select.with_index{ |bk, i|
       BASES[BASE_KEYS[i+1]] == $2
     }.first}
   "
}
# => "I\n    1\n    V\n    5\n   "
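Once the block form is in place, the lookup itself can probably be simplified. The following is a minimal sketch of the "IIII => IV" step described in the question, not a full Roman numeral converter. The method name collapse_fours and its symbols parameter are assumptions for illustration; the default list mirrors the values of BASES, largest first.

```ruby
# Hypothetical helper: replaces a run of four identical symbols with
# one of that symbol followed by the next larger symbol.
def collapse_fours(rom, symbols = %w[M D C L X V I])
  rom.gsub(/(.)\1{3}/) do
    idx = symbols.index($1)
    # Unknown character, or no larger symbol exists (e.g. MMMM):
    # leave the matched text unchanged.
    next $& if idx.nil? || idx.zero?

    "#{$1}#{symbols[idx - 1]}"
  end
end

puts collapse_fours('IIII')  # IV
puts collapse_fours('XXXX')  # XL
puts collapse_fours('MMMM')  # MMMM
```

This only handles the four-in-a-row case. An input such as VIIII becomes VIV rather than IX, so a complete converter would need an additional pass for that.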