Jack Jack - 5 months ago
Ruby Question

Regexp Scan results

Does anybody know why I am getting different results depending on the order of the patterns?

list1 = ["AA1", "AA2", "AA", "AA+"]
list2 = ["AA1", "AA2", "AA+", "AA"]
results1 = "somethin with AA+ in it".scan(Regexp.union(list1))
results2 = "somethin with AA+ in it".scan(Regexp.union(list2))


results1 outputs ["AA"]
results2 outputs ["AA+"]

I may be misunderstanding how scan works, but I was expecting it to return every occurrence, hence both "AA" and "AA+". Also, I don't get why the output changes depending on the order of the strings used.

Answer

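In a backtracking (NFA) regex engine such as Ruby's, the branches of an alternation group are tried from left to right, and the first branch that matches at the current position "wins". See Alternation with The Vertical Bar or Pipe Symbol for a more detailed explanation.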

The regexes you have are

Regex 1: (?-mix:AA1|AA2|AA|AA\+)
Regex 2: (?-mix:AA1|AA2|AA\+|AA)
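To check this yourself, you can print the patterns that Regexp.union builds. This is just a small sketch; the variable names regex1 and regex2 are mine, not from the question.

```ruby
list1 = ["AA1", "AA2", "AA", "AA+"]
list2 = ["AA1", "AA2", "AA+", "AA"]

# Regexp.union escapes each string and joins them with "|",
# keeping the order of the array.
regex1 = Regexp.union(list1)
regex2 = Regexp.union(list2)

puts regex1.to_s    # (?-mix:AA1|AA2|AA|AA\+)
puts regex2.to_s    # (?-mix:AA1|AA2|AA\+|AA)
puts regex1.source  # AA1|AA2|AA|AA\+
```

Note that the "+" is escaped, so it is matched as a literal plus sign and not as a quantifier.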

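If you use the first regex, you get AA because the AA branch comes before AA\+ and matches first, so the remaining branches are not tested at that position. The match is returned and the regex index advances past it, leaving only the +, which no branch matches. Because scan returns non-overlapping matches, AA and AA+ can never both be returned for the same place in the string.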

The second regex yields AA+ because the AA\+ branch matches first and the match is returned; the AA branch is not even tested.
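If the goal is to make the result independent of the order, two possible approaches are sketched below. This is only a suggestion under my own assumptions about what you want: the names longest_first, longest_match and all_present are made up for illustration.

```ruby
text  = "somethin with AA+ in it"
list1 = ["AA1", "AA2", "AA", "AA+"]

# Option 1: sort longer strings first, so "AA+" is always tried
# before its prefix "AA", whatever the original order was.
longest_first = list1.sort_by { |s| -s.length }
longest_match = text.scan(Regexp.union(longest_first))

# Option 2: scan cannot return overlapping matches, so to list every
# pattern that occurs in the text, test each string on its own.
all_present = list1.select { |s| text.include?(s) }

p longest_match  # ["AA+"]
p all_present    # ["AA", "AA+"]
```

Option 1 still returns one match per position, just the longest one. Option 2 tells you which strings occur, but not how many times or where.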