In the index file, we have primary, secondary, and tertiary lines. These lines contain page ranges like:
nutrients in, 223-234
reproductive phase of, 115-116,
These should be abbreviated to:
nutrients in, 223-34
reproductive phase of, 115-16,
We start off by finding a digits-digits string where both sets of digits have the same length, but without consuming any of it. This involves a lookahead that looks for a balanced set of digits (see http://perldoc.perl.org/perlfaq6.html#Can-I-use-Perl-regular-expressions-to-match-balanced-text? for a good explanation) and a negative lookahead that makes sure no more digits follow (so we don't simplify 120-34) and that it isn't something like 11-12-3, which we don't want to try to handle. Note that it is OK for there to be extra digits before the balanced digits; this allows us to further simplify partially simplified ranges.
Once we've done that, we try to match as many digits from the first group as possible, as long as at least some digits remain and the second group starts with the same digits (using a backreference).
\K is used to adjust where the substitution starts so that the replacement can remain empty.
/a is used to make \d match only 0-9, not any other kinds of digits.
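To see why restricting \d matters, here is a small Python illustration. It is only an analogy: Python's `re.ASCII` flag restricts `\d` to 0-9 in roughly the way the description above says /a does in Perl. The sample text is made up.

```python
import re

# Made-up sample: ASCII digits followed by Arabic-Indic digits three and four.
text = "pages 12-\u0663\u0664"

# By default, \d in a Python str pattern matches any Unicode decimal digit.
unicode_digits = re.findall(r"\d+", text)

# With re.ASCII, \d matches only 0-9.
ascii_digits = re.findall(r"\d+", text, flags=re.ASCII)
```

Here `unicode_digits` contains both runs of digits, while `ascii_digits` contains only `"12"`.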