ssr1012 - 1 year ago 52

Perl Question

In an index file we have primary, secondary, and tertiary lines. These lines contain page-number ranges such as:

`nutrients in, 223-234`

`reproductive phase of, 115-116,`

These should become:

`nutrients in, 223-34`

`reproductive phase of, 115-16,`

The page numbers may have three or more digits. Could anyone please help me with this?

Answer Source

```
s/(?=(\d(?:-|(?1))\d)(?![\d-]))(\d+)\d+-\K\2//ga
```

We start off by finding a `digits-digits` string in which both sets of digits have the same length, but without consuming any of it. This uses a lookahead that matches a balanced set of digits (see http://perldoc.perl.org/perlfaq6.html#Can-I-use-Perl-regular-expressions-to-match-balanced-text? for a good explanation) and a negative lookahead that makes sure no more digits follow (so we don't simplify `120-1234` into `120-34`) and that the range isn't part of something like `11-12-3`, which we don't want to try to handle. Note that it is OK for there to be extra digits before the balanced digits; this allows us to further simplify partially simplified ranges like `123-24`.
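To see the lookahead on its own, here is a minimal sketch. The helper name `balanced_part` is hypothetical and not part of the answer; it returns whatever group 1 captures at the first position where the lookahead succeeds, or `undef` if it never does. It assumes Perl 5.14 or later for the `/a` modifier.

```perl
use strict;
use warnings;

# Hypothetical helper: run only the lookahead part of the substitution and
# report the equal-length "digits-digits" text it found, if any.
sub balanced_part {
    my ($text) = @_;
    # Group 1 recurses into itself via (?1), so every digit consumed on the
    # left of the hyphen must be paired with one digit on the right.
    return $text =~ /(?=(\d(?:-|(?1))\d)(?![\d-]))/a ? $1 : undef;
}

for my $sample ('223-234', '120-1234', '123-24') {
    my $found = balanced_part($sample);
    printf "%-9s => %s\n", $sample, defined $found ? $found : '(no candidate)';
}
```

With these samples, `223-234` is captured whole, `120-1234` yields no candidate because a digit always follows the balanced part, and `123-24` yields `23-24`, since the lookahead only succeeds after skipping the extra leading `1`.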

Once we've done that, we try to match as many leading digits of the first number as possible, such that at least one digit remains before the hyphen and the second number starts with the same digits (checked with the backreference `\2`). `\K` is used to adjust where the substitution starts so that the replacement can remain empty. `/a` is used to make `\d` match only 0-9, not any other kinds of digits.
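Putting the pieces together, the substitution can be spelled out with the `/x` modifier so that each part carries a comment. This is a sketch of the same pattern, not a different technique; the wrapper name `abbreviate_ranges` is hypothetical, and Perl 5.14 or later is assumed for `/a`.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the answer.
sub abbreviate_ranges {
    my ($line) = @_;
    $line =~ s/
        (?=                          # look ahead without consuming anything
            ( \d (?: - | (?1) ) \d ) # group 1: equal-length digits-digits
            (?! [\d-] )              # no further digit or hyphen may follow
        )
        ( \d+ )                      # group 2: longest shared leading digits
        \d+ -                        # at least one digit stays before the hyphen
        \K                           # keep everything matched so far
        \2                           # drop the same prefix from the second number
    //gxa;
    return $line;
}

print abbreviate_ranges('nutrients in, 223-234'), "\n";            # nutrients in, 223-34
print abbreviate_ranges('reproductive phase of, 115-116,'), "\n";  # reproductive phase of, 115-16,
print abbreviate_ranges('see 120-1234'), "\n";                     # unchanged
```

Because group 2 is greedy, the pattern removes the longest shared prefix it can, so a range such as `100-105` would be reduced to `100-5`. If the index style requires at least two digits after the hyphen, the pattern would need a further constraint that the answer does not cover.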