Ryan Liebert - 3 months ago
Perl Question

Perl regex matching: Match string with single word only, and how to use negative lookbehind and lookahead as well

I'm genuinely exhausted from trying to get this regex to work. I'm using Perl regular expressions in SAS. If anyone could help, I'd greatly appreciate it. I have three questions; consider the following lines of text (the line numbers are only there for reference):

1 weight
2 weightchange
3 weight change
4 weight percentile
5 change weight
6 percentile weight
7 **** weight pre op
8 water weight
9 weight percentile
10 myocardial infarction



  1. How would I use regex to match line 1 but NOT lines 2-9? (Is there a way to match a word that is at the beginning and the end of the line simultaneously?)

  2. How would I use regex with negative lookahead/lookbehind assertions to match line 1 and explicitly exclude lines 2-6?

  3. How would I modify the regex from Q2 so that it also excludes lines 7 and 8?



Bonus question: How long did it take you to get good at using regex?

Thank you!

Answer

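1) @Barmar already mentioned it: ^weight$ does what you're asking, because ^ anchors the match to the start of the line and $ anchors it to the end, so "weight" has to be the whole line. Are you sure that's what you want, though, i.e. "weight" on a line by itself?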
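To make point 1 concrete, here is a minimal Perl sketch (not part of the original answer) that runs ^weight$ against the sample lines. It assumes each line is a separate string with its reference number removed; the array name @lines is just for illustration.

```perl
use strict;
use warnings;

# Sample lines from the question, without their reference numbers.
my @lines = (
    'weight',
    'weightchange',
    'weight change',
    'weight percentile',
    'change weight',
    'percentile weight',
    '**** weight pre op',
    'water weight',
    'weight percentile',
    'myocardial infarction',
);

# ^ anchors the match at the start of the string and $ at the end,
# so "weight" must make up the entire line.
my @matched = grep { /^weight$/ } @lines;

print "$_\n" for @matched;    # prints only: weight
```

One thing that may matter in SAS: fixed-length character variables are padded with trailing blanks, so you may need to trim the value first, or allow trailing whitespace with ^weight\s*$.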

2) You could do something like /(?<!percentile )(?<!change )weight(?! ?change| percentile)/. The negative lookbehinds can't be combined into (?<!change |percentile ) because the regex engine does not support variable-width lookbehind, but since lookarounds are zero-width you can simply list them one after another. (The lookahead has no such restriction, so its alternatives can share one group.) All that looks awfully specific, though. It would fail, e.g., if your text contained these words separated by a line break.
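A minimal Perl sketch of the pattern from point 2, again assuming one string per sample line with the reference numbers removed; the loop and variable names are illustrative only. Lines 7 and 8 are not excluded yet at this stage, so with the question's data it should report matches for lines 1, 7 and 8 only.

```perl
use strict;
use warnings;

# Sample lines from the question, without their reference numbers.
my @lines = (
    'weight',
    'weightchange',
    'weight change',
    'weight percentile',
    'change weight',
    'percentile weight',
    '**** weight pre op',
    'water weight',
    'weight percentile',
    'myocardial infarction',
);

# Assumes fixed-width lookbehind only (as the answer describes), so each
# excluded prefix gets its own (?<!...); the lookahead uses alternation.
my $re = qr/(?<!percentile )(?<!change )weight(?! ?change| percentile)/;

for my $i (0 .. $#lines) {
    my $verdict = $lines[$i] =~ $re ? 'match' : 'no match';
    printf "%2d %-22s %s\n", $i + 1, $lines[$i], $verdict;
}
```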

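3) Just add two more negative lookbehinds, one for "**** " and one for "water ". Escape the asterisks (\*\*\*\*), since * is a regex metacharacter, and keep the two lookbehinds separate because they have different widths.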
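Extending the same idea for point 3, here is a minimal Perl sketch, assuming one string per sample line as before (names are illustrative, and it has not been checked against SAS itself). "**** " is 5 characters and "water " is 6, so each gets its own fixed-width lookbehind.

```perl
use strict;
use warnings;

# Sample lines from the question, without their reference numbers.
my @lines = (
    'weight',
    'weightchange',
    'weight change',
    'weight percentile',
    'change weight',
    'percentile weight',
    '**** weight pre op',
    'water weight',
    'weight percentile',
    'myocardial infarction',
);

# One fixed-width negative lookbehind per excluded prefix; the
# asterisks are escaped so they are matched literally.
my $re = qr/(?<!percentile )(?<!change )(?<!\*\*\*\* )(?<!water )weight(?! ?change| percentile)/;

my @matched = grep { $_ =~ $re } @lines;
print "$_\n" for @matched;    # prints only: weight
```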

I'm not all that good at regex :) And I never sat down and decided to learn the ins and outs; that's pretty useless if you don't have a use case, because you'll just forget it in a short while. Even after using regexen for over 20 years I'm still learning new things; just a few weeks ago I learned how to match Unicode properties.