sideroxylon sideroxylon - 9 days ago 5
Javascript Question

Regex for alphanumeric string with a maximum number of spaces

I need a JS regex to match a string based only on a known first and last sub-string and number of spaces - and I don't care about the length or the nature of what is between the first and last sub-strings (other than the exact number of spaces).

The following is a possible start string (from which I get the first and last sub-strings and the number of spaces):

cat apple dog mouse


From this, I now know the string starts with cat, ends with mouse and contains exactly 3 spaces (they could be anywhere between the ends, but they will not be consecutive).

The string I need to match against might be:

catfish mouse mouse dormouse mouse mouse


or
cat mouse mouse mouse mouse mouse


So, what I need to match would be, in the first case, "catfish mouse mouse dormouse", and in the second case, "cat mouse mouse mouse" - in both cases a string starting with cat, ending with mouse and containing exactly 3 spaces. At the moment, all my attempts match the entire sample string above, not just from cat to the third mouse. Here is my latest failure:

cat(?:(?![\s]{4,}).*)mouse


I have a strong suspicion I'm overthinking this - but thanks for any suggestions.

Answer

You can write a regex without lookaheads to do this.

Example

\bcat(?:[^\s]*\s){3}[^\s]*mouse\b


What does it do?

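  • \b Matches a word boundary. The leading \b ensures that cat starts a word (so it does not match inside bobcat), and the trailing \b ensures that it doesn't match strings that end as mousexyz
  • cat Matches the literal cat at the start of the match
  • (?:[^\s]*\s){3}
    • [^\s]* Matches zero or more non-whitespace characters. So this one matches the rest of a single word, and the following \s matches the space after the word.
    • {3} Makes sure that the word-plus-space group is repeated exactly 3 times, which is what fixes the number of spaces.
  • [^\s]* Matches zero or more non-whitespace characters after the 3rd space, so no further space can be consumed.
  • mouse Matches the literal mouse at the end of the match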
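
As a rough sketch of how this could be used from JavaScript, the snippet below builds the same pattern from the sample string in the question and runs it against both test strings. The helper names (escapeRegExp, buildPattern, firstMatch) are made up for illustration, \S is used as shorthand for [^\s], and it assumes the first and last sub-strings begin and end with word characters so that \b behaves as described.

```javascript
// Hypothetical helper: escape regex metacharacters in a literal sub-string.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: derive the pattern from a sample such as
// "cat apple dog mouse" (first word, last word, number of single spaces).
function buildPattern(sample) {
  const words = sample.split(" ");
  const first = escapeRegExp(words[0]);
  const last = escapeRegExp(words[words.length - 1]);
  const spaces = words.length - 1;
  // Equivalent to \bcat(?:[^\s]*\s){3}[^\s]*mouse\b for the sample above.
  return new RegExp(`\\b${first}(?:\\S*\\s){${spaces}}\\S*${last}\\b`);
}

const pattern = buildPattern("cat apple dog mouse");

// Returns the matched text, or null when nothing matches.
function firstMatch(input) {
  const m = input.match(pattern);
  return m ? m[0] : null;
}

console.log(firstMatch("catfish mouse mouse dormouse mouse mouse"));
// "catfish mouse mouse dormouse"
console.log(firstMatch("cat mouse mouse mouse mouse mouse"));
// "cat mouse mouse mouse"
```

Because the group can only consume one whitespace character per repetition, the match can never extend past the third space plus one trailing word.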

Why doesn't cat(?:(?![\s]{4,}).*)mouse work?

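  • (?![\s]{4,}) This negative lookahead only checks, once, that cat is not immediately followed by 4 or more consecutive whitespace characters. That is true for all the input strings, so the lookahead succeeds. The greedy .* then runs to the end of the string and backtracks only as far as the last mouse, so the whole string matches. Nothing in the pattern counts the spaces in between.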
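
A small JavaScript check can make this behaviour visible. It is only an illustration using the pattern from the question; the variable names and the extra test strings are assumptions added here.

```javascript
// The pattern from the question, unchanged.
const failing = /cat(?:(?![\s]{4,}).*)mouse/;
const input = "cat mouse mouse mouse mouse mouse";

// The greedy .* stretches to the last "mouse", so the whole input matches.
const greedy = input.match(failing)[0];
console.log(greedy === input); // true

// The lookahead is evaluated only at the position right after "cat":
// four whitespace characters there block the match...
console.log(failing.test("cat    mouse")); // false

// ...but four whitespace characters later in the string are never inspected.
console.log(failing.test("cat x    mouse")); // true
```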