Casey B. - 1 year ago
Java Question

Regex won't match whitespace character with [\r\n\t\f\s]

This is likely a very simple fix but I can't figure it out!

I'm trying to match (up to) 3 capitalized words in a row given the following text: Russell Lake West. The match should include all 3 words.

This regex will match the first 2 words but not the third (demo here):


This regex will match all 3 words, but I had to copy/paste the whitespace between the second and third words for it to work (demo here):

(([A-Z][a-z'-]+)\s{0,2}([A-Z][a-z'-]+)? \s{0,2}([A-Z][a-z'-]+)?)

^ pasted it here

So I assumed the whitespace might not be an ordinary space but a newline character or similar, and I tried this (demo here):


But it doesn't recognize any of those characters before the third word, thus returning no results.

Why can't regex101 or Java recognize this apparent whitespace between the words? What's a reliable way to handle this?

Answer Source

There are many kinds of spaces. The one in your demo is a non-breaking space (code point 160, U+00A0, in Unicode). It doesn't belong to \s (the whitespace character class, in Java's default mode) because it doesn't mark a place where text can be split into separate parts such as lines.
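To check which characters are actually sitting between the words, it can help to print their code points. The following is only a minimal sketch: the class name `SpaceCheck`, its helper methods, and the sample string (a plain space after "Russell" and a non-breaking space after "Lake") are assumptions made for illustration, not something taken from the question's demo.

```java
import java.util.regex.Pattern;

public class SpaceCheck {
    // True if the single character c is matched by Java's default \s class.
    public static boolean isRegexWhitespace(char c) {
        return Pattern.matches("\\s", String.valueOf(c));
    }

    // Lists the code point of every non-letter character in the text,
    // so that look-alike spaces become distinguishable.
    public static String describeGaps(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints()
            .filter(cp -> !Character.isLetter(cp))
            .forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Assumed input: ordinary space, then a non-breaking space.
        String sample = "Russell Lake\u00A0West";
        System.out.println(describeGaps(sample));        // U+0020 U+00A0
        System.out.println(isRegexWhitespace(' '));      // true
        System.out.println(isRegexWhitespace('\u00A0')); // false
    }
}
```

If the second gap shows up as U+00A0 rather than U+0020, that would explain why the match stops after the second word.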

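To match it you can use the \p{Zs} class (the Unicode "space separator" category).
You can also combine \s and \p{Zs} in a single character class, [\p{Zs}\s], which is written "[\\p{Zs}\\s]" in a Java string literal.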
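As a rough sketch of how the combined class could be dropped into the regex from the question, the example below keeps the question's word pattern and swaps only the whitespace part. The class name `CapitalizedWords`, the method `firstMatch`, and the sample input with a non-breaking space are hypothetical and used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapitalizedWords {
    // Either kind of gap: Java's default \s or any Unicode space separator (Zs).
    private static final String GAP = "[\\p{Zs}\\s]{0,2}";
    // Same word shape as the regex in the question.
    private static final String WORD = "([A-Z][a-z'-]+)";

    // One required word followed by up to two optional words.
    private static final Pattern UP_TO_THREE =
            Pattern.compile(WORD + GAP + WORD + "?" + GAP + WORD + "?");

    // Returns the first run of up to three capitalized words, or null if none is found.
    public static String firstMatch(String text) {
        Matcher m = UP_TO_THREE.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Assumed input: a non-breaking space between "Lake" and "West".
        String sample = "Russell Lake\u00A0West";
        System.out.println(firstMatch(sample));
    }
}
```

With this pattern the same regex should handle both ordinary spaces and non-breaking spaces, so there is no need to paste the special character into the expression.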
