lawphotog lawphotog - 6 months ago 226
Vb.net Question

regular expression to allow spaces between words

I want a regular expression that prevents symbols and only allows letters and numbers. This regex works great but it doesn't allow for spaces between words.

^[a-zA-Z0-9_]*$


For example, when using this regular expression "HelloWorld" is fine but "Hello World" does not match.

How can I tweak it to allow spaces?

Answer

tl;dr

Just add a space in your character class.

^[a-zA-Z0-9_ ]*$

 


Now, if you want to be strict...

The above isn't exactly correct. Due to the fact that * means zero or more, it would match all of the following cases that one would not usually mean to match:

  • An empty string, "".
  • A string comprised entirely of spaces, "      ".
  • A string that leads and / or trails with spaces, "   Hello World  ".
  • A string that contains multiple spaces in between words, "Hello   World".

Originally I didn't think such details were worth going into, as OP was asking such a basic question that it seemed strictness wasn't a concern. Now that the question's gained some popularity however, I want to say...

...use @stema's answer.

Which, in my flavor (without using \w) translates to:

^[a-zA-Z0-9_]+( [a-zA-Z0-9_]+)*$

(Please upvote @stema regardless.)

Some things to note about this (and @stema's) answer:

  • If you want to allow multiple spaces between words (say, if you'd like to allow accidental double-spaces, or if you're working with copy-pasted text from a PDF), then add a + after the space:

    ^\w+( +\w+)*$
    
  • If you want to allow tabs and newlines (whitespace characters), then replace the space with a \s+:

    ^\w+(\s+\w+)*$
    

    Here I suggest the + by default because, for example, Windows linebreaks consist of two whitespace characters in sequence, \r\n, so you'll need the + to catch both.

Still not working?

Check what dialect of regular expressions you're using.* In languages like Java you'll have to escape your backslashes, i.e. \\w and \\s. In older or more basic languages and utilities, like sed, \w and \s aren't defined, so write them out with character classes, e.g. [a-zA-Z0-9_] and [\f\n\p\r\t], respectively.

 


* I know this question is tagged , but based on 25,000+ views, I'm guessing it's not only those folks who are coming across this question. Currently it's the first hit on google for the search phrase, regular expression space word.