MartinWebb - 5 months ago
JavaScript Question

JavaScript regex filtering on words

I have the following regex:

The regex is in a bit of code in our app, and I can see that it splits the text into words. It also removes characters such as $#* and so on. I need it to do exactly the same thing but allow a hash sign, since the words can now be #hashtags.

"Test #words".toLowerCase().split(/\b/).filter(function(w){return w.match(/^\w+$/) }) // returns ["test", "words"]

The current regex removes the hash; I want it to remain, so that I get:

["test", "#words"]


Your "Test #words".toLowerCase().split(/\b/).filter(function(w){return w.match(/^\w+$/) }) does the following:

  • The whole string is turned to lower case
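  • The string is split at every word boundary, so test #words becomes ["test", " #", "words"] (the space and the # stay together in one part because there is no word boundary between them)
  • Only the parts that fully match the ^\w+$ regex (1+ word chars from the start to the end of the string) are kept in the array, so the " #" part is dropped.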
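
The steps above can be unrolled into intermediate values to see where the # is lost. This is only a minimal sketch: the variable names (lowered, parts, kept) are illustrative and not from the original code, and it uses .test() in place of .match() inside the filter, which selects the same parts.

```javascript
// Unroll the original one-liner to inspect each stage.
const lowered = "Test #words".toLowerCase(); // "test #words"

// Split at word boundaries: the space and "#" end up together in one part.
const parts = lowered.split(/\b/);

// Keep only parts made up entirely of word characters; " #" is dropped here.
const kept = parts.filter(function (w) { return /^\w+$/.test(w); });

console.log(parts); // ["test", " #", "words"]
console.log(kept);  // ["test", "words"]
```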

You may use a matching approach instead of splitting and filtering, and also include the # with /(?:\B#)?\w+/g:

console.log("Test #words".toLowerCase().match(/(?:\B#)?\w+/g))

The pattern matches:

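  • (?:\B#)? - an optional # that is not preceded by a word char (\B matches a position that is not a word boundary, so the # must be at the start of the string or follow a non-word char)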
  • \w+ - 1 or more word chars (from [a-zA-Z0-9_] ranges)

If the context around the # does not matter, use the simpler /#?\w+/g regex, which matches an optional # anywhere in the string followed by 1+ word chars.
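
The difference between the two patterns only shows up when a # sits directly after a word char. A small comparison, assuming a made-up input string (sample and the other names here are hypothetical, chosen for illustration):

```javascript
// Compare the two patterns on input where "#" also appears inside a word.
const strict = /(?:\B#)?\w+/g; // "#" kept only when no word char precedes it
const loose = /#?\w+/g;        // "#" kept wherever it directly precedes word chars

const sample = "test #words foo#bar"; // hypothetical input

const strictResult = sample.match(strict);
const looseResult = sample.match(loose);

console.log(strictResult); // ["test", "#words", "foo", "bar"]
console.log(looseResult);  // ["test", "#words", "foo", "#bar"]
```

Note that match() with a global regex returns null rather than an empty array when nothing matches, so code that previously relied on the split/filter chain always returning an array may want to fall back with || [].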