Simon - 4 months ago
JavaScript Question

Regex expression that matches #tags but not hex codes

I'm trying to write a regular expression in JavaScript that matches any string that starts with a space, followed by an octothorpe (#) and then one or more characters. However, I would like this expression to exclude hexadecimal codes.

I have this expression for capturing #tags:

/([\s]#[^<\s]+)/g


and I have an expression that captures hex codes in the format (#xxxxxx) in which my larger program will receive them:

/(#[0-9a-fA-F]{6,6}\b)/g


but I do not know how to put them together so that I end up with matches that are described by the first expression but not by the second.

I'd like to do everything in a single regex statement. If this is not possible, I would like to know of a way to get all non-hex strings that begin with # using a combination of regex and JavaScript functions. I'm using jQuery and Backbone.js if that helps.

Extra Credit:

What is the difference between this:

/(#[0-9a-fA-F]{6,6}\b)/g


and this:

/(#[0-9a-fA-F]{6}\b)/g


I've been using https://regex101.com to write and test my expressions and both seem to give the same results.

Answer

You can use the second regular expression as a negative look-ahead, (?!...), inside your first:

(?:\s|^)(#(?![\da-fA-F]{6}\b)[^<\s]+)

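I also added ^ as an alternative to the leading whitespace, so a tag at the very start of the string matches without requiring a space before it. The tag itself, without the leading whitespace, ends up in capture group 1.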
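As a rough illustration of how the pattern might be used, here is a minimal sketch that collects capture group 1 for every match. The helper name extractTags and the sample string are made up for this example, and it assumes an environment that supports String.prototype.matchAll (which requires the g flag on the pattern).

```javascript
// The pattern from the answer, with the g flag so every tag is found.
const TAG_PATTERN = /(?:\s|^)(#(?![\da-fA-F]{6}\b)[^<\s]+)/g;

// extractTags is a hypothetical helper name, not part of any library.
function extractTags(text) {
  // Capture group 1 holds the tag without the leading whitespace.
  return Array.from(text.matchAll(TAG_PATTERN), (match) => match[1]);
}

// "#ff00aa" is skipped because it is exactly six hex digits followed by a
// word boundary; "#abcdefg" is kept because a seventh word character follows.
const sample = "#intro text #ff00aa more #tag2 and #abcdefg";
console.log(extractTags(sample)); // [ '#intro', '#tag2', '#abcdefg' ]
```

Note that the look-ahead only checks the six characters after the hash plus the word boundary, so something like "#ff00aa," (a hex code followed by punctuation) is also excluded.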

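The question also asks for a fallback that combines a regex with plain JavaScript. One possible sketch, under the assumption that only a bare six-digit code should be dropped, is to collect every #-prefixed candidate first and then filter out the hex codes. The names CANDIDATE, HEX_CODE, and extractTagsTwoStep are invented for this example.

```javascript
// Step 1: find every candidate that starts with # (same shape as the tag pattern).
const CANDIDATE = /(?:\s|^)(#[^<\s]+)/g;

// Step 2: anchored check for a candidate that is exactly # plus six hex digits.
// No g flag here, so test() keeps no state between calls.
const HEX_CODE = /^#[0-9a-fA-F]{6}$/;

// extractTagsTwoStep is a hypothetical helper name used for illustration.
function extractTagsTwoStep(text) {
  return Array.from(text.matchAll(CANDIDATE), (match) => match[1])
    .filter((candidate) => !HEX_CODE.test(candidate));
}

console.log(extractTagsTwoStep("#one #FFFFFF #two")); // [ '#one', '#two' ]
```

This variant is slightly more permissive than the single-regex version: because HEX_CODE is anchored at both ends, a candidate such as "#ff00aa," is kept here, whereas the look-ahead with \b rejects it.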

NB: {6,6} is indeed exactly equivalent to its shorthand, {6}. As stated on regular-expressions.info:

There's an additional quantifier that allows you to specify how many times a token can be repeated.

The syntax is {min,max}, where min is zero or a positive integer number indicating the minimum number of matches, and max is an integer equal to or greater than min indicating the maximum number of matches. [...] Omitting both the comma and max tells the engine to repeat the token exactly min times.
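A quick way to convince yourself of this is to run both quantifier forms over the same inputs and compare the results. The sketch below uses the two patterns from the question with the g flag dropped (an adjustment made here so that test() does not carry lastIndex between calls); the input list is made up for illustration.

```javascript
// The two patterns from the question, without the g flag.
const withRange = /(#[0-9a-fA-F]{6,6}\b)/;
const withCount = /(#[0-9a-fA-F]{6}\b)/;

// Illustrative inputs: valid codes, too short, too long, and non-hex letters.
const inputs = ["#ff00aa", "#FF00AA", "#ff00a", "#ff00aab", "#ggg000", "#123456 trailing"];

for (const input of inputs) {
  // Both columns should agree for every input.
  console.log(input, withRange.test(input), withCount.test(input));
}
```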
