ManuKaracho - 1 year ago
Javascript Question

How to do this conditional mention RegEx properly?

I insert mentions into textareas with this markup:

@User Name Can Have Spaces(userId: number)

@Javier Hernandez(5)

I have a JSON-List of Users:
var users = [{name: 'Javier Hernandez',id: 5},{...}];

Now I want to convert the markup to a plain HTML code:

var myHtml = "..."; // loaded externally and contains the markup
var matches = myHtml.match(/@([a-z\d_]+)/ig);

But that does not work for usernames with spaces, and I won't get the user ID.

I would now iterate over the matches, check if the user in the markup exists in my
array and replace the matches in a template string

<a href="path/to/user/{id}">{name}</a>

How would I do that properly?

Answer Source

First, an analysis of your current regex and why it doesn't work:

  • @ is the literal @ character, nothing to see here
  • [...] is a character class. It will match any (one) of the characters it contains
  • [a-z\d_] is a character class composed of every lowercase letter, every digit (represented by their own character class \d) and the underscore
  • + is a quantifier which means the token it modifies must be matched at least once and can be matched more than once. Here it applies to the previous character class
  • /pattern/flags is JavaScript's regular expression literal syntax
  • i is the case-insensitive flag. In this case, it means the character class will also match uppercase letters although it only contains lowercase letters
  • g is the global flag. It means that the regex will look for every match in the input rather than stopping at the first one it encounters.
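
To see the breakdown above in action, here is a minimal sketch that runs the original pattern against a sample string. The sample text is an assumption made up for illustration; the real markup is loaded externally.

```javascript
// Hypothetical sample input standing in for the externally loaded markup.
var sampleText = "Hello @Javier Hernandez(5), welcome!";

// The original pattern: "@" followed by one or more letters, digits or underscores.
var originalMatches = sampleText.match(/@([a-z\d_]+)/ig);

// The match stops at the first space, so the last name and the id are lost.
console.log(originalMatches); // ["@Javier"]
```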

So you're trying to match @User Name Can Have Spaces(userId: number), but your regex matches neither spaces, as you mentioned, nor parentheses.

You could add these three characters to the character class, as follows:

/@([a-z\d_ ()]+)/gi
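
As a rough illustration of how this widened character class behaves, the sketch below runs it on a made-up sentence (the sample text is an assumption). Because spaces are now allowed anywhere, the match can run past the closing parenthesis into the following words.

```javascript
// Hypothetical sample input; not taken from the question.
var looseSample = "Thanks @Javier Hernandez(5) for the help!";

// Spaces and parentheses are now part of the character class.
var looseMatches = looseSample.match(/@([a-z\d_ ()]+)/gi);

// The greedy class keeps consuming letters and spaces until it hits "!".
console.log(looseMatches); // ["@Javier Hernandez(5) for the help"]
```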

However, a better translation of what you're trying to match, in my opinion at least, would be the following:

/@[a-z\d_ ]+\(\d+\)/gi

Here we match a username that can contain letters, digits, underscores and spaces, followed by an opening parenthesis, a number and a closing parenthesis. The parentheses must be escaped so they are understood as literal characters rather than as a regex group.
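
A quick way to check this stricter pattern is to run it on a string with more than one mention. The sample text and the second user name are assumptions made up for this sketch.

```javascript
// Hypothetical sample input with two mentions.
var strictSample = "Ping @Javier Hernandez(5) and @Ana_99(12) about this.";

// A name made of letters, digits, underscores and spaces,
// then a literal "(", one or more digits, and a literal ")".
var strictMatches = strictSample.match(/@[a-z\d_ ]+\(\d+\)/gi);

// Each match now ends at the closing parenthesis of its mention.
console.log(strictMatches); // ["@Javier Hernandez(5)", "@Ana_99(12)"]
```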

If you want to easily extract the username and the user id separately, you might want to use the following, where each is captured in its own group:

/@([a-z\d_ ]+)\((\d+)\)/gi

Here's a regex101 link to test it.
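
To go from the grouped regex to the HTML you asked about, one possible approach is String.prototype.replace with a callback, since match with the g flag returns only the full matches and not the captured groups. This is a sketch under assumptions, not a definitive implementation: renderMentions and escapeHtml are hypothetical helper names, the second user and the sample text are made up, and the lookup is done by id only, taking the display name from the users array.

```javascript
// Sample data modeled on the question; the second user is hypothetical.
var users = [
  { name: 'Javier Hernandez', id: 5 },
  { name: 'Ana Lopez', id: 12 }
];

// Hypothetical helper: escape characters that are special in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Hypothetical helper: replace every "@Name(id)" mention with a link
// when the id belongs to a known user.
function renderMentions(text, userList) {
  return text.replace(/@([a-z\d_ ]+)\((\d+)\)/gi, function (match, name, id) {
    var userId = Number(id);
    var user = userList.filter(function (u) { return u.id === userId; })[0];
    if (!user) {
      return match; // unknown id: leave the markup untouched
    }
    return '<a href="path/to/user/' + user.id + '">' + escapeHtml(user.name) + '</a>';
  });
}

// Hypothetical input standing in for the externally loaded markup.
var sampleMarkup = 'Thanks @Javier Hernandez(5) and @Nobody(99)!';
console.log(renderMentions(sampleMarkup, users));
// Thanks <a href="path/to/user/5">Javier Hernandez</a> and @Nobody(99)!
```

Looking the user up by id and printing the name stored in the array means a misspelled name in the markup still produces the right link; if you would rather reject such mentions, compare the captured name with user.name inside the callback.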