Scott - 1 year ago
Javascript Question

Regex: parsing GitHub usernames (JavaScript)

I'm trying to parse GitHub usernames (that start with @) from a paragraph of text in order to link them to their associated profiles.

The GitHub username constraints are:

  • Alphanumeric with single hyphens (no consecutive hyphens)

  • Cannot begin or end with a hyphen (if it ends with a hyphen, just match everything up until there)

  • Max length of 39 characters.
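To check a standalone name, the three constraints above fit into one anchored pattern. This is a minimal sketch that assumes only the rules listed here; GitHub may enforce others, such as reserved names. isValidGitHubUsername is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: checks a bare username (without the leading @)
// against the three constraints listed above. It is not GitHub's own check.
function isValidGitHubUsername(name) {
  // (?=.{1,39}$)     total length 1..39, hyphens included
  // [a-z0-9]+        must start with a letter or digit
  // (?:-[a-z0-9]+)*  every hyphen must be followed by a letter or digit,
  //                  which rules out "--" and a trailing "-"
  return /^(?=.{1,39}$)[a-z0-9]+(?:-[a-z0-9]+)*$/i.test(name);
}

console.log(isValidGitHubUsername("valid-username")); // true
console.log(isValidGitHubUsername("-invalid"));       // false
console.log(isValidGitHubUsername("in--valid"));      // false
console.log(isValidGitHubUsername("a".repeat(40)));   // false
```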

For example, the following text:

Example @valid hello @valid-username: @another-valid-username, @-invalid @in--valid @ignore-last-dash- [email protected] @another-valid?

The script...

Should match:

  • @valid

  • @valid-username

  • @another-valid-username

  • @in

  • @ignore-last-dash

  • @another-valid

Should ignore:

I'm getting reasonably close with JavaScript by using:

/\B@((?!.*(-){2,}.*)[a-z0-9][a-z0-9-]{0,38}[a-z0-9])/ig

But this isn't matching usernames with a single character (such as @a).
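The reason @a fails is visible in the pattern itself. [a-z0-9][a-z0-9-]{0,38}[a-z0-9] contains two character classes with no quantifier, so any match needs at least two characters after the @. Here is a quick check, using a hypothetical usernames helper that collects capture group 1:

```javascript
// The question's pattern (with \B@ restored at the start).
const questionRe = /\B@((?!.*(-){2,}.*)[a-z0-9][a-z0-9-]{0,38}[a-z0-9])/gi;

// Hypothetical helper: returns capture group 1 (the name without @) of every match.
function usernames(re, text) {
  return [...text.matchAll(re)].map((m) => m[1]);
}

console.log(usernames(questionRe, "@ab").length); // 1 (captures "ab")
console.log(usernames(questionRe, "@a").length);  // 0 (needs 2+ characters)
```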

Here are my tests so far:

Is the current regex efficient? And how can I match a single non-hyphen character?
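On efficiency, note that the negative lookahead (?!.*(-){2,}.*) is not limited to the username. Its .* runs to the end of the line. Each @ therefore triggers a scan of the rest of the text, and any -- later on the same line makes the lookahead fail for every @ before it. The sketch below illustrates this with made-up inputs:

```javascript
const questionRe = /\B@((?!.*(-){2,}.*)[a-z0-9][a-z0-9-]{0,38}[a-z0-9])/gi;

// Hypothetical helper: returns capture group 1 of every match.
function usernames(re, text) {
  return [...text.matchAll(re)].map((m) => m[1]);
}

// The "--" appears after @valid, so the lookahead rejects @valid too.
console.log(usernames(questionRe, "@valid and @in--valid").length); // 0

// With the "--" earlier in the line, @valid is found.
console.log(usernames(questionRe, "@in--valid and @valid").join(", ")); // valid
```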

Answer
/\B@([a-z0-9](?:-?[a-z0-9]){0,38})/gi
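Applied to the example sentence from the question, the pattern returns the expected list. The sample below uses a made-up address, someone@example.com, in place of the email in the question's example. Reading capture group 1 through matchAll is one way to collect the bare usernames.

```javascript
// The answer's pattern: \B, a literal @, then the captured username.
const usernameRe = /\B@([a-z0-9](?:-?[a-z0-9]){0,38})/gi;

// someone@example.com is a made-up stand-in for the email in the example.
const text =
  "Example @valid hello @valid-username: @another-valid-username, " +
  "@-invalid @in--valid @ignore-last-dash- someone@example.com @another-valid?";

// m[1] is capture group 1: the username without the leading @.
const found = [...text.matchAll(usernameRe)].map((m) => m[1]);
console.log(found.join(", "));
// valid, valid-username, another-valid-username, in, ignore-last-dash, another-valid

// A single-character username now matches as well.
const single = [..."ping @a please".matchAll(usernameRe)].map((m) => m[1]);
console.log(single.join(", ")); // a
```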

Note: When this regex runs into a character or sequence of characters that can't appear in a username (e.g. a period or a double hyphen --), it matches from the @ up to that stopping point. The OP says that's fine, so I'm rolling with it. So, if bold is the matched area (NOT the captured area):
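For the original goal of linking mentions, the gap between the matched area and the captured area is convenient. String.prototype.replace swaps out the whole match (the @ plus the name), and $1 in the replacement refers to the captured name alone. linkMentions is a hypothetical helper, and the profile URL form https://github.com/<name> is an assumption. This sketch does not HTML-escape the surrounding text.

```javascript
const usernameRe = /\B@([a-z0-9](?:-?[a-z0-9]){0,38})/gi;

// Hypothetical helper: wraps each @mention in a link to a GitHub profile.
// The match includes the @; $1 is only the captured username.
function linkMentions(text) {
  return text.replace(usernameRe, '<a href="https://github.com/$1">@$1</a>');
}

// Matching stops at "--", so only "@in" is linked.
console.log(linkMentions("Hi @in--valid!"));
// Hi <a href="https://github.com/in">@in</a>--valid!
```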


This works by nesting a non-capturing group inside a capturing group. Regex101 has a fantastic breakdown, but here's mine anyway:

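  1. \B: This is a built-in assertion meaning 'not a word boundary'. Because @ is not a word character, \B only succeeds when the character before the @ is also not a word character (or when the @ starts the text), so an @ in the middle of something like name@host is skipped. That seems to do the trick, though it may be problematic if something like [email protected] is a valid email address. At that point, though, it's indistinguishable from the text of someone who doesn't put spaces after punctuation[1] when they start a sentence with an @reference.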

    Thanks to Honore Doktorr for pointing out that negative lookbehinds didn't exist in JS at the time. They were added to the language in ES2018, so older engines still lack them.

  2. @: Just the literal @ symbol. One of the few places where a character means what it is.

  3. (...): The capturing group. Because it opens after the @, the @ is matched but not captured, so group 1 holds the bare username and there's no need to take a substring.
  4. [a-z0-9]: A character class that matches any ASCII letter or digit. Because of the i flag, this also matches capital letters. Because it's the first character and has no quantifier, it must be present, which is also why a username can't start with a hyphen.
  5. (?:...): This is a noncapturing group. It wraps a block of regex in a group without capturing it as a result.
  6. -?[a-z0-9]: -? matches an optional hyphen, and [a-z0-9] is the same character class as before. This section is what makes consecutive hyphens invalid: if there is a -, it has to be followed by something that matches [a-z0-9]. For the same reason, the match stops before a trailing hyphen.
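  7. {0,38}: Match the noncapturing group between 0 and 38 times, inclusive. Combined with #4, this allows at most 39 letters and digits. Hyphens don't count toward that limit, so a hyphenated match can be longer than 39 characters overall. Anything past the 39th letter or digit is left unmatched.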
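Two consequences of the breakdown can be checked directly. The \B in step 1 skips an @ that follows a word character, and the {0,38} in step 7 bounds letters and digits but not hyphens. In this sketch extractUsernames is a hypothetical helper. The length filter at the end is one possible fix, not something the answer proposes.

```javascript
const usernameRe = /\B@([a-z0-9](?:-?[a-z0-9]){0,38})/gi;

// Hypothetical helper: capture group 1 of every match.
function extractUsernames(text) {
  return [...text.matchAll(usernameRe)].map((m) => m[1]);
}

// \B: an @ directly after a letter, digit or underscore is skipped,
// so a plain email address is not treated as a mention.
console.log(extractUsernames("mail someone@example.com or ping (@valid)").join(", ")); // valid

// {0,38} caps letters and digits at 39, but hyphens come on top:
// this name has 39 alphanumerics and 38 hyphens, 77 characters in total.
const long = "@" + "a-".repeat(38) + "a";
console.log(extractUsernames(long)[0].length); // 77

// Possible fix (an assumption, not part of the answer): discard captures
// longer than the 39-character limit after matching.
const withinLimit = extractUsernames(long).filter((name) => name.length <= 39);
console.log(withinLimit.length); // 0
```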