trnelson - 2 months ago
ASP.NET (C#) Question

Regex for ANY string except "www"? (subdomain)

I was wondering if someone out there could help me with a regex in C#. I think it's fairly simple, but I've been racking my brain over it and I'm not quite sure why I'm having such a hard time. :)

I've found a few examples around but I can't seem to manipulate them to do what I need.

I just need to match any subdomain string made up of alphanumerics and dashes that is not "www", and only up to the ".".

Also, ideally, if someone were to type "www.subdomain.domain.com" I would like the www to be ignored if possible. If not, it's not a huge issue.

In other words, I would like to match the part in parentheses in each of these:


  • (test).domain.com
  • (test2).domain.com
  • (wwwasdf).domain.com
  • (asdfwww).domain.com
  • (w).domain.com
  • (wwwwww).domain.com
  • (asfd-12345-www-bananas).domain.com
  • www.(subdomain).domain.com



And I don't want to match:


  • (www).domain.com



It seems to me like it should be easy, but I'm having trouble with the "not match" part.

For what it's worth, this is for use in the IIS 7 URL Rewrite Module, to rewrite for all non-www subdomains.

Thanks!

Answer

Is the remainder of the domain name constant, like .domain.com, as in your examples? Try this:

\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)

Explanation:

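  • \w+(?:-\w+)* matches a generic domain-name component as you described, but a little more rigorously: dashes are allowed only between runs of word characters, never at the start or end of the component (note that \w also accepts underscores).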

  • (?=\.domain\.com\b) makes sure it's the first subdomain (i.e., the last one before the actual domain name).

  • \b(?!www\.) makes sure it isn't www. (without the \b, the match could start at the second w of www.domain.com and match just the ww).
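
To see why the \b matters, here is a small sketch that runs the pattern with and without it. It uses Python's re module rather than .NET, purely for convenience. The pattern only uses \b, \w and lookaheads, which I would expect to behave the same way in .NET for these ASCII inputs, but that is an assumption, not something verified here. The helper name first_match is made up for the illustration.

```python
import re

# The pattern from the answer, and the same pattern with the leading \b removed.
WITH_BOUNDARY = re.compile(r"\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)")
WITHOUT_BOUNDARY = re.compile(r"(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)")


def first_match(pattern, host):
    """Return the first captured subdomain found in host, or None."""
    m = pattern.search(host)
    return m.group(1) if m else None


# With \b the engine may only start at a word boundary, so "www" is rejected.
print(first_match(WITH_BOUNDARY, "www.domain.com"))
# Without \b the engine retries one character later and accepts "ww".
print(first_match(WITHOUT_BOUNDARY, "www.domain.com"))
```

The first call prints None and the second prints ww, which is exactly the failure the \b guards against.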

In my tests, this regex matches precisely the parts you highlighted in your examples, and does not match the www. in either of the last two examples.
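
If you want to reproduce that check yourself, a minimal sketch along these lines should do it. It is written in Python for brevity, on the assumption that the .NET engine treats this pattern the same way for these inputs. The function name extract_subdomain and the EXAMPLES list are only illustrative.

```python
import re

# Pattern from the answer: the label directly before ".domain.com", unless it is "www".
SUBDOMAIN = re.compile(r"\b(?!www\.)(\w+(?:-\w+)*)(?=\.domain\.com\b)")


def extract_subdomain(host):
    """Return the subdomain label in front of .domain.com, or None if there is no match."""
    m = SUBDOMAIN.search(host)
    return m.group(1) if m else None


# Host names taken from the question, with the parentheses removed.
EXAMPLES = [
    "test.domain.com",
    "test2.domain.com",
    "wwwasdf.domain.com",
    "asdfwww.domain.com",
    "w.domain.com",
    "wwwwww.domain.com",
    "asfd-12345-www-bananas.domain.com",
    "www.subdomain.domain.com",
    "www.domain.com",
]

for host in EXAMPLES:
    print(host, "->", extract_subdomain(host))
```

Every host yields the part shown in parentheses in the question, except www.domain.com, which yields None.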


EDIT: Here's another version which matches the whole name, capturing the pieces in different groups:

^((?:\w+(?:-\w+)*\.)*)((?!www\.)\w+(?:-\w+)*)(\.domain\.com)$

In most cases, group $1 will contain an empty string because there's nothing before the subdomain name, but here's how it breaks down www.subdomain.domain.com:

$1: "www."
$2: "subdomain"
$3: ".domain.com"
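
A quick way to inspect the three groups is a sketch like the following, again in Python on the assumption that .NET splits these inputs the same way. The name split_host is hypothetical.

```python
import re

# Whole-name version from the answer: optional leading labels, the subdomain, then the fixed domain.
FULL_NAME = re.compile(
    r"^((?:\w+(?:-\w+)*\.)*)((?!www\.)\w+(?:-\w+)*)(\.domain\.com)$"
)


def split_host(host):
    """Return (prefix, subdomain, domain) for a matching host name, or None."""
    m = FULL_NAME.match(host)
    return m.groups() if m else None


print(split_host("www.subdomain.domain.com"))
print(split_host("test.domain.com"))
print(split_host("www.domain.com"))
```

The first call gives the breakdown listed above, the second gives an empty first group, and the third prints None because the only candidate subdomain is www.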