I was wondering if someone out there could help me with a regex in C#. I think it's fairly simple, but I've been racking my brain over it and I'm not quite sure why I'm having such a hard time. :)
I've found a few examples around but I can't seem to manipulate them to do what I need.
I just need to match ANY alphanumeric+dashes subdomain string that is not "www", and just up to the "."
Also, ideally, if someone were to type "www.subdomain.domain.com" I would like the www to be ignored if possible. If not, it's not a huge issue.
In other words, I would like to match:
Is the remainder of the domain name constant, like .domain.com, as in your examples? Try this:
\w+(?:-\w+)* matches a generic domain-name component as you described (but a little more rigorously).
(?=\.domain\.com\b) makes sure it's the first subdomain (i.e., the last one before the actual domain name).
\b(?!www\.) makes sure it isn't www. (without the \b, it could skip over the first w and match just the ww).
In my tests, this regex matches precisely the parts you highlighted in your examples, and does not match the www. in either of the last two examples.
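For anyone who wants to try this from C#, here is a minimal sketch that simply joins the three pieces explained above into one pattern, in the order \b(?!www\.), then \w+(?:-\w+)*, then (?=\.domain\.com\b). That ordering is my assumption, as are the class name, the method name and the sample hosts; domain.com is just the placeholder domain used in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubdomainDemo
{
    // Assumed assembly of the three pieces described above:
    //   \b(?!www\.)          start at a word boundary, but not at "www."
    //   \w+(?:-\w+)*         one name component, optionally with inner dashes
    //   (?=\.domain\.com\b)  must be followed directly by ".domain.com"
    private static readonly Regex FirstSubdomain =
        new Regex(@"\b(?!www\.)\w+(?:-\w+)*(?=\.domain\.com\b)");

    // Hypothetical helper: returns true and the subdomain if one is found.
    public static bool TryExtract(string host, out string subdomain)
    {
        Match m = FirstSubdomain.Match(host);
        subdomain = m.Success ? m.Value : string.Empty;
        return m.Success;
    }

    public static void Main()
    {
        // Sample hosts are made up for illustration.
        string[] hosts =
        {
            "subdomain.domain.com",
            "my-sub.domain.com",
            "www.subdomain.domain.com",
            "www.domain.com"
        };

        foreach (string host in hosts)
        {
            string shown = TryExtract(host, out string sub) ? sub : "(no match)";
            Console.WriteLine(host + " -> " + shown);
        }
    }
}
```

Because the trailing part is a lookahead, m.Value is just the subdomain, with no dot and no domain attached. Note that \w in .NET also accepts underscores and non-ASCII letters, so swap in [A-Za-z0-9] if you need strictly alphanumeric names.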
EDIT: Here's another version which matches the whole name, capturing the pieces in different groups:
In most cases, group $1 will contain an empty string because there's nothing before the subdomain name, but here's how it breaks down for www.subdomain.domain.com:
$1: "www."
$2: "subdomain"
$3: ".domain.com"
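Here is a rough C# sketch of a whole-name match with three groups. The pattern below is one possible way to get that breakdown and is not necessarily the exact one intended above; the class and method names are made up for the example, and domain.com is again the placeholder from this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ThreeGroupDemo
{
    // Assumed pattern (illustrative only):
    //   group 1: anything before the subdomain, e.g. "www." (often empty)
    //   group 2: the subdomain itself, which must not be "www"
    //   group 3: the fixed ".domain.com" tail
    private static readonly Regex WholeName = new Regex(
        @"^((?:\w+(?:-\w+)*\.)*)((?!www\.)\w+(?:-\w+)*)(\.domain\.com)$");

    // Hypothetical helper: returns the Match so callers can read the groups.
    public static Match Parse(string host)
    {
        return WholeName.Match(host);
    }

    public static void Main()
    {
        Match m = Parse("www.subdomain.domain.com");
        if (m.Success)
        {
            Console.WriteLine("$1: \"" + m.Groups[1].Value + "\"");
            Console.WriteLine("$2: \"" + m.Groups[2].Value + "\"");
            Console.WriteLine("$3: \"" + m.Groups[3].Value + "\"");
        }
    }
}
```

In .NET the groups are read through m.Groups[n].Value rather than $n; the $n form is what you would use in a replacement string passed to Regex.Replace.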