BerggreenDK BerggreenDK - 1 month ago 7
ASP.NET (C#) Question

Top level domain from URL in C#

I am using C# and ASP.NET for this.

We receive a lot of "strange" requests on our IIS 6.0 servers and I want to log and catalog these by domain.

Eg. we get some strange requests like these:

http://www.poker.winner4ever.example.com/

http://www.hotgirls.example.com/

http://santaclaus.example.com/

http://m.example.com/

http://wap.example.com/

http://iphone.example.com/

the latter three are kinda obvious, but I would like to sort them all into one as "example.com" IS hosted on our servers. The rest isn't, sorry :-)

So I am looking for some good ideas for how to retrieve example.com from the above. Secondly I would like to match the m., wap., iphone etc into a group, but that's probably just a quick lookup in a list of mobile shortcuts.I could handcode this list for a start.

But is regexp the answer here or is pure string manipulation the easiest way? I was thinking of "splitting" the URL string by "." and the look for item[0] and item[1]...

Any ideas?

Answer

I needed the same, so I wrote a class that you can copy and paste into your solution. It uses a hard coded string array of tld's. http://pastebin.com/raw.php?i=VY3DCNhp

Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.com/path/page.htm"));

outputs microsoft.com

and

Console.WriteLine(GetDomain.GetDomainFromUrl("http://www.beta.microsoft.co.uk/path/page.htm"));

outputs microsoft.co.uk