Ramesh Ramesh - 7 months ago 33
Java Question

How to check whether a subdomain belongs to the same domain using Java

I have a list of URLs and need to filter them by a specific domain, including its subdomains. Say I have some URLs like:

http://www.example.com
http://test.example.com
http://test2.example.com


I need to extract the URLs that belong to the domain example.com.

Answer

I understand you are probably looking for a fancy solution using the URL class or something similar, but it is not required. Simply think of a way to extract "example.com" from each of the URLs.

Note: example.com is a different domain from, say, example.net. Thus extracting just "example" is technically the wrong thing to do.

We can break down a sample URL, say:

http://sub.example.com/page1.html

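Step 1: Split the URL on the delimiter "/" to extract the part containing the domain. For the sample URL this is the third piece (index 2), sub.example.com, because the "//" after "http:" produces an empty piece.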

Each such part can be viewed as a sequence of the following blocks (some of which may be empty):

[www][subdomain][basedomain]

Step 2: Discard "www" (if present). We are left with [subdomain][basedomain]

Step 3: Split the remaining string on the delimiter "."

Step 4: Count the strings produced by the split. If there are 2 strings, both of them make up the target domain (example and com). If there are 3 or more, look at the last string. If it is 2 characters long (a country code), the last 3 strings make up the domain (example and co and uk). Otherwise the last 2 strings make up the domain (example and com). This is only a heuristic: a host such as sub.example.de also ends in a 2-character string, so it would come back as sub.example.de rather than example.de.

I think this should do the trick (I do hope this wasn't homework :D )

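    // You may clean this method up further. It expects an absolute URL such as "http://host/path".
    private String getRootDomain(String url) {
        // "http://sub.example.com/page1.html".split("/") gives ["http:", "", "sub.example.com", "page1.html"]
        String[] domainKeys = url.split("/")[2].split("\\.");
        int length = domainKeys.length;
        // 1 if the host starts with "www" (which is ignored), otherwise 0
        int dummy = domainKeys[0].equals("www") ? 1 : 0;
        if (length - dummy == 2) {
            return domainKeys[length - 2] + "." + domainKeys[length - 1];
        }
        // A two-letter last part is treated as a country code (e.g. "co.uk"), so keep three parts
        if (domainKeys[length - 1].length() == 2) {
            return domainKeys[length - 3] + "." + domainKeys[length - 2] + "." + domainKeys[length - 1];
        }
        return domainKeys[length - 2] + "." + domainKeys[length - 1];
    }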
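
If the target domain is already known (example.com in the question), a simpler alternative to guessing the root domain is to parse the host and do a suffix check. The following is only a minimal sketch under that assumption. It swaps the manual split for the standard java.net.URI class, and the names DomainFilter, belongsTo and filter are made up for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of any library.
class DomainFilter {

    // True if the URL's host is `domain` itself or one of its subdomains.
    static boolean belongsTo(String url, String domain) {
        String host;
        try {
            host = new URI(url).getHost();
        } catch (URISyntaxException e) {
            return false; // malformed URL: treat as not matching
        }
        if (host == null) {
            return false; // e.g. the URL has no scheme, so no host was parsed
        }
        host = host.toLowerCase(Locale.ROOT);
        String target = domain.toLowerCase(Locale.ROOT);
        // The leading dot keeps "notexample.com" from matching "example.com".
        return host.equals(target) || host.endsWith("." + target);
    }

    // Keeps only the URLs that belong to `domain`, preserving their order.
    static List<String> filter(List<String> urls, String domain) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (belongsTo(url, domain)) {
                result.add(url);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                "http://www.example.com",
                "http://test.example.com",
                "http://test2.example.com",
                "http://test.example.net");
        System.out.println(filter(urls, "example.com"));
    }
}
```

With the list from the question plus one example.net URL, only the three example.com URLs are kept. Unlike getRootDomain, this does not try to work out the root domain on its own, so it avoids the country-code guesswork but needs the domain to be passed in.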