Prashant Prashant - 6 months ago 14
Python Question

How to get the home page URL link

Let's say I am on the webpage

https://company.slack.com/messages/@user1/


How can I get the URL of the home page of the company/website in Java or Python? In this case that would be

https://slack.com/

This is easy for some cases, but I want a general solution and have been unable to cover all cases, such as Slack, Google Design, etc.

Similar cases:

https://www.youtube.com/watch?v=deL9VeNjcH8


Expected Output:
https://www.youtube.com


https://angel.co/weav-music?utm_source=lb


Expected Output:
https://angel.co


https://design.google.com/


Expected Output:
https://www.google.com


The code from the link below:

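#include <iostream>
#include <string>

using namespace std;

int main() {
    string s = "https://angel.co/weav-music?utm_source=lb";
    int cnt = 0;                // number of '/' characters seen so far
    size_t p = s.length();      // cut position; keeps the whole string if there is no path
    for (size_t i = 0; i < s.length(); i++) {
        // The third '/' ends "scheme://host" and starts the path.
        if (s[i] == '/' && ++cnt == 3) {
            p = i;
            break;
        }
    }
    cout << s.substr(0, p);     // prints https://angel.co
    return 0;
}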


@all
Please see JonasCz's second comment on his own answer; that is what actually helped me.

Answer

You can use something like this:

URL aURL = new URL("https://company.slack.com/messages/@user1/");
System.out.println(aURL.getProtocol() + "://" + aURL.getHost());

Which prints:

https://company.slack.com

This works for other URLs too. See the docs for more details.
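Since the question also asks about Python, here is a rough equivalent of the snippet above, a minimal sketch using only the standard library's urllib.parse. The function name site_root is made up for illustration. Like the Java version, it keeps the subdomain.

```python
from urllib.parse import urlsplit


def site_root(url: str) -> str:
    """Return scheme://host for an absolute URL, dropping path, query and port."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    # .hostname is the lower-cased host without port or user info
    return f"{parts.scheme}://{parts.hostname}"


if __name__ == "__main__":
    for url in (
        "https://company.slack.com/messages/@user1/",
        "https://www.youtube.com/watch?v=deL9VeNjcH8",
        "https://angel.co/weav-music?utm_source=lb",
    ):
        print(site_root(url))
```

For the three URLs above this gives https://company.slack.com, https://www.youtube.com and https://angel.co.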


If you want only the main domain, without the subdomain (i.e. only slack.com), you can use Guava's InternetDomainName, e.g. like this (older Guava releases exposed name() for this; current ones use toString()):

InternetDomainName.from("company.slack.com").topPrivateDomain().toString();

The above will return slack.com.


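For completeness, the whole code in your case would look like this:

// needs java.net.URL and com.google.common.net.InternetDomainName
URL aURL = new URL("https://company.slack.com/messages/@user1/");
String domain = InternetDomainName.from(aURL.getHost()).topPrivateDomain().toString();
System.out.println(aURL.getProtocol() + "://" + domain); // https://slack.com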
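
In Python the same idea could look roughly like the sketch below. It is only an illustration: KNOWN_SUFFIXES is a hypothetical, deliberately tiny stand-in for the Public Suffix List that Guava consults, and registrable_domain and home_page are made-up names. Real code would need the full list, for instance through a third-party package.

```python
from urllib.parse import urlsplit

# Assumption: a tiny hand-written sample, NOT the real Public Suffix List.
KNOWN_SUFFIXES = frozenset({"com", "co", "org", "net", "co.uk"})


def registrable_domain(host: str, suffixes=KNOWN_SUFFIXES) -> str:
    """Return the known public suffix plus one label to its left."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first, so "co.uk" is found before "uk".
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in suffixes:
            return ".".join(labels[i - 1:])
    raise ValueError(f"no known public suffix in {host!r}")


def home_page(url: str) -> str:
    """Return scheme://registrable-domain for an absolute URL."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    return f"{parts.scheme}://{registrable_domain(parts.hostname)}"


if __name__ == "__main__":
    print(home_page("https://company.slack.com/messages/@user1/"))
    print(home_page("https://design.google.com/"))
```

Note that this always strips every subdomain, so it yields https://google.com and https://youtube.com rather than the www. variants listed in the question; whether a site's home page lives on www. cannot be derived from the URL alone.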