nelac123 - 4 days ago
Java Question

Parsing a URL using a regular expression

I have been trying to parse the address out of a URL string, but I have had only partial success: it works for some inputs and not others.

Currently I have:

Pattern regex = Pattern.compile("[/].*[a-zA-Z](?=\\/|:|)", Pattern.DOTALL);


On the input string
https://www.google.com/
the current pattern gives me
//www.google.com
(which is somewhat correct). However, when I try the input string
https://www.google.com/search?q=Regular+Expressions&num=1000
it gives me
//www.google.com/search?q=Regular+Expressions&num


What I am trying to do is parse the address so that it ends before ":", "/", or whitespace.

I also came up with:

Pattern regex = Pattern.compile("[.*/][^/][a-z].*[a-zA-Z](?=\\/|:|)", Pattern.DOTALL);


and it works (partially) with
https://google.com:80
giving me
/google.com

What am I doing wrong?

Answer

Try this regex: ^.*?\/\/([^:\/\s]+)
The part you're looking for is captured in group 1.

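import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Lazily skip the scheme, match "//", then capture up to the first ':', '/', or whitespace.
Pattern pattern = Pattern.compile("^.*?\\/\\/([^:\\/\\s]+)");
Matcher matcher = pattern.matcher("your input url");
if (matcher.find()) { // the pattern is anchored with ^, so there is at most one match
    System.out.println("Domain: " + matcher.group(1));
}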

EDIT: Fixed the extra leading slash issue by matching // before the capture group.
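
As for why the original patterns overshoot: .* is greedy, so it runs to the end of the input and backtracks only until the last letter, and the lookahead (?=\/|:|) ends with an empty alternative, so it succeeds at any position and never forces an earlier stop. A negated character class such as [^:\/\s]+ stops at the first delimiter instead. Below is a minimal, self-contained sketch that runs the regex above against the three URLs from the question. The class name HostExtractor and the helper extractHost are hypothetical names chosen for illustration, and returning null when no "//" is found is an assumption, not something the question specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostExtractor {
    // Same regex as in the answer: lazily skip the scheme, match "//",
    // then capture everything up to the first ':', '/', or whitespace.
    private static final Pattern HOST = Pattern.compile("^.*?\\/\\/([^:\\/\\s]+)");

    // Hypothetical helper: returns the captured host,
    // or null when the input contains no "//" (an assumed convention).
    static String extractHost(String url) {
        Matcher m = HOST.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "https://www.google.com/",
            "https://www.google.com/search?q=Regular+Expressions&num=1000",
            "https://google.com:80"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + extractHost(input));
        }
    }
}
```

For these inputs the helper returns www.google.com, www.google.com, and google.com respectively. Pattern.DOTALL is not needed here, since the capture group already excludes whitespace and the host is expected on a single line.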