Alisa Alisa - 1 month ago 17
Java Question

Java regex: replace : and / with a space, except in the domain part of a URL

I have a long string that contains many : and / characters. It also includes URLs.

I want to replace every : and / with a space, except those in the scheme-and-domain part of a URL (e.g., http://example.com).

So link:http://example.com/test/page.html should become link http://example.com test page.html.

I tried replaceAll("[://]", " "), but it also replaces the : and / in http://example.com with a space.

Answer

Since you need to keep a pattern in one context and replace it in another, use a regex that matches and captures what you want to protect (here, URLs) and merely matches what you want to replace. Then, in a Matcher#appendReplacement() loop, check whether the capture group participated in the match and choose the replacement accordingly.

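The regex can look like (\\bhttps?://\\S*)|[:/] (written as a Java string literal), where (\\bhttps?://\\S*) matches http:// or https:// followed by zero or more non-whitespace characters and captures it into Group 1, and [:/] matches either : or / (to be replaced with a space). Because \\S* runs to the next whitespace, Group 1 holds the whole URL, path included.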
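To see how the alternation behaves, here is a minimal sketch that lists each match and says whether Group 1 took part in it. The class name AlternationTrace, the method trace, and the KEEP/SPACE labels are made up for illustration; only the pattern comes from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class AlternationTrace {
    // Same pattern as in the answer: Group 1 = protected URL, otherwise a lone ':' or '/'.
    static final Pattern P = Pattern.compile("(\\bhttps?://\\S*)|[:/]");

    // Returns one label per match: "KEEP <text>" when Group 1 participated,
    // "SPACE <text>" when only the [:/] alternative matched.
    static List<String> trace(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            out.add((m.group(1) != null ? "KEEP " : "SPACE ") + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        trace("link:http://example.com/test/page.html 1: 2/").forEach(System.out::println);
    }
}
```

For the input in main, the : after link and the trailing : and / are reported as SPACE, while the URL is reported as one KEEP match that includes /test/page.html.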

Here is some sample code:

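import java.util.regex.Matcher;
import java.util.regex.Pattern;

String fileText = "http://example.com//foo/bar  1: 2/";
// Group 1 captures a whole URL; the second alternative matches a lone ':' or '/'.
String pattern = "(\\bhttps?://\\S*)|[:/]";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(fileText);

StringBuffer sb = new StringBuffer();
while (m.find()) {
    if (m.group(1) != null) {
        // URL matched: put it back unchanged. quoteReplacement() keeps any
        // '$' or '\' in the URL from being read as a group reference.
        m.appendReplacement(sb, Matcher.quoteReplacement(m.group(1)));
    } else {
        // Lone ':' or '/': replace it with a space.
        m.appendReplacement(sb, " ");
    }
}
m.appendTail(sb); // copy the text after the last match
System.out.println(sb); // prints "http://example.com//foo/bar  1  2 "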

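The pattern above protects the entire URL, so the slashes in /test/page.html are kept. If only the scheme and host should be protected, as in the question's expected output, one possible variation is to stop Group 1 at the first /, : or whitespace after the ://. The sketch below assumes URLs start with http:// or https:// and have no port or user-info part (a port's : would be replaced too); the names HostOnlyReplacer and replaceOutsideHost are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variation: protect only "scheme://host", not the path.
class HostOnlyReplacer {
    // Group 1: scheme plus host; [^\s/:]+ stops at the first '/', ':' or whitespace.
    private static final Pattern P = Pattern.compile("(\\bhttps?://[^\\s/:]+)|[:/]");

    static String replaceOutsideHost(String input) {
        Matcher m = P.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Group 1 set: keep "scheme://host" as-is (quoted so '$' and '\' stay literal).
            // Otherwise a lone ':' or '/' matched: replace it with a space.
            String replacement = m.group(1) != null
                    ? Matcher.quoteReplacement(m.group(1))
                    : " ";
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "link http://example.com test page.html"
        System.out.println(replaceOutsideHost("link:http://example.com/test/page.html"));
    }
}
```

StringBuffer is used so the sketch also compiles on Java 8; on Java 9 and later, appendReplacement and appendTail accept a StringBuilder as well.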