Jsmith - 1 year ago
Java Question

Java replaceAll does not replace string

I am parsing through some XML and sanitizing some fields.

I'm trying to do the following in Java:

nameField = nameField.replaceAll("[^a-zA-Z\\d\\s\\.,'&amp;]", "");

I do not want to replace any letter of the alphabet, any digit, any whitespace, any period, any comma, any single quote or (this is where my issue is) the literal string &amp;.

But I do want to replace occurrences of a single & or a single ;.

But obviously my regex as it sits won't work: it'll leave in all & and all ; characters.

For example, say the string
K&W@#9$9(AR;.0 O&amp;
is found; my expected result would be:
KW99AR.0 O&amp;

How can I achieve this?

Answer Source

Why don't you simplify your regular expression and just go with a lookahead/lookbehind:

//                  |"&" not followed by "amp;"
//                  |          | or
//                  |          | ";" not preceded by "&amp"
nameField.replaceAll("&(?!amp;)|(?<!&amp);", "");

The output for "K&W@#9$9(AR;.0 O&amp;" would be:

KW@#9$9(AR.0 O&amp;


Then, you can chain this with a cleanup that leaves only your desired characters. Here, I added ; and & to the negated character class so they are kept, since standalone occurrences have already been removed by the previous operation. The single quote from your original class stays in as well.

Also, you don't need to escape the dot in a custom character class.

.replaceAll("[^a-zA-Z\\d\\s.,';&]", "");

The two chained invocations will return:

KW99AR.0 O&amp;
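
For reference, here is a minimal runnable sketch of the two chained calls. The class name NameSanitizer and the method name sanitize are made up for illustration, and the second character class assumes you still want to keep the single quote from your original pattern.

```java
class NameSanitizer {

    // Hypothetical helper: applies both passes described above.
    static String sanitize(String nameField) {
        return nameField
                // Pass 1: drop "&" unless it starts "&amp;" and ";" unless it ends "&amp;".
                .replaceAll("&(?!amp;)|(?<!&amp);", "")
                // Pass 2: drop everything outside the allowed set (assumption: ' is kept).
                .replaceAll("[^a-zA-Z\\d\\s.,';&]", "");
    }

    public static void main(String[] args) {
        System.out.println(sanitize("K&W@#9$9(AR;.0 O&amp;")); // KW99AR.0 O&amp;
    }
}
```

The order matters: if the cleanup ran first, removing junk characters could glue a stray & onto a following "amp;" and create an entity that was never in the input.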


  • As mentioned by Tushar, a sequence of characters in a custom character class is not treated as a sequence but as a set of alternative individual characters.
  • General rule of thumb: be careful about using regex to parse markup. You may very well end up with a bigger mess. Regular expressions are not made to parse markup or languages with a grammar.
  • Your specific case is safe enough, but remember there are other XML entities such as &gt;, &lt; etc.
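
To illustrate the first and last notes, here is a rough sketch, not a drop-in solution. The first line of main shows that "&amp;" inside a character class only lists the characters &, a, m, p and ;. The sanitize method then extends the lookahead/lookbehind to the five entities predefined by XML (amp, lt, gt, quot, apos). The class and constant names are hypothetical, and numeric references such as &#38; are not handled.

```java
class EntityAwareSanitizer {

    // The five entities predefined by XML; extend the alternation if you expect others.
    private static final String ENTITY = "(?:amp|lt|gt|quot|apos)";

    // "&" that does not start a known entity, or ";" that does not end one.
    private static final String STRAY =
            "&(?!" + ENTITY + ";)|(?<!&" + ENTITY + ");";

    // Hypothetical helper: same two passes as before, with the wider entity list.
    static String sanitize(String s) {
        return s.replaceAll(STRAY, "")
                .replaceAll("[^a-zA-Z\\d\\s.,';&]", "");
    }

    public static void main(String[] args) {
        // A character class matches single characters, never the sequence "&amp;".
        System.out.println("a;b&c".replaceAll("[^&amp;]", "")); // a;&

        // Known entities survive; the stray ";" and "&" are removed.
        System.out.println(sanitize("A &lt; B; C &amp; D &gt; E & F"));
    }
}
```

Java accepts the alternation inside the lookbehind because every branch has a bounded length. If your input can contain arbitrary entities, a real XML parser is probably the safer route.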