Asked by Simon, 4 years ago (Java question)

Java replace all occurrences of regex with another regex

Let's say I have a string containing XML with many occurrences of <tagA>:

String example = "(...) some xml here (...)\n"
        + "<tagA>283940</tagA>\n"
        + "(...) some xml here (...)\n"
        + "<tagA>& 9940</tagA>\n"
        + "<tagA>- 99440</tagA>\n"
        + "<tagA>< 99440</tagA>\n"
        + "<tagA>99440</tagA>\n"
        + "(...) more xml here (...)";


The content should contain only digits, but sometimes it starts with a random character followed by a whitespace character and then the digits.
I want to remove the unwanted character and the whitespace. How do I do that?

So far I know I should be looking for a regex like
"<tagA>. [0-9]*<\/tagA>"
but I am stuck there.

I want to remove these characters because some of them are "&", ">" or "<", which make the XML invalid and prevent me from parsing it as XML.

Answer

The regex that you're looking for is: <(\w+)>(\D{0,})(\d+)

Group 1 captures the tag name, group 2 captures the unwanted prefix (everything that is not a digit, including the space) and group 3 captures the number.
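A minimal sketch of what the three groups capture, using java.util.regex.Matcher to walk through the matches. The class name TagGroups, the method describe and the "name|prefix|number" output format are hypothetical, chosen only for illustration. Note that this basic pattern only looks at the opening tag and what follows it; it does not check the closing tag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagGroups {
    // The answer's basic pattern: group 1 = tag name, group 2 = non-digit prefix, group 3 = number
    private static final Pattern TAG = Pattern.compile("<(\\w+)>(\\D{0,})(\\d+)");

    // Returns one "name|prefix|number" entry per match, to show what each group holds
    static List<String> describe(String xml) {
        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(xml);
        while (m.find()) {
            result.add(m.group(1) + "|" + m.group(2) + "|" + m.group(3));
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<tagA>& 9940</tagA>\n<tagA>< 99440</tagA>\n<tagA>99440</tagA>";
        // Prints: tagA|& |9940, then tagA|< |99440, then tagA||99440 (one per line)
        for (String entry : describe(xml)) {
            System.out.println(entry);
        }
    }
}
```

For a tag without junk, group 2 is simply empty, which is why the third entry has nothing between the two separators.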

There's an "enhanced version" of this regex that handles more situations: (\s{0,})(<\w+>)(\D{0,})(\d+)(\D{0,})(<\/\w+>)(\s{0,})

Group 1 captures any whitespace before the opening tag and group 7 any whitespace after the closing tag. Groups 2 and 6 match the opening and closing tags. Groups 3 and 5 match any unwanted characters before and after the value. Group 4 contains the value itself.

With String.replaceAll you can sanitize the input by keeping only groups 2, 4 and 6 (replacement string "$2$4$6") and discarding the rest.

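public class Sanitize {
    public static void main(String[] args) {
        // input data; the second line also has a stray "<" after the digits (handled by group 5)
        String s = "<tagA>283940</tagA>\n"
                + "                    <tagA>& 9940<</tagA>\n"
                + "                    <tagA>- 99440</tagA>\n"
                + "                    <tagA>< 99440</tagA>\n"
                + "                    <tagA>99440</tagA>"
                + "<13243> asdfasdf </>";

        // keep group 2 (opening tag), group 4 (digits) and group 6 (closing tag); drop everything else
        String replaced = s.replaceAll(
                "(\\s{0,})(<\\w+>)(\\D{0,})(\\d+)(\\D{0,})(<\\/\\w+>)(\\s{0,})", "$2$4$6");
        System.out.println(replaced);
    }
}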

Output: <tagA>283940</tagA><tagA>9940</tagA><tagA>99440</tagA><tagA>99440</tagA><tagA>99440</tagA><13243> asdfasdf </>
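One thing to watch with the generic pattern: \w+ accepts any tag name, so other elements get rewritten too, and group 5 (\D{0,}) is greedy, so it can run past the first closing tag. For example, on "<tagA>1</tagA><b>x</b>" it produces "<tagA>1</b>". If only <tagA> needs cleaning, a narrower pattern built from the question's own attempt may be safer. The sketch below assumes, as the question describes, that the junk is at most one non-digit character followed by one whitespace character right after <tagA>; the class name TagASanitizer and the method sanitize are hypothetical.

```java
import java.util.regex.Pattern;

public class TagASanitizer {
    // Assumption (from the question): at most one non-digit character plus one
    // whitespace character directly after <tagA>, then only digits.
    private static final Pattern TAG_A = Pattern.compile("<tagA>(?:\\D\\s)?(\\d+)</tagA>");

    // Rewrites only <tagA> elements; other tags and the whitespace between them are left untouched
    static String sanitize(String xml) {
        return TAG_A.matcher(xml).replaceAll("<tagA>$1</tagA>");
    }

    public static void main(String[] args) {
        String xml = "<root>\n"
                + "  <tagA>283940</tagA>\n"
                + "  <tagA>& 9940</tagA>\n"
                + "  <tagA>- 99440</tagA>\n"
                + "  <tagA>< 99440</tagA>\n"
                + "  <other>x</other>\n"
                + "</root>";
        System.out.println(sanitize(xml));
    }
}
```

If junk can also follow the digits (like the stray "<" in the answer's second test line), this pattern would need an extra optional non-digit character before </tagA>.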
