mihir S - 6 months ago
Java Question

Java Replace words within xml

I have the following XML:

<some tag>
<some_nested_tag attr="Hello"> Text </some_nested_tag>
Hello world Hello Programming
</some tag>


From the above XML, I want to replace the occurrences of the word "Hello" that are part of the tag content but not part of a tag attribute.

I want the following output (replacing Hello with HI):

<some tag>
<some_nested_tag attr="Hello"> Text </some_nested_tag>
HI world HI Programming
</some tag>


I tried Java regex and also some of the DOM parser tutorials, but without any luck. I am posting here for help as I have limited time available to fix this in my project. Help would be appreciated.

Answer

That can be done by using a negative lookbehind.

Try this regex:

(?<!attr=")Hello

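It will match any Hello that is not immediately preceded by attr=". Note that this only protects an attribute named attr whose value starts with Hello; a Hello in any other attribute, or later in the value, would still be replaced.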

So you could try this:

str = str.replaceAll("(?<!attr=\")Hello", "Hi");
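For reference, here is a minimal, self-contained sketch of the lookbehind approach run against the sample from the question. The class name LookbehindReplace and the helper replaceOutsideAttr are made up for illustration, and the input is simply built as a string; nothing here parses the XML.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindReplace {
    // Matches "Hello" unless it is immediately preceded by attr="
    // (the inner double quote must be escaped inside a Java string literal).
    private static final Pattern NOT_AFTER_ATTR = Pattern.compile("(?<!attr=\")Hello");

    // Hypothetical helper: replaces every match with the given literal text.
    static String replaceOutsideAttr(String xml, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement from being
        // interpreted as group references.
        return NOT_AFTER_ATTR.matcher(xml).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Sample input copied from the question.
        String xml = "<some tag>\n"
                + "<some_nested_tag attr=\"Hello\"> Text </some_nested_tag>\n"
                + "Hello world Hello Programming\n"
                + "</some tag>";
        System.out.println(replaceOutsideAttr(xml, "HI"));
    }
}
```

Running main on the sample should leave attr="Hello" untouched and turn the text line into "HI world HI Programming".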

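It can also be done with a negative lookahead, which skips any Hello that sits inside a tag (that is, one followed by a > with no < in between):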

Hello(?!([^<]+)?>)
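A rough sketch of how the lookahead version could be used from Java is below. The class name LookaheadReplace and the helper replaceOutsideTags are hypothetical names chosen for the example. Unlike the lookbehind, this pattern does not depend on the attribute being called attr, but it assumes the text content never contains a literal > character; a Hello followed by such a > before the next tag would be skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadReplace {
    // Matches "Hello" unless it is followed by an optional run of non-'<'
    // characters and then '>', i.e. unless it appears inside a tag.
    private static final Pattern OUTSIDE_TAG = Pattern.compile("Hello(?!([^<]+)?>)");

    // Hypothetical helper: replaces every match with the given literal text.
    static String replaceOutsideTags(String xml, String replacement) {
        return OUTSIDE_TAG.matcher(xml).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        // Sample input copied from the question.
        String xml = "<some tag>\n"
                + "<some_nested_tag attr=\"Hello\"> Text </some_nested_tag>\n"
                + "Hello world Hello Programming\n"
                + "</some tag>";
        System.out.println(replaceOutsideTags(xml, "HI"));
    }
}
```

If the documents can contain comments, CDATA sections, or > inside text, a real XML parser that only rewrites text nodes is likely the safer route than either regex.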