goodman goodman - 3 months ago 20
Java Question

Remove numeric XML tags using Java

I have the following XML:

<?xml version="1.0"?>
<1>
<TITLE>A Sample Article</TITLE>
<SECT>The First Major Section <PARA>This section will introduce a subsection.</PARA>
<2>
<SECT>The Subsection Heading <PARA>This is the text of the subsection. </PARA>
</SECT>
</SECT>
</ARTICLE>


I want to remove the numeric tags "<1>" and "<2>" using Java.

Parsers won't work because it's invalid XML. I need another solution, such as a regular expression, or any other idea.

Answer

You can just use the replaceAll method.

String str = "YOUR XML HERE";
str = str.replaceAll("<[12]>", "");


Or, as Boheamian pointed out in his comment, you can use the \d shorthand for a digit (this version matches single-digit tags only):

str = str.replaceAll("<\\d>", "");

By the way, if you have more tags than <1> and <2>, that is, <n> where n is any number (including multi-digit ones), you could use:

str = str.replaceAll("<\\d+>", "");
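
For completeness, here is a minimal self-contained sketch of the same idea, applied to a shortened version of the XML from the question. The class name NumericTagStripper and the method stripNumericTags are made up for illustration. The pattern is compiled once with Pattern.compile instead of calling String.replaceAll each time, which is only a convenience if you process many strings.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the question.
class NumericTagStripper {
    // "<", one or more digits, ">" : matches <1>, <2>, <42>, but not <TITLE> or </SECT>.
    private static final Pattern NUMERIC_TAG = Pattern.compile("<\\d+>");

    // Returns the input with every purely numeric tag removed.
    static String stripNumericTags(String xml) {
        return NUMERIC_TAG.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        // Shortened sample based on the XML in the question.
        String xml = String.join("\n",
                "<?xml version=\"1.0\"?>",
                "<1>",
                "<TITLE>A Sample Article</TITLE>",
                "<2>",
                "<SECT>The Subsection Heading</SECT>",
                "</ARTICLE>");
        System.out.println(stripNumericTags(xml));
    }
}
```

Note that only the tag itself is removed, so a tag that sat alone on a line leaves an empty line behind. If numeric closing tags such as </1> can also occur (an assumption, since the sample does not contain any), the pattern "</?\\d+>" would cover both forms.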