java specialist - 2 months ago
Java Question

How to remove tags using regex / pattern in Java

I have the string

<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>

and I want to remove every occurrence of a pattern like <li>test<ul></ul> (an <li> item followed by an empty <ul></ul>) from it.
So my desired output is

<li>test<ul><li>src<ul><li>org<ul>

I have tried the following:

public class Test {
    public static void main(String[] args) {
        String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
        str = str.replaceAll("(?s)<li>.*?<ul></ul>", "");
        System.out.println(str);
    }
}


But it did not work. The output I got was

<li>src<ul><li>org<ul>

Answer

Try this pattern and replace each match with an empty string:

    public class Test {
        public static void main(String[] args) {
            String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
            // [^<]* only matches text without '<', so a match cannot span another tag
            str = str.replaceAll("<li>([^<]*)<ul><\\/ul>", "");
            System.out.println(str);
        }
    }

Edit:

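Here's the explanation, as requested: the regex engine looks for whatever sits between <li> and <ul></ul>. [^<]* makes sure there is no "<" sign in between, so a match can never run across another tag. The lazy .*? from the question does not give the same guarantee: it stops at the first <ul></ul> after the point where the match started, but it is still free to cross tags on the way there. That is why, starting at the second <li>, it swallows <li>test<ul><li>model<ul></ul> as a single match.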
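To see the difference side by side, here is a minimal sketch that runs both patterns over the string from the question. The class name EmptyItemRemover and the helper strip are made up for illustration, and the capture group and the escaped slash from the answer are dropped because neither changes what is matched.

```java
import java.util.regex.Pattern;

public class EmptyItemRemover {
    // Lazy version from the question: ".*?" may still run across other tags.
    static final Pattern LAZY = Pattern.compile("(?s)<li>.*?<ul></ul>");

    // Negated-class version from the answer: the item text may not contain '<'.
    static final Pattern NO_TAG_INSIDE = Pattern.compile("<li>[^<]*<ul></ul>");

    // Removes every match of the given pattern from the input.
    static String strip(Pattern pattern, String html) {
        return pattern.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String str = "<li>test<ul></ul><li>test<ul><li>model<ul></ul><li>src<ul><li>org<ul>";
        System.out.println(strip(LAZY, str));          // <li>src<ul><li>org<ul>
        System.out.println(strip(NO_TAG_INSIDE, str)); // <li>test<ul><li>src<ul><li>org<ul>
    }
}
```

Compiling the pattern once is only a convenience if the replacement runs many times; str.replaceAll(...) as in the answer gives the same result. For anything more complex than this flat tag sequence, an HTML parser is probably a safer choice than a regex.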