Simon Simon - 1 month ago 12
Java Question

Convert String to arraylist using split

Is it possible to convert the String content below to an ArrayList using split, so that you get something like Point A?

<a class="postlink" href="http://test.site/i7xt1.htm">http://test.site/i7xt1.htm<br/>
</a>
<br/>Mirror:<br/>
<a class="postlink" href="http://information.com/qokp076wulpw">http://information.com/qokp076wulpw<br/>
</a>
<br/>Additional:<br/>
<a class="postlink" href="http://additional.com/qokdsfsdwulpw">http://additional.com/qokdsfsdwulpw<br/>
</a>


Point A (desired arraylist content):

http://test.site/i7xt1.htm
Mirror:
http://information.com/qokp076wulpw
Additional:
http://additional.com/qokdsfsdwulpw


I am currently using the code below, but it doesn't produce the desired output (for instance, "Mirror" is added multiple times).

Document doc = Jsoup.parse(string);
Elements links = doc.select("a[href]");
for (Element link : links) {
    Node previousSibling = link.previousSibling();

    while (!(previousSibling.nodeName().equals("u") || previousSibling.nodeName().equals("#text"))) {
        previousSibling = previousSibling.previousSibling();
    }

    String identifier = previousSibling.toString();

    if (identifier.contains("Mirror")) {
        totalUrls.add("MIRROR(s):");
    }
    totalUrls.add(link.attr("href"));
}

Answer

Fix your links first. As cricket_007 mentioned, having proper HTML would make this a lot easier.

String html = yourHtml.replaceAll("<br/>\\s*</a>", "</a>"); // get rid of bad HTML; \\s* also covers a newline between <br/> and </a>
String[] lines = html.split("<br/>");

for (String str : lines) {
    String text = Jsoup.parse(str).text(); // visible text of this piece, tags stripped
    ... // you can go further here, e.g. check whether the piece is a link or a label ending in a colon
}

Now that the errant <br/> tags are out of the links, you can split the string on the <br/> tags that remain and take the text of each piece. It's not pretty, but it should work.
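
For reference, here is a minimal, self-contained sketch of the same idea that runs without Jsoup. The class name SplitLinks and the method toLines are made up for illustration, and the tag-stripping regex is only a stand-in for Jsoup.parse(str).text(); it is good enough for this small fragment but is not a general HTML parser. It also collects the link text rather than the href attribute, which happen to be identical in the question's input.

```java
import java.util.ArrayList;
import java.util.List;

class SplitLinks {
    // Hypothetical helper: turns the HTML fragment from the question into a list of text lines.
    static List<String> toLines(String html) {
        // Move each stray <br/> (and any whitespace after it) out of its link.
        String cleaned = html.replaceAll("<br/>\\s*</a>", "</a>");
        List<String> result = new ArrayList<>();
        for (String piece : cleaned.split("<br/>")) {
            // Assumption: a simple regex replaces Jsoup.parse(piece).text() here.
            String text = piece.replaceAll("<[^>]+>", "").trim();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html =
            "<a class=\"postlink\" href=\"http://test.site/i7xt1.htm\">http://test.site/i7xt1.htm<br/>\n</a>\n"
            + "<br/>Mirror:<br/>\n"
            + "<a class=\"postlink\" href=\"http://information.com/qokp076wulpw\">http://information.com/qokp076wulpw<br/>\n</a>\n"
            + "<br/>Additional:<br/>\n"
            + "<a class=\"postlink\" href=\"http://additional.com/qokdsfsdwulpw\">http://additional.com/qokdsfsdwulpw<br/>\n</a>";
        // Print one entry per line.
        toLines(html).forEach(System.out::println);
    }
}
```

If Jsoup is available, swapping the regex line for Jsoup.parse(piece).text() should be the more robust choice, since it also decodes entities and copes with nested tags.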
