Rehama Rehama - 4 years ago 150
Java Question

Jsoup select links for different websites

I am filtering links out of an HTML body using jsoup.

For a page such as https://en.wikipedia.org/wiki/Cloud_computing

I want to remove links such as:
https://en.wikipedia.org/wiki/Light

For hash (fragment) links such as en.wikipedia.org/wiki/Cloud_computing#cite_note-1

I tried

doc.select("a[href*=#]").remove();
and it works well for hash links, which appear in the page's HTML source as:
<a href="#cite_ref-1">


But when I use
doc.select("a[href]*=/]").remove();
for links that appear in the page's HTML source as

<a href="/wiki/Light">CH</a>


there are still links that are not removed. How is this possible?

Answer

You have a typo: there is an extra ] right after href, which closes the attribute selector too early.

doc.select("a[href]*=/]").remove();

It should be like this:

doc.select("a[href*=/]").remove();

But this would remove every link whose href contains a / anywhere, which includes absolute URLs such as https://... links. Is this what you want, or do you want to remove only the links whose href starts with /? In that case you need this:

doc.select("a[href^=/]").remove();
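
To see the difference between the two selectors, here is a minimal sketch. It assumes jsoup is on the classpath; the class name LinkFilterSketch, the helper countRemaining, and the sample HTML are made up for illustration and are not from the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class LinkFilterSketch {

    // Hypothetical helper: parses the HTML, removes every element matched by
    // the selector, and returns how many <a href> elements are left.
    static int countRemaining(String html, String selector) {
        Document doc = Jsoup.parse(html);
        doc.select(selector).remove();
        return doc.select("a[href]").size();
    }

    public static void main(String[] args) {
        // Made-up sample: one hash link, one site-relative link, one absolute link.
        String html = "<a href=\"#cite_ref-1\">1</a>"
                + "<a href=\"/wiki/Light\">Light</a>"
                + "<a href=\"https://example.org/page\">external</a>";

        // *= matches a "/" anywhere in the href, so the absolute link goes too.
        System.out.println(countRemaining(html, "a[href*=/]")); // prints 1

        // ^= matches only hrefs that start with "/".
        System.out.println(countRemaining(html, "a[href^=/]")); // prints 2
    }
}
```

With *= only the hash link survives, while ^= keeps both the hash link and the absolute link, so which one you need depends on whether external links should stay.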