Meghla Khan - 3 months ago
HTML Question

Parsing HTML href attribute

I'm working on a project where I need to parse HTML to extract data from a webpage. I'm using Jsoup in Java. I need to extract data from the following content:

<tr>
<td><small><a href="http://www.timeanddate.com/worldclock/fixedtime.html?iso=20160821T2100&amp;p1=248" target="_blank">2016/08/21 21:00</a></small></td>
<td><small><a href="https://agc003.contest.atcoder.jp">AtCoder Grand Contest 003</a></small></td>

</tr>


I can get the contest name and time, but how do I extract the URL? I want to get the contest URL:
https://agc003.contest.atcoder.jp

How can I get this?

EDIT:
Here's my code



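import java.awt.Desktop;
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

private void getAC() throws IOException {
    // Fetch the AtCoder home page.
    // Note: Desktop.getDesktop().toString() is not a real browser user-agent string.
    Document doc = Jsoup.connect("https://atcoder.jp/").userAgent(Desktop.getDesktop().toString()).get();
    // The second element with class "table-responsive" holds the contest table
    Element table = doc.getElementsByClass("table-responsive").get(1);
    // Every <td> cell in that table (start times and contest names)
    Elements contestStartTime = table.getElementsByTag("td");
    int cnt = 1;
    for (Element i : contestStartTime) {
        // Prints each cell's inner HTML, including the <a href="..."> markup
        System.out.println(cnt + ". " + i.html());
        cnt++;
    }
}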



Answer

Jsoup has a rich API for DOM processing. Take a look at these methods:

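// Find the element with id="content" (returns null if the page has no such id; adjust it to your page)
Element content = doc.getElementById("content");
// Collect every <a> tag inside that element
Elements links = content.getElementsByTag("a");
for (Element link : links) {
  String linkHref = link.attr("href"); // the URL, e.g. https://agc003.contest.atcoder.jp
  String linkText = link.text();       // the visible text, e.g. AtCoder Grand Contest 003
}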
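Applied to the table row from the question, a minimal runnable sketch might look like the following. It assumes Jsoup is on the classpath, and it wraps the row in a `<table>` element because the HTML parser drops `<tr>` and `<td>` tags that appear outside a table. The class name `ContestLinks` and the helper `contestUrls` are hypothetical, and the helper assumes the contest link always sits in the second cell of a row, as in the question's snippet.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ContestLinks {

    // The row from the question, wrapped in <table> so the parser keeps the <tr>/<td> structure
    static final String HTML = "<table><tr>"
        + "<td><small><a href=\"http://www.timeanddate.com/worldclock/fixedtime.html?iso=20160821T2100&amp;p1=248\" target=\"_blank\">2016/08/21 21:00</a></small></td>"
        + "<td><small><a href=\"https://agc003.contest.atcoder.jp\">AtCoder Grand Contest 003</a></small></td>"
        + "</tr></table>";

    // Hypothetical helper: returns the href of the link in the second cell of each row
    static List<String> contestUrls(Document doc) {
        List<String> urls = new ArrayList<>();
        for (Element row : doc.getElementsByTag("tr")) {
            Elements cells = row.getElementsByTag("td");
            if (cells.size() < 2) {
                continue; // skip rows without a contest-name cell
            }
            Element link = cells.get(1).getElementsByTag("a").first();
            if (link != null) {
                urls.add(link.attr("href")); // the href attribute value, entities decoded
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        for (Element link : doc.getElementsByTag("a")) {
            // attr("href") gives the URL, text() gives the visible link text
            System.out.println(link.text() + " -> " + link.attr("href"));
        }
        System.out.println(contestUrls(doc));
    }
}
```

Running `main` prints each link's text and URL, and the second-cell helper yields only https://agc003.contest.atcoder.jp. Note that `attr("href")` returns the decoded value, so the `&amp;` in the time link comes back as `&`.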

You can also select the links directly with a CSS selector:

Elements links = doc.select("table a[href]");
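The selector can also be narrowed so that only the contest-name column matches, which skips the timeanddate.com links. The sketch below assumes Jsoup is on the classpath and that the contest link is always in the second `<td>` of each row. It uses `absUrl("href")`, which resolves relative links against the document's base URI (documents fetched with `Jsoup.connect(...).get()` get that base URI automatically). The class name `ContestSelector`, the second table row, and its relative path `/contests/practice` are hypothetical, added only to show the resolution step.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ContestSelector {

    // First row is from the question; the second row (relative href) is hypothetical
    static final String HTML = "<table>"
        + "<tr><td><small><a href=\"http://www.timeanddate.com/worldclock/fixedtime.html?iso=20160821T2100&amp;p1=248\">2016/08/21 21:00</a></small></td>"
        + "<td><small><a href=\"https://agc003.contest.atcoder.jp\">AtCoder Grand Contest 003</a></small></td></tr>"
        + "<tr><td>time not linked</td>"
        + "<td><small><a href=\"/contests/practice\">Practice Contest</a></small></td></tr>"
        + "</table>";

    static List<String> contestUrls(String html, String baseUri) {
        // baseUri lets absUrl() turn relative hrefs into absolute URLs
        Document doc = Jsoup.parse(html, baseUri);
        List<String> urls = new ArrayList<>();
        // Only <a href> elements inside the second cell of each row
        for (Element link : doc.select("tr > td:nth-child(2) a[href]")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    public static void main(String[] args) {
        for (String url : contestUrls(HTML, "https://atcoder.jp/")) {
            System.out.println(url);
        }
    }
}
```

Selecting a column by position is brittle if the page layout changes; if the real table offers a class or header to key on, a selector built on that would likely be sturdier.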