user3319688 - 25 days ago

Parsing PHP data from HTML webpage with jsoup

I'm not sure how best to phrase or title this question, so here goes. I am using jsoup to parse a webpage (http://champion.gg/statistics/), and I'm trying to grab the stats from its table with this code:

public void connect(String url) {
    try {
        Document doc = Jsoup.connect(url).userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36").get();
        System.out.println(doc.toString());
        Element table = doc.select("table[class=table table-striped]").first();
        Element tbody = table.select("tbody").first();
        Iterator<Element> rows = tbody.select("tr").iterator();
        rows.forEachRemaining(row -> {
            System.out.println(row.toString());
        });
    } catch(IOException exception) {
        if(Settings.DEBUG) {
            Program.LOGGER.log(Level.SEVERE, "There was an error reading the document with the supplied URL!", exception);
        }
        Program.alert("Error loading webpage!");
    }
}


It produces this result:

<tr ng-repeat="champion in filteredChampions = (championData | startsWith:search.title | filter:roleSort | orderBy:[order+sortExpression.sortBy,order+sortExpression.lastSortBy])">
<td class="rank">{{indexNumber($index, filteredChampions.length)}}</td>
<td ng-class="{'selected-column':determineSelected('title')}"> <a href="/champion/{{champion.key}}/{{champion.role}}">
<div class="tsm-tooltip tsm-angular-champion-tt" data-type="champions" data-name="{{champion.key}}" data-id="{{matchupData}}">
<div class="matchup-champion {{champion.key}}"></div>
<span class="stat-champ-title">{{champion.title}}</span>
</div> </a> </td>
<td class="stats-role-title" ng-class="{'selected-column':determineSelected('role')}">{{champion.role}}</td>
<td ng-class="{'selected-column':determineSelected('winPercent')}"> <span ng-class="{'top-half': (champion.general.winPercent >= 50), 'bottom-half': (champion.general.winPercent < 50)}">{{champion.general.winPercent}}%</span> </td>
<td ng-class="{'selected-column':determineSelected('playPercent')}">{{champion.general.playPercent}}%</td>
<td ng-class="{'selected-column':determineSelected('banRate')}">{{champion.general.banRate}}%</td>
<td ng-class="{'selected-column':determineSelected('experience')}">{{champion.general.experience}}</td>
<td ng-class="{'selected-column':determineSelected('kills')}">{{champion.general.kills}}</td>
<td ng-class="{'selected-column':determineSelected('deaths')}">{{champion.general.deaths}}</td>
<td ng-class="{'selected-column':determineSelected('assists')}">{{champion.general.assists}}</td>
<td ng-class="{'selected-column':determineSelected('largestKillingSpree')}">{{champion.general.largestKillingSpree}}</td>
<td ng-class="{'selected-column':determineSelected('totalDamageDealtToChampions')}">{{champion.general.totalDamageDealtToChampions}}</td>
<td ng-class="{'selected-column':determineSelected('totalDamageTaken')}">{{champion.general.totalDamageTaken}}</td>
<td ng-class="{'selected-column':determineSelected('totalHeal')}">{{champion.general.totalHeal}}</td>
<td ng-class="{'selected-column':determineSelected('minionsKilled')}">{{champion.general.minionsKilled}}</td>
<td ng-class="{'selected-column':determineSelected('neutralMinionsKilledEnemyJungle')}">{{champion.general.neutralMinionsKilledEnemyJungle}}</td>
<td ng-class="{'selected-column':determineSelected('neutralMinionsKilledTeamJungle')}">{{champion.general.neutralMinionsKilledTeamJungle}}</td>
<td ng-class="{'selected-column':determineSelected('goldEarned')}">{{champion.general.goldEarned}}</td>
<td ng-class="{'selected-column':determineSelected('overallPosition')}">{{champion.general.overallPosition}}</td>
<td ng-class="{'selected-column':determineSelected('overallPositionChange')}"><span class="glyphicon" ng-class="{'glyphicon-arrow-up': (champion.general.overallPositionChange > 0), 'glyphicon-arrow-down': (champion.general.overallPositionChange < 0), 'same-position': (champion.general.overallPositionChange === 0)}">{{Math.abs(champion.general.overallPositionChange)}}</span></td>
</tr>


Instead of the actual value, such as the average number of kills for a specific champion, the result contains the placeholder {{champion.general.kills}}. How do I parse the page so that I get a real value such as 8 instead of champion.general.kills?

Answer

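When extracting data from a webpage, you have to go to where the data actually is. The table row you selected is an AngularJS template (note the ng-repeat attribute and the {{...}} placeholders) that the browser fills in by running JavaScript. jsoup does not execute JavaScript, so all you see is the unrendered template. In this case the data is still within the page, which is good: it is embedded in a script tag. You need to get the script tag containing the data and parse its text. For now, this sample code assumes it is the script tag at index 11.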

public static void main(String[] args)
{
    try
    {
        Document doc = Jsoup
                .connect("http://champion.gg/statistics/")
                .userAgent(
                        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.152 Safari/537.36")
                .get();
        System.out.println(doc.toString());
        // All script tags in document order
        Elements scripts = doc.select("script");
        // Assumption: the data lives in the script tag at index 11
        Element script = scripts.get(11);
        parseText(script);
    }
    catch (IOException exception)
    {
        // Report the failure instead of swallowing it silently
        exception.printStackTrace();
    }
}

public static void parseText(Element script)
{
    // Raw text inside the script tag
    String text = script.data().trim();
    int index = text.indexOf("_id");
    while (index > 0)
    {
        index += 6;// Skip past _id":" to the beginning of the value
        int endQuote = text.indexOf("\"", index);
        String id = text.substring(index, endQuote);
        // Starts at the label, so key looks like "key":"Urgot
        index = text.indexOf("\"key\":\"", endQuote);
        endQuote = text.indexOf("\"", index + 8);
        String key = text.substring(index, endQuote);
        // Also starts at the label, so kills looks like "kills":6.47
        index = text.indexOf("\"kills\":", endQuote);
        endQuote = text.indexOf(",", index);
        String kills = text.substring(index, endQuote);
        // Drop the part that has already been processed
        text = text.substring(endQuote);
        // Note: index is still an offset into the old string here, so this
        // search can jump past records; text.indexOf("_id") would not
        index = text.indexOf("_id", index);
        System.out.println(id + key + kills);
    }
}

Output:

5812965753fa9743395ee93a"key":"Urgot"kills":6.47

5812965753fa9743395ee93b"key":"Aatrox"kills":5.8

5812965753fa9743395ee93d"key":"Galio"kills":4.58

5812965753fa9743395ee940"key":"Kled"kills":7.3 ...
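
The index arithmetic above leaves the "key": and "kills": labels in the output, and it depends on the data being in the script tag at index 11. As a rough sketch only, one alternative is to pick the script body by a marker string and pull the fields out with a regular expression. The class ChampionStatsSketch and its methods findDataScript and extract are hypothetical names, not part of jsoup, and the sample strings in main are made up. The sketch assumes each record lists "_id", "key" and "kills" in that order, as the output above suggests. With jsoup, the script bodies would presumably come from calling data() on each element of doc.select("script").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of jsoup or the answer above.
class ChampionStatsSketch {

    // Assumed record shape: "_id":"<id>" ... "key":"<name>" ... "kills":<number>
    private static final Pattern RECORD = Pattern.compile(
            "\"_id\":\"([^\"]+)\".*?\"key\":\"([^\"]+)\".*?\"kills\":(-?[0-9.]+)",
            Pattern.DOTALL);

    // Returns the first script body containing the marker, so the lookup
    // does not depend on a fixed position such as index 11.
    static Optional<String> findDataScript(List<String> scriptBodies, String marker) {
        return scriptBodies.stream()
                .filter(body -> body.contains(marker))
                .findFirst();
    }

    // Collects "id key kills" for each record the pattern matches.
    // The matcher resumes right after the previous match on every find().
    static List<String> extract(String scriptText) {
        List<String> rows = new ArrayList<>();
        Matcher m = RECORD.matcher(scriptText);
        while (m.find()) {
            rows.add(m.group(1) + " " + m.group(2) + " " + m.group(3));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up stand-ins for the text of the page's script tags.
        List<String> scripts = List.of(
                "var unrelated = 1;",
                "var data = [{\"_id\":\"id1\",\"key\":\"Urgot\",\"general\":{\"kills\":6.47,\"deaths\":5}},"
                        + "{\"_id\":\"id2\",\"key\":\"Aatrox\",\"general\":{\"kills\":5.8,\"deaths\":6}}];");

        findDataScript(scripts, "\"_id\"")
                .map(ChampionStatsSketch::extract)
                .orElse(List.of())
                .forEach(System.out::println);
    }
}
```

A regular expression like this is still fragile: if a record were missing one of the fields, the lazy match could run on into the next record. If the script text turns out to be plain JSON after an assignment, feeding that part to a proper JSON parser would likely be more robust.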