J. Doe - 1 year ago
Ruby Question

ruby, nokogiri, xpath and pgatour.com

I am trying to pull some historical data from pgatour.com using a ruby script, but I can't seem to get it to pull any data at all.

I know this has been discussed before, but I've tried all the solutions I've come across and I'm still coming up blank. So now I'm stripping everything down to the most basic thing I can think of, to find out whether the problem is me or the website.

I am now just trying to grab one element of a table and print it to the console.

To get the XPath, I opened Chrome Developer Tools, found a score in the table, right-clicked it, and chose to copy the XPath. I then used that directly in the code, but I still get nothing:

require 'open-uri'
require 'nokogiri'

url = "http://www.pgatour.com/tournaments/safeway-open/past-results.html"
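# URI.open is open-uri's entry point (Ruby 2.5+); a bare open(url) no longer fetches URLs on Ruby 3.0+.
html = URI.open(url)
doc = Nokogiri::HTML(html)
# XPath copied as-is from Chrome Developer Tools.
puts doc.xpath('//*[@id="pastResultsData"]/ul/li[1]/table/tbody/tr[2]/td[5]').text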

Am I doing something wrong? Or is the site structured so that a simple solution like the one above will not work?

Answer

It's REALLY important to first see if the HTML is static or dynamic. There are many ways to do so, but a very simple test is to use Nokogiri at the command line:

>nokogiri 'http://www.pgatour.com/tournaments/safeway-open/past-results.html'
Your document is stored in @doc...
Welcome to NOKOGIRI. You are using ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]. Have fun ;)
>> @doc.at('#pastResultsData')
<div id="pastResultsData" class="clearfix module-tournament-past-results"/>

Looking for #pastResultsData will find any tag with that ID. The value returned shows the <div> tag is empty in the HTML the server sent, which usually means it's a container waiting to be filled in later by JavaScript (DHTML) running in the browser.

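At that point Nokogiri alone can't help you, because it only parses the markup it is given and never runs scripts. You'll need to either retrieve the content from wherever the page's JavaScript loads it and parse that, or use something that executes JavaScript, such as a real browser driven from Ruby.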