asheets asheets - 3 years ago 133
Python Question

Using python selenium to scrape time data

I'm using the following lines of code to get the utime of an element. From the output I can see that I'm targeting the correct area and that the utime attribute is present there, but I still receive an output of None. I have tried rewriting the data-utime attribute name several times to make sure it is formatted correctly for the function. What am I missing here?

Code:

timeStampBox = post.find_element_by_css_selector('.fsm.fwn.fcg')
timeStampBox = timeStampBox.find_element_by_class_name('_5pcq')

print(timeStampBox.get_attribute('innerHTML'))
print(timeStampBox.get_attribute('data-utime'))


Output:

<abbr title="Monday, September 4, 2017 at 6:11am" data-utime="1504530675" data-shorten="1" class="_5ptz"><span class="timestampContent" id="js_15">September 4 at 6:11am</span></abbr>
None

Answer Source

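The abbr element is part of timeStampBox's innerHTML, meaning it is a child of timeStampBox. data-utime is an attribute of that abbr child, not of timeStampBox itself, and get_attribute only reads attributes of the element it is called on.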
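The distinction between an element's own attributes and the attributes of its children can be checked without a browser. The following is only a rough sketch using the standard library's html.parser as a stand-in for Selenium; the AttrCollector class is a hypothetical helper written for this illustration, and the markup is the abbr from the question wrapped in a div that plays the role of timeStampBox.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: record each tag's own attributes, keyed by tag name."""

    def __init__(self):
        super().__init__()
        self.attrs_by_tag = {}

    def handle_starttag(self, tag, attrs):
        # Keep only the first occurrence of each tag; enough for this small sample.
        self.attrs_by_tag.setdefault(tag, dict(attrs))


# The div stands in for timeStampBox; the abbr is copied from the question's output.
markup = (
    '<div><abbr title="Monday, September 4, 2017 at 6:11am" '
    'data-utime="1504530675" data-shorten="1" class="_5ptz">'
    '<span class="timestampContent" id="js_15">September 4 at 6:11am</span>'
    '</abbr></div>'
)

collector = AttrCollector()
collector.feed(markup)

# The container has no data-utime of its own; the child abbr does.
div_utime = collector.attrs_by_tag['div'].get('data-utime')
abbr_utime = collector.attrs_by_tag['abbr'].get('data-utime')
print(div_utime)
print(abbr_utime)
```

The lookup on the div comes back as None while the lookup on the abbr returns the timestamp string, which mirrors what get_attribute reports in Selenium.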

Here's how I emulated your situation:

<html>
<body>
<div><abbr title="Monday, September 4, 2017 at 6:11am" data-utime="1504530675" data-shorten="1" class="_5ptz"><span class="timestampContent" id="js_15">September 4 at 6:11am</span></abbr></div>
</body>
</html>

The div element is a container for the abbr element. I can pretend that it's your timeStampBox element.

>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get('file:///c:/scratch/temp.htm')

Identify timeStampBox and get its innerHTML. As in your output, the result is the abbr element.

>>> timeStampBox = driver.find_element_by_tag_name('div')
>>> timeStampBox.get_attribute('innerHTML')
'<abbr title="Monday, September 4, 2017 at 6:11am" data-utime="1504530675" data-shorten="1" class="_5ptz"><span class="timestampContent" id="js_15">September 4 at 6:11am</span></abbr>'

data-utime is None because this attribute does not exist on timeStampBox. (The interpreter prints nothing when the result is None.)

>>> timeStampBox.get_attribute('data-utime')

But it's there in abbr.

>>> abbr = driver.find_element_by_tag_name('abbr')
>>> abbr.get_attribute('data-utime')
'1504530675'

Moral of our story: search directly for abbr.
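Applied to the code in the question, that could look something like the sketch below. read_post_time is a hypothetical helper name, not part of Selenium. It uses the same find_element_by_tag_name call as the session above; newer Selenium releases use find_element(By.TAG_NAME, 'abbr') instead. Treating data-utime as Unix epoch seconds is an assumption, though it agrees with the title text if 6:11am is read as a UTC-7 local time.

```python
from datetime import datetime, timezone


def read_post_time(time_stamp_box):
    """Hypothetical helper: return the post time as a UTC datetime, or None.

    time_stamp_box is any element-like object that offers
    find_element_by_tag_name, e.g. the timeStampBox from the question.
    """
    # Search the container for the abbr child, which carries the attribute.
    abbr = time_stamp_box.find_element_by_tag_name('abbr')
    utime = abbr.get_attribute('data-utime')
    if utime is None:
        return None
    # Assumption: data-utime holds seconds since the Unix epoch.
    return datetime.fromtimestamp(int(utime), tz=timezone.utc)
```

With the question's element this would be called as read_post_time(timeStampBox). For the value 1504530675 the result is 2017-09-04 13:11:15 UTC.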
