Shahin Shahin - 1 year ago 58
Python Question

Unable to parse a certain value from a webpage

I've written some code in Python with Selenium to scrape the "Latitude" value from a webpage, which in this case is "49°57'09"N (49.952500)", but for some reason I'm getting a TimeoutException instead. I can't figure out where I'm going wrong. Any input on this would be greatly appreciated.

The script I'm trying with:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("http://www.gcmap.com/airport/EDEF")
wait = WebDriverWait(driver, 10)

driver.switch_to_frame(0)
for item in wait.until(EC.presence_of_all_elements_located((By.XPATH, "//table[contains(@class,'vcard')]//td/abbr[@class='latitude']"))):
    print(item.text)
driver.quit()


Elements in which the latitude resides:

<td colspan="2" nowrap=""><abbr class="latitude" title="49.952500"></abbr>49°57'09"N (49.952500)</td>


Here is the error I'm getting:

80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:

Answer Source

The issue here is that the desired text is not inside the <abbr> tag; it's inside its parent, the <td> tag. To get the element's parent, you can use XPath's double-dot syntax with .find_element_by_xpath(".."). Also, finding the <abbr> by its class name is much cleaner than using a full XPath. Note that no waiting (neither explicit nor implicit) was necessary for the code below to work:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("http://www.gcmap.com/airport/EDEF")

item = driver.find_element_by_class_name('latitude')
itemParentText = item.find_element_by_xpath("..").text

>>> print(itemParentText)
49°57'09"N (49.952500)
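
To see why the <abbr> itself yields nothing, here is a minimal offline sketch that parses only the <td> snippet quoted in the question, using Python's standard xml.etree.ElementTree instead of Selenium. It assumes the live page matches that snippet; the variable names (SNIPPET, abbr_text, td_text, latitude) are illustrative and not part of the original answer.

```python
import xml.etree.ElementTree as ET

# The <td> markup quoted in the question, copied verbatim.
SNIPPET = (
    '<td colspan="2" nowrap="">'
    '<abbr class="latitude" title="49.952500"></abbr>'
    '49°57\'09"N (49.952500)</td>'
)

td = ET.fromstring(SNIPPET)
abbr = td.find("abbr[@class='latitude']")

# The <abbr> element is empty, so it carries no text of its own.
abbr_text = abbr.text or ""

# The visible string is a text node of the parent <td>, placed after the <abbr>.
td_text = "".join(td.itertext())

# The decimal value is also stored in the <abbr>'s title attribute.
latitude = float(abbr.get("title"))

print(repr(abbr_text))
print(td_text)
print(latitude)
```

The empty abbr_text is consistent with item.text returning nothing useful for the <abbr> in Selenium, while the parent's text holds the full string. If only the decimal value is needed, reading the title attribute of the <abbr> may be a simpler route than going to the parent.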