Tendekai Muchenje - 3 months ago
Python Question

Implementing a modified do-while loop in Python, i.e. do at least once, and once more at the end of the loop?

I am having problems implementing something that equates to a do-while loop.

PROBLEM DESCRIPTION

I am scraping a site and the results pages are paginated, i.e.

1, 2, 3, 4, 5, .... NEXT


I am iterating through the pages using a test condition for the existence of the NEXT link. If there is only one results page, there is no NEXT link, so I will just scrape that first page. If there is more than one page, the last page also has no NEXT link, so the scraper function would also work on that page. The scraping function is called findRecords().

So I am isolating my NEXT link using:

next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")


So I want to run a loop that performs the scrape at least once (whether there is one results page or more). I am also clicking the NEXT button using its click() method. The code I have so far is:

while True:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if not next_link:
        break
    next_link.click()


This is not working. Well, it works and it scrapes, but when it reaches the last page it gives me a NoSuchElementException as follows:

Traceback (most recent call last):
  File "try.py", line 47, in <module>
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 752, in find_element
    'value': value})['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']"}
  (Session info: chrome=53.0.2785.89)
  (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Linux 3.13.0-92-generic x86_64)


I know it's true that the element does not exist on that last page, because, as I said before, the NEXT link is absent there.

So how do I fix my while loop so it can scrape a single-page result and/or that last page when the condition is not true, and also elegantly break out of the while loop without giving me that hideous error?

PS: Other than the while loop above, I have also tried the following:

is_continue = True
while is_continue:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if next_link:
        is_continue = True
        next_link.click()
    else:
        is_continue = False


And if it is any help, here is my scraper function findRecords() as well:

def findRecords():
    filename = "sam_" + letter + ".csv"
    bsObj = BeautifulSoup(driver.page_source, "html.parser")
    tableList = bsObj.find_all("table", {"class":"width100 menu_header_top_emr"})
    tdList = bsObj.find_all("td", {"class":"menu_header width100"})

    for table, td in zip(tableList, tdList):
        a = table.find_all("span", {"class":"results_body_text"})
        b = td.find_all("span", {"class":"results_body_text"})
        with open(filename, "a") as csv_file:
            csv_file.write(', '.join(tag.get_text().strip() for tag in a + b) + '\n')

Answer

You should try using find_elements as @Grasshopper suggested; it returns either a list of WebElement objects or an empty list, so just check its length as below:

while True:
    findRecords()
    next_link = driver.find_elements(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if len(next_link) == 0:
        break
    next_link[0].click()
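
Alternatively, if you would rather keep find_element, here is a minimal sketch of the same loop that catches the NoSuchElementException instead. It assumes the driver, the By import, and the findRecords() function from your question are already set up:

    from selenium.common.exceptions import NoSuchElementException

    while True:
        findRecords()
        try:
            next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
        except NoSuchElementException:
            # No NEXT link on the only or last page, so stop paginating.
            break
        next_link.click()

Either way the body runs at least once before the exit test, which is the do-while behaviour you are after.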