Tendekai Muchenje - 1 year ago
Python Question

Implementing a modified do-while loop in Python i.e. do at least once and another time at the end of the loop?

I am having problems implementing something equivalent to a do-while loop.


I am scraping a site and the results pages are paginated, i.e.

1, 2, 3, 4, 5, .... NEXT

I am iterating through the pages using a test condition for the existence of the NEXT link. If there is only one results page, there is no NEXT link, so I will just scrape that first page. If there is more than one page, the last page also has no NEXT link, so the scraper function should also run on that page. The scraping function is called findRecords().

So I am isolating my NEXT link using:

next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")

So I want to run a loop that performs the scrape at least once (when there are one or more results pages). I am also clicking the NEXT button using a click() function. The code I have so far is:

while True:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if not next_link:
        break
    next_link.click()

This is not working. Well, it works and it scrapes, but when it reaches the last page it gives me a NoSuchElementException as follows:

Traceback (most recent call last):
  File "try.py", line 47, in <module>
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 752, in find_element
    'value': value})['value']
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
  File "/usr/local/lib/python2.7/dist-packages/selenium/webdriver/remote/errorhandler.py", line 192, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']"}
  (Session info: chrome=53.0.2785.89)
  (Driver info: chromedriver=2.20.353124 (035346203162d32c80f1dce587c8154a1efa0c3b),platform=Linux 3.13.0-92-generic x86_64)

I know the error is accurate: as I said before, the NEXT link does not exist on the last page.

So how do I fix my while loop so that it scrapes a single-page result and/or the last page (where the condition is not true), and also breaks out of the loop cleanly without giving me that hideous error?

PS: Other than the while loop above, I have also tried the following:

is_continue = True
while is_continue:
    findRecords()
    next_link = driver.find_element(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if next_link:
        is_continue = True
        next_link.click()
    else:
        is_continue = False

And if it is any help, here is my scraper function findRecords() as well:

def findRecords():
    # `letter` and `driver` are globals defined elsewhere in the script
    filename = "sam_" + letter + ".csv"
    bsObj = BeautifulSoup(driver.page_source, "html.parser")
    tableList = bsObj.find_all("table", {"class":"width100 menu_header_top_emr"})
    tdList = bsObj.find_all("td", {"class":"menu_header width100"})

    # Each record is split across one table and one td; pair them up
    for table,td in zip(tableList,tdList):
        a = table.find_all("span", {"class":"results_body_text"})
        b = td.find_all("span", {"class":"results_body_text"})
        # Append one comma-separated line per record
        with open(filename, "a") as csv_file:
            csv_file.write(', '.join(tag.get_text().strip() for tag in a+b) +'\n')

Answer Source

You should try using find_elements, as @Grasshopper suggested. It returns either a list of WebElement objects or an empty list, and it does not raise NoSuchElementException when nothing matches. So just check its length, as below:

while True:
    findRecords()
    next_link = driver.find_elements(By.XPATH, "//a[contains(text(),'Next')][@style='text-decoration:underline; cursor: pointer;']")
    if len(next_link) == 0:
        break
    next_link[0].click()
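
To see the loop shape on its own, here is a minimal sketch of the same "scrape first, then look for Next" pattern that runs without a browser. The names scrape_all_pages, FakeDriver and FakeLink are hypothetical stand-ins made up for illustration and are not part of Selenium. The only Selenium behaviour assumed is that find_elements returns an empty list when nothing matches.

```python
def scrape_all_pages(driver, scrape_page, by, locator):
    """Scrape the current page, then follow the Next link until there is none.

    Returns the number of pages scraped (always at least 1).
    """
    pages = 0
    while True:
        scrape_page()  # the "do" part: runs for every page, including the last
        pages += 1
        # find_elements returns [] instead of raising when nothing matches
        next_links = driver.find_elements(by, locator)
        if not next_links:
            break
        next_links[0].click()
    return pages


# Hypothetical stand-ins so the sketch runs without a real browser.
class FakeLink:
    def __init__(self, driver):
        self._driver = driver

    def click(self):
        self._driver.page += 1  # pretend the click loaded the next page


class FakeDriver:
    def __init__(self, total_pages):
        self.page = 1
        self.total_pages = total_pages

    def find_elements(self, by, locator):
        # A Next link exists on every page except the last one
        return [FakeLink(self)] if self.page < self.total_pages else []


visited = []
driver = FakeDriver(total_pages=3)
count = scrape_all_pages(
    driver,
    lambda: visited.append(driver.page),  # stands in for findRecords()
    "xpath",  # in a real script this would be By.XPATH
    "//a[contains(text(),'Next')]",
)
print(count, visited)  # 3 [1, 2, 3]
```

With a real driver you would pass findRecords as scrape_page and By.XPATH plus your full XPath as the locator. A single results page works too: the scrape runs once, find_elements returns an empty list, and the loop ends.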