Runner Bean - 6 months ago
Python Question

Beautiful Soup - hyphenated keyword, Error :: keyword can't be an expression

I'm using Selenium and then Beautiful Soup to try to scrape a webpage that uses JavaScript to load certain content. Selenium has given me the plain HTML; I checked it with print and found that it does contain the part that I'm trying to scrape. My problem is with Beautiful Soup.

I want to find the div tags with

class="comment-detail"


I've tried using

comments = soup.find_all("div", class_="comment-detail")


but this returns an empty list, maybe because the actual div tags also contain

data-selenium="reviews-comments"


The exact tag in the html is

<div data-selenium="reviews-comments" class="comment-detail">


so I tried the following,

comments = soup.find_all("div", data-selenium="reviews-comments", class_="comment-detail")


but this gives the error

SyntaxError: keyword can't be an expression


since

data-selenium


is parsed as a subtraction operation when it is really just a hyphenated word. I've tried enclosing it in quotation marks, but that does not help.

I've also tried

dct = {
'div': '',
'data-selenium': 'reviews-comments',
'class': 'comment-detail'

}
comments = soup.find_all(**dct)


but

len(comments)


returns zero, i.e. comments is empty.

For the sake of clarity, this is the code I use to get my soup:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html/')
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source,'html.parser')


Any ideas how to proceed here?

Answer

The problem stems from the URL: you have an extra forward slash at the end, which returns a 404 page rather than the page you actually want. Just remove that and your code works fine.
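If you want to guard against this kind of slip, here is a minimal sketch. The helper names `strip_trailing_slash` and `looks_like_expected_page` are made up for illustration and are not part of Selenium or Beautiful Soup. It only assumes that the page you want contains the string "comment-detail" somewhere in its source, which a 404 page normally would not.

```python
def strip_trailing_slash(url):
    """Hypothetical helper: drop any trailing '/' from a URL string."""
    return url.rstrip("/")


def looks_like_expected_page(html_source, marker="comment-detail"):
    """Hypothetical check: True if the marker text appears in the page source."""
    return marker in html_source


# The URL from the question, including the stray trailing slash.
url = "http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html/"
clean_url = strip_trailing_slash(url)
print(clean_url)
```

Calling `looks_like_expected_page(html_source)` right after reading `browser.page_source` would show early on that you received the wrong page, before Beautiful Soup is involved at all.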

Here's the code I used just in case:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html')
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')

comments = soup.find_all("div", class_="comment-detail")

print(comments)
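
As for the SyntaxError itself: a hyphenated attribute name such as data-selenium cannot be written as a Python keyword argument, but find_all also accepts attribute filters as a dictionary through its attrs parameter. Below is a minimal sketch of that; the HTML string is made up for illustration and is not taken from the real page. Note also that in the dictionary attempt from the question, the 'div' key is passed as an attribute filter rather than as the tag name, so the tag name is better given as the first positional argument.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML, only to illustrate the matching behaviour.
html = """
<div data-selenium="reviews-comments" class="comment-detail">Great stay</div>
<div class="comment-detail">No data attribute</div>
<div data-selenium="other" class="sidebar">Unrelated</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Pass hyphenated attribute names as dictionary keys via attrs.
by_attrs = soup.find_all(
    "div",
    attrs={"data-selenium": "reviews-comments", "class": "comment-detail"},
)

# Unpacking a dict also works, since the key is a string, not an identifier.
by_kwargs = soup.find_all("div", **{"data-selenium": "reviews-comments"})

print(len(by_attrs), len(by_kwargs))
```

Either form avoids the syntax error. Whether the extra attribute filter is needed at all depends on the page; per the fix above, class_="comment-detail" alone was enough once the URL was corrected.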