Tomasz Kowalczyk - 15 days ago
Python Question

Scrape booking.com with Python against AJAX requests

I am trying to scrape data from booking.com. Almost everything works, but I cannot get the prices. From what I have read so far, this is because the prices are loaded via AJAX calls. Here is my code:

import requests
import re

from bs4 import BeautifulSoup

url = "http://www.booking.com/searchresults.pl.html"
payload = {
'ss':'Warszawa',
'si':'ai,co,ci,re,di',
'dest_type':'city',
'dest_id':'-534433',
'checkin_monthday':'25',
'checkin_year_month':'2015-10',
'checkout_monthday':'26',
'checkout_year_month':'2015-10',
'sb_travel_purpose':'leisure',
'src':'index',
'nflt':'',
'ss_raw':'',
'dcid':'4'
}

r = requests.post(url, payload)
html = r.content
parsed_html = BeautifulSoup(html, "html.parser")

print parsed_html.head.find('title').text

tables = parsed_html.find_all("table", {"class" : "sr_item_legacy"})

print "Found %s records." % len(tables)

with open("requests_results.html", "w") as f:
    f.write(r.content)

for table in tables:
    name = table.find("a", {"class" : "hotel_name_link url"})
    average = table.find("span", {"class" : "average"})
    price = table.find("strong", {"class" : re.compile(r".*\bprice scarcity_color\b.*")})
    print name.text + " " + average.text + " " + price.text


Using Chrome's Developer Tools, I noticed that the webpage sends a raw response containing all of the data (including prices). When I copy the response content from one of these tabs, the raw values with prices are there. So why can't I retrieve them with my script, and how can I solve this?


Answer

The first problem is that the site's HTML is malformed: inside your table a div is opened but an em is closed. As a result, html.parser cannot find the strong tag containing the price. You can fix this by installing lxml and using it as the parser:

parsed_html = BeautifulSoup(html, "lxml")
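
To check whether the parser is what hides the price tag, one option is to run the same search under every parser you have installed and compare the match counts. The sketch below is written for Python 3. SAMPLE_HTML and count_matches are made-up names for illustration, and the markup is a simplified stand-in, not booking.com's actual output.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical fragment with an unclosed <div> and a stray </em>,
# loosely imitating the kind of broken markup described above.
SAMPLE_HTML = """
<table class="sr_item_legacy">
  <tr><td>
    <div><strong class="price scarcity_color">123</strong></em>
  </td></tr>
</table>
"""


def count_matches(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <strong class="scarcity_color"> tags}."""
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # The parser library is not installed; skip it.
            continue
        counts[parser] = len(soup.find_all("strong", class_="scarcity_color"))
    return counts


print(count_matches(SAMPLE_HTML))
```

On the real page, a parser that reports fewer matches than the others is probably the one tripping over the malformed markup.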

The second problem is your regex: it does not match anything. Change it to the following:

price = table.find("strong", {"class" : re.compile(r".*\bscarcity_color\b.*")})
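
An alternative that avoids the regular expression altogether: BeautifulSoup treats class as a multi-valued attribute, so searching for a single class name matches a tag no matter which other classes it carries. A minimal Python 3 sketch, using invented sample markup rather than the real page:

```python
from bs4 import BeautifulSoup

# Invented sample markup; the real page's class names may differ or change.
html = (
    '<table><tr><td>'
    '<strong class="price scarcity_color">PLN 250</strong>'
    '</td></tr></table>'
)
soup = BeautifulSoup(html, "html.parser")

# Matches any <strong> whose class list contains "scarcity_color".
price = soup.find("strong", class_="scarcity_color")
print(price.text)
```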

Now you will find prices. However, some entries do not contain a price, so find returns None for them and your print statement raises an AttributeError on price.text. To handle this, change your print to the following:

print name.text, average.text, price.text if price else 'No price found'

Also note that you can separate the fields to print with commas (,) in Python, so you do not need to concatenate them with + " " +.
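
Put together, the loop could look roughly like this in Python 3 (where print is a function). The sample markup and the text_or_default helper are invented for illustration. The helper applies the same None check to every field, not just the price.

```python
from bs4 import BeautifulSoup

# Invented sample: the second hotel has no price element.
html = """
<table class="sr_item_legacy"><tr><td>
  <a class="hotel_name_link url">Hotel A</a>
  <span class="average">8.1</span>
  <strong class="price scarcity_color">PLN 250</strong>
</td></tr></table>
<table class="sr_item_legacy"><tr><td>
  <a class="hotel_name_link url">Hotel B</a>
  <span class="average">7.4</span>
</td></tr></table>
"""


def text_or_default(tag, default):
    """Return the stripped text of a tag, or a default if find() returned None."""
    return tag.get_text(strip=True) if tag is not None else default


soup = BeautifulSoup(html, "html.parser")
rows = []
for table in soup.find_all("table", class_="sr_item_legacy"):
    name = text_or_default(table.find("a", class_="hotel_name_link"), "No name found")
    average = text_or_default(table.find("span", class_="average"), "No rating found")
    price = text_or_default(table.find("strong", class_="scarcity_color"), "No price found")
    rows.append((name, average, price))
    print(name, average, price)
```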
