user3376578 user3376578 - 1 year ago 142
Python Question

Web scraping with lxml and requests

I have a web page with hotels, where i want to get all the hotel names. I made a code following instructions from this page, but no success.
My code is here:

from lxml import html
import requests

page = requests.get('web page url')
tree = html.fromstring(page.content)

hotel_name = tree.xpath('//span[@title="sr-hotel__name"]/text()')


All I get is an empty list. Any help?

Answer Source

You need to add a user-agent:

from lxml import html
import requests
headers= {"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.82 Safari/537.36"}
page = requests.get(''
                    , headers=headers)
tree = html.fromstring(page.content)
hotel_name = tree.xpath('//span[@class="sr-hotel__name"]/text()')


Which will give you:

['\nHotel Telegraaf\n', '\nRadisson Blu Hotel Olümpia\n', '\nRadisson Blu Sky Hotel\n', '\nPark Inn by Radisson Central Tallinn\n', '\nPark Inn by Radisson Meriton Conference & Spa Hotel Tallinn\n', '\nMerchants House Hotel\n', '\nSwissotel Tallinn\n', '\nMy City Hotel\n', '\nNordic Hotel Forum\n', '\nHotel Palace by TallinnHotels\n', '\nHotel Ülemiste\n', '\nTallink City Hotel\n', '\nHotel London by Tartuhotels\n', '\nJohan Design & SPA Hotel\n', '\nThe von Stackelberg Hotel Tallinn\n']

But you should read their TOS:

Our services are only for personal and non-commercial use. Consequently, you will not be allowed on our web site available through the content, information, software, products or services of a commercial or compete for the purpose of reselling link (deep link) to use, copy, monitor (eg, spiders, scrape), display, download download or reproduce.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download