ByteMe - 1 year ago
Python Question

Can't figure out how to scrape data in body tag using beautiful soup (Python)

from bs4 import BeautifulSoup
import urllib
from openpyxl import Workbook
from openpyxl.compat import range
from openpyxl.cell import get_column_letter

r = urllib.urlopen('').read()
soup = BeautifulSoup(r)
rate = soup.find_all('body')

print rate
print type(soup)

I'm trying to capture values from items such as data-bedrooms="3", specifically the value inside the quotation marks, but I have no idea what these are formally called or how to parse them.

Below is a sample of part of the printout for the "body", so I know the values are there; capturing a specific one is the part I can't work out:

data-ratemaximum="$260" data-rateminimum="$220" data-rateunits="night" data-rawlistingnumber="576329" data-requestuuid="73bcfaa3-9637-40a8-801c-ae86f93caf39" data-searchpdptab="C" data-serverday="18" data-showbookingphone="False"

Answer Source

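You need to pick apart your result: find_all returns a list of matching tags, so take the tag you want out of that list first. It may also help to know that the things you are after are called attributes of a tag in HTML, and Beautiful Soup exposes them through the tag's attrs dictionary: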

body_tag = rate[0]
data_bedrooms = body_tag.attrs['data-bedrooms']

The code above assumes there is only one <body> and that the data-bedrooms attribute sits on that tag itself; if rate holds more than one tag, you will need a for loop over rate. The value comes back as a string, so you may also want to convert it to an integer with int().
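
If the data-* attributes turn out to sit on an element nested inside <body> rather than on <body> itself, one option is to search for tags by attribute instead. The following is only a sketch under assumptions: the HTML string, the div elements, and the parse_dollars helper are made up for illustration (the real page's markup is not shown in the question), and the attribute names are taken from the sample printout. Passing attrs={"data-bedrooms": True} to find_all matches any tag that carries that attribute, whatever its value.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which the question does not show.
html = """
<html><body>
  <div data-bedrooms="3" data-ratemaximum="$260" data-rateminimum="$220" data-rateunits="night"></div>
  <div data-bedrooms="2" data-ratemaximum="$180" data-rateminimum="$150" data-rateunits="night"></div>
</body></html>
"""


def parse_dollars(text):
    """Hypothetical helper: turn a string such as '$260' into the integer 260."""
    return int(text.strip().lstrip("$"))


soup = BeautifulSoup(html, "html.parser")

listings = []
# True as the attribute value means "any tag that has this attribute at all".
for tag in soup.find_all(attrs={"data-bedrooms": True}):
    listings.append({
        # Attribute values are always strings, so convert explicitly.
        "bedrooms": int(tag["data-bedrooms"]),
        "rate_min": parse_dollars(tag["data-rateminimum"]),
        "rate_max": parse_dollars(tag["data-ratemaximum"]),
        # .get() returns None instead of raising KeyError if the attribute is missing.
        "units": tag.get("data-rateunits"),
    })

for listing in listings:
    print(listing)
```

Indexing a tag with tag["name"] raises KeyError when the attribute is absent, so tag.get("name") is the safer choice for attributes that may not appear on every element.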