David.L - 1 year ago
HTML Question

Web Scraping with Python - Selecting div, h2 and h3 class

This is my first time using Python and doing web scraping. I have been looking around and am still unable to work out how to do what I need.

Below are screenshots of the elements that I inspected in Chrome.

I am trying to get the apartment names and addresses for each selected city.

List of Apartments in a selected city

import requests
from bs4 import BeautifulSoup

#url = 'http://www.homestead.ca/apartments-for-rent/'
rootURL = 'http://www.homestead.ca'
response = requests.get(rootURL)
html = response.content
soup = BeautifulSoup(html,'lxml')

dropdown_list = soup.select(".primary .child-pages a")

#city_names=[dropdown_list_value.text for dropdown_list_value in dropdown_list]
#print (city_names)

cityLinks=[rootURL + dropdown_list_value['href'] for dropdown_list_value in dropdown_list]

for cityLinks_select in dropdown_list: #Looping each city from the Apartment drop down list
    print ('Selecting city:',cityLinks_select.text)
    cityResponse = requests.get(cityLinks)
    cityHtml = cityResponse.content
    citySoup = BeautifulSoup(cityHtml,'lxml')

    community_list = soup.select(".extended-search .property-container a[h2 h3]")
    # get and print the apartment link
    # get and print the apartment name
    # get and print the address of the apartment

Answer Source

As I commented, some of the data is created dynamically. If we look at the page source itself, we see unrendered template placeholders rather than the final values:

                        <div class="content">
                                    <div class="title-container">
                                        <h2 class="building-name"><%= building.get('name') %></h2>
                                        <h3 class="address"><%= building.get('address').address %></h3>

                                    <div class="rent">
                                        <h4 class="sub-title">Rent from</h4>
                                        <% if (building.get('statistics').suites.rates.min !== 'undefined') { %>
                                            <% $min_rate = commaSeparateNumber(parseInt(building.get('statistics').suites.rates.min)); %>
                                            <span class="rent-value">$<%= $min_rate %></span>
                                        <% } %>
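
One way to confirm this from Python is to check whether a string taken from the raw page source still contains template delimiters. The following is a minimal sketch, assuming the placeholders always use the `<% ... %>` delimiters shown above; `is_unrendered_template` is a hypothetical helper name, not part of any library.

```python
import re

# Matches <% ... %> and <%= ... %> blocks like the ones in the page source above.
TEMPLATE_TAG = re.compile(r"<%.*?%>", re.DOTALL)


def is_unrendered_template(text):
    """Return True if text still contains a <% ... %> placeholder."""
    return TEMPLATE_TAG.search(text) is not None


print(is_unrendered_template("<%= building.get('name') %>"))  # True
print(is_unrendered_template("Rent from"))                    # False
```

If the check returns True for a field, the value is filled in by JavaScript in the browser and will not be present in what `requests` downloads.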

All we can get from the source is the building name, the address and the phone number. The snippet below prints the phone number for each building:

cityLinks = [rootURL + dropdown_list_value['href'] for dropdown_list_value in dropdown_list]

for city in cityLinks:  # Looping each city from the Apartment drop down list
    cityResponse = requests.get(city)
    cityHtml = cityResponse.content
    citySoup = BeautifulSoup(cityHtml, 'lxml')
    for div in citySoup.select("div.building-info"):
        print(div.select_one("div.contact-container div.phone").text.strip())
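
Note that `select_one` returns `None` when nothing matches, so the loop above raises `AttributeError` for any building without a phone element. Below is a minimal sketch of a guarded version. The HTML fragment is made-up sample data that only mirrors the selectors used above, `extract_phones` is a hypothetical helper, and `html.parser` is used simply to avoid the lxml dependency.

```python
from bs4 import BeautifulSoup

# Made-up fragment that mirrors the class names used in the answer;
# the real page markup may differ.
SAMPLE_HTML = """
<div class="building-info">
  <div class="contact-container"><div class="phone"> 555-0100 </div></div>
</div>
<div class="building-info">
  <div class="contact-container"></div>
</div>
"""


def extract_phones(html):
    """Return one entry per building: the phone text, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    phones = []
    for div in soup.select("div.building-info"):
        phone = div.select_one("div.contact-container div.phone")
        # select_one returns None when the selector matches nothing.
        if phone is not None:
            phones.append(phone.get_text(strip=True))
        else:
            phones.append(None)
    return phones


print(extract_phones(SAMPLE_HTML))  # ['555-0100', None]
```

The same function could be called with `cityResponse.content` inside the city loop in place of the sample string.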