Vicky Vicky - 2 years ago 100
Python Question

Inconsistent page source between the browser (viewed in Google Chrome) and web scraping tools

I am scraping the Zillow website and have tried two Python libraries, requests and httplib2, to get the page text. The number of listings for sale I get is inconsistent, because the page source I see in Google Chrome differs from the text the scraping tools return.

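import httplib2
import requests

webpage = "https://www.zillow.com/homes/for_sale/75001_rb/1_p/"
headers = {"User-Agent": "Mozilla/5.0"}  # assumed; the original headers were not shown

# httplib2: request() returns a (response, content) tuple; content is bytes
http = httplib2.Http()
resp, content = http.request(webpage)
html_httplib2 = content.decode('utf-8')

# requests: .text decodes the body using the detected encoding
response = requests.get(webpage, headers=headers)
html_requests = response.text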
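To measure the inconsistency rather than eyeball it, one could count the distinct zpids in each version of the page. This is a minimal sketch that assumes listing links follow the `/homedetails/.../<digits>_zpid/` pattern from the example URL below; the regex and the name `count_zpids` are hypothetical, and Zillow's real markup may differ.

```python
import re

# Hypothetical pattern based on the example link in the question:
# /homedetails/3756-Park-Pl-Addison-TX-75001/26935870_zpid/
ZPID_RE = re.compile(r"/homedetails/[^\"'\s]*?/(\d+)_zpid/")


def count_zpids(html: str) -> int:
    """Return the number of distinct zpids linked from an HTML string."""
    return len(set(ZPID_RE.findall(html)))


if __name__ == "__main__":
    sample = (
        '<a href="/homedetails/3756-Park-Pl-Addison-TX-75001/26935870_zpid/">A</a>'
        '<a href="/homedetails/3756-Park-Pl-Addison-TX-75001/26935870_zpid/">A again</a>'
    )
    print(count_zpids(sample))  # 1 (the same zpid is linked twice)
```

Running it on the text each library returns, and on the source saved from Chrome, shows whether they actually contain the same listings.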


My code has three steps:


  1. Access the listings by zip code (e.g. https://www.zillow.com/homes/for_sale/75001_rb/1_p/)

  2. Get the longitude, latitude, and zpid of each listing (e.g. /homedetails/3756-Park-Pl-Addison-TX-75001/26935870_zpid/)

  3. Use the zpid to access the detailed information (e.g. https://www.zillow.com/homedetails/3756-Park-Pl-Addison-TX-75001/26935870_zpid/)
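The three steps can be sketched as small URL and parsing helpers, kept separate from the network code so they can be tested offline. This sketch assumes the `<zip>_rb/<page>_p/` search-URL layout and the `/homedetails/.../<zpid>_zpid/` link format inferred from the example URLs above; the function names are hypothetical, and latitude/longitude extraction is left out because the question does not show where those values appear in the page.

```python
import re
from typing import List, Tuple
from urllib.parse import urljoin

BASE = "https://www.zillow.com"

# Step 1: search-results URL for a zip code and page number
# (layout inferred from the example URL in the question).
def search_url(zip_code: str, page: int = 1) -> str:
    return f"{BASE}/homes/for_sale/{zip_code}_rb/{page}_p/"

# Step 2: pull (path, zpid) pairs out of a results page's HTML.
# Group 1 is the whole detail path, group 2 is the numeric zpid.
DETAIL_RE = re.compile(r"(/homedetails/[^\"'\s]+?/(\d+)_zpid/)")

def extract_listings(html: str) -> List[Tuple[str, str]]:
    seen = {}
    for path, zpid in DETAIL_RE.findall(html):
        seen.setdefault(zpid, path)  # keep the first path per zpid, in page order
    return [(path, zpid) for zpid, path in seen.items()]

# Step 3: absolute detail-page URL for one listing.
def detail_url(path: str) -> str:
    return urljoin(BASE, path)

if __name__ == "__main__":
    html = '<a href="/homedetails/3756-Park-Pl-Addison-TX-75001/26935870_zpid/">x</a>'
    print(search_url("75001"))
    for path, zpid in extract_listings(html):
        print(zpid, detail_url(path))
```

Because `extract_listings` works on any HTML string, the same code can be fed the output of requests, httplib2, or a browser-rendered page, which makes the source of any discrepancy easier to isolate.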



Update: this problem was solved with the Selenium package. As Charles Duffy mentioned, the difference comes from the DOM: the browser shows the page after its JavaScript has run, while HTTP libraries only receive the HTML the server sends.
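Since Selenium solved the problem, here is a minimal sketch of fetching the rendered page with Selenium 4 and headless Chrome. It assumes Selenium and a compatible Chrome/ChromeDriver are installed; the function name `fetch_rendered_html` is hypothetical, and the CSS selector `a[href*="_zpid"]` used to decide that listings have loaded is a guess, not taken from Zillow's actual markup.

```python
def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Return the page source after the browser has run the page's JavaScript."""
    # Imported lazily so this module can be loaded without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # run Chrome without a window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one listing link exists in the DOM.
        # The selector is an assumption; raises TimeoutException if none appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'a[href*="_zpid"]'))
        )
        return driver.page_source  # serialized DOM, not the original HTML
    finally:
        driver.quit()  # always close the browser, even on errors
```

Waiting for a concrete element rather than sleeping a fixed time means the function returns as soon as the JavaScript has inserted the listings, and fails loudly if they never appear.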

Answer Source

I believe the website you are scraping is rendered with JavaScript (a dynamic website). That is why the source code in Google Chrome is inconsistent with what your web scraping tool returns: the browser runs the JavaScript, but requests and httplib2 only download the initial HTML.

I'd recommend using any one of these for scraping dynamic websites
