Kelly Roper - 2 months ago
Python Question

Python Requests getting inconsistent response code

I am attempting to write a web scraper for the stats.nba.com website. Sometimes when I run the script, a request returns a 200 status code, but other times it returns a 400 error code. I suspect the site only accepts a request some of the time, but I'm not sure. Here is an example with four URLs, though I usually run it on a much larger list.

Here is the code.

import requests

urls = ['http://stats.nba.com/stats/boxscoresummaryv2?GameID=0021500001',
        'http://stats.nba.com/stats/boxscoresummaryv2?GameID=0021500002',
        'http://stats.nba.com/stats/boxscoresummaryv2?GameID=0021500003',
        'http://stats.nba.com/stats/boxscoresummaryv2?GameID=0021500004']

# Request each box score and print the final URL and the HTTP status code.
for url in urls:
    r = requests.get(url)
    print(r.url)
    print(r.status_code)


Here's some sample output. I keep getting wildly inconsistent response codes.

http://stats.nba.com/stats/boxscoresummaryv2?GameID=0021500001
200
http://stats.nba.com/stats/boxscoresummaryv2?GameID=0021500002
400
http://stats.nba.com/stats/boxscoresummaryv2?GameID=0021500003
400
http://stats.nba.com/stats/boxscoresummaryv2?GameID=0021500004
400

Answer

You need to pass a User-Agent header:

In [11]: for url in urls:
   ....:     r = requests.get(url, headers={"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.92 Safari/537.36"})
   ....:     print(r.status_code)
   ....:
200
200
200
200

No user-agent:

In [12]: for url in urls:
   ....:     r = requests.get(url)
   ....:     print(r.status_code)
   ....:
200
400
400
400

I would also consider adding a sleep between requests if I were you.
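
Putting both suggestions together, here is a minimal sketch of what that could look like. The function name fetch_status_codes, the one-second default delay, and the ten-second timeout are illustrative assumptions, not anything the site documents; the User-Agent string is the one used above. A requests.Session is used so the header only has to be set once.

```python
import time

# Browser-like User-Agent copied from the example above; any similar
# string is assumed to work equally well.
HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/53.0.2785.92 Safari/537.36"
}


def fetch_status_codes(urls, session=None, delay=1.0):
    """Request each URL with a User-Agent header, sleeping between requests.

    Returns a list of (url, status_code) tuples. `delay` is in seconds and
    its default is an arbitrary, hypothetical choice.
    """
    if session is None:
        # Imported here so a stand-in session can be passed in without
        # needing the requests package.
        import requests
        session = requests.Session()

    # Headers set on the session are sent with every request it makes.
    session.headers.update(HEADERS)

    results = []
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay)  # pause before every request except the first
        response = session.get(url, timeout=10)
        results.append((url, response.status_code))
    return results


# Example usage with the list from the question:
# for url, code in fetch_status_codes(urls):
#     print(url, code)
```

If the list is large, it may be worth increasing the delay and retrying any URL that still comes back with a non-200 code, since how much traffic the site tolerates is not something this sketch can tell you.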