I'm trying to determine whether it is a bug that Python's urllib.urlopen() function omits an HTTP Accept header when making simple REST API requests.
The Facebook Graph API seems to notice whether the header is present or not:
GET /zuck HTTP/1.0
$ curl -v https://graph.facebook.com/zuck
> GET /zuck HTTP/1.1
> User-Agent: curl/7.30.0
> Host: graph.facebook.com
> Accept: */*
'Accept-Encoding': ', '.join(('gzip', 'deflate')),
*/* indicates all media types
if no Accept header field is present, then it is assumed that the client accepts all media types
>>> import httplib
>>> httplib.HTTPConnection.debuglevel = 1
>>> import urllib
>>> u = urllib.urlopen('https://graph.facebook.com/zuck')
send: 'GET /zuck HTTP/1.0\r\nHost: graph.facebook.com\r\nUser-Agent: Python-urllib/1.17\r\n\r\n'
Reading up on proxy servers (like Nginx and Varnish) helped me figure out what is going on.
While the presence of an Accept: */* header shouldn't make a difference to a server, it can and likely will make a difference to a proxy server when the response includes a Vary: Accept header. In particular, the proxy server is allowed to cache different results for different or omitted Accept headers.
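To make the caching rule concrete, here is a toy model of a cache that keeps one stored variant per combination of URL and the request headers named in Vary. It is only a sketch: the ToyVaryCache class and the cached strings are made up for illustration, header names are matched exactly, and real proxies such as Nginx or Varnish are considerably more involved.

```python
class ToyVaryCache:
    """Toy cache: one variant per (URL, values of the headers named in Vary)."""

    def __init__(self):
        self._vary = {}      # url -> tuple of header names listed in Vary
        self._variants = {}  # (url, header values) -> cached body

    def _key(self, url, request_headers):
        names = self._vary.get(url, ())
        # A missing header becomes None, so "no Accept" is its own variant.
        return (url, tuple(request_headers.get(name) for name in names))

    def store(self, url, request_headers, vary, body):
        self._vary[url] = tuple(vary)
        self._variants[self._key(url, request_headers)] = body

    def lookup(self, url, request_headers):
        return self._variants.get(self._key(url, request_headers))


cache = ToyVaryCache()
url = "https://graph.facebook.com/zuck"

# Two responses for the same URL, both carrying "Vary: Accept".
cache.store(url, {"Accept": "*/*"}, ["Accept"], "variant cached for Accept: */*")
cache.store(url, {}, ["Accept"], "variant cached for no Accept header")

print(cache.lookup(url, {"Accept": "*/*"}))        # variant cached for Accept: */*
print(cache.lookup(url, {}))                       # variant cached for no Accept header
print(cache.lookup(url, {"Accept": "text/html"}))  # None (nothing cached for this value)
```

The point of the sketch is only that the Accept value, including its absence, becomes part of the cache key once the response says Vary: Accept.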
Facebook has updated (and closed off) its API since this question was asked, but at the time, here is the scenario that caused the observed effects. For backwards-compatibility reasons, Facebook was using content negotiation and responding with an older content type when the request either had no Accept header or had a browser-like Accept: text/html;text/*;*/*. However, when it received Accept: */*, it returned the more modern application/json; charset=UTF-8. When a proxy server receives a request without an Accept header, it can give either one of the cached responses; however, when it gets Accept: */*, it always gives the latter response.
So here is why you should include the Accept: */* header: if you do, then a caching proxy will always return the same content type. If you omit the header, the response can vary depending on the results of the last user's content negotiation. REST API clients tend to rely on getting the same content type back every time.
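For anyone who wants to send the header explicitly, here is a minimal sketch. It assumes Python 3's urllib.request rather than the Python 2 urllib.urlopen() used in the question, and the helper name build_request is made up for this example. The request is only constructed and inspected here, not sent.

```python
import urllib.request


def build_request(url, accept="*/*"):
    """Return a Request that carries an explicit Accept header."""
    return urllib.request.Request(url, headers={"Accept": accept})


req = build_request("https://graph.facebook.com/zuck")
print(req.get_header("Accept"))  # */*

# To actually send it (needs network access):
# with urllib.request.urlopen(req) as response:
#     print(response.headers.get("Content-Type"))
```

Passing the header on every request means a cache that honours Vary: Accept sees the same Accept value from your client each time.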