Junchao Gu - 15 days ago
Python Question

Strange behaviour with curl and the Python requests library

I am trying to get some data from a website. The server looks at the requested content type and responds accordingly.

Here is the curl command I tried:

curl --header "Accept: application/json, text/javascript, */*; q=0.01\r\nX-Requested-With: XMLHttpRequest\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\n" http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/margin_bal_result.php\?l\=en-us\&d\=2016/11/15\&_\=1479700586981 -v
* About to connect() to www.tpex.org.tw port 80 (#0)
* Trying 210.63.162.130... connected
> GET /web/stock/margin_trading/margin_balance/margin_bal_result.php?l=en-us&d=2016/11/15&_=1479700586981 HTTP/1.1
> User-Agent: curl/7.22.0 (x86_64-pc-linux-gnu) libcurl/7.22.0 OpenSSL/1.0.1 zlib/1.2.3.4 libidn/1.23 librtmp/2.3
> Host: www.tpex.org.tw
> Accept: application/json, text/javascript, */*; q=0.01\r\nX-Requested-With: XMLHttpRequest\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\nAccept-Encoding: gzip,deflate,sdch\r\n
>
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Date: Mon, 21 Nov 2016 07:35:56 GMT
< Server: Apache
< Content-Type: text/html; charset=utf-8
< X-Cache: MISS from localhost
< X-Cache-Lookup: MISS from localhost:3128
< Via: 1.0 localhost (squid/3.1.19)
< Connection: close
<
{"reportDate":"2016\/11\/15","iTotalRecords":610,"aaData":[["006201","YA HORNG ELECTRONIC CO.","6","0","0","0","6","0","0.09","6,361","0","0","0","0","0","0","0.0","6,361","0",""],...}


The response is truncated but basically it is JSON.

However, here is my Python code. I do not think it differs much from the curl command, but the response is HTML.

import datetime

import requests

g_tpex_headers = {
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'User-Agent': (
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
        ' (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120'
        ' Chrome/37.0.2062.120 Safari/537.36'
    ),
    'X-Requested-With': 'XMLHttpRequest',
}
data_link = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal.php?l=en-us&d={}&_=1479700586981'
)
# Example date; it matches the request log (d=2016/11/18).
target_dt = datetime.date(2016, 11, 18)
data = []
with requests.Session() as session:
    # Replace the session's default headers with the browser-like set.
    session.headers = g_tpex_headers
    res = session.get(
        data_link.format(target_dt.strftime('%Y/%m/%d'))
    )
    print(res.content[:400])


The log:

send: 'GET /web/stock/margin_trading/margin_balance/margin_bal.php?l=en-us&d=2016/11/18&_=1479700586981 HTTP/1.1\r\nHost: www.tpex.org.tw\r\nX-Requested-With: XMLHttpRequest\r\nAccept-Encoding: gzip,deflate,sdch\r\nAccept: application/json, text/javascript, */*; q=0.01\r\nUser-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/37.0.2062.120 Chrome/37.0.2062.120 Safari/537.36\r\n\r\n'


And the response:

<!DOCTYPE HTML>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title> HOME&nbsp;&gt;&nbsp;Mainboard&nbsp;&gt;&nbsp;Margin Trading&nbsp;&gt;&nbsp;Margin Balance</title>
<link rel="icon" type="image/ico" href="/web/images/favicon.ic


I cannot see much difference. So why is Python requests not getting a JSON response?

Answer

Make the Python request exactly match the curl request. The URL paths differ: curl requests margin_bal_result.php, but your code requests margin_bal.php. Your code:

data_link = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal.php?l=en-us&d={}&_=1479700586981'
)

Changed:

data_link = (
    'http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/'
    'margin_bal_result.php?l=en-us&d={}&_=1479700586981'
)

After I corrected data_link, the request worked.
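
A mismatch like this is easier to spot by comparing the two URLs component by component than by eye. Below is a minimal sketch using only the standard library. The helper name diff_urls is hypothetical (it is not part of requests or curl), and both URLs use the same date here purely to isolate the path difference.

```python
from urllib.parse import urlsplit, parse_qsl


def diff_urls(url_a, url_b):
    """Return a dict of URL components that differ between url_a and url_b.

    Hypothetical helper for illustration; it compares scheme, host, path,
    and the decoded query parameters.
    """
    a, b = urlsplit(url_a), urlsplit(url_b)
    differences = {}
    for field in ("scheme", "netloc", "path"):
        left, right = getattr(a, field), getattr(b, field)
        if left != right:
            differences[field] = (left, right)
    # Compare query strings as dicts so parameter order does not matter.
    query_a = dict(parse_qsl(a.query, keep_blank_values=True))
    query_b = dict(parse_qsl(b.query, keep_blank_values=True))
    if query_a != query_b:
        differences["query"] = (query_a, query_b)
    return differences


base = "http://www.tpex.org.tw/web/stock/margin_trading/margin_balance/"
curl_url = base + "margin_bal_result.php?l=en-us&d=2016/11/15&_=1479700586981"
python_url = base + "margin_bal.php?l=en-us&d=2016/11/15&_=1479700586981"

# Report which components differ between the curl URL and the Python URL.
print(diff_urls(curl_url, python_url))
```

Running the same comparison on the headers is a natural next step if the URLs turn out to be identical.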
