Asked by ihightower
Python Question

Scraping data from a simple website: changing a POST into a GET

I visited this site:
http://www.avcodes.co.uk/airlcodesearch.asp

Then I selected the last option, "Select a letter for ICAO Codes:", chose "B", and clicked Submit.

I monitored the requests using the Tamper Data and Live HTTP Headers extensions for Firefox.

Everything works in the browser, and the direct URL that should achieve the same effect is:

http://www.avcodes.co.uk/airllistres.asp?statuslst=Y&iataairllst=&icaoairllst=B&B1=Submit
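The query string in that URL is simply the form's fields, URL-encoded. As a small illustration using only the Python 3 standard library, it can be decoded back into the name/value pairs the form submits (no request is sent):

```python
from urllib.parse import urlsplit, parse_qsl

# The direct URL from the question.
url = "http://www.avcodes.co.uk/airllistres.asp?statuslst=Y&iataairllst=&icaoairllst=B&B1=Submit"

# Split off the query string and decode it into (name, value) pairs.
# keep_blank_values=True keeps empty fields such as "iataairllst=".
fields = parse_qsl(urlsplit(url).query, keep_blank_values=True)
print(fields)
```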

However, when I request that URL directly, the data is NOT returned.

What am I missing, and how can I find the correct URL?

The goal is that, once I know the URL, I will use a Python script to loop through A to Z and retrieve the content of every page.

Please help.

Answer

I assume that you have permission from the website to use their database and that you are allowed to scrape it. Otherwise, depending on the jurisdiction, doing this may be illegal.

The problem here is that you are using GET to retrieve the contents, but the website expects a POST. GET and POST are not equivalent, although some server-side code treats them the same (in PHP, for example, you can read $_REQUEST, which combines $_GET and $_POST, instead of either one). This website does not, so you have to send the form fields as the body of a POST request, just as the browser did.
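To make the difference concrete, here is a sketch using Python 3's urllib that packages the same four form fields two ways: as a GET request, with the fields in the URL's query string, and as a POST request, with the fields in the request body. The requests are only constructed, not sent. Which form a server accepts is up to the server; per this answer, this site only reads the POSTed fields.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Form fields as captured from the browser (letter "B" selected).
fields = {"statuslst": "Y", "iataairllst": "", "icaoairllst": "B", "B1": "Submit"}
encoded = urlencode(fields)  # "statuslst=Y&iataairllst=&icaoairllst=B&B1=Submit"

base = "http://www.avcodes.co.uk/airllistres.asp"

# GET: the fields travel in the URL's query string; there is no body.
get_req = Request(base + "?" + encoded)

# POST: the URL stays bare and the same fields travel in the request body.
post_req = Request(base, data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

Supplying `data` is what turns a `Request` into a POST; without it, urllib sends a GET.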

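In Python 3, you can POST data to a URL like this (urllib.request replaces Python 2's urllib2):

```python
import urllib.request
# Passing a bytes body as the second argument makes urlopen send a POST instead of a GET.
data = b"statuslst=Y&iataairllst=&icaoairllst=B&B1=Submit"
with urllib.request.urlopen("http://www.avcodes.co.uk/airllistres.asp", data) as u:
    print(u.read())
```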
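For the stated goal of looping through A to Z, a minimal sketch building on the same idea might look like the following. It assumes the field names captured in the question stay the same for every letter and that the server accepts any letter in `icaoairllst`. `build_request` and `fetch_all` are hypothetical helper names, and the one-second delay is an arbitrary courtesy choice, not something the site specifies.

```python
import string
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://www.avcodes.co.uk/airllistres.asp"


def build_request(letter):
    """Build a POST request for one ICAO starting letter (field names taken from the question)."""
    fields = {"statuslst": "Y", "iataairllst": "", "icaoairllst": letter, "B1": "Submit"}
    return Request(BASE_URL, data=urlencode(fields).encode("ascii"))


def fetch_all(delay=1.0):
    """POST the form once per letter A-Z and return {letter: raw response bytes}."""
    pages = {}
    for letter in string.ascii_uppercase:
        with urlopen(build_request(letter), timeout=30) as response:
            pages[letter] = response.read()
        time.sleep(delay)  # pause between requests to avoid hammering the server
    return pages
```

Calling `fetch_all()` returns the raw HTML for each letter, which you would then parse to extract the codes. Decoding is left to the caller because the site's character encoding is not known here.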