I have a JavaScript file at http://timetable.ait.ie/js/filter.js and I seriously need to parse it. I have been using BeautifulSoup over the past few days to parse HTML pages and I really get what I am doing there, but this .js file is killing me.
At the moment I am using the following code:
import urllib

page = urllib.urlopen("http://timetable.ait.ie/js/filter.js")
pageInfo = page.read()
staffarray = "BRADY, DAMIEN";
staffarray = "SCI";
staffarray = "BRADY001608";
If you have problems with regex, use standard string functions and slicing instead.
First split the code into lines, then search for the lines containing 'staffarray['. Lastly, use slicing to pull out the value between the quotes.
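import urllib

req = urllib.urlopen("http://timetable.ait.ie/js/filter.js")
lines = req.read().split('\n')

for x in lines:
    if 'staffarray[' in x:
        # The '[0] = ' and '[1] = ' patterns are assumed from the printed labels.
        if '[0] = ' in x:
            start = x.find('"')+1  # first character after the opening quote
            end = -3               # assumes the line ends with '";' plus one more character
            print '0', x[start:end]
        elif '[1] = ' in x:
            start = x.find('"')+1
            end = -3
            print '1', x[start:end]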
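The fixed `end = -3` only works if every line ends the same way, so here is a rough Python 3 sketch of the same idea that finds the closing quote instead of counting from the end. The helper names `extract_quoted` and `parse_staff` are made up for this example, and the sample text (including its `[0][n]` index layout) is an assumption for illustration, not copied from the real file.

```python
def extract_quoted(line):
    """Return the text between the first and last double quote, or None."""
    start = line.find('"')
    end = line.rfind('"')
    if start == -1 or end <= start:
        return None  # no quoted value on this line
    return line[start + 1:end]


def parse_staff(js_text):
    """Collect quoted values from lines that assign into staffarray[...]."""
    values = []
    # splitlines() handles both '\n' and '\r\n' line endings
    for line in js_text.splitlines():
        if 'staffarray[' in line and ' = ' in line:
            value = extract_quoted(line)
            if value is not None:
                values.append(value)
    return values


# Hypothetical sample; the index layout is assumed for illustration only.
sample = '''staffarray[0] = new Array();
staffarray[0][0] = "BRADY, DAMIEN";
staffarray[0][1] = "SCI";
staffarray[0][2] = "BRADY001608";
'''

print(parse_staff(sample))
```

To run it against the live file in Python 3, the text would probably come from `urllib.request.urlopen(url).read().decode(...)`; the right encoding for that file is something you would have to check.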