Hugh Spry - 2 years ago
Python Question

Need to extract data from a website and store it in a list using regex

I have a task that requires me to extract data from a website to form a 'top 10 list'. I have chosen the IMDB Top 250 page.

In other words, I need a little help using regex to isolate the names of the films and then store them in a list. I already have the HTML stored in a variable as a string (if this is the wrong approach, let me know).

Also, I am limited to using urlopen (from urllib) and the re and HTMLParser modules.

import HTMLParser
from urllib import urlopen
import re

site = urlopen("")
content = site.read()  # read the response body into a string

print content

Answer Source

You really shouldn't use regex for this, but you stated in your comment that you have to, so here it is with regex:

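import re
import requests

respText = requests.get("").text

# Lazily skip to the end of the first tag inside each titleColumn cell, then capture the text that follows it.
for title in re.findall(r'<td class="titleColumn">.+?>(.+?)<', respText, re.DOTALL):
    print(title)  # assumed body: print each title (or append it to a list)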

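Since the goal is a 'top 10 list' rather than printed output, here is a minimal sketch of collecting the matches into a list with re alone. The sample markup, the film names and the helper name extract_titles are all hypothetical; the real page's HTML may differ, in which case the pattern would need adjusting.

```python
import re

# Hypothetical sample shaped like the markup the pattern above expects.
sample_html = """
<table>
<tr><td class="titleColumn">
      1.
      <a href="/title/x1/">First Film</a>
      <span class="secondaryInfo">(1994)</span>
</td></tr>
<tr><td class="titleColumn">
      2.
      <a href="/title/x2/">Second Film</a>
      <span class="secondaryInfo">(1972)</span>
</td></tr>
<tr><td class="titleColumn">
      3.
      <a href="/title/x3/">Third Film</a>
      <span class="secondaryInfo">(2008)</span>
</td></tr>
</table>
"""

# Same pattern as in the answer: skip lazily to the end of the first tag
# inside the cell, then capture the text up to the next "<".
TITLE_PATTERN = re.compile(r'<td class="titleColumn">.+?>(.+?)<', re.DOTALL)


def extract_titles(html, limit=10):
    """Return up to `limit` titles, in page order (hypothetical helper)."""
    titles = [match.strip() for match in TITLE_PATTERN.findall(html)]
    return titles[:limit]


top_titles = extract_titles(sample_html)
```

With the asker's code, `extract_titles(content)` would be called on the string read from urlopen instead of on the sample.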
With BeautifulSoup (which you can't use here):

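from bs4 import BeautifulSoup
soup = BeautifulSoup(respText, "html.parser")
for item in soup.find_all("td", {"class": "titleColumn"}):
    print(item.a.text)  # assumed body: the title is the text of the cell's link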
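Because the question does allow HTMLParser, the same extraction could also be sketched without regex or BeautifulSoup. The class name TitleParser, the sample markup and the film names below are hypothetical, and the sketch assumes each title is the text of the first link inside a td with class "titleColumn".

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2, as used in the question


class TitleParser(HTMLParser):
    """Collect the text of the first <a> inside each titleColumn cell."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.titles = []
        self.in_cell = False    # inside <td class="titleColumn">
        self.in_link = False    # inside the link currently being captured
        self.seen_link = False  # first link of this cell already captured

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "td" and ("class", "titleColumn") in attrs:
            self.in_cell = True
            self.seen_link = False
        elif tag == "a" and self.in_cell and not self.seen_link:
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link = False
            self.seen_link = True
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Text may arrive in several pieces, so accumulate it.
        if self.in_link:
            self.titles[-1] += data


# Hypothetical sample; the real page may be structured differently.
sample_html = """
<tr><td class="titleColumn">1. <a href="/title/x1/">First Film</a>
<span class="secondaryInfo">(1994)</span></td></tr>
<tr><td class="titleColumn">2. <a href="/title/x2/">Second Film</a>
<span class="secondaryInfo">(1972)</span></td></tr>
"""

parser = TitleParser()
parser.feed(sample_html)
top_10 = [title.strip() for title in parser.titles[:10]]
```

On Python 2, character references such as `&amp;` are delivered through separate callbacks (handle_entityref and handle_charref), which this sketch does not handle, so titles containing them would need extra code.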