I want a regular expression to extract the title from a HTML page. Currently I have this:
title = re.search('<title>.*</title>', html, re.IGNORECASE).group()
if title:
title = title.replace('<title>', '').replace('</title>', '')
Use ( ) and group(1)
(re.search
will return None
if it doesn't find the result, so don't use group() directly):
title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)
if title_search:
title = title_search.group(1)