hoju - 1 year ago
HTML Question

extract contents of regex

I want a regular expression to extract the title from an HTML page. Currently I have this:

title = re.search('<title>.*</title>', html, re.IGNORECASE).group()
if title:
    title = title.replace('<title>', '').replace('</title>', '')

Is there a regular expression to extract just the contents of <title> so I don't have to remove the tags?


Answer

Wrap the part you want in a capturing group, ( ), and read it with group(1). Note that re.search returns None when there is no match, so don't call group() directly on its result:

title_search = re.search('<title>(.*)</title>', html, re.IGNORECASE)

if title_search:
    title = title_search.group(1)
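
If it helps, here is a minimal self-contained sketch of the same approach wrapped in a function. The names extract_title, TITLE_RE, and sample are made up for illustration and are not from the answer above. It also makes two assumptions beyond the original pattern: the match is non-greedy (.*?) so it stops at the first closing tag, and re.DOTALL is set so a title that spans several lines still matches.

```python
import re

# Hypothetical names for illustration only.
# (.*?) is non-greedy; re.DOTALL lets "." match newlines inside the title.
TITLE_RE = re.compile(r'<title\b[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text inside the first <title> element, or None if absent."""
    match = TITLE_RE.search(html)
    if match is None:
        # re.search found nothing, so there is no group to read.
        return None
    # group(1) is the text captured by the parentheses, without the tags.
    return match.group(1).strip()


sample = '<html><head><TITLE>\n  Example page\n</TITLE></head><body></body></html>'
print(extract_title(sample))
print(extract_title('<p>no title here</p>'))
```

A regular expression like this is fine for quick extraction from simple pages; for arbitrary or malformed HTML, a real parser such as the standard library's html.parser is probably the safer choice.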