Python Question

Regex for deleting two patterns in a string

I'm using regex to parse HTML. So, confessing that sin right off the bat. If you have a better way, answer it here because I feel dirty and wrong.

Nonetheless, I can't find an answer to this regex question, which also applies to non-HTML text.

I have a string like:

tag = 'style="width: 2010px; background-color: red; height: 200px; font-size: 12px"'

and want to remove only the width and height declarations, so I tried:

r = r'style="(width:\s?\d+px;?)|(height:\s?\d+px;?)'
tag = re.sub(r, "", tag)

The pattern seems to match on regex101, but I'm getting:
TypeError: expected string or buffer

Answer

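Try the following regex. In your pattern the | splits the whole expression, so it matches either style="(width: ...) or (height: ...), and the first branch deletes the style=" prefix as well. Grouping only the property names avoids that:
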
import re

# Match a width or height declaration, its optional ";" and one optional trailing space.
regex = r"(?:width|height):\s?\d+px;?\s?"
test_str = '<div id="attachment_9565" class="wp-caption aligncenter" style="width: 2010px;background-color:red;height:200px">'
# count defaults to 0 (replace every match), so it is left out.
result = re.sub(regex, "", test_str)
print(result)
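
As for the TypeError: re.sub raises it when the third argument is not a string, for example None or an object returned by an HTML parser. The literal string in the question would not trigger it, so this is only a guess at the cause. A minimal sketch of a guard, where strip_size is a hypothetical helper name and not part of the re module:

```python
import re

# Same pattern as in the answer, compiled once for reuse.
PATTERN = re.compile(r"(?:width|height):\s?\d+px;?\s?")

def strip_size(value):
    """Remove width/height declarations from `value`.

    Hypothetical helper: `value` may be a str, None, or any object
    whose str() form is the markup (an assumption about the caller).
    """
    if value is None:
        return ""
    return PATTERN.sub("", str(value))

# Passing a non-string straight to re.sub raises TypeError.
try:
    re.sub(PATTERN, "", None)
except TypeError as exc:
    print("TypeError:", exc)

tag = 'style="width: 2010px; background-color: red; height: 200px; font-size: 12px"'
print(strip_size(tag))
```

If the guard makes the error go away, check where tag comes from: converting it with str() before calling re.sub is usually the real fix.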

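Since the question asks for a better way: once you have the value of the style attribute as a plain string, you can drop the properties without any regex by splitting the declarations. This is only a sketch. drop_css_properties is a made-up name, and it assumes no value contains a ";" (a data: URL would break it).

```python
def drop_css_properties(style, names=("width", "height")):
    """Return `style` without the declarations whose property is in `names`.

    Assumes declarations are separated by ";" and that no value
    contains a ";" itself.
    """
    kept = []
    for declaration in style.split(";"):
        # Split on the first ":" only, so values such as URLs stay whole.
        prop, sep, value = declaration.partition(":")
        if not sep:
            continue  # empty piece, e.g. after a trailing ";"
        if prop.strip().lower() in names:
            continue  # exact name match, so max-width is kept
        kept.append(f"{prop.strip()}: {value.strip()}")
    return "; ".join(kept)

style = "width: 2010px; background-color: red; height: 200px; font-size: 12px"
print(drop_css_properties(style))
```

Unlike the regex, this compares whole property names, so it leaves max-width or line-height alone and also removes values that are not in px.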