user3390232 - 4 months ago 12
Python Question

Remove HTML block in Python

I'd like to know if there's a lib or some method in Python to remove an element from an HTML document. For example:

I have this document:

<html>
<head>
...
</head>
<body>
<div>
...
</div>
</body>
</html>


I want to remove the <div></div> block from the document, so that it ends up like this:

<html>
<head>
...
</head>
<body>
</body>
</html>

Wso
Answer

You don't need a library for this. Just use the built-in string methods.

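def removeOneTag(text, tag):
    start = text.find("<" + tag + ">")  # first opening tag, or -1
    end = text.find("</" + tag + ">", max(start, 0))  # first closing tag after it, or -1
    return text if start < 0 or end < 0 else text[:start] + text[end + len(tag) + 3:]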

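This removes everything from the first opening tag through the first closing tag, including the tags themselves. Note that it only matches a bare opening tag with no attributes (<div>, not <div class="x">), and it doesn't handle nested tags of the same name. For the input in your example, it would look something like...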

x = """<html>
<head>
  ...
</head>
<body>
   <div>
     ...
   </div>
</body>
</html>"""
print(removeOneTag(x, "div"))

Then, if you wanted to remove ALL the blocks for that tag, call it in a loop until none are left...

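tag = "div"
# Keep going while an opening tag is still followed by a closing tag.
while "<" + tag + ">" in x and "</" + tag + ">" in x[x.find("<" + tag + ">"):]:
    x = removeOneTag(x, tag)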
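
If the opening tags in your document carry attributes, plain find() on "<div>" won't see them. Here is a minimal sketch of one way around that, still using only the standard library (the re module). The function name remove_tag_blocks is made up for illustration, and this is not a real HTML parser: like the string version, it pairs each opening tag with the nearest closing tag, so nested tags of the same name are not handled.

```python
import re

def remove_tag_blocks(text, tag):
    """Remove every <tag ...>...</tag> block, with or without attributes."""
    name = re.escape(tag)
    # (?:\s[^>]*)? allows optional attributes after the tag name.
    # .*? is non-greedy, so each opening tag stops at the nearest closing tag.
    # re.DOTALL lets "." match newlines, so multi-line blocks are covered.
    pattern = re.compile(r"<%s(?:\s[^>]*)?>.*?</%s>" % (name, name), re.DOTALL)
    return pattern.sub("", text)
```

You'd call it as remove_tag_blocks(x, "div"), and it removes all matching blocks in one pass, so no loop is needed.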