dabigone - 4 months ago
Python Question

Finding first tag in HTML file with BeautifulSoup

I have a set of HTML files, and I want to pull the first tag from each one. The files don’t share a specific tag that is always first, so I’m not sure how to do this.

As an example, for the following snippet, the first tag would be <html>:

<html>
<head>
<title>
insert title here
</title>
</head>
</html>


Any way to accomplish this with BeautifulSoup (or possibly another tool)? Thanks in advance :)

Answer

You can use BeautifulSoup for this: call find() with no arguments on the BeautifulSoup object and it returns the first element (tag) in the tree. The .name attribute of that element gives you the tag name:

from bs4 import BeautifulSoup

data = """
<html>
 <head>
    <title>
     insert title here
    </title>
 </head>
</html>
"""

soup = BeautifulSoup(data, "html.parser")
print(soup.find().name)
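
Since the question is about a set of files, here is a minimal sketch of how the same idea might be applied to a whole directory. It assumes the files are UTF-8 encoded and live in one folder. The helper names first_tag_name and first_tags_in_dir and the "*.html" pattern are made up for illustration; they are not part of BeautifulSoup. It passes True to find(), which the BeautifulSoup documentation describes as matching every tag, so text, comments and doctypes are skipped explicitly. It also returns None instead of raising when a file contains no tag at all.

```python
from pathlib import Path
from typing import Dict, Optional

from bs4 import BeautifulSoup


def first_tag_name(markup: str) -> Optional[str]:
    """Return the name of the first tag in markup, or None if there is none."""
    soup = BeautifulSoup(markup, "html.parser")
    # True matches any tag; strings, comments and doctypes are not tags.
    first = soup.find(True)
    return first.name if first is not None else None


def first_tags_in_dir(directory: str, pattern: str = "*.html") -> Dict[str, Optional[str]]:
    """Map each matching file name to the name of its first tag.

    Assumption: every file can be decoded as UTF-8.
    """
    return {
        path.name: first_tag_name(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob(pattern))
    }
```

You could then call first_tags_in_dir("pages"), where "pages" is a hypothetical folder name, and get back a dictionary keyed by file name. Files with no tags at all map to None, so you can spot them rather than hitting an AttributeError on .name.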