firelitte - 7 months ago
Python Question

List of all element names in HTML document — beautifulsoup

I want to get a list of all the distinct tag names in an HTML document (a list of tag-name strings without repetition). I tried putting an empty entry with

, but this gave me the entire document instead.

Is there a way of doing it?

Edit (trying the suggestion):

r = request.get(
data = r.text
soup = BeautifulSoup(data,'html.parser')

This gave me an AttributeError: 'str' object has no attribute 'text'


Using soup.find_all() you get a list of every single element in the document, which you can iterate over. Therefore you can do the following:

from bs4 import BeautifulSoup

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="" class="sister" id="link1">Elsie</a>,
<a href="" class="sister" id="link2">Lacie</a> and
<a href="" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""  # an html sample
soup = BeautifulSoup(html_doc, 'html.parser')

document = soup.html.find_all()  # every tag nested inside <html>

el = ['html',]  # we already include the html tag
for n in document:
    if n.name not in el:  # skip tag names we have already seen
        el.append(n.name)

print(el)


The output of the code snippet would be:

>>> ['html', 'head', 'title', 'body', 'p', 'b', 'a']
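The same order-preserving de-duplication can also be sketched with dict.fromkeys, which keeps only the first occurrence of each key (regular dicts preserve insertion order from Python 3.7 on). The helper name tag_names_in_order and the shortened sample are only illustrative, not part of the original answer. Note that soup.find_all(True) matches every tag, the root html tag included, so it does not have to be added by hand here.

```python
from bs4 import BeautifulSoup


def tag_names_in_order(markup):
    """Return each distinct tag name once, in order of first appearance."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all(True) matches every tag in the document, root included
    return list(dict.fromkeys(tag.name for tag in soup.find_all(True)))


# Hypothetical, shortened sample document
sample = (
    "<html><head><title>T</title></head>"
    "<body><p><b>x</b></p><p><a>y</a></p></body></html>"
)
print(tag_names_in_order(sample))
# ['html', 'head', 'title', 'body', 'p', 'b', 'a']
```

This gives the same result as the explicit loop, but membership checks against a dict are faster than against a list, which may matter on large documents.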


As @PM 2Ring pointed out, if you don't care about the order in which the elements are added (and, as he says, I don't think you do), you can use a set. In Python 3.x set is a built-in type, so there is nothing to import; the set comprehension syntax used here also works in Python 2.7, but not in older versions.

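from bs4 import BeautifulSoup

# html_doc is the sample document from the first snippet
soup = BeautifulSoup(html_doc, 'html.parser')
document = soup.html.find_all()

el = {x.name for x in document}  # use a set comprehension to collect each tag name once
el.add("html")  # only if you need to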
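Because a set has no defined order, printing it may list the names in a different order from one run to the next. A minimal sketch, assuming you want reproducible output, is to sort the set before using it. The helper name unique_tag_names and the short sample are hypothetical.

```python
from bs4 import BeautifulSoup


def unique_tag_names(markup):
    """Return the distinct tag names in the markup, sorted alphabetically."""
    soup = BeautifulSoup(markup, "html.parser")
    names = {tag.name for tag in soup.find_all(True)}  # the set drops duplicates
    return sorted(names)  # sorting makes the result independent of set order


# Hypothetical sample document
sample = "<html><body><p>one</p><p>two <a>link</a></p></body></html>"
print(unique_tag_names(sample))  # ['a', 'body', 'html', 'p']
```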