Daniel P - 2 months ago
Python Question

Beautiful Soup - Python

I was hoping to ask a pretty simple question. I have come across the code below and have not been able to find a decent explanation of the following:


  1. What exactly does the .attrs function do in this case?

  2. What is the function of the ['href'] part at the end, i.e. what exactly does that part of the code execute?



Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("url")
bsObj = BeautifulSoup(html)
for link in bsObj.findAll("a"):
    if 'href' in link.attrs:
        print(link.attrs['href'])

Answer

Let's fetch this question itself and see:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://stackoverflow.com/q/39308028/1005215")
bsObj = BeautifulSoup(html)

i) What exactly does the .attrs function do in this code?

In [6]: bsObj.findAll("a")[30]
Out[6]: <a class="question-hyperlink" href="/questions/39308028/beautifuelsoup-python">Beautifuelsoup - Python</a>

In [7]: bsObj.findAll("a")[30].attrs
Out[7]: 
{'class': ['question-hyperlink'],
 'href': '/questions/39308028/beautifuelsoup-python'}

In [8]: type(bsObj.findAll("a")[30])
Out[8]: bs4.element.Tag

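If you read the documentation, you will notice that a tag may have any number of attributes. Note that .attrs is not a function: it is an attribute of the Tag object that holds the tag's HTML attributes as a dictionary. For element number 30, that dictionary has the keys 'class' and 'href'.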
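To make the point about .attrs reproducible without fetching a live page, here is a minimal sketch. The HTML string and the variable names are made up for illustration, and "html.parser" is chosen only because it ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML, used instead of a live page so the result is stable.
html = '<a class="question-hyperlink" href="/questions/1">Example</a>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.find("a")

# .attrs is read like any other attribute (no parentheses) and behaves as a dict.
attrs = tag.attrs
print(isinstance(attrs, dict))
print(sorted(attrs.keys()))

# 'class' can hold several values in HTML, so Beautiful Soup stores it as a list;
# 'href' holds a single value and stays a plain string.
print(attrs["class"], attrs["href"])
```

Because the result is a dictionary, the usual operations apply, which is why the question's code can test 'href' in link.attrs before reading the value.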

ii) What is the function of the ['href'] part at the end?

In [9]: bsObj.findAll("a")[30]['href']
Out[9]: '/questions/39308028/beautifuelsoup-python'

As the output above shows, the tag has an 'href' attribute, and indexing the tag with ['href'] returns that attribute's value. In the question's code, link.attrs['href'] performs the same lookup directly on the .attrs dictionary.
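The lookup can also be tried on a small hand-written snippet. The sketch below uses made-up HTML and variable names; it compares tag['href'] with tag.attrs['href'] and shows why the question's code checks for 'href' first.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one anchor with an href, one without.
html = '<a href="/questions/1">with href</a><a name="top">without href</a>'
soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("a")

# Indexing the tag and indexing its .attrs dict give the same value.
same = first["href"] == first.attrs["href"]

# A missing attribute raises KeyError with [], while .get() returns None.
missing = second.get("href")
try:
    second["href"]
    raised = False
except KeyError:
    raised = True

# The guard from the question skips anchors that have no href.
hrefs = [a["href"] for a in soup.find_all("a") if "href" in a.attrs]
print(same, missing, raised, hrefs)
```

If the guard were dropped, the loop would stop with a KeyError at the first anchor lacking an href; using link.get('href') is another way to avoid that.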
