Daniel P - 1 year ago
Python Question

Beautiful Soup - Python

I was hoping to ask a pretty simple question. I have come across the below code and have not been able to find a decent explanation as to:


  1. What exactly does the .attrs function do in this case?

  2. What is the function of the ['href'] part at the end, i.e. what exactly does that part of the code execute?



Here is the code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("url")
bsObj = BeautifulSoup(html)
for link in bsObj.findAll("a"):
    if 'href' in link.attrs:
        print(link.attrs['href'])

Answer Source

Let's fetch this question itself and see:

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("http://stackoverflow.com/q/39308028/1005215")
bsObj = BeautifulSoup(html)

i) what exactly does the .attrs function do in this code

In [6]: bsObj.findAll("a")[30]
Out[6]: <a class="question-hyperlink" href="/questions/39308028/beautifuelsoup-python">Beautifuelsoup - Python</a>

In [7]: bsObj.findAll("a")[30].attrs
Out[7]: 
{'class': ['question-hyperlink'],
 'href': '/questions/39308028/beautifuelsoup-python'}

In [8]: type(bsObj.findAll("a")[30])
Out[8]: bs4.element.Tag

If you read the documentation, you will notice that a tag may have any number of attributes. .attrs is not a function but a dictionary that maps each attribute name of the tag to its value. For element number 30, the tag has the attributes 'class' and 'href'.
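The same lookup can be reproduced without fetching a live page. The following is a minimal sketch, assuming a small hand-written HTML string (hypothetical, standing in for the downloaded page) and the built-in "html.parser" backend:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for the downloaded page
html = (
    '<a class="question-hyperlink" '
    'href="/questions/39308028/beautifuelsoup-python">Beautifuelsoup - Python</a>'
    '<a name="top">no link here</a>'
)

soup = BeautifulSoup(html, "html.parser")
first, second = soup.find_all("a")

# .attrs behaves like a dictionary keyed by attribute name
print(isinstance(first.attrs, dict))
print(sorted(first.attrs))

# The second tag has no href, which is why the question's code
# checks 'href' in link.attrs before reading it
print("href" in second.attrs)
```

Because .attrs is a dictionary, the usual dictionary operations (membership tests with in, iteration over keys, indexing) all work on it.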

ii) what is the function of the ['href'] part at the end

In [9]: bsObj.findAll("a")[30]['href']
Out[9]: '/questions/39308028/beautifuelsoup-python'

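If you look at the above output, you will see that the tag has an attribute 'href', and indexing with ['href'] returns the value of that attribute. In the original code, link.attrs['href'] does the same thing: it looks up the key 'href' in the attribute dictionary.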
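As a rough illustration of how the ['href'] lookup behaves when the attribute is present or missing, here is a small sketch. The HTML string is again a made-up stand-in, not the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML: one anchor with an href, one without
html = (
    '<a href="/questions/39308028/beautifuelsoup-python">question</a>'
    '<a name="top">anchor</a>'
)
soup = BeautifulSoup(html, "html.parser")
with_href, without_href = soup.find_all("a")

# Indexing the tag is a shortcut for indexing its .attrs dictionary
same = with_href["href"] == with_href.attrs["href"]

# A missing attribute raises KeyError with [], but .get() returns None
missing = without_href.get("href")
try:
    without_href["href"]
    raised = False
except KeyError:
    raised = True

# Passing href=True to find_all keeps only the tags that have an href
links = [a["href"] for a in soup.find_all("a", href=True)]
print(same, missing, raised, links)
```

The href=True filter is one way to avoid the explicit 'href' in link.attrs check from the question; whether it reads better is a matter of taste.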
