I was going through a tutorial on scraping list data from a web page. We have a BeautifulSoup object named 'soup', and I am supposed to find all the table elements in 'soup' that have a class, so they did this:
> [t["class"] for t in soup.find_all("table") if t.get("class")]
"What is t["class"] doing in here? Why didn't we simply write t?"
Obviously because the author wanted to retrieve the class attribute of the tag, not the full tag.
Why are we using the .get() method as a boolean in this case? I mean, doesn't it return the value stored for a key in a dictionary?
dict.get(key[, default=None]) does indeed return the value for key if it's set, or default (which defaults to None) if it isn't.
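To see why that return value works as a filter condition, here is a minimal sketch using a plain dict. The attrs dict and its keys are made up for illustration. A missing key gives None, which is falsy, so the `if` clause skips it.

```python
# Hypothetical attribute mapping, standing in for a tag's HTML attributes.
attrs = {"id": "main", "class": ["data", "wide"]}

print(attrs.get("class"))      # the stored value: ['data', 'wide']
print(attrs.get("style"))      # key is missing, so the default None is returned
print(attrs.get("style", ""))  # an explicit default replaces None

# None (and any other falsy value) fails the test, so only keys that are
# present with a truthy value are kept.
kept = [name for name in ("id", "class", "style") if attrs.get(name)]
print(kept)
```

Note that the test is on truthiness, not on presence: a key stored with an empty value would be filtered out as well.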
The goal here is obviously to only get the class for tags that have one.
Does it mean the BeautifulSoup object is a dictionary?
Here 't' is not "the beautiful soup object", it's a Tag instance. And while it's not strictly a dict, it does indeed behave like one with regard to HTML attributes. This is documented, FWIW.
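A small sketch of that dict-like behaviour, assuming bs4 is installed and using a made-up HTML snippet (the table classes are invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two tables with a class attribute, one without.
html = """
<table class="data wide"><tr><td>1</td></tr></table>
<table><tr><td>2</td></tr></table>
<table class="summary"><tr><td>3</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")  # each item is a Tag

# Same expression as in the question: keep the class of tables that have one.
classes = [t["class"] for t in tables if t.get("class")]
print(classes)

plain = tables[1]            # the table without a class attribute
print(plain.get("class"))    # None, like dict.get on a missing key

try:
    plain["class"]           # item access on a missing attribute
except KeyError:
    print("no class attribute on this table")
```

Each class value comes back as a list because class is a multi-valued attribute in HTML, so the first table should give ['data', 'wide']. The try/except also shows why the tutorial filters with .get() first: indexing a tag that has no class attribute raises KeyError.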