Prashant - 1 year ago

Find elements in the HTML DOM

I want to get the name, logo, description, and type from the HTML DOM below.
Using Python and Beautiful Soup, I managed to get the description and type:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_data, "html.parser")  # html_data holds the markup shown below
for h in soup.find_all("span", {"class": "description"}):
    print(h.text)

But I am unable to get the name and logo into a JSON file.

<a href="/organization/flipkart">
<div class="container organization_autocomplete">
<div class="logo">
<img src="https://abcdsdsdsf/imm.jpg">
<div class="identity container">
<div class="follow_card_wrapper"><div class="link_container"><div class="name follow_card" data-name="Flipkart" data-type="Organization" data-uuid="43b9e775b843f194fb96d266684cfb53" data-permalink="/organization/flipkart" data-image="https://abcdsdsdsf/imm.jpg">Flipkart</div></div><div class="card_inner"></div></div>
<div class="content container">
<span class="type">Company - </span>
<span class="description">
Flipkart is an online shopping destination for electronics, books, music and movies.

I tried the same method with the class name replaced, but I get empty output.
Can anyone explain how nested classes should be handled in cases like this?

I appreciate alecxe's effort, but looking at the answer I still have a few points of confusion.

It would be good if someone could explain how to handle elements that have
multiple class names, like

<div class="container organization_autocomplete">

  • In this case, how do I decide which one to use for selecting:
    container or organization_autocomplete?
  • Similarly, for the name, why didn't we use these classes:
    class="name follow_card"?

Answer

I would get both name and logo using the class attributes:

logo = soup.find("div", class_="logo").img["src"]
name = soup.find("div", class_="identity").find("div", class_="name").get_text()

print(logo, name)
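Since the question also asks for the values in a JSON file, here is a minimal self-contained sketch that applies the same find() calls to the markup from the question, adds the type and description, and writes the result with the standard json module. The helper name extract_organization, the field names, the stripping of the trailing " - " from the type, and the output file organizations.json are illustrative assumptions, not part of the original answer.

```python
import json

from bs4 import BeautifulSoup

# The markup from the question, left unclosed as posted; html.parser closes open tags at the end
HTML_DATA = """
<a href="/organization/flipkart">
<div class="container organization_autocomplete">
<div class="logo">
<img src="https://abcdsdsdsf/imm.jpg">
<div class="identity container">
<div class="follow_card_wrapper"><div class="link_container"><div class="name follow_card" data-name="Flipkart" data-type="Organization" data-uuid="43b9e775b843f194fb96d266684cfb53" data-permalink="/organization/flipkart" data-image="https://abcdsdsdsf/imm.jpg">Flipkart</div></div><div class="card_inner"></div></div>
<div class="content container">
<span class="type">Company - </span>
<span class="description">
Flipkart is an online shopping destination for electronics, books, music and movies.
"""


def extract_organization(html):
    """Return name, logo, type and description as a dict (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "name": soup.find("div", class_="identity").find("div", class_="name").get_text(strip=True),
        "logo": soup.find("div", class_="logo").img["src"],
        # "Company - " -> "Company": drop the trailing separator shown on the page
        "type": soup.find("span", class_="type").get_text(strip=True).rstrip(" -"),
        "description": soup.find("span", class_="description").get_text(strip=True),
    }


record = extract_organization(HTML_DATA)

# Write a list of records so that more organizations can be added later
with open("organizations.json", "w", encoding="utf-8") as f:
    json.dump([record], f, indent=2)
```

Calling extract_organization once per scraped page and collecting the dicts in one list would produce a single JSON array for all organizations.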

Or, via CSS selectors:

logo = soup.select_one("div.logo img")["src"]
name = soup.select_one("div.identity div.name").get_text()

print(logo, name)
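One caveat with CSS selectors: if, as the unclosed snippet in the question suggests, the content container (type and description) sits inside div.identity, then calling get_text() on div.identity returns more than the name. The sketch below, assuming a trimmed copy of that markup with closing tags added, compares the broad selector with one scoped to div.name.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; closing tags added as an assumption
html_data = """
<div class="logo">
  <img src="https://abcdsdsdsf/imm.jpg">
  <div class="identity container">
    <div class="follow_card_wrapper"><div class="link_container">
      <div class="name follow_card" data-name="Flipkart">Flipkart</div>
    </div></div>
    <div class="content container">
      <span class="type">Company - </span>
      <span class="description">Flipkart is an online shopping destination for electronics, books, music and movies.</span>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html_data, "html.parser")

logo = soup.select_one("div.logo img")["src"]

# Too broad: get_text() joins every string under div.identity,
# including the type and description spans
broad = soup.select_one("div.identity").get_text(" ", strip=True)

# Scoped: the descendant combinator narrows the match to the name div
name = soup.select_one("div.identity div.name").get_text(strip=True)

print(logo, name)
```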

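As for choosing which classes to use, and which locating techniques to apply in general, there is no silver bullet. That said, it is recommended to rely on ids and "data-oriented" classes or other data-oriented attributes. For instance, in your case, the container class is more "layout"-oriented than "data"-oriented. As for "name follow_card": Beautiful Soup treats class as a multi-valued attribute, so class_="name" already matches an element whose class is "name follow_card", and there is no need to list both.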
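The sketch below, assuming a trimmed copy of the question's markup with closing tags added, shows how Beautiful Soup handles multi-valued class attributes: a single class passed to class_ versus classes chained in a CSS selector, how many divs share the layout class container, and reading the data-* attributes directly instead of relying on classes.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question; closing tags added as an assumption
html_data = """
<div class="container organization_autocomplete">
  <div class="logo">
    <img src="https://abcdsdsdsf/imm.jpg">
    <div class="identity container">
      <div class="name follow_card" data-name="Flipkart" data-type="Organization"
           data-permalink="/organization/flipkart"
           data-image="https://abcdsdsdsf/imm.jpg">Flipkart</div>
      <div class="content container">
        <span class="type">Company - </span>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html_data, "html.parser")

# "container" is a layout class shared by three different divs, so it is ambiguous
containers = soup.find_all("div", class_="container")

# class_ matches any single class in a multi-valued class attribute...
by_one_class = soup.find("div", class_="name")
# ...while chaining classes in a CSS selector requires the element to have all of them
by_both_classes = soup.select_one("div.name.follow_card")

# Data-oriented attributes carry the values directly, independent of layout classes
card = soup.select_one("div[data-name]")
name = card["data-name"]
logo = card["data-image"]
permalink = card["data-permalink"]
```

Passing a string with a space, such as class_="name follow_card", is compared against the exact attribute value, so it stops matching if the page lists the classes in a different order; a single class or a chained CSS selector avoids that.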
