daOnlyBG - 1 month ago
CSS Question

Extracting Content Within Multiple Span Tags in BeautifulSoup

I'm trying to extract string content from within multiple span tags. A snapshot of the HTML page is:

<div class="secondary-attributes">
<span class="neighborhood-str-list">
Southeast
</span>
<address>
1234 Python Blvd S<br>Somewhere, NV 98765
</address>
<span class="biz-phone">
(555) 123-4567
</span>
</div>


Specifically, I'm trying to extract the phone number nestled between the <span class="biz-phone"></span> tags. I attempted to do so with the following code:

import requests
from bs4 import BeautifulSoup

res = requests.get(url)
soup = BeautifulSoup(res.text, "html.parser")

phone_number_results = [phone_numbers for phone_numbers in soup.find_all('span','biz-phone')]


The code ran without errors, but it didn't give me the result I was hoping for:

['<span class="biz-phone">\n (702) 476-5050\n </span>',
 '<span class="biz-phone">\n (702) 253-7296\n </span>',
 '<span class="biz-phone">\n (702) 385-7912\n </span>',
 '<span class="biz-phone">\n (702) 776-7061\n </span>',
 '<span class="biz-phone">\n (702) 221-7296\n </span>',
 '<span class="biz-phone">\n (702) 252-7296\n </span>',
 '<span class="biz-phone">\n (702) 659-9101\n </span>',
 '<span class="biz-phone">\n (702) 355-9445\n </span>',
 '<span class="biz-phone">\n (702) 396-3333\n </span>',
 '<span class="biz-phone">\n (702) 643-9851\n </span>',
 '<span class="biz-phone">\n (702) 222-1441\n </span>']


My question has two parts:


  1. Why are the span tags appearing when I run the program?

  2. How do I get rid of them? I could just do string editing, but I feel like I wouldn't be taking full advantage of the BeautifulSoup package. Is there a more elegant way?



NOTE: there are more snippets of HTML like the one shown above throughout the page, so there are more instances of <span class="biz-phone"> (555) 123-4567 </span> (i.e., more phone numbers) that need to be extracted, which is why I was thinking of using find_all().

Thank you in advance.

Answer
  1. find_all() returns a list of tags (bs4.element.Tag), not strings.

  2. As @furas points out, you want to access the text property on each of the tags to extract the text within the tag:

    phone_number_results = [phone_numbers.text.strip() for phone_numbers in soup.find_all('span', 'biz-phone')]

(the strip() call removes the newlines and spaces that surround each number inside the tag)
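
To see both points on the snippet from the question, here is a minimal sketch. It assumes the HTML is already in a string (the html variable and the tags name are only for illustration; on the real page the markup would come from res.text) and it uses get_text(strip=True), which is an alternative to calling .text.strip() on each tag:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (assumption: the real page
# would be fetched with requests and passed in as res.text).
html = """
<div class="secondary-attributes">
<span class="neighborhood-str-list">
Southeast
</span>
<address>
1234 Python Blvd S<br>Somewhere, NV 98765
</address>
<span class="biz-phone">
(555) 123-4567
</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() yields Tag objects, so printing them shows the <span> markup.
tags = soup.find_all("span", class_="biz-phone")

# get_text(strip=True) returns only the text, with surrounding whitespace removed.
phone_numbers = [tag.get_text(strip=True) for tag in tags]
print(phone_numbers)  # ['(555) 123-4567']
```

Passing class_="biz-phone" is equivalent to the positional 'biz-phone' argument used above; the keyword form just makes it explicit that the match is on the CSS class.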