strahanstoothgap - 11 months ago
Python Question

Python to CSV is splitting string into two columns when I want one

I am scraping a page with BeautifulSoup, and sometimes the contents of a tag include a <br> tag.

So sometimes it looks like this:

<td class="xyz">
text 1
<br>
text 2
</td>

and sometimes it looks like this:

<td class="xyz">
text 1
</td>

I am looping through this and adding to an output_row list that I eventually add to a list of lists. Whether I see the former format or the latter, I want the text to be in one cell.

I've found a way to detect the <br> tag: td.string comes back as None in that case, and I also know that text 2 always has 'ABC' in it. So:

elif td.string is None:
    if 'ABC' in td.contents[2]:
        new_string = td.contents[0] + ' ' + td.contents[2]
    # this is for another situation and it works fine
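
A minimal sketch of why this check works, assuming the tag in question is a <br>: BeautifulSoup's .string is None whenever a tag has more than one child, and .contents then holds the two text nodes with the <br> element between them. The sample markup is illustrative, not taken from the scraped page.

```python
from bs4 import BeautifulSoup

# Illustrative markup; the real page is assumed to look similar.
html = ('<table><tr>'
        '<td class="xyz">text 1<br>text 2 ABC</td>'
        '<td class="xyz">text 1</td>'
        '</tr></table>')
soup = BeautifulSoup(html, 'html.parser')
with_br, without_br = soup.find_all('td')

# A tag with several children has no single .string.
print(with_br.string)      # None
print(without_br.string)   # text 1

# .contents holds: text node, <br/> element, text node.
print(len(with_br.contents))

# Strip each text node before joining so stray whitespace is not carried along.
new_string = ' '.join(str(with_br.contents[i]).strip() for i in (0, 2))
print(new_string)
```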

When I print this in a Jupyter Notebook, it shows up as "text 1 text 2" on one line. But when I open my CSV, it is in two different columns. So when td.string has contents (meaning no <br> tag), text 1 shows up in one column, but when I get to the pieces that have a <br> tag, all my data gets shifted.

I'm not sure why it shows up as two different strings (two columns) when I concatenate them before appending them to the list.

I'm writing to file like this:

with open('C:/location/file.csv', 'w', newline='') as csv_file:
    for row in output_rows:


Answer Source

You can handle both cases with get_text(), passing strip=True and a separator:

from bs4 import BeautifulSoup

dat = """
<table>
    <tr>
        <td class="xyz">
            text 1
            <br>
            text 2
        </td>
    </tr>
    <tr>
        <td class="xyz">
            text 1
        </td>
    </tr>
</table>
"""

soup = BeautifulSoup(dat, 'html.parser')
for td in soup.select("table > tr > td"):
    print(td.get_text(separator=" ", strip=True))


Output:

text 1 text 2
text 1
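
Once each cell's text is a single string, csv.writer emits one list element per column and quotes values where needed. The sketch below assumes output_rows is a list of lists, as described in the question. The sample values are hypothetical, and an in-memory buffer stands in for the real file. Collapsing whitespace before writing is an extra precaution, not something the question confirms as the cause.

```python
import csv
import io

# Hypothetical rows; the first cell still carries newlines left over from the HTML.
output_rows = [
    ['\ntext 1\n \ntext 2 ABC\n', 'other'],
    ['text 1', 'other'],
]

buffer = io.StringIO()  # stands in for open(path, 'w', newline='')
writer = csv.writer(buffer)
for row in output_rows:
    # Collapse runs of whitespace (including newlines) to single spaces.
    writer.writerow([' '.join(cell.split()) for cell in row])

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back shows two columns per row.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

With a real file, keep newline='' in the open() call as in the question, so the csv module controls the line endings itself.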