Mary Mary - 2 years ago 74
HTML Question

Beautiful Soup captures empty values in a table

For the following piece of HTML, I used BeautifulSoup to capture the table contents:

<td>happy </td>
<td>daily </td>

This is my code:

comments = [td.get_text() for td in table.findAll("td")]
Comments=[data.encode('utf-8') for data in comments]

As you can see, this table has two headers, "Code" and "Display", and some values in the rows. The expected output of my code is [code, display, min, minutes, happy, Hour, daily, day]

But this is the output:

['Code', 'Display', 'min', 'Minute', '', 'happy ',
'Hour', '', 'daily ', 'Day', '']

The output has '' at the 5th, 8th, and 11th positions of comments, and these values are not defined in the table. I think it may be because of the empty <td/> tags.

How can I change the code so that it does not capture u'' in the output? Thanks!

Answer Source

Sorry, I hadn't read your question carefully enough. You're right, the problem is the empty <td/> tags. Just adjust your list comprehension to include only cells with text:

comments = [td.get_text() for td in table.findAll('td') if td.text]
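
For anyone who wants to try this end to end, here is a minimal sketch. The HTML string is a made-up stand-in for the table in the question (the original markup is not fully shown), and it assumes the bs4 package with Python's built-in "html.parser". It also goes one step beyond the filter above: a cell containing only whitespace is still truthy, so the sketch strips each cell's text before testing it.

```python
from bs4 import BeautifulSoup

# Hypothetical table: a stand-in for the one in the question, with one
# empty cell and one whitespace-only cell.
html = """
<table>
<tr><td>Code</td><td>Display</td></tr>
<tr><td>min</td><td>Minute</td><td></td></tr>
<tr><td>happy </td><td>Hour</td><td> </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# Unfiltered text: empty cells come back as '' and padding is kept.
raw = [td.get_text() for td in table.find_all("td")]

# strip=True trims surrounding whitespace from each cell's text, so both
# empty and whitespace-only cells become '' and are dropped by the filter.
cells = (td.get_text(strip=True) for td in table.find_all("td"))
comments = [text for text in cells if text]

print(raw)
print(comments)
```

If the trailing spaces matter to you (as in 'happy '), keep the plain `if td.text` filter instead. On Python 3 the extra encode('utf-8') step is probably not needed, since get_text() already returns str.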
