Anurag Joshi - 20 days ago
Python Question

Import data of specific columns using BeautifulSoup

<h3 id="LABandServerNamingConvention-:"><a href="/display/ES/Lab+Org+Code+Summary+Listing">Lab Org Code Summary Listing</a>:</h3>
<div class="sectionColumnWrapper">
<div class="sectionMacro">
<div class="sectionMacroRow">
<div class="columnMacro">
<div class="table-wrap">
<table class="confluenceTable">
<tbody>
<tr>
<th class="confluenceTh">
<p>Prefix</p>
</th>
<th class="confluenceTh">
<p>Group</p>
</th>
<th class="confluenceTh">
<p>Contact</p>
</th>
<th class="confluenceTh">
<p>Dev/Test Lab</p>
</th>
<th class="confluenceTh">
<p>Performance</p>
</th>
</tr>
<tr>
<td class="confluenceTd">
<p>SEE00</p>
</td>
<td class="confluenceTd">
<p>Entertainment</p>
</td>


I have a table with 5 columns, of which only 2 are filled for this specific entry.
How do I get the row data from this HTML snippet into my Python code? I am using BeautifulSoup. This is what I have tried so far:

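import requests
from bs4 import BeautifulSoup

# url, username and password are assumed to be defined earlier
data = requests.get(url, auth=(username, password))
sample = data.content
soup = BeautifulSoup(sample, 'html.parser')
article_text = ' '
article = soup.find_all('td', {'class': 'confluenceTd'})
for element in article:
    # join every text node found inside the cell
    article_text += '\n' + ''.join(element.find_all(string=True))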


I want to somehow get 'SEE00' and 'Entertainment'.

Answer
from bs4 import BeautifulSoup
doc = '''<h3 id="LABandServerNamingConvention-:"><a href="/display/ES/Lab+Org+Code+Summary+Listing">Lab Org Code Summary Listing</a>:</h3>
<div class="sectionColumnWrapper"><div class="sectionMacro"><div class="sectionMacroRow"><div class="columnMacro"><div class="table-wrap"><table class="confluenceTable"><tbody><tr><th class="confluenceTh"><p>Prefix</p></th><th class="confluenceTh"><p>Group</p></th><th class="confluenceTh"><p>Contact</p></th><th class="confluenceTh"><p>Dev/Test Lab</p></th><th class="confluenceTh"><p>Performance</p></th></tr><tr><td class="confluenceTd"><p>SEE00</p></td><td class="confluenceTd"><p>Entertainment</p></td>
'''
soup = BeautifulSoup(doc, 'lxml')

for row in soup.find_all('tr'):
    print(row.get_text(separator='\t'))  # the tab separator is only for formatting; any string works

Output:

Prefix  Group   Contact Dev/Test Lab    Performance
SEE00   Entertainment   

You can restrict the loop with a slice, for example to skip the header row:

for row in soup.find_all('tr')[1:]:
    print(row.get_text(separator='\t'))

This prints only the data row:

SEE00   Entertainment
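
If you need the individual values rather than a printed line, one possible approach is to collect the cells of each row into a list. The following is only a sketch: the names rows, records, prefix and group are made up for illustration, the HTML is a shortened copy of the snippet from the question, and it uses the built-in 'html.parser' so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question; the data row is left
# unclosed, as in the original snippet.
doc = '''<table class="confluenceTable"><tbody>
<tr>
<th class="confluenceTh"><p>Prefix</p></th>
<th class="confluenceTh"><p>Group</p></th>
<th class="confluenceTh"><p>Contact</p></th>
<th class="confluenceTh"><p>Dev/Test Lab</p></th>
<th class="confluenceTh"><p>Performance</p></th>
</tr>
<tr>
<td class="confluenceTd"><p>SEE00</p></td>
<td class="confluenceTd"><p>Entertainment</p></td>
'''

soup = BeautifulSoup(doc, 'html.parser')

rows = []
for tr in soup.find_all('tr'):
    # Collect header (th) and data (td) cells alike, stripped of whitespace.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
    rows.append(cells)

headers, data_rows = rows[0], rows[1:]

# Pair each value with its column name. zip stops at the shorter list,
# so columns that have no cell in a row are simply absent from the dict.
records = [dict(zip(headers, row)) for row in data_rows]

prefix, group = data_rows[0][:2]
print(prefix, group)  # SEE00 Entertainment
print(records)        # [{'Prefix': 'SEE00', 'Group': 'Entertainment'}]
```

Keeping the headers separate from the data rows makes it easy to look a value up by column name (records[0]['Group']) instead of by position.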