Catalin Besleaga - 7 months ago
HTML Question

XPath keep track of data from every table

I've been digging into this website for some time:

http://directory.ccnecommunity.org/reports/rptAccreditedPrograms_New.asp?sort=institution

I need to extract the data listed under Master's for each university.

As you may notice, not every university has Master's data, so I need to keep track of which ones do.

How can I keep track of the data in this situation?

My Python code using XPath so far:

import __future__
from lxml import html
import requests
from bs4 import BeautifulSoup

page = requests.get('http://directory.ccnecommunity.org/reports/rptAccreditedPrograms_New.asp?sort=institution')

soup = str(BeautifulSoup(page.content, 'html.parser'))

tree = html.fromstring(soup)

for table in tree.xpath('//table[@width="95%" and @align="center" and @class="center"]'):
    print('-- NEW TABLE -- \n')
    tab = table.xpath('.//table[@width="260px"]/tr/td[@style="width: 100%;"]/text()')
    print(tab)

print('Ready !!')


As you can see, it prints -- NEW TABLE --, but the tab variable is an empty list.

The tab variable should contain the data under Baccalaureate, Master's and Doctor of Nursing Practice from each table.

Answer

Try:

for table in tree.xpath('(//tr[ td[span="Baccalaureate"] or td[contains(span,"Master")] ]/ancestor::tr[1])'):
  print('-- NEW TABLE -- \n')
  tab = table.xpath('.//table[@width="260px"]/tr/td[@style="width: 100%;"]/text()')
  print(tab)
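
To keep track of which universities have no Master's data, one option is to store a result for every university and record None where the Master's row is missing. The following is only a minimal sketch: the SAMPLE markup, the h3 name element and the masters_by_university helper are hypothetical stand-ins and do not reproduce the real page's structure, so the XPath expressions would need adapting.

```python
from lxml import html

# Hypothetical markup, NOT the real page: one table per university,
# where the "Master's" row is optional.
SAMPLE = """
<html><body>
<table class="center">
  <tr><td><h3>Alpha University</h3></td></tr>
  <tr><td><span>Baccalaureate</span></td><td>BSN, accredited 2010</td></tr>
  <tr><td><span>Master's</span></td><td>MSN, accredited 2012</td></tr>
</table>
<table class="center">
  <tr><td><h3>Beta College</h3></td></tr>
  <tr><td><span>Baccalaureate</span></td><td>BSN, accredited 2008</td></tr>
</table>
</body></html>
"""


def masters_by_university(markup):
    """Map each university name to its Master's cells, or None if absent."""
    tree = html.fromstring(markup)
    results = {}
    for table in tree.xpath('//table[@class="center"]'):
        # string() returns the text of the first matching h3 in this table.
        name = table.xpath('string(.//h3)').strip()
        # The leading "." keeps the search inside the current table.
        cells = table.xpath('.//tr[td[contains(span, "Master")]]/td[2]/text()')
        # An empty result means this university has no Master's row.
        results[name] = [c.strip() for c in cells] or None
    return results


print(masters_by_university(SAMPLE))
```

With this sample, Alpha University maps to ['MSN, accredited 2012'] and Beta College maps to None, so every university appears in the result whether or not it has Master's data.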