345243lkj 345243lkj - 2 months ago 10
Python Question

How to extract certain strings when they occur adjacently with BeautifulSoup

I'm parsing an HTML page with BeautifulSoup, and the part I'm interested in looks like this:

<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>


I'm interested in extracting the port_name, TEKIRDAG. However, there are many port names that are labeled identically. Is there a way to extract port_name only if it occurs after the string 'Departure for'?

Answer

You can locate the text node and get the next sibling:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: soup.find(text="Departure for ").next_sibling.get_text()
Out[4]: u'TEKIRDAG'
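
If the page contains several such cells, or the whitespace around 'Departure for' is not always exactly one trailing space, an exact-match lookup may miss some of them. Below is a minimal sketch of one way to handle that, assuming a reasonably recent bs4 (where string= is the newer name for the text= argument). The helper name departure_ports and the second "Arrival at" cell are made up for illustration and are not from the question.

```python
import re
from bs4 import BeautifulSoup

# The first cell is the markup from the question. The second cell is a
# hypothetical "Arrival at" entry, added only to show that it gets skipped.
html = """
<table><tr>
<td><i class="fa fa-circle"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>
<td><i class="fa fa-circle"></i>Arrival at <a href="/en/ais/details/ports/1/port_name:EXAMPLE/_:0" title="View details for: EXAMPLE">EXAMPLE</a> </td>
</tr></table>
"""


def departure_ports(markup):
    """Return the link text of every <a> that follows a 'Departure for' text node."""
    soup = BeautifulSoup(markup, "html.parser")
    ports = []
    # A compiled regex is matched with search(), so surrounding whitespace
    # in the text node does not have to match exactly.
    for node in soup.find_all(string=re.compile(r"Departure for")):
        # Look for the next <a> sibling rather than relying on .next_sibling,
        # which could also be a whitespace-only text node.
        link = node.find_next_sibling("a")
        if link is not None:
            ports.append(link.get_text(strip=True))
    return ports


print(departure_ports(html))
```

Here the "Arrival at" link is ignored because its preceding text node does not match the pattern, so only the departure port is collected.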