345243lkj - 4 months ago
Python Question

How to extract certain strings when they occur adjacently with BeautifulSoup

I'm parsing an HTML page with BeautifulSoup, and the parts I'm interested in look like this:

<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>

I'm interested in extracting the port name, TEKIRDAG; however, there are many port names that are labeled identically. Is there a way to extract it only if it occurs after the string 'Departure for'?


You can locate the text node and get the next sibling:

In [1]: from bs4 import BeautifulSoup

In [2]: data = """<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: soup.find(text="Departure for ").next_sibling.get_text()
Out[4]: u'TEKIRDAG'
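
Since the page contains many identically labeled port links, the same idea can be extended to collect every port that follows a 'Departure for' text node. The following is only a sketch: the helper name `departure_ports`, the shortened sample markup, and the 'Arrival at' / AMBARLI row are made up for illustration and are not taken from the real page. It matches the text with a regular expression, so the exact trailing whitespace does not matter, and it assumes a Beautiful Soup version that accepts the `string=` argument (older versions use `text=`, as in the session above).

```python
import re

from bs4 import BeautifulSoup
from bs4.element import Tag


def departure_ports(html):
    """Return the text of each <a> that directly follows a 'Departure for' text node."""
    soup = BeautifulSoup(html, "html.parser")
    ports = []
    # Find every text node containing the marker, regardless of surrounding whitespace.
    for node in soup.find_all(string=re.compile(r"Departure for")):
        sibling = node.next_sibling
        # Only accept the sibling if it really is a link element.
        if isinstance(sibling, Tag) and sibling.name == "a":
            ports.append(sibling.get_text(strip=True))
    return ports


# Hypothetical sample: one departure row and one unrelated row with the same link markup.
sample = (
    '<td><i class="fa fa-circle"></i>Departure for '
    '<a href="#" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>'
    '<td><i class="fa fa-circle"></i>Arrival at '
    '<a href="#" title="View details for: AMBARLI">AMBARLI</a> </td>'
)

print(departure_ports(sample))
```

Checking that the sibling is an `<a>` tag guards against cases where the marker text is followed by something else, such as another text node or no sibling at all.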