345243lkj - 1 year ago
Python Question

How to extract certain strings when they occur adjacently with BeautifulSoup

I'm parsing an HTML page with BeautifulSoup, and the parts I'm interested in look like this:

<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>

I'm interested in extracting the port name, TEKIRDAG; however, there are many port names that are labeled identically. Is there a way to extract the port name only if it occurs after the string 'Departure for'?
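The following minimal sketch makes the ambiguity concrete. It uses hypothetical markup: the "Arrival at" row and the ISTANBUL link are invented, and only the TEKIRDAG row mirrors the snippet above. Matching port links by their href alone returns every port, because nothing on the <a> tag itself says whether it is a departure:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the "Arrival at" row and ISTANBUL are invented for this
# example; only the "Departure for ... TEKIRDAG" row comes from the question.
html = """
<table>
<tr><td><i></i>Arrival at <a href="/en/ais/details/ports/1/port_name:ISTANBUL">ISTANBUL</a> </td></tr>
<tr><td><i></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG">TEKIRDAG</a> </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Selecting port links by their href pattern matches every port link,
# because the links carry no "departure" marker of their own.
all_ports = [a.get_text() for a in soup.find_all("a", href=re.compile(r"port_name:"))]
print(all_ports)  # ['ISTANBUL', 'TEKIRDAG']
```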

Answer

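You can locate the "Departure for " text node and take its next sibling, which is the <a> element holding the port name. Searching with a plain string matches only a text node whose entire content is that string, including the trailing space: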

In [1]: from bs4 import BeautifulSoup

In [2]: data = """<i class="fa fa-circle align-middle font-80" style="color: #45C414; margin-right: 15px"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG/_:3525d580eade08cfdb72083b248185a9" title="View details for: TEKIRDAG">TEKIRDAG</a> </td>"""

In [3]: soup = BeautifulSoup(data, "html.parser")

In [4]: soup.find(string="Departure for ").next_sibling.get_text()
Out[4]: 'TEKIRDAG'
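If a page has many such rows, or the whitespace around the label varies, a somewhat more tolerant variant may help. This is a sketch under stated assumptions, not the answerer's code: it runs over a hypothetical two-row table (the "Arrival at" row is invented), matches the label with a regular expression instead of an exact string, and uses find_next_sibling("a") so that any nodes between the label and the link are skipped. The string= keyword assumes BeautifulSoup 4.4 or later; departure_ports is a hypothetical helper name.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page fragment: two rows, only one of which is a departure.
# The "Arrival at" label and the ISTANBUL port are made up for illustration.
html = """
<table>
<tr><td><i class="fa fa-circle"></i>Arrival at <a href="/en/ais/details/ports/1/port_name:ISTANBUL">ISTANBUL</a> </td></tr>
<tr><td><i class="fa fa-circle"></i>Departure for <a href="/en/ais/details/ports/17787/port_name:TEKIRDAG">TEKIRDAG</a> </td></tr>
</table>
"""


def departure_ports(markup):
    """Return the text of every <a> that follows a 'Departure for' text node."""
    soup = BeautifulSoup(markup, "html.parser")
    ports = []
    # A compiled regex is matched against text nodes with re.search, so extra
    # surrounding whitespace does not prevent a match.
    for label in soup.find_all(string=re.compile(r"Departure for")):
        # find_next_sibling("a") returns the first later sibling that is an
        # <a> tag, skipping other nodes in between; .next_sibling would
        # return whatever node comes immediately next.
        link = label.find_next_sibling("a")
        if link is not None:
            ports.append(link.get_text(strip=True))
    return ports


print(departure_ports(html))  # ['TEKIRDAG']
```

The trade-off of the regex is that any text node merely containing the phrase also matches; tighten the pattern (for example r"^\s*Departure for\s*$") if that is a concern.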