I'm trying to extract the text between the tags of an HTML page by searching for a keyword. Here is an example:
<p>PhD, 2017, Subject,<br />
ABC University </p>
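import re
import requests
from bs4 import BeautifulSoup
r = requests.get(site)  # site: URL of the page being scraped
soup = BeautifulSoup(r.content, "lxml")
for elems in soup(text=re.compile('PhD')):
    val = elems.find_parent('p').getText()  # text of the enclosing <p>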
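For comparison, the keyword-based approach above can produce a clean, single-line string if the parent's text is collected with `get_text`'s separator and `strip` options, which join the pieces split by `<br />` and trim the surrounding whitespace. This is a minimal sketch that assumes BeautifulSoup 4 is installed. The helper name `paragraphs_with_keyword` is hypothetical, and the sample markup is the snippet from the question rather than the real page. It uses the built-in `"html.parser"` so lxml isn't required.

```python
import re

from bs4 import BeautifulSoup

# Sample markup from the question (hypothetical stand-in for r.content)
SAMPLE_HTML = """<p>PhD, 2017, Subject,<br />
ABC University </p>"""


def paragraphs_with_keyword(markup, keyword):
    """Return the whitespace-normalised text of each <p> containing keyword."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    seen = set()  # avoid duplicates when several text nodes in one <p> match
    # string= matches individual text nodes (NavigableStrings), not whole tags
    for node in soup.find_all(string=re.compile(re.escape(keyword))):
        p = node.find_parent("p")
        if p is not None and id(p) not in seen:
            seen.add(id(p))
            # " " joins the pieces separated by <br />; strip=True trims each piece
            results.append(p.get_text(" ", strip=True))
    return results


print(paragraphs_with_keyword(SAMPLE_HTML, "PhD"))
```

Because `get_text(" ", strip=True)` strips each text node before joining, the newline after `<br />` and the trailing space before `</p>` do not end up in the result.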
You can try using lxml.html to get the desired text:
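import requests
import lxml.html as html

source = requests.get(site).content  # site: URL of the page being scraped
html_obj = html.fromstring(source)
# Assumes the <p> follows an <h4>Education</h4> heading on the page
texts = html_obj.xpath('//h4[.="Education"]/following-sibling::p/text()')
my_text = " ".join(text.strip() for text in texts)
print(my_text)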
Output:
PhD, 2017, Subject, ABC University
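If the page has no reliable heading to anchor on, the same lxml approach can search by keyword instead. The XPath `contains(., $kw)` tests the full string value of each `<p>`, including the text after `<br />`. This is a minimal sketch assuming lxml is installed. The helper name `paragraphs_containing` is hypothetical, and the sample markup is the snippet from the question. The keyword is passed as an XPath variable (`kw=`), which lxml supports, so quotes in the keyword cannot break the expression.

```python
import lxml.html

# Sample markup from the question (hypothetical stand-in for the fetched page)
SAMPLE_HTML = """<p>PhD, 2017, Subject,<br />
ABC University </p>"""


def paragraphs_containing(markup, keyword):
    """Return the whitespace-collapsed text of every <p> that contains keyword."""
    root = lxml.html.fromstring(markup)
    # $kw is bound from the kw= keyword argument; "." is the <p>'s full text
    paragraphs = root.xpath("//p[contains(., $kw)]", kw=keyword)
    # text_content() concatenates all descendant text; split/join collapses whitespace
    return [" ".join(p.text_content().split()) for p in paragraphs]


print(paragraphs_containing(SAMPLE_HTML, "PhD"))
```

Collapsing whitespace with `split()`/`join` also handles the newline after `<br />`, so the result matches the single-line output above.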