Scriptomaniac - 7 months ago
Python Question

How can I access specific parts of a string in a txt file with Python?

So I have a large text file with a lot of lines of HTML that was nicely created by a web crawler. It's full of lines that look like the ones below. How can I get a new text file containing just the "desired text" instead of each entire line of HTML?

b'<b><a href="example.html" target="_blank">Desired Text 1</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 2</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 3</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 4</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 5</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 6</a></b>'

tfv
Answer

Have a look at BeautifulSoup; its documentation has a demo of exactly this problem:

Beautiful Soup Quick Intro

[EDIT] Here is a detailed solution for your case:

from bs4 import BeautifulSoup

text = """
b'<b><a href="example.html" target="_blank">Desired Text 1</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 2</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 3</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 4</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 5</a></b>'
b'<b><a href="example.html" target="_blank">Desired Text 6</a></b>'
"""

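# Parse the crawler output; the stray b'...' wrappers become plain text nodes.
soup = BeautifulSoup(text, 'html.parser')

# Print only the text inside each <a> tag (Python 3 print syntax). Calling
# soup.get_text() on the whole document would also print the b' and '
# characters that surround each line.
for link in soup.find_all('a'):
    print(link.get_text())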
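The question asks for a new text file rather than screen output. If installing BeautifulSoup isn't an option, here is a rough sketch of the same idea using only the standard library's `html.parser.HTMLParser`: it reads the crawler's file and writes the text of every `<a>` tag to a new file, one per line. It assumes the input is UTF-8 and that the desired text is always the content of an `<a>` tag; `LinkTextParser`, `extract_link_texts`, `convert_file` and the file names are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the text found inside every <a>...</a> element."""

    def __init__(self):
        super().__init__()
        self.in_link = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_link = True
            self.links.append("")  # start a new entry for this link

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Text outside <a> tags (such as the b' and ' wrappers) is ignored.
        if self.in_link:
            self.links[-1] += data


def extract_link_texts(html):
    """Return the stripped text of each <a> tag, in document order."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.links]


def convert_file(src_path, dst_path):
    """Hypothetical helper: read crawler output, write one link text per line.

    Assumes both files use UTF-8.
    """
    with open(src_path, encoding="utf-8") as src:
        texts = extract_link_texts(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        for text in texts:
            dst.write(text + "\n")
    return texts


if __name__ == "__main__":
    # Hypothetical file names; adjust to your crawler's output file.
    convert_file("crawler_output.txt", "desired_text.txt")
```

For the sample lines in the question, `desired_text.txt` would contain `Desired Text 1` through `Desired Text 6`, one per line. `HTMLParser` builds no document tree, so this approach suits simple, regular markup like these lines; for messier HTML, BeautifulSoup is the more forgiving choice.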