Catastrophe - 1 month ago
Python Question

Python program printing result multiple times

I have code that uses a while loop to print whatever lies between the <a href> and </a> tags of a webpage. I can find the required indexes, extract what is written between them, and print it. The program is supposed to print each url only once, then advance the index until it finds the next <a href> and </a> pair, print whatever is between them, and continue to the end of the string, printing every new url on a separate line. Here's the code:

text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""

index = 0
a = 0
b = 0

while index < len(text):
    a = text.find('href', index)
    b = text.find('/a', index)
    print(text[a:b])
    index = index + 2
    if index >= len(text):
        print("End")
        break


However, when I run the program, it malfunctions as shown in the images.

Clearly the logic I'm using is wrong here. I know there are other easier ways to accomplish this task but I haven't got to the more complex stuff as I only recently started learning Python and would like to do it this way for now.

On the left is the first part of the Program. On the right is the second.

You can also clearly see the blank spaces being left out because the Program prints the url at every increment of the index.

Any kind of help would be greatly appreciated.

Answer

Your search starts with index set to 0, then finds the href text at position 22. You then increment the index to 2, search again, and again find the text at position 22.
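To see this concretely, here is a small sketch that calls find() with a few different start positions on the text from the question. The start positions chosen are only for illustration:

```python
text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""

# str.find(sub, start) returns the lowest index >= start where sub begins,
# so every start position up to 22 reports the same match.
first_hits = [text.find('href', start) for start in (0, 2, 4, 22)]
print(first_hits)

# Only a start position past 22 moves on to a later match.
next_hit = text.find('href', 23)
print(next_hit)
```

Because the loop in the question advances by 2 each time, it prints the same slice again on every pass until index finally moves past the match.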

If you want the search to continue past the last match, you need to set index to a position after the last match instead:

index = a + 1

Now the next text.find() call starts searching at index 23 instead.

You'll also need to test if the text is not found:

if a < 0 or b < 0:
    break
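Putting those pieces together, the loop might look like the sketch below. Two details are assumptions of this sketch rather than part of the advice above: the list named found is added only to make the result easy to inspect, and the search for '/a' starts at a instead of index, so that the closing marker found always comes after the current href.

```python
text = """ohsfhskfheifhsefis <a href = "fdnsfjsnfsnfns snkfsndfskj"</a>
<a href = "snfksnfsdf"</a>"""

found = []   # hypothetical helper list: one slice per link
index = 0

while True:
    a = text.find('href', index)   # next 'href' at or after index
    if a < 0:
        break                      # no more links
    b = text.find('/a', a)         # first '/a' after this 'href'
    if b < 0:
        break                      # unterminated link, stop
    found.append(text[a:b])
    print(text[a:b])
    index = a + 1                  # continue past the match just handled

print("End")
```

Each slice still contains the leading href = and the trailing < character, just as in the original program. Trimming those is a separate step.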

Rather than manually searching through text like this, consider using an HTML parser. Your search would be trivial with BeautifulSoup, for example.
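BeautifulSoup is a third-party package, so as a dependency-free illustration of the same idea, here is a minimal sketch using the standard library's html.parser module instead. The LinkCollector class name and the sample HTML are made up for this example, and the sample is well-formed, unlike the test string in the question.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> start tag it is fed."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


# Hypothetical sample input, not taken from the question
sample = ('<p>intro <a href="http://example.com/one">one</a> '
          '<a href="http://example.com/two">two</a></p>')

collector = LinkCollector()
collector.feed(sample)
for link in collector.links:
    print(link)
```

The parser handles spacing and quoting inside the tag, so no index arithmetic is needed.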
