Leo Lion - 1 month ago
HTML Question

Read the text of a web page in Python

I know this question, or similar ones, has been asked before, but the answers I found didn't solve my problem, so I'm asking here.

How can I get the text of an HTML page in a form that I can compare against other given values?

Let's say I have this web page:

<html>
<head>
<title>This is my page</title>

<center>
<div class="mon_title">Some title here</div>
<table class="mon_list" >
<tr class='list'><th class="list" align="center"></th><th class="list" align="center">Set 1</th><th class="list" align="center">Set 2</th><th class="list" align="center">Set 4</th><th class="list" align="center">Set 5</th><th class="list" align="center">Set 6</th><th class="list" align="center">Set 7</th><th class="list" align="center">Set 8</th><th class="list" align="center">Set 9</th><th class="list" align="center">Set 10</th><th class="list" align="center">Set 11</th><th class="list" align="center">Set 12</th></tr>
<tr class='list even'><td class="list" align="center">Value 1</td><td class="list" align="center">Value 2</td><td class="list" align="center">Value 3</td><td class="list" align="center">Value 4</td><td class="list" align="center">Value 5</td><td class="list">Value 6</td><td class="list">Value 7</td><td class="list" align="center">Value 8</td><td class="list" align="center">Value 9</td><td class="list" align="center">Value 10</td><td class="list" align="center">Value 11</td><td class="list" align="center">Value 12</td></tr>
<tr class='list even'><td class="list" align="center">Value 1</td><td class="list" align="center">Value 2</td><td class="list" align="center">Value 3</td><td class="list" align="center">Value 4</td><td class="list" align="center">Value 5</td><td class="list">Value 6</td><td class="list">Value 7</td><td class="list" align="center">Value 8</td><td class="list" align="center">Value 9</td><td class="list" align="center">Value 10</td><td class="list" align="center">Value 11</td><td class="list" align="center">Value 12</td></tr>
</table>


Sorry for any typos or missing parts. I hope you get the idea of the page.
My program should check whether certain given values appear in the table, for example "Is Value 2 somewhere in it?", and if so, follow up with "Is Value 5 in the same row?"

Is that generally possible?
How much effort would be needed to construct the program?

All I have so far is a download of the full HTML page, using this Python code:

import requests

url = 'http://some.random.site.com/you/ad/here'
print(requests.get(url).text)


That gives me the HTML source shown above. Instead, I want what you get when you press Ctrl+A on a website and copy and paste it into an editor.

PS: I'm fairly new to programming, so sorry if there are concepts I don't fully understand.
Also, sorry for my English, I'm German...

Answer Source

You can use urllib and re to find the values:

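import re
import urllib.request

url = 'http://some.random.site.com/you/ad/here'  # the URL from the question

# Decode the bytes to text (assuming the page is UTF-8);
# str() on bytes would return the "b'...'" repr instead.
data = urllib.request.urlopen(url).read().decode('utf-8')

# Raw string so \d reaches the regex engine unchanged
values = re.findall(r"Value \d+", data)
print(values)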

Output:

['Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12', 'Value 1', 'Value 2', 'Value 3', 'Value 4', 'Value 5', 'Value 6', 'Value 7', 'Value 8', 'Value 9', 'Value 10', 'Value 11', 'Value 12']
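
The flat list above tells you which values occur, but not which row they came from, so it cannot answer "is Value 5 in the same row as Value 2?". One possible way to keep the row structure, using only the standard library's html.parser, is sketched below. The names TableRowParser, rows_containing and sample are made up for this illustration, and the sample table is a shortened stand-in for the page in the question, not the real data.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_cell = False   # True while we are inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])            # start a new row
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")        # start a new, empty cell

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data       # text may arrive in several pieces


def rows_containing(html, *wanted):
    """Return the rows whose cells include every string in `wanted`."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    cleaned = [[cell.strip() for cell in row] for row in parser.rows]
    return [row for row in cleaned if all(w in row for w in wanted)]


# Shortened, hypothetical stand-in for the page in the question
sample = """
<table class="mon_list">
<tr class='list'><th class="list"></th><th class="list">Set 1</th><th class="list">Set 2</th></tr>
<tr class='list even'><td class="list">Value 1</td><td class="list">Value 2</td><td class="list">Value 5</td></tr>
<tr class='list even'><td class="list">Value 2</td><td class="list">Value 3</td><td class="list">Value 4</td></tr>
</table>
"""

matches = rows_containing(sample, "Value 2", "Value 5")
print(matches)  # [['Value 1', 'Value 2', 'Value 5']]
```

For the real page you would pass the downloaded text (for example requests.get(url).text) instead of sample. Cells are compared as whole strings, so "Value 1" does not accidentally match "Value 10", which a plain substring search on the raw HTML would do. Joining the cells of each row with spaces also gives something close to the copy-and-paste text the question asks for.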