I'm new to Python. How can I get the string out of HTML like the following?
<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>
import requests
from lxml import html

page = requests.get("url")
tree = html.fromstring(page.content)
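The question stops right after building the tree. With lxml, an XPath query such as tree.xpath('//span/text()') should be the usual next step. For a version that runs without third-party packages, here is a minimal sketch that swaps in the standard library's html.parser.HTMLParser instead of lxml. It collects the text found inside <span> tags of the snippet above; the class name SpanTextCollector is hypothetical.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Hypothetical helper: collects text that appears inside <span> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # number of <span> tags currently open
        self.texts = []  # text chunks seen while inside a <span>

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase
        if tag == "span":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # keep only text that sits inside at least one open <span>
        if self.depth:
            self.texts.append(data)


snippet = '<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>'
parser = SpanTextCollector()
parser.feed(snippet)
parser.close()
print(parser.texts)  # [' string ']
```

Call .strip() on each collected chunk if the surrounding spaces are not wanted.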
Be careful with searching the tree of HTML pages: every so often the site's developers move things around, which ends up breaking old projects. I feel it's safer to go with string manipulation, because if you plan it well, you won't have to rewrite your code even if the developers decide to wrap your target in one more container.
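One way to read "plan it well" is to anchor on a distinctive piece of text right before the target instead of on tag positions. The sketch below assumes the style attribute stays the same; text_after_marker is a hypothetical helper built on str.partition, and the extra <div> wrapper is made up to imitate a later site change.

```python
def text_after_marker(page, marker, end="<"):
    # Hypothetical helper: return the text between the first `marker`
    # and the next `end`, or None if the marker is not found.
    _, found, rest = page.partition(marker)
    if not found:
        return None
    return rest.split(end, 1)[0]


original = '<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>'
# Same target, wrapped in one more (made-up) container
rewrapped = '<div class="new-wrapper">' + original + '</div>'
marker = 'font-weight: 600;">'

print(repr(text_after_marker(original, marker)))   # ' string '
print(repr(text_after_marker(rewrapped, marker)))  # ' string '
```

The extra wrapper does not affect the result, because nothing in the lookup depends on how deeply the span is nested.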
It's surprising how much you can accomplish with just the split method.
text = your_string.split(">")[1].split("<")[0]
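Because str.split returns a list, the two splits can't be chained directly (a list has no split method); you index into each result. Applied to the snippet from the question, step by step (your_string is just the example HTML here):

```python
your_string = '<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>'

# Element [1] is everything between the first ">" and the next ">": ' string </span'
after_open_tag = your_string.split(">")[1]

# Element [0] of the second split is everything before the next "<"
text = after_open_tag.split("<")[0]

print(repr(text))    # ' string '
print(text.strip())  # string
```

This relies on the target being the text right after the first ">" in the string, so it suits short, known snippets like this one.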
Here are a couple of tools I made and like to use for getting a string when I know what comes before it and after it.
def get_str_between(s, before, after):
    # gets a substring between two strings in a string in python
    # by: Cody Kochmann
    unique = "~~~~the~obvious~way~to~do~it~~~~"
    # mark both delimiters with the same sentinel, then take the piece after the first one
    return s.replace(before, unique).replace(after, unique).split(unique)[1]


def get_every_str_between(s, before, after):
    # returns a list of substrings between "before" and "after"
    # by: Cody Kochmann
    unique = "~~~~this~is~the~obvious~way~to~do~it~~~~"
    second_unique = "~~~~i~think~you'll~like~this~~~~~"
    # every "before" starts a new chunk; every "after" gets its own sentinel
    chunks = s.replace(before, unique).replace(after, second_unique).split(unique)
    out = []
    for chunk in chunks:
        # only chunks containing an "after" sentinel hold a complete match
        if second_unique in chunk:
            out.append(chunk.split(second_unique)[0])
    return out


target = '<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>'
print(get_every_str_between(target, ">", "</"))
[' string ']