Leo Mingo - 8 months ago
HTML Question

How to tree-search an HTML file for a tag with no specific type

I'm new to Python. I'd like to know how to get the string from HTML like the following:

<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>

I tried

from lxml import html
import requests
page = requests.get("url")
tree = html.fromstring(page.content)

but I don't know what to do next to get the string.

Thanks for your help.


Be careful with tree-searching HTML files, because every so often developers move things around, which ends up breaking old projects. I feel it's safer to go with string manipulation: if you plan it well, you won't have to reprogram anything even if the developers decide to wrap your target in one more container.
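To make that claim concrete: if you anchor on text that is specific to your target (here, the end of the span's style attribute), an extra wrapper around it doesn't change the result. This is a minimal sketch using `str.partition`; the `extract_between` name, the `marker` string and the wrapping `<div class="box">` are hypothetical choices for illustration.

```python
def extract_between(s, before, after):
    # Return the text between the first `before` and the next `after`,
    # or None if either marker is missing.
    _, found, rest = s.partition(before)
    if not found:
        return None
    inner, found, _ = rest.partition(after)
    return inner if found else None

# Hypothetical anchor: text that only appears at the end of the target's opening tag.
marker = 'font-weight: 600;">'
original = '<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>'
# The same span after a redesign wraps it in one more container.
wrapped = '<div class="box">' + original + '</div>'

print(repr(extract_between(original, marker, "</span>")))  # ' string '
print(repr(extract_between(wrapped, marker, "</span>")))   # ' string '
```

The trade-off is that the anchor still breaks if the style attribute itself changes, so pick a marker that is unlikely to be edited.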

It's crazy how much you can accomplish with just the split function.

text = your_string.split(">")[1].split("<")[0]
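")[1].split("<")[0]">
Broken into steps, the one-liner above works like this (a sketch, with `your_string` set to the snippet from the question):

```python
your_string = '<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>'

# Step 1: split on ">"; index 1 is everything after the opening tag's ">".
after_open_tag = your_string.split(">")[1]   # ' string </span'
# Step 2: split that on "<"; index 0 is everything before the closing tag.
text = after_open_tag.split("<")[0]          # ' string '

print(repr(text))
print(repr(text.strip()))  # surrounding whitespace removed: 'string'
```

Index [1] works here only because the span's opening tag is the first tag in the string; on a whole page you would first need to cut the string down to the part around your target.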

Here are a couple of tools I made and like to use for getting a string when I know what comes before and after it.

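# Sentinel characters assumed never to appear in the input text.
unique = "\x00"
second_unique = "\x01"

def get_str_between(s, before, after):
    # gets the first substring between two strings in a string in python
    # by: Cody Kochmann
    return s.replace(before, unique).replace(after, unique).split(unique)[1]

def get_every_str_between(s, before, after):
    # returns a list of substrings between "before" and "after"
    # by: Cody Kochmann
    pieces = s.replace(before, unique).replace(after, second_unique).split(unique)
    # every piece after the first started right after a "before"; keep the ones
    # that also contain an "after", cut at that "after"
    return [p.split(second_unique)[0] for p in pieces[1:] if second_unique in p]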

target = '<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>'

", "</")">
print(get_every_str_between(target, ">", "</"))


[' string ']
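If you'd rather not rely on sentinel characters, the standard library's `re` module can collect every match in one call. This swaps in a regular expression instead of the replace/split technique above; `every_str_between` is a hypothetical name, and this is only a sketch that should agree with get_every_str_between on simple inputs like these.

```python
import re

def every_str_between(s, before, after):
    # Hypothetical regex counterpart of get_every_str_between.
    b, a = re.escape(before), re.escape(after)
    # (?:(?!b).)*? lazily matches characters but never crosses another `before`,
    # so each result starts at the nearest `before` preceding an `after`.
    pattern = b + r"((?:(?!" + b + r").)*?)" + a
    # re.DOTALL lets a match span newlines in real HTML.
    return re.findall(pattern, s, flags=re.DOTALL)

target = '<span style="color: blue; font-size: 36px; font-weight: 600;"> string </span>'
print(every_str_between(target, ">", "</"))                      # [' string ']
print(every_str_between("<b>one</b><i>two</i>", ">", "</"))       # ['one', 'two']
```

Without the `(?!b)` guard, a plain `>(.*?)</` would return `'<i>two'` for the second tag, because the match would start at the `>` of `</b>`.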