George J - 1 year ago
Python Question

Python: how to find all occurrences of a matched substring?

I have a big string: an HTML page. I need to find all the names of flash drives,
i.e. I need to get the content between the double quotes in:

data-name="USB Flash-drive Leef Fuse 32Gb">

So I need the string between the double quotes. Please don't mention BeautifulSoup; I need to do it without BeautifulSoup and preferably without regular expressions, though regular expressions are also acceptable.

I tried to use this:

p = re.compile('(?<=")[^,]+(?=")')
result = p.match(html_str)

but result is None.
But on [site name missing] it worked: [screenshot omitted]

Answer Source



from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # tag = 'sometag'
        for attr in attrs:
            # attr = ('data-name', 'USB Flash-drive Leef Fuse 32Gb')
            if attr[0] == 'data-name':
                # attr[1] is the attribute value, i.e. the text between the quotes
                print(attr[1])

parser = MyHTMLParser()
parser.feed('<sometag data-name="USB Flash-drive Leef Fuse 32Gb">hello  world</sometag>')


USB Flash-drive Leef Fuse 32Gb

I've added some comments to the code to show you what kind of data structure is returned by the parser.

It should be very easy to build from here.
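
As one possible way to build on this, here is a minimal sketch that collects every data-name value into a list instead of handling them one at a time. The class name DataNameCollector, the helper find_data_names, and the second product name in the sample are made up for illustration; only HTMLParser, handle_starttag, feed and close come from the standard library.

```python
from html.parser import HTMLParser


class DataNameCollector(HTMLParser):
    """Hypothetical helper: stores the value of every data-name attribute."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples for the current start tag
        for name, value in attrs:
            if name == 'data-name':
                self.names.append(value)


def find_data_names(html_str):
    """Return all data-name values in document order."""
    parser = DataNameCollector()
    parser.feed(html_str)
    parser.close()
    return parser.names


# Sample input; the second drive name is invented for the example.
sample = ('<div data-name="USB Flash-drive Leef Fuse 32Gb">a</div>'
          '<div data-name="USB Flash-drive Example 16Gb">b</div>')
print(find_data_names(sample))
# ['USB Flash-drive Leef Fuse 32Gb', 'USB Flash-drive Example 16Gb']
```

Keeping the results on the parser instance means the same class can be fed a whole page in one call or in several chunks.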

Just pass your HTML to parser.feed(), and it will be parsed the same way. Refer to the html.parser docs for the details.
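
Since the question also allows solutions without a parser, here are two rough sketches, not a replacement for the parser approach. The original attempt returned None because re.match only tries to match at the very start of the string; re.findall scans the whole string instead. Both function names below are hypothetical, and both assume the attribute is always written as data-name="..." with double quotes and no spaces around the equals sign. Neither decodes HTML entities, which the parser does for you.

```python
import re

# Capture everything between the quotes that follow data-name=
DATA_NAME_RE = re.compile(r'data-name="([^"]*)"')


def find_data_names_regex(html_str):
    """Regex version: findall returns the captured group for every match."""
    return DATA_NAME_RE.findall(html_str)


def find_data_names_plain(html_str, marker='data-name="'):
    """Regex-free version: repeatedly search with str.find."""
    names = []
    pos = html_str.find(marker)
    while pos != -1:
        start = pos + len(marker)
        end = html_str.find('"', start)  # closing quote of the value
        if end == -1:
            break  # unterminated attribute, stop scanning
        names.append(html_str[start:end])
        pos = html_str.find(marker, end + 1)
    return names


# Sample input; the second drive name is invented for the example.
html_str = ('<li data-name="USB Flash-drive Leef Fuse 32Gb">x</li>'
            '<li data-name="USB Flash-drive Example 8Gb">y</li>')

# The pattern from the question, used with match: None, because the
# string does not match at position 0.
print(re.compile('(?<=")[^,]+(?=")').match(html_str))

print(find_data_names_regex(html_str))
print(find_data_names_plain(html_str))
```

Both functions return the names in the order they appear on the page. If the markup might use single quotes or unusual spacing, the parser-based approach is the safer choice.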
