Question by user1216583, 11 months ago

Regex: everything between two HTML tags

I'm trying to get some information from a webpage using a regex in Visual Basic 2010.

It looks something like this:

<SPAN CLASS="clear"></SPAN>
<h2> blabla </h2>
<h2> blabla </h2>
<b> blabla </b>

etc.

<SPAN CLASS="clear"></SPAN>

What I want is everything between the two <SPAN CLASS="clear"></SPAN> tags, including the h2 tags and any other HTML tags in between.

Is this possible?

I've already tried (.*?), .* and \w*, but none of them return anything.

Answer

It's probably best to use a proper HTML (or XML) parser for this, but I'm assuming it's a one-off scrape or similar.
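For comparison, here is a minimal sketch of the parser route using LINQ to XML (System.Xml.Linq). It only works when the markup is well-formed XML: the sample in the question happens to be, but real-world HTML often is not, and then a dedicated HTML parser is the better choice. The module and function names are hypothetical, and the fragment is wrapped in an assumed <root> element so it can be parsed as one document.

```vbnet
Imports System
Imports System.Linq
Imports System.Text
Imports System.Xml.Linq

Module ParserDemo
    ' Hypothetical helper: returns the markup of every node between the first
    ' two <SPAN CLASS="clear"> elements, or Nothing if there are fewer than two.
    ' Assumes the fragment is well-formed XML.
    Function NodesBetweenClearSpans(fragment As String) As String
        ' Wrap the fragment in a single root element so XElement.Parse accepts it.
        Dim root As XElement = XElement.Parse("<root>" & fragment & "</root>")

        ' Find the marker spans by element name and CLASS attribute.
        Dim spans = root.Elements("SPAN").Where(
            Function(e) e.Attribute("CLASS") IsNot Nothing AndAlso
                        e.Attribute("CLASS").Value = "clear").ToList()
        If spans.Count < 2 Then Return Nothing

        Dim sb As New StringBuilder()
        ' NodesAfterSelf walks the following siblings in document order;
        ' stop when the second marker is reached.
        For Each node As XNode In spans(0).NodesAfterSelf()
            If node Is spans(1) Then Exit For
            sb.Append(node.ToString())
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        Dim html As String = "<SPAN CLASS=""clear""></SPAN>" &
                             "<h2> blabla </h2><b> blabla </b>" &
                             "<SPAN CLASS=""clear""></SPAN>"
        Console.WriteLine(NodesBetweenClearSpans(html))
    End Sub
End Module
```

Note that whitespace-only text between elements is dropped by the default parse options, so the result is the nodes themselves rather than the original text byte for byte.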

If I understand you correctly, this strips out every tag and leaves only the text between them:

' Remove every tag (anything from "<" to the next ">", across line breaks)
Dim regex As New Text.RegularExpressions.Regex("<[^>]*>")
Dim result As String = regex.Replace(yourHtml, String.Empty)
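The snippet above removes the tags, whereas the question asks for everything between the two SPAN markers with the tags kept. A likely reason patterns like .* returned nothing is that in .NET "." does not match a newline unless RegexOptions.Singleline is set, so a match cannot span the several lines of the sample. A minimal sketch, assuming the marker is always exactly <SPAN CLASS="clear"></SPAN> (the module and function names are hypothetical):

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module BetweenMarkersDemo
    ' Hypothetical helper: returns everything between the first two
    ' <SPAN CLASS="clear"></SPAN> markers (tags included), or Nothing.
    Function ExtractBetweenMarkers(html As String) As String
        ' Regex.Escape makes the literal marker safe to embed in a pattern.
        Dim marker As String = Regex.Escape("<SPAN CLASS=""clear""></SPAN>")
        ' Lazy (.*?) stops at the first closing marker.
        Dim pattern As String = marker & "(.*?)" & marker
        ' Singleline lets "." match newlines; IgnoreCase also accepts
        ' <span class="clear"></span>.
        Dim m As Match = Regex.Match(html, pattern,
                                     RegexOptions.Singleline Or RegexOptions.IgnoreCase)
        If m.Success Then Return m.Groups(1).Value
        Return Nothing
    End Function

    Sub Main()
        Dim html As String = "<SPAN CLASS=""clear""></SPAN>" & vbLf &
                             "<h2> blabla </h2>" & vbLf &
                             "<b> blabla </b>" & vbLf &
                             "<SPAN CLASS=""clear""></SPAN>"
        Console.WriteLine(ExtractBetweenMarkers(html))
    End Sub
End Module
```

Without RegexOptions.Singleline the same pattern would fail on this input, because the text between the markers contains line breaks.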

To get just the contents of the h2 tags, you could use this:

' Group 1 captures the text inside each <h2>...</h2> pair
Dim regex As New Text.RegularExpressions.Regex("<\s*h2[^>]*>(.*?)<\s*/\s*h2>")
Dim results As New Text.StringBuilder
For Each m As Text.RegularExpressions.Match In regex.Matches(yourHtml)