Data Data - 24 days ago 11
Vb.net Question

Regex is not matching all alternate groups

The input string is:

<input type="hidden" name="locale" value="us">


The regex pattern is:

Dim r As New Regex("<input\s{0,}(?:(name|type|value)=""([^""]+)""\s{0,})+>")


The code being used:

If r.IsMatch(s) Then
    For Each m As Match In r.Matches(s)
        Debug.Print(m.ToString)
        For i As Integer = 0 To m.Groups.Count - 1
            Debug.Print(New String(" "c, i + 1) & "-" & m.Groups(i).Value)
        Next
    Next
End If


The output:

<input type="hidden" name="locale" value="us">
-<input type="hidden" name="locale" value="us">
-value
-us


I would expect it to match:

-type
-hidden
-name
-locale
-value
-us


The alternation is tried in the order in which it is written. Perhaps that is why only one name/value pair is printed, and it is the last one matched.

Answer

Parsing HTML with regex is fragile and not a good idea. Use HtmlAgilityPack or a similar library that is built for this. See "How do you parse an HTML in vb.net".
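For illustration, here is a minimal sketch of the parser-based route. It assumes the HtmlAgilityPack NuGet package is referenced in the project. The module name and the helper GetInputAttributes are hypothetical, introduced only for this example, and the code has not been run against the asker's real pages.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack ' Assumption: the HtmlAgilityPack NuGet package is installed

Public Module HapSketch
    ' Hypothetical helper: collects the name/value pairs of every attribute
    ' found on <input> elements in the given HTML fragment.
    Public Function GetInputAttributes(html As String) As List(Of KeyValuePair(Of String, String))
        Dim result As New List(Of KeyValuePair(Of String, String))()
        Dim doc As New HtmlDocument()
        doc.LoadHtml(html)

        ' SelectNodes returns Nothing when the XPath query matches no nodes.
        Dim nodes As HtmlNodeCollection = doc.DocumentNode.SelectNodes("//input")
        If nodes Is Nothing Then Return result

        For Each node As HtmlNode In nodes
            For Each attr As HtmlAttribute In node.Attributes
                result.Add(New KeyValuePair(Of String, String)(attr.Name, attr.Value))
            Next
        Next
        Return result
    End Function

    Public Sub Main()
        Dim s As String = "<input type=""hidden"" name=""locale"" value=""us"">"
        For Each p In GetInputAttributes(s)
            Console.WriteLine(" -" & p.Key)
            Console.WriteLine(" -" & p.Value)
        Next
    End Sub
End Module
```

Unlike the regex, this does not depend on which attributes appear or how they are quoted, which is the main reason to prefer a parser here.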

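To answer your question: a group inside a repeated construct captures once per iteration, and Group.Value returns only the last of those captures. That is why you see just "value" and "us". Every capture is kept in the group's Captures collection, which your code does not read. Here is a simple snippet showing how to obtain your desired result using the same regex: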

Imports System
Imports System.Text.RegularExpressions

Public Class Test
    Public Shared Sub Main()
        Dim r As New Regex("<input\s{0,}(?:(name|type|value)=""([^""]+)""\s{0,})+>")
        Dim s As String
        s = "<input type=""hidden"" name=""locale"" value=""us"">"
        If r.IsMatch(s) Then
            For Each m As Match In r.Matches(s)
                Console.WriteLine(m.ToString)
                For j As Integer = 0 To m.Groups(1).Captures.Count - 1      ' Groups 1 and 2 capture once per iteration, so their capture counts are equal
                    Console.WriteLine(" -" & m.Groups(1).Captures(j).Value) ' j-th attribute name (group 1)
                    Console.WriteLine(" -" & m.Groups(2).Captures(j).Value) ' j-th attribute value (group 2)
                Next
            Next
        End If
    End Sub
End Class

Output:

<input type="hidden" name="locale" value="us">
 -type
 -hidden
 -name
 -locale
 -value
 -us
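
If the pairs are needed as data rather than printed, the same Captures loop can be wrapped in a function. This is only a sketch: the module name and the helper GetAttributePairs are hypothetical and not part of the original answer, while the regex is the one from the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Public Module CaptureSketch
    ' Hypothetical helper: returns every attribute name/value pair captured
    ' by the repeated group, in the order the attributes appear in the input.
    Public Function GetAttributePairs(input As String) As List(Of KeyValuePair(Of String, String))
        Dim r As New Regex("<input\s{0,}(?:(name|type|value)=""([^""]+)""\s{0,})+>")
        Dim pairs As New List(Of KeyValuePair(Of String, String))()

        For Each m As Match In r.Matches(input)
            ' Both groups sit inside the same repeated construct, so each
            ' iteration adds one capture to each collection.
            Dim names As CaptureCollection = m.Groups(1).Captures
            Dim values As CaptureCollection = m.Groups(2).Captures
            For j As Integer = 0 To names.Count - 1
                pairs.Add(New KeyValuePair(Of String, String)(names(j).Value, values(j).Value))
            Next
        Next
        Return pairs
    End Function

    Public Sub Main()
        Dim s As String = "<input type=""hidden"" name=""locale"" value=""us"">"
        For Each p In GetAttributePairs(s)
            Console.WriteLine(" -" & p.Key & "=" & p.Value)
        Next
    End Sub
End Module
```

A list of pairs is used rather than a Dictionary so that a repeated attribute name in the input would not throw on insertion.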

See the VB.NET demo
