Don_B - 3 months ago
VB.NET Question

How to match multiple regex patterns in multiple files and write something to a log file?

I want to search for several regex patterns in the text files (*.txt) inside a folder whose path I enter in a text box. The folder contains sub-folders, and the text files sit at paths of the form 12345\2031\30201\txt\120.txt. If a pattern matches in even one file, a string should be written to a log file created inside the folder given in the text box, and the program should then move on to the next regex, and so on.
Here is what I have done so far:

Dim tLoc As String = TextBox1.Text
Dim txtFilesArray = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).Where(Function(f) f Like "*\#*\#*\#*\txt\#*.txt")
Dim fileLoc As String = tLoc & "\Checklist.log"
Dim fs As FileStream = Nothing
If (Not File.Exists(fileLoc)) Then
    fs = File.Create(fileLoc)
    Using fs

    End Using
End If
For Each tFile In txtFilesArray
    Dim input As String = File.ReadAllText(tFile)
    Dim pattern1 As New Regex("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")
    Dim pattern2 As New Regex("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")
    If pattern1.IsMatch(input) Then
        FileOpen(1, fileLoc, OpenMode.Append)
        PrintLine(1, "Check figure link")
        FileClose()
    End If
    If pattern2.IsMatch(input) Then
        FileOpen(1, fileLoc, OpenMode.Append)
        PrintLine(1, "Check table link")
        FileClose()
    End If

Next


But the problems are:
1) Even if pattern1 matches in multiple files, I want the string Check figure link written to the log file only once, not every time a match is found in another file; the same goes for pattern2 ... patternN. Furthermore, I want the program to move on to the next regex pattern the moment pattern1 matches in one file (no need to look for the same pattern in the other files).
2) I have around a hundred regex patterns that I want to use in this program. Can anyone tell me how to shorten the code?

Answer

You can put the patterns in a collection and remove each one from it as soon as it is found:

' Needs Imports System.IO, System.Linq and System.Text.RegularExpressions
Dim re = Function(p$) New Regex(p, RegexOptions.Compiled)
Dim patterns = New Dictionary(Of String, Regex) From {
    {"Check figure link", re("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")},
    {"Check table link", re("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")}
}
Dim output = New List(Of String)
Dim tLoc = TextBox1.Text
Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)

For Each tFile In txtFiles
    ' Stop reading files once every pattern has been found
    If patterns.Count = 0 Then Exit For
    If Not tFile Like "*\#*\#*\#*\txt\#*.txt" Then Continue For
    Dim input = File.ReadAllText(tFile)

    ' Collect every pattern that matches this file; ToList() takes a snapshot,
    ' so the dictionary can safely be modified in the loop below
    Dim matches = patterns.Where(Function(p) p.Value.IsMatch(input)).
                           Select(Function(p) p.Key).ToList()
    For Each match In matches
        output.Add(match)
        patterns.Remove(match)
    Next
Next
File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)
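
To try the idea without a folder of test files, the same remove-when-found loop can be run over in-memory strings. This is only a sketch: the module name PatternDemo, the function MessagesFor and the sample texts are made up for illustration and are not part of the code above.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module PatternDemo
    ' Hypothetical helper: returns the message of each pattern that matches
    ' at least one text, and stops early when no patterns remain.
    Function MessagesFor(texts As IEnumerable(Of String),
                         patterns As Dictionary(Of String, Regex)) As List(Of String)
        ' Work on a copy so the caller's dictionary is left untouched.
        Dim remaining = New Dictionary(Of String, Regex)(patterns)
        Dim output = New List(Of String)

        For Each content In texts
            If remaining.Count = 0 Then Exit For
            ' ToList() takes a snapshot, so removing keys below is safe.
            Dim found = remaining.Where(Function(p) p.Value.IsMatch(content)).
                                  Select(Function(p) p.Key).ToList()
            For Each key In found
                output.Add(key)
                remaining.Remove(key)
            Next
        Next
        Return output
    End Function

    Sub Main()
        Dim patterns = New Dictionary(Of String, Regex) From {
            {"Check figure link", New Regex("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")},
            {"Check table link", New Regex("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")}
        }
        ' Invented sample texts standing in for file contents.
        Dim texts = {"see figure 1", "see figure 2 and table 3", "<a>table 4"}
        For Each line In MessagesFor(texts, patterns)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

With these sample texts each message is reported once, and the third text is never examined because both patterns have already been found by then.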

If you loop over the patterns instead and test each one against the files, the work is easier to parallelize (run on multiple processors at the same time), because nothing has to be removed from the collection:

Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}

' Read the text box once, on the UI thread, before the parallel loop starts
Dim tLoc = TextBox1.Text
Dim logPath = tLoc.TrimEnd("\"c) & "\Checklist.log"
Dim logLock = New Object

Parallel.ForEach(patterns,
    Sub(pattern)
        Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)
        Dim regEx = New Regex(pattern(1), RegexOptions.Compiled)

        For Each tFile In txtFiles
            If tFile Like "*\#*\#*\#*\txt\#*.txt" Then
                Dim input = File.ReadAllText(tFile)
                If regEx.IsMatch(input) Then
                    ' Only one thread at a time may append to the log file
                    SyncLock logLock
                        File.AppendAllLines(logPath, {pattern(0)})
                    End SyncLock
                    Exit For
                End If
            End If
        Next
    End Sub)
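
One way to keep several threads from appending to the log file at the same moment is to gather the results in a thread-safe collection and write them out once after the loop. The sketch below shows that on in-memory strings; ParallelPatternDemo, MatchedMessages and the sample data are hypothetical names chosen for the example, and sorting the result is just one possible way to get a stable order.

```vbnet
Imports System
Imports System.Collections.Concurrent
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Module ParallelPatternDemo
    ' Hypothetical helper: returns the message of every pattern that matches
    ' at least one of the texts. The patterns are tested in parallel.
    Function MatchedMessages(texts As IList(Of String),
                             patterns As IEnumerable(Of String())) As List(Of String)
        Dim found = New ConcurrentBag(Of String)

        Parallel.ForEach(patterns,
            Sub(pattern As String())
                Dim regEx = New Regex(pattern(1))
                ' Any stops at the first text that matches.
                If texts.Any(Function(t) regEx.IsMatch(t)) Then found.Add(pattern(0))
            End Sub)

        ' ConcurrentBag has no defined order, so sort before returning.
        Return found.OrderBy(Function(m) m, StringComparer.Ordinal).ToList()
    End Function

    Sub Main()
        Dim patterns = New List(Of String()) From {
            ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
            ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}
        ' Invented sample texts standing in for file contents.
        Dim texts = {"see figure 1", "nothing to report"}
        For Each line In MatchedMessages(texts, patterns)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

In the real program the returned list could be passed to File.WriteAllLines in a single call, so the log file is opened only once.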

Or, as a shorter but denser version using a parallel LINQ query:

Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}

' Read the text box once, on the UI thread, and filter the file list up front
Dim tLoc = TextBox1.Text
Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).
    Where(Function(f) f Like "*\#*\#*\#*\txt\#*.txt").ToList()

' Any stops at the first matching file for each pattern
Dim output = From pattern In patterns.AsParallel
             Let regEx = New Regex(pattern(1), RegexOptions.Compiled)
             Where txtFiles.Any(Function(f) regEx.IsMatch(File.ReadAllText(f)))
             Select pattern(0)

File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)
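
All of the versions above keep only the paths that pass the Like filter. In a Like pattern, # stands for a single digit and * for any run of characters, so each \#* segment accepts a folder name that starts with a digit. The following sketch illustrates which paths get through; the module name, the wrapper function IsTargetFile and the sample paths are invented for the example.

```vbnet
Imports System

Module LikeFilterDemo
    ' Hypothetical wrapper around the filter used in the answer.
    Function IsTargetFile(filePath As String) As Boolean
        Return filePath Like "*\#*\#*\#*\txt\#*.txt"
    End Function

    Sub Main()
        Dim samples = {
            "C:\work\12345\2031\30201\txt\120.txt",
            "C:\work\12345\2031\30201\txt\notes.txt",
            "C:\work\12345\2031\30201\doc\120.txt"
        }
        ' Only the first path matches: the second file name does not start
        ' with a digit, and the third is not inside a txt folder.
        For Each p In samples
            Console.WriteLine(p & " -> " & IsTargetFile(p).ToString())
        Next
    End Sub
End Module
```

Note that the result of Like also depends on the Option Compare setting; the sketch assumes the default, Option Compare Binary.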