Hossein Hossein - 4 years ago 147
C# Question

How can I get this regex right in c#?

I am trying to match every block that has

type:"Data"
in it and then replace it with the text I want.

A sample input is given below; there can be one or more of these blocks:

layer {
name: "cifar"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "examples/cifar10/mean.binaryproto"
mirror: true
#crop_size: 20
}

# this is a comment!
data_param {
source: "examples/cifar10/cifar10_train_lmdb"
batch_size: 100
backend: LMDB
}
}
layer {
name: "cifar"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "examples/cifar10/mean.binaryproto"
}
data_param {
source: "examples/cifar10/cifar10_test_lmdb"
batch_size: 25
backend: LMDB
}
}


I came up with this regex:

((layer)( *)((\n))*{((.*?)(\n)*)*(type)( *):( *)("Data")((.*?)(\n)*)*)(.*?)(\n)}


I tried to model this :

find and select a block starting with layer,
there can be any number of space characters but after it
there should be a { character,
then there can be anything (to make it easier), and then
there should be type followed by any number of spaces, then a colon, then "Data",
then anything can be there, until a } character is reached


But clearly this does not work properly. If I change the type in any of these layer blocks, nothing gets detected, not even the layer that has
type : "Data"

Answer Source

Based on this post about using .NET regular expressions to do bracket matching, you can adapt the regex presented there:

\((?>\((?<c>)|[^()]+|\)(?<-c>))*(?(c)(?!))\)

It looks for sets of matching ( and ), and you can simply swap those for { and } (noting that they are escaped in that regex).

Then you can prefix the layer\s* bit.
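As a rough sketch of those two steps (swap the parentheses for braces, then prefix layer\s*), the following lists every brace-balanced layer block regardless of its type. The class name, method name and sample text are made up for illustration, and the braces are escaped here only to keep the pattern unambiguous.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LayerBlocks
{
    // Balancing-group pattern from the linked post, with ( ) swapped for { }
    // and "layer\s*" prefixed. The group name "c" counts open braces.
    static readonly Regex LayerBlock = new Regex(
        @"layer\s*\{(?>\{(?<c>)|[^{}]+|\}(?<-c>))*(?(c)(?!))\}");

    // Hypothetical helper: returns every balanced "layer { ... }" block.
    public static MatchCollection FindBlocks(string input)
    {
        return LayerBlock.Matches(input);
    }

    public static void Main()
    {
        // Shortened, illustrative input in the same shape as the question's sample.
        string input = @"layer {
  name: ""cifar""
  type: ""Data""
  include {
    phase: TRAIN
  }
}
layer {
  name: ""relu1""
  type: ""ReLU""
}";

        foreach (Match m in FindBlocks(input))
        {
            Console.WriteLine("Block at index {0}, length {1}", m.Index, m.Length);
        }
    }
}
```

Note that this pattern counts every { and }, so braces that appear inside quoted strings or # comments would be counted as well.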

To exclude blocks where the type is not "Data", I've added a negative lookahead for all the other type keywords in your sample in the pastebin. Unfortunately, adding a positive lookahead for type: "Data" simply didn't work; if it had, I think that would be your most robust solution.
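One possible way to get the effect of a positive check without putting it inside the pattern, offered as an alternative sketch rather than as the approach used in this answer, is to match every balanced layer block and decide in a MatchEvaluator whether to replace it. The class name, method name and sample text below are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DataLayerFilter
{
    // Any brace-balanced layer block, whatever its type.
    static readonly Regex LayerBlock = new Regex(
        @"layer\s*\{(?>\{(?<c>)|[^{}]+|\}(?<-c>))*(?(c)(?!))\}");

    // Positive test, applied to each matched block on its own.
    static readonly Regex DataType = new Regex(@"type\s*:\s*""Data""");

    // Replaces only the blocks whose text contains type: "Data";
    // every other block is written back unchanged.
    public static string ReplaceDataLayers(string input, string replacement)
    {
        return LayerBlock.Replace(input,
            m => DataType.IsMatch(m.Value) ? replacement : m.Value);
    }

    public static void Main()
    {
        // Illustrative input only.
        string input = @"layer {
  name: ""cifar""
  type: ""Data""
  include {
    phase: TRAIN
  }
}
layer {
  name: ""relu1""
  type: ""ReLU""
}";

        Console.WriteLine(ReplaceDataLayers(input, "# replaced"));
    }
}
```

This sketch assumes type: "Data" never appears in a comment or a nested block of some other layer; the check is a plain text search within each matched block. The negative-lookahead route described above keeps everything in a single pattern instead.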

Hopefully you have a finite list of type values, so you can extend this into a practical solution:

layer\s*{(?>{(?<c>)|[^{}](?!type: "Accuracy"|type: "Convolution"|type: "Dropout"|type: "InnerProduct"|type: "LRN"|type: "Pooling"|type: "ReLU"|type: "SoftmaxWithLoss")+|}(?<-c>))*(?(c)(?!))}

The key bit to work with in the original regex is the [^()]+, which matches the content between the brackets that the other components of the regex are matching. I've adapted that to [^{}]+ (that is, 'everything other than the braces') and then added the long 'apart from' clause with the keywords not to match.
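A minimal sketch of how this pattern could be used with Regex.Replace is shown below. It assumes the list of other type names is the one in the pattern above, builds the 'apart from' clause from an array so it is easier to extend, and wraps the character class and its lookahead in a non-capturing group so that the + clearly applies to both. The class and method names and the sample input are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class DataLayerReplacer
{
    // Type names to exclude, copied from the pattern above; extend as needed.
    static readonly string[] OtherTypes =
    {
        "Accuracy", "Convolution", "Dropout", "InnerProduct",
        "LRN", "Pooling", "ReLU", "SoftmaxWithLoss"
    };

    public static Regex BuildPattern()
    {
        // Produces: type: "Accuracy"|type: "Convolution"|...
        string excluded = string.Join("|",
            OtherTypes.Select(t => "type: \"" + Regex.Escape(t) + "\""));

        // Balanced-brace match in which no character may be directly
        // followed by one of the excluded type declarations.
        string pattern =
            @"layer\s*\{(?>\{(?<c>)|(?:[^{}](?!" + excluded +
            @"))+|\}(?<-c>))*(?(c)(?!))\}";
        return new Regex(pattern);
    }

    public static string ReplaceDataLayers(string input, string replacement)
    {
        // The MatchEvaluator overload inserts the replacement text literally,
        // so a "$" in it is not treated as a substitution.
        return BuildPattern().Replace(input, _ => replacement);
    }

    public static void Main()
    {
        // Illustrative input only.
        string input = @"layer {
  name: ""cifar""
  type: ""Data""
  include {
    phase: TRAIN
  }
}
layer {
  name: ""relu1""
  type: ""ReLU""
}";

        Console.WriteLine(ReplaceDataLayers(input, "# replaced"));
    }
}
```

The excluded declarations are matched literally, so a block that writes its type with different spacing, for example type : "ReLU", would not be excluded by this sketch.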
