Evertthor_3 - 4 years ago
Python Question

Best way to extract data using re.compile

I need to extract (a lot of) info from different text files.
I wonder if there is a shorter and more efficient way than the following:

First part: (N lines long)

import re

N1 = re.compile(r'')
N2 = re.compile(r'')
# ...
Nn = re.compile(r'')


Second part: (2N lines long)

with open(filename) as f:
    for line in f:
        if N1.match(line):
            var1 = N1.match(line).group(x).strip()
        elif N2.match(line):
            var2 = N2.match(line).group(x).strip()
        # ...
        elif Nn.match(line):
            varn = Nn.match(line).group(x).strip()


Do you recommend keeping the re.compile variables (part 1) separate from part 2? What do people use in these cases? Perhaps a function that takes the regex as an argument and is called every time?

In my case N is 30, meaning I have 90 lines just to feed a dictionary, with very little or no logic at all.

Answer

As mentioned in the re module documentation, the compiled versions of the most recent patterns passed to re.compile() and to the module-level functions such as re.match() are cached: depending on how many expressions you have, compiling and storing them yourself might not be useful.
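A minimal sketch of relying on that cache: keep the raw pattern strings and call the module-level re.match() directly, without precompiling anything. The patterns and the helper name `first_match` are hypothetical examples, not taken from the question, and each pattern is assumed to capture its value in group 1. The cache is bounded, so with a very large number of patterns explicit compilation may still pay off.

```python
import re

# Hypothetical patterns; re.match() compiles each string on first use
# and then reuses the compiled version from the re module's internal cache.
PATTERNS = (r'Name:\s*(.*)', r'Date:\s*(.*)', r'Size:\s*(\d+)')

def first_match(line, patterns=PATTERNS):
    """Return (index, stripped group 1) for the first pattern matching line, else None."""
    for i, pattern in enumerate(patterns):
        m = re.match(pattern, line)  # match once and keep the Match object
        if m:
            return i, m.group(1).strip()
    return None

print(first_match('Date:  2013-05-01 \n'))  # (1, '2013-05-01')
print(first_match('nothing here'))          # None
```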

That being said, you should put your regexes in a list, so that a simple for loop can test all your patterns.

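import re

# One compiled pattern per field (pattern strings elided, add the rest here).
regexes = [re.compile(p) for p in [r'', r'', r'', r'']]
values = [''] * len(regexes)  # values[i] holds the result of regexes[i]
with open(filename) as f:
    for line in f:
        for i, regex in enumerate(regexes):
            m = regex.match(line)  # match once and reuse the result
            if m:
                values[i] = m.group(x).strip()
                break  # first matching pattern wins, like the if/elif chain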
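Since the question ends up feeding a dictionary, a variant keyed by field name may read more clearly than parallel lists, and it also covers the "function that takes the regexes as an argument" idea. This is a sketch under assumptions: the field names, the patterns and the `extract_fields` function are hypothetical, each pattern is assumed to capture its value in group 1, and, like the if/elif chain, only the first matching pattern is used for each line.

```python
import io
import re

# Hypothetical field -> compiled pattern table; each captures its value in group 1.
FIELDS = {
    'name': re.compile(r'Name:\s*(.*)'),
    'date': re.compile(r'Date:\s*(.*)'),
    'size': re.compile(r'Size:\s*(\d+)'),
}

def extract_fields(lines, fields=FIELDS):
    """Scan lines once and return {field: stripped value} for every field found."""
    result = {}
    for line in lines:
        for key, regex in fields.items():
            m = regex.match(line)
            if m:
                result[key] = m.group(1).strip()
                break  # first matching pattern wins, as in an if/elif chain
    return result

# io.StringIO stands in for an open file here.
sample = io.StringIO('Name: report.txt\nSize: 1024\nDate: 2013-05-01\n')
print(extract_fields(sample))
# {'name': 'report.txt', 'size': '1024', 'date': '2013-05-01'}
```

With a real file this becomes `with open(filename) as f: data = extract_fields(f)`. Adding a field is then one new entry in the table rather than three new lines of code; if a field appears on several lines, the last occurrence wins.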