Keo Rithy, 4 years ago
Python Question

Python: regular expression not returning output

I am trying to find all instances of the string PB and the digits that follow it, but the call

number_all = re.findall(r'\bPB\b([0-9])\d+', ' '.join(number_list))

doesn't return any output. I check my output file, sequence.txt, but there is nothing inside it. If I just search for PB, it outputs PB but no numbers.

My input file, raw-sequence.txt, looks like this:

WB (19, 21, 24, 46, 60)
WB (12, 11, 9, 23, 49)
PB (18, 21, 10, 5, 5)
WB (2, 14, 2, 29, 67)
WB (1, 8, 1, 16, 52)
PB (2, 11, 8, 3, 4)

How can I output the following lines to sequence.txt?

PB (18, 21, 10, 5, 5)
PB (2, 11, 8, 3, 4)

Here is my current code:

sequence_raw_buffer = open('c:\\sequence.txt', 'a')
with open('c:\\raw-sequence.txt') as f:
    number_list =
    number_all = re.findall(r'\bPB\b([0-9])\d+', ' '.join(number_list))
    unique = list(set(number_all))
    for i in unique:
        sequence_raw_buffer.write(i + '\n')
print "done"

Answer

Given the code you show, a regex is an unnecessary complication for your problem. You can just iterate over the lines of the input file and write out the ones for which line.startswith("PB") returns True.

with open(r'c:\raw-sequence.txt', 'r') as f, open(r'c:\sequence.txt', 'a') as sequence_raw_buffer:
    for line in f:
        if line.startswith("PB"):
            sequence_raw_buffer.write(line)

This illustrates the fact that files can be iterated over line by line. Each line keeps its trailing newline, so I use write to dump it unchanged; print would append a second newline and leave a blank line after every match.

This example also shows you how to put multiple context managers into a single with statement. You should open all your files in a with block, whether input or output, because I/O errors are a possibility in both directions.
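As a minimal sketch of the same idea that can be run without touching the C: drive, the filter can be factored into a function that accepts any iterable of lines. The name filter_pb_lines and the use of io.StringIO objects as stand-ins for the two files are assumptions made for illustration; they are not part of the original code.

```python
import io


def filter_pb_lines(lines, prefix="PB"):
    """Yield only the lines that start with prefix (hypothetical helper)."""
    for line in lines:
        if line.startswith(prefix):
            yield line


# In-memory stand-ins for raw-sequence.txt and sequence.txt.
source = io.StringIO(
    "WB (19, 21, 24, 46, 60)\n"
    "PB (18, 21, 10, 5, 5)\n"
    "WB (2, 14, 2, 29, 67)\n"
    "PB (2, 11, 8, 3, 4)\n"
)
target = io.StringIO()

# Both objects are context managers, so a single with statement covers both.
with source, target:
    target.writelines(filter_pb_lines(source))
    result = target.getvalue()  # read before the buffer is closed

print(result, end="")
```

Because the helper only needs something it can loop over, the same function should work unchanged with real file objects returned by open.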

Now, if you are trying to use regex for practice or because the match is really more complicated than what you present here, you can try

PB\s*\((?:\d+,\s*)*\d+\)

This matches as follows:

  • Literal PB
  • Optional unlimited number of spaces \s*
  • Literal open parens \(
  • Optional non-capturing group (?:)*, repeated as many times as necessary, containing
    • At least one digit \d+
    • Literal comma ,
    • Any number of spaces \s*
  • A final number of at least one digit \d+
  • Literal close parens \)
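To check this breakdown against the sample data, here is a small sketch. The pattern is the one passed to re.compile in this answer; the variable names and the extra sample strings are only illustrative. It also suggests why the pattern from the question finds nothing: \bPB\b demands a word boundary right after PB, while ([0-9]) demands a digit in that same position, and a digit is a word character, so the two conditions can never hold together.

```python
import re

# Pattern from this answer, and the pattern from the question for comparison.
pattern = re.compile(r'PB\s*\((?:\d+,\s*)*\d+\)')
question_pattern = re.compile(r'\bPB\b([0-9])\d+')

samples = [
    "WB (19, 21, 24, 46, 60)",  # wrong prefix
    "PB (18, 21, 10, 5, 5)",    # from the input file
    "PB (2, 11, 8, 3, 4)",      # from the input file
    "PB(7)",                    # no space, single number
    "PB (1, 2,)",               # trailing comma, no final number
]

# match() anchors at the start of each string.
matched = [s for s in samples if pattern.match(s)]

# The question's approach: join everything and run findall.
question_hits = question_pattern.findall(' '.join(samples))

print(matched)
print(question_hits)
```

Here matched should hold the two PB lines from the input file plus "PB(7)", while question_hits stays empty.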

I would not bother concatenating the whole file together and using findall on that though, unless your expression can span multiple lines. I would prefer to still use the approach shown above, because in all but a few cases that I can think of, textual data will generally be delimited by newlines:

import re
pattern = re.compile(r'PB\s*\((?:\d+,\s*)*\d+\)')  # compiled once, before the loop
            if pattern.match(line):  # replaces the startswith("PB") test

Pre-compiling the pattern once saves a little work on every line, but you could call re.match(..., line) every time as well, since the re module caches recently compiled patterns.
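Putting the pieces together, a complete version might look like the sketch below. The function name copy_matching_lines is hypothetical, and the paths are passed in as parameters rather than hard-coded so that the demonstration can run in a temporary directory instead of c:\.

```python
import os
import re
import tempfile

# Compiled once at import time and reused for every line.
PB_PATTERN = re.compile(r'PB\s*\((?:\d+,\s*)*\d+\)')


def copy_matching_lines(src_path, dst_path, pattern=PB_PATTERN):
    """Append each line of src_path that the pattern matches to dst_path.

    Returns the number of lines written. (Hypothetical helper.)
    """
    written = 0
    with open(src_path, 'r') as src, open(dst_path, 'a') as dst:
        for line in src:
            if pattern.match(line):
                dst.write(line)  # line still carries its own newline
                written += 1
    return written


# Demonstration with throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'raw-sequence.txt')
    dst_path = os.path.join(tmp, 'sequence.txt')
    with open(src_path, 'w') as f:
        f.write("WB (19, 21, 24, 46, 60)\n"
                "PB (18, 21, 10, 5, 5)\n"
                "WB (1, 8, 1, 16, 52)\n"
                "PB (2, 11, 8, 3, 4)\n")
    count = copy_matching_lines(src_path, dst_path)
    with open(dst_path) as f:
        output = f.read()

print(count)
print(output, end="")
```

The output file is opened in append mode to mirror the question's code; use 'w' instead if each run should start from an empty file.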
