Jianli Cheng - 3 months ago
Python Question

python regex find matched string

I am trying to find the matched string in a string using regex in Python. The string looks like this:

band 1 # energy -53.15719532 # occ. 2.00000000

ion s p d tot
1 0.000 0.995 0.000 0.995
2 0.000 0.000 0.000 0.000
tot 0.000 0.996 0.000 0.996

band 2 # energy -53.15719532 # occ. 2.00000000

ion s p d tot
1 0.000 0.995 0.000 0.995
2 0.000 0.000 0.000 0.000
tot 0.000 0.996 0.000 0.996

band 3 # energy -53.15719532 # occ. 2.00000000


My goal is to find the string after tot. So the matched strings will be something like:

['0.000 0.996 0.000 0.996',
'0.000 0.996 0.000 0.996']


Here is my current code:

pattern = re.compile(r'tot\s+(.*?)\n', re.DOTALL)
pattern.findall(string)


However, the output gives me:

['1 0.000 0.995 0.000 0.995',
'0.000 0.996 0.000 0.996',
'1 0.000 0.995 0.000 0.995',
'0.000 0.996 0.000 0.996']


Any idea of what I am doing wrong?

Answer

You don't want the DOTALL flag. Remove it, anchor the pattern to the start of the line, and use MULTILINE instead.

pattern = re.compile(r'^\s*tot(.*)', re.MULTILINE)

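This matches all lines that start with tot (optionally preceded by whitespace), so the tot at the end of the header line "ion s p d tot" is no longer picked up. The rest of the line, including the whitespace right after tot, will be in group 1.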
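
As a minimal sketch, here is the pattern applied to a shortened copy of the sample from the question. The names text and totals are made up for illustration, and the strip() call is an addition that drops the whitespace group 1 keeps in front of the numbers.

```python
import re

# Shortened copy of the sample data from the question.
text = """band 1 # energy -53.15719532 # occ. 2.00000000

ion s p d tot
1 0.000 0.995 0.000 0.995
2 0.000 0.000 0.000 0.000
tot 0.000 0.996 0.000 0.996

band 2 # energy -53.15719532 # occ. 2.00000000

ion s p d tot
1 0.000 0.995 0.000 0.995
2 0.000 0.000 0.000 0.000
tot 0.000 0.996 0.000 0.996
"""

# With MULTILINE, ^ matches at the start of every line, not only at the
# start of the whole string.
pattern = re.compile(r'^\s*tot(.*)', re.MULTILINE)

# findall returns group 1 for each match; strip the leading whitespace.
totals = [rest.strip() for rest in pattern.findall(text)]
print(totals)
```

The header line "ion s p d tot" is skipped because its tot is not at the start of a line.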

Citing the documentation, emphasis mine:

re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
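
A small illustration of the quoted behaviour, using a made-up two-line string called sample. It also shows something the quote does not cover: \s matches a newline with or without DOTALL, which appears to be how the tot at the end of the header line pulls the next line into the question's results.

```python
import re

# Hypothetical two-line sample: a header line ending in "tot", then a data line.
sample = "ion s p d tot\n1 0.000 0.995\n"

# Without DOTALL, '.' stops at the first newline.
no_flag = re.match(r'.*', sample).group()

# With DOTALL, '.' also consumes newlines, so the match runs to the end.
with_flag = re.match(r'.*', sample, re.DOTALL).group()

# \s+ matches the "\n" after the header's "tot" even without DOTALL,
# so the lazy group captures the following line.
header_hit = re.findall(r'tot\s+(.*?)\n', sample)

print(repr(no_flag))
print(repr(with_flag))
print(header_hit)
```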

Note that you can easily do this without regex.

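with open("input.txt", "r") as data_file:
    for line in data_file:
        # split() with no argument splits on any whitespace and drops empty strings
        items = line.split()
        # blank lines give an empty list, so check before indexing
        if items and items[0] == "tot":
            values = items[1:]  # the fields after "tot", etc.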
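
The same idea as a self-contained sketch that works on any iterable of lines, so it can be tried without an input file. The function name tot_rows and the sample string are hypothetical, chosen only for this example.

```python
def tot_rows(lines):
    """Return the text after "tot" for every line whose first field is "tot"."""
    rows = []
    for line in lines:
        items = line.split()  # whitespace split; blank lines give []
        if items and items[0] == "tot":
            rows.append(" ".join(items[1:]))
    return rows


# Hypothetical sample in the same layout as the question's data.
sample = """ion s p d tot
1 0.000 0.995 0.000 0.995
tot 0.000 0.996 0.000 0.996

band 2 # energy -53.15719532 # occ. 2.00000000
"""

result = tot_rows(sample.splitlines())
print(result)
```

Because an open file object is also an iterable of lines, the function can be passed one directly in place of sample.splitlines().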