Sam Sam - 4 months ago 6
Python Question

How do I capture the text between a certain character and a string in a multi-line string? Python

Let's say we have a string

string="This is a test code [asdf -wer -a2 asdf] >(ascd asdfas -were)\
test \
(testing test) test >asdf \
test"


I need to get the text between the character > and the string "test".

I tried

re.findall(r'>[^)](.*)test',string, re.MULTILINE )


However I get

(ascd asdfas -were)\ test \ (testing test) test >asdf.


However I need:

(ascd asdfas -were)\


AND

asdf


How can I get those two strings?

Answer

What about:

import re

s="""This is a test code [asdf -wer -a2 asdf] >(ascd asdfas -were)
test
(testing test) test >asdf
test"""

print(re.findall(r'>(.*?)\btest\b', s, re.DOTALL))

Output:

['(ascd asdfas -were)\n', 'asdf\n']

The only somewhat interesting parts of this pattern are:

  • .*?, where ? makes the .* non-greedy ("ungreedy"); otherwise you'd have a single, long match instead of two.
  • Using \btest\b as the "ending" identifier (see Jan's comment below) instead of test, so that test only matches as a whole word. From the re docs:

    \b Matches the empty string, but only at the beginning or end of a word....
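To make the two points above concrete, here is a small sketch comparing the greedy and non-greedy patterns on the same input, and showing what \b changes. The variable names (greedy, lazy, cleaned, sample) and the one-line sample string are just for illustration and are not part of the question; the .strip() step is an optional extra, assuming you don't want the trailing newline in each capture.

```python
import re

s = """This is a test code [asdf -wer -a2 asdf] >(ascd asdfas -were)
test
(testing test) test >asdf
test"""

# Greedy: .* runs as far as it can, so everything from the first ">"
# up to the last "test" ends up in a single match.
greedy = re.findall(r'>(.*)\btest\b', s, re.DOTALL)

# Non-greedy: .*? stops at the first "test" after each ">".
lazy = re.findall(r'>(.*?)\btest\b', s, re.DOTALL)

# Optional: drop the trailing newline from each capture.
cleaned = [m.strip() for m in lazy]

# \b keeps "test" from matching inside a longer word such as "testing".
sample = "a >x testing y test"  # illustrative input only
without_b = re.findall(r'>(.*?)test', sample)
with_b = re.findall(r'>(.*?)\btest\b', sample)

print(len(greedy))  # 1
print(cleaned)      # ['(ascd asdfas -were)', 'asdf']
print(without_b)    # ['x ']
print(with_b)       # ['x testing y ']
```

If a capture can legitimately start or end with whitespace you care about, you'd want something narrower than .strip(), e.g. .rstrip('\n').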

Note, it may be worth reading up on re.DOTALL, as I think that's really what you want. DOTALL lets . match newlines too, while MULTILINE lets the anchors (^, $) match at the start and end of each line instead of only at the start and end of the entire string. Since you don't use anchors, I think DOTALL is more appropriate.
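A quick way to see the difference between the two flags, as a rough sketch on a made-up two-line input (text and the result names are only for this example):

```python
import re

text = "a >one\ntest\nb >two\ntest"  # illustrative input only

pattern = r'>(.*?)\btest\b'

# No flag: "." does not match "\n", so ">" and "test" on different
# lines can never be joined by .*? and nothing is found.
no_flag = re.findall(pattern, text)

# MULTILINE only changes how ^ and $ behave; "." still stops at "\n".
multiline = re.findall(pattern, text, re.MULTILINE)

# DOTALL lets "." cross line breaks, which is what the question needs.
dotall = re.findall(pattern, text, re.DOTALL)

# Where MULTILINE does matter: ^ now matches at the start of every line.
starts_plain = re.findall(r'^\w+', text)
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)

print(no_flag)           # []
print(multiline)         # []
print(dotall)            # ['one\n', 'two\n']
print(starts_plain)      # ['a']
print(starts_multiline)  # ['a', 'test', 'b', 'test']
```

So in the original attempt, re.MULTILINE had no effect on the pattern at all, since it contains no ^ or $.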

Comments