tadamhicks - 3 months ago
Python Question

Python regex get substring after preceding substring match

The text has no spaces, so I can't split it and index into the resulting list of strings.

The pattern I am looking for is check=. It is followed by a number and URL-encoded query-string items (this is an Apache log file), and it appears twice on every line of the file. I want output that gives me just the number that follows check=.

For instance, the string in a line looks like:

11.249.222.103 - - [15/Aug/2016:13:17:56 -0600] "GET /next2/120005079807?check=37593467%2CCOB&check=37593378%2CUAP&box=match&submit=Next HTTP/1.1" 500 1633 "https://mvt.squaretwofinancial.com/newmed/?button=All&submit=Submit" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0"


And I need to fetch 37593467 and 37593378 in this case.

Answer

Please check this code. re.findall with a capturing group returns only the digits that follow each check=.

import re

text = '''11.249.222.103 - - [15/Aug/2016:13:17:56 -0600] "GET /next2/120005079807?check=37593467%2CCOB&check=37593378%2CUAP&box=match&submit=Next HTTP/1.1" 500 1633 "https://mvt.squaretwofinancial.com/newmed/?button=All&submit=Submit" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:45.0) Gecko/20100101 Firefox/45.0"'''


# Raw string so \d reaches the regex engine unchanged; the group captures only the digits.
for match in re.findall(r"check=(\d+)", text):
    print('Found "%s"' % match)

Output:

C:\Users\dinesh_pundkar\Desktop>python demo.py
Found "37593467"
Found "37593378"
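
Since the question asks for the substring after a preceding match, a lookbehind is another way to write it. The following is only a sketch: the sample line is shortened from the question, and the names CHECK_ID and extract_check_ids are made up for illustration. It assumes the log has already been read into a list of lines.

```python
import re

# Request line shortened from the question; only the query string matters here.
sample = ('"GET /next2/120005079807?check=37593467%2CCOB'
          '&check=37593378%2CUAP&box=match&submit=Next HTTP/1.1" 500 1633')

# Lookbehind: match a run of digits only where "check=" sits directly before it.
# The "check=" text itself is not part of the match, so no capture group is needed.
CHECK_ID = re.compile(r"(?<=check=)\d+")


def extract_check_ids(lines):
    """Return one list of check= numbers per input line (hypothetical helper)."""
    return [CHECK_ID.findall(entry) for entry in lines]


print(extract_check_ids([sample]))
```

Like the pattern in the answer, this would also match a parameter whose name merely ends in check=, so tighten the lookbehind if the log contains such names.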

A couple of URLs for help: