Observer Observer - 1 year ago 63
Python Question

Use regex to parse out part of a URL using Python

Suppose I have some rows like the following:

URL
$st=fa+gw+hw+ek+ei/
$st=fasd+/
$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav

I want to find the first + symbol in the URL, move backward until I hit a special character such as /, ?, or = (or any other special character), start from there, and continue forward until I reach a space, the end of the line, &, or /.

The regex I wrote with help from the Stack Overflow forums is as follows: re.search(r"[^\w\+ ]([\w\+ ]+\+[\w\+ ]+)(?:[^\w\+ ]|$)", x).group(1)

This works for the first row, but it does not parse anything from the second row. In the third row, I also want to find every occurrence of this pattern; the current regex finds only one.
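The behaviour described here can be reproduced with a minimal sketch. It assumes the sample splits into the three rows listed in `rows` and that the snippet was called through `re.search`; the names `PATTERN`, `rows`, `first_match`, and `results` are only illustrative.

```python
import re

# Pattern from the question, written as a raw string so the backslashes
# reach the regex engine unchanged.
PATTERN = re.compile(r"[^\w\+ ]([\w\+ ]+\+[\w\+ ]+)(?:[^\w\+ ]|$)")

# Assumed split of the sample data into three rows.
rows = [
    "$st=fa+gw+hw+ek+ei/",
    "$st=fasd+/",
    "$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav",
]


def first_match(row):
    """Return the first captured group, or None when nothing matches."""
    match = PATTERN.search(row)
    return match.group(1) if match else None


results = [first_match(row) for row in rows]
for row, found in zip(rows, results):
    print(repr(row), "->", found)
```

In this sketch the second row yields nothing because the final `[\w\+ ]+` requires at least one character after the last +, and the third row yields a single token because `search` stops at the first hit.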

My output should be:

fa+gq+hf+kg+is gl+jh+ke+oj+kp

Can anybody help me modify the existing regex to suit these needs?


Answer

I used regexr to come up with this (regexr link):



fa+gw+hw+ek+ei fasd+ fa+gq+hf+kg+is gl+jh+ke+oj+kp

EDIT: Try using re.findall instead:

>>> import re
>>> s = "$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav"
>>> re.findall(r"([\w\+]+\+[\w\+]*)(?:[^\w\+]|$)", s)
['fa+gq+hf+kg+is', 'gl+jh+ke+oj+kp']
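To check the same pattern against every row rather than just the last one, here is a small sketch. It assumes the question's sample splits into the three rows in `rows`; the names `TOKEN`, `plus_tokens`, and `all_tokens` are made up for illustration.

```python
import re

# Pattern from the answer. The trailing * (instead of +) lets a token
# end with "+", and the space is left out of the character class so a
# token stops at whitespace.
TOKEN = re.compile(r"([\w\+]+\+[\w\+]*)(?:[^\w\+]|$)")

# Assumed split of the question's sample into three rows.
rows = [
    "$st=fa+gw+hw+ek+ei/",
    "$st=fasd+/",
    "$st=fa+gq+hf+kg+is&sadfnlslkdfn&gl+jh+ke+oj+kp sfav",
]


def plus_tokens(row):
    """Return every '+'-joined token in the row, in order of appearance."""
    # With a single capturing group, findall returns the group contents.
    return TOKEN.findall(row)


all_tokens = [plus_tokens(row) for row in rows]
for row, tokens in zip(rows, all_tokens):
    print(repr(row), "->", tokens)
```

Because `findall` keeps scanning after each hit, the third row returns both tokens, and the relaxed trailing quantifier is what lets `fasd+` match in the second row.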