VespaQQ - 6 months ago
Python Question

Regex to find links in one row

I have this string:

http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r


I need to extract every link from a single line that ends with \r. The line may contain a single link or as many as five. So far I have this:

(http[s]*:.*)[\\r|h]


but it returns the whole row as a single match. Any ideas?

Answer

You can use this lookahead-based regex with re.findall:

>>> import re
>>> s = 'http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r'
>>> re.findall(r'https?://.+?(?=https?://|[\r\n]|$)', s)
['http://pastebin.com/XXXXXXX', 'http://pastebin.com/XXXXXX']

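(?=https?://|[\r\n]|$) is a positive lookahead that asserts the current position is followed by http:// or https://, by \r or \n, or by the end of the string. Because .+? is lazy, each match stops at the first position where the lookahead succeeds, so every link is returned as a separate match.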
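
If this is needed in more than one place, the pattern can be wrapped in a small helper. This is only a sketch: the names LINK_RE and extract_links are hypothetical and the example.com URL is an illustrative assumption, none of which come from the question.

```python
import re

# Same pattern as above, compiled once. The lazy .+? stops as soon as the
# lookahead sees the next URL scheme, a line break, or the end of the string.
LINK_RE = re.compile(r'https?://.+?(?=https?://|[\r\n]|$)')


def extract_links(line):
    """Return every http(s) link found in a line of concatenated URLs.

    Hypothetical helper for illustration; it assumes links are only ever
    separated by the start of the next link or by a trailing line break.
    """
    return LINK_RE.findall(line)


if __name__ == '__main__':
    # Two links run together, terminated by a carriage return.
    print(extract_links('http://pastebin.com/XXXXXXXhttp://pastebin.com/XXXXXX\r'))
    # A single https link (illustrative URL).
    print(extract_links('https://example.com/a\r'))
```

A line with no links simply yields an empty list, so the caller does not need a special case for it.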
