McLeodx - 5 months ago
Python Question

How to find substrings in a string that end when they reach a character pattern (Python)

I have one long string made up of dozens of URLs run together. Is there a non-regex way to turn it into a list of URLs, where each substring starts with the pattern 'http' and ends when 'http' is found a second time?

http://www.annuncibdsm.in/?view=selectcity&targetview=posthttp://www.bakecaincontri.in/?view=selectcity&targetview=posthttp://www.incontrixxx.in/?view=selectcity&targetview=posthttp://www.annuncixadulti.com/?view=selectcity&targetview=posthttp://dubizzle.us/?view=selectcity&targetview=posthttp://xincontri.com/index.php?view=selectcityhttp://www.18plusservices.com/mobile/?view=selectcity&targetview=post&cityid=0&lang=enhttp://www.mercatoneannunci.net/?view=selectcity&targetview=post&catid=46&cityid=-18&lang=it</a>http://www.annonce-be.com/?view=selectcity&targetview=post&cityid=-1&lang=fr


It's one long string without breaks.

Answer

Just try this:

" http".join(url.split("http")).split()

>>> url = "http://www.annuncibdsm.in/?view=selectcity&targetview=posthttp://www.bakecaincontri.in/?view=selectcity&targetview=posthttp://www.incontrixxx.in/?view=selectcity&targetview=posthttp://www.annuncixadulti.com/?view=selectcity&targetview=posthttp://dubizzle.us/?view=selectcity&targetview=posthttp://xincontri.com/index.php?view=selectcityhttp://www.18plusservices.com/mobile/?view=selectcity&targetview=post&cityid=0&lang=enhttp://www.mercatoneannunci.net/?view=selectcity&targetview=post&catid=46&cityid=-18&lang=it</a>http://www.annonce-be.com/?view=selectcity&targetview=post&cityid=-1&lang=fr"
>>> " http".join(url.split("http")).split()
['http://www.annuncibdsm.in/?view=selectcity&targetview=post', 'http://www.bakecaincontri.in/?view=selectcity&targetview=post', 'http://www.incontrixxx.in/?view=selectcity&targetview=post', 'http://www.annuncixadulti.com/?view=selectcity&targetview=post', 'http://dubizzle.us/?view=selectcity&targetview=post', 'http://xincontri.com/index.php?view=selectcity', 'http://www.18plusservices.com/mobile/?view=selectcity&targetview=post&cityid=0&lang=en', 'http://www.mercatoneannunci.net/?view=selectcity&targetview=post&catid=46&cityid=-18&lang=it</a>', 'http://www.annonce-be.com/?view=selectcity&targetview=post&cityid=-1&lang=fr']
>>> 

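Essentially, this inserts a space before every "http" and then splits on whitespace (split() with no argument splits on any run of whitespace, not only " "). It assumes the URLs themselves contain no whitespace.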
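If the input might contain whitespace inside a URL, a more explicit non-regex variant is to walk the string with str.find and slice between successive occurrences of the prefix. The following is only a sketch under that assumption; split_on_prefix is a hypothetical helper name, not something from the question or answer.

```python
def split_on_prefix(text, prefix="http"):
    """Split text into chunks that each begin with prefix, without regex.

    Hypothetical helper for illustration. Anything before the first
    occurrence of prefix is dropped.
    """
    chunks = []
    start = text.find(prefix)
    while start != -1:
        # Search for the next occurrence, skipping past the current one.
        nxt = text.find(prefix, start + len(prefix))
        end = nxt if nxt != -1 else len(text)
        chunks.append(text[start:end])
        start = nxt
    return chunks


urls = split_on_prefix("http://a.com/?x=1http://b.com/https://c.org")
print(urls)
```

Like the join/split one-liner, this splits on every occurrence of "http", so a URL that carries another URL in its query string (for example a redirect parameter) would be cut in two. It also leaves stray markup such as the "</a>" in the sample data attached to the preceding URL.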
