Difender - 7 months ago
Python Question

Regexp on specific URL

I have a list of URLs like this:

http://www.toto.com/bags/handbags/test1/
http://www.toto.com/bags/handbags/smt1/
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
http://www.toto.com/bags/handbags/smt1/smt2/testing/
http://www.toto.com/bags/handbags/smt1/smt2/testing.html


What I want is to keep only URLs of the form

http://www.toto.com/something/else/again/more


Only that depth should match; URLs with more path segments should be skipped.

Can you help me out? :)

Answer

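A regex that matches exactly four path segments, each followed by a slash (the dots in the host name are escaped so they match only a literal dot):

^http://www\.toto\.com/(\w+/){4}$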

Example of filtering:

>>> import re
>>> for line in lines:
...     if re.match(r'^http://www\.toto\.com/(\w+/){4}$', line):
...         print(line)
... 
http://www.toto.com/bags/handbags/test1/test2/
http://www.toto.com/bags/handbags/blabla1/blabla2/
http://www.toto.com/bags/handbags/smt1/smt2/
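Note that the example URL in the question has no trailing slash, so the pattern above would reject it. Here is a minimal self-contained sketch of a variant that makes the trailing slash optional, assuming Python 3.4+ for re.fullmatch. The names FOUR_SEGMENTS and keep_four_segment_urls are made up for illustration, and \w does not match hyphens or dots, so the character class may need widening for real paths.

```python
import re

# Hypothetical pattern name: three "segment/" groups, then a fourth
# segment whose trailing slash is optional.
FOUR_SEGMENTS = re.compile(r'http://www\.toto\.com/(?:\w+/){3}\w+/?')


def keep_four_segment_urls(urls):
    """Return only the URLs whose path has exactly four segments."""
    # fullmatch anchors at both ends, so no ^ or $ is needed.
    return [u for u in urls if FOUR_SEGMENTS.fullmatch(u)]


urls = [
    'http://www.toto.com/bags/handbags/test1/',
    'http://www.toto.com/bags/handbags/smt1/',
    'http://www.toto.com/bags/handbags/test1/test2/',
    'http://www.toto.com/bags/handbags/blabla1/blabla2/',
    'http://www.toto.com/bags/handbags/smt1/smt2/',
    'http://www.toto.com/bags/handbags/smt1/smt2/testing/',
    'http://www.toto.com/bags/handbags/smt1/smt2/testing.html',
]

for url in keep_four_segment_urls(urls):
    print(url)
```

On the list from the question this should print the same three URLs as the session above, and it would also accept http://www.toto.com/something/else/again/more.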