Sajad Sajad - 1 month ago 5
Python Question

Extract sub path from url with regex

I have this url:

http://www.example.com/en/news/2016/07/17/1207151/%D9%81%D8%AA%D9%88%D8%A7%DB%8C-%D8%B1%D9%87%D8%A8%D8%B1-


I want to extract 1207151 from it.

Here is my regex:

pattern = '(http[s]?:\/\/)?([^\/\s]+\/)+[^/]+[^/]+[^/]+[^/]/(?<field1>[^/]+)/'


But it doesn't work.

What is my mistake?

Answer

You can use this regex in Python:

>>> import re
>>> url = 'http://www.example.com/en/news/2016/07/17/1207151/%D9%81%D8%AA%D9%88%D8%A7%DB%8C-%D8%B1%D9%87%D8%A8%D8%B1-'
>>> re.search(r'^https?://(?:([^/]+)/){7}', url).group(1)
'1207151'

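(?:([^/]+)/){7} matches one or more non-slash characters followed by a /, seven times in a row (the host name counts as the first segment). A repeated capturing group keeps only its last iteration, so group #1 ends up holding the seventh segment, 1207151.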
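As for the mistake in the original pattern, two things look likely: Python's re module writes named groups as (?P<name>...), not (?<name>...), and the run [^/]+[^/]+[^/]+[^/] has no slashes between its parts, so it cannot step over separate path segments. Below is a minimal sketch of a corrected named-group version, plus an alternative that skips regex and splits the path with urllib.parse. The segment count (five before the ID) and the index 5 are assumptions taken from this one example URL, not a general rule.

```python
import re
from urllib.parse import urlparse

url = ('http://www.example.com/en/news/2016/07/17/1207151/'
       '%D9%81%D8%AA%D9%88%D8%A7%DB%8C-%D8%B1%D9%87%D8%A8%D8%B1-')

# Named groups in Python use (?P<name>...).
# Optional scheme, then the host, then five path segments
# (en, news, 2016, 07, 17), then the segment we want.
pattern = re.compile(
    r'^(?:https?://)?[^/\s]+(?:/[^/]+){5}/(?P<field1>[^/]+)/'
)
match = pattern.search(url)
field1 = match.group('field1') if match else None
print(field1)

# Alternative without regex: parse the URL and index into the path.
# Assumes the ID is always the sixth non-empty path segment.
segments = [s for s in urlparse(url).path.split('/') if s]
print(segments[5])
```

If the position of the ID can vary between URLs, matching on its shape instead (for example, a segment made only of digits that follows the date) would be more robust than counting segments.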
