user116873 - 2 months ago
Python Question

Python: loop through different URLs in a list

I have a list of URLs:

auto.drom.ru/toyota/avensis?transmission=1&go_search=2
tyumen.drom.ru/toyota/avensis/17068483.html
auto.drom.ru/toyota/avensis?transmission=1&go_search=2
surgut.drom.ru/toyota/avensis/17067788.html
auto.drom.ru/toyota/avensis?transmission=1&go_search=2
auto.drom.ru/toyota/avensis?transmission=1&go_search=2


I need to open the page content only for URLs like these:

tyumen.drom.ru/toyota/avensis/17068483.html
surgut.drom.ru/toyota/avensis/17067788.html


I tried writing a regular expression:

if 'drom\.ru/.?*/.?*/\d\.html' in url:
    print url


But it prints nothing.
What am I doing wrong?

Answer

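Please check the code below.

  • The URLs you want all end in SomeNumber.html, so the regex matches that format.
  • The in operator only tests for a literal substring and never interprets regex syntax, so the pattern has to go through re.search instead.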

url_list = [
'auto.drom.ru/toyota/avensis?transmission=1&go_search=2',
'tyumen.drom.ru/toyota/avensis/17068483.html',
'auto.drom.ru/toyota/avensis?transmission=1&go_search=2',
'surgut.drom.ru/toyota/avensis/17067788.html',
'auto.drom.ru/toyota/avensis?transmission=1&go_search=2',
'auto.drom.ru/toyota/avensis?transmission=1&go_search=2'
]

import re

for url in url_list:
    # Keep only URLs that end in /<digits>.html
    m = re.search(r'/\d+\.html$', url)
    if m is not None:
        print(url)

Output:

C:\Users\dinesh_pundkar\Desktop>python c.py
tyumen.drom.ru/toyota/avensis/17068483.html
surgut.drom.ru/toyota/avensis/17067788.html

C:\Users\dinesh_pundkar\Desktop>
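If you want to keep the stricter shape from the question (domain, two path segments, then a numeric page), here is a minimal sketch. It assumes the listing URLs always look like city.drom.ru/make/model/digits.html; the names LISTING_RE and is_listing are made up for illustration. It also shows that in only does a literal substring check.

```python
import re

# Hypothetical name. Assumes listing URLs look like
# <city>.drom.ru/<make>/<model>/<digits>.html
LISTING_RE = re.compile(r'drom\.ru/[^/]+/[^/]+/\d+\.html$')


def is_listing(url):
    """Return True if url looks like an individual listing page."""
    return LISTING_RE.search(url) is not None


urls = [
    'auto.drom.ru/toyota/avensis?transmission=1&go_search=2',
    'tyumen.drom.ru/toyota/avensis/17068483.html',
    'surgut.drom.ru/toyota/avensis/17067788.html',
]

# Keep only the URLs that match the listing pattern.
listings = [u for u in urls if is_listing(u)]
for u in listings:
    print(u)

# `in` looks for the literal characters of the pattern text inside the URL,
# so a regex written as a plain string never matches this way.
literal_hit = r'\d+\.html' in urls[1]
```

Here [^/]+ stands for "one path segment" and \d+ allows any number of digits, where the original \d allowed exactly one.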

Please check this link for regex verification in Python.

Also, see here for more information about the re module.

Thanks to shutdown-hnow for pointing toward this.

You can check this link for debugging regex.
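A quick way to debug a pattern locally is to compile it on its own and look at the error. This is a minimal sketch, assuming Python 3 and the pattern from the question; the variable names are only illustrative.

```python
import re

# Pattern exactly as written in the question.
original = r'drom\.ru/.?*/.?*/\d\.html'

try:
    re.compile(original)
    compile_error = None
except re.error as exc:
    # The quantifier sequence `?*` is not valid regex syntax,
    # so compilation fails before any matching happens.
    compile_error = exc

print(compile_error)
```

Compiling fails because ?* stacks two quantifiers. The if ... in url test in the question never hit this error because it never treated the string as a regex at all.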
