zhan2383 zhan2383 - 1 year ago 83
Python Question

Python regex pattern to select/filter URLs

links = [

Objective: get the links that contain the /yyyy/mm/dd/ddddddddd/ format, e.g. /2017/03/10/519650091/

For some reason I just cannot get it right; the result always includes the Facebook, Twitter and 2017/03/20170311-format links.

sel_links = []
def selectedLinks(links):
    r = re.compile("^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$")
    for link in links:
        if r.search(link)!="None":
            sel_links.append(link)
    return set(sel_links)

Answer

You have several problems here:

  1. The pattern ^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$ requires the string to start with /[0-9]{4}/, but all your strings start with http.
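  2. The condition r.search(link)!="None" is always true, because re.search returns either None or a match object, and neither of those is equal to the string "None". That is why every link, including the Facebook and Twitter ones, ends up in the result. Test the return value directly instead.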
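
Both problems can be reproduced in isolation. This is only a minimal sketch: the question's `links` list is not shown, so the two `example`-style URLs below are made-up assumptions that mimic the kinds of links described.

```python
import re

# Hypothetical sample links; the real list from the question is not shown.
article = "http://www.example.com/2017/03/10/519650091/some-title"
social = "https://www.facebook.com/example"

# The pattern from the question, anchored at both ends.
anchored = re.compile(r"^(/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9})$")

# Problem 1: "^" pins the match to the start of the string, and the string
# starts with "http", so even the article link does not match.
anchored_article = anchored.search(article)
print(anchored_article)  # None

# Problem 2: None is never equal to the string "None", so this comparison
# is True for every link, whether the pattern matched or not.
social_check = anchored.search(social) != "None"
article_check = anchored.search(article) != "None"
print(social_check, article_check)  # True True

# Comparing against the None object (or using the result directly in "if")
# gives the intended behaviour.
social_is_match = anchored.search(social) is not None
print(social_is_match)  # False
```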

It seems you're looking for this:

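import re

def selectedLinks(links):
    sel_links = []  # kept local so repeated calls do not accumulate results
    r = re.compile(r"/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9}")
    for link in links:
        if r.search(link):  # None (no match) is falsy, a match object is truthy
            sel_links.append(link)
    return set(sel_links)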
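
As a rough usage sketch, the unanchored pattern from the answer can be tried on a few made-up links. The `example.com` URLs, the `DATE_ID` constant and the `selected_links` name are assumptions for illustration only; they are not the data or names from the question.

```python
import re

# Unanchored pattern: /yyyy/mm/dd/ followed by nine digits, anywhere in the URL.
DATE_ID = re.compile(r"/[0-9]{4}/[0-9]{2}/[0-9]{2}/[0-9]{9}")

def selected_links(links):
    """Return the set of links that contain /yyyy/mm/dd/ddddddddd."""
    return {link for link in links if DATE_ID.search(link)}

# Hypothetical input, not the list from the question.
sample = [
    "http://www.example.com/2017/03/10/519650091/some-article",
    "http://www.example.com/2017/03/20170311",
    "https://www.facebook.com/example",
    "https://twitter.com/example",
]

result = selected_links(sample)
print(result)
```

Only the first sample link survives the filter. Note that `[0-9]{9}` also matches the first nine digits of a longer number; if longer ids must be rejected, appending a negative lookahead such as `(?![0-9])` to the pattern would be one way to do it.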
