Ian Ian - 6 months ago 9
Python Question

Removing items from a list in Python following a validity check

Background:

I am writing a little script which requires, as one of its arguments, a file containing a list of email addresses. The script will then go on to use the email addresses over a telnet connection to an SMTP server, so they need to be syntactically valid. Consequently I have written a function to check email address validity (incidentally, this regex may not be perfect, but it is not the focus of the question, so please bear with me; it will probably be loosened up):

import re

def checkmailsyntax(email):
    # Note: returns True when the address is INVALID (no match), otherwise None.
    match = re.match(r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$', email)

    if match is None:
        return True


The main() program goes on to read the input filename as an argument (via argparse) and reads the file's lines into a (currently global) list:

with open(args.targetfile) as targets:
    target_email_list = targets.readlines()


I figured it would be great for the script to automatically delete an email address from the list (rather than just telling you it was wrong, which is what it used to do) if the checkmailsyntax function failed. This cleaned list could then go on to submit syntactically valid email addresses to the SMTP server:

for i in target_email_list:
    if checkmailsyntax(i):
        target_email_list.remove(i)


Debugging code that I have put both before and after the delete-element snippet to see if it's doing its job:

for i in target_email_list:
    print i


The issue: The output of the code is thus:

Before delete element snippet (and the entire contents of the file submitted):

me@example.com
you@example.com
them@example.com
noemail.com
incorrectemail.com
new@example.com
pretendemail.com
wrongemail.com
right@example.com
badlywrong.com
whollycorrect@example.com


After delete element snippet:

me@example.com
you@example.com
them@example.com
incorrectemail.com
new@example.com
wrongemail.com
right@example.com
whollycorrect@example.com


So I'm pretty stumped as to why 'noemail.com', 'pretendemail.com' and 'badlywrong.com' were removed and yet 'incorrectemail.com' and 'wrongemail.com' were not. It seems to occur when there are two syntactically incorrect emails in the file sequentially.

Can anyone point me in the right direction?

AKS AKS
Answer

It is because you are removing elements from the list while iterating over it:

for i in target_email_list:
    if checkmailsyntax(i):
        target_email_list.remove(i) # here

Consider these two values, which are adjacent in the list:

pretendemail.com
wrongemail.com

Once you remove pretendemail.com, the next item, wrongemail.com, shifts into the position it occupied. The iterator then advances to the following index and treats the shifted item as already visited. So the item which comes next is right@example.com, and wrongemail.com is never checked for valid syntax. You can just add print(i) before checking the syntax and see for yourself.
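The skipping can be reproduced in isolation. The following is a minimal sketch rather than the asker's script: `is_invalid` is a hypothetical stand-in for checkmailsyntax (it only looks for an "@"), and the sample addresses are a shortened version of the list in the question.

```python
def is_invalid(address):
    # Hypothetical stand-in for checkmailsyntax: True means "invalid".
    return "@" not in address

# Shortened sample data with two invalid addresses next to each other.
emails = ["me@example.com", "noemail.com", "incorrectemail.com", "new@example.com"]

visited = []  # records which items the loop actually looks at
for address in emails:
    visited.append(address)
    if is_invalid(address):
        # Removing shifts every later item one position to the left,
        # so the item right after the removed one is never visited.
        emails.remove(address)

print(visited)
print(emails)
```

Here `visited` never contains incorrectemail.com, which is why that address survives in `emails`.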

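You can use a list comprehension for this purpose. Note that checkmailsyntax, as written in the question, returns True for invalid addresses, so the condition has to be negated:

valid_emails = [email for email in target_email_list if not checkmailsyntax(email)]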
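If the existing list object has to be updated in place (the question mentions that the list is currently global), two common options are slice assignment and looping over a copy. This is a sketch under assumptions: `EMAIL_RE` and `is_invalid` are illustrative names that mirror the behaviour of checkmailsyntax, and the sample lists are made up to resemble the output of readlines().

```python
import re

# Same pattern as in the question; EMAIL_RE and is_invalid are illustrative names.
EMAIL_RE = re.compile(
    r'^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4})$'
)

def is_invalid(email):
    # True when the address does not match, mirroring checkmailsyntax.
    return EMAIL_RE.match(email) is None

# Hypothetical sample input, shaped like the output of readlines().
target_email_list = ["me@example.com\n", "noemail.com\n",
                     "incorrectemail.com\n", "new@example.com\n"]
same_list = target_email_list  # second reference to the same list object

# Option 1: slice assignment replaces the contents of the existing list,
# so every other reference to it sees the filtered result.
target_email_list[:] = [e for e in target_email_list if not is_invalid(e)]

# Option 2: iterate over a shallow copy, so removals from the original
# cannot shift items underneath the loop.
other_list = ["a@example.com\n", "bad.com\n", "worse.com\n"]
for e in other_list[:]:
    if is_invalid(e):
        other_list.remove(e)
```

Slice assignment is usually the cheaper of the two, since each `remove` call has to search the list again.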