I have a very large .txt file with hundreds of thousands of email addresses scattered throughout. They all take the format:
This code extracts the first email address in a string. Use it while reading the file line by line:
>>> import re
>>> line = "why people don't know what regex are? let me know email@example.com"
>>> match = re.search(r'[\w\.-]+@[\w\.-]+', line)
>>> match.group(0)
'email@example.com'
If a line contains several email addresses, use findall instead:
>>> line = "why people don't know what regex are? let me know email@example.com dssdadsa firstname.lastname@example.org"
>>> match = re.findall(r'[\w\.-]+@[\w\.-]+', line)
>>> match
['email@example.com', 'firstname.lastname@example.org']
The regex above probably finds most common, non-fake email addresses. If you want to be fully compliant with RFC 5322, consult the specification, which defines the exact patterns allowed in email addresses. Checking it helps avoid bugs where valid addresses are missed or invalid ones are matched.
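Since the question is about a very large file, here is a minimal sketch of the "reading line by line" part, using the same pattern as above. The names `EMAIL_RE`, `iter_emails` and `extract_emails_from_file` are made up for illustration, and the UTF-8 encoding is an assumption about your file.

```python
import re

# Same simple pattern as above; it is not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r'[\w\.-]+@[\w\.-]+')


def iter_emails(lines):
    """Yield every address-like match found in an iterable of lines."""
    for line in lines:
        # findall returns all non-overlapping matches in the line.
        yield from EMAIL_RE.findall(line)


def extract_emails_from_file(path, encoding="utf-8"):
    """Return the set of unique matches found in the file at `path`.

    The file object is iterated lazily, so only one line is held in
    memory at a time. The encoding is an assumption; adjust as needed.
    """
    with open(path, encoding=encoding, errors="replace") as handle:
        return set(iter_emails(handle))
```

You would call it as `extract_emails_from_file("emails.txt")`, where `emails.txt` stands in for your own file. Returning a set drops duplicates; use `list(iter_emails(handle))` instead if you need every occurrence in order. Note that because the pattern allows dots, an address at the end of a sentence is captured with the trailing period, so you may want to strip that afterwards.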