fred blossom - 1 month ago
Python Question

How do I remove all numbers?

All numbers in this list need to be removed (the original is 88,779 lines long):

SMITH 1.006 1.006 1
JOHNSON 0.810 1.816 2
WILLIAMS 0.699 2.515 3
JONES 0.621 3.136 4
BROWN 0.621 3.757 5
DAVIS 0.480 4.237 6
MILLER 0.424 4.660 7
...

Answer

You can process the file line by line, so memory use stays flat as the files get larger, and use re.sub with a regular expression to remove numbers, whether they are integers or decimals:

import re

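with open('infile.txt', 'rt') as infile, open('outfile.txt', 'wt') as outfile:
    for line in infile:
        # \d+(?:\.\d+)? matches integers and decimals such as 1 or 1.006.
        line_without_numbers = re.sub(r'\d+(?:\.\d+)?', '', line).strip()
        # strip() also removed the trailing newline, so write it back.
        outfile.write(line_without_numbers + '\n')

I've also run .strip() on the string to remove the leading/trailing padding spaces left behind by the numbers that have been removed. Because .strip() removes the trailing newline as well, a '\n' is written back after each line so the output keeps one name per line.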
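To check the pattern before running it over the whole file, a minimal sketch like the one below can be tried on a few lines from the question. The helper name strip_numbers and the sample list are assumptions made for illustration, and the pattern \d+(?:\.\d+)? is one reasonable choice for integers and decimals rather than the only one.

```python
import re

# Matches integers and decimals such as 1 or 1.006 (illustrative pattern).
NUMBER = re.compile(r'\d+(?:\.\d+)?')

def strip_numbers(line):
    """Hypothetical helper: remove numeric tokens, then trim the leftover padding."""
    return NUMBER.sub('', line).strip()

# A few lines taken from the question, used as a quick sanity check.
sample = [
    'SMITH 1.006 1.006 1\n',
    'JOHNSON 0.810 1.816 2\n',
]
cleaned = [strip_numbers(line) for line in sample]
print(cleaned)  # ['SMITH', 'JOHNSON']
```

Note that only the ends of the line are trimmed: if a number sits between two words, the spaces on both sides of it remain.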