Cesar Cesar - 2 months ago 6
Python Question

Reformat a column to only first 5 characters

I am new to Python and I'm struggling with this section. There are about 25 columns in a text file and 50,000+ Rows. For one of the columns, #11 (ZIP), this column contains all the zip code values of customers in this format "07598-XXXX", I would only like to get the first 5, so "07598", I need to do this for the entire column but I'm confused based on my current logic how to write it.
So far my code is able to delete rows that contains certain strings and I am also using the '|' delimiter to format it nicely as a CSV.

State | ZIP(#11) | Column 12| ....




NY | 60169-8547 | 98

NY | 60169-8973 | 58

NY | 11219-4598 | 25

NY | 11219-8475 | 12

NY | 20036-4879 | 56

How can I iterate through the ZIP column and just show the first 5 characters?
Thanks for the help!

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']


with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
writer = csv.writer(outfile)
for line in csv.reader(infile, delimiter='|'):
if not any(remove_word in element for element in line for remove_word in remove_words):
writer.writerow(line)

Answer

Process title line separately, then read row by row like you do, just modify second line column by truncating to 5 characters.

import csv

my_file_name = "NVG.txt"
cleaned_file = "cleanNVG.csv"
remove_words = ['INAC-EIM','-INAC','TO-INAC','TO_INAC','SHIP_TO-inac','SHIP_TOINAC']


with open(my_file_name, 'r', newline='') as infile, open(cleaned_file, 'w',newline='') as outfile:
    writer = csv.writer(outfile)
    cr = csv.reader(infile, delimiter='|')
    # iterate over title line and write it as-is
    writer.writerow(next(cr))
    for line in cr:
        if not any(remove_word in element for element in line for remove_word in remove_words):
            line[1] = line[1][:5]   # truncate
            writer.writerow(line)

alternately, you could use line[1] = line[1].split("-")[0] which would keep everything on the left of the dash character.

Note the special processing for the title line: cr is an iterator. I just consume it manually before the for loop to perform a pass-through processing.