justin justin - 6 months ago 343
Python Question

How to Delete Rows CSV in python

I'm trying to compare two csv files (fileA and fileB), and remove any rows from fileA that are not found in fileB. I want to be able to do this without creating a third file. I thought I could do this using the csv writer module but now I'm second guessing myself.

Currently, I'm using the following code to record my comparison data from file B:

removal_list = set()
with open('fileB', 'rb') as file_b:
reader1 = csv.reader(file_b)
next(reader1)
for row in reader1:
removal_list.add((row[0], row[2]))


This is where I'm stuck and do not know how to delete the rows:

with open('fileA', 'ab') as file_a:
with open('fileB', 'rb') as file_b:
writer = csv.writer(file_a)
reader2 = csv.reader(file_b)
next(reader2)
for row in reader2:
if (row[0], row[2]) not in removal_list:
# If row was not present in file B, Delete it from file A.
#stuck here: writer.<HowDoIRemoveRow>(row)

Answer

This solution uses fileinput with inplace=True, which writes to a temporary file and then automatically renames it at the end to your file name. You can't remove rows from a file but you can rewrite it with only the ones you want.

if the keyword argument inplace=1 is passed to fileinput.input() or to the FileInput constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place.

fileA

h1,h2,h3
a,b,c
d,e,f
g,h,i
j,k,l

fileB

h1,h2,h3
a,b,c
1,2,3
g,h,i
4,5,6

import fileinput, sys, csv

with open('fileB', 'rb') as file_b:
    r = csv.reader(file_b)
    next(r) #skip header
    seen = {(row[0], row[2]) for row in r}

f = fileinput.input('fileA', inplace=True) # sys.stdout is redirected to the file
print next(f), # write header as first line

w = csv.writer(sys.stdout) 
for row in csv.reader(f):
   if (row[0], row[2]) in seen: # write it if it's in B
       w.writerow(row)

fileA

h1,h2,h3
a,b,c    
g,h,i
Comments