Zachariah Reagh Zachariah Reagh - 2 months ago 10
Python Question

Updating initial CSV values based on later CSV values?

I have a fairly basic question, and I'm wondering what the best solution would be using Python. I have a set of CSV files, and within each file, I have rows of comma separated elements. Importantly, there are two distinct blocks of rows in each CSV file, let's say "Block 1" and "Block 2". Some values overlap between Block 1 and Block 2 (specific item of interest: the name of particular .jpg files), but the order will vary. Here is a shortened version of how the file is organized:

Trial,Image,Type,Reps
1,511.jpg,T,1REP
2,101a.jpg,2,1REP
3,185a.jpg,5,3REP
4,566.jpg,T,3REP
5,560.jpg,T,3REP

Trial,Image,Type,Reps,Keypress
1,101a.jpg,2,1REP,1
2,185a.jpg,5,3REP,0
3,511.jpg,T,1REP,1
4,560.jpg,T,3REP,1
5,566.jpg,T,3REP,0


For some clarification, this is the log file of an experiment where Block 1 is the time when images are studied. "Type" corresponds to the type of picture, and "Reps" corresponds to how many times overall the picture is seen (1 or 3 times), neither of which are central to what I want to achieve. What I would like to do is this: for each row in the first block, match to the name of the same .jpg file in the second block. Then I need to append the Block 1 row with "1" or "0" based on whether the corresponding "Keypress" in Block 2 is "1" or "0" element. Basically, when tested on the pictures, they make a button press of "1" or "0" and I want to back sort which ones got which press during study. Critically, I need to preserve the order of Block 1 (the studied order of images) with whatever solution I take.

Apologies for how basic this request is...I'm learning.

Answer

Your question isn't what I would call basic at all (and has nothing to do with sorting). In fact doing the processing you want is fairly involved. Essentially each file has to be read twice, first to extract the information needed from the second block, and then again to update the first block in it. Additionally, reading the file each time is broken down into two sub-steps, since there's two kinds of csv data in each file which must be handled separately in each pass.

Since it's fairly difficult to update a file in-place, an updated version of the file is first written to a separate temporary file which then replaces the original if processing completes without errors.

import csv
import shutil
from tempfile import NamedTemporaryFile

TRIAL = 0
IMAGE = 1
KEYPRESS = 4
filename = 'backsorting.csv'
img_resp_map = {}

# first pass
with open(filename, 'rb') as csvfile:
    reader = csv.reader(csvfile)

    # skip over first block
    next(reader)  # header
    while True:
        row = next(reader)
        if not row[TRIAL].isdigit():  # header of second block?
            break

    # use data in second block to create an image-to-response mapping
    for row in reader:
        img_resp_map[row[IMAGE]] = row[KEYPRESS]

# second pass
with open(filename, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    fields = next(reader)  # get header of first block
    with NamedTemporaryFile('wb', dir='.', delete=False) as tempcsv:
        writer = csv.writer(tempcsv)
        writer.writerow(fields + ['Keypress'])  # new header with added field

        # copy and update rows of first block by appending the new field
        for row in reader:
            if not row[TRIAL].isdigit():  # header of second block?
                break
            writer.writerow(row+[img_resp_map[row[IMAGE]]])

        # copy second block of file unchanged
        writer.writerow(row)  # header (already read)
        writer.writerows(reader)

    # NOTE: the following is dangerous since it wipes out the original file
    shutil.move(tempcsv.name, filename)  # replace original file with temp one

My test file was named backsorting.csv and initially had this in it:

Trial,Image,Type,Reps
1,511.jpg,T,1REP
2,101a.jpg,2,1REP
3,185a.jpg,5,3REP
4,566.jpg,T,3REP
5,560.jpg,T,3REP
Trial,Image,Type,Reps,Keypress
1,101a.jpg,2,1REP,1
2,185a.jpg,5,3REP,0
3,511.jpg,T,1REP,1
4,560.jpg,T,3REP,1
5,566.jpg,T,3REP,0

After running the script, its contents were changed to this:

Trial,Image,Type,Reps,Keypress
1,511.jpg,T,1REP,1
2,101a.jpg,2,1REP,1
3,185a.jpg,5,3REP,0
4,566.jpg,T,3REP,0
5,560.jpg,T,3REP,1
Trial,Image,Type,Reps,Keypress
1,101a.jpg,2,1REP,1
2,185a.jpg,5,3REP,0
3,511.jpg,T,1REP,1
4,560.jpg,T,3REP,1
5,566.jpg,T,3REP,0