STD STD - 15 days ago 5
Python Question

How to split a log file into several csv files with python

I'm pretty new to python and coding in general, so sorry in advance for any dumb questions. My program needs to split an existing log file into several *.csv files (run1,.csv, run2.csv, ...) based on the keyword 'MYLOG'. If the keyword appears it should start copying the two desired columns into the new file till the keyword appears again. When finished there need to be as many csv files as there are keywords.




53.2436 EXP MYLOG: START RUN specs/run03_block_order.csv
53.2589 EXP TextStim: autoDraw = None
53.2589 EXP TextStim: autoDraw = None
55.2257 DATA Keypress: t
57.2412 DATA Keypress: t
59.2406 DATA Keypress: t
61.2400 DATA Keypress: t
63.2393 DATA Keypress: t
...
89.2314 EXP MYLOG: START BLOCK scene [specs/run03_block01.csv]
89.2336 EXP Imported specs/run03_block01.csv as conditions
89.2339 EXP Created sequence: sequential, trialTypes=9
...





[EDIT]: The output per file (run*.csv) should look like this:

onset type
53.2436 EXP
53.2589 EXP
53.2589 EXP
55.2257 DATA
57.2412 DATA
59.2406 DATA
61.2400 DATA
...





The program creates as much run*.csv as needed, but i can't store the desired columns in my new files. When finished, all I get are empty csv files. If I shift the counter variable to == 1 it creates just one big file with the desired columns.

Thanks again!

import csv

QUERY = 'MYLOG'

with open('localizer.log', 'rt') as log_input:
i = 0

for line in log_input:

if QUERY in line:
i = i + 1

with open('run' + str(i) + '.csv', 'w') as output:
reader = csv.reader(log_input, delimiter = ' ')
writer = csv.writer(output)
content_column_A = [0]
content_column_B = [1]

for row in reader:
content_A = list(row[j] for j in content_column_A)
content_B = list(row[k] for k in content_column_B)
writer.writerow(content_A)
writer.writerow(content_B)

Answer

Looking at the code there's a few things that are possibly wrong:

  1. the csv reader should take a file handler, not a single line.
  2. the reader delimiter should not be a single space character as it looks like the actual delimiter in your logs is a variable number of multiple space characters.
  3. the looping logic seems to be a bit off, confusing files/lines/rows a bit.

You may be looking at something like the code below (pending clarification in the question):

import csv
NEW_LOG_DELIMITER = 'MYLOG'

def write_buffer(_index, buffer):
    """
    This function takes an index and a buffer.
    The buffer is just an iterable of iterables (ex a list of lists)
    Each buffer item is a row of values.
    """
    filename = 'run{}.csv'.format(_index)
    with open(filename, 'w') as output:
        writer = csv.writer(output)
        writer.writerow(['onset', 'type'])  # adding the heading
        writer.writerows(buffer)

current_buffer = []
_index = 1

with open('localizer.log', 'rt') as log_input:
    for line in log_input:
        # will deal ok with multi-space as long as
        # you don't care about the last column
        fields = line.split()[:2]
        if not NEW_LOG_DELIMITER in line or not current_buffer:
            # If it's the first line (the current_buffer is empty)
            # or the line does NOT contain "MYLOG" then
            # collect it until it's time to write it to file.
            current_buffer.append(fields)
        else:
            write_buffer(_index, current_buffer)
            _index += 1
            current_buffer = [fields]  # EDIT: fixed bug, new buffer should not be empty
    if current_buffer:
        # We are now out of the loop,
        # if there's an unwritten buffer then write it to file.
        write_buffer(_index, current_buffer)