ws_e_c421 - 4 months ago
Python Question

How are file objects cleaned up in Python when the process is killed?

What happens to a file object in Python when the process is terminated? Does it matter whether Python is terminated with SIGTERM, SIGKILL, SIGHUP (etc.) or by a KeyboardInterrupt exception?

I have some logging scripts that continually acquire data and write it to a file. I don't care about doing any extra cleanup; I just want to make sure the log file is not corrupted when Python is abruptly terminated (e.g. I could leave it running in the background and just shut down the computer). I made the following test scripts to try to see what happens:

termtest.sh:

for i in $(seq 1 10); do
    python termtest.py $i & export pypid=$!
    sleep 0.3
    echo $pypid
    kill -SIGTERM $pypid
done


termtest.py:

import csv
import os
import signal
import sys

end_loop = False


def handle_interrupt(*args):
    global end_loop
    end_loop = True


signal.signal(signal.SIGINT, handle_interrupt)

with open('test' + str(sys.argv[-1]) + '.txt', 'w') as csvfile:
    writer = csv.writer(csvfile)
    for idx in range(int(1e7)):
        writer.writerow((idx, 'a' * 60000))
        csvfile.flush()
        os.fsync(csvfile.fileno())
        if end_loop:
            break


I ran termtest.sh with different signals (changed SIGTERM to SIGINT, SIGHUP, and SIGKILL in termtest.sh). (Note: I put an explicit handler in termtest.py for SIGINT, since Python does not handle that one other than by raising KeyboardInterrupt on Ctrl+C.) In all cases, all of the output files had only complete rows (no partial writes) and did not appear corrupted. I added the flush() and fsync() calls to make sure the data was being written to disk as often as possible, so that the script had the greatest chance of being interrupted mid-write.

So can I conclude that Python always completes a write when it is terminated and does not leave a file in an intermediate state? Or does this depend on the operating system and file system (I was testing with Linux and an ext4 partition)?

Answer

It's not a question of how files are "cleaned up" so much as of how they are written. A program might perform multiple write() calls for a single "chunk" of data (a row, or whatever), and an interruption in the middle of that sequence would leave a partial record in the file.
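To illustrate the difference (a hypothetical sketch; risky_writerow and safer_writerow are illustrative names, not part of the csv module): a record emitted through several write() calls can be torn if the process dies between them, whereas a record assembled first and handed to write() in one call is, on a local filesystem, usually either fully present or absent.

```python
import io


def risky_writerow(f, fields):
    # Several write() calls per record: a kill between any two of them
    # leaves a partial row in the file.
    for i, field in enumerate(fields):
        if i:
            f.write(',')
        f.write(str(field))
    f.write('\r\n')


def safer_writerow(f, fields):
    # Assemble the whole record first, then issue a single write(),
    # which is effectively what the csv module's writerow() does.
    f.write(','.join(str(field) for field in fields) + '\r\n')


buf1, buf2 = io.StringIO(), io.StringIO()
risky_writerow(buf1, (1, 'abc'))
safer_writerow(buf2, (1, 'abc'))
print(buf1.getvalue() == buf2.getvalue())  # True: same bytes, different tearing risk
```

Both functions produce identical output; the difference only shows up if the process dies partway through a record.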

Looking at the C source for the csv module, you can see that it assembles each row into a string buffer and then writes it with a single write() call. That should generally be safe: either the row is passed to the OS or it isn't, and if it reaches the OS it's either all written or not (barring hardware issues such as part of it landing in a bad sector).
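You can check this single-write behavior from Python with a small wrapper (a sketch; CountingWriter is an illustrative name) that counts how many times csv.writer calls write() per row:

```python
import csv
import io


class CountingWriter:
    """File-like wrapper that counts write() calls passing through it."""

    def __init__(self, target):
        self.target = target
        self.write_calls = 0

    def write(self, data):
        self.write_calls += 1
        return self.target.write(data)


buf = io.StringIO()
counting = CountingWriter(buf)
writer = csv.writer(counting)
writer.writerow((1, 'a' * 60000))
print(counting.write_calls)  # 1: the whole row arrives in a single write()
```

csv.writer only requires an object with a write() method, which is what makes this kind of interposition possible.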

The object you write to is a Python object, and a custom writer could do something weird in its write() method that breaks this, but assuming it's a regular file object, it should be fine.
