Garbage in file after truncate(0) in Python

Assume there is a file test.txt containing the string 'test'.

Now, consider the following Python code:

f = open('test.txt', 'r+')
f.read()
f.truncate(0)
f.write('passed')
f.flush()


Now I expect test.txt to contain 'passed'; however, there are some additional strange symbols!

Update: flush after truncate does not help.

Answer

This is because truncate doesn't change the stream position.

When you read() the file, you move the position to the end, so subsequent writes go to the file from that position. However, when you call flush(), it seems that it not only tries to write the buffer to the file but also does some error checking and fixes the current file position. When flush() is called after truncate(0), it writes nothing (the buffer is empty), then checks the file size and places the position at the first applicable place (which is 0).
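A common way to sidestep the problem is to rewind explicitly before truncating, so that the stream position and the new file size agree. The following is a minimal sketch; the helper name replace_contents and the temporary demo file are made up for this example.

```python
import os
import tempfile


def replace_contents(path, new_text):
    """Read a file, then replace its whole contents with new_text.

    Hypothetical helper, written only to illustrate seek-then-truncate.
    """
    with open(path, 'r+') as f:
        old_text = f.read()   # leaves the position at the end of the file
        f.seek(0)             # rewind first ...
        f.truncate()          # ... then cut the file at the current position (0)
        f.write(new_text)     # the write now starts at offset 0
    return old_text


# Demo on a throwaway file created only for this example.
demo_fd, demo_path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(demo_fd, 'w') as f:
    f.write('test')

previous = replace_contents(demo_path, 'passed')
with open(demo_path) as f:
    result = f.read()
os.remove(demo_path)

print(repr(previous), repr(result))
```

Because the position is reset by hand, this does not rely on what flush() happens to do on a given platform.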

UPDATE

Python's file functions are NOT just wrappers around the C standard library equivalents, but knowing the C functions helps in understanding more precisely what is happening.

From the ftruncate man page:

The value of the seek pointer is not modified by a call to ftruncate().
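Python 3 file objects behave the same way: the io documentation states that truncate() does not change the current stream position. The sketch below makes this visible in binary mode, where tell() returns a plain byte offset; the temporary file and variable names are assumptions for the demo.

```python
import os
import tempfile

# Create a throwaway file that holds the four bytes b'test'.
tmp_fd, tmp_path = tempfile.mkstemp()
with os.fdopen(tmp_fd, 'wb') as f:
    f.write(b'test')

with open(tmp_path, 'rb+') as f:
    f.read()                 # position moves to the end of the file
    pos_before = f.tell()
    f.truncate(0)            # file size becomes 0, position is untouched
    pos_after = f.tell()
    f.write(b'passed')       # written at the old offset, past the new end

with open(tmp_path, 'rb') as f:
    data = f.read()
os.remove(tmp_path)

# The gap before the old offset typically reads back as null bytes,
# which is the "garbage" seen in the question.
print(pos_before, pos_after, len(data), data)
```

The write lands at the old offset, so the file ends up longer than the six bytes of 'passed'.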

From the fflush man page:

If stream points to an input stream or an update stream into which the most recent operation was input, that stream is flushed if it is seekable and is not already at end-of-file. Flushing an input stream discards any buffered input and adjusts the file pointer such that the next input operation accesses the byte after the last one read.

This means that putting flush before truncate has no effect. I checked, and that is the case.

But for putting flush after truncate:

If stream points to an output stream or an update stream in which the most recent operation was not input, fflush() causes any unwritten data for that stream to be written to the file, and the st_ctime and st_mtime fields of the underlying file are marked for update.

The man page doesn't mention the seek pointer when describing output streams whose last operation was not input. (Here, our last operation is truncate.)

UPDATE 2

I found something in the Python source code: Python-3.2.2\Modules\_io\fileio.c:837

#ifdef HAVE_FTRUNCATE
static PyObject *
fileio_truncate(fileio *self, PyObject *args)
{
    PyObject *posobj = NULL; /* the new size wanted by the user */
#ifndef MS_WINDOWS
    Py_off_t pos;
#endif

...

#ifdef MS_WINDOWS
    /* MS _chsize doesn't work if newsize doesn't fit in 32 bits,
       so don't even try using it. */
    {
        PyObject *oldposobj, *tempposobj;
        HANDLE hFile;

////// THIS LINE //////////////////////////////////////////////////////////////
        /* we save the file pointer position */
        oldposobj = portable_lseek(fd, NULL, 1);
        if (oldposobj == NULL) {
            Py_DECREF(posobj);
            return NULL;
        }

        /* we then move to the truncation position */
        ...

        /* Truncate.  Note that this may grow the file! */
        ...

////// AND THIS LINE //////////////////////////////////////////////////////////
        /* we restore the file pointer position in any case */
        tempposobj = portable_lseek(fd, oldposobj, 0);
        Py_DECREF(oldposobj);
        if (tempposobj == NULL) {
            Py_DECREF(posobj);
            return NULL;
        }
        Py_DECREF(tempposobj);
    }
#else

...

#endif /* HAVE_FTRUNCATE */

Look at the two lines I indicated (////// THIS LINE //////). If your platform is Windows, then it saves the position and restores it after the truncate.
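The save-and-restore pattern in that Windows branch can be sketched in Python with os-level calls. This is only an illustration of the idea, not a translation of CPython's code: truncate_keep_position is a hypothetical helper, and it uses os.ftruncate rather than seeking to the truncation point first as the C code does.

```python
import os
import tempfile


def truncate_keep_position(fd, size):
    """Truncate fd to size bytes, leaving the file offset where it was.

    Hypothetical helper mirroring the save/restore idea in the C excerpt.
    """
    old_pos = os.lseek(fd, 0, os.SEEK_CUR)    # save the current offset
    try:
        os.ftruncate(fd, size)                # may shrink or grow the file
    finally:
        os.lseek(fd, old_pos, os.SEEK_SET)    # restore the offset in any case
    return old_pos


raw_fd, raw_path = tempfile.mkstemp()
try:
    os.write(raw_fd, b'test')                 # offset is now 4
    saved = truncate_keep_position(raw_fd, 0)
    offset_after = os.lseek(raw_fd, 0, os.SEEK_CUR)
    size_after = os.fstat(raw_fd).st_size
finally:
    os.close(raw_fd)
    os.remove(raw_path)

print(saved, offset_after, size_after)
```

After the call the file is empty but the offset still points past its end, which is exactly the state that leads to the stray bytes on the next write.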

To my surprise, most of the flush functions in the Python 3.2.2 source either did nothing or did not call the C fflush function at all. The 3.2.2 truncate code was also largely undocumented. However, I did find something interesting in the Python 2.7.2 sources. I found this in Python-2.7.2\Objects\fileobject.c:812, in the truncate implementation:

 /* Get current file position.  If the file happens to be open for
 * update and the last operation was an input operation, C doesn't
 * define what the later fflush() will do, but we promise truncate()
 * won't change the current position (and fflush() *does* change it
 * then at least on Windows).  The easiest thing is to capture
 * current pos now and seek back to it at the end.
 */

So, to summarize, I think this is entirely platform dependent. I checked with the default Python 3.2.2 for Windows x64 and got the same results as you. I don't know what happens on *nix systems.