Linux Question

Default buffer size for a file on Linux

The documentation states that the default value for buffering is: "If omitted, the system default is used". I am currently on Red Hat Linux 6, but I am not able to figure out the default buffering that is set for the system.

Can anyone please guide me as to how to determine the buffering for a system?

Answer Source

Since you linked to the 2.7 docs, I'm assuming you're using 2.7. (In Python 3.x, this all gets a lot simpler, because a lot more of the buffering is exposed at the Python level.)
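For comparison, here is a minimal Python 3 sketch of what "exposed at the Python level" looks like. It assumes Python 3, where the open documentation describes a heuristic based on the underlying device's block size with io.DEFAULT_BUFFER_SIZE as the fallback. The helper name describe_default_buffering is made up for illustration, and none of this applies to the 2.7 code path discussed below.

```python
import io
import os
import tempfile


def describe_default_buffering(path):
    """Return (io.DEFAULT_BUFFER_SIZE, st_blksize or None) for the file at `path`."""
    with open(path, "rb") as f:
        st = os.fstat(f.fileno())
        # st_blksize is not available on every platform, so fall back to None.
        blksize = getattr(st, "st_blksize", None)
    return io.DEFAULT_BUFFER_SIZE, blksize


# Create a throwaway file so the example does not depend on any existing path.
with tempfile.TemporaryDirectory() as tmpdir:
    sample_path = os.path.join(tmpdir, "sample.bin")
    with open(sample_path, "wb") as f:
        f.write(b"hello")
    default_size, blksize = describe_default_buffering(sample_path)

print("io.DEFAULT_BUFFER_SIZE:", default_size)
print("st_blksize reported by fstat:", blksize)
```

The two numbers can differ, and which one ends up being used depends on the Python version, so treat the output as a hint rather than a guarantee.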

All open actually does (on POSIX systems) is call fopen, and then, if you've passed anything for buffering, setvbuf. Since you're not passing anything, you just end up with the default buffer from fopen, which is up to your C standard library. (See the source for details. With no buffering argument, it passes -1 to PyFile_SetBufSize, which does nothing unless bufsize >= 0.)

If you read the glibc setvbuf manpage, it explains that if you never call any of the buffering functions:

Normally all files are block buffered. When the first I/O operation occurs on a file, malloc(3) is called, and a buffer is obtained.

Note that it doesn't say what size buffer is obtained. This is intentional; it means the implementation can be smart and choose different buffer sizes for different cases. (There is a BUFSIZ constant, but that's only used when you call legacy functions like setbuf; it's not guaranteed to be used in any other case.)

So, what does happen? Well, if you look at the glibc source, it calls the macro _IO_DOALLOCATE, which can be hooked (or overridden, because glibc unifies C++ streambuf and C stdio buffering), but ultimately it allocates a buffer of _IO_BUFSIZ, which is an alias for the platform-specific macro _G_BUFSIZ, which is 8192.

Of course you probably want to trace down the macros on your own system rather than trust the generic source.
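If you'd rather measure than read headers, here is a rough sketch that probes the stdio buffer empirically: it writes one byte at a time through fopen/fwrite via ctypes and reports how many bytes hit the file on the first flush. It assumes Python 3 on a POSIX system where ctypes.CDLL(None) resolves the process's libc; the function name probe_stdio_buffer_size and the 1 MiB limit are arbitrary choices for illustration. On glibc the number should correspond to the buffer that was allocated for that file; other C libraries may flush at slightly different points, so read the result as an estimate.

```python
import ctypes
import os
import tempfile


def probe_stdio_buffer_size(limit=1 << 20):
    """Write single bytes through C stdio and return the file size after the
    first flush, or None if nothing is flushed within `limit` bytes."""
    # Assumption: the running process's libc exports fopen, fwrite and fclose.
    libc = ctypes.CDLL(None)
    libc.fopen.restype = ctypes.c_void_p
    libc.fopen.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    libc.fwrite.restype = ctypes.c_size_t
    libc.fwrite.argtypes = [ctypes.c_char_p, ctypes.c_size_t,
                            ctypes.c_size_t, ctypes.c_void_p]
    libc.fclose.restype = ctypes.c_int
    libc.fclose.argtypes = [ctypes.c_void_p]

    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "probe.bin")
        stream = libc.fopen(os.fsencode(path), b"wb")
        if not stream:
            raise OSError("fopen failed for " + path)
        try:
            for _ in range(limit):
                libc.fwrite(b"x", 1, 1, stream)
                # The file stays empty until stdio flushes its buffer.
                flushed = os.path.getsize(path)
                if flushed > 0:
                    return flushed
            return None
        finally:
            libc.fclose(stream)


print("bytes written on first flush:", probe_stdio_buffer_size())
```

Note that the temporary file lives wherever tempfile puts it, so the result reflects that filesystem, which may not be the one your real files are on.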

You may wonder why there is no good documented way to get this information. Presumably it's because you're not supposed to care. If you need a specific buffer size, you set one manually; if you trust that the system knows best, just trust it. Unless you're actually working on the kernel or libc, who cares?

In theory, this also leaves open the possibility that the system could do something smart here, like picking a buffer size based on the block size for the file's filesystem, or even based on running stats data, although it doesn't look like Linux/glibc, FreeBSD, or OS X do anything other than use a constant. And most likely that's because it really doesn't matter for most applications. (You might want to test that out yourself: use explicit buffer sizes ranging from 1KB to 2MB on some buffered-I/O-bound script and see what the performance differences are.)
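A minimal sketch of that experiment, assuming Python 3 (where the buffering argument is handled by the io module rather than by setvbuf, so the numbers are only indicative for 2.7). The helper name time_reads, the 4 MiB test file, and the 64-byte read size are arbitrary choices for illustration, not anything prescribed by the docs.

```python
import os
import tempfile
import time


def time_reads(path, buffer_sizes, chunk=64):
    """Read `path` in small chunks once per buffer size; return {size: seconds}."""
    timings = {}
    for size in buffer_sizes:
        start = time.perf_counter()
        with open(path, "rb", buffering=size) as f:
            # Small reads force the buffered layer to do the real work.
            while f.read(chunk):
                pass
        timings[size] = time.perf_counter() - start
    return timings


# Powers of two from 1 KB (2**10) up to 2 MB (2**21).
sizes = [1 << n for n in range(10, 22)]

with tempfile.TemporaryDirectory() as tmpdir:
    sample = os.path.join(tmpdir, "sample.bin")
    with open(sample, "wb") as f:
        f.write(os.urandom(4 * 1024 * 1024))  # 4 MiB of throwaway data
    results = time_reads(sample, sizes)

for size, seconds in results.items():
    print(f"{size:>8} bytes: {seconds:.4f} s")
```

Because the file was just written, it will almost certainly be served from the page cache, so this mostly measures per-call overhead rather than disk throughput. Run it against a file and access pattern that resemble your real workload before drawing conclusions.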