Jeremy Friesner Jeremy Friesner - 1 month ago 14
C Question

How best to manage Linux's buffering behavior when writing a high-bandwidth data stream?

My problem is this: I have a C/C++ app that runs under Linux, and this app receives a constant-rate high-bandwith (~27MB/sec) stream of data that it needs to stream to a file (or files). The computer it runs on is a quad-core 2GHz Xeon running Linux. The filesystem is ext4, and the disk is a solid state E-SATA drive which should be plenty fast for this purpose.

The problem is Linux's too-clever buffering behavior. Specifically, instead of writing the data to disk immediately, or soon after I call write(), Linux will store the "written" data in RAM, and then at some later time (I suspect when the 2GB of RAM starts to get full) it will suddenly try to write out several hundred megabytes of cached data to the disk, all at once. The problem is that this cache-flush is large, and holds off the data-acquisition code for a significant period of time, causing some of the current incoming data to be lost.

My question is: is there any reasonable way to "tune" Linux's caching behavior, so that either it doesn't cache the outgoing data at all, or if it must cache, it caches only a smaller amount at a time, thus smoothing out the bandwidth usage of the drive and improving the performance of the code?

I'm aware of O_DIRECT, and will use that I have to, but it does place some behavioral restrictions on the program (e.g. buffers must be aligned and a multiple of the disk sector size, etc) that I'd rather avoid if I can.


You can use the posix_fadvise() with the POSIX_FADV_DONTNEED advice (possibly combined with calls to fdatasync()) to make the system flush the data and evict it from the cache.

See this article for a practical example.