mxcl - 1 year ago
C++ Question

Remove all but the last 500,000 bytes from a file with the STL

Our logging class, when initialised, truncates the log file to 500,000 bytes. From then on, log statements are appended to the file.

We do this to keep disk usage low, since we're a commodity end-user product.

Obviously keeping the first 500,000 bytes is not useful, so we keep the last 500,000 bytes.

Our solution has a serious performance problem. What is an efficient way to do this?

Answer Source

"I would probably create a new file, seek in the old file, do a buffered read/write from old file to new file, rename the new file over the old one."

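For reference, the quoted approach (new file, seek, buffered copy, rename) might look roughly like the sketch below. The helper name truncate_via_rename, the ".tmp" suffix, and the 64 KiB chunk size are assumptions made for illustration; none of them come from the original suggestion. Whether std::rename replaces an existing target is implementation-defined: POSIX systems replace it, other platforms may refuse.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: keep only the last `keep` bytes of `path` by copying
// the tail into a temporary file and renaming it over the original.
// Assumes keep >= 0.
bool truncate_via_rename(const std::string& path, std::streamoff keep) {
    const std::string tmp = path + ".tmp";  // assumed temp name, same directory
    {
        std::ifstream in(path, std::ios::binary | std::ios::ate);
        if (!in) return false;
        const std::streamoff size = in.tellg();
        if (size <= keep) return true;  // already small enough
        in.seekg(size - keep, std::ios::beg);

        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;

        char buf[64 * 1024];  // buffered copy in 64 KiB chunks
        while (in.read(buf, sizeof buf) || in.gcount() > 0) {
            out.write(buf, in.gcount());
        }
        out.flush();
        if (!out) return false;
    }  // both streams are closed here, before the rename
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}
```

Only one chunk is held in memory at a time, and the original log stays intact until the rename succeeds.
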
I think you'd be better off simply:

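#include <fstream>
std::ifstream ifs("logfile", std::ios::binary);  // One call to start it all. . .
ifs.seekg(-512000, std::ios_base::end);  // One call to find it. . . (fails if the file is shorter than 512000 bytes)
static char tmpBuffer[512000];  // static, to keep 512K off the stack
ifs.read(tmpBuffer, 512000);  // One call to read it all. . .
ifs.close();  // Close the input before reopening the same file for writing
std::ofstream ofs("logfile", std::ios::binary | std::ios::trunc);
ofs.write(tmpBuffer, 512000);  // And to the FS bind it.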

This avoids the file rename stuff by simply copying the last 512K to a buffer, opening your logfile in truncate mode (clears the contents of the logfile), and spitting that same 512K back into the beginning of the file.

Note that the above code hasn't been tested, but I think the idea should be sound.

You could load the 512K into a buffer in memory, close the input stream, then open the output stream. That way you wouldn't need two files: you'd read, close, reopen, write the 512K, and be done. You avoid the rename / file relocation magic this way.
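The read, close, reopen, write sequence can be wrapped into one function that also copes with logs shorter than the limit. This is a minimal sketch, not a tested implementation: the name truncate_in_place and its parameters are hypothetical, and it assumes keep is non-negative and that the tail fits comfortably in memory.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: keep only the last `keep` bytes of `path`,
// holding the tail in memory between closing and reopening the file.
bool truncate_in_place(const std::string& path, std::streamoff keep) {
    std::vector<char> tail;
    {
        std::ifstream in(path, std::ios::binary | std::ios::ate);
        if (!in) return false;
        const std::streamoff size = in.tellg();
        if (size <= keep) return true;  // nothing to trim
        tail.resize(static_cast<std::size_t>(keep));
        in.seekg(size - keep, std::ios::beg);
        if (!in.read(tail.data(), keep)) return false;
    }  // input stream is closed here
    // Reopen the same file in truncate mode and write the tail back.
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out.write(tail.data(), static_cast<std::streamsize>(tail.size()));
    out.flush();
    return static_cast<bool>(out);
}
```

A call such as truncate_in_place("logfile", 512000) would run once when the logger starts. The trade-off is that a crash between the truncating open and the write loses the log, which the rename-based approach avoids.
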

If you don't have an aversion to mixing C with C++ to some extent, you could also perhaps:

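(Note: untested POSIX sketch with error checking omitted; needs <fcntl.h>, <sys/mman.h>, <sys/stat.h>, <unistd.h> and <string>)

int myfd = open("mylog", O_RDONLY); // Grab a file descriptor
struct stat st; fstat(myfd, &st); // Look up the file size
off_t start = st.st_size - 512000; // Where the last 512K begins (assumes the file is at least that big)
off_t aligned = start & ~(off_t)(sysconf(_SC_PAGESIZE) - 1); // mmap offsets must be page-aligned
size_t maplen = st.st_size - aligned; // Map from the aligned offset to the end of the file
char *myptr = static_cast<char *>(mmap(NULL, maplen, PROT_READ, MAP_PRIVATE, myfd, aligned)); // mmap the tail of the file
std::string mystr(myptr + (start - aligned), 512000); // pull 512K from our mmap'd buffer and load it directly into the std::string
munmap(myptr, maplen); // Unmap the file
close(myfd); // Close the file descriptor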

Depending on many things, mmap could be faster than seeking. Googling 'fseek vs mmap' yields some interesting reading about it, if you're curious.