danielsto danielsto - 18 days ago 7
C++ Question

Performance reading binary files

I have a program that reads from a really big binary file (48 MB) and then passes the data to a matrix of custom structs named pixel:

struct pixel {
int r;
int g;
int b;
};


Opening the file:

ifstream myFile(inputPath, ios::binary);
pixel **matrixPixel;


The read of the file is done this way:

int position = 0;

for (int i = 0; i < HEIGHT; ++i) {
for (int j = 0; j < WIDTH; ++j) {
if (!myFile.eof()) {
myFile.seekg(position, ios::beg);
myFile.read((char *) &matrixPixel[i][j].r, 1); // red byte
myFile.seekg(position + HEIGHT * WIDTH, ios::beg);
myFile.read((char *) &matrixPixel[i][j].g, 1); // green byte
myFile.seekg(position + HEIGHT * WIDTH * 2, ios::beg);
myFile.read((char *) &matrixPixel[i][j].b, 1); // blue byte
++position;
}
}
}
myFile.close();


The thing is that, for a big file like the one at the beginning, it takes a lot of time (~7 min) and it's supposed to be optimized. How could I read from the file in less time?

Answer

So, the structure of the data you're storing in memory looks like this:

rgbrgbrgbrgbrgbrgbrgbrgbrgbrgb..............rgb

But the structure of the file you're reading looks like this (assuming your code's logic is correct):

rrrrrrrrrrrrrrrrrrrrrrrrrrr....
ggggggggggggggggggggggggggg....
bbbbbbbbbbbbbbbbbbbbbbbbbbb....

And in your code, you're translating between the two. Fundamentally, that's going to be slow. And what's more, you've chosen to read the file by making manual seeks to arbitrary points in the file. That's going to slow things down even more.

The first thing you can do is streamline the Hard Disk reads:

for(int channel = 0; channel < 3; channel++) {
    for (int i = 0; i < HEIGHT; ++i) {
        for (int j = 0; j < WIDTH; ++j) {
            if (!myFile.eof()) {
                switch(channel) {
                    case 0: myFile.read((char *) &matrixPixel[i][j].r, 1); break;
                    case 1: myFile.read((char *) &matrixPixel[i][j].g, 1); break;
                    case 2: myFile.read((char *) &matrixPixel[i][j].b, 1); break;
                }
            }
        }
    }
}

That requires the fewest changes to your code, and will speed up your code, but the code will probably still be slow.

A better approach, which increases CPU use but dramatically reduces Hard Disk use (which, in the vast majority of applications, will result in a speed-up), would be to store the data like so:

std::vector<unsigned char> reds(WIDTH * HEIGHT);
std::vector<unsigned char> greens(WIDTH * HEIGHT);
std::vector<unsigned char> blues(WIDTH * HEIGHT);

myFile.read(reds.data(), WIDTH * HEIGHT); //Stream can be checked for errors resulting from EOF or other issues.
myFile.read(greens.data(), WIDTH * HEIGHT);
myFile.read(blues.data(), WIDTH * HEIGHT);

std::vector<pixel> pixels(WIDTH * HEIGHT);

for(size_t index = 0; index < WIDTH * HEIGHT; index++) {
    pixels[index].r = reds[index];
    pixels[index].g = greens[index];
    pixels[index].b = blues[index];
}

The final, best approach, is to change how the binary file is formatted, because the way it appears to be formatted is insane (from a performance perspective). If the file is reformatted to the rgbrgbrgbrgbrgb style (which is far more standard in the industry), your code simply becomes this:

struct pixel {
    unsigned char red, green, blue;
}; //You'll never read values above 255 when doing byte-length color values.
std::vector<pixel> pixels(WIDTH * HEIGHT);
myFile.read(reinterpret_cast<char*>(pixels.data()), WIDTH * HEIGHT * 3);

This is extremely short, and is probably going to outperform all the other methods. But of course, that may not be an option for you.

I haven't tested any of these methods (and there may be a typo or two) but all of these methods should be faster than what you're currently doing.

Comments