zhanglistar - 4 months ago
Linux Question

read() of a big 6 GB file fails on x86_64

Here is the description of my problem:

I want to read a big file, about 6.3 GB, entirely into memory using the read system call in C, but an error occurs.
Here is the code:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <assert.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <limits.h>

int main(int argc, char* argv[]) {
    if (argc < 2)
        return 1;
    int _fd = open(argv[1], O_RDONLY, (mode_t) 0400);
    if (_fd == -1)
        return 1;
    /* Find the file size, then rewind to the start */
    off_t size = lseek(_fd, 0, SEEK_END);
    printf("total size: %lld\n", (long long) size);
    lseek(_fd, 0, SEEK_SET);
    char *buffer = malloc(size);
    assert(buffer);
    /* Try to read the whole file with a single read() call */
    ssize_t ret = read(_fd, buffer, size);
    if (ret != size) {
        printf("read fail, %zd, reason:%s\n", ret, strerror(errno));
        printf("int max: %d\n", INT_MAX);
    }
    free(buffer);
    close(_fd);
    return 0;
}


I compile it with:

gcc read_test.c

then run it with:

./a.out bigfile


output:

total size: 6685526352
read fail, 2147479552, reason:Success
int max: 2147483647


The system environment is:

3.10.0_1-0-0-8 #1 SMP Thu Oct 29 13:04:32 CST 2015 x86_64 x86_64 x86_64 GNU/Linux


There are two things I don't understand:


  1. Reading fails on a big file, but not on a small file.

  2. Even though there is an error, errno does not seem to be set correctly.


Answer

The read system call can return fewer bytes than requested for several reasons. A positive, non-zero return value is not an error, so errno is not set in this case and its value is indeterminate (here it was evidently still 0, which glibc's strerror reports as "Success"). On Linux in particular, a single read() transfers at most 0x7ffff000 (2,147,479,552) bytes, which is exactly the count your program printed. You should keep reading in a loop until read returns 0 for end of file or -1 for an error. Relying on read to fill a complete block in a single call is a very common bug, even with regular files. Use fread for simpler semantics.

You print the value of INT_MAX, which is irrelevant to your issue. The sizes of off_t and size_t are what matter. On your platform, 64-bit GNU/Linux, you are lucky that both off_t and size_t are 64 bits wide, and ssize_t, the signed counterpart of size_t, has the same size. On other platforms, off_t might be smaller than size_t, preventing a correct assessment of the file size, or size_t might be smaller than off_t, letting malloc allocate a block smaller than the file. Note that in this second case read would be passed the same too-small size, because size would be silently truncated in both calls.
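
A rough sketch of printing the sizes that actually matter instead of INT_MAX, and of refusing a file size that does not fit in size_t rather than letting it be truncated silently. The helpers print_type_sizes and file_size_to_size_t are hypothetical names chosen for illustration, and reporting EOVERFLOW for an oversized file is an assumption, not something the answer prescribes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: print the type sizes discussed above.
   On x86_64 GNU/Linux all three are 8 bytes. */
static void print_type_sizes(void)
{
    printf("off_t: %zu, size_t: %zu, ssize_t: %zu\n",
           sizeof(off_t), sizeof(size_t), sizeof(ssize_t));
}

/* Hypothetical helper: convert a file size obtained as off_t (e.g. from
 * lseek) into a size_t for malloc() and read(), failing instead of
 * truncating. Returns 0 on success, or -1 with errno set. */
static int file_size_to_size_t(off_t file_size, size_t *out)
{
    assert(out != NULL);
    if (file_size < 0) {
        errno = EINVAL;        /* e.g. lseek() itself returned -1 */
        return -1;
    }
    if ((uintmax_t)file_size > SIZE_MAX) {
        errno = EOVERFLOW;     /* file larger than any single buffer here */
        return -1;
    }
    *out = (size_t)file_size;
    return 0;
}
```

For example, the value returned by lseek(_fd, 0, SEEK_END) could go through file_size_to_size_t before being passed to malloc and read, so a platform with a 32-bit size_t would report an error instead of allocating and reading a truncated size.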
