Torsten Römer - 5 months ago
Java Question

Why does Files.readAllBytes first read with a bufsize of 1?

I am writing a simple Linux USB character driver that allows reading a short string from the device node it creates.

It works fine, but I noticed a difference between reading from the device node with cat and reading from a Java program with Files.readAllBytes.

Reading with cat, a buffer of size 131072 is passed in at the first call to the file_operations.read function and the 5-byte string is copied:

kernel: [46863.186331] usbtherm: Device was opened
kernel: [46863.186407] usbtherm: buffer: 131072, read: 5, offset: 5
kernel: [46863.186444] usbtherm: done, returning 0
kernel: [46863.186481] usbtherm: Device was released


Reading with Files.readAllBytes, a buffer of size 1 is passed in at the first call, then a buffer of size 8191 is passed in and the remaining 4 bytes are copied:

kernel: [51442.728879] usbtherm: Device was opened
kernel: [51442.729032] usbtherm: buffer: 1, read: 1, offset: 1
kernel: [51442.729102] usbtherm: buffer: 8191, read: 4, offset: 5
kernel: [51442.729140] usbtherm: done, returning 0
kernel: [51442.729158] usbtherm: Device was released


The file_operations.read function (including the debugging printk calls) is:

static ssize_t device_read(struct file *filp,
                           char __user *buffer,
                           size_t length,
                           loff_t *offset)
{
    int err = 0;
    size_t len_msg = strlen(message);
    size_t len_rem = 0;
    size_t len_read = 0;

    /* The whole message has been consumed: report EOF. */
    if (*offset >= (loff_t)len_msg)
    {
        printk(KERN_INFO "usbtherm: done, returning 0\n");
        return 0;
    }

    /* Copy at most as many bytes as the caller's buffer can hold. */
    len_rem = len_msg - *offset;
    len_read = len_rem > length ? length : len_rem;

    /* copy_to_user() returns the number of bytes it could not copy. */
    err = copy_to_user(buffer, message + *offset, len_read);
    if (err)
    {
        err = -EFAULT;
        goto error;
    }

    *offset += len_read;

    printk(KERN_INFO "usbtherm: buffer: %zu, read: %zu, offset: %lld\n",
           length, len_read, *offset);

    return len_read;

error:
    return err;
}


The string read is identical in both cases, so I suppose it is okay; I am just wondering why the behaviour differs.

Answer

GNU cat

In the source of cat,

      insize = io_blksize (stat_buf);

you can see that the buffer's size is determined by coreutils' io_blksize(), which has a rather interesting comment in this regard:

/* As of May 2014, 128KiB is determined to be the minimium blksize to best minimize system call overhead.

So that'd explain the results with cat, since 128KiB is 131072 bytes and the GNUrus decided that's the best way to minimize system call overhead.

Files.readAllBytes

This one is a bit more difficult to grasp, at least for a simple soul like me. The source of readAllBytes

public static byte[] readAllBytes(Path path) throws IOException {
    try (SeekableByteChannel sbc = Files.newByteChannel(path);
         InputStream in = Channels.newInputStream(sbc)) {
        long size = sbc.size();
        if (size > (long)MAX_BUFFER_SIZE)
            throw new OutOfMemoryError("Required array size too large");

        return read(in, (int)size);
    }
}

shows it's simply calling read(InputStream, initialSize) where the initial size is determined by the size of the byte channel. The size() method also has an interesting comment,

The size of files that are not isRegularFile() files is implementation specific and therefore unspecified.
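
To make the role of size() concrete, here is a minimal sketch that prints the value a channel reports for a regular file, which is the value readAllBytes would pass on as initialSize. The class and method names (ChannelSizeDemo, reportedSize, reportedSizeOfTempFile) and the sample content are made up for illustration. For a device node the reported value is unspecified, so the sketch only exercises a temporary regular file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChannelSizeDemo {
    // Returns the size the byte channel reports for the given path,
    // i.e. the value readAllBytes would use as its initial buffer size.
    static long reportedSize(Path path) {
        try (SeekableByteChannel sbc = Files.newByteChannel(path)) {
            return sbc.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary regular file, returns the reported
    // size, and deletes the file again.
    static long reportedSizeOfTempFile(String text) {
        try {
            Path tmp = Files.createTempFile("size-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.US_ASCII));
                return reportedSize(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical 5-byte content, same length as the driver's string.
        System.out.println(reportedSizeOfTempFile("21.5\n")); // prints 5
    }
}
```

Calling reportedSize on the device node itself would show what the driver's node reports on a given system; the logs in the question are consistent with a value of 0.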

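Finally, read(InputStream, initialSize) calls InputStream.read(byteArray, offset, length) to do the reading. The comments are from the original source and are misleading in this case: here initialSize is 0, so the first pass through the while loop requests zero bytes and does not read to EOF. The single-byte source.read() that follows is the first request that reaches the driver (buffer: 1). The buffer then grows to BUFFER_SIZE, the byte already read is stored in it, and the next read asks for the remaining 8191 bytes: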

private static byte[] read(InputStream source, int initialSize)
        throws IOException {
    int capacity = initialSize;
    byte[] buf = new byte[capacity];
    int nread = 0;
    int n;
    for (;;) {
        // read to EOF which may read more or less than initialSize (eg: file
        // is truncated while we are reading)
        while ((n = source.read(buf, nread, capacity - nread)) > 0)
            nread += n;

        // if last call to source.read() returned -1, we are done
        // otherwise, try to read one more byte; if that failed we're done too
        if (n < 0 || (n = source.read()) < 0)
            break;

        // one more byte was read; need to allocate a larger buffer
        if (capacity <= MAX_BUFFER_SIZE - capacity) {
            capacity = Math.max(capacity << 1, BUFFER_SIZE);
        } else {
            if (capacity == MAX_BUFFER_SIZE)
                throw new OutOfMemoryError("Required array size too large");
            capacity = MAX_BUFFER_SIZE;
        }
        buf = Arrays.copyOf(buf, capacity);
        buf[nread++] = (byte)n;
    }
    return (capacity == nread) ? buf : Arrays.copyOf(buf, nread);
}

The declaration of BUFFER_SIZE in Files

    // buffer size used for reading and writing
    private static final int BUFFER_SIZE = 8192;
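
The growth step of the read loop can be isolated into a small sketch to see which capacities it produces. The class and method names (CapacityGrowthDemo, nextCapacity) are hypothetical, and the value of MAX_BUFFER_SIZE is an assumption, since its declaration is not quoted here.

```java
final class CapacityGrowthDemo {
    static final int BUFFER_SIZE = 8192;
    // Assumption: the JDK's MAX_BUFFER_SIZE declaration is not quoted in
    // this answer; Integer.MAX_VALUE - 8 is used as a stand-in.
    static final int MAX_BUFFER_SIZE = Integer.MAX_VALUE - 8;

    // Same branch structure as the growth step in read(InputStream, int).
    static int nextCapacity(int capacity) {
        if (capacity <= MAX_BUFFER_SIZE - capacity) {
            // Double the capacity, but never go below BUFFER_SIZE.
            return Math.max(capacity << 1, BUFFER_SIZE);
        }
        if (capacity == MAX_BUFFER_SIZE) {
            throw new OutOfMemoryError("Required array size too large");
        }
        return MAX_BUFFER_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(0));    // prints 8192
        System.out.println(nextCapacity(8192)); // prints 16384
    }
}
```

Starting from a capacity of 0, the first growth step lands on 8192. One slot is already taken by the byte that triggered the growth, which leaves 8191 free bytes for the next read.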

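Source of InputStream.read(byteArray, offset, length). Note that this is the generic base-class implementation, which reads byte by byte; the single 8191-byte request in the kernel log suggests that the stream returned by Channels.newInputStream overrides it with a bulk read: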

public int read(byte b[], int off, int len) throws IOException {
    //...

    int c = read();
    if (c == -1) {
        return -1;
    }
    b[off] = (byte)c;

    int i = 1;
    try {
        for (; i < len ; i++) {
            c = read();
            if (c == -1) {
                break;
            }
            b[off + i] = (byte)c;
        }
    } catch (IOException ee) {
    }
    return i;
}
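
Putting the pieces together, the sequence of request sizes from the kernel log can be reproduced without the driver. The following is a rough simulation, not the JDK's actual stream: RecordingStream is a hypothetical in-memory stand-in for the device that records the length of every read request, and it assumes that a zero-length request returns 0 without reaching the device. The read method repeats the loop quoted earlier, with the MAX_BUFFER_SIZE guard left out for brevity.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ReadTraceDemo {
    static final int BUFFER_SIZE = 8192;

    // Hypothetical stand-in for the device node: serves bytes from memory
    // and records the buffer size of each request.
    static final class RecordingStream extends InputStream {
        private final byte[] data;
        private int pos = 0;
        final List<Integer> requests = new ArrayList<>();

        RecordingStream(byte[] data) { this.data = data; }

        @Override
        public int read() {
            requests.add(1); // a single-byte read asks for exactly one byte
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }

        @Override
        public int read(byte[] b, int off, int len) {
            if (len == 0) return 0; // assumption: never reaches the device
            requests.add(len);
            if (pos >= data.length) return -1;
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }
    }

    // The loop from read(InputStream, int), without the overflow guard.
    static byte[] read(InputStream source, int initialSize) throws IOException {
        int capacity = initialSize;
        byte[] buf = new byte[capacity];
        int nread = 0;
        int n;
        for (;;) {
            while ((n = source.read(buf, nread, capacity - nread)) > 0)
                nread += n;
            if (n < 0 || (n = source.read()) < 0)
                break;
            capacity = Math.max(capacity << 1, BUFFER_SIZE);
            buf = Arrays.copyOf(buf, capacity);
            buf[nread++] = (byte) n;
        }
        return (capacity == nread) ? buf : Arrays.copyOf(buf, nread);
    }

    // Runs the loop over the message and returns the recorded request sizes.
    static List<Integer> trace(byte[] message, int initialSize) {
        RecordingStream in = new RecordingStream(message);
        try {
            read(in, initialSize);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return in.requests;
    }

    public static void main(String[] args) {
        byte[] message = "12345".getBytes(); // hypothetical 5-byte string
        System.out.println(trace(message, 0)); // prints [1, 8191, 8187]
        System.out.println(trace(message, 5)); // prints [5, 1]
    }
}
```

With an initial size of 0 the trace is 1, 8191, 8187, which matches the two logged reads followed by the final call that returns 0. With an initial size of 5, as a regular file of that length would report, the whole string arrives in the first request and the single-byte read only serves to confirm EOF.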