Hadrian Węgrzynowski - 4 months ago
Linux Question

Can a process insert memory pages in the middle of mmapped space?

Distilled scenario:

A user-space program needs millions of page-sized structs (i.e. 4 KiB on most Linux systems). It also needs quick random access to the structs. Sometimes the program needs to insert a new struct in the middle of the array. Order is important.

struct page_sized_t { char data[PAGE_SIZE]; };
size_t N = 1 * 1000 * 1000;
size_t X = INSERT_INDEX;


The program could be implemented with a heap-allocated array containing pointers to heap-allocated structs. Insertion could be implemented with realloc and memmove.

struct page_sized_t **array = malloc( sizeof(array[0]) * N );
...
array = realloc( array, sizeof(array[0]) * (N+1) );
memmove( &array[X+1], &array[X], sizeof(array[0]) * (N-X) );
array[X] = malloc( sizeof(array[X][0]) );
...


Now my question is this: would it be practical to implement such a program in terms of one big mmapped region of memory, where every struct would lie in a single page? Insertion could then be implemented this way: the program asks the kernel to insert a new page between the others, basically the kernel doing the job described in the previous paragraph.

struct page_sized_t *array = mmap( 0, sizeof(array[0]) * N,
                                   PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 );
...
// imaginary syscall: m_insert_map(old_address, old_size, insert_offset, insert_size)
array = m_insert_map( array, sizeof(array[0]) * N, sizeof(array[0]) * X, sizeof(array[0]) );
...


I think that with current syscalls this is not possible. One can only mremap(), so in a way one can only append pages at the end.

Summarizing: could insertion of memory pages be implemented in the Linux kernel? Would it be practical to use such an interface instead of a user-space implementation? Is there a system that has this implemented?

Answer

The program could be implemented with a heap-allocated array containing pointers to heap-allocated structs. Insertion could be implemented with realloc and memmove.

If you already have an array of pointers to the structures, why would you move the structures around in memory at all? Just update the pointers instead. Modifying a million consecutive entries in memory will always be more efficient than modifying a million page table entries.

Always refer to a structure by its index in the array, not via a pointer. That way you can always walk the array in order, even if the structures are not consecutive in memory.
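For illustration, here is a minimal sketch of such a pointer-only insert (the insert_at() helper and its error handling are mine, not part of the question; it assumes the struct page_sized_t declaration above plus <stdlib.h> and <string.h>):

/* Sketch only: insert a new structure at logical index x into an array
 * of n pointers. Only the pointers are shifted; the page-sized
 * structures themselves never move. Returns the (possibly relocated)
 * pointer array, or NULL on allocation failure.
 */
struct page_sized_t **insert_at(struct page_sized_t **array, size_t n, size_t x)
{
    struct page_sized_t **grown = realloc(array, sizeof grown[0] * (n + 1));
    if (!grown)
        return NULL;

    memmove(&grown[x + 1], &grown[x], sizeof grown[0] * (n - x));

    grown[x] = calloc(1, sizeof *grown[x]);
    if (!grown[x])
        return NULL;

    return grown;
}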

Would it be practical to implement such a program in terms of one big mmapped region of memory?

No. In your own terms, you have one page per structure. To insert a page in the middle, the page table entries for everything after the insertion point need to be updated. That will be slow.

If you locate each structure via an indirect pointer anyway, i.e. you have

struct page_sized_t **array;

then there is no real reason to move the contents around; just update the pointers. Yes, that means that to move item j to index i, with i < j, you do need to

struct page_sized_t *temp = array[j];
/* shift the pointers in [i, j) up by one slot */
memmove(array + i + 1, array + i, (j - i) * sizeof array[0]);
array[i] = temp;

Note that array[j] is of type struct page_sized_t *, so this moves the pointers around, not the contents. Modifying the pointers will always be faster than modifying as many page table entries. (Even if huge pages are used, the logic needed to coalesce/split them into normal pages as needed would almost certainly be impractical. You might be able to contrive a microbenchmark where it performed better than the simple memmove -- although if you did, I'd be surprised out of my socks -- but in all real-life scenarios such page table shenanigans just add overhead.)

Could insertion of memory pages be implemented in the Linux kernel? Would it be practical to use such an interface instead of a user-space implementation?

You can already do it via mremap().

You remap the region starting at the insertion point using mremap(array + index, pagesize * (size - index), pagesize * (size - index), MREMAP_MAYMOVE | MREMAP_FIXED, array + index + 1); the tail keeps its size and simply moves up by one page (MREMAP_FIXED must be combined with MREMAP_MAYMOVE). Then, you use either mremap(array, pagesize * index, pagesize * (index + 1), 0) to grow the initial part over the hole, or mmap(array + index, pagesize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) to plug the hole.

It is quite similar to how you'd do the same thing using memmove(), really.
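As a rough sketch of that two-step procedure (the insert_page() helper name is mine; it assumes a private anonymous mapping of page-sized elements, and that nothing else claims the hole between the two steps, as discussed below):

#define _GNU_SOURCE             /* for mremap() and the MREMAP_* flags */
#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>

/* Sketch only: insert one blank page at 'index' into a mapping of
 * 'size' pages starting at 'base'. Returns 0 on success, -1 on error.
 * Beware: MREMAP_FIXED silently replaces whatever is already mapped at
 * the page just past the current end of the region.
 */
static int insert_page(void *base, size_t size, size_t index)
{
    const size_t pagesize = (size_t)sysconf(_SC_PAGESIZE);
    char *const at = (char *)base + index * pagesize;

    /* Step 1: slide the tail [index, size) up by one page.
     * MREMAP_FIXED requires MREMAP_MAYMOVE as well. */
    if (mremap(at, (size - index) * pagesize, (size - index) * pagesize,
               MREMAP_MAYMOVE | MREMAP_FIXED, at + pagesize) == MAP_FAILED)
        return -1;

    /* Step 2: plug the one-page hole left behind at 'at'. */
    if (mmap(at, pagesize, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
        return -1;

    return 0;
}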

However, you must ensure that no other thread makes new memory allocations (via mmap()) between the two calls, as otherwise the kernel might hand the "hole" to that other allocation, breaking the scheme. This is completely a userspace issue, and should be trivial to guarantee in a single-threaded application (it is not async-signal-safe to use memory allocation functions in signal handlers anyway), but for multithreaded programs it might be difficult or even impossible -- even some C library functions do implicit/internal dynamic memory allocation.

In summary:

Whatever you are doing, it looks like you are not using the most efficient data structure, and because of that, are looking for speedups in the wrong places. In particular, needing direct/random access to the contents does not mean you should use a linear array.

Since you haven't provided enough information to even make a handwavy suggestion, I'll just point out that having page-sized structures is itself a bad sign. Databases use indexes, where the values corresponding to a single key/field are consecutive (in some sense, at least), for faster access and collation. So, unless your accesses to each structure really do usually need all the data within the structure, you might do better to split it up into separate arrays.
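To illustrate the splitting idea only (the key/payload fields below are hypothetical, since the actual contents of your structures are unknown):

#include <stdint.h>

#define PAGE_SIZE 4096

/* Hypothetical record: one small, frequently searched field plus a
 * large, rarely touched payload. Instead of N page-sized structs ...
 */
struct record { uint64_t key; char payload[PAGE_SIZE - sizeof(uint64_t)]; };

/* ... keep two parallel arrays indexed by the same i. A search over
 * the keys then touches hundreds of keys per page instead of one key
 * per page; a payload is faulted in only when actually needed. */
uint64_t *keys;                                     /* hot: scanned often   */
char    (*payloads)[PAGE_SIZE - sizeof(uint64_t)]; /* cold: read on demand */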
