peanutlover peanutlover - 10 months ago 48
C Question

In kernel space, how does one get the physical addresses corresponding to a file on an ext4-formatted disk?

First time in the Linux kernel here.
Anyway, my question is this. If you're here:

https://github.com/torvalds/linux/blob/master/fs/ext4/file.c#L360



You have access to these two structs inside the ext4_file_mmap function:

struct file *file, struct vm_area_struct *vma


I am changing the implementation of this function for dax mode so that the page tables are filled out completely for the file the moment you call mmap (to see how much performance improves when no page faults are taken).

I have managed to get the following done so far (assuming I have access to the two structs that ext4_file_mmap has access to):

// vm_area_struct defined in /include/linux/mm_types.h : 284
// file defined in /include/linux/fs.h : 848

loff_t file_size = i_size_read(file_inode(file));
unsigned long start_va = vma->vm_start;


Now, the difficulty lies here. How do I get the physical addresses (blocks? Not sure if dax uses blocks) associated with this file?

I have spent the last couple of days staring at the linux source code, trying to make sense of stuff, and boy have I been successful.

Any help, hint, or suggestion is greatly appreciated!
Thanks!

Some updates: When you mmap a file in dax mode, you don't fetch anything into memory. The device, in this case PMEM, is byte-addressable and gives DDR latencies, so it's accessed directly (no memory in between). Certain PTEs point at this PMEM device instead of memory.

Answer Source

First of all, mmap supports the MAP_POPULATE flag specifically to avoid page faults. In principle it may not work with dax, but that's unlikely.
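A minimal userspace sketch of the MAP_POPULATE suggestion, assuming a Linux system. The helper name map_file_populated is hypothetical, and whether the pre-faulting actually takes effect on a dax mount is something to verify rather than assume:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only and ask the kernel to
 * pre-fault the mapping. Returns MAP_FAILED on error (including an empty
 * file); on success *len_out receives the mapped length. */
void *map_file_populated(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return MAP_FAILED;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    /* MAP_POPULATE asks the kernel to fill in the page tables up front
     * instead of on first access. */
    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ,
                      MAP_SHARED | MAP_POPULATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    if (addr != MAP_FAILED)
        *len_out = (size_t)st.st_size;
    return addr;
}
```

The caller releases the mapping with munmap(addr, len). Note that mmap does not report a failure if the populate step itself could not complete, so this is a best-effort request.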

Second, it seems you don't have any measurements of the current state of affairs. Just "changing something and checking the difference" is a fundamentally wrong approach. In particular, the change may remove the actual bottleneck as an unintended side effect, and the win will end up being misattributed. You can start by using 'perf' to get basic numbers and generate flamegraphs ( http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html ). If you do a lot of I/O over a small range, page faults should have a negligible effect.
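Alongside perf, a quick way to get a baseline number is to count minor page faults around the access loop with getrusage. The sketch below is only an illustration: the helper name faults_while_touching is hypothetical, and it uses an anonymous mapping rather than a dax file, so the absolute numbers will not carry over to PMEM:

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of minor page faults taken while
 * writing one byte per page of a fresh anonymous mapping of len bytes.
 * Pass MAP_POPULATE in extra_flags to pre-fault the mapping, or 0 for the
 * default lazy behaviour. Returns -1 on error. */
long faults_while_touching(size_t len, int extra_flags)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS | extra_flags,
                                     -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct rusage before, after;
    getrusage(RUSAGE_SELF, &before);
    for (size_t off = 0; off < len; off += (size_t)page)
        p[off] = 1; /* touch each page once */
    getrusage(RUSAGE_SELF, &after);

    munmap((void *)p, len);
    return after.ru_minflt - before.ru_minflt;
}
```

Comparing faults_while_touching(len, 0) with faults_while_touching(len, MAP_POPULATE) for the same len shows how many faults the access loop itself takes in each case; timing the two loops separately then indicates whether those faults matter for the workload at all.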
