I am using mmap to read a file. When the file size is around 880 MB, it takes around 2 seconds to iterate through the file.
Then I increased the file size 10 times by replicating the file's contents. Iterating with mmap now takes around 1 min 2 sec.
I expected the iteration time to grow linearly with file size.
At 3 times the size I get around 6 sec,
but at 4 times I get around 17 sec.
Can someone explain why the increase is not linear?
Here is my code:
fp = fopen64(filename.c_str(), "rm");
if (fp == NULL)
    return;                                  // open failed

struct stat st;
fstat(fileno(fp), &st);                      // needed before reading st.st_size
fileSize = st.st_size;
data = (char *)mmap64(NULL, fileSize, PROT_READ, MAP_PRIVATE, fileno(fp), 0);

Indexer *indexer = new Indexer(maxToken);
if (rows != NULL) delete[] rows;             // delete[] for an array, not delete
rows = new Row[n];

off_t clineNo = 0;
off_t cptr = ptr;
int token = 0;
for (int i = 0; i < n; i++) {
    while (ptr < fileSize) {
        if (data[ptr] == ',') {
            if (token < maxToken)
                rows[clineNo].second[token] =
                    indexer->getIndex(token, std::string(&data[cptr], ptr - cptr));
            token++;
            cptr = ++ptr;
        } else if (data[ptr] == '\n') {
            rows[clineNo++].first = std::stoi(std::string(&data[cptr], ptr - cptr));
            token = 0;
            cptr = ++ptr;
            break;                           // row finished, move to the next one
        } else {
            ++ptr;
        }
    }
    if (clineNo >= n) break;
}

int Indexer::getIndex(int i, std::string str)
{
    if (items[i].find(str) == items[i].end())
        items[i][str] = currentIndex++;      // assign the next free index to a new token
    return items[i][str];
}
Access time is not constant; it gets slower as your dataset grows. I suggest reading Latency Numbers Every Programmer Should Know.
If you run a benchmark with a random memory access pattern, and adjust the dataset size, you will see several ranges where the performance changes.
Performance is fastest when the dataset fits in L1 cache. L1 cache is small, say 64 KiB per core, but it is fast (a few cycles' access time, nearly as fast as registers).
Performance suddenly drops when you need L2 cache. L2 cache is larger and slower than L1 cache. The drop in performance is something like 10x.
Performance drops again when your dataset is too large for L2 cache but fits in RAM. Another 10x performance drop or so for cache misses.
Performance drops through the floor when your dataset is too large for RAM but fits on disk. The performance hit is like 1000x for cache misses, assuming you have a fast SSD, and maybe 100,000x if you have a non-SSD hard drive.
Your 880 MB dataset fits neatly in 8 GiB of RAM, but the 8,800 MB dataset does not; it cannot all be resident at the same time. Due to the access patterns of unordered_map, the kernel cannot guess which data should be resident at any given time, and performance will be quite poor.
It's nice to pretend that you have an infinite amount of storage that is all the same speed, but that is not even remotely true.