0x90 - 1 year ago
Python Question

Delete / Insert Data in mmap'ed File

I am working on a script in Python that maps a file for processing using mmap().

The task requires me to change the file's contents by:

  1. Replacing data

  2. Adding data into the file at an offset

  3. Removing data from within the file (not just blanking it out)

Replacing data works great as long as the old data and the new data have the same number of bytes:

import mmap

f = open("data", "r+b")
VDATA = mmap.mmap(f.fileno(), 0)
start = 10
end = 20
VDATA[start:end] = b"0123456789"  # 10 new bytes replace 10 old bytes

However, when I try to remove data (replacing the range with an empty string) or insert data (replacing the range with contents longer than the range), I get this error:

IndexError: mmap slice assignment is wrong size

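This makes sense: a memory map has a fixed length, so a slice assignment can overwrite bytes but cannot grow or shrink the map.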
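A minimal reproduction, assuming Python 3 (where mmap slices take bytes) and a throwaway temporary file standing in for the real data file: the same-length assignment goes through, while both the empty replacement and the longer one raise IndexError.

```python
import mmap
import tempfile

errors = []  # messages from the rejected assignments

# A throwaway 40-byte file stands in for the real "data" file.
with tempfile.TemporaryFile() as f:
    f.write(b"0000111122223333444455556666777788889999")
    f.flush()
    mm = mmap.mmap(f.fileno(), 0)

    # Same length (10 bytes for 10 bytes): allowed.
    mm[10:20] = b"0123456789"
    contents = mm[:]

    # Shorter (a delete) or longer (an insert): rejected, the map cannot change size.
    for replacement in (b"", b"0123456789ABC"):
        try:
            mm[10:20] = replacement
        except IndexError as exc:
            errors.append(str(exc))

    mm.close()

print(contents)
print(errors)
```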

The question now is: how can I insert and delete data in the mmap'ed file?
From reading the documentation, it seems I could shift the file's entire contents back and forth with a chain of low-level operations, but I'd rather avoid that if there is an easier solution.
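The low-level chain alluded to here revolves around mmap.move(dest, src, count), which copies count bytes within the map and handles overlapping ranges. A minimal sketch, using an anonymous in-memory map instead of a file (an assumption made only to keep the example self-contained), shows that sliding the tail left removes the range [4, 8) but leaves the map at its original length, with stale copies of the last bytes at the end; shrinking the file is a separate step.

```python
import mmap

# Anonymous 40-byte map (fileno -1): not backed by a file, only for self-containment.
mm = mmap.mmap(-1, 40)
mm[:] = b"0000111122223333444455556666777788889999"

start, end = 4, 8
size = len(mm)

# Copy bytes [end, size) down to [start, ...); move() handles the overlap.
mm.move(start, end, size - end)

shifted = mm[:]
length = len(mm)
mm.close()

print(shifted)  # b'0000222233334444555566667777888899999999'
print(length)   # 40: move() never changes the size of the map
```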

Answer

Lacking an alternative, I went ahead and wrote two helper functions, deleteFromMmap() and insertIntoMmap(), to handle the low-level file operations and simplify development.

The mmap is closed and reopened instead of using resize() because of a bug in Python on Unix derivatives that causes resize() to fail. (http://mail.python.org/pipermail/python-bugs-list/2003-May/017446.html)

The functions are included in a complete example. The use of a global is due to the structure of the main project, but you can easily adapt it to match your coding standards.

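import mmap

# The file "data" contains "0000111122223333444455556666777788889999".
# Python 3: open in binary mode; mmap reads and writes bytes.
f = open("data", "r+b")
VDATA = mmap.mmap(f.fileno(), 0)

def deleteFromMmap(start, end):
    global VDATA
    length = end - start
    size = len(VDATA)
    newsize = size - length

    # Shift the bytes after `end` left so they start at `start`.
    VDATA.move(start, end, size - end)
    VDATA.flush()
    # Close the map, shrink the file, and map it again
    # (instead of VDATA.resize(), because of the bug linked above).
    VDATA.close()
    f.truncate(newsize)
    VDATA = mmap.mmap(f.fileno(), 0)

def insertIntoMmap(offset, data):
    global VDATA
    length = len(data)
    size = len(VDATA)
    newsize = size + length

    # Close the map, grow the file by `length` bytes, and map it again.
    VDATA.flush()
    VDATA.close()
    f.truncate(newsize)
    VDATA = mmap.mmap(f.fileno(), 0)

    # Shift the bytes from `offset` onward right to open a gap,
    # then write the new data into the gap.
    VDATA.move(offset + length, offset, size - offset)
    VDATA[offset:offset + length] = data
    VDATA.flush()

deleteFromMmap(4, 8)
# -> 000022223333444455556666777788889999

insertIntoMmap(4, b"AAAA")
# -> 0000AAAA22223333444455556666777788889999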
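The example above keeps the map in a global VDATA. As one possible adaptation (a sketch, not the answer's own code), a hypothetical MappedFile class can own the open file and its current map, swap in a fresh map whenever the size changes, and offer a replace() method for different-length replacements by combining delete and insert. It assumes Python 3, a binary-mode file, and a file that never becomes empty (mapping an empty file raises ValueError).

```python
import mmap
import tempfile


class MappedFile:
    """Hypothetical wrapper replacing the global VDATA: it owns the open file
    and its current map. Not thread-safe; no bounds checking; assumes the
    file never becomes empty."""

    def __init__(self, f):
        self.f = f
        self.mm = mmap.mmap(f.fileno(), 0)

    def _set_size(self, newsize):
        # Close, truncate (grow or shrink), and remap instead of mm.resize().
        self.mm.flush()
        self.mm.close()
        self.f.truncate(newsize)
        self.mm = mmap.mmap(self.f.fileno(), 0)

    def delete(self, start, end):
        size = len(self.mm)
        # Slide bytes [end, size) left so they start at `start`, then shrink.
        self.mm.move(start, end, size - end)
        self._set_size(size - (end - start))

    def insert(self, offset, data):
        size = len(self.mm)
        # Grow first, then slide the tail right and write into the gap.
        self._set_size(size + len(data))
        self.mm.move(offset + len(data), offset, size - offset)
        self.mm[offset:offset + len(data)] = data
        self.mm.flush()

    def replace(self, start, end, data):
        # Different-length replacement: delete the range, then insert the data.
        self.delete(start, end)
        self.insert(start, data)

    def close(self):
        self.mm.close()


with tempfile.TemporaryFile() as f:
    f.write(b"0000111122223333444455556666777788889999")
    f.flush()
    mf = MappedFile(f)
    mf.delete(4, 8)
    after_delete = mf.mm[:]
    mf.insert(4, b"AAAA")
    after_insert = mf.mm[:]
    mf.replace(0, 4, b"X")
    after_replace = mf.mm[:]
    mf.close()

print(after_delete)
print(after_insert)
print(after_replace)
```

replace() always remaps twice; when the old and new ranges have the same length, a plain slice assignment is cheaper.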