Difender - 4 months ago
Python Question

Splitting a big file into smaller ones based on a line

I have a pretty big file (more than 20GB) and I'd like to split it into smaller ones, for example several files of about 2GB each.

The catch is that each split has to happen just before a specific line, one that starts with Recno::.

I'm using Python, but if there is another solution, in shell for example, I'm open to it.

This is what the big file looks like:

bigfile.txt
(20GB)

Recno:: 0
some data...

Recno:: 1
some data...

Recno:: 2
some data...

Recno:: 3
some data...

Recno:: 4
some data...

Recno:: 5
some data...

Recno:: x
some more data...


This is what I want:

file1.txt
(2 GB +/-)

Recno:: 0
some data...

Recno:: 1
some data...


file2.txt
(2GB +/-)

Recno:: 2
some data...

Recno:: 4
some data...

Recno:: 5
some data...


And so on, and so on...

Thanks!

Answer

You could do something like this:

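import sys

try:
    _, size, file = sys.argv
    size = int(size)
except ValueError:
    sys.exit('Usage: splitter.py <size in bytes> <filename to split>')

with open(file) as infile:
    count = 0
    # Characters written to the current output file; this equals
    # bytes only for single-byte encodings such as ASCII.
    current_size = 0
    # you could do something more
    # fancy with the name like use
    # os.path.splitext
    outfile = open(file+'_0', 'w')  # 'w' is required; the default mode is read-only
    for line in infile:
        # Only start a new file at a record boundary, once the
        # current file has grown past the requested size.
        if current_size > size and line.startswith('Recno'):
            outfile.close()
            count += 1
            current_size = 0
            outfile = open(file+'_{}'.format(count), 'w')
        current_size += len(line)
        outfile.write(line)
    outfile.close()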
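
The script above measures len(line) on text-mode lines, which counts characters rather than bytes. If the size limit really needs to be in bytes, one option is to read and write in binary mode. Below is a minimal sketch of that idea wrapped in a function. The name split_on_marker, its parameters, and the bigfile_0.txt naming scheme (built with os.path.splitext, as the comment above suggests) are assumptions made for illustration, not part of the original answer.

```python
import os


def split_on_marker(path, max_bytes, marker=b'Recno::'):
    """Split `path` into numbered parts, cutting only before a marker line.

    Hypothetical helper: returns the list of part filenames it wrote.
    """
    root, ext = os.path.splitext(path)
    parts = []
    outfile = None
    current_size = 0
    with open(path, 'rb') as infile:
        try:
            for line in infile:
                # Open a new part for the very first line, or once the
                # current part has reached the limit and this line
                # begins a new record.
                if outfile is None or (current_size >= max_bytes
                                       and line.startswith(marker)):
                    if outfile is not None:
                        outfile.close()
                    name = '{}_{}{}'.format(root, len(parts), ext)
                    parts.append(name)
                    outfile = open(name, 'wb')
                    current_size = 0
                outfile.write(line)
                current_size += len(line)  # bytes, since lines are bytes objects
        finally:
            if outfile is not None:
                outfile.close()
    return parts
```

Calling split_on_marker('bigfile.txt', 2 * 1024**3) should then produce bigfile_0.txt, bigfile_1.txt and so on, each a little over 2GB, because a part is only closed when the next Recno:: line arrives. Reading line by line keeps memory use small regardless of the input size.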