Joe Fusaro - 26 days ago
Python Question

Generate chunks from file

I have a JSON file and would like to write a function to return a list of the next 10 objects in the file. I've started with a class, FileProcessor, and the method get_row(), which returns a generator that yields a single JSON object from the file. Another method, get_chunk(), should return the next 10 objects.

Here is what I have so far:

import json
import os

class FileProcessor(object):

    def __init__(self, filename):
        self.FILENAME = filename

    def get_row(self):
        # Yield one parsed JSON object per line of the file.
        with open(os.path.join('path/to/file', self.FILENAME), 'r') as f:
            for i in f:
                yield json.loads(i)

    def get_chunk(self):
        pass


I tried the following, but it returns the first 10 rows every time.

def get_chunk(self):
    chunk = []
    consumer = self.consume()
    for i in self.get_row():
        chunk.append(i)
    return chunk


So what is the correct way to write get_chunk()?

Answer

Here's a simple generator that gets values from another generator and puts them into a list. It should work with your FileProcessor.get_row method.

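def count(n):
    for v in range(n):
        yield str(v)

def chunks(it, n):
    while True:
        try:
            chunk = [next(it) for _ in range(n)]
        except StopIteration:
            # Source exhausted mid-chunk: drop the partial chunk and stop.
            # Letting StopIteration escape a generator raises RuntimeError
            # in Python 3.7+ (PEP 479).
            return
        yield chunk

for u in chunks(count(100), 12):
    print(u)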

output

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11']
['12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23']
['24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '35']
['36', '37', '38', '39', '40', '41', '42', '43', '44', '45', '46', '47']
['48', '49', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59']
['60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71']
['72', '73', '74', '75', '76', '77', '78', '79', '80', '81', '82', '83']
['84', '85', '86', '87', '88', '89', '90', '91', '92', '93', '94', '95']

Note that this only yields complete chunks. If that's a problem, you can do this:

def chunks(it, n):
    while True:
        chunk = []
        for _ in range(n):
            try:
                chunk.append(next(it))
            except StopIteration:
                # Yield the partial final chunk, unless it is empty.
                if chunk:
                    yield chunk
                return
        yield chunk

which will print

['96', '97', '98', '99']

after the previous output.


A better way to do this is to use itertools.islice, which will handle a partial final chunk:

from itertools import islice

def chunks(it, n):
    # Accept any iterable; islice needs a single shared iterator to advance.
    it = iter(it)
    while True:
        a = list(islice(it, n))
        if not a:
            return
        yield a

Thanks to Antti Haapala for reminding me about islice. :)
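
To tie this back to the class in the question: the attempted get_chunk() calls self.get_row() on every call, which builds a fresh generator that reopens the file, so it always starts again from the first row. Below is a minimal sketch of one way to avoid that, assuming the file holds one JSON object per line with no blank lines. The names path, _rows and size are introduced here for illustration and are not in the original code; path is the full path to the file, a simplification of the os.path.join call in the question.

```python
import json
from itertools import islice


class FileProcessor(object):
    """Reads a file containing one JSON object per line."""

    def __init__(self, path):
        # Hypothetical simplification: `path` is the full path to the file.
        self.path = path
        # Create the row generator once, so its position in the file is
        # remembered between get_chunk() calls.
        self._rows = self.get_row()

    def get_row(self):
        # Yield one parsed JSON object per line of the file.
        with open(self.path, 'r') as f:
            for line in f:
                yield json.loads(line)

    def get_chunk(self, size=10):
        # Take up to `size` rows from the shared generator. The list is
        # shorter (possibly empty) once the file is exhausted.
        return list(islice(self._rows, size))
```

Each call to get_chunk() then returns the next 10 objects, the last non-empty chunk may be shorter, and an empty list signals that the file has been fully read.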
