mandeep_m91 - 1 month ago
Node.js Question

node.js : async.each gets too slow with large number of elements to process asynchronously

I have a system with 4 GB of RAM. I need to process a set of 200 files (average file size = 20 MB) in the following manner:

  • read each file from GridFS
  • extract some information from the file
  • store the info in a collection in MongoDB

Now, the code for doing this is:

var readstream = {};
var part = {};

async.each(files, function (file, callback){

    console.log("reading file", file._id);

    readstream[file._id] = db.gfs().createReadStream({
        _id: file._id
    });

    // buffer the whole file in memory as a string, chunk by chunk
    readstream[file._id].on('data', function (chunk) {
        part[file._id] = part[file._id]
            ? part[file._id] + chunk.toString()
            : chunk.toString();
    });

    // without this, a failed stream would leave async.each hanging
    readstream[file._id].on('error', callback);

    readstream[file._id].on('end', function(){

        // do something here

        callback(); // tell async.each this file is done
    });

}, function (err){
    if(err){
        console.error("error ", err);
        res.json(err);
    }
    else{
        console.log("saved all files ############ YIPPIEEEEEEEEEEEEE ###################");
        res.json({"status": 1});
    }
});


It works like a charm for 10 files. When the number of files is large (200 in my case), it gets really slow, possibly due to memory limits: reading all the files in parallel means roughly 200 × 20 MB ≈ 4 GB of file contents buffered in memory at once, which is all the RAM this machine has.

For now, I can process the files 10 at a time and live with it, since it's a one-time activity. But I wanted to know what the standard practice is for tackling such situations in production.
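For reference, a minimal sketch of that 10-at-a-time approach, assuming the async library's eachLimit and a hypothetical processFile helper that wraps the read/extract/store logic from above:

var async = require('async');

// processFile is a stand-in for the read/extract/store logic shown above;
// it must call done(err) exactly once when the file has been handled
function processFile(file, done) {
    var stream = db.gfs().createReadStream({ _id: file._id });
    var content = '';

    stream.on('data', function (chunk) {
        content += chunk.toString();
    });

    stream.on('error', done);

    stream.on('end', function () {
        // extract info from `content`, save it to MongoDB, then signal completion
        done();
    });
}

// run at most 10 files concurrently instead of all 200 at once
async.eachLimit(files, 10, processFile, function (err) {
    if (err) {
        console.error("error ", err);
        return res.json(err);
    }
    res.json({ "status": 1 });
});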

Answer

The problem comes down to parallel execution: async.each runs all the tasks in parallel. As a workaround, you can use async.eachSeries to execute the tasks one by one, or consider async.cargo to combine the execution of multiple tasks into batches.
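For illustration, here is a rough sketch of both suggestions, reusing the hypothetical processFile(file, done) helper sketched in the question above; the batch size of 10 is only an example, and assigning cargo.drain as a property assumes async 1.x/2.x:

// one file at a time: minimal memory pressure, but the longest wall-clock time
async.eachSeries(files, processFile, function (err) {
    if (err) return res.json(err);
    res.json({ "status": 1 });
});

// or: a cargo gathers queued files into batches (here, up to 10)
// and only pulls in the next batch once the current one is finished
var cargo = async.cargo(function (batch, done) {
    async.each(batch, processFile, done);
}, 10);

cargo.drain = function () {
    console.log("all queued files processed");
};

files.forEach(function (file) {
    cargo.push(file);
});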
