orszaczky, 4 years ago
Javascript Question

MongoDB shell: nested iteration through cursors not executing

If there are two nested cursor.forEach() functions, the second one is not getting executed. The same happens with a while loop.

I want to remove duplicates from a huge collection by moving its documents to another collection and checking whether a duplicate already exists there. I'm running the following code in the mongo shell:

var fromColl = db.from.find(),
    toColl;

fromColl.forEach(function(fromObj){
    toColl = db.to.find({name: fromObj.name});
    if (toColl.length() == 0) {
        //no duplicates found in the target coll, insert
        db.to.insert(fromObj);
    } else {
        //possible duplicates found in the target coll
        print('possible duplicates: ' + toColl.length());
        toColl.forEach(function(toObj){
            if (equal(fromObj.data, toObj.data)) {
                //duplicate...
            }
        });
    }
});


In the else block, toColl.length() is printed, but the second forEach loop isn't executed. Does anyone know why?

Answer

--- WORKAROUND ---

As a workaround, I converted the second cursor to an array with toColl = db.to.find({name: fromObj.name}).toArray(); and iterated over that array with a plain JS for loop:

var fromColl = db.from.find(),
    toColl,
    toObj;

fromColl.forEach(function(fromObj){
    toColl = db.to.find({name: fromObj.name}).toArray();
    if (toColl.length == 0) {
        //no duplicates found in the target coll, insert
        db.to.insert(fromObj);
    } else {
        //possible duplicates found in the target coll
        print('possible duplicates: ' + toColl.length);
        for (var i = 0; i < toColl.length; i++) {
            toObj = toColl[i];
            if (equal(fromObj.data, toObj.data)) {
                //duplicate...
            }
        }
    }
});

--- UPDATE ---

As Stephen Steneker pointed out:

The mongo shell has some shortcuts for working with data in the shell. This is explained in more detail in the MongoDB documentation: Iterate a Cursor in the mongo Shell.

In particular:

if the returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents in the results.

In the code example, the var declaration for toColl came before the find() call.

Iterating all the results with toArray() is a possible approach, but requires loading all documents returned by the cursor into RAM. Manually iterating the cursor is a more scalable approach.
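
Manual iteration could look roughly like the sketch below, which uses the cursor's hasNext() and next() methods. The names copyUnique and isEqual are hypothetical (the original code calls an equal() helper that is not shown), and db is passed in as a parameter only so that the function is self-contained. What should happen to a duplicate is left open in the original; this sketch assumes it is simply skipped and that a document is inserted when no candidate matches.

```javascript
// Sketch only: copyUnique and isEqual are illustrative names, not shell built-ins.
function copyUnique(db, isEqual) {
    var fromCursor = db.from.find();
    var inserted = 0;

    while (fromCursor.hasNext()) {
        var fromObj = fromCursor.next();
        // A fresh cursor per source document; nothing has consumed it yet.
        var toCursor = db.to.find({name: fromObj.name});
        var isDuplicate = false;

        // Walk the candidates one at a time instead of loading them all with toArray().
        while (toCursor.hasNext()) {
            var toObj = toCursor.next();
            if (isEqual(fromObj.data, toObj.data)) {
                isDuplicate = true; // assumption: duplicates are skipped
                break;
            }
        }

        if (!isDuplicate) {
            db.to.insert(fromObj);
            inserted++;
        }
    }
    return inserted; // number of documents copied to the target collection
}
```

In the mongo shell this would be called as copyUnique(db, equal), with equal being your own comparison helper.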

--- SOLUTION ---

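The main problem turned out to be using toColl.length() instead of toColl.count(): length() iterates the cursor to the end in order to count the results, so nothing is left for the following forEach, whereas count() returns the number of matching documents without consuming the cursor.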
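
One way to picture why nothing is left for forEach after length() is the toy model below, which runs in plain JavaScript. ToyCursor is a hypothetical stand-in, not the shell's actual code; it assumes that length() is built on toArray() and therefore reads the cursor to the end, while count() leaves the cursor position alone.

```javascript
// ToyCursor is a hypothetical, simplified model of a shell cursor.
function ToyCursor(docs) {
    this._docs = docs;
    this._pos = 0;
    this._arr = null;
}

ToyCursor.prototype.hasNext = function () {
    return this._pos < this._docs.length;
};

ToyCursor.prototype.next = function () {
    return this._docs[this._pos++];
};

ToyCursor.prototype.forEach = function (fn) {
    while (this.hasNext()) fn(this.next());
};

// Assumption: toArray() drains whatever is left and caches the result.
ToyCursor.prototype.toArray = function () {
    if (this._arr) return this._arr;
    var out = [];
    while (this.hasNext()) out.push(this.next());
    this._arr = out;
    return out;
};

// Assumption: length() relies on toArray(), so calling it consumes the cursor.
ToyCursor.prototype.length = function () {
    return this.toArray().length;
};

// In this model, count() reports the size without moving the cursor position.
ToyCursor.prototype.count = function () {
    return this._docs.length;
};

// Calls the named method first, then counts how many documents forEach still sees.
function visitedAfter(methodName) {
    var cursor = new ToyCursor([{n: 1}, {n: 2}, {n: 3}]);
    cursor[methodName]();
    var visited = 0;
    cursor.forEach(function () { visited++; });
    return visited;
}
```

In this model, visitedAfter('length') returns 0 and visitedAfter('count') returns 3, which mirrors the behaviour seen in the question.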

Big thanks to Rhys Campbell and Stephen Steneker of the MongoDB user group for helping resolve this bug.
