Gui-Don Gui-Don - 1 year ago 58
PHP Question

Memory usage goes wild with Doctrine bulk insert

I’m trying to insert a large amount of data (30 000+ lines) in a MySQL database using Doctrine2 and the Symfony2 fixture bundle.

I looked at the right way to do it. I saw lots of questions about memory leaks and Doctrine, but no satisfying answer for me. It often comes the Doctrine

clear()
function.

So, I did various shapes of this:

while (($data = getData()) {
$iteration++;

$obj = new EntityObject();
$obj->setName('henry');
// Fill object...

$manager->persist($obj);

if ($iteration % 500 == 0) {
$manager->flush();
$manager->clear();

// Also tried some sort of:
// $manager->clear($obj);
// $manager->detach($obj);
// gc_collect_cycles();
}
}


PHP memory still goes wild, right after the
flush()
(I’m sure of that). In fact, every time the entities are flushed, memory goes up for a certain amount depending on batch size and the entities, until it reaches the deadly Allowed Memory size exhausted error. With a very very tiny entity, it works but memory consumption increase too much: several MB whereas it should be KB.

clear()
,
detach()
or calling GC doesn’t seem to have an effect at all. It only clears some KB.

Is my approach flawed? Did I miss something, somewhere? Is it a bug?

More info:


  • Without
    flush()
    memory barely moves;

  • Lowering the batch do not change the outcome;

  • Data comes from a CSV that need to be sanitized;



EDIT (partial solution):

@qooplmao brought a solution that significantly decrease memory consumption, disable doctrine sql logger:
$manager->getConnection()->getConfiguration()->setSQLLogger(null);


However, it is still abnormally high and increasing.

Answer Source

I resolved my problem using this resource, as @Axalix suggested.

This is how I modified the code:

// IMPORTANT - Disable the Doctrine SQL Logger
$manager->getConnection()->getConfiguration()->setSQLLogger(null);

while (($data = getData()) {
  $iteration++;

  $obj = new EntityObject();
  $obj->setName('henry');
  // Fill object...

  $manager->persist($obj);

  // IMPORTANT - Temporary store entities (of course, must be defined first outside of the loop)
  $tempObjets[] = $obj;

  if ($iteration % 500 == 0) {
    $manager->flush();

    // IMPORTANT - clean entities
    foreach($tempObjets as $tempObject) {
      $manager->detach($tempObject);
    }

    $tempObjets = null;
    gc_enable();
    gc_collect_cycles();
  }
}

And, last but not least, as I use this script with Symfony data fixtures, adding the --no-debug parameter in the command is also very important. Then memory consumption is stable.

Recommended from our users: Dynamic Network Monitoring from WhatsUp Gold from IPSwitch. Free Download