"OutOfMemoryError" when importing big amount of entities #104

@N2oo

Hi,

To reproduce:

Run symfony typesense:import --max-per-page=10000 to import a large number of entities from your database into Typesense.
The crash happened with close to 15,000 entities in memory.

Reason:

The process throws a fatal OutOfMemoryError from Doctrine's classes because the entity manager is never cleared, so every entity loaded so far stays managed and held in memory.

The fix is to detach the objects from Doctrine by clearing the entity manager:
$this->em->clear();
Here is the resource that pointed me to this:
https://www.doctrine-project.org/projects/doctrine-orm/en/2.14/reference/batch-processing.html
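
The batch-processing idea from that page can be sketched in plain PHP as follows. This is only a minimal illustration, not the bundle's code: importInBatches and its callable parameters are hypothetical names introduced here. It assumes the clear callback is what calls $this->em->clear(), so that entities from earlier batches are detached and can be garbage-collected.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: convert items one by one, send them in batches,
 * and run a "clear" callback after each batch so references are released.
 *
 * @param iterable $items   source items (e.g. the result of Query::toIterable())
 * @param int      $batchSize number of converted items per import call
 * @param callable $convert item => document
 * @param callable $import  receives an array of documents
 * @param callable $clear   called after every import (e.g. to clear the entity manager)
 *
 * @return int total number of items imported
 */
function importInBatches(
    iterable $items,
    int $batchSize,
    callable $convert,
    callable $import,
    callable $clear
): int {
    $batch = [];
    $total = 0;

    foreach ($items as $item) {
        $batch[] = $convert($item);

        if (count($batch) >= $batchSize) {
            $import($batch);
            $total += count($batch);
            $batch = [];   // drop references to the converted documents
            $clear();      // let the caller detach managed entities
        }
    }

    // Send whatever is left over after the last full batch.
    if ($batch !== []) {
        $import($batch);
        $total += count($batch);
        $clear();
    }

    return $total;
}
```

With the objects from the command, a call might look like importInBatches($q->toIterable(), 1000, [$this->transformer, 'convert'], fn (array $docs) => $this->documentManager->import($collectionName, $docs, $action), fn () => $this->em->clear()); the batch size of 1000 is an arbitrary example value.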

Suggestion:

Here is the change I made to the ImportCommand class:

private function populateIndex(InputInterface $input, OutputInterface $output, string $index)
   {
       /*...*/

       for ($i = $firstPage; $i <= $lastPage; ++$i) {
           $q = $this->em->createQuery('select e from '.$class.' e')
               ->setFirstResult(($i - 1) * $maxPerPage)
               ->setMaxResults($maxPerPage)
           ;

           if ($io->isDebug()) {
               $io->text('<info>Running request : </info>'.$q->getSQL());
           }

           $entities = $q->toIterable();

           $data = [];
           foreach ($entities as $entity) {
               $data[] = $this->transformer->convert($entity);
           }

           $io->text('Import <info>['.$collectionName.'] '.$class.'</info> Page '.$i.' of '.$lastPage.' ('.count($data).' items)');

           $result = $this->documentManager->import($collectionName, $data, $action);


           if ($this->printErrors($io, $result)) {
               $this->isError = true;

               throw new \Exception('Error happened during the import of the collection : '.$collectionName.' (you can see them with the option -v)');
           }

           $populated += count($data);
           $this->em->clear(); // detach this page's entities after every iteration
       }
       $this->em->clear(); // final clear once all pages have been processed

       $io->newLine();
       return $populated;
   }

I've made a fork with this change, and everything seems OK:
I didn't notice any significant performance issues.

Hope this helps.
Have a good day.

Originally posted by @N2oo in #74 (comment)
