How can I speed up a CSV import in Symfony?

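I have to import a CSV file with about 20,000 rows. The CSV is read from an FTP server, which is updated every 30 minutes. However, the code I wrote already takes more than 45 minutes to import the file, which is far too slow. Can anyone help me?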

foreach ($readers as $key => $row) {
    $totalRecords +=1;

    $filterArray = $this->entityManager->getRepository(Article::class)->findBy(['id' =>  $row['id']]);

    if (empty($filterArray)) {
        $notFoundRecords +=1;
        continue;
    }
    $foundRecords +=1;
    $this->processPriceRow($row);
}


protected function processPriceRow($row)
{
    $existingRecord = $this->entityManager
                           ->getRepository(WareHouse::class)
                           ->findBy(['id' => $row['product_id']]);

    if (empty($existingRecord)) {
        return $this->fillArticleWareHouse($row);
    }
}


protected function fillArticleWareHouse($row, $i, $batchSize)
{
    $newWareHouse = new WareHouse();
    ....
    ....
    ...

    // Insert.
    $this->entityManager->persist($newWareHouse);
    $this->entityManager->flush();
}

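I am thinking of persisting the data in batches with batchSize = 100, but because the work is spread across nested function calls, I have not been able to implement it.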

You can implement batching like this:


    protected $batchSize = 100;
    protected $i = 0;
    
    protected function processPriceRow($row)
    {
        $existingRecord = $this->entityManager
            ->getRepository(WareHouse::class)
            ->findBy(['id' => $row['product_id']]);

        if (empty($existingRecord)) {
            return $this->fillArticleWareHouse($row);
        }
        $this->entityManager->flush();
    }

    protected function fillArticleWareHouse($row)
    {
        $newWareHouse = new WareHouse();
        //....
        $this->entityManager->persist($newWareHouse);
        ++$this->i;
        if (($this->i % $this->batchSize) === 0) {
            $this->entityManager->flush();
        }
    }
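
One detail to watch with a counter like this: when the number of new rows is not a multiple of the batch size, the last partial batch is only written if something calls flush after the loop has finished. Below is a minimal, framework-free sketch of that counting logic. `BatchFlusher` and its `$flush` callback are hypothetical names used only for illustration; in the importer the callback would presumably wrap `$this->entityManager->flush()`.

```php
<?php

/**
 * Hypothetical helper (not part of Symfony or Doctrine): counts queued items
 * and invokes a flush callback once per full batch, plus once at the end
 * for any leftover items.
 */
final class BatchFlusher
{
    /** @var callable */
    private $flush;
    /** @var int */
    private $batchSize;
    /** @var int */
    private $pending = 0;

    public function __construct(callable $flush, int $batchSize = 100)
    {
        if ($batchSize < 1) {
            throw new InvalidArgumentException('Batch size must be at least 1.');
        }
        $this->flush = $flush;
        $this->batchSize = $batchSize;
    }

    /** Call once after each persist(). */
    public function itemQueued(): void
    {
        if (++$this->pending >= $this->batchSize) {
            $this->finish();
        }
    }

    /** Call once after the loop so a partial last batch is written too. */
    public function finish(): void
    {
        if ($this->pending > 0) {
            ($this->flush)();
            $this->pending = 0;
        }
    }
}

// Demo with a stand-in callback that only counts how often it is invoked.
$flushCalls = 0;
$batch = new BatchFlusher(function () use (&$flushCalls): void {
    $flushCalls++;
}, 100);

for ($n = 0; $n < 250; $n++) {
    $batch->itemQueued(); // in the importer: right after persist()
}
$batch->finish(); // writes the remaining 50 items
```

With 250 queued items and a batch size of 100, the callback runs after item 100, after item 200, and once more from `finish()` for the remaining 50.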

It would also be better to load all the Article and WareHouse entities with a single select each and store them in an array of the form [entityId => Entity]:
        // articles
        $rowIds = [];
        foreach ($readers as $key => $row) {
            $rowIds[] = $row['id'];
        }
        
        $articles = $this->entityManager->getRepository(Article::class)->findBy(['id' =>  $rowIds]);
        $articleIdToArticle = [];
        foreach ($articles as $article) {
            $articleIdToArticle[$article->getId()] = $article;
        }
        
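        foreach ($readers as $key => $row) {
            $totalRecords += 1;
            // Array lookup instead of one query per row.
            if (!isset($articleIdToArticle[$row['id']])) {
                $notFoundRecords += 1;
                continue;
            }
            $foundRecords += 1;
            $this->processPriceRow($row);
        }

        // Write whatever is still pending from the last partial batch.
        $this->entityManager->flush();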
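
The same idea carries over to the WareHouse check in `processPriceRow()`, which still runs one query per row. Here is a minimal sketch, assuming the IDs of the existing WareHouse records have already been fetched with a single query. The `$existingIds` array is made-up sample data and `filterNewRows()` is a hypothetical helper name. The IDs become array keys, so each row costs one `isset()` lookup instead of a database round trip.

```php
<?php

/**
 * Hypothetical helper: returns the rows whose product_id is not yet known.
 * Each accepted product_id is added to the lookup, so a product that appears
 * twice in the CSV is only returned once.
 *
 * @param iterable $rows        CSV rows as associative arrays
 * @param array    $existingIds IDs that are already stored
 * @return array   rows that still need a new WareHouse record
 */
function filterNewRows(iterable $rows, array $existingIds): array
{
    // IDs become keys, so membership checks are hash lookups.
    $known = array_fill_keys($existingIds, true);
    $newRows = [];

    foreach ($rows as $row) {
        $id = $row['product_id'];
        if (isset($known[$id])) {
            continue; // already stored, or already seen earlier in this file
        }
        $known[$id] = true;
        $newRows[] = $row;
    }

    return $newRows;
}

// Sample data only; in the importer $existingIds would come from one query.
$existingIds = [10, 11];
$rows = [
    ['product_id' => 10, 'price' => '1.00'],
    ['product_id' => 12, 'price' => '2.50'],
    ['product_id' => 12, 'price' => '2.75'],
    ['product_id' => 13, 'price' => '4.00'],
];

$newRows = filterNewRows($rows, $existingIds);
```

Recording each accepted ID in the lookup also matters once flushes are batched: a repository query may not see a record that has been persisted but not yet flushed, so a product repeated in the same file could otherwise be inserted twice.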