How can I make a CSV import faster in Symfony?
I have to import a CSV file with about 20,000 rows. The CSV is read from an FTP server that is updated every 30 minutes, but the code I wrote already takes more than 45 minutes to run the import, which is far too slow. Can anyone help me?
foreach ($readers as $key => $row) {
    $totalRecords += 1;
    $filterArray = $this->entityManager->getRepository(Article::class)->findBy(['id' => $row['id']]);
    if (empty($filterArray)) {
        $notFoundRecords += 1;
        continue;
    }
    $foundRecords += 1;
    $this->processPriceRow($row);
}
protected function processPriceRow($row)
{
    $existingRecord = $this->entityManager
        ->getRepository(WareHouse::class)
        ->findBy(['id' => $row['product_id']]);

    if (empty($existingRecord)) {
        return $this->fillArticleWareHouse($row);
    }
}
protected function fillArticleWareHouse($row, $i, $batchSize)
{
    $newWareHouse = new WareHouse();
    // ...

    // Insert.
    $this->entityManager->persist($newWareHouse);
    $this->entityManager->flush();
}
I'm thinking of persisting the data in batches with batchSize = 100, but because one function calls into another, I haven't been able to implement it.
You can implement batching like this:
protected $batchSize = 100;
protected $i = 0;

protected function processPriceRow($row)
{
    $existingRecord = $this->entityManager
        ->getRepository(WareHouse::class)
        ->findBy(['id' => $row['product_id']]);

    if (empty($existingRecord)) {
        return $this->fillArticleWareHouse($row);
    }

    $this->entityManager->flush();
}

protected function fillArticleWareHouse($row)
{
    $newWareHouse = new WareHouse();
    // ...

    $this->entityManager->persist($newWareHouse);
    ++$this->i;

    // Flush once per full batch instead of once per row.
    if (($this->i % $this->batchSize) === 0) {
        $this->entityManager->flush();
    }
}
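Note that with this pattern the last partial batch (fewer than batchSize rows) never hits the modulo condition, so you still need one final flush after the loop has finished. A minimal sketch of the calling code, assuming the $readers iterable and the counters from the question:

foreach ($readers as $key => $row) {
    // ... Article lookup and counters as in the question ...
    $this->processPriceRow($row);
}

// Flush whatever is left over from the last, partial batch.
$this->entityManager->flush();

If memory becomes an issue at 20,000 rows, Doctrine's batch-processing guidance is to also call $this->entityManager->clear() after each flush, but be aware that this detaches every entity already loaded, so it does not combine well with the pre-loaded lookup array shown further down.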
It would also be better to load all the Article and WareHouse entities with a single select each and store them in an array keyed as [entityId => Entity].
// articles
$rowIds = [];
foreach ($readers as $key => $row) {
    $rowIds[] = $row['id'];
}

// One query for all articles instead of one query per CSV row.
$articles = $this->entityManager->getRepository(Article::class)->findBy(['id' => $rowIds]);
$articleIdToArticle = [];
foreach ($articles as $article) {
    $articleIdToArticle[$article->getId()] = $article;
}

foreach ($readers as $key => $row) {
    $totalRecords += 1;
    if (!array_key_exists($row['id'], $articleIdToArticle)) {
        $notFoundRecords += 1;
        continue;
    }
    $foundRecords += 1;
    $this->processPriceRow($row);
}
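The prose above mentions pre-loading both Article and WareHouse, but the snippet only covers Article. Here is a sketch of the analogous WareHouse pre-load; it assumes the lookup key is the row's product_id, as used in processPriceRow:

// warehouses
$productIds = [];
foreach ($readers as $key => $row) {
    $productIds[] = $row['product_id'];
}

$wareHouses = $this->entityManager->getRepository(WareHouse::class)->findBy(['id' => $productIds]);
$wareHouseIdToWareHouse = [];
foreach ($wareHouses as $wareHouse) {
    $wareHouseIdToWareHouse[$wareHouse->getId()] = $wareHouse;
}

// processPriceRow can then check the array instead of querying the database:
// if (!array_key_exists($row['product_id'], $wareHouseIdToWareHouse)) {
//     return $this->fillArticleWareHouse($row);
// }

With both lookup arrays in memory, the import drops from roughly two queries per CSV row to two SELECTs in total, plus one INSERT flush per batch.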