Entity Framework Core performance optimization: ingesting a very large folder of files into the same table
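I have a background service that uses C# to ingest 3,600 XML files (5 GB in total) into a SQL Server database. The ingestion takes about 16 hours to complete. I use Hangfire to create 3 jobs/threads, and each job has one folder to ingest: folders A, B, and C.
The problem is that folder C is very heavy. My idea was to split the files in folder C into two folders, C1 and C2, so now I have 4 jobs/threads: folders A, B, C1, and C2. However, the C1 and C2 jobs hit database errors, which I believe happen because both are accessing the same table.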
An exception occurred in the database while saving changes for context
type 'xxxContext'.
System.InvalidOperationException: A second operation started on this
context before a previous operation completed. This is usually caused
by different threads using the same instance of DbContext
Another time, this error occurred:
An exception occurred in the database while saving changes for context
type 'xxxContext'. System.InvalidOperationException: Collection was
modified; enumeration operation may not execute.
The error from Hangfire is as follows:
Hangfire.Storage.DistributedLockTimeoutException Timeout expired. The
timeout elapsed prior to obtaining a distributed lock on the
'HangFire:IIngestService.IngestPersonXML' resource.
Hangfire.Storage.DistributedLockTimeoutException: Timeout expired. The
timeout elapsed prior to obtaining a distributed lock on the
'HangFire:IIngestService.IngestPersonXML' resource.
When I use Parallel.ForEach, I also get this error:
System.InvalidOperationException: 'Operations that change
non-concurrent collections must have exclusive access. A concurrent
update was performed on this collection and corrupted its state. The
collection's state is no longer correct.'
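I only need to insert into the database. No update or delete operations are needed. Is there any workaround for this?
EF is not suited to this kind of operation. Use SqlBulkCopy for this.
There are libraries that provide it seamlessly for EF, but you can also write your own implementation; it is not that complicated.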
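As a rough illustration of the SqlBulkCopy suggestion, here is a minimal sketch. It assumes the Microsoft.Data.SqlClient package, a hypothetical destination table dbo.Person with ExternalId and Name columns, and a hypothetical PersonRow type holding the values parsed from the XML; none of these names come from the question. Each call opens its own connection, so parallel jobs share no DbContext or collection.

```csharp
using System.Collections.Generic;
using System.Data;
using Microsoft.Data.SqlClient;

// Hypothetical row shape; the real columns depend on the XML schema.
public sealed record PersonRow(string ExternalId, string Name);

public static class PersonBulkInserter
{
    // Inserts one set of parsed rows with SqlBulkCopy instead of DbContext.SaveChanges.
    public static void Insert(string connectionString, IEnumerable<PersonRow> rows)
    {
        // Build an in-memory table whose columns mirror the destination table.
        var table = new DataTable();
        table.Columns.Add("ExternalId", typeof(string));
        table.Columns.Add("Name", typeof(string));
        foreach (var row in rows)
        {
            table.Rows.Add(row.ExternalId, row.Name);
        }

        // A connection per call: nothing is shared between concurrent jobs.
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        using var bulkCopy = new SqlBulkCopy(connection)
        {
            DestinationTableName = "dbo.Person", // assumed table name
            BatchSize = 5000                      // rows sent per batch; tune as needed
        };
        bulkCopy.ColumnMappings.Add("ExternalId", "ExternalId");
        bulkCopy.ColumnMappings.Add("Name", "Name");
        bulkCopy.WriteToServer(table);
    }
}
```

A job would parse one file (or a chunk of files) into a list of PersonRow values and call PersonBulkInserter.Insert once per chunk, which keeps memory bounded for large folders.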
I don't really understand this part:
I only need to insert into db. Do update or delete operation needed. Is there any workaround for this?
So do you need updates or not? Well, if you do need to update a batch of rows, insert them into a temp table with bulk copy and then do a single join update.
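The temp-table approach from this comment could look roughly like the sketch below. It again assumes Microsoft.Data.SqlClient and a hypothetical dbo.Person table keyed by ExternalId; the staging table name, column types, and sizes are illustrative only. The DataTable passed in is expected to have ExternalId and Name columns.

```csharp
using System.Data;
using Microsoft.Data.SqlClient;

public static class PersonBulkUpdater
{
    // Bulk copies changed rows into a temp table, then applies them with one joined UPDATE.
    public static void Update(string connectionString, DataTable changedRows)
    {
        using var connection = new SqlConnection(connectionString);
        connection.Open();

        // A #temp table is visible only to this connection and is dropped when it closes.
        using (var create = new SqlCommand(
            "CREATE TABLE #PersonStaging (ExternalId nvarchar(50) NOT NULL PRIMARY KEY, Name nvarchar(200) NULL);",
            connection))
        {
            create.ExecuteNonQuery();
        }

        // The bulk copy must use the same connection that created the temp table.
        using (var bulkCopy = new SqlBulkCopy(connection) { DestinationTableName = "#PersonStaging" })
        {
            bulkCopy.ColumnMappings.Add("ExternalId", "ExternalId");
            bulkCopy.ColumnMappings.Add("Name", "Name");
            bulkCopy.WriteToServer(changedRows);
        }

        // One set-based statement replaces thousands of per-row updates.
        using var update = new SqlCommand(
            @"UPDATE p
              SET p.Name = s.Name
              FROM dbo.Person AS p
              INNER JOIN #PersonStaging AS s ON s.ExternalId = p.ExternalId;",
            connection);
        update.ExecuteNonQuery();
    }
}
```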