
Entity Framework Core performance optimization to ingest a very large folder of files into the same table

I have a background service in C# that ingests 3,600 XML files (about 5 GB in total) into a SQL Server database. The full ingestion takes roughly 16 hours. Using Hangfire, I created 3 jobs/threads, each assigned one folder to ingest: folders A, B, and C.

The problem is that folder C is very heavy. My idea was to split the files in folder C into two folders, C1 and C2, so I now have 4 jobs/threads: folders A, B, C1, and C2. But the C1 and C2 jobs hit database errors, which I believe is because they both insert into the same table:

An exception occurred in the database while saving changes for context type 'xxxContext'. System.InvalidOperationException: A second operation started on this context before a previous operation completed. This is usually caused by different threads using the same instance of DbContext
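This particular error means two jobs are sharing one DbContext instance, which is not thread-safe. A minimal sketch of the usual fix, assuming an EF Core `IDbContextFactory` is registered and using illustrative names (`XxxContext`, `ParsePersons` are placeholders, not from the original code):

```csharp
// Sketch: each Hangfire job creates its own DbContext so no instance
// is ever touched by two threads at once. All names are hypothetical.
public class IngestService : IIngestService
{
    private readonly IDbContextFactory<XxxContext> _contextFactory;

    public IngestService(IDbContextFactory<XxxContext> contextFactory)
        => _contextFactory = contextFactory;

    public async Task IngestPersonXML(string folder)
    {
        // A fresh context per job invocation, disposed when the job ends.
        await using var db = await _contextFactory.CreateDbContextAsync();

        foreach (var file in Directory.EnumerateFiles(folder, "*.xml"))
        {
            var persons = ParsePersons(file); // assumed XML parsing helper
            db.AddRange(persons);
        }
        await db.SaveChangesAsync();
    }
}
```

The factory would be registered with `services.AddDbContextFactory<XxxContext>(...)` (available since EF Core 5).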

At other times this error appears:

An exception occurred in the database while saving changes for context type 'xxxContext'. System.InvalidOperationException: Collection was modified; enumeration operation may not execute.

The error from Hangfire is as follows:

Hangfire.Storage.DistributedLockTimeoutException: Timeout expired. The timeout elapsed prior to obtaining a distributed lock on the 'HangFire:IIngestService.IngestPersonXML' resource.

When I use Parallel.ForEach, I also get this error:

System.InvalidOperationException: 'Operations that change non-concurrent collections must have exclusive access. A concurrent update was performed on this collection and corrupted its state. The collection's state is no longer correct.'

I only need to insert into the database; no update or delete operations are needed. Is there any workaround for this?

EF is not suited to this kind of operation. Use SqlBulkCopy for it.
There are libraries that provide this seamlessly for EF, but you can also write your own implementation - it's not that complicated.
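A minimal sketch of a hand-rolled SqlBulkCopy insert, assuming the parsed XML rows have already been loaded into a DataTable; the table and column names (`dbo.Person`, `Name`, `BirthDate`) are placeholders for the real schema:

```csharp
// Sketch: bulk-insert parsed rows with SqlBulkCopy instead of EF's
// row-by-row change tracking. Schema names are placeholders.
using Microsoft.Data.SqlClient;
using System.Data;

static void BulkInsert(string connectionString, DataTable rows)
{
    using var connection = new SqlConnection(connectionString);
    connection.Open();

    using var bulk = new SqlBulkCopy(connection)
    {
        DestinationTableName = "dbo.Person",
        BatchSize = 5_000,      // stream in batches instead of one huge transaction
        BulkCopyTimeout = 0     // disable the timeout for a long-running load
    };
    bulk.ColumnMappings.Add("Name", "Name");
    bulk.ColumnMappings.Add("BirthDate", "BirthDate");

    // One streamed bulk operation instead of thousands of INSERT statements.
    bulk.WriteToServer(rows);
}
```

Because SqlBulkCopy bypasses the change tracker entirely, each job can use its own connection and the DbContext threading errors above disappear for the insert path.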

I really don't understand this part:

I only need to insert into db. Do update or delete operation needed. Is there any workaround for this?

So do you need updates or not? Well... if you do need to update a bunch of rows, insert them into a temp table with bulk copy, then just do a join update.
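That staging-table pattern, as a sketch: bulk-copy the rows into a temp table on the same open connection, then run one set-based UPDATE joining it to the target. All table and column names here are hypothetical.

```csharp
// Sketch: bulk copy into a temp staging table, then a single join update.
// Table/column names are placeholders for the real schema.
using Microsoft.Data.SqlClient;
using System.Data;

static void BulkUpdate(string connectionString, DataTable rows)
{
    using var connection = new SqlConnection(connectionString);
    connection.Open(); // keep one connection open so the #temp table survives

    // 1. Create a session-scoped temp table matching the target's shape.
    using (var create = new SqlCommand(
        "CREATE TABLE #PersonStaging (Id INT, Name NVARCHAR(200), BirthDate DATE);",
        connection))
    {
        create.ExecuteNonQuery();
    }

    // 2. Bulk copy the incoming rows into the staging table.
    using (var bulk = new SqlBulkCopy(connection)
        { DestinationTableName = "#PersonStaging" })
    {
        bulk.WriteToServer(rows);
    }

    // 3. One set-based join update against the real table.
    using var update = new SqlCommand(@"
        UPDATE p
        SET    p.Name      = s.Name,
               p.BirthDate = s.BirthDate
        FROM   dbo.Person      AS p
        JOIN   #PersonStaging  AS s ON s.Id = p.Id;", connection);
    update.ExecuteNonQuery();
}
```

The point of the pattern is that the database does one set-based join instead of the application issuing thousands of individual UPDATE statements.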