How does one iterate over all rows and apply updates in EF Core effectively?

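I need to iterate over the entire Artist table, which has a one-to-many relationship with Songs. The details for each artist and its list of songs are stored in a file, and the path to that file is saved in the Artist table's DataFilePath column.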

The code below is my attempt at this, but it throws the following exception: System.InvalidOperationException: Connection is busy

// Artist to Songs is a 1 to many relationship
foreach(var artist in db.Artist.Include(x => x.Songs)) 
{
    try
    {
        // Load details for artist and songs from artist.DataFilePath
        var fileData = LoadArtistFile(artist.DataFilePath);

        // UpdateArtist() will
        //  - update each property for the Artists entity
        //  - add any new Songs 
        //  - update each Song property for existing songs
        UpdateArtist(artist, fileData);
        
        db.SaveChanges();
    } catch (Exception err)
    {
        _logger.LogError(err.Message);
    }
}

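Two questions:

  1. How do I fix the code above to avoid the Connection is busy problem? (I have tried many other variations, including lazy loading and explicit loading.)

  2. If the Artist table had millions of rows, and therefore the Songs table had tens of millions of rows, would this still be the most effective way to update both tables? (staying within EF Core)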

Update

I took David's answer and modified it to use batching, since I cannot load all rows into memory at once...

int batchSize = 50; // a batch of 50 gave the best performance (vs. 10 and 100)
int currentBatch = 0;
// Get the max id from the database as a stopping point (there is probably a better way to do this)
int maxId = GetMaxArtistID();
bool done = false;
while (!done)
{
    // A new context per batch keeps the change tracker from growing without bound
    using (var db = new Context())
    {
        var artists = db.Artists
            .OrderBy(x => x.Id)
            .Skip(currentBatch++ * batchSize)
            .Take(batchSize)
            .Include(x => x.Songs)
            .ToList();
        if (artists.Count == 0)
            break; // nothing left to process; guards against looping forever
        var ids = String.Join(", ", artists.Select(x => x.Id));
        Console.WriteLine($"Working on batch:{currentBatch} ids: {ids}");
        foreach (var artist in artists)
        {
            var fileData = LoadArtistFile(artist.DataFilePath);
            UpdateArtist(artist, fileData);
            if (artist.Id >= maxId)
                done = true;
        }
        db.SaveChanges();
    }
}
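One possible refinement of the batching loop above is to page by key instead of by offset: remember the last Id processed and ask for the next rows with a greater Id. This avoids both the growing Skip() cost and the separate max-id lookup, because the loop simply stops when a batch comes back empty. The sketch below is only an illustration under stated assumptions: it runs against an in-memory list via AsQueryable() rather than a real DbContext, the Artist class is a cut-down stand-in, and KeysetBatching, InBatches and the file names are hypothetical names introduced here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cut-down stand-in for the Artist entity (assumption: integer, positive Id).
public class Artist
{
    public int Id { get; set; }
    public string DataFilePath { get; set; } = "";
}

public static class KeysetBatching
{
    // Yields batches ordered by Id; each batch starts after the last Id already seen.
    // openQuery is a hypothetical hook that returns the queryable to page over.
    public static IEnumerable<List<Artist>> InBatches(
        Func<IQueryable<Artist>> openQuery, int batchSize)
    {
        int lastId = 0; // assumes all Ids are greater than zero
        while (true)
        {
            var batch = openQuery()
                .Where(a => a.Id > lastId)
                .OrderBy(a => a.Id)
                .Take(batchSize)
                .ToList();

            if (batch.Count == 0)
                yield break; // no rows left

            lastId = batch[batch.Count - 1].Id;
            yield return batch;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for the Artists table; note the gaps in the Ids.
        var rows = new[] { 1, 2, 5, 8, 13, 21, 34 }
            .Select(id => new Artist { Id = id, DataFilePath = $"artist-{id}.dat" })
            .ToList();

        foreach (var batch in KeysetBatching.InBatches(() => rows.AsQueryable(), 3))
            Console.WriteLine(string.Join(", ", batch.Select(a => a.Id)));
        // prints:
        // 1, 2, 5
        // 8, 13, 21
        // 34
    }
}
```

With EF Core, the same Where/OrderBy/Take chain would presumably be applied to db.Artists (plus the Include for Songs) inside the per-batch using block, with SaveChanges() called before the context is disposed.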

How do I fix the code above to avoid the Connection is busy problem?

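Load the data into memory before iterating. Enumerating the query directly typically keeps its data reader open on the connection for the duration of the loop, so SaveChanges() cannot use that connection until the enumeration finishes. For example: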

foreach(var artist in db.Artist.Include(x => x.Songs).ToList()) 
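To see why materializing with ToList() makes a difference, here is a toy model of the situation. It is not EF Core's actual implementation: FakeConnection, Query, Save and BusyConnectionDemo are hypothetical names, and the model simply assumes a connection that refuses writes while a lazily streamed result set is still being read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy stand-in for a connection that allows only one active operation at a time.
public class FakeConnection
{
    public bool ReaderOpen { get; private set; }

    // Streams rows lazily; the "reader" stays open until enumeration ends.
    public IEnumerable<int> Query(int rowCount)
    {
        ReaderOpen = true;
        try
        {
            for (int i = 1; i <= rowCount; i++)
                yield return i;
        }
        finally
        {
            ReaderOpen = false; // runs when the enumerator completes or is disposed
        }
    }

    // Writing is rejected while a reader is still open on this connection.
    public void Save()
    {
        if (ReaderOpen)
            throw new InvalidOperationException("Connection is busy");
    }
}

public static class BusyConnectionDemo
{
    // Saves inside a streaming loop: the reader is still open, so Save() throws.
    public static bool SaveWhileStreaming(FakeConnection conn)
    {
        try
        {
            foreach (var row in conn.Query(3))
                conn.Save();
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }

    // ToList() reads every row and closes the reader before the loop body runs.
    public static bool SaveAfterToList(FakeConnection conn)
    {
        foreach (var row in conn.Query(3).ToList())
            conn.Save();
        return true;
    }

    public static void Main()
    {
        var conn = new FakeConnection();
        Console.WriteLine($"streaming: {SaveWhileStreaming(conn)}");   // prints "streaming: False"
        Console.WriteLine($"materialized: {SaveAfterToList(conn)}");   // prints "materialized: True"
    }
}
```

The trade-off is memory: ToList() holds every row at once, which is what motivates processing the table in batches when it is large.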

If the artists table were millions of rows and therefore the songs table were tens of millions of rows - is this still the most effective way to update both tables? (staying within EF Core)

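EF does not natively do server-side data modification, so this really is the only way. But at some scale you would want to stop using EF, load your data files into a table, and perform the updates on the server side.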