Performance hit / memory consumption when aggregating/filtering navigation properties

Question

Suppose I have the following set of classes:

public class MegaBookCorporation
{
    public int ID { get; private set; }
    public int BooksInStock 
    {
        get
        {
            return Stores.Sum(x => x.BooksInStock);
        }
    }
    public virtual ICollection<MegaBookCorporationStore> Stores { get; set; }
}


public class MegaBookCorporationStore
{
    public int ID { get; private set; }
    public string BookStoreName { get; private set; }
    public virtual MegaBookCorporation ManagingCorporation { get; private set;}
    public int BooksInStock
    {
        get
        {
            return Books.Where( x=> !x.IsSold).Count();
        }
    }

    public virtual ICollection<Book> Books { get; set; }
}

public class Book
{
    public int IndividualBookTrackerID { get; private set; }
    public virtual MegaBookCorporationStore Store { get; private set; } // the name "Store" is assumed; the original omitted the property name
    public bool IsSold { get; private set; }
    public DateTime? SellingDate { get; private set;}
}

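At work we had a discussion about the performance impact of retrieving the number of books in a MegaBookCorporation (the BooksInStock property above). Two important facts:

1/ We are using EF 6 with lazy loading, as the virtual keyword suggests.

2/ Since every book is tracked individually, the number of book entries in the database will grow large quickly. In the long run the table may hold hundreds of millions of rows. We may add 100,000 books per day.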

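My position is that the current implementation is fine and that we won't run into problems. My understanding is that a SQL statement would be generated to filter the collection when GetEnumerator is called.

The other suggestion, made by my coworker, is to cache the number of books. That means updating a field "int ComputedNumberOfBooks" whenever the AddBookToStock() or SellBook() methods are called. This field would need to be duplicated and kept up to date in both the Store and Corporation classes. (Of course, we would then need to handle concurrency.)

I know that adding these fields is no big deal, but I really feel bad about this idea. To me it looks like engineering in advance for a problem that doesn't exist and, in my view, won't exist.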

I decided to double-check my claim on SO and found 2 contradicting answers:

One saying that the whole Books collection would be pulled into memory, because ICollection only inherits from IEnumerable. The other saying the opposite: the navigation property will be treated as an IQueryable until it is evaluated. (Why not, since the property is wrapped by a proxy.)

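So here are my questions:

1- What is the truth?

2- Even if the whole collection is referenced, don't you think that it's not a big deal since it would be an IEnumerable (low memory usage)?

3- What do you think of the memory consumption / performance hit on this example, and what would be the best way to go?

Thanks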

Answer 1

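Verdict

The truth is that, with the properties as you defined them, the entire collection of books is loaded. Here is why.

Ideally, you would want to be able to do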

var numberOfBooks = context.MegaBookCorporations
                           .Where(m => m.ID == someId)
                           .Select(m => m.BooksInStock)
                           .Single();

If EF were able to translate this into SQL, you would have a query that only returns an integer and loads no entities into memory.

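But, unfortunately, EF can't do this. It will throw an exception (a NotSupportedException in EF 6) saying that there is no SQL translation for BooksInStock.

To avoid this exception, you could do this instead: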

var numberOfBooks = context.MegaBookCorporations
                           .Where(m => m.ID == someId)
                           .Single()
                           .BooksInStock;

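This changes things dramatically. Single() pulls one MegaBookCorporation into memory. Accessing its BooksInStock property triggers lazy loading of MegaBookCorporation.Stores. Subsequently, for each Store, the complete Books collection is loaded. Finally, the LINQ operations (x => !x.IsSold, Count, Sum) are applied in memory.

So in this case, the first link is correct. Lazy loading always loads complete collections. Once a collection is loaded, it will not be loaded again.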
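
To make the in-memory evaluation concrete, here is a minimal sketch that does not use EF at all. It assumes C# 9 top-level statements. LoadAllBooks is a hypothetical stand-in for the lazy load, and each yielded bool stands for the IsSold flag of one materialized Book entity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Counts how many "entities" had to be materialized in memory.
int materialized = 0;

// Hypothetical stand-in for lazy loading a store's Books collection.
IEnumerable<bool> LoadAllBooks(int total)
{
    for (int i = 0; i < total; i++)
    {
        materialized++;
        yield return i % 4 == 0; // IsSold: every fourth book is sold
    }
}

// Lazy loading fills the whole collection before any LINQ operator runs.
ICollection<bool> books = LoadAllBooks(1000).ToList();

// Same shape as the BooksInStock getter: ICollection<T> only exposes
// IEnumerable<T>, so Where and Count bind to System.Linq.Enumerable
// and the filter runs in memory.
int inStock = books.Where(isSold => !isSold).Count();

Console.WriteLine($"materialized={materialized}, inStock={inStock}");
// prints "materialized=1000, inStock=750"
```

All 1000 elements are created even though only a single integer is needed, which is the behavior described above.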

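But the second link is correct too :).

As long as you manage to do everything in one LINQ statement that can be translated into SQL, the navigation properties and predicates will be evaluated in the database, and no lazy loading will occur. But then you can't use the BooksInStock property.

The only way to achieve this is with a LINQ statement like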

var numberOfBooks = context.MegaBookCorporations
                           .Where(m => m.ID == someId)
                           .SelectMany(m => m.Stores)
                           .SelectMany(s => s.Books)
                           .Count();

This executes a pretty efficient query containing one join and a COUNT, returning only the count.

Unfortunately, your key assumption...
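
Note that the query above counts every book, sold or not. To mirror the original BooksInStock getter, the !IsSold predicate can be passed to Count. The following is only a sketch under assumptions: in-memory anonymous objects with AsQueryable() stand in for context.MegaBookCorporations, the sample data is made up, and C# 9 top-level statements are assumed. Against a real EF 6 context I would expect the same query shape to be translated into a single SQL statement, but this in-memory version only demonstrates the composition and the result.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for context.MegaBookCorporations.
var corporations = new[]
{
    new
    {
        ID = 1,
        Stores = new[]
        {
            new { Books = new[] { new { IsSold = false }, new { IsSold = true } } },
            new { Books = new[] { new { IsSold = false }, new { IsSold = false } } }
        }
    }
}.AsQueryable();

int someId = 1;

// One composed query: pick the corporation, flatten stores and books,
// and count only the books that are not sold.
int booksInStock = corporations
    .Where(m => m.ID == someId)
    .SelectMany(m => m.Stores)
    .SelectMany(s => s.Books)
    .Count(b => !b.IsSold);

Console.WriteLine(booksInStock); // prints 3
```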

that a SQL statement would be generated to filter the collection when GetEnumerator is called.

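is not entirely correct. A SQL statement is generated, but it does not include the filter. Given the number of books you mention, this will cause serious performance and memory problems.

So what to do?

If you need these counts frequently and don't want to query them separately all the time, something should be done. Your coworker's idea, a redundant ComputedNumberOfBooks field in the database, could be a solution, but I agree with your objections.

Redundancy should be avoided at (almost) all costs. The worst part is that it always requires the client application to keep both sides in sync. Or database triggers.

But speaking of the database... If these counts are important and queried often, I would introduce a computed column BooksInStock in the MegaBookCorporationStore table. Its formula can simply count the books in the store. Then you can add this computed column to your entity as a property marked DatabaseGeneratedOption.Computed. No redundancy.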
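
On the entity side, the mapping could look roughly like the sketch below. It assumes the table already has a computed column named BooksInStock. How that column's formula is defined in the database is not shown here. The getter-only property from the question is replaced by a plain mapped property.

```csharp
using System.ComponentModel.DataAnnotations.Schema;

public class MegaBookCorporationStore
{
    public int ID { get; private set; }

    // Assumption: the database computes this value. With
    // DatabaseGeneratedOption.Computed, EF reads it from the database
    // and never includes it in INSERT or UPDATE statements.
    [DatabaseGenerated(DatabaseGeneratedOption.Computed)]
    public int BooksInStock { get; private set; }
}
```

With this in place, BooksInStock is an ordinary mapped column, so it can be used inside LINQ queries that are translated into SQL.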

Answer 2

What is the truth?

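If you use MegaBookCorporation.BooksInStock to get the total number of books in stock, all books will be loaded from the database. The query provider cannot generate a SQL expression for the body of a property getter, so it can only fetch all the data and evaluate it in memory.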

Even if the whole collection is referenced, don't you think that it's not a big deal since it would be an IEnumerable (low memory usage).

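Yes, it is a big deal, because it doesn't scale at all. It has nothing to do with IEnumerable. The problem is that all the data is fetched before Count() is evaluated.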

What do you think of the memory consumption / performance hit on this example, and what would be the best way to go?

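Memory consumption will grow with the number of books stored in the database. Since you only want to know how many there are, this is clearly a no-go. Here you can see how to do it properly.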


c#

entity-framework

lazy-loading

navigation-properties

entity-framework-6
