IEnumerable extension to pull results in batches
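I'm using Entity Framework and often run into the problem where I want to iterate through a large number of records. My problem is that if I pull them all at once, I risk a timeout; if I pull them one at a time, every record is effectively a separate query and it takes forever.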
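I'd like to implement a Linq extension that pulls the results in batches but can still be used as an IEnumerable. I would give it a set of keys (most likely the primary IDs of whatever records I'm pulling), a batch size (higher for simple objects, lower for complex objects), and a Func that defines how to apply a set of keys to a set of record type T. I would call it like this: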
//get the list of items to pull--in this case, a set of order numbers
List<int> orderNumbers = GetOrderNumbers();
//set the batch size
int batchSize = 100;
//loop through the set using BatchedSelector extension. Note the selection
//function which allows me to
foreach (var order in dbContext.Orders.BatchedSelector(orderNumbers, (o, k) => k.Contains(o.OrderNumber), batchSize))
{
    //do things
}
Here's my draft of a solution:
/// <summary>
/// A Linq extension that fetches IEnumerable results in batches, aggregating queries
/// to improve EF performance. Operates transparently to application and acts like any
/// other IEnumerable.
/// </summary>
/// <typeparam name="T">Header record type</typeparam>
/// <param name="source">Full set of records</param>
/// <param name="keys">The set of keys that represent specific records to pull</param>
/// <param name="selector">Function that filters the result set to only those which match the key set</param>
/// <param name="maxBatchSize">Maximum number of records to pull in one query</param>
/// <returns>The matching records, fetched one batch at a time</returns>
public static IEnumerable<T> BatchedSelector<T>(this IEnumerable<T> source, IEnumerable<int> keys, Func<T, IEnumerable<int>, bool> selector, int maxBatchSize)
{
    //the index of the next key (or set of keys) to process--we start at 0 of course
    int currentKeyIndex = 0;
    //to provide some resilience, we will allow the batch size to decrease if we encounter errors
    int currentBatchSize = maxBatchSize;
    int batchDecreaseAmount = Math.Max(1, maxBatchSize / 10); //10%, but at least 1
    //other starting variables; a list to hold results and the associated batch of keys
    List<T> resultList = null;
    IEnumerable<int> keyBatch = null;
    //while there are still keys remaining, grab the next set of keys
    while ((keyBatch = keys.Skip(currentKeyIndex).Take(currentBatchSize)).Any())
    {
        //try to fetch the results
        try
        {
            resultList = source.Where(o => selector(o, keyBatch)).ToList(); // <-- this is where errors occur
            //mark these keys as processed; advance by the batch size actually used,
            //which may be smaller than maxBatchSize after a retry
            currentKeyIndex += currentBatchSize;
        }
        catch
        {
            //decrease the batch size for our retry
            currentBatchSize -= batchDecreaseAmount;
            //if we've run out of batch overhead, throw the error
            if (currentBatchSize <= 0) throw;
            //otherwise, restart the loop
            continue;
        }
        //since we've successfully gotten the set of keys, yield the results
        foreach (var match in resultList) yield return match;
    }
    //the loop is over; we're done
    yield break;
}
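For some reason, the "where" clause has no effect. I've verified that the correct keys are in keyBatch, but the expected WHERE OrderNumber IN (k1, k2, k3, kn) line is not there. It's as if I didn't have the where statement at all.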
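My best guess is that I need to build the expression and compile it, but I'm not sure if that's the problem, and I'm not sure how to go about fixing it. Would love any input. Thanks!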
Where, Skip, Take and all of these methods are extension methods, not members of IEnumerable<T>. In fact, each of these methods comes in two versions, one for IEnumerable<> and one for IQueryable<>.
Enumerable extensions:
Where(Func<TSource, bool> predicate)
Select(Func<TSource, TResult> selector)
Queryable extensions:
Where(Expression<Func<TSource, bool>> predicate)
Select(Expression<Func<TSource, TResult>> selector)
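As you can see, the difference is that the Queryable extensions take an Expression<> instead of a plain delegate. Those expressions are what allow EF to translate your code into SQL.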
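To make the delegate versus expression distinction concrete, here is a minimal sketch that does not touch EF at all. The class and variable names are made up for illustration. The same lambda is assigned once as a Func<> and once as an Expression<Func<>>; only the second can be inspected as data, which is what a query provider needs in order to translate it.

```csharp
using System;
using System.Linq.Expressions;

public static class DelegateVersusExpression
{
    public static string Describe()
    {
        // Compiled code: it can be invoked, but its structure cannot be inspected.
        Func<int, bool> asDelegate = n => n > 1;

        // Expression tree: a data structure describing the same lambda.
        Expression<Func<int, bool>> asTree = n => n > 1;

        bool fromDelegate = asDelegate(2);
        // A tree has to be compiled before it can be invoked.
        bool fromTree = asTree.Compile()(2);

        // The body of the tree is a "greater than" node that a provider can walk.
        return $"{asTree.Body.NodeType} {fromDelegate} {fromTree}";
    }
}
```

Calling DelegateVersusExpression.Describe() returns "GreaterThan True True".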
Since you declared the variables/parameters in your BatchedSelector() method as IEnumerable<>, you are using the extensions from the Enumerable class, and those extensions are executed in memory.
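A common mistake is to think that, thanks to polymorphism, a DbSet (IQueryable<>) will have its query translated to SQL even when you use it as an IEnumerable<>. That holds for proper members, but not for extension methods: extension methods are bound at compile time from the static type of the variable, so a DbSet held in an IEnumerable<> variable gets the Enumerable versions.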
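The binding rule can be checked without a database. The sketch below uses an in-memory AsQueryable() source as a stand-in for a DbSet (that substitution, and the class name, are assumptions for illustration). The same object is referenced through two static types, and the Where call binds differently for each.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StaticBindingDemo
{
    public static string Describe()
    {
        IQueryable<int> asQueryable = new[] { 1, 2, 3 }.AsQueryable();
        // Same object, but the compiler now only sees IEnumerable<int>.
        IEnumerable<int> asEnumerable = asQueryable;

        // Binds to Queryable.Where: the lambda becomes an expression tree.
        var viaQueryable = asQueryable.Where(n => n > 1);
        // Binds to Enumerable.Where: the lambda becomes a delegate run in memory.
        var viaEnumerable = asEnumerable.Where(n => n > 1);

        return $"{viaQueryable is IQueryable<int>} {viaEnumerable is IQueryable<int>}";
    }
}
```

StaticBindingDemo.Describe() returns "True False": the second query is no longer an IQueryable, so a provider such as EF never gets to see the filter.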
You can fix your code by changing the IEnumerable<> variables/parameters to IQueryable<>.
You can read more about the differences between IEnumerable and IQueryable here.
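First of all, thank you, Arturo. You put me on the right track to this solution. I figured it was a Linq->Entity problem, but those are still far from intuitive for me to solve.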
Second, I borrowed heavily from Shimmy's answer to this question. Thanks, Shimmy!
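To begin with, I updated the method to support key types other than integers, because why not. So the method signature is now (note the change to an IQueryable source):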
public static IEnumerable<T> BatchedSelector<T, TKey>(this IQueryable<T> source, Expression<Func<T, TKey>> selector, IEnumerable<TKey> keys, int maxBatchSize)
The method stays essentially the same, except for the line that was producing errors, which is now replaced with:
resultList = source.WhereIn(selector, keyBatch).ToList();
WhereIn is a Linq extension borrowed mostly from Shimmy:
//requires: using System; using System.Collections.Generic; using System.Linq; using System.Linq.Expressions;
public static IQueryable<T> WhereIn<T, TKey>(this IQueryable<T> source, Expression<Func<T, TKey>> selector, IEnumerable<TKey> keyCollection)
{
    //the single-string ArgumentNullException constructor expects the parameter name
    if (selector == null) throw new ArgumentNullException(nameof(selector));
    if (keyCollection == null) throw new ArgumentNullException(nameof(keyCollection));
    //if no items in collection, no results
    if (!keyCollection.Any()) return source.Where(t => false);
    //assemble expression: one equality comparison per key, all sharing the selector's body and parameter
    var p = selector.Parameters.Single();
    var equals = keyCollection.Select(value => (Expression)Expression.Equal(selector.Body, Expression.Constant(value, typeof(TKey))));
    //join the comparisons into a single OR chain
    var body = equals.Aggregate((accumulate, equal) => Expression.Or(accumulate, equal));
    //return expression
    return source.Where(Expression.Lambda<Func<T, bool>>(body, p));
}
This taught me something cool: if you feed in a where clause with a bunch of constant comparisons, it gets converted into a SQL IN statement! Neat!
With these changes, the method produces results quickly and easily.
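For anyone who wants to see the pieces together, here is a rough sketch of how the updated method might look in full. It is not the exact code from my project: the Order class, the Demo class and the in-memory AsQueryable() source are assumptions made so the example runs without a database. The sketch also copies the keys into a list up front and uses OrElse for the OR chain, which are my choices for this example. Against a real DbSet the same calls would go through EF's provider instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public sealed class Order   // hypothetical entity, for illustration only
{
    public int OrderNumber { get; set; }
}

public static class BatchedQueryExtensions
{
    // Builds source.Where(x => key(x) == k1 || key(x) == k2 || ...).
    public static IQueryable<T> WhereIn<T, TKey>(
        this IQueryable<T> source,
        Expression<Func<T, TKey>> selector,
        IReadOnlyCollection<TKey> keys)
    {
        if (keys.Count == 0) return source.Where(t => false);
        Expression body = keys
            .Select(k => (Expression)Expression.Equal(
                selector.Body, Expression.Constant(k, typeof(TKey))))
            .Aggregate((acc, next) => Expression.OrElse(acc, next));
        return source.Where(Expression.Lambda<Func<T, bool>>(body, selector.Parameters));
    }

    public static IEnumerable<T> BatchedSelector<T, TKey>(
        this IQueryable<T> source,
        Expression<Func<T, TKey>> selector,
        IEnumerable<TKey> keys,
        int maxBatchSize)
    {
        if (maxBatchSize <= 0) throw new ArgumentOutOfRangeException(nameof(maxBatchSize));

        // Materialize the keys once so each batch can be sliced by index.
        List<TKey> keyList = keys.ToList();
        int currentKeyIndex = 0;
        int currentBatchSize = maxBatchSize;
        int batchDecreaseAmount = Math.Max(1, maxBatchSize / 10); // 10%, but at least 1

        while (currentKeyIndex < keyList.Count)
        {
            int take = Math.Min(currentBatchSize, keyList.Count - currentKeyIndex);
            List<TKey> keyBatch = keyList.GetRange(currentKeyIndex, take);
            List<T> resultList;
            try
            {
                resultList = source.WhereIn(selector, keyBatch).ToList();
                currentKeyIndex += take; // advance by the number of keys actually used
            }
            catch
            {
                // Shrink the batch and retry; give up once nothing is left to shrink.
                currentBatchSize -= batchDecreaseAmount;
                if (currentBatchSize <= 0) throw;
                continue;
            }
            // yield must stay outside the try/catch block.
            foreach (T match in resultList) yield return match;
        }
    }
}

public static class Demo
{
    public static List<int> Run()
    {
        // In-memory stand-in for dbContext.Orders, holding order numbers 1 through 10.
        IQueryable<Order> orders = Enumerable.Range(1, 10)
            .Select(n => new Order { OrderNumber = n })
            .AsQueryable();
        var keys = new List<int> { 2, 5, 11, 7 };
        return orders.BatchedSelector(o => o.OrderNumber, keys, 2)
            .Select(o => o.OrderNumber)
            .ToList();
    }
}
```

With a batch size of 2, Demo.Run() issues two queries (keys 2 and 5, then 11 and 7) and returns 2, 5, 7; key 11 matches nothing. Note that results come back grouped by batch, in whatever order the source returns them within a batch, not in key order.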