LINQ Lambda vs Query Syntax Performance

Today I came across a LINQ query-syntax expression in my project that counts the items in a List matching a specific condition, like this:

int temp = (from A in pTasks 
            where A.StatusID == (int)BusinessRule.TaskStatus.Pending     
            select A).ToList().Count();

I wanted to refactor it to be more readable by rewriting it with Count(Func). I assumed it would also be good for performance, so I wrote:

int UnassignedCount = pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);

But when I measured both with Stopwatch, the lambda expression always took longer than the query syntax:

Stopwatch s = new Stopwatch();
s.Start();
int UnassignedCount = pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);
s.Stop();
Stopwatch s2 = new Stopwatch();
s2.Start();
int temp = (from A in pTasks 
            where A.StatusID == (int)BusinessRule.TaskStatus.Pending
            select A).ToList().Count();
s2.Stop();

Can anyone explain why this happens?

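I simulated your scenario. Yes, the execution times of these queries differ. However, the reason for the difference is not the syntax of the query. It does not matter whether you use method syntax or query syntax. Both yield the same result, because query expressions are translated into their lambda (method-call) form before they are compiled.

However, notice that the two queries are not the same at all. Your second query is translated into its lambda syntax before compilation (you can remove ToList() from the query, because it is redundant):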

pTasks.Where(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending).Count();

Now we have two LINQ queries in lambda syntax: the one I showed above, and this one:

pTasks.Count(x => x.StatusID == (int)BusinessRule.TaskStatus.Pending);
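The three forms discussed so far can be compared side by side. The following is only a minimal sketch: the class name QueryTranslationDemo, its method names, and the plain list of integer status values are hypothetical stand-ins for pTasks and BusinessRule.TaskStatus. It shows that the query expression and the Where(...).Count() chain are the same call sequence, and that Count(predicate) returns the same number through a different code path.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryTranslationDemo
{
    // Query syntax: the compiler rewrites this into statuses.Where(s => s == pending).
    public static int CountWithQuerySyntax(List<int> statuses, int pending)
    {
        return (from s in statuses
                where s == pending
                select s).Count();
    }

    // Method syntax that the query expression above is translated into.
    public static int CountWithWhereThenCount(List<int> statuses, int pending)
    {
        return statuses.Where(s => s == pending).Count();
    }

    // The refactored form from the question: a single Count call with a predicate.
    public static int CountWithPredicate(List<int> statuses, int pending)
    {
        return statuses.Count(s => s == pending);
    }

    public static void Main()
    {
        // Hypothetical sample data: 0 plays the role of TaskStatus.Pending.
        var statuses = new List<int> { 0, 1, 2, 0, 1, 0 };
        Console.WriteLine(CountWithQuerySyntax(statuses, 0));
        Console.WriteLine(CountWithWhereThenCount(statuses, 0));
        Console.WriteLine(CountWithPredicate(statuses, 0));
    }
}
```

All three methods return the same count. Only the work done to reach it differs.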

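The question now is:
Why do these two queries have different execution times?

Let's find out.
We can understand the reason for the difference by looking at these two call chains: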
- .Where(this IEnumerable<TSource> source, Func<TSource, bool> predicate).Count(this IEnumerable<TSource> source)

- Count(this IEnumerable<TSource> source, Func<TSource, bool> predicate)

Here is the implementation of Count(this IEnumerable<TSource> source, Func<TSource, bool> predicate):

public static int Count<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    int count = 0;
    foreach (TSource element in source) {
        checked {
            if (predicate(element)) count++;
        }
    }
    return count;
}

And here is Where(this IEnumerable<TSource> source, Func<TSource, bool> predicate):

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate)
{
    if (source == null) 
        throw Error.ArgumentNull("source");
    if (predicate == null) 
        throw Error.ArgumentNull("predicate");
    if (source is Iterator<TSource>) 
        return ((Iterator<TSource>)source).Where(predicate);
    if (source is TSource[]) 
        return new WhereArrayIterator<TSource>((TSource[])source, predicate);
    if (source is List<TSource>) 
        return new WhereListIterator<TSource>((List<TSource>)source, predicate);
    return new WhereEnumerableIterator<TSource>(source, predicate);
}

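Let's focus on the Where() implementation. If your collection is a List, it returns a WhereListIterator, whereas Count(predicate) simply iterates over the source through its enumerator. In my opinion, they made some speedups in the WhereListIterator implementation. After that we call the Count() overload that takes no predicate, which only iterates over the already filtered collection.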
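The source-type dispatch inside Where() can be observed from the outside by printing the runtime type of the object it returns. This is a rough sketch: WhereIteratorDemo and its members are hypothetical names, and the printed type names are internal implementation details that differ between .NET versions, so no specific output is assumed here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereIteratorDemo
{
    // A lazily generated sequence: neither an array nor a List<T>.
    private static IEnumerable<int> Generate()
    {
        for (int i = 0; i < 5; i++)
        {
            yield return i;
        }
    }

    // Returns the runtime type name of the object that Where() hands back.
    public static string WhereTypeName(IEnumerable<int> source)
    {
        return source.Where(x => x % 2 == 0).GetType().Name;
    }

    public static string ForList()
    {
        return WhereTypeName(new List<int> { 0, 1, 2, 3, 4 });
    }

    public static string ForLazySequence()
    {
        return WhereTypeName(Generate());
    }

    public static void Main()
    {
        // The exact names are runtime-specific; the point is that they differ.
        Console.WriteLine("List<int> source:      " + ForList());
        Console.WriteLine("yield-based source:    " + ForLazySequence());
    }
}
```

If the two names differ on your runtime, Where() has picked a specialized iterator for the List, which is the branch the answer refers to.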


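Regarding the speedup in the WhereListIterator implementation:

I found this question on SO: LINQ performance Count vs Where and Count. You can read @Matthew Watson's answer there. He explains the performance difference between these two queries. The conclusion is that the Where iterator avoids the indirect virtual table call and calls the iterator methods directly. As you can see in that answer, a call instruction is emitted instead of callvirt. And callvirt is slower than call:

From the book CLR via C#: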

When the callvirt IL instruction is used to call a virtual instance method, the CLR discovers the actual type of the object being used to make the call and then calls the method polymorphically. In order to determine the type, the variable being used to make the call must not be null. In other words, when compiling this call, the JIT compiler generates code that verifies that the variable’s value is not null. If it is null, the callvirt instruction causes the CLR to throw a NullReferenceException. This additional check means that the callvirt IL instruction executes slightly more slowly than the call instruction.

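As Farhad said, Where(x).Count() and Count(x) are implemented differently. The first one instantiates an additional iterator, which on my PC costs about 30,000 ticks (regardless of the collection size).

Also, ToList is not free. It allocates memory, and that takes time. On my PC it roughly doubles the execution time (so the cost depends linearly on the collection size).

Also, debugging requires spin-up time, so it is difficult to measure performance accurately in a single pass. I'd recommend a loop like the one in this example, and then ignoring the first set of results.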
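To make the extra work done by ToList() concrete, here is a small sketch. ToListCostDemo and its method names are hypothetical. The first method materializes every matching element into a freshly allocated List before counting, while the second counts matches in a single pass without building any intermediate collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToListCostDemo
{
    // Mirrors the original query: filter, copy all matches into a new List, then count.
    public static int CountViaToList(IEnumerable<int> source, Func<int, bool> predicate)
    {
        List<int> matches = source.Where(predicate).ToList(); // allocates storage for every match
        return matches.Count;                                 // reads the stored size of the list
    }

    // Counts matches while enumerating; no intermediate collection is created.
    public static int CountDirect(IEnumerable<int> source, Func<int, bool> predicate)
    {
        return source.Count(predicate);
    }

    public static void Main()
    {
        // Hypothetical data: one element in three satisfies the predicate.
        IEnumerable<int> statuses = Enumerable.Range(0, 9).Select(i => i % 3);
        Console.WriteLine(CountViaToList(statuses, s => s == 0));
        Console.WriteLine(CountDirect(statuses, s => s == 0));
    }
}
```

Both calls return the same number. The intermediate list grows with the number of matches, which is consistent with the linear cost described above.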

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            var pTasks = Task.GetTasks();
            for (int i = 0; i < 5; i++)
            {

                var s1 = Stopwatch.StartNew();
                var count1 = pTasks.Count(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending);
                s1.Stop();
                Console.WriteLine(s1.ElapsedTicks);

                var s2 = Stopwatch.StartNew();
                var count2 =
                    (
                        from A in pTasks
                        where A.StatusID == (int) BusinessRule.TaskStatus.Pending
                        select A
                        ).ToList().Count();
                s2.Stop();
                Console.WriteLine(s2.ElapsedTicks);

                var s3 = Stopwatch.StartNew();
                var count3 = pTasks.Where(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending).Count();
                s3.Stop();
                Console.WriteLine(s3.ElapsedTicks);


                var s4 = Stopwatch.StartNew();
                var count4 =
                    (
                        from A in pTasks
                        where A.StatusID == (int) BusinessRule.TaskStatus.Pending
                        select A
                        ).Count();
                s4.Stop();
                Console.WriteLine(s4.ElapsedTicks);

                var s5 = Stopwatch.StartNew();
                var count5 = pTasks.Count(x => x.StatusID == (int) BusinessRule.TaskStatus.Pending);
                s5.Stop();
                Console.WriteLine(s5.ElapsedTicks);
                Console.WriteLine();
            }
            Console.ReadLine();
        }
    }

    public class Task
    {
        public static IEnumerable<Task> GetTasks()
        {
            for (int i = 0; i < 10000000; i++)
            {
                yield return new Task { StatusID = i % 3 };
            }
        }

        public int StatusID { get; set; }
    }

    public class BusinessRule
    {
        public enum TaskStatus
        {
            Pending,
            Other
        }
    }
}