Why does this LINQ with reflection statement beat my compiled expression tree?

Question

Inspired by this blog post, I set out to refactor the following LINQ query using compiled expression trees:

var result = dummies.Select(y =>
                   y.GetType().GetProperties()
                   .Where(x => x.GetMethod.IsPublic)
                   .Where(x => fields.Contains(x.Name, StringComparer.OrdinalIgnoreCase))
                   .ToDictionary(x => x.Name, x => x.GetValue(y)))
                   .Where(x => x.Any());

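The code is intended to extract the values of a set of specified properties, returning a dictionary for each element. In my first attempt at writing my own expression trees, I came up with this solution to generate the property calls: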

        foreach (string propName in Properties)
        {
            var prop = typeof(DummyType).GetProperty(propName);

            if (prop != null)
            {
                props.Add(prop);
            }
        }

        var accessors = new List<Tuple<string, Func<DummyType, object>>>();

        foreach (var prop in props)
        {
            var instance = Expression.Parameter(typeof(DummyType));
            var call = Expression.Property(instance, prop);
            var expr = Expression.Lambda<Func<DummyType, object>>(call, instance).Compile();
            accessors.Add(Tuple.Create(prop.Name, expr));
        }

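The calls in accessors are iterated for each DummyType element. This implementation cannot handle properties that return value types. I was able to work around that using MakeGenericType combined with a DynamicInvoke call, but because DynamicInvoke is documented as "late-bound", I discarded that approach to keep it from distorting the performance.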
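One commonly used way around the value-type limitation, without DynamicInvoke, is to wrap the property access in Expression.Convert so the value is boxed inside the compiled delegate. The following is only a minimal sketch: the class `Sample` (a stand-in for DummyType) and the helper `BuildAccessor` are hypothetical names that do not appear in the original code.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical stand-in for DummyType, used only for illustration.
public class Sample
{
    public string Prop1 { get; set; }
    public int Prop2 { get; set; }
}

public static class AccessorBuilder
{
    // Builds a Func<T, object> for the given property. Expression.Convert
    // boxes value-type results so the lambda body matches the object return type.
    public static Func<T, object> BuildAccessor<T>(PropertyInfo prop)
    {
        ParameterExpression instance = Expression.Parameter(typeof(T), "instance");
        MemberExpression access = Expression.Property(instance, prop);
        UnaryExpression boxed = Expression.Convert(access, typeof(object));
        return Expression.Lambda<Func<T, object>>(boxed, instance).Compile();
    }
}

public static class Program
{
    public static void Main()
    {
        var sample = new Sample { Prop1 = "fooBar", Prop2 = 42 };

        Func<Sample, object> getProp1 =
            AccessorBuilder.BuildAccessor<Sample>(typeof(Sample).GetProperty("Prop1"));
        Func<Sample, object> getProp2 =
            AccessorBuilder.BuildAccessor<Sample>(typeof(Sample).GetProperty("Prop2"));

        Console.WriteLine(getProp1(sample)); // fooBar
        Console.WriteLine(getProp2(sample)); // 42 (boxed int)
    }
}
```

Without the Convert node, Expression.Lambda<Func<Sample, object>> rejects an int-typed body, which matches the failure described for value-type properties.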

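The results are surprising: the LINQ query beats my expression trees, even though it boxes value types and calls GetProperties for every element, whereas the expression-tree property accessors are generated once, before the values are collected from the collection of dummy types.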

|   Method |         Mean |      Error |     StdDev |  Ratio | RatioSD |
|--------- |-------------:|-----------:|-----------:|-------:|--------:|
|     Linq |     73.09 ns |   0.878 ns |   0.778 ns |   1.00 |    0.00 |
| ExprTree | 16,293.69 ns | 184.834 ns | 172.894 ns | 222.83 |    3.96 |

The benchmark was produced with BenchmarkDotNet.

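Why is the expression tree approach significantly slower?
Is it fair to assume that the expression tree solution should be faster?
Bonus: what is the performance impact of using the MakeGenericType solution in this context?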

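Edit: I have refactored the code a bit

It no longer works with generic types, only with DummyType
The LINQ solution no longer filters for public properties, just to deny it any performance advantage
The expression tree solution obtains the "accessors"/"function pointers"(?) only once and then uses them to extract values from the DummyType instances, whereas the LINQ query retrieves the properties for every element in the dummies collection
Only string properties are extracted, to avoid having to handle the value-type properties of DummyType
It no longer creates a collection of dictionaries, but a one-dimensional collection of key-value pairs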

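    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Linq.Expressions;
    using System.Reflection;
    using BenchmarkDotNet.Attributes;

    public class MyBenchMarks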
    {
        IEnumerable<DummyType> Dummies = DummyType.GenerateDummySet();
        IEnumerable<string> Properties = new string[] { "Prop1" };

        [Benchmark(Description = "Linq", Baseline = true)]
        public Object LinqSolution() => new Mappers().LinqSolution(Properties, Dummies);


        [Benchmark(Description = "ExprTree", Baseline = false)]
        public void ExprTreeSolution() => new Mappers().ExprTreeSolution(Properties, Dummies);
    }

    public class Mappers
    {
        List<Tuple<string, Func<DummyType, object>>> GetAccessors(IEnumerable<string> fields)
        {
            List<PropertyInfo> props = new List<PropertyInfo>(fields.Select(x => typeof(DummyType).GetProperty(x)).Where(x => x != null));
            var accessors = new List<Tuple<string, Func<DummyType, object>>>();

            foreach (var prop in props)
            {
                var instance = Expression.Parameter(typeof(DummyType));
                var call = Expression.Property(instance, prop);
                var expr = Expression.Lambda<Func<DummyType, object>>(call, instance).Compile();

                accessors.Add(Tuple.Create(prop.Name, expr));
            }

            return accessors;
        }

        public IEnumerable<KeyValuePair<string, object>> ExprTreeSolution(IEnumerable<string> fields, IEnumerable<DummyType> dummies)
        {
            List<KeyValuePair<string, object>> result = new List<KeyValuePair<string, object>>();
            var accessors = GetAccessors(fields);

            foreach (var dummy in dummies)
            {
                foreach (var accessor in accessors)
                {
                    var propResult = accessor.Item2(dummy);
                    result.Add(KeyValuePair.Create(accessor.Item1, propResult));
                }
            }

            return result;
        }

        public IEnumerable<KeyValuePair<string, object>> LinqSolution<T>(IEnumerable<String> fields, IEnumerable<T> dummies)
        {
            var result = dummies.Select(y =>
                  y.GetType().GetProperties()
                  .Where(x => fields.Contains(x.Name, StringComparer.OrdinalIgnoreCase))
                  .Select(x => KeyValuePair.Create(x.Name, x.GetValue(y))).ToList())
                  .SelectMany(x => x);

            return result;
        }
    }

    public class DummyType
    {
        public bool Prop0 { get; set; }
        public string Prop1 { get; set; }
        public int Prop2 { get; set; }

        public static List<DummyType> GenerateDummySet()
        {
            return Enumerable.Range(0, 100).Select(x =>
                 new DummyType
                 {
                     Prop0 = true,
                     Prop1 = "fooBar",
                     Prop2 = x
                 }).ToList();
        }
    }

Corresponding results:

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19041.630 (2004/?/20H1)
Intel Core i5-8600K CPU 3.60GHz (Coffee Lake), 1 CPU, 6 logical and 6 physical cores
.NET Core SDK=5.0.100
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT [AttachedDebugger]
  DefaultJob : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

|   Method |         Mean |      Error |     StdDev |    Ratio | RatioSD |
|--------- |-------------:|-----------:|-----------:|---------:|--------:|
| Linq     | 66.14 ns     | 0.162 ns   | 0.143 ns   | 1.00     | 0.00    |
| ExprTree | 70,366.57 ns | 500.248 ns | 443.457 ns | 1,063.84 | 7.51    |

This code can be run from a console application using:

BenchmarkRunner.Run(typeof(Program).Assembly);

Answer 1

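Another guess, adding to my comments: I just noticed that version A and version B both seem to return the final values read from the properties of the dummies.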

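If you measured the time of version A and version B exactly as shown here, note that you are measuring not only the time needed to access the data one way or the other, but also the time needed to set everything up.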

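For example, in version A you may be saving some time by fetching all the properties in one go with y.GetType().GetProperties(), whereas in version B you are doing something that may be quite wasteful: walking through some list of "Properties" and looking up each property separately by name: var prop = typeof(DummyType).GetProperty(propName);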

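Furthermore, if you measured it as shown, version B includes generating a dynamic assembly (or several) at runtime at the point where you call .Compile() on the expression. That can take a lot of time and may add a lot to the measured numbers.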
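To make the distinction between setup time and access time concrete, here is a rough sketch that times the two phases separately with Stopwatch. It is only illustrative: `Widget` and `SetupVersusAccess` are hypothetical names, and a Stopwatch in a console program is far cruder than a proper benchmark harness, so the printed numbers should be read as orders of magnitude at best.

```csharp
using System;
using System.Diagnostics;
using System.Linq.Expressions;

// Hypothetical type used only for this illustration.
public class Widget
{
    public string Name { get; set; }
}

public static class SetupVersusAccess
{
    // Setup phase: build and compile the accessor (the expensive part).
    public static Func<Widget, object> Compile()
    {
        ParameterExpression p = Expression.Parameter(typeof(Widget), "w");
        Expression body = Expression.Convert(
            Expression.Property(p, nameof(Widget.Name)), typeof(object));
        return Expression.Lambda<Func<Widget, object>>(body, p).Compile();
    }

    public static void Main()
    {
        var widget = new Widget { Name = "fooBar" };

        // Time the one-off compilation on its own.
        Stopwatch setup = Stopwatch.StartNew();
        Func<Widget, object> getter = Compile();
        setup.Stop();

        // Time 100 invocations of the already compiled delegate.
        Stopwatch access = Stopwatch.StartNew();
        object last = null;
        for (int i = 0; i < 100; i++)
        {
            last = getter(widget);
        }
        access.Stop();

        Console.WriteLine($"compile:   {setup.Elapsed.TotalMilliseconds} ms");
        Console.WriteLine($"100 calls: {access.Elapsed.TotalMilliseconds} ms");
        Console.WriteLine(last);
    }
}
```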

So... I think you need to seriously revise your question. A lot of important information is missing, and we can only guess.

Answer 2

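The main problem is that LinqSolution returns a deferred LINQ IEnumerable<>. It is not actually doing the work (the reflection). Try changing return result to return result.ToList(). That will at least help ensure you are comparing apples to apples.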
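The effect of deferred execution can be seen in isolation with a small sketch. The names `DeferredDemo`, `BuildQuery`, and `Calls` are made up for this illustration; the counter simply records how often the projection passed to Select actually runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many times the projection actually runs.
    public static int Calls;

    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        // Nothing is executed here; Select only records what to do later.
        return source.Select(x => { Calls++; return x * 2; });
    }

    public static void Main()
    {
        IEnumerable<int> query = BuildQuery(Enumerable.Range(0, 100));
        Console.WriteLine(Calls); // 0: the query was built but never enumerated

        List<int> materialized = query.ToList();
        Console.WriteLine(Calls); // 100: ToList forced the projection to run
        Console.WriteLine(materialized.Count); // 100
    }
}
```

A benchmark method that only returns the unenumerated query is in the same position as the first WriteLine: it measures building the query, not running it.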

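Beyond that, keep in mind that compiling an expression is quite expensive. Unless you reuse the compiled function many times, you probably won't see a large performance gain. To see this in action, try generating 10000 items in GenerateDummySet instead of just 100.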

To take advantage of this in real code, try memoizing the compiled functions (for example, using a static Lazy<> initialization).
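A minimal sketch of that memoization idea, assuming a single known property, could look like the following. `Item`, `CachedAccessors`, and the `CompileCount` counter are hypothetical names added only to show that compilation happens once; they are not part of the code in the question.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical type used only for this illustration.
public class Item
{
    public string Prop1 { get; set; }
}

public static class CachedAccessors
{
    // Counts how often the expression is compiled (expected: once).
    public static int CompileCount;

    // The factory runs on first access to .Value and its result is reused afterwards.
    private static readonly Lazy<Func<Item, object>> Prop1Getter =
        new Lazy<Func<Item, object>>(() =>
        {
            CompileCount++;
            ParameterExpression instance = Expression.Parameter(typeof(Item), "instance");
            Expression body = Expression.Convert(
                Expression.Property(instance, nameof(Item.Prop1)), typeof(object));
            return Expression.Lambda<Func<Item, object>>(body, instance).Compile();
        });

    public static object GetProp1(Item item) => Prop1Getter.Value(item);
}

public static class Program
{
    public static void Main()
    {
        var item = new Item { Prop1 = "fooBar" };

        // Many calls, but the expensive Compile() happens only on the first one.
        for (int i = 0; i < 1000; i++)
        {
            CachedAccessors.GetProp1(item);
        }

        Console.WriteLine(CachedAccessors.GetProp1(item)); // fooBar
        Console.WriteLine(CachedAccessors.CompileCount);   // 1
    }
}
```

For accessors keyed by property name rather than one fixed property, a dictionary-based cache would be the analogous design; the Lazy<> form is shown because it is the one the answer mentions.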

c#

reflection

expression-trees