Why is one delegate faster than the other?

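I am trying to understand the reason for the performance difference between two delegates. It came up while I was trying to answer a question. @Enigmativity suggested an alternative way of doing the type cast, which resulted in a delegate that is faster to invoke. Here is a minimal version of that code: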

delegate void MyAction<T>(T val);
static int Counter;

// My suggestion
static MyAction<T> GetAction1<T>()
    => new MyAction<T>((Action<T>)(object)ActionInt);

// Enigmativity's suggestion
static MyAction<T> GetAction2<T>()
    => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;

static void ActionInt(int val) { Counter++; }

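There is a custom generic delegate type MyAction<T>, which has the same signature as the built-in Action<T>. We want to instantiate this delegate from a generic <T> method, and inside that method we want to cast the type-specific ActionInt method to it. You can see my approach and Enigmativity's approach. In both cases the type cast appears to happen while the MyAction<T> delegate is being instantiated, so invoking the resulting delegate should incur no casting overhead. At least that is my theory. However, when I measure the performance of the resulting delegates, Enigmativity's delegate is consistently about 20% faster than mine: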

static void Test(string title, MyAction<int> action)
{
    Counter = 0;
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < 100_000_000; i++) action(i);
    stopwatch.Stop();
    Console.WriteLine($"{title}, Counter: {Counter:#,0}, Duration: {stopwatch.ElapsedMilliseconds:#,0} msec");
}

Test("GetAction1", GetAction1<int>());
Test("GetAction2", GetAction2<int>());
Test("GetAction1", GetAction1<int>());
Test("GetAction2", GetAction2<int>());

Output:

GetAction1, Counter: 100,000,000, Duration: 444 msec
GetAction2, Counter: 100,000,000, Duration: 374 msec
GetAction1, Counter: 100,000,000, Duration: 447 msec
GetAction2, Counter: 100,000,000, Duration: 371 msec

Try it on Fiddle.

Can anyone explain why this happens?

Using a decompiler, we can see the following:

Implementation of GetAction1<T>()

IL_0000: ldnull
IL_0001: ldftn        void ConsoleApp1.UnderTest::ActionInt(int32)
IL_0007: newobj       instance void class [System.Runtime]System.Action`1<int32>::.ctor(object, native int)
IL_000c: castclass    class [System.Runtime]System.Action`1<!!0/*T*/>
IL_0011: ldftn        instance void class [System.Runtime]System.Action`1<!!0/*T*/>::Invoke(!0/*T*/)
IL_0017: newobj       instance void class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>::.ctor(object, native int)
IL_001c: ret

Implementation of GetAction2<T>()

IL_0000: ldnull
IL_0001: ldftn        void ConsoleApp1.UnderTest::ActionInt(int32)
IL_0007: newobj       instance void class ConsoleApp1.UnderTest/MyAction`1<int32>::.ctor(object, native int)
IL_000c: castclass    class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>
IL_0011: ret

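You can see that in the first case it actually creates two delegates and chains one to the other: an Action<T> pointing at ActionInt, and a MyAction<T> pointing at that Action<T>'s Invoke method.

In the second case it creates only one delegate.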

I can't explain exactly why, but I think it is because of the extra cast to object in GetAction1.
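The chaining visible in the IL can also be observed at run time through the Target and Method properties of System.Delegate. The following is only a minimal sketch: the class name DelegateShape and the helper Describe are made up for illustration, and GetAction1 spells out the intermediate (Action<int>) cast so that it compiles without relying on method-group natural types.

```csharp
using System;

public static class DelegateShape
{
    public delegate void MyAction<T>(T val);
    static int Counter;
    static void ActionInt(int val) { Counter++; }

    // Same shape as GetAction1 in the question; the explicit (Action<int>)
    // cast is an assumption added here so older compilers accept it.
    public static MyAction<T> GetAction1<T>()
        => new MyAction<T>((Action<T>)(object)(Action<int>)ActionInt);

    public static MyAction<T> GetAction2<T>()
        => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;

    // Hypothetical helper: reports which method a delegate points at
    // and what kind of object (if any) it is bound to.
    public static string Describe(Delegate d)
    {
        string target = d.Target == null ? "null" : d.Target.GetType().Name;
        return "Method=" + d.Method.Name + ", Target=" + target;
    }

    public static void Main()
    {
        // Wrapper delegate: it points at Action<int>.Invoke, bound to the inner delegate.
        Console.WriteLine(Describe(GetAction1<int>())); // Method=Invoke, Target=Action`1
        // Single delegate: it points straight at the static method.
        Console.WriteLine(Describe(GetAction2<int>())); // Method=ActionInt, Target=null
    }
}
```

Each call through the GetAction1 delegate therefore goes through two delegate invocations, while the GetAction2 delegate needs only one, which is consistent with the timings in the question.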

There seems to be a faster implementation, namely:

public static MyAction<T> GetAction3<T>()
    => x => ActionInt((int)(object)x);

This generates longer IL code:

IL_0000: ldsfld       class ConsoleApp1.UnderTest/MyAction`1<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9__4_0'
IL_0005: dup
IL_0006: brtrue.s     IL_0021
IL_0008: pop
IL_0009: ldsfld       class ConsoleApp1.UnderTest/'<>c__4`1'<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9'
IL_000e: ldftn        instance void class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<GetAction3>b__4_0'(!0/*T*/)
IL_0014: newobj       instance void class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>::.ctor(object, native int)
IL_0019: dup
IL_001a: stsfld       class ConsoleApp1.UnderTest/MyAction`1<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9__4_0'
IL_001f: stloc.0      // V_0
IL_0020: ldloc.0      // V_0
IL_0021: ret

However, both calling GetAction3() and invoking the action it returns are faster.

Here is the benchmark program I tested with:
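One likely reason the call to GetAction3() itself is cheap is visible in the IL above: the delegate is loaded from a static field ('<>9__4_0') and only created when that field is still null. A small sketch to check this, assuming the compiler caches the non-capturing lambda as that IL shows (the class name CachingCheck is made up for illustration):

```csharp
using System;

public static class CachingCheck
{
    public delegate void MyAction<T>(T val);
    static int Counter;
    static void ActionInt(int val) { Counter++; }

    // Allocates a new delegate object on every call (newobj in the IL).
    public static MyAction<T> GetAction2<T>()
        => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;

    // Non-capturing lambda: the IL in the answer stores the delegate
    // in a static field and reuses it on later calls.
    public static MyAction<T> GetAction3<T>()
        => x => ActionInt((int)(object)x);

    public static void Main()
    {
        // Compares object identity of the delegates returned by two calls.
        Console.WriteLine(ReferenceEquals(GetAction2<int>(), GetAction2<int>()));
        Console.WriteLine(ReferenceEquals(GetAction3<int>(), GetAction3<int>()));
    }
}
```

If the caching behaves as the IL suggests, the first comparison should report different instances and the second the same instance, so the Action3 benchmark would mostly be measuring a static field load rather than an allocation. The caching is a compiler implementation detail, not something the language guarantees.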

using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

namespace ConsoleApp1;

public static class Program
{
    public static void Main()
    {
        var summary = BenchmarkRunner.Run<UnderTest>();
    }
}

public class UnderTest
{
    public delegate void MyAction<T>(T val);
    public static int counter;

    public static MyAction<T> GetAction1<T>()
        => new MyAction<T>((Action<T>)(object)ActionInt);

    // Enigmativity's suggestion
    public static MyAction<T> GetAction2<T>()
        => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;

    public static MyAction<T> GetAction3<T>()
        => x => ActionInt((int)(object)x);

    public static MyAction<int> Act1 = GetAction1<int>();
    public static MyAction<int> Act2 = GetAction2<int>();
    public static MyAction<int> Act3 = GetAction3<int>();

    static void ActionInt(int val) { counter++; }

    [Benchmark]
    public void Action1()
    {
        _ = GetAction1<int>();
    }

    [Benchmark]
    public void Action2()
    {
        _ = GetAction2<int>();
    }

    [Benchmark]
    public void Action3()
    {
        _ = GetAction3<int>();
    }

    [Benchmark]
    public void RunAction1()
    {
        Act1(0);
    }

    [Benchmark]
    public void RunAction2()
    {
        Act2(0);
    }

    [Benchmark]
    public void RunAction3()
    {
        Act3(0);
    }
}

Results:

|     Method |       Mean |     Error |    StdDev |
|----------- |-----------:|----------:|----------:|
|    Action1 | 13.3355 ns | 0.1670 ns | 0.1480 ns |
|    Action2 |  6.9685 ns | 0.1313 ns | 0.1228 ns |
|    Action3 |  1.3437 ns | 0.0321 ns | 0.0285 ns |
| RunAction1 |  2.4100 ns | 0.0454 ns | 0.0425 ns |
| RunAction2 |  1.6493 ns | 0.0594 ns | 0.0527 ns |
| RunAction3 |  0.8347 ns | 0.0295 ns | 0.0276 ns |

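Of course, none of the actions actually uses the int passed to it, since they all just call ActionInt(), which ignores its argument.

I suppose you could also implement it as: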

public static MyAction<T> GetAction3<T>()
    => _ => ActionInt(0);

This might be even faster, but I haven't tried it.