Why is one delegate faster than the other?
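I am trying to understand the reason for a performance difference between two delegates. It came up while I was attempting to answer a question: @Enigmativity suggested an alternative approach to a type cast, which resulted in a delegate with faster invocation. Here is a minimal version of the code: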
delegate void MyAction<T>(T val);
static int Counter;
// My suggestion
static MyAction<T> GetAction1<T>()
=> new MyAction<T>((Action<T>)(object)ActionInt);
// Enigmativity's suggestion
static MyAction<T> GetAction2<T>()
=> (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;
static void ActionInt(int val) { Counter++; }
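There is a custom generic delegate type MyAction<T>, which has the same signature as the built-in Action<T>. We want to instantiate this delegate from a generic <T> method, and internally cast it to the type-specific ActionInt method. You can see my approach and Enigmativity's approach. In both cases the type cast appears to happen while the MyAction<T> delegate is being instantiated, so invoking the resulting delegate should incur no casting overhead. At least that is my theory. However, when I measure the performance of the resulting delegates, Enigmativity's delegate is consistently about 20% faster than mine: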
static void Test(string title, MyAction<int> action)
{
Counter = 0;
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < 100_000_000; i++) action(i);
stopwatch.Stop();
Console.WriteLine($"{title}, Counter: {Counter:#,0}, Duration: {stopwatch.ElapsedMilliseconds:#,0} msec");
}
Test("GetAction1", GetAction1<int>());
Test("GetAction2", GetAction2<int>());
Test("GetAction1", GetAction1<int>());
Test("GetAction2", GetAction2<int>());
Output:
GetAction1, Counter: 100,000,000, Duration: 444 msec
GetAction2, Counter: 100,000,000, Duration: 374 msec
GetAction1, Counter: 100,000,000, Duration: 447 msec
GetAction2, Counter: 100,000,000, Duration: 371 msec
Can anyone explain why this happens?
Using a decompiler, we can see the following.
Implementation of GetAction1<T>():
IL_0000: ldnull
IL_0001: ldftn void ConsoleApp1.UnderTest::ActionInt(int32)
IL_0007: newobj instance void class [System.Runtime]System.Action`1<int32>::.ctor(object, native int)
IL_000c: castclass class [System.Runtime]System.Action`1<!!0/*T*/>
IL_0011: ldftn instance void class [System.Runtime]System.Action`1<!!0/*T*/>::Invoke(!0/*T*/)
IL_0017: newobj instance void class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>::.ctor(object, native int)
IL_001c: ret
Implementation of GetAction2<T>():
IL_0000: ldnull
IL_0001: ldftn void ConsoleApp1.UnderTest::ActionInt(int32)
IL_0007: newobj instance void class ConsoleApp1.UnderTest/MyAction`1<int32>::.ctor(object, native int)
IL_000c: castclass class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>
IL_0011: ret
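You can see that the first case actually creates two delegates and chains one to the other: an Action<T> that points at ActionInt, and a MyAction<T> that points at the Invoke method of that Action<T>.
The second case creates only one delegate.
I can't explain exactly why this is, but I would think it is because of the additional cast to object in GetAction1.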
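One way to check the "two delegates" reading of the IL at run time is to inspect Delegate.Target and Delegate.Method on each result. The following is only a minimal sketch: the class name DelegateShapeDemo and the Describe helper are made up for illustration, and it assumes a compiler recent enough (C# 10 or later) to accept the (object)ActionInt cast used in the question.

```csharp
using System;

public static class DelegateShapeDemo
{
    public delegate void MyAction<T>(T val);

    static int Counter;

    static void ActionInt(int val) { Counter++; }

    // Same bodies as in the question.
    public static MyAction<T> GetAction1<T>()
        => new MyAction<T>((Action<T>)(object)ActionInt);

    public static MyAction<T> GetAction2<T>()
        => (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;

    // Hypothetical helper: reports what a delegate is bound to.
    public static string Describe(Delegate d)
    {
        string target = d.Target?.GetType().Name ?? "null";
        return $"Target: {target}, Method: {d.Method.DeclaringType?.Name}.{d.Method.Name}";
    }

    public static void Main()
    {
        // Expected to wrap another delegate and call its Invoke method.
        Console.WriteLine(Describe(GetAction1<int>()));
        // Expected to point directly at the static ActionInt method.
        Console.WriteLine(Describe(GetAction2<int>()));
    }
}
```

If the IL reading is right, the first delegate's target is itself a delegate and its method is Invoke, while the second has a null target and points straight at ActionInt. Every call through the first delegate would then pass through one extra delegate invocation, which fits the measured difference.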
There also seems to be an even faster implementation, namely:
public static MyAction<T> GetAction3<T>()
=> x => ActionInt((int)(object)x);
This generates much longer IL code:
IL_0000: ldsfld class ConsoleApp1.UnderTest/MyAction`1<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9__4_0'
IL_0005: dup
IL_0006: brtrue.s IL_0021
IL_0008: pop
IL_0009: ldsfld class ConsoleApp1.UnderTest/'<>c__4`1'<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9'
IL_000e: ldftn instance void class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<GetAction3>b__4_0'(!0/*T*/)
IL_0014: newobj instance void class ConsoleApp1.UnderTest/MyAction`1<!!0/*T*/>::.ctor(object, native int)
IL_0019: dup
IL_001a: stsfld class ConsoleApp1.UnderTest/MyAction`1<!0/*T*/> class ConsoleApp1.UnderTest/'<>c__4`1'<!!0/*T*/>::'<>9__4_0'
IL_001f: stloc.0 // V_0
IL_0020: ldloc.0 // V_0
IL_0021: ret
Nevertheless, both the call to GetAction3() and the execution of the action it returns are faster.
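The IL for GetAction3 first loads a static field and only constructs the delegate when that field is still null, which suggests the compiler caches the non-capturing lambda. That would explain why calling GetAction3() is so cheap. A minimal sketch to observe it follows; the class name LambdaCacheDemo is made up for illustration, and the caching is a compiler implementation detail rather than a language guarantee.

```csharp
using System;

public static class LambdaCacheDemo
{
    public delegate void MyAction<T>(T val);

    public static int Counter;

    static void ActionInt(int val) { Counter++; }

    // Same body as GetAction3 in the answer: a lambda that captures nothing.
    public static MyAction<T> GetAction3<T>()
        => x => ActionInt((int)(object)x);

    public static void Main()
    {
        var first = GetAction3<int>();
        var second = GetAction3<int>();

        // With the caching seen in the IL, both calls return the same instance,
        // so no allocation happens after the first call for a given T.
        Console.WriteLine(ReferenceEquals(first, second));

        first(1);
        second(2);
        Console.WriteLine(Counter);
    }
}
```

Note that the cast (int)(object)x only succeeds when T is int; for any other T the returned delegate would throw an InvalidCastException when invoked.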
Here is the benchmark program I tested with:
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
namespace ConsoleApp1;
public static class Program
{
public static void Main()
{
var summary = BenchmarkRunner.Run<UnderTest>();
}
}
public class UnderTest
{
public delegate void MyAction<T>(T val);
public static int counter;
public static MyAction<T> GetAction1<T>()
=> new MyAction<T>((Action<T>)(object)ActionInt);
// Enigmativity's suggestion
public static MyAction<T> GetAction2<T>()
=> (MyAction<T>)(Delegate)(MyAction<int>)ActionInt;
public static MyAction<T> GetAction3<T>()
=> x => ActionInt((int)(object)x);
public static MyAction<int> Act1 = GetAction1<int>();
public static MyAction<int> Act2 = GetAction2<int>();
public static MyAction<int> Act3 = GetAction3<int>();
static void ActionInt(int val) { counter++; }
[Benchmark]
public void Action1()
{
_ = GetAction1<int>();
}
[Benchmark]
public void Action2()
{
_ = GetAction2<int>();
}
[Benchmark]
public void Action3()
{
_ = GetAction3<int>();
}
[Benchmark]
public void RunAction1()
{
Act1(0);
}
[Benchmark]
public void RunAction2()
{
Act2(0);
}
[Benchmark]
public void RunAction3()
{
Act3(0);
}
}
Results:
| Method | Mean | Error | StdDev |
|----------- |-----------:|----------:|----------:|
| Action1 | 13.3355 ns | 0.1670 ns | 0.1480 ns |
| Action2 | 6.9685 ns | 0.1313 ns | 0.1228 ns |
| Action3 | 1.3437 ns | 0.0321 ns | 0.0285 ns |
| RunAction1 | 2.4100 ns | 0.0454 ns | 0.0425 ns |
| RunAction2 | 1.6493 ns | 0.0594 ns | 0.0527 ns |
| RunAction3 | 0.8347 ns | 0.0295 ns | 0.0276 ns |
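Of course, none of the actions actually uses the int passed to it, since they all just call ActionInt(), which ignores its argument.
I guess you could also implement it as: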
public static MyAction<T> GetAction3<T>()
=> _ => ActionInt(0);
This might be even faster, but I haven't tried it.