C# marker structures performance

Question

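In F# we have a couple of very nice solutions for design-time type safety: type aliases and single-case struct unions (and no implicit conversions to begin with!):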

// type aliases are erased at compile time
type Offset = int64<offset>

// no allocations
[<Struct>]
type Offset = Offset of int64

What are the alternatives in C#?

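I have never seen marker structs (structs containing a single element) used in practice, but it looks like if we add explicit type conversions we get design-time behavior very similar to type aliases in F#. That is, the IDE complains about type mismatches and values have to be cast explicitly.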

Here is some POC code:

public struct Offset {
    private readonly long _value;
    private Offset(long value) {
        _value = value;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static explicit operator Offset(long value) {
        return new Offset(value);
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static explicit operator long(Offset offset) {
        return offset._value;
    }
}

public interface IIndex<T> {
    Offset OffsetOf(T value);
    T AtOffset(Offset offset);
}

public class SampleUsage
{
    public void Test(IIndex<long> idx)
    {
        // without explicit cast we have nice red squiggles
        var valueAt = idx.AtOffset((Offset)123);
        long offset = (long)idx.OffsetOf(42L);
    }
}

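So the IDE side of things is nice! I was going to ask about the performance implications and other drawbacks, but to avoid "just measure it" comments I measured it myself and almost stopped writing this question... except that the results turned out to be counterintuitive: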

[Test]
public void OffsetTests() {
    var array = Enumerable.Range(0, 1024).ToArray();
    var sw = new Stopwatch();

    for (int rounds = 0; rounds < 10; rounds++) {
        sw.Restart();
        long sum = 0;
        for (int rp = 0; rp < 1000000; rp++) {
            for (int i = 0; i < array.Length; i++) {
                sum += GetAtIndex(array, i);
            }
        }
        sw.Stop();
        if (sum < 0) throw new Exception(); // use sum after loop
        Console.WriteLine($"Int64: {sw.ElapsedMilliseconds}");

        sw.Restart();
        sum = 0;
        for (int rp = 0; rp < 1000000; rp++) {
            for (int i = 0; i < array.Length; i++) {
                sum += GetAtOffset(array, (Offset)i);
            }
        }
        if (sum < 0) throw new Exception(); // use sum after loop
        sw.Stop();
        Console.WriteLine($"Offset: {sw.ElapsedMilliseconds}");

        sw.Restart();
        sum = 0;
        for (int rp = 0; rp < 1000000; rp++) {
            for (int i = 0; i < array.Length; i++) {
                sum += array[i];
            }
        }
        if (sum < 0) throw new Exception(); // use sum after loop
        sw.Stop();
        Console.WriteLine($"Direct: {sw.ElapsedMilliseconds}");
    }
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private int GetAtIndex(int[] array, long index) {
    return array[index];
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private int GetAtOffset(int[] array, Offset offset) {
    return array[(long)offset];
}

Surprisingly, on an i7 @ 2.2 GHz, x64/Release, the Offset case is noticeably faster in every test round. Typical values (in milliseconds) are:

Int64: 1046
Offset: 932
Direct: 730

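I expected the same or slower results compared with just using int64. So what is going on here? Can you reproduce the same difference, or spot a flaw? Am I measuring the wrong thing?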

Answer 1

1. After replacing for (int i = 0; with for (long i = 0; in the Int64 test, its performance becomes the same as that of the direct test.

With int, x86-64 instructions like these are generated:

inc         ecx  
cmp         ecx,0F4240h

With long, x86-64 instructions like these are generated:

inc         rcx  
cmp         rcx,0F4240h

So the only difference is whether the 32-bit register ecx or its 64-bit counterpart rcx is used, and the latter is faster because of the CPU design.

2. Use long as the iterator in the Offset test as well and you will see similar performance.

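3. Because the code is optimized in Release mode, there is almost no difference between using Int64 and Offset, except that at one point the instructions are arranged slightly differently.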

With Offset (one instruction fewer):

movsxd      rdx,eax  
movsxd      r8,r14d  
cmp         rdx,r8  
jae         <address>

With Int64 (one instruction more):

movsxd      rdx,r14d  
movsxd      r8,eax  
cmp         r8,rdx  
jae         <address>  
movsxd      rdx,eax

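4. The direct test is the fastest because it does not execute the array bounds-check instructions shown in #3 above. This optimization happens when you write a loop like this: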

for (var i=0; i<array.Length; i++) { ... array[i] ... }

Normally, an index outside the array bounds throws an IndexOutOfRangeException, but in this case the JIT compiler knows that cannot happen, so it omits the check.

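Finally, even though the other tests execute extra instructions, they perform similarly thanks to the CPU's branch predictor, which starts executing instructions ahead of time when needed and discards the results if the condition turns out to be false.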
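The changes suggested in points 1 and 2 can be sketched roughly as follows. This is only an illustration, not the asker's exact test: the class name LongCounterBenchmark, the Measure helper, the repeats parameter and the reduced number of rounds are all assumptions made here, and no particular timings are implied. The Int64 and Offset variants use a long loop counter, while the direct variant keeps the for (int i = 0; i < array.Length; i++) shape described in point 4.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;

// Same marker struct as in the question.
public struct Offset
{
    private readonly long _value;
    private Offset(long value) { _value = value; }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static explicit operator Offset(long value) => new Offset(value);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static explicit operator long(Offset offset) => offset._value;
}

// Hypothetical benchmark harness; names are illustrative only.
public static class LongCounterBenchmark
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static int GetAtIndex(int[] array, long index) => array[index];

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    private static int GetAtOffset(int[] array, Offset offset) => array[(long)offset];

    public static long SumInt64(int[] array, int repeats)
    {
        long sum = 0;
        for (int rp = 0; rp < repeats; rp++)
            for (long i = 0; i < array.Length; i++)   // long counter (point 1)
                sum += GetAtIndex(array, i);
        return sum;
    }

    public static long SumOffset(int[] array, int repeats)
    {
        long sum = 0;
        for (int rp = 0; rp < repeats; rp++)
            for (long i = 0; i < array.Length; i++)   // long counter (point 2)
                sum += GetAtOffset(array, (Offset)i);
        return sum;
    }

    public static long SumDirect(int[] array, int repeats)
    {
        long sum = 0;
        for (int rp = 0; rp < repeats; rp++)
            for (int i = 0; i < array.Length; i++)    // loop shape from point 4
                sum += array[i];
        return sum;
    }

    // Times one variant, prints the elapsed milliseconds, returns the checksum.
    private static long Measure(string label, Func<int[], int, long> variant,
                                int[] array, int repeats)
    {
        var sw = Stopwatch.StartNew();
        long sum = variant(array, repeats);
        sw.Stop();
        Console.WriteLine($"{label}: {sw.ElapsedMilliseconds}");
        return sum;
    }

    public static void Main()
    {
        int[] array = Enumerable.Range(0, 1024).ToArray();
        const int Repeats = 1000000;

        for (int round = 0; round < 3; round++)
        {
            long a = Measure("Int64", SumInt64, array, Repeats);
            long b = Measure("Offset", SumOffset, array, Repeats);
            long c = Measure("Direct", SumDirect, array, Repeats);

            // All three variants must compute the same checksum.
            if (a != b || b != c) throw new Exception("checksum mismatch");
        }
    }
}
```

Each variant lives in its own method and returns its sum, so the result is always used and the three loops cannot influence each other's local variables. For serious measurements a dedicated harness such as BenchmarkDotNet is usually a better choice than a hand-rolled Stopwatch loop.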
