Reflection Renders HashCode Unstable

Question

In the following code, accessing the custom attributes of SomeClass makes the hash function of SomeAttribute unstable. What is going on?

static void Main(string[] args)
{
    typeof(SomeClass).GetCustomAttributes(false);//without this line, GetHashCode behaves as expected

    SomeAttribute tt = new SomeAttribute();
    Console.WriteLine(tt.GetHashCode());//Prints 1234567
    Console.WriteLine(tt.GetHashCode());//Prints 0
    Console.WriteLine(tt.GetHashCode());//Prints 0
}


[SomeAttribute(field2 = 1)]
class SomeClass
{
}

class SomeAttribute : System.Attribute
{
    uint field1=1234567;
    public uint field2;            
}

Update:

This has been reported to MS as a bug: https://connect.microsoft.com/VisualStudio/feedback/details/3130763/attibute-gethashcode-unstable-if-reflection-has-been-used

Update 2:

This issue has now been fixed in dotnetcore: https://github.com/dotnet/coreclr/pull/13892

Answer 1

This one is really tricky. First, let's look at the source code of the Attribute.GetHashCode method:

public override int GetHashCode()
{
    Type type = GetType();

    FieldInfo[] fields = type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
    Object vThis = null;

    for (int i = 0; i < fields.Length; i++)
    {
        // Visibility check and consistency check are not necessary.
        Object fieldValue = ((RtFieldInfo)fields[i]).UnsafeGetValue(this);

        // The hashcode of an array ignores the contents of the array, so it can produce 
        // different hashcodes for arrays with the same contents.
        // Since we do deep comparisons of arrays in Equals(), this means Equals and GetHashCode will
        // be inconsistent for arrays. Therefore, we ignore hashes of arrays.
        if (fieldValue != null && !fieldValue.GetType().IsArray)
            vThis = fieldValue;

        if (vThis != null)
            break;
    }

    if (vThis != null)
        return vThis.GetHashCode();

    return type.GetHashCode();
}

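In short, here is what it does:

Enumerates the fields of your attribute
Finds the first field that is not an array and does not have a null value
Returns the hash code of that field

At this point, we can draw two conclusions:

Only one field is taken into account when computing the hash code of an attribute
The algorithm relies heavily on the order of the fields returned by Type.GetFields (since we take the first field that matches the criteria)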
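
To make the order dependence concrete, here is a minimal sketch of the same "first usable field wins" rule. It is not the framework code: FirstFieldHashSketch and its fallback parameter are hypothetical names made up for illustration, and the field values are passed in as a plain array instead of being read through reflection.

```csharp
using System;

static class FirstFieldHashSketch
{
    // Hypothetical helper: mimics the loop in Attribute.GetHashCode by
    // returning the hash code of the first value that is neither null
    // nor an array. 'fallback' stands in for type.GetHashCode().
    public static int Compute(object[] fieldValuesInOrder, int fallback)
    {
        foreach (object value in fieldValuesInOrder)
        {
            if (value != null && !value.GetType().IsArray)
                return value.GetHashCode();
        }
        return fallback;
    }
}
```

With the values from the question, Compute(new object[] { 1234567u, 0u }, -1) yields 1234567, while Compute(new object[] { 0u, 1234567u }, -1) yields 0. The same two values in a different order produce a different hash code, which is exactly the pattern seen in the question.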

Testing further, we can see that the order of the fields returned by Type.GetFields changes between the two versions of the code:

typeof(SomeClass).GetCustomAttributes(false);//without this line, GetHashCode behaves as expected
SomeAttribute tt = new SomeAttribute();
Console.WriteLine(tt.GetHashCode());//Prints 1234567
Console.WriteLine(tt.GetHashCode());//Prints 0
Console.WriteLine(tt.GetHashCode());//Prints 0

foreach (var field in new SomeAttribute().GetType().GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
{
    Console.WriteLine(field.Name);
}

If the first line is not commented out, the code displays:

field2

field1

If that line is commented out, the code displays:

field1

field2

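This confirms that something is changing the order of the fields, which in turn produces different results for the GetHashCode function.

Even more interesting: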

typeof(SomeClass).GetCustomAttributes(false);//without this line, GetHashCode behaves as expected
SomeAttribute tt = new SomeAttribute();
foreach (var field in new SomeAttribute().GetType().GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
{
    Console.WriteLine(field.Name);
}

Console.WriteLine(tt.GetHashCode());//Prints 0
Console.WriteLine(tt.GetHashCode());//Prints 0
Console.WriteLine(tt.GetHashCode());//Prints 0

foreach (var field in new SomeAttribute().GetType().GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic))
{
    Console.WriteLine(field.Name);
}

This code displays:

field1

field2

0

0

0

field2

field1

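The only remaining question is: why does the order of the fields change after the first call to GetFields? I believe it is related to an internal cache in the Type instance.

We can inspect the value of the cache by running this in the quickwatch window: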

System.Runtime.InteropServices.GCHandle.InternalGet(((System.RuntimeType)typeof(SomeAttribute)).m_cache) as RuntimeType.RuntimeTypeCache

At the very beginning of execution, the cache is empty (obviously). Then, we execute:

typeof(SomeClass).GetCustomAttributes(false)

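After this line, if we inspect the cache, it contains a single field: field2. Now this is interesting. Why this field? Because you used it in the attribute on SomeClass: [SomeAttribute(field2 = 1)]

Then we execute the first GetHashCode and inspect the cache again; it now contains field2, then field1 (remember that the order matters). Because of this field order, subsequent executions of GetHashCode will return 0.

Now, if we remove the line typeof(SomeClass).GetCustomAttributes(false) and inspect the cache after the first GetHashCode, we find field1, then field2.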

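To sum up:

The hash code algorithm for attributes uses the value of the first field it finds. It therefore relies heavily on the order of the fields returned by the Type.GetFields method. For performance purposes, this method uses a cache internally.

There are two scenarios:

The scenario where you do not use typeof(SomeClass).GetCustomAttributes(false);

Here, when GetFields is called, the cache is empty. It is populated with the fields of the attribute, in the order field1, field2. GetHashCode then finds field1 as the first field and displays 1234567.

The scenario where you use typeof(SomeClass).GetCustomAttributes(false);

When that line executes, the constructor of the attribute is executed: [SomeAttribute(field2 = 1)]. At that point, the metadata for field2 is pushed into the cache. Then you call GetHashCode, and the cache is completed. field2 is already there, so it is not added again. field1 is added next. So the order in the cache is field2, field1. GetHashCode therefore finds field2 as the first field and displays 0.

The only remaining surprise is: why does the first call to GetHashCode behave differently from the following ones? I have not checked, but I believe it detects that the cache is incomplete and reads the fields in a different way. For subsequent calls, the cache is complete and it behaves consistently.

Honestly, I think this is a bug. The result of GetHashCode should be consistent over time. The implementation of Attribute.GetHashCode therefore should not rely on the order of the fields returned by Type.GetFields, since we have seen that it can change. This should be reported to Microsoft.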
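
As an illustration of what "not relying on the order" could look like, here is a minimal sketch that sorts the fields by name before combining their hash codes. This is only one possible approach under my own assumptions, not the fix that was applied in the framework. StableAttributeHash and SampleAttribute are hypothetical names, and the 17/31 combining scheme is just a common convention. Like the original implementation, it skips null and array values.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical attribute with the same shape as SomeAttribute in the question.
class SampleAttribute : Attribute
{
    uint field1 = 1234567;
    public uint field2;
}

static class StableAttributeHash
{
    // Hypothetical helper: combines the hash codes of all instance fields,
    // visiting them in ordinal name order so that the order in which
    // GetFields returns them no longer affects the result.
    public static int Compute(Attribute attribute)
    {
        const BindingFlags flags =
            BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic;

        int hash = 17;
        var fields = attribute.GetType()
            .GetFields(flags)
            .OrderBy(f => f.Name, StringComparer.Ordinal);

        foreach (FieldInfo field in fields)
        {
            object value = field.GetValue(attribute);
            // Null and array values contribute 0, mirroring the original rule
            // that arrays are ignored because their hash ignores their contents.
            int fieldHash = (value == null || value.GetType().IsArray) ? 0 : value.GetHashCode();
            unchecked { hash = hash * 31 + fieldHash; }
        }
        return hash;
    }
}
```

Calling StableAttributeHash.Compute on the same instance before and after typeof(...).GetCustomAttributes(false) should give the same value, because the sort removes the dependence on the cache order.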

Answer 2

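Kevin gave an excellent analysis of this. I think the framework implementation should use all the fields and the attribute type to compute the hash code, and obviously it should produce the same hash code every time. In the meantime, here are 2 solutions. I am no expert at computing/combining hash codes, so I use the hash code of a tuple.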

class SomeAttribute : System.Attribute
{
    uint field1 = 1234567;
    public uint field2;

    public override int GetHashCode()
    {
        return (GetType(), field1, field2).GetHashCode();
    }
}

Another solution, if you want each instance to be unique (for use in a dictionary): use GetHashCode on an object.

class SomeAttribute : System.Attribute
{
    private object FixHashCodeBug = new Object();

    public override int GetHashCode()
    {
        return FixHashCodeBug.GetHashCode();
    }
}
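
To show how the two workarounds differ in practice, here is a small sketch that puts instances into a HashSet. The class names ValueHashedAttribute and IdentityHashedAttribute and the helper CountDistinct are hypothetical, introduced only for this comparison. It assumes that Attribute.Equals, which neither class overrides, compares instances field by field.

```csharp
using System;
using System.Collections.Generic;

// First workaround: the hash is derived from the type and the field values.
class ValueHashedAttribute : Attribute
{
    uint field1 = 1234567;
    public uint field2;

    public override int GetHashCode()
    {
        return (GetType(), field1, field2).GetHashCode();
    }
}

// Second workaround: the hash comes from a per-instance marker object.
class IdentityHashedAttribute : Attribute
{
    private readonly object marker = new object();

    public override int GetHashCode()
    {
        return marker.GetHashCode();
    }
}

static class DictionaryKeyDemo
{
    // Hypothetical helper: number of items a HashSet treats as distinct.
    public static int CountDistinct<T>(params T[] items)
    {
        return new HashSet<T>(items).Count;
    }
}
```

With two fresh ValueHashedAttribute instances, CountDistinct returns 1, since they have equal field values and equal hash codes. With two fresh IdentityHashedAttribute instances it returns 2, because their marker objects differ. Which behaviour you want depends on whether attributes should act as values or as unique instances when used as keys.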
