What is the default hash of user defined classes?
The docs incorrectly claim that
Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is their id()
Although I recall this being correct once, it is plainly untrue in current versions of Python (v2.7.10, v3.5.0) that such objects hash equal to their id.
>>> class A:
...     pass
...
>>> a = A()
>>> hash(a)
-9223372036578022804
>>> id(a)
4428048072
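Another part of the docs says that the hash is derived from the id. When and why did the implementation change, and how is the number returned by hash "derived from" the id now?
The relevant function appears to be: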
Py_hash_t
_Py_HashPointer(void *p)
{
    Py_hash_t x;
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    x = (Py_hash_t)y;
    if (x == -1)
        x = -2;
    return x;
}
(This code comes from here, and is then used as the tp_hash slot in type here.) The comment there seems to give a reason for not using the pointer (which is the same thing as the id) directly. Indeed, the commit that introduced that change to the function is here, and it states that the reason for the change is:
Issue #5186: Reduce hash collisions for objects with no hash
method by rotating the object pointer by 4 bits to the right.
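That message refers to this issue, which explains in more detail why the change was made.
This changed in 2009, as a result of issue #5186; the usual id() values caused too many collisions: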
In the issue 5169 discussion, Antoine Pitrou suggested that for an object
x without a `__hash__` method, `id()/8` might be a better hash value than
`id()`, since dicts use the low order bits of the hash as initial key, and
the 3 lowest bits of an `id()` will always be zero.
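A rough illustration of the point in that quote. This is only a sketch: the addresses are hypothetical (16-byte aligned, as if handed out by an allocator), the table has 8 slots, and only the initial slot `hash & (size - 1)` is modelled, not the full probing sequence a real dict uses. The helper name `rotate_right_4` is invented for this example.

```python
def rotate_right_4(value, bits=64):
    """Rotate an unsigned `bits`-wide integer right by 4 bits."""
    mask = (1 << bits) - 1
    return ((value >> 4) | (value << (bits - 4))) & mask


# Hypothetical object addresses, 16 bytes apart, so the low 4 bits are all zero.
addresses = [0x10000000 + 16 * i for i in range(8)]

table_mask = 8 - 1  # a table of 8 slots picks its initial slot from the low 3 bits

# Using the raw address as the hash: every object lands in the same slot.
raw_slots = {address & table_mask for address in addresses}

# Using the rotated address as the hash: the objects spread over the table.
rotated_slots = {rotate_right_4(address) & table_mask for address in addresses}

print(len(raw_slots), len(rotated_slots))
```

With these made-up addresses the raw values all share one initial slot, while the rotated values occupy all eight.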
The current implementation takes the id and rotates it to produce a more varied value:
long
_Py_HashPointer(void *p)
{
    long x;
    size_t y = (size_t)p;
    /* bottom 3 or 4 bits are likely to be 0; rotate y by 4 to avoid
       excessive hash collisions for dicts and sets */
    y = (y >> 4) | (y << (8 * SIZEOF_VOID_P - 4));
    x = (long)y;
    if (x == -1)
        x = -2;
    return x;
}
This resulted in a 14% to 34% speedup, depending on the test performed.
The glossary is simply out of date; I see you already opened an issue with the project.
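The rotation can be reproduced in pure Python and checked against the numbers in the question. This is a minimal sketch that assumes a 64-bit build (SIZEOF_VOID_P == 8); the name `rotated_pointer_hash` and the `pointer_bits` parameter are made up for illustration and are not part of CPython.

```python
def rotated_pointer_hash(address, pointer_bits=64):
    """Mimic the C function shown above for a given id() value.

    Assumes pointers are `pointer_bits` wide (64 here, an assumption).
    """
    mask = (1 << pointer_bits) - 1
    # Rotate right by 4: the low 4 bits wrap around to the top of the word.
    y = ((address >> 4) | (address << (pointer_bits - 4))) & mask
    # Reinterpret the unsigned word as a signed integer, like the C cast does.
    if y >= 1 << (pointer_bits - 1):
        y -= 1 << pointer_bits
    # -1 is reserved as an error marker for hash functions, so it maps to -2.
    return -2 if y == -1 else y


# The id() value reported in the question.
question_id = 4428048072
print(rotated_pointer_hash(question_id))
```

For the id in the question this gives -9223372036578022804, the same value that hash(a) returned there: the low four bits of the address (8) rotated into the top of the word set the sign bit, which is why the hash is a large negative number.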