How exactly do the hashlib hashers treat input?

Question

The Python 2.7 documentation has this to say about the hashlib hashers:

hash.update(arg)

    Update the hash object with the string arg. [...]

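But I have seen people feed it objects that are not strings, e.g. buffers and numpy ndarrays.

Given Python's duck typing, I'm not surprised that it is possible to pass non-string arguments.

The question is: how do I know the hasher is doing the right thing with the argument?

I can't imagine the hasher naively doing a shallow iteration over the argument, because that would probably fail miserably for ndarrays with more than one dimension: a shallow iteration over an n-dimensional ndarray yields (n-1)-dimensional ndarrays.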

Answer 1

update unpacks its argument using the s# format spec. This means the argument can be a string, a Unicode object, or any object that exposes the buffer interface.
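A minimal sketch of what "anything with the buffer interface" means in practice. It assumes Python 3, where update takes bytes-like objects and rejects text strings with a TypeError (unlike the Python 2.7 behaviour described above); the helper name digest_of and the sample payload are made up for illustration.

```python
import hashlib

payload = b"hello world"  # illustrative sample data


def digest_of(obj):
    """Hash one object with SHA-256 and return the hex digest (hypothetical helper)."""
    hasher = hashlib.sha256()
    hasher.update(obj)  # reads the object's raw bytes through the buffer interface
    return hasher.hexdigest()


# Three different types that expose the same underlying bytes.
digests = {
    digest_of(payload),
    digest_of(bytearray(payload)),
    digest_of(memoryview(payload)),
}
all_equal = len(digests) == 1
print(all_equal)
```

The hasher never iterates over the object at the Python level; it only sees the bytes the object exposes, so the three inputs hash identically.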

You can't define a buffer interface in pure Python, but C libraries like numpy can and do, which allows their objects to be passed to hash.update.

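Things like multi-dimensional arrays work fine: at the C level, they are stored as a contiguous series of bytes, and that byte series is what gets hashed.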
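To illustrate the "contiguous series of bytes" point without requiring numpy, here is a rough sketch that uses the standard library's array.array as a stand-in for an ndarray (an assumption made for portability; it is not what the answer itself used). It assumes Python 3 and checks that hashing the array object gives the same digest as hashing its native in-memory byte representation.

```python
import hashlib
import struct
from array import array

# A typed array of C ints; like an ndarray, it exposes its storage as a buffer.
values = array("i", [1, 2, 3, 4, 5, 6])

# The same six ints packed by hand in native byte order and size.
packed = struct.pack("6i", *values)

# Hash the array object directly, then hash the explicit byte string.
digest_from_buffer = hashlib.sha256(values).hexdigest()
digest_from_bytes = hashlib.sha256(packed).hexdigest()

print(digest_from_buffer == digest_from_bytes)
```

Because only the raw bytes are hashed, the digest depends on the element type and byte order of the data and carries no information about its shape.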
