Is "add" operation in set() or "insert" in dict() in Python actually O(n) where n is the length of the key string?

Question

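There are contradictory claims about whether the insert operation on dict() and the add operation on set() are O(n) or O(1), where n is the length of the string.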

Suppose we have strings of varying lengths, i.e. n1, n2, ... n_x. Then the time complexity of doing the following:

s = set()
d = dict()
for x in N:  # where N = [n1, n2, ... n_x]
  s.add(x)
  d[x] = 1

is O(len(N) * Z), where Z = len(n_1) + len(n_2) + ... + len(n_x). If we assume that add or insert is an O(1) operation, the time complexity would be O(len(N)).

Is the above true?

From http://svn.python.org/projects/python/trunk/Objects/stringobject.c we can see that computing the hash depends on the length of the string, which is what I take len to be below:

static long string_hash(PyStringObject *a)
{
    register Py_ssize_t len;
    register unsigned char *p;
    register long x;

    if (a->ob_shash != -1)
        return a->ob_shash;
    len = Py_SIZE(a);
    p = (unsigned char *) a->ob_sval;
    x = *p << 7;
    while (--len >= 0)
        x = (1000003*x) ^ *p++;
    x ^= Py_SIZE(a);
    if (x == -1)
        x = -2;
    a->ob_shash = x;
    return x;
}
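To make the per-byte loop easier to see, here is a rough pure-Python transcription of the C function above. It is only a sketch: the name string_hash_sketch and the bits parameter are made up for illustration, C integer overflow is emulated by masking to a 64-bit long, and modern CPython uses a different, randomized string hash, so the values will not match the built-in hash().

```python
def string_hash_sketch(data: bytes, bits: int = 64) -> int:
    """Hypothetical port of the old string_hash(); assumes a `bits`-wide C long."""
    mask = (1 << bits) - 1
    # x = *p << 7; for an empty string, C reads the terminating NUL byte (0).
    x = (data[0] << 7) if data else 0
    # One multiply and one XOR per byte: this loop is the O(len) part.
    for byte in data:
        x = ((1000003 * x) ^ byte) & mask
    x ^= len(data)
    # Reinterpret the masked value as a signed C long.
    if x >= 1 << (bits - 1):
        x -= 1 << bits
    # -1 is reserved as the "hash not computed yet" marker.
    if x == -1:
        x = -2
    return x


print(string_hash_sketch(b"a"))
```

The loop body runs exactly len(data) times, which is where the linear cost of the first hash computation comes from.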

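Elsewhere, someone showed that varying the length of the string does not affect the time taken to compute its hash. But that contradicts the code above.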

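From the following link we know that the hash value, once computed, is stored in the object: "Get dictionary keys hashes without recalculation". This means that lookup will be constant time, O(1). However, the insertion/adding, which is when the hash calculation is done, should be linear.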
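One rough way to check the caching claim on CPython is to time two consecutive hash() calls on the same large string. This is only a sketch: the helper time_hash and the string length are arbitrary choices for illustration, and the absolute timings depend on the machine.

```python
import time


def time_hash(value):
    """Return the wall-clock seconds taken by one hash(value) call."""
    start = time.perf_counter()
    hash(value)
    return time.perf_counter() - start


# Build the string at run time so nothing has hashed it yet.
length = 20_000_000
big_key = "x" * length

first_call = time_hash(big_key)   # has to read the whole string
second_call = time_hash(big_key)  # returns the value stored in the object

print(f"first hash():  {first_call:.6f} s")
print(f"second hash(): {second_call:.6f} s")
```

On CPython the second call should be far faster than the first, because the str object keeps its hash after the first computation.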

Answer 1

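The performance of insert depends on a myriad of things. Computing the hash function for a string of length k is indeed O(k), but in the general case that is simply uninteresting.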
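As an illustration of the O(k) hashing cost applied to the loop from the question, the sketch below counts how many characters get hashed. MeteredKey is a hypothetical stand-in for a string key, not anything from CPython; it caches its hash after the first computation, as str does.

```python
class MeteredKey:
    """Hypothetical string-like key that counts the characters it hashes."""

    chars_hashed = 0  # class-wide counter of hashing work

    def __init__(self, text):
        self.text = text
        self._cached_hash = None

    def __hash__(self):
        if self._cached_hash is None:
            # Model the O(k) pass over the key, done only once per object.
            MeteredKey.chars_hashed += len(self.text)
            self._cached_hash = hash(self.text)
        return self._cached_hash

    def __eq__(self, other):
        return isinstance(other, MeteredKey) and self.text == other.text


N = [MeteredKey("ab"), MeteredKey("cde"), MeteredKey("fghij")]
Z = sum(len(key.text) for key in N)

s = set()
d = dict()
for x in N:
    s.add(x)   # first use of x: the hash is computed here
    d[x] = 1   # the cached hash is reused

print(MeteredKey.chars_hashed, Z, len(N) * Z)
```

Under these assumptions the hashing work over the whole loop adds up to Z, the sum of the key lengths, rather than len(N) * Z. The hash table bookkeeping itself adds an expected constant cost per insertion.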

If you consider string keys only 8 bytes long, there are 18446744073709551616 different combinations, and 8 is a constant: computing the hash of an 8-byte key is O(8), which is O(1).

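But at 18446744073709551616 items, insertion into a hash table could still take 1 microsecond. For a list, where insertion at the beginning is O(n), if inserting/copying one item at the end of the list took only one nanosecond, inserting at the beginning of a list with that many items could take 585 years.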
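The figures above can be checked with a few lines of arithmetic. The variable names are only for illustration, and a year is taken as 365.25 days.

```python
# Number of distinct 8-byte keys: 256 possible values per byte, 8 bytes.
combinations = 256 ** 8
print(combinations == 2 ** 64 == 18446744073709551616)

# Moving that many items at one nanosecond each.
seconds = combinations * 1e-9
seconds_per_year = 365.25 * 24 * 3600
years = seconds / seconds_per_year
print(round(years))
```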

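OTOH, while it is conceivable that you might have a collection of 4294967296 or even 18446744073709551616 items, if you have a key of 4294967296 or 18446744073709551616 bytes going into your hash table, you seriously need to rethink your architecture.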


Tags: python, dictionary, add, set, time-complexity