数组插入时间跳跃

Question

在深入研究hash和zval结构以及数组是如何基于它的过程中，遇到了奇怪的插入时间。

示例如下：

$array = array();
$someValueToInsert = 100;
for ($i = 0; $i < 10000; ++$i) {
    $time = microtime(true);
    array_push($array, $someValueToInsert);
    echo $i . " : " . (int)((microtime(true) - $time) * 100000000) . "</br>";
}

因此，我发现每个 1024、2024、4048... 元素将使用更多时间插入(>~x10)。

这并不取决于我是使用 array_push、array_unshift 还是简单地使用 $array[] = someValueToInsert.

我在哈希结构中考虑：

typedef struct _hashtable {
   ...
    uint nNumOfElements; 
 ...
} HashTable;

nNumOfElements 有默认的最大值，但它不能回答为什么在特殊计数器（1024、2048...）中插入需要更多时间。

有什么想法吗？

Answer 1

我觉得跟动态数组的实现有关。看这里"Geometric expansion and amortized cost"http://en.wikipedia.org/wiki/Dynamic_array

To avoid incurring the cost of resizing many times, dynamic arrays resize by a large amount, **such as doubling in size**, and use the reserved space for future expansion

您也可以在 PHP 此处阅读有关数组的内容 https://nikic.github.io/2011/12/12/How-big-are-PHP-arrays-really-Hint-BIG.html

这是动态数组的标准做法。例如。在这里查看 C++ dynamic array, increasing capacity

capacity = capacity * 2; // doubles the capacity of the array

Answer 2

虽然我建议在 PHP 内部列表中仔细检查我的答案，但我相信答案在 zend_hash_do_resize() 中。当散列 table 中需要更多元素时，将调用此函数并将现存散列 table 的大小加倍。由于 table 从 1024 开始生命，这种加倍解释了您观察到的结果。代码：

} else if (ht->nTableSize < HT_MAX_SIZE) {  /* Let's double the table size */
    void *old_data = HT_GET_DATA_ADDR(ht);
    Bucket *old_buckets = ht->arData;

    HANDLE_BLOCK_INTERRUPTIONS();
    ht->nTableSize += ht->nTableSize;
    ht->nTableMask = -ht->nTableSize;
    HT_SET_DATA_ADDR(ht, pemalloc(HT_SIZE(ht), ht->u.flags & HASH_FLAG_PERSISTENT));
    memcpy(ht->arData, old_buckets, sizeof(Bucket) * ht->nNumUsed);
    pefree(old_data, ht->u.flags & HASH_FLAG_PERSISTENT);
    zend_hash_rehash(ht);
    HANDLE_UNBLOCK_INTERRUPTIONS();

我不确定是 remalloc 是性能命中，还是 rehashing 是命中，还是整个块都是不间断的table。在上面放一个分析器会很有趣。我认为有些人可能已经为 PHP 7.

做到了

旁注，Thread Safe 版本的处理方式不同。我不太熟悉该代码，因此如果您使用 ZTS，可能会出现不同的问题。

数组插入时间跳跃

Array Insert Time Jump

php

arrays

hashtable