堆插入的 O(1) 平均情况复杂度的论证

Argument for O(1) average-case complexity of heap insertion

Wikipedia page for binary heaps 的声明是插入在最坏情况下是 O(log n),但平均 O(1):

The number of operations required depends only on the number of levels the new element must rise to satisfy the heap property, thus the insertion operation has a worst-case time complexity of O(log n) but an average-case complexity of O(1).

linked page 试图证明这一点如下:

However, on average, the newly inserted element does not travel very far up the tree. In particular, assuming a uniform distribution of keys, it has a one-half chance of being greater than its parent; it has a one-half chance of being greater than its grandparent given that it is greater than its parent; it has a one-half chance of being greater than its great-grandparent given that it is greater than its parent, and so on [...] so that in the average case insertion takes constant time

不过这肯定是胡说八道?在我看来,如果树是随机排序的,那么新元素大于其父元素的可能性为 50/50;但是,粗略地说,大元素沉到底部,随着堆的增长,这种可能性远小于 50/50。

对吗?

几个月来维基百科上都是这样...

对于平均堆插入时间为 O(1) 的说法,有更好的参考资料:1991 年的论文“Average Case Analysis of Heap Building by Repeated Insertion" by Hayward & McDiarmid. (This paper is linked in what is currently reference 4 of the Wikipedia article.) That paper in turn references a 1975 paper by Porter & Simon, "Random insertion into a priority queue structure”,它处理了一次堆插入,并证明了平均情况是 O(1).

直觉上,论点很简单。堆的一半是叶子,叶子往往更大。如果我们暂时假设叶子是堆中最大的元素(而不是趋向于变大),那么我们可以说新元素成为叶子的概率——即它位于上半部分值范围的 - 正好是 0.5。如果新元素不是堆的叶子(也是0.5的概率),我们可以用原始堆中仅由非叶子节点组成的截断堆重复这个过程,所以新元素在第二个的概率-最低水平将是剩余水平的一半:0.25。因此,它处于第三级的概率为 0.125,依此类推。那么我们必须搜索的预期级别数将是 1*0.5 + 2*0.25 + 3*0.125...,即 2.

当然,随机新元素大于随机二级父元素的概率并不是真的0.5;实际上少了一点。但是,只要它受常数限制,计算 expected 比较次数的幂级数之和仍将受常数限制。结果发现常数在 2.6 左右。

另请参阅 this useful answer,在讨论堆的复杂性并将其与 BST 的复杂性进行对比时,对堆中恒定的平均插入时间进行了详细的图形分析。