CPython List Implementation Specifics
I know a similar question was answered here: How is Python's List Implemented? but I want to ask about the specifics. I'd like to know more about how CPython implements list resizing. I'm not very familiar with C, so reading the source code is a bit of a struggle.
What I think I understand is that there is the size of the list, Py_ssize_t ob_size, and the amount allocated to the list, Py_ssize_t allocated, and when ob_size reaches allocated, more memory needs to be allocated. I assume the memory will be extended in place if the system allows it, and otherwise the list will be copied to another location in memory. In particular, I'm asking about how the new value of allocated is chosen. From listobject.c, the new allocation is computed as:
new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
Essentially, we allocate about 1/8 more than the size actually needed (ignoring the constants). I'd like to know why this 1/8 was chosen. In my intro programming class I remember learning about ArrayLists, which double in size when they fill up; one could also have chosen to grow by 1/2 or 1/4.
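To see what this rule actually produces, here is a small standalone C sketch of my own (not CPython's actual list_resize()) that applies the quoted formula to one-at-a-time appends, prints each capacity change, and counts how many elements would be copied if every realloc() had to move the buffer:
#include <stdio.h>

int main(void)
{
    size_t allocated = 0;      /* slots currently reserved */
    size_t copied = 0;         /* elements moved, assuming every realloc() relocates the buffer */
    const size_t n_appends = 100;

    for (size_t size = 0; size < n_appends; size++) {
        size_t newsize = size + 1;                 /* list length after this append */
        if (newsize > allocated) {                 /* no spare slot left: grow */
            size_t new_allocated = newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);
            copied += size;                        /* worst case: the old contents are copied */
            printf("len %zu: capacity %zu -> %zu\n", newsize, allocated, new_allocated);
            allocated = new_allocated;
        }
    }
    printf("total elements copied: %zu (%.2f per append)\n",
           copied, (double)copied / n_appends);
    return 0;
}
For the first appends this prints the capacity sequence 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, so small lists grow almost as fast as doubling, and the 1/8 factor only dominates once the additive constants become negligible.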
The smaller the growth factor, the worse the amortized time for a long run of appends (still constant, but with a larger constant factor), so 1/8 seems like a poor choice. My guess is that allocating only a small extra amount each time increases the chance that the reallocation can happen in place. Is that the right reasoning? Does this CPython implementation work well in practice?
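To put a rough number on "still constant, but with a larger factor": ignoring the small additive constants, growing by a factor g each time means the sizes at which a copy happens form a geometric sequence, so over n appends the worst-case total copy work is about
n + \frac{n}{g} + \frac{n}{g^2} + \cdots = \frac{g}{g-1}\, n .
With g = 9/8 that is roughly 9 copied elements per append, versus about 2 for a doubling strategy; both are O(1) amortized, just with different constants. (This is my own back-of-the-envelope bound, not something stated in the CPython sources.)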
Note: as for when the memory allocated to the list is reduced after elements are deleted, that happens once the list shrinks below half of the allocated size, as can be seen in this part of the code:
/* Bypass realloc() when a previous overallocation is large enough to accommodate the newsize. If the newsize falls lower than half the allocated size, then proceed with the realloc() to shrink the list. */
if (allocated >= newsize && newsize >= (allocated >> 1)) {
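As a worked illustration of that condition (my own sketch, not code from listobject.c): with allocated = 100, any newsize from 50 up to 100 keeps the existing buffer and only the length changes, while a newsize below 50 falls through to the realloc() that shrinks the list:
#include <stdio.h>

int main(void)
{
    size_t allocated = 100;                       /* hypothetical current capacity */
    size_t examples[] = {100, 60, 50, 49, 10};    /* candidate new lengths */

    for (int i = 0; i < 5; i++) {
        size_t newsize = examples[i];
        if (allocated >= newsize && newsize >= (allocated >> 1))
            printf("newsize %zu: keep the %zu-slot buffer, skip realloc()\n", newsize, allocated);
        else
            printf("newsize %zu: realloc() to resize\n", newsize);
    }
    return 0;
}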
Well, based on the 21-year-old commit that implemented that behavior, the reasoning was "because it improved memory behavior on Tim Peters' Win98 machine". Tim's comments from that commit are reproduced below.
Accurate timings are impossible on my Win98SE box, but this is obviously faster even on this box for reasonable list.append() cases. I give credit for this not to the resizing strategy but to getting rid of integer multiplication and divsion (in favor of shifting) when computing the rounded-up size.
For unreasonable list.append() cases, Win98SE now displays linear behavior for one-at-time appends up to a list with about 35 million elements. Then it dies with a MemoryError, due to fatally fragmented address space (there's plenty of VM available, but by this point Win9X has broken user space into many distinct heaps none of which has enough contiguous space left to resize the list, and for whatever reason Win9x isn't coalescing the dead heaps). Before the patch it got a MemoryError for the same reason, but once the list reached about 2 million elements.
Haven't yet tried on Win2K but have high hopes extreme list.append() will be much better behaved now (NT & Win2K didn't fragment address space, but suffered obvious quadratic-time behavior before as lists got large).
For other systems I'm relying on common sense: replacing integer * and / by << and >> can't plausibly hurt, the number of function calls hasn't changed, and the total operation count for reasonably small lists is about the same (while the operations are cheaper now).
...
This over-allocates proportional to the list size, making room for additional growth. The over-allocation is mild, but is enough to give linear-time amortized behavior over a long sequence of appends() in the presence of a poorly-performing system realloc() (which is a reality, e.g., across all flavors of Windows, with Win9x behavior being particularly bad -- and we've still got address space fragmentation problems on Win9x even with this scheme, although it requires much longer lists to provoke them than it used to).
Raymond Hettinger further tuned these values in this commit:
The Py2.3 approach overallocated small lists by up to 8 elements. The last checkin would limited this to one but slowed down (by 20 to 30%) the creation of small lists between 3 to 8 elements.
This tune-up balances the two, limiting overallocation to 3 elements (significantly reducing space consumption from Py2.3) and running faster than the previous checkin.
The first part of the growth pattern (0, 4, 8, 16) neatly meshes with allocators that trigger data movement only when crossing a power of two boundary. Also, then even numbers mesh well with common data alignments.