Size of run in Timsort
I'm looking into Timsort (link: hackernoon Timsort). It says: "Merging 2 arrays is more efficient when the number of runs is equal to, or slightly less than, a power of two. Timsort chooses minrun to try to ensure this efficiency, by making sure minrun is equal to or less than a power of two."

Why is it that "Merging 2 arrays is more efficient when the number of runs is equal to, or slightly less than, a power of two"?
It's for balanced merges in the general/random case. (If the given data isn't random but has long natural runs, then minrun and thus its choice might not matter, and the balance depends more on the lengths of the runs than on the number of runs.)
In the general/random case, you want to produce runs of exactly minrun elements each. Then, when the number of runs is a power of two, you get perfectly balanced merges all the way. If the number of runs is slightly larger than a power of two, you end up with one very unbalanced merge at the end. If the number of runs is slightly smaller than a power of two, you only get slight imbalance.
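The cost difference can be sketched with a simplified model: repeatedly merge adjacent pairs of runs, charging each merge the combined length of its inputs. This is a rough stand-in for Timsort's actual stack-based merge policy, not the real algorithm, but it shows why 66 runs of 32 (just over a power of two) cost more than 64 runs of 33:

```python
def merge_cost(runs):
    """Total elements copied when merging adjacent run pairs level by
    level, carrying any odd leftover run up to the next level.
    One merge of runs a and b costs a + b copies."""
    cost = 0
    while len(runs) > 1:
        nxt = []
        for i in range(0, len(runs) - 1, 2):
            merged = runs[i] + runs[i + 1]
            cost += merged
            nxt.append(merged)
        if len(runs) % 2:          # odd run count: last run waits a level
            nxt.append(runs[-1])
        runs = nxt
    return cost

# 2112 elements split three ways:
print(merge_cost([33] * 64))  # 64 runs (a power of two)   -> 12672
print(merge_cost([32] * 66))  # 66 runs (just over 2^6)    -> 14464
```

With 66 runs, a lone run of 64 elements survives until the very last step, where it is merged into a run of 2048: the unbalanced merge the quote warns about.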
Again, this is (at least mostly) about the general/random case. If, for example, you have nine natural runs of lengths 800, 100, 100, 100, 100, 100, 100, 100 and 100 elements, then you also get perfectly balanced merges throughout, even though the number of runs is slightly larger than a power of two.
From Tim's description of the choice of minrun in listsort.txt:
Because sortperf.py only tries powers of 2, it took a long time to notice
that 32 isn't a good choice for the general case! Consider N=2112:
>>> divmod(2112, 32)
(66, 0)
>>>
If the data is randomly ordered, we're very likely to end up with 66 runs
each of length 32. The first 64 of these trigger a sequence of perfectly
balanced merges (see next section), leaving runs of lengths 2048 and 64 to
merge at the end. The adaptive gimmicks can do that with fewer than 2048+64
compares, but it's still more compares than necessary, and-- mergesort's
bugaboo relative to samplesort --a lot more data movement (O(N) copies just
to get 64 elements into place).
If we take minrun=33 in this case, then we're very likely to end up with 64
runs each of length 33, and then all merges are perfectly balanced. Better!
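listsort.txt describes how CPython picks minrun to land near (but not over) a power-of-two run count: take the first six bits of N, and add 1 if any of the remaining bits are set. A Python sketch of that computation:

```python
def merge_compute_minrun(n):
    """Compute minrun as described in CPython's listsort.txt:
    keep the top 6 bits of n, plus 1 if any lower bit is set.
    Result is in the range 32..64 for n >= 64, else n itself."""
    r = 0                 # becomes 1 if any shifted-out bit was set
    while n >= 64:
        r |= n & 1
        n >>= 1
    return n + r

print(merge_compute_minrun(2112))  # 33, so ~64 runs, not 66
print(merge_compute_minrun(2048))  # 32, already an exact power of two
```

For N=2112 this yields 33 rather than 32, which is exactly the fix the quoted passage arrives at: 64 runs of length 33, and all merges perfectly balanced.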