Quickly rounding numbers >= 0 up to the multiple of a specific power of 2

Question

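There is a well-known pattern for rounding a number up to the nearest multiple of a power of two. Add one less than the power of two to the number, then clear all the bits below that power: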

power = 1 << i
(n + (power - 1)) & ~(power - 1)

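The problem with this pattern for my use case is that 0 is not rounded up. The obvious solution is to add a branch, but I would rather avoid that, because the performance of this code is very important.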
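To make the problem concrete, here is a minimal C sketch of the pattern above. The function name `round_up_pow2_multiple` and the choice of `uint32_t` are assumptions made for illustration; they are not from the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the nearest multiple of (1 << i).
   Hypothetical helper; assumes i < 32 and that n + (power - 1) does not overflow. */
uint32_t round_up_pow2_multiple(uint32_t n, unsigned i)
{
    assert(i < 32);
    uint32_t power = (uint32_t)1 << i;
    /* Add power - 1, then clear the low i bits. */
    return (n + (power - 1)) & ~(power - 1);
}
```

With i = 3, the inputs 5 and 8 both map to 8 and 9 maps to 16, but 0 maps to 0, which is the case the question is about.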

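In some cases I have avoided this cost with context-specific hacks. Changing an earlier (x <= FAST_PATH_LIMIT) condition to (x - 1 <= FAST_PATH_LIMIT - 1) forces zero to wrap around (x being unsigned), so it can be handled in the slow path. Sadly, the opportunity to do this is not always available.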
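The wrap-around trick can be sketched as follows. The value of `FAST_PATH_LIMIT` and the function name `takes_fast_path` are made up for illustration, and the sketch assumes x is an unsigned 32-bit value.

```c
#include <assert.h>
#include <stdint.h>

#define FAST_PATH_LIMIT 64u  /* hypothetical limit, chosen only for illustration */

/* If the limit were 0, FAST_PATH_LIMIT - 1 would wrap and every x would pass. */
static_assert(FAST_PATH_LIMIT >= 1u, "limit must be nonzero");

/* Returns 1 when 1 <= x <= FAST_PATH_LIMIT.
   For x == 0, x - 1 wraps to UINT32_MAX, so the single comparison fails
   and zero falls through to the slow path. */
int takes_fast_path(uint32_t x)
{
    return (uint32_t)(x - 1) <= FAST_PATH_LIMIT - 1;
}
```

The range check and the zero check share one comparison, so no extra branch is needed.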

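I will happily accept a platform-specific assembly hack for a relatively obscure architecture. I just want the pleasure of knowing there is a better way to do this. A magic trick in C or x86/ARM assembly would actually be useful, though.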

Answer 1

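ARM has a CLZ (count leading zeros) instruction that lets you do this without a loop. Intel has rough equivalents in BSF (Bit Scan Forward) and BSR (Bit Scan Reverse); BSR is the closer match to CLZ, since it finds the highest set bit. Either one lets you build the mask quickly.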

http://en.wikipedia.org/wiki/Find_first_set
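The answer does not say which computation it has in mind. One possible reading is rounding up to the next power of two, which a leading-zero count makes loop-free. The sketch below assumes GCC or Clang (`__builtin_clz` is their builtin, not standard C) and a 32-bit `unsigned int`; the function name `next_pow2_clz` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= n, with 0 mapped to 1. Valid for n <= 2^31.
   __builtin_clz is undefined for an argument of 0, so n <= 1 is handled first. */
uint32_t next_pow2_clz(uint32_t n)
{
    assert(n <= 0x80000000u);
    if (n <= 1)
        return 1;
    /* 32 - clz(n - 1) is the number of bits needed to represent n - 1. */
    return (uint32_t)1 << (32 - __builtin_clz(n - 1));
}
```

Note that this rounds to a power of two itself, not to a multiple of a fixed power as in the question, and it still tests for small inputs.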

Answer 2

If you want zero and other values that are already multiples of the power of two to always round up, then:

((n | 1) + (power - 1)) & ~(power - 1)

Or, if only zero needs this:

((n | (!n)) + (power - 1)) & ~(power - 1)

Many architectures, PPC for example, can compute (!n) without a branch.
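Both expressions can be checked with a small C sketch. The function names are hypothetical, `power` is assumed to be a power of two, and the first variant additionally assumes power >= 2 (with power == 1 the `| 1` leaves odd inputs unchanged).

```c
#include <assert.h>
#include <stdint.h>

/* Variant 1: zero and values already on a multiple of power move up to the
   next multiple. Requires power to be a power of two and at least 2. */
uint32_t round_up_always(uint32_t n, uint32_t power)
{
    assert(power >= 2 && (power & (power - 1)) == 0);
    return ((n | 1) + (power - 1)) & ~(power - 1);
}

/* Variant 2: only zero is bumped (to power); other inputs round as usual. */
uint32_t round_up_nonzero(uint32_t n, uint32_t power)
{
    assert(power != 0 && (power & (power - 1)) == 0);
    return ((n | (uint32_t)!n) + (power - 1)) & ~(power - 1);
}
```

With power = 8, variant 1 maps 0 to 8 and 8 to 16, while variant 2 maps 0 to 8 but leaves 8 at 8.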

Answer 3

If the range of input values is reasonably restricted, e.g. 0..255, you can use a lookup table:

const unsigned char roundup_pow2 [] = {1, 2, 2, 2, 4, 4, 4, 4, // ...
};

unsigned int restricted_roundup_power2 (int v)
{
     if (v >= 0  &&  (unsigned) v < sizeof roundup_pow2)  // strict <, so v == table size is rejected
           return roundup_pow2 [v];
     return 0; // ???
}

The range can be extended by having the function reuse itself:

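unsigned int roundup_power2 (int v)
{
     if (v < 0)
           return 0; // ??? (without this check, a negative v recurses forever, since v >> 8 stays negative)
     if ((unsigned) v < sizeof roundup_pow2)
           return roundup_pow2 [v];
     // note: adding 8 treats the table entries as exponents, and the low 8 bits of v are ignored
     return 8 + roundup_power2 (v >> 8);
}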

Of course, a simple program (left as an exercise) could be written to generate the table values instead of computing them by hand.
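One way the exercise might look is sketched below. It assumes the table should map v to the smallest power of two >= v, with 0 mapped to 1; that is an interpretation, and it does not reproduce the sample values shown in the answer's table. The names `fill_roundup_table` and `TABLE_SIZE` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define TABLE_SIZE 256  /* covers inputs 0..255 */

/* The largest entry is the next power of two >= TABLE_SIZE - 1; it must fit in uint16_t. */
static_assert(TABLE_SIZE - 1 <= 0x8000, "largest entry must fit in uint16_t");

/* Fill table[v] with the smallest power of two >= v, with table[0] = 1.
   uint16_t is used because the entries for 129..255 are 256, which does not
   fit in an unsigned char. */
void fill_roundup_table(uint16_t table[TABLE_SIZE])
{
    uint16_t p = 1;
    for (unsigned v = 0; v < TABLE_SIZE; v++) {
        if (v > p)      /* v grows by 1 per step, so one doubling is enough */
            p *= 2;
        table[v] = p;
    }
}
```

The table could be filled once at startup, or the same loop could print the values for pasting into a `const` initializer.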

Answer 4

As a platform-specific approach in x86 assembly, I would add the following:

mov edx, num
mov eax, 1
xor ebx, ebx     ; EBX = 0 for use in CMOVZ
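; plain BSR, no REP prefix: REP BSR decodes as LZCNT on CPUs that support it, and LZCNT returns the leading-zero count (not the bit index) and sets ZF differently
bsr ecx, edx     ; get index of highest bit set - if num is 0 ECX would be undefined...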
cmovz ecx, ebx   ; ...so set it to 0 if that's the case
shl eax, cl      ; get power of 2
cmp eax, edx     ; internally subtract num, which results in negative value (borrow/carry) except if it's already a power of 2 or zero
setc cl          ; if negative value(borrow/carry)...
shl eax, cl      ; ...then shift by one to next highest power
; EAX = result

Although another answer has already been accepted, this is a different approach.
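For readers who want to check the assembly's logic without an assembler, here is a rough C rendering of the same steps. It is a sketch, not a performance equivalent: a portable loop stands in for BSR, and the function name `next_pow2_steps` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest power of two >= num, with 0 mapped to 1, following the steps of the
   assembly above. Valid for num <= 2^31; larger inputs would overflow 32 bits. */
uint32_t next_pow2_steps(uint32_t num)
{
    assert(num <= 0x80000000u);

    /* BSR + CMOVZ: index of the highest set bit, or 0 when num == 0. */
    unsigned index = 0;
    for (uint32_t t = num >> 1; t != 0; t >>= 1)
        index++;

    uint32_t result = (uint32_t)1 << index;  /* SHL: 1 << index */
    unsigned borrow = result < num;          /* CMP + SETC: 1 if num exceeds it */
    return result << borrow;                 /* SHL: move up to the next power */
}
```

Like the assembly, this yields a power of two rather than a multiple of a fixed power: 0 and 1 give 1, 5 gives 8, and 8 stays 8.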

c

assembly

bit-manipulation