strlen AVX-512 __builtin_ctz invalid value

I wrote a strlen function using AVX-512 instructions. Here is my source code:

size_t avx512_strlen(const char * s) {
    __m512i vec0, vec1;
    unsigned long long mask;
    const char * ptr = s;

    vec0 = _mm512_setzero_epi32();

    while (1) {
        vec1 = _mm512_loadu_si512(s);
        mask = _mm512_cmpeq_epi8_mask(vec0, vec1);

        if(mask != 0) {
            mask = __builtin_ctz(mask);
            return (s-ptr) + mask;
        }

        s += 64;
    }

    return s-ptr;
}

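The value of '__builtin_ctz(mask)' is wrong, so the return value is incorrect. Specifically, the function fails to compute the position of the null terminator (0x00) in the final check.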

For example, I have this string:

char str[] = "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
                 "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
                 "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
                 "EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE";

The length of this string is 360, but the function returns 352. The problem comes from the '__builtin_ctz' part. Before '__builtin_ctz' is executed, the mask is correct. It is:

0001110100010001000100010000000000000000000000000000000000000000

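In the final check, 320 characters have already been examined, so __builtin_ctz must return 40 (as you can see in the mask, there are 40 zeros before the first '1'). The mask is correct, yet '__builtin_ctz' computes the wrong value!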

What is the problem?

__builtin_ctz operates on unsigned int, which is probably 32 bits on any x86 platform. Meanwhile, unsigned long long is probably 64 bits on any x86 platform. So your mask is getting truncated on this line:

            mask = __builtin_ctz(mask);

Since the lower 32 bits are all 0, the result is undefined (per GCC):

Returns the number of trailing 0-bits in x, starting at the least significant bit position. If x is 0, the result is undefined.

(Although undefined, 352 - 320 = 32 is a plausible answer for "number of trailing 0 bits in a 32-bit zero integer.")

You probably meant to use __builtin_ctzll(mask) instead. That should give you the correct count.
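To make the truncation concrete, here is a minimal sketch that reproduces the mask from the question without needing AVX-512 hardware. The helper names low_half and first_nul_offset are hypothetical, introduced only for illustration, and the sketch assumes GCC or Clang on a platform where unsigned int is 32 bits and unsigned long long is 64 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the value __builtin_ctz actually receives when it is
 * handed a 64-bit mask. The implicit conversion to unsigned int keeps only
 * the low 32 bits (assuming a 32-bit unsigned int). */
static unsigned int low_half(unsigned long long mask) {
    return (unsigned int)mask;
}

/* Hypothetical helper: offset of the first NUL byte inside a 64-byte block,
 * given the compare mask for that block. __builtin_ctzll takes an
 * unsigned long long, so all 64 mask bits are considered. The result is
 * undefined for a zero mask, hence the assertion. */
static size_t first_nul_offset(unsigned long long mask) {
    assert(mask != 0);
    return (size_t)__builtin_ctzll(mask);
}
```

With the mask from the question, which is 0x1D1111ULL << 40 when written as an integer, low_half returns 0, so __builtin_ctz is being asked about a zero value. first_nul_offset returns 40 for the same mask, and 320 + 40 gives the expected length of 360.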