Looking for a short value in a C++ array: fast SIMD version

I have an algorithm in my program that works well. I am wondering whether it is possible to speed it up:

unsigned short c;
bool found = false;
unsigned short* arrIterator = arr;
// Scan until the masked element equals stopValue; stop early if it equals next
while ((c = *arrIterator & mask) != stopValue)
{
    if (c == next)
    {
        found = true;
        break;
    }
    arrIterator++;
}
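To have something concrete to compare a SIMD version against, the loop above can be wrapped in a function that reports where next was found. This is a minimal sketch: the name find_next_scalar and the convention of returning the index (or -1 when stopValue is reached first) are hypothetical, and it assumes the array really contains an element whose masked value equals stopValue so the scan terminates.

```cpp
// Hypothetical wrapper around the scalar loop from the question.
// Returns the index of the first element whose masked value equals `next`,
// or -1 if an element whose masked value equals `stopValue` comes first.
// Assumes such a stop element exists, otherwise the loop runs off the array.
long find_next_scalar(const unsigned short* arr, unsigned short mask,
                      unsigned short stopValue, unsigned short next)
{
    const unsigned short* arrIterator = arr;
    unsigned short c;
    while ((c = *arrIterator & mask) != stopValue)
    {
        if (c == next)
            return static_cast<long>(arrIterator - arr);
        ++arrIterator;
    }
    return -1;
}
```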

Is it possible to rewrite this kind of algorithm using SIMD instructions?

Assuming arr is 16-byte aligned (make it so), you could do something like this (untested):

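#include <emmintrin.h>  // SSE2

const __m128i vmask = _mm_set1_epi16((short)mask);
const __m128i vstop = _mm_set1_epi16((short)stopValue);
const __m128i vnext = _mm_set1_epi16((short)next);
const __m128i* vIter = (const __m128i*)arr;  // arr must be 16-byte aligned
int found_mask = 0;
int stop_mask = 0;
do
{
    // Load 8 unsigned shorts at once and apply the same mask as the scalar loop
    __m128i data = _mm_and_si128(_mm_load_si128(vIter++), vmask);
    __m128i contains_next = _mm_cmpeq_epi16(data, vnext);
    __m128i contains_stop = _mm_cmpeq_epi16(data, vstop);
    found_mask = _mm_movemask_epi8(contains_next);
    stop_mask = found_mask | _mm_movemask_epi8(contains_stop);
} while (stop_mask == 0);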

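You can then work out the index at which next was found by bit-scanning found_mask (each 16-bit element sets two bits of the movemask result) and combining it with the current value of the iterator, which has already been advanced past the block that contained the hit. Because one block can contain both next and stopValue, next was only found if the lowest set bit of found_mask is also the lowest set bit of stop_mask; otherwise the stop value came first.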
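Putting the pieces together, a self-contained sketch of the SSE2 search might look like the following. The function name find_next_sse2 and the helper lowest_set_bit are hypothetical; the sketch assumes an x86/x86-64 target with SSE2, a 16-byte aligned arr, that some element's masked value equals stopValue, and that next != stopValue. Note that the final aligned load may read up to 7 elements past the stop element; that stays inside the same 16-byte block (so it cannot cross a page boundary), but it is formally out of bounds in ISO C++ unless the buffer is padded to a multiple of 8 elements.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

// Index of the lowest set bit of a nonzero value (portable alternative to
// compiler-specific builtins such as __builtin_ctz).
static int lowest_set_bit(unsigned v)
{
    int n = 0;
    while ((v & 1u) == 0) { v >>= 1; ++n; }
    return n;
}

// Hypothetical SSE2 counterpart of the scalar loop: returns the index of the
// first element whose masked value equals `next`, or -1 if the stop value is
// reached first. Assumes arr is 16-byte aligned and next != stopValue.
long find_next_sse2(const unsigned short* arr, unsigned short mask,
                    unsigned short stopValue, unsigned short next)
{
    const __m128i vmask = _mm_set1_epi16(static_cast<short>(mask));
    const __m128i vstop = _mm_set1_epi16(static_cast<short>(stopValue));
    const __m128i vnext = _mm_set1_epi16(static_cast<short>(next));
    const __m128i* vIter = reinterpret_cast<const __m128i*>(arr);
    int found_mask = 0;
    int stop_mask = 0;
    do
    {
        // Process 8 elements per iteration, masked like the scalar version.
        __m128i data = _mm_and_si128(_mm_load_si128(vIter++), vmask);
        found_mask = _mm_movemask_epi8(_mm_cmpeq_epi16(data, vnext));
        stop_mask = found_mask | _mm_movemask_epi8(_mm_cmpeq_epi16(data, vstop));
    } while (stop_mask == 0);

    // Each 16-bit lane contributes two bits, so divide the bit index by 2.
    int first_hit = lowest_set_bit(static_cast<unsigned>(stop_mask)) / 2;
    // If the earliest hit in this block is not `next`, the stop value won.
    if (found_mask == 0 ||
        lowest_set_bit(static_cast<unsigned>(found_mask)) / 2 != first_hit)
        return -1;
    // vIter already points one block past the block that produced the hit.
    long block = static_cast<long>((vIter - 1) -
                                   reinterpret_cast<const __m128i*>(arr));
    return block * 8 + first_hit;
}
```

For a fixed-size array, one way to satisfy the alignment requirement is `alignas(16) unsigned short buf[N];`; for heap memory, `std::aligned_alloc` (C++17, size must be a multiple of the alignment) or `_mm_malloc` on x86 compilers are common choices.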