How to floor/int in double using only SSE2?
For float, doing floor() and then int() seems fairly easy, for example:
float z = floor(LOG2EF * x + 0.5f);
const int32_t n = int32_t(z);
becomes:
__m128 z = _mm_add_ps(_mm_mul_ps(log2ef, x), half);
__m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(z));
z = _mm_sub_ps(t, _mm_and_ps(_mm_cmplt_ps(z, t), one));
__m128i n = _mm_cvtps_epi32(z);
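For reference, the truncate-and-correct step in the snippet above can be isolated as a standalone floor. This is only a minimal sketch: the helper name floor_ps_sse2 is hypothetical, and it assumes every lane fits in a 32-bit integer, because the value goes through an int32 conversion.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (also provides the SSE ones)

// Hypothetical helper: floor() of four floats using only SSE/SSE2.
// Assumes each lane is within int32 range.
static inline __m128 floor_ps_sse2(__m128 z) {
    const __m128 one = _mm_set1_ps(1.0f);
    // Truncate toward zero, then convert back to float.
    __m128 t = _mm_cvtepi32_ps(_mm_cvttps_epi32(z));
    // Truncation rounds negative non-integers up, so t > z there.
    // The compare yields an all-ones mask in those lanes; AND with 1.0f
    // turns it into a per-lane correction of 1.0f or 0.0f.
    __m128 correction = _mm_and_ps(_mm_cmplt_ps(z, t), one);
    return _mm_sub_ps(t, correction);
}
```

Because the result is already integer-valued, the final _mm_cvtps_epi32 gives the same answer whether it rounds or truncates.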
But how would you do this for double using only SSE2?
This is the double version I want to convert:
double z = floor(LOG2E * x + 0.5);
const int32_t n = int32_t(z);
Just use the double-precision (...pd...) equivalents of the single-precision (...ps...) intrinsics:
__m128i n = _mm_cvtpd_epi32(z);
According to the Intel Intrinsics Guide, that intrinsic is indeed available in SSE2: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=4966,1917&techs=SSE2
__m128i _mm_cvtpd_epi32 (__m128d a)
Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers, and store the results in dst.
FOR j := 0 to 1
i := 32*j
k := 64*j
dst[i+31:i] := Convert_FP64_To_Int32(a[k+63:k])
ENDFOR
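Applying the same substitution to every step of the float snippet gives something like the following. It is a sketch under stated assumptions rather than a definitive implementation: the helper name floor_scaled_pd and the literal used for LOG2E are illustrative, and the intermediate value must fit in int32.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper: floor(LOG2E * x + 0.5) for two doubles.
// The two int32 results land in the low 64 bits of the return value;
// the upper two 32-bit lanes are zero.
static inline __m128i floor_scaled_pd(__m128d x) {
    const __m128d log2e = _mm_set1_pd(1.4426950408889634);  // assumed value of LOG2E
    const __m128d half  = _mm_set1_pd(0.5);
    const __m128d one   = _mm_set1_pd(1.0);

    __m128d z = _mm_add_pd(_mm_mul_pd(log2e, x), half);
    // Truncate toward zero, then convert the two low int32 lanes back to double.
    __m128d t = _mm_cvtepi32_pd(_mm_cvttpd_epi32(z));
    // Subtract 1.0 in lanes where truncation rounded up (negative non-integers).
    z = _mm_sub_pd(t, _mm_and_pd(_mm_cmplt_pd(z, t), one));
    // z is integer-valued now, so the rounding mode of this conversion does not matter.
    return _mm_cvtpd_epi32(z);
}
```

Note that _mm_cvtpd_epi32 rounds according to the current MXCSR rounding mode (round-to-nearest by default), whereas _mm_cvttpd_epi32 always truncates. That is why the floor correction is still needed before the final conversion.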