Accuracy of FSIN and other x87 trigonometric instructions on AMD processors
On Intel processors, x87 trigonometric instructions such as FSIN have limited accuracy due to the use of a 66-bit approximation of pi, even though the computation itself is otherwise accurate to the full 64-bit mantissa of an 80-bit extended-precision floating-point value. (Full accuracy for all valid inputs requires a 128-bit approximation of pi.) The omission in Intel's documentation was corrected after the issue was brought to their attention.
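Why the width of the pi constant matters can be illustrated with exact rational arithmetic. What follows is a minimal Python sketch, not Intel's documented algorithm. It assumes a simplified model in which the hardware reduces the argument by subtracting pi rounded to 66 significant bits. The hard-coded hex expansion of pi and the helper names `exponent` and `round_to_bits` are illustrative assumptions. The argument is pi rounded to the 64-bit significand, the same value used in the "sin near pi" rows of the output below. Because pi - x is tiny, almost all of its significant bits come from the bits of pi beyond the 66th, and the truncated constant cannot supply those bits.

```python
from fractions import Fraction

# Hex expansion of pi (3.243F6A88...) with 192 fractional bits, truncated.
# Used as the "exact" pi; its error (< 2**-190) is negligible here.
PI = Fraction(int("3243F6A8885A308D313198A2E03707344A4093822299F31D0", 16), 16**48)

def exponent(v: Fraction) -> int:
    """Return e with 2**e <= |v| < 2**(e+1); v must be nonzero."""
    a = abs(v)
    e = a.numerator.bit_length() - a.denominator.bit_length()
    return e - 1 if Fraction(2) ** e > a else e

def round_to_bits(v: Fraction, bits: int) -> Fraction:
    """Round v to `bits` significant bits (round-half-even)."""
    scale = Fraction(2) ** (bits - 1 - exponent(v))
    return Fraction(round(v * scale)) / scale

# Argument: pi rounded to the 64-bit significand of the 80-bit format.
x = round_to_bits(PI, 64)

# sin(x) = sin(pi - x), and pi - x is about 5e-20, so sin(x) equals
# pi - x to far better than one ulp.
exact = PI - x

# Assumed hardware model: the same reduction with pi rounded to 66 bits.
pi_66 = round_to_bits(PI, 66)
approx = pi_66 - x

# Error in ulps of the exact result (64-bit significand: ulp = 2**(e - 63)).
ulp = Fraction(2) ** (exponent(exact) - 63)
error_ulps = (approx - exact) / ulp

print(float(exact))                   # about -5.0166e-20
print(approx == -Fraction(1, 2**64))  # True: only the 2 extra bits of pi survive
print(int(error_ulps))                # -1376283091369227076
```

Under this model the reduced argument is exactly -2^-64. That matches the `BFBF 8000 0000 0000 0000` result and the roughly -1.376 × 10^18 ulp error reported by the test program.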
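However, apart from what is stated in the AMD64 Architecture Programmer's Manual, Volume 1, I could not find similar details about the accuracy of AMD's implementation of the x87 trigonometric instructions: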
6.4.5.1 Accuracy of Transcendental Results
x87 computations are carried out in double-extended-precision format, so that the transcendental functions provide results accurate to within one unit in the last place (ulp) for each of the floating-point data types.
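Is AMD's implementation of the x87 trigonometric instructions actually fully accurate to within one ulp in the extended-precision format for all valid inputs (including using a 128-bit or better approximation of pi)? Answers relating to the Zen and Zen 2 architectures (Ryzen and EPYC) would be ideal.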
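I found a program at http://notabs.org/fpuaccuracy/ (direct download link; GPLv3) that is designed to test the accuracy of the x87 trigonometric instructions. The reference output provided with the program (fpuaccuracy examples) was generated on an Intel Core i7-2600 (Sandy Bridge) and is as follows: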
sin with smallest failing argument
argument 4000 C10A 7DC0 DC46 D753 (decimal 3.0162653335001840718)
actual 3FFB FFFF BBF1 3588 24AF (decimal 0.1249994929300478145)
x87 fpu 3FFB FFFF BBF1 3588 24AE (decimal 0.12499949293004781449)
error -1.0002171407788819287 ulp
sin near pi
argument 4000 C90F DAA2 2168 C235 (decimal 3.1415926535897932385)
actual BFBE ECE6 75D1 FC8F 8CBB (decimal -5.0165576126683320235E-20)
x87 fpu BFBF 8000 0000 0000 0000 (decimal -5.42101086242752217E-20)
error -1376283091369227076.6 ulp
sin with large argument
argument 403D FFFF FFFF 2D2A 9042 (decimal 9223372035086174241)
actual BFDF E730 CF55 1180 63F3 (decimal -4.2053336735954077951E-10)
x87 fpu BFF8 C28B 4641 7452 B463 (decimal -0.011874025925697012908)
error -4.7037861121081250351E+26 ulp
cos with smallest failing argument
argument 3FFF C10E 8AC0 BFEB 5E80 (decimal 1.5082562867317745453)
actual 3FFA FFFF 3EA3 D2D7 355B (decimal 0.062499279677629184442)
x87 fpu 3FFA FFFF 3EA3 D2D7 355A (decimal 0.062499279677629184438)
error -1.005468872258621479 ulp
cos near pi/2
argument 3FFF C90F DAA2 2168 C235 (decimal 1.5707963267948966193)
actual BFBD ECE6 75D1 FC8F 8CBB (decimal -2.5082788063341660117E-20)
x87 fpu BFBE 8000 0000 0000 0000 (decimal -2.710505431213761085E-20)
error -1376283091369227076.6 ulp
cos with large argument
argument 403D FFFF FFFF 6CE1 B432 (decimal 9223372035620657689)
actual 3FDD DFD2 E369 AE25 7E4A (decimal 1.0178327217734091432E-10)
x87 fpu BFF8 C28B 45B2 1490 D117 (decimal -0.011874025404105249357)
error -1.8815144449581111989E+27 ulp
tan with smallest failing argument
argument 3FFF B8B5 07B4 294A BD53 (decimal 1.4430245999997931928)
actual 4001 F915 0EE5 BAC8 446C (decimal 7.7838205801874740721)
x87 fpu 4001 F915 0EE5 BAC8 446D (decimal 7.7838205801874740726)
error 1.0017725812707024772 ulp
tan near pi/2
argument 3FFF C90F DAA2 2168 C235 (decimal 1.5707963267948966193)
actual C040 8A51 E04D AABD A35F (decimal -39867976298117107068)
x87 fpu C040 8000 0000 0000 0000 (decimal -36893488147419103232)
error 743622037674500958.81 ulp
tan with large argument
argument 403D FFFF FFFF DCF6 FE38 (decimal 9223372036560879388)
actual 4005 A86C 499C 14EA BD4A (decimal 84.211499097398127292)
x87 fpu 401F C10C D618 50D5 E957 (decimal 6477687856.6315280604)
error 9.3353319161898434351E+26 ulp
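The astronomically large errors for large arguments follow from the same kind of reduction. Writing x = k·pi + r, reducing with an approximate constant shifts r by k times the constant's error. Here k is about 2.9 × 10^18 and pi minus a 66-bit pi is about 4.0 × 10^-21, so the shift is about 0.012. That is essentially the x87 result in the "sin with large argument" row above.

The Python sketch below checks this under explicit assumptions:

- The hardware uses pi rounded to 66 significant bits.
- The reduction is exact, and an accurate sine is then applied to the reduced argument.

The pi hex constant and the helper names `decode_ext80`, `sin_small` and `sin_reduced` are hypothetical. The hex fields are read as the raw 80-bit encoding, which agrees with the decimal values printed beside them.

```python
from fractions import Fraction

# Hex expansion of pi (3.243F6A88...) with 192 fractional bits, truncated.
PI = Fraction(int("3243F6A8885A308D313198A2E03707344A4093822299F31D0", 16), 16**48)
# Assumed hardware constant: pi rounded to 66 significant bits
# (2 integer bits + 64 fractional bits).
PI_66 = Fraction(round(PI * 2**64), 2**64)


def decode_ext80(text: str) -> Fraction:
    """Decode 'SEEE MMMM MMMM MMMM MMMM' (x87 80-bit format, normal numbers only)."""
    bits = int(text.replace(" ", ""), 16)
    sign = -1 if bits >> 79 else 1
    biased_exp = (bits >> 64) & 0x7FFF
    mantissa = bits & ((1 << 64) - 1)  # includes the explicit integer bit
    return sign * mantissa * Fraction(2) ** (biased_exp - 16383 - 63)


def sin_small(t: Fraction, terms: int = 8) -> Fraction:
    """Taylor series for sin(t); only intended for |t| much smaller than 1."""
    total, term = Fraction(0), t
    for n in range(terms):
        total += term
        term *= -t * t / ((2 * n + 2) * (2 * n + 3))
    return total


def sin_reduced(x: Fraction, pi_value: Fraction) -> Fraction:
    """(-1)**k * sin(x - k*pi_value) with k = round(x / pi_value)."""
    k = round(x / pi_value)
    r = x - k * pi_value  # exact rational arithmetic, no rounding
    return -sin_small(r) if k % 2 else sin_small(r)


x = decode_ext80("403D FFFF FFFF 2D2A 9042")  # "sin with large argument"
true_sin = sin_reduced(x, PI)      # reference: reduction with (nearly) exact pi
model_sin = sin_reduced(x, PI_66)  # model: reduction with the 66-bit constant

print(float(true_sin))   # about -4.2053e-10, the "actual" line
print(float(model_sin))  # about -0.011874, the "x87 fpu" line
```

The Taylor helper is only adequate here because both reduced arguments are small (about 4 × 10^-10 and about 0.012). A general implementation would reduce to a narrower interval first.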
When run on a laptop with an AMD Ryzen 7 2700U (Zen), I get the following:
sin with smallest failing argument
argument 4000 C10A 7DC0 DC46 D753 (decimal 3.0162653335001840718)
actual 3FFB FFFF BBF1 3588 24AF (decimal 0.1249994929300478145)
x87 fpu 3FFB FFFF BBF1 3588 24AE (decimal 0.12499949293004781449)
error -1.0002171407788819287 ulp
sin near pi
argument 4000 C90F DAA2 2168 C235 (decimal 3.1415926535897932385)
actual BFBE ECE6 75D1 FC8F 8CBB (decimal -5.0165576126683320235E-20)
x87 fpu BFBF 8000 0000 0000 0000 (decimal -5.42101086242752217E-20)
error -1376283091369227076.6 ulp
sin with large argument
argument 403D FFFF FFFF 2D2A 9042 (decimal 9223372035086174241)
actual BFDF E730 CF55 1180 63F3 (decimal -4.2053336735954077951E-10)
x87 fpu BFF8 C28B 4641 7452 B463 (decimal -0.011874025925697012908)
error -4.7037861121081250351E+26 ulp
cos with smallest failing argument
argument 3FFF C10E 8AC0 BFEB 5E80 (decimal 1.5082562867317745453)
actual 3FFA FFFF 3EA3 D2D7 355B (decimal 0.062499279677629184442)
x87 fpu 3FFA FFFF 3EA3 D2D7 355A (decimal 0.062499279677629184438)
error -1.005468872258621479 ulp
cos near pi/2
argument 3FFF C90F DAA2 2168 C235 (decimal 1.5707963267948966193)
actual BFBD ECE6 75D1 FC8F 8CBB (decimal -2.5082788063341660117E-20)
x87 fpu BFBE 8000 0000 0000 0000 (decimal -2.710505431213761085E-20)
error -1376283091369227076.6 ulp
cos with large argument
argument 403D FFFF FFFF 6CE1 B432 (decimal 9223372035620657689)
actual 3FDD DFD2 E369 AE25 7E4A (decimal 1.0178327217734091432E-10)
x87 fpu BFF8 C28B 45B2 1490 D117 (decimal -0.011874025404105249357)
error -1.8815144449581111989E+27 ulp
tan with smallest failing argument
argument 3FFF B8B5 07B4 294A BD53 (decimal 1.4430245999997931928)
actual 4001 F915 0EE5 BAC8 446C (decimal 7.7838205801874740721)
x87 fpu 4001 F915 0EE5 BAC8 446C (decimal 7.7838205801874740721)
error 0.0017725812707024772387 ulp
tan near pi/2
argument 3FFF C90F DAA2 2168 C235 (decimal 1.5707963267948966193)
actual C040 8A51 E04D AABD A35F (decimal -39867976298117107068)
x87 fpu C040 8000 0000 0000 0000 (decimal -36893488147419103232)
error 743622037674500958.81 ulp
tan with large argument
argument 403D FFFF FFFF DCF6 FE38 (decimal 9223372036560879388)
actual 4005 A86C 499C 14EA BD4A (decimal 84.211499097398127292)
x87 fpu 401F C10C D618 50D5 E957 (decimal 6477687856.6315280604)
error 9.3353319161898434351E+26 ulp
With one exception (tan with smallest failing argument), the results are identical. I also ran the test on my Ryzen 9 3950X (Zen 2) and got the same results.
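The one difference can be quantified by decoding the hex fields and measuring distances in units of the last place. The fields are read here as the raw 80-bit encoding: a sign bit and 15-bit biased exponent, then the 64-bit significand with its explicit integer bit. This reading is an assumption, although it agrees with the printed decimal values. The helper names `decode_ext80`, `ulp_ext80` and `ulp_distance` are hypothetical.

```python
from fractions import Fraction

def decode_ext80(text: str) -> Fraction:
    """Decode 'SEEE MMMM MMMM MMMM MMMM' (x87 80-bit format, normal numbers only)."""
    bits = int(text.replace(" ", ""), 16)
    sign = -1 if bits >> 79 else 1
    biased_exp = (bits >> 64) & 0x7FFF
    mantissa = bits & ((1 << 64) - 1)  # includes the explicit integer bit
    return sign * mantissa * Fraction(2) ** (biased_exp - 16383 - 63)

def ulp_ext80(text: str) -> Fraction:
    """Spacing between adjacent 80-bit values at this encoding's exponent."""
    biased_exp = (int(text.replace(" ", ""), 16) >> 64) & 0x7FFF
    return Fraction(2) ** (biased_exp - 16383 - 63)

def ulp_distance(reference: str, result: str) -> Fraction:
    """Signed distance from reference to result, in ulps of the reference."""
    return (decode_ext80(result) - decode_ext80(reference)) / ulp_ext80(reference)

# "tan with smallest failing argument": correctly rounded value vs each CPU's result.
reference = "4001 F915 0EE5 BAC8 446C"
print(ulp_distance(reference, "4001 F915 0EE5 BAC8 446D"))  # Core i7-2600: 1
print(ulp_distance(reference, "4001 F915 0EE5 BAC8 446C"))  # Ryzen 7 2700U: 0

# The sin and cos rows are identical on both CPUs: one ulp below the rounded value.
print(ulp_distance("3FFB FFFF BBF1 3588 24AF", "3FFB FFFF BBF1 3588 24AE"))  # -1
print(ulp_distance("3FFA FFFF 3EA3 D2D7 355B", "3FFA FFFF 3EA3 D2D7 355A"))  # -1
```

Comparing two already-rounded values always gives whole ulps. The program's fractional figures differ by exactly 1 (1.0017725812707024772 versus 0.0017725812707024772387). This suggests that it measures against the unrounded reference value.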
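In summary, the results match Intel's bit for bit on every row but one. Recent AMD processors, including the Zen and Zen 2 architectures, therefore use a 66-bit approximation of pi. Given certain arguments, they produce the same kinds of inaccuracy in the x87 trigonometric instructions as modern Intel processors.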
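As a consistency check on this conclusion, the "near pi/2" rows from the Zen output can be compared with what a 66-bit constant predicts. This minimal Python sketch models that constant as pi rounded to 66 significant bits. That is an assumption: the rows only show that the results are consistent with it. The sketch also uses cos(x) ≈ pi/2 - x and tan(x) ≈ 1/(pi/2 - x). These approximations are valid because pi/2 - x is about 2.5 × 10^-20, where both are exact to far below one ulp. The pi hex constant and the helper name `decode_ext80` are hypothetical.

```python
from fractions import Fraction

# Hex expansion of pi (3.243F6A88...) with 192 fractional bits, truncated.
PI = Fraction(int("3243F6A8885A308D313198A2E03707344A4093822299F31D0", 16), 16**48)
PI_66 = Fraction(round(PI * 2**64), 2**64)  # assumed: pi rounded to 66 significant bits

def decode_ext80(text: str) -> Fraction:
    """Decode 'SEEE MMMM MMMM MMMM MMMM' (x87 80-bit format, normal numbers only)."""
    bits = int(text.replace(" ", ""), 16)
    sign = -1 if bits >> 79 else 1
    biased_exp = (bits >> 64) & 0x7FFF
    mantissa = bits & ((1 << 64) - 1)  # includes the explicit integer bit
    return sign * mantissa * Fraction(2) ** (biased_exp - 16383 - 63)

x = decode_ext80("3FFF C90F DAA2 2168 C235")  # argument of both "near pi/2" rows

for pi_name, pi_value in (("exact pi", PI), ("66-bit pi", PI_66)):
    d = pi_value / 2 - x   # reduced argument pi/2 - x
    predicted_cos = d      # cos(x) = sin(pi/2 - x), about pi/2 - x
    predicted_tan = 1 / d  # tan(x) = cot(pi/2 - x), about 1/(pi/2 - x)
    print(pi_name,
          predicted_cos == decode_ext80("BFBE 8000 0000 0000 0000"),
          predicted_tan == decode_ext80("C040 8000 0000 0000 0000"))
# exact pi False False
# 66-bit pi True True
```

With the 66-bit constant the reduced argument is exactly -2^-65. The predicted cos and tan are therefore -2^-65 and -2^65, which are the values both the Zen and Sandy Bridge processors return.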