Obtaining different results from sum() and '+'

Question

Here is my experiment:

> xx = 293.62882204364098
> yy = 0.086783439604999998
> print(xx + yy, 20)
[1] 293.71560548324595175
> print(sum(c(xx,yy)), 20)
[1] 293.71560548324600859

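It surprises me that sum() and + give different results when applied to the same numbers.

Is this the expected behaviour?

How can I get the same result from both?

Which one is more efficient?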

Answer 1

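It looks like plain addition is roughly 2 to 3 times as fast as sum() (median 107 ns versus 218.5 ns for sum(xx, yy) and 304 ns for sum(c(xx, yy))), but unless you're doing high-frequency trading I can't see this becoming your time bottleneck.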

xx = 293.62882204364098
yy = 0.086783439604999998

microbenchmark::microbenchmark(xx + yy, sum(xx,yy), sum(c(xx, yy)))
Unit: nanoseconds
           expr min    lq   mean median    uq  max neval
        xx + yy  88 102.5 111.90  107.0 110.0  352   100
    sum(xx, yy) 201 211.0 256.57  218.5 232.5 2886   100
 sum(c(xx, yy)) 283 297.5 330.42  304.0 311.5 1944   100

Answer 2

There is an r-devel thread that includes a detailed description of the implementation. In particular, from Tomas Kalibera:

R uses long double type for the accumulator (on platforms where it is available). This is also mentioned in ?sum: "Where possible extended-precision accumulators are used, typically well supported with C99 and newer, but possibly platform-dependent."
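Whether extended precision is in play on a given machine can be checked from within R. This is a minimal sketch using base R's capabilities() and .Machine; the variable names are only illustrative.

```r
# Does this build of R use a C long double type that is longer than double?
has_ldbl <- capabilities("long.double")

# Size in bytes of the C long double type on this platform
# (0 if there is no such type or its use was disabled at build time)
ldbl_bytes <- .Machine$sizeof.longdouble

cat("long double in use: ", has_ldbl, "\n")
cat("sizeof(long double):", ldbl_bytes, "bytes\n")
```

If the first value is FALSE, sum() has no wider accumulator to work with on that build.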

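This means that sum() is more accurate, although that comes with a giant flashing warning sign: if this level of accuracy matters to you, you should be very worried about the implementation of your calculations [in terms of both the algorithms and the underlying numerical implementation].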
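The effect of the accumulator is easier to see on a long vector than on two numbers. The sketch below is an illustration, not something taken from the thread: it compares sum() with a running total built from repeated + in ordinary double precision. The vector of one million copies of 0.1 and the nominal target 1e5 are arbitrary choices made for the example, and the size of the gap depends on the platform.

```r
x <- rep(0.1, 1e6)

# Running total in plain double precision, one `+` at a time
acc <- 0
for (v in x) acc <- acc + v

# sum() may accumulate in extended precision internally
s <- sum(x)

print(acc, digits = 17)
print(s, digits = 17)

# Distance of each result from the nominal value 1e5
cat("loop  error:", abs(acc - 1e5), "\n")
cat("sum() error:", abs(s - 1e5), "\n")
```

On a build without a long double accumulator the two totals would be expected to agree.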

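I answered an earlier question in which I eventually figured out (after some false starts) that the difference between + and sum() was due to the use of extended precision in sum().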

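The R source shows that the sums of the individual arguments (as in sum(xx,yy)) are added together with + (in C), whereas a separate piece of code sums the elements within each argument; line 154 of that code (LDOUBLE s=0.0) shows that the accumulator is stored in extended precision (if available).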
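The distinction between the call forms can be inspected directly. The sketch below prints each result to 17 significant digits; Reduce() does not come from the answers above and is included only as one possible way to add the elements of a vector with + itself, so that the result matches xx + yy. Whether the sum() results differ from xx + yy depends on the platform and R version.

```r
xx <- 293.62882204364098
yy <- 0.086783439604999998

a <- xx + yy                  # double-precision addition
b <- sum(xx, yy)              # two separate arguments
v <- sum(c(xx, yy))           # one vector, summed by the internal accumulator
d <- Reduce(`+`, c(xx, yy))   # folds the vector with `+`, so the same operation as a

fmt <- function(z) sprintf("%.17g", z)
cat("xx + yy        :", fmt(a), "\n")
cat("sum(xx, yy)    :", fmt(b), "\n")
cat("sum(c(xx, yy)) :", fmt(v), "\n")
cat("Reduce(`+`, .) :", fmt(d), "\n")

identical(a, d)   # Reduce() calls `+` on the two elements
a == v            # may be FALSE where extended precision is used
```

For comparisons that should tolerate this kind of last-digit difference, all.equal() is usually a better fit than ==.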

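I believe @JonSpring's timing results can be explained as follows (though I'm happy to be corrected): (1) sum(xx,yy) involves more processing, type-checking, etc. than +; (2) sum(c(xx,yy)) is slightly slower than sum(xx,yy) because it works in extended precision.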

