What explains the 1 decimal place rounding of x.x5 in R?

I am looking for an explanation of how R rounds a sequence like this one to 1 decimal place:

seq(1.05, 2.95, by = .1)

In high school I would have rounded half up, i.e. 2.05 becomes 2.1. But R rounds it to 2 when rounding to 1 decimal place.

The following rounding function, from the Stack Overflow answer to "Round up from .5", consistently implements high-school rounding:

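# Round half away from zero ("high school" rounding) to n decimal places.
round2 = function(x, n) {
  posneg = sign(x)     # remember the sign of each value
  z = abs(x)*10^n      # shift the target digit to the units place
  z = z + 0.5          # add one half so that truncation rounds half up
  z = trunc(z)         # drop the fractional part
  z = z/10^n           # shift back to the original scale
  z*posneg             # restore the sign
}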

This code compares R's rounding with the function above.

data.frame(cbind(
  Number = seq(1.05, 2.95, by = .1), 
  Popular.Round = round2(seq(1.05, 2.95, by = .1), 1),
  R.Round = round(seq(1.05, 2.95, by = .1), 1)))

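With R's rounding, 1.05 rounds up to 1.1 while 2.05 rounds down to 2. Then 1.95 rounds up to 2 and 2.95 also rounds up to 3.

If it is "round to even", why 3, which is an odd number?

Is there a better answer than "just deal with it" when someone asks about this behavior?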

Too long to read? Scroll down.

This was an interesting study for me personally. According to the documentation:

Note that for rounding off a 5, the IEC 60559 standard (see also ‘IEEE 754’) is expected to be used, ‘go to the even digit’. Therefore round(0.5) is 0 and round(-1.5) is -2. However, this is dependent on OS services and on representation error (since e.g. 0.15 is not represented exactly, the rounding rule applies to the represented number and not to the printed number, and so round(0.15, 1) could be either 0.1 or 0.2).

Rounding to a negative number of digits means rounding to a power of ten, so for example round(x, digits = -2) rounds to the nearest hundred.

For signif the recognized values of digits are 1...22, and non-missing values are rounded to the nearest integer in that range. Complex numbers are rounded to retain the specified number of digits in the larger of the components. Each element of the vector is rounded individually, unlike printing.
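The representation error mentioned in the documentation can be inspected directly. The following is a minimal sketch, assuming a platform with IEEE 754 doubles; the variable names are illustrative. It prints the value R actually stores for a few of the literals from the question. On such platforms 2.05, for example, is stored as a number slightly below 2.05, so it is not a true tie.

```r
# Decimal literals from the question.
x <- c(1.05, 1.95, 2.05, 2.95)

# Print 20 digits after the decimal point to expose the binary value
# that is actually stored, rather than the short printed form.
stored <- sprintf("%.20f", x)
print(data.frame(Literal = format(x, nsmall = 2), Stored = stored))
```

Note that seq(1.05, 2.95, by = .1) builds its values by arithmetic, so they may differ from these literals in the last bit, and the exact result of round(x, 1) can also depend on the R version.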

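First, you asked, "If it is 'round to even', why 3, which is an odd number?" To be clear, the round-to-even rule applies when rounding off a 5, and it refers to the last digit that is kept. When rounding to 1 decimal place that is the tenths digit, so 2.95 going to 3.0 ends in the even digit 0. If you run round(2.5) and round(3.5), R returns 2 and 4 respectively.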
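A quick check of the rule on values that are exactly representable in binary, so that representation error plays no part (a small sketch; the variable names are illustrative):

```r
# Halves such as 0.5 and 2.5 are stored exactly, so these are true ties.
halves <- c(0.5, 1.5, 2.5, 3.5, -1.5)

# With the default digits = 0, each tie goes to the nearest even integer.
rounded <- round(halves)
print(data.frame(Value = halves, Rounded = rounded))
```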

If you go here, https://stat.ethz.ch/pipermail/r-help/2008-June/164927.html, you will see this reply:

The logic behind the round to even rule is that we are trying to represent an underlying continuous value and if x comes from a truly continuous distribution, then the probability that x==2.5 is 0 and the 2.5 was probably already rounded once from any values between 2.45 and 2.54999999999999..., if we use the round up on 0.5 rule that we learned in grade school, then the double rounding means that values between 2.45 and 2.50 will all round to 3 (having been rounded first to 2.5). This will tend to bias estimates upwards. To remove the bias we need to either go back to before the rounding to 2.5 (which is often impossible to impractical), or just round up half the time and round down half the time (or better would be to round proportional to how likely we are to see values below or above 2.5 rounded to 2.5, but that will be close to 50/50 for most underlying distributions). The stochastic approach would be to have the round function randomly choose which way to round, but deterministic types are not comforatable with that, so "round to even" was chosen (round to odd should work about the same) as a consistent rule that rounds up and down about 50/50.

If you are dealing with data where 2.5 is likely to represent an exact value (money for example), then you may do better by multiplying all values by 10 or 100 and working in integers, then converting back only for the final printing. Note that 2.50000001 rounds to 3, so if you keep more digits of accuracy until the final printing, then rounding will go in the expected direction, or you can add 0.000000001 (or other small number) to your values just before rounding, but that can bias your estimates upwards.
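The suggestion to work in integers can be sketched as follows, assuming the values are known to have exactly two decimal places. The names hundredths, half_up_tenths and half_even_tenths are illustrative and not part of any package. Because every step is integer arithmetic, every x.x5 is a true tie and both rules behave exactly as they do on paper.

```r
# 1.05, 1.15, ..., 2.95 stored exactly as integer hundredths.
hundredths <- as.integer(seq(105, 295, by = 10))

# Round half up to tenths using integer arithmetic only.
half_up_tenths <- (hundredths + 5L) %/% 10L

# Round half to even: start from the truncated tenths, then add one when
# the remainder is above 5, or is exactly 5 and the truncated value is odd.
q <- hundredths %/% 10L
r <- hundredths %% 10L
half_even_tenths <- q + as.integer(r > 5L | (r == 5L & q %% 2L == 1L))

# Convert back to decimals only for the final display.
print(data.frame(
  Number    = hundredths / 100,
  Half.Up   = half_up_tenths / 10,
  Half.Even = half_even_tenths / 10
))

# Means in tenths, compared with the mean of the unrounded values.
print(c(
  unrounded = mean(hundredths) / 10,
  half_up   = mean(half_up_tenths),
  half_even = mean(half_even_tenths)
))
```

In this sketch the half-even mean matches the mean of the unrounded values, while the half-up mean sits half a tenth above it.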

Short answer: if you always round a 5 up, your data will be skewed upwards. If you round to even instead, your rounded data will balance out overall.

Let's test this using your data:
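The double-rounding argument from the mailing-list reply can be illustrated with a small simulation. This is only a sketch under stated assumptions: the underlying values are taken to be uniform on [0, 100), the seed is arbitrary, and the variable names are illustrative.

```r
set.seed(42)                          # arbitrary seed, for reproducibility only
u <- runif(1e5, min = 0, max = 100)   # "true" continuous values

# First rounding: keep one decimal place, as a recording step might.
once <- round(u, 1)

# Second rounding to whole numbers under two tie rules. Values of the
# form k + 0.5 are stored exactly, so the ties here are genuine ties.
half_up   <- floor(once + 0.5)   # always round a 5 up (values are non-negative)
half_even <- round(once)         # R's default: ties go to the even integer

# Bias of each rule relative to the mean of the underlying values.
bias_half_up   <- mean(half_up) - mean(u)
bias_half_even <- mean(half_even) - mean(u)
print(c(half_up = bias_half_up, half_even = bias_half_even))
```

About one value in ten becomes a tie after the first rounding, so always rounding ties up shifts the mean upwards, while round to even leaves it close to the mean of the underlying values.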

round2 = function(x, n) {
  posneg = sign(x)
  z = abs(x)*10^n
  z = z + 0.5
  z = trunc(z)
  z = z/10^n
  z*posneg
}

x <- data.frame(cbind(
  Number = seq(1.05, 2.95, by = .1), 
  Popular.Round = round2(seq(1.05, 2.95, by = .1), 1),
  R.Round = round(seq(1.05, 2.95, by = .1), 1)))

> mean(x$Popular.Round)
[1] 2.05
> mean(x$R.Round)
[1] 2.02

With a larger sample:

x <- data.frame(cbind(
  Number = seq(1.05, 6000, by = .1), 
  Popular.Round = round2(seq(1.05, 6000, by = .1), 1),
  R.Round = round(seq(1.05, 6000, by = .1), 1)))

> mean(x$Popular.Round)
[1] 3000.55
> mean(x$R.Round)
[1] 3000.537