Sample() in R returning non-random sample after population vector length > 13. Why?

Question

The following code returns a perfect sample:

b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12), 100000, replace=TRUE)
hist(b)

Increasing the number of elements by one, to 14, results in:

b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13), 100000, replace=TRUE)
hist(b)

This is clearly not right: zero appears far more often than it should. Is there a reason for this?

Answer 1

The problem is with hist, not sample.

You can check this by tabulating the sample directly:

> table(sample(0:15, 10000, replace=T))

  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 
634 642 664 654 628 598 633 642 647 625 587 577 618 645 615 591
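The table above shows roughly equal counts. If you want a number rather than an eyeball check, a chi-squared goodness-of-fit test against equal probabilities is one option. This is a minimal sketch; the seed is an arbitrary addition so the run is reproducible.

```r
set.seed(42)  # arbitrary seed, only so the run is reproducible
x <- sample(0:15, 10000, replace = TRUE)

# Count every value 0..15, including any that happened not to appear
counts <- table(factor(x, levels = 0:15))
print(counts)

# Goodness-of-fit test; chisq.test() assumes equal probabilities by default
fit <- chisq.test(counts)
print(fit$p.value)
```

A large p-value only says the counts are consistent with a uniform draw; it does not prove uniformity, but it does rule out anything as lopsided as the histogram in the question.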

From the hist help page:

If right = TRUE (default), the histogram cells are intervals of the form (a, b], i.e., they include their right-hand endpoint, but not their left one, with the exception of the first cell when include.lowest is TRUE.

For right = FALSE, the intervals are of the form [a, b), and include.lowest means ‘include highest’.
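The interval rules are easy to see on a tiny hand-checkable vector. The sketch below uses integer breaks 0:3 (an illustrative choice) and `plot = FALSE` so that hist() only returns the counts.

```r
# Small hand-checkable vector: one 0, two 1s, three 2s, one 3
x <- c(0, 1, 1, 2, 2, 2, 3)

# right = TRUE (default): cells [0,1], (1,2], (2,3]
# include.lowest = TRUE closes the first cell on the left, so 0 and 1 share it
right_counts <- hist(x, breaks = 0:3, plot = FALSE)$counts

# right = FALSE: cells [0,1), [1,2), [2,3]
# here include.lowest closes the last cell on the right, so 2 and 3 share it
left_counts <- hist(x, breaks = 0:3, right = FALSE, plot = FALSE)$counts

print(right_counts)  # 3 3 1
print(left_counts)   # 1 2 4
```

With the default settings the first cell holds two distinct integers, which is exactly what inflates the first bar when the breaks fall on integers.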

If you try

hist(sample(0:15, 10000, replace = TRUE), breaks = -1:15)

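the result looks correct. With breaks = -1:15, every cell contains exactly one integer, whereas when hist() picks integer breaks starting at 0 on its own, the first cell [0, 1] also absorbs all the 1s, so the first bar comes out about twice as tall as the others.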
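To see what happened in the question, you can ask hist() for its breaks without drawing anything. The sketch below regenerates data like the question's (the seed is an arbitrary addition), prints the default breaks and counts, and then uses breaks at half-integers so each cell is centred on one integer. `barplot(table(...))` is shown as an alternative that avoids binning altogether; it is a swapped-in technique, not part of the original answer.

```r
set.seed(1)  # arbitrary seed for reproducibility
b <- sample(0:13, 100000, replace = TRUE)

# Inspect the breaks hist() chooses by default (Sturges rule, rounded by pretty())
default_h <- hist(b, plot = FALSE)
print(default_h$breaks)
print(default_h$counts)

# Breaks at half-integers put exactly one integer value in each cell
centred_h <- hist(b, breaks = seq(-0.5, 13.5, by = 1), plot = FALSE)
print(centred_h$counts)

# For discrete data, a bar chart of the counts sidesteps binning entirely
barplot(table(b))
```

Centring the cells on the integers makes the result independent of the right/include.lowest settings, since no value ever lands on a break.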
