sample() in R returns a non-random sample once the population vector is longer than 13 elements. Why?
The following code returns a sample that looks perfectly fine:
b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12), 100000, replace=TRUE)
hist(b)
Increasing the number of elements by one, to 14, results in:
b <- sample(c(0,1,2,3,4,5,6,7,8,9,10,11,12,13), 100000, replace=TRUE)
hist(b)
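This is clearly incorrect: zero appears more often than it should. Is there a reason for this?
The problem is with hist(), not sample().
You can check this as follows: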
> table(sample(0:15, 10000, replace=T))
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
634 642 664 654 628 598 633 642 647 625 587 577 618 645 615 591
From the hist help page:
If right = TRUE (default), the histogram cells are intervals of the
form (a, b], i.e., they include their right-hand endpoint, but not
their left one, with the exception of the first cell when
include.lowest is TRUE.
For right = FALSE, the intervals are of the form [a, b), and
include.lowest means ‘include highest’.
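The effect of that rule can be checked without drawing anything. The following is a minimal sketch, assuming the default break selection for 0:13 lands on the integers 0, 1, ..., 13 (which is what the symptom suggests); to avoid depending on that, the breaks are passed explicitly. The seed and the variable names are arbitrary choices for illustration.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
b <- sample(0:13, 100000, replace = TRUE)

# Explicit integer breaks: the cells are [0, 1], (1, 2], ..., (12, 13].
# The first cell is closed on both sides because include.lowest = TRUE
# is the default in hist().
h <- hist(b, breaks = 0:13, plot = FALSE)

# The first cell therefore holds both the 0s and the 1s,
# which is why the first bar is roughly twice as tall as the others.
first_bin <- h$counts[1]
zeros_and_ones <- sum(b == 0) + sum(b == 1)
print(c(first_bin = first_bin, zeros_and_ones = zeros_and_ones))
```

The two printed numbers should be identical, while every other cell holds a single integer value.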
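If you try
hist(sample(0:15, 10000, replace = TRUE), breaks = -1:15)
the result looks correct, because each integer now falls in its own cell: (-1, 0], (0, 1], and so on.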
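Another option, sketched here as an alternative rather than as part of the original answer, is to centre each cell on an integer by placing the breaks at half-integers. No value then sits on a cell boundary, so the right and include.lowest settings no longer matter. The seed and variable names are again illustrative.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible
b <- sample(0:13, 100000, replace = TRUE)

# Breaks at -0.5, 0.5, ..., 13.5: one cell per integer, value at the centre
h <- hist(b, breaks = seq(-0.5, 13.5, by = 1), plot = FALSE)

# The histogram counts now agree with a plain frequency table
counts_match <- all(h$counts == as.vector(table(b)))
print(counts_match)
```

Drop plot = FALSE to draw the histogram. For discrete data like this, barplot(table(b)) gives the same picture without any binning decisions.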