sample() in R unpredictable when vector length is one
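I'm trying to debug a short program, and in some cases I get a disconcerting result towards the end of sampling from the elements of a vector. It happens when the elements remaining in the vector are down to a single value.
In the particular case I'm referring to, the vector is called remaining and contains a single element, the number 2. I would expect any sample of size 1 from this vector to stubbornly return 2, since 2 is the only element in the vector, but that is not the case: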
Browse[2]> is.vector(remaining)
[1] TRUE
Browse[2]> sample(remaining,1)
[1] 2
Browse[2]> sample(remaining,1)
[1] 2
Browse[2]> sample(remaining,1)
[1] 1
Browse[2]> sample(x=remaining, size=1)
[1] 1
Browse[2]> sample(x=remaining, size=1)
[1] 2
Browse[2]> sample(x=remaining, size=1)
[1] 1
Browse[2]> sample(x=remaining, size=1)
[1] 1
Browse[2]> sample(x=remaining, size=1)
[1] 1
As you can see, sometimes the return value is 1 and other times it is 2.
What am I misunderstanding about the function sample()?
From help("sample"):
If x has length 1, is numeric (in the sense of is.numeric) and x >= 1,
sampling via sample takes place from 1:x.
So when you have remaining = 2, sample(remaining) is equivalent to sample(x = 1:2).
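To make that concrete, here is a minimal sketch of a wrapper that always samples from the elements of its argument, by drawing indices with sample.int() and subsetting. The name safe_sample is hypothetical, chosen only for this illustration; the idea follows the resample() pattern shown in the examples of help("sample").

```r
# Hypothetical helper (not part of base R): sample from the elements of x
# by drawing positions, so a length-one numeric x is never expanded to 1:x.
safe_sample <- function(x, size = length(x), replace = FALSE) {
  x[sample.int(length(x), size = size, replace = replace)]
}

remaining <- 2

# Every draw comes from the single element of `remaining`.
draws <- replicate(1000, safe_sample(remaining, 1))
all(draws == 2)  # TRUE

# With more than one element it behaves like an ordinary permutation.
perm <- safe_sample(1:10)
```

Because only positions are sampled, the same wrapper also works unchanged for character or logical vectors.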
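Update
From the comments it is apparent that you are also looking for a way around this. Here is a benchmark comparison of the three alternatives that were mentioned: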
library(microbenchmark)
# if remaining is of length one
remaining <- 2
microbenchmark(a = {if ( length(remaining) > 1 ) { sample(remaining) } else { remaining }},
b = ifelse(length(remaining) > 1, sample(remaining), remaining),
c = remaining[sample(length(remaining))])
Unit: nanoseconds
expr min lq mean median uq max neval cld
a 349 489 625.12 628.0 663.5 3283 100 a
b 1536 1886 2240.58 2025.0 2165.5 13898 100 b
c 4051 4400 5193.41 4679.5 5064.0 38413 100 c
# If remaining is not of length one
remaining <- 1:10
microbenchmark(a = {if ( length(remaining) > 1 ) { sample(remaining) } else { remaining }},
b = ifelse(length(remaining) > 1, sample(remaining), remaining),
c = remaining[sample(length(remaining))])
Unit: microseconds
expr min lq mean median uq max neval cld
a 5.238 5.7970 6.82703 6.251 6.9145 51.264 100 a
b 11.663 12.2920 13.14831 12.851 13.3745 34.851 100 b
c 5.238 5.9715 6.57140 6.426 6.8450 14.667 100 a
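Going by these timings, approach c is marginally faster if sample() is mostly called when remaining has length > 1 (although a and c fall in the same cld group there), whereas the if () {} else {} approach (a) is clearly faster when remaining has length 1.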
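One caveat about alternative b that the timings do not show: ifelse() returns a result shaped like its test argument, so with a length-one condition it yields only a single element rather than the whole permuted vector. A small check, using the same remaining as above, illustrates the difference between a and b:

```r
remaining <- 1:10

# Alternative b: the test has length 1, so the result has length 1 too.
b <- ifelse(length(remaining) > 1, sample(remaining), remaining)
length(b)  # 1

# Alternative a: plain if/else returns the full permutation.
a <- if (length(remaining) > 1) sample(remaining) else remaining
length(a)  # 10
```

So b is only interchangeable with a and c when a single value is wanted.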