R: Random sampling a minimum number of observations from a category

Question

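I have location data from a GPS collar, and I am trying to simulate in R different scenarios based on how the collar functions. One of the scenarios is that the collar tends to miss some of its GPS fixes during a day (for various reasons). My data contain 14 GPS points per day, and I would like to randomly select (without replacement) a minimum of 5 points per day, with a possible maximum of 14.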

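In another simulation I used the script below, taken from another thread (R: Random sampling an even number of observations from a range of categories), to extract 5 random points per day, but I do not understand all of its parts well enough to change it so that it extracts a minimum of 5 points instead. Any suggestions are much appreciated.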

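library(data.table)
dat2 <- data.table(dat.r)
# for each DayNo, keep 5 randomly chosen rows (or all rows if the day has fewer than 5)
dat2.ss <- dat2[ , .SD[sample(1:.N,min(5,.N))], by=DayNo]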

Output of the data frame (dat.r)

dput(head(dat.r, 20))
structure(list(Latitude = c(5.4118432, 5.4118815, 5.4115713, 
5.4111541, 5.4087853, 5.4083702, 5.4082527, 5.4078161, 5.4075528, 
5.407321, 5.4070598, 5.4064237, 5.4070621, 5.4070251, 5.4070555, 
5.4065127, 5.4065134, 5.4064872, 5.4056724, 5.4038751), Longitude = c(118.0225467, 
118.0222841, 118.0211875, 118.0208637, 118.0205413, 118.0206064, 
118.0204101, 118.0209272, 118.0213827, 118.0214189, 118.0217748, 
118.0223343, 118.0227079, 118.0226511, 118.0226916, 118.0220733, 
118.02218, 118.0221843, 118.0223316, 118.0198153), DayNo = c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L)), .Names = c("Latitude", "Longitude", "DayNo"), row.names = c(NA, 
20L), class = "data.frame")

Answer 1

This should work:

library(data.table)
set.seed(1)    # for reproducible example
setDT(dat.r)[,.SD[sample(.N, sample(min(5,.N):min(.N,14),1))], by=DayNo]
#     DayNo Latitude Longitude
#  1:     1 5.411881  118.0223
#  2:     1 5.411154  118.0209
#  3:     1 5.407553  118.0214
#  4:     1 5.411843  118.0225
#  5:     1 5.411571  118.0212
#  6:     1 5.407062  118.0227
#  7:     1 5.408785  118.0205
#  8:     1 5.408370  118.0206
#  9:     2 5.406513  118.0221
# 10:     2 5.407025  118.0227
# 11:     2 5.406513  118.0222
# 12:     2 5.405672  118.0223
# 13:     2 5.403875  118.0198

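The idea is that sample(x, n) draws a sample of size n from the vector 1:x (where x is a single number, not a vector). So you want n itself to be sampled from 5:min(.N,14). I also allowed for the possibility that a given day has fewer than five points.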
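One edge case is worth checking. Because sample() treats a single number x as 1:x, the range min(5,.N):min(.N,14) collapses to one number on any day with five or fewer points, and sample(5, 1) then draws from 1:5 rather than returning 5. The sketch below is one possible way around this, not part of the original answer: draw_size is a hypothetical helper and toy is made-up data standing in for dat.r.

```r
library(data.table)

# Hypothetical helper (not from the original answer): pick a sample size
# between `lo` and `hi`, capped at the n points available on that day.
draw_size <- function(n, lo = 5L, hi = 14L) {
  sizes <- seq(min(lo, n), min(hi, n))
  # Index into `sizes` instead of calling sample(sizes, 1), so that a
  # length-one `sizes` is never reinterpreted as "sample from 1:x".
  sizes[sample.int(length(sizes), 1L)]
}

# Made-up data standing in for dat.r: days with 14, 5 and 3 fixes.
toy <- data.table(DayNo = rep(1:3, times = c(14L, 5L, 3L)))
toy[, fix := seq_len(.N), by = DayNo]

set.seed(1)  # for a reproducible example
sub <- toy[, .SD[sample.int(.N, draw_size(.N))], by = DayNo]

# Rows kept per day: 5 to 14 for day 1, exactly 5 for day 2, all 3 for day 3.
counts <- sub[, .N, by = DayNo]
print(counts)
```

On days with more than five points this behaves like the one-liner above; the only difference is that days with five or fewer points keep all of their rows.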
