Select elements closest to a defined value while avoiding duplicates

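I have the following problem. I have many plots covering a biological gradient. From these plots, I want to select the 25 that cover the gradient best. To do so, I extracted the minimum and maximum values and computed an evenly spaced sequence of ideal values across that range. I then selected the plot closest to each ideal value. This works well. However, sometimes a single plot is the closest one to two ideal values, so I end up with duplicates in the list, which I want to avoid. Obviously, I could increase length.out, but from my perspective that is not an optimal solution. I would like to end up with 25 selected, unique plots.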

The code below illustrates the problem: length.out is set to 25, but only 19 plots are selected.

data <- structure(list(Plot = c("3", "4", "5", "6", "8", "12", "14", 
"15", "17", "18", "19", "20", "21", "22", "23", "25", "26", "28", 
"29", "30", "32", "33", "34", "35", "36", "37", "38", "39", "40", 
"41", "42", "43", "44", "45", "46", "47", "48", "49"), Value = c(2.19490722347427, 
0.817884294633935, 0.834577676660982, 1.19923035999043, 0.293146158435238, 
1.93237941781986, 1.74536845664897, 2.22904916731729, 0.789604037117133, 
0.439716474953651, 0.834321473446987, 1.07386786707173, 0.977203815084214, 
0.539717907433468, 0.950019385036826, 1.10794069639141, 1.41499437622422, 
1.12933520841724, 1.99342508363262, 1.05715847816517, 2.27711128641038, 
1.9766526350752, 2.16657914911448, 2.01955890337827, 1.1080527140292, 
1.16614766657035, 1.04478527637105, 0.980792736677819, 0.818000882117776, 
0.656157422806534, 1.07223822052094, 0.799912719334531, 0.4365715090508, 
0.824331627537106, 1.19478221856558, 1.06047128780385, 1.54822823084764, 
0.582397279167692)), class = "data.frame", row.names = c("3", 
"4", "5", "6", "8", "12", "14", "15", "17", "18", "19", "20", 
"21", "22", "23", "25", "26", "28", "29", "30", "32", "33", "34", 
"35", "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", 
"46", "47", "48", "49"))

opt_seq <- seq(min(data$Value), max(data$Value), length.out = 25)
sel_plots <- sapply(opt_seq, function(i) which.min(abs(data$Value - i)))  # 25 row indices, some repeated
length(unique(sel_plots))

Thank you very much for your help!

You can try the following, which marks each chosen row as used so that it cannot be picked again:

sel_plots <- logical(nrow(data))
for(i in opt_seq) {
  sel_plots[which(!sel_plots)[which.min(abs(data$Value[!sel_plots] - i))]] <- TRUE
}
sel_plots <- which(sel_plots)
length(unique(sel_plots))
#[1] 25
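
The same idea can be wrapped in a small function that also keeps the order in which the targets were matched. This is only a sketch: the function name select_unique_closest and the toy data frame are made up for illustration and are not part of the question. Note that the procedure is greedy, so the result depends on the order of the targets; a late target may be left with a distant plot if closer ones were already taken.

```r
# Hypothetical helper: for each target, pick the closest row that is still unused.
select_unique_closest <- function(values, targets) {
  stopifnot(length(targets) <= length(values))
  taken  <- logical(length(values))      # TRUE once a row has been used
  picked <- integer(length(targets))     # chosen row index per target
  for (k in seq_along(targets)) {
    free <- which(!taken)                # indices of rows still available
    j <- free[which.min(abs(values[free] - targets[k]))]
    taken[j]  <- TRUE
    picked[k] <- j
  }
  picked
}

# Toy data, invented for this example (not the plots from the question).
toy <- data.frame(Plot  = c("a", "b", "c", "d", "e", "f"),
                  Value = c(0.10, 0.12, 0.15, 0.20, 0.92, 1.00),
                  stringsAsFactors = FALSE)
targets <- seq(min(toy$Value), max(toy$Value), length.out = 5)

# Naive matching picks row 4 twice; the greedy helper does not.
naive <- sapply(targets, function(t) which.min(abs(toy$Value - t)))
idx   <- select_unique_closest(toy$Value, targets)
toy[idx, ]
```

Here the last target (1.00) ends up with plot "c" because "f" was already used for the previous target, which illustrates the order dependence mentioned above.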

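One way to do this is to find, inside a for loop, the element whose absolute difference from the target has the lowest rank (using which.min), and to remove that element after each iteration.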

y <- data$Value  ## copy values column
r <- c()  ## initialize result vector

for (x in opt_seq) {
  i <- which.min(rank(abs(x - y)))
  r <- c(r, y[i])  ## store the selected value (not the plot ID)
  y <- y[-i]
}
r
# [1] 0.2931462 0.4365715 0.4397165 0.5397179 ...
stopifnot(!any(duplicated(r)) & length(r) == 25)
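
The loop above collects the selected values rather than the plot labels. If the labels are needed, one option is to shrink a working copy of the whole data frame instead of only the value vector. The following is a minimal sketch on invented toy data; the names toy, remaining and chosen are assumptions for this example.

```r
# Toy data, invented for this example (not the plots from the question).
toy <- data.frame(Plot  = c("a", "b", "c", "d", "e", "f"),
                  Value = c(0.10, 0.12, 0.15, 0.20, 0.92, 1.00),
                  stringsAsFactors = FALSE)
targets <- seq(min(toy$Value), max(toy$Value), length.out = 5)

remaining <- toy            # working copy that loses one row per iteration
chosen <- character(0)      # plot labels in the order they were selected

for (x in targets) {
  i <- which.min(abs(x - remaining$Value))  # closest remaining plot
  chosen <- c(chosen, remaining$Plot[i])    # record its label
  remaining <- remaining[-i, ]              # drop it so it cannot be reused
}
chosen
```

Because each selected row is removed before the next target is processed, chosen cannot contain the same plot twice as long as the labels themselves are unique.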