Threshold based on sample size in each dataset in R

I want to create a threshold in R that is based on sample size and that differs for each dataset. Suppose I have the following two datasets:

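# Dataset 1: 200 rows x 10 columns of standard normal draws
data <- matrix(rnorm(200 * 10, mean = 0, sd = 1), 200, 10)
colnames(data) <- c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10")

# Dataset 2: 100 rows x 10 columns (note: this reuses the name `data`)
data <- matrix(rnorm(100 * 10, mean = 0, sd = 1), 100, 10)
colnames(data) <- c("X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "X9", "X10")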

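I want to create a threshold that is different each time and is calculated from the sample size. I would like to encode the picture below as a threshold, with the thresholds on the left and the sample size of each column (= n) on the right. How can I do this?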

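As I understand it, you want to get the number of rows of the matrix and then pick a threshold accordingly.

One way is to create a table of sample sizes and thresholds. Then, given a sample size, filter the table so that the last remaining row holds the applicable threshold, and extract that value.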

# Create a lookup table of thresholds (rows sorted by increasing n)
thresholds <- data.frame(n = c(50, 60, 70, 85), # etc
                         threshold = c(.75, .7, .65, .6) )

sample_size <- nrow(data)

# Keep the rows whose n does not exceed the sample size, then take the last threshold
tail(thresholds[thresholds$n <= sample_size, "threshold"], 1)

# Or if you're more comfortable with dplyr / tidyverse
library(dplyr)
thresholds <- tibble(n = c(50, 60, 70, 85), # etc
                     threshold = c(.75, .7, .65, .6) )
thresholds %>% 
   filter(n <= nrow(data)) %>%
   pull(threshold) %>%
   last()

If you plan to do this for several datasets, you can wrap it in a function:

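# Requires dplyr. Returns the threshold for the largest n in .thresholds
# that does not exceed nrow(data).
get_threshold <- function(data, .thresholds = thresholds) {
   .thresholds %>% 
      filter(n <= nrow(data)) %>%
      pull(threshold) %>%
      last()
}

get_threshold(data)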
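
If you would rather not depend on dplyr, the same lookup can be sketched in base R with findInterval(). This is only an illustration under a few assumptions: the threshold values are the placeholder ones used above (not the ones from your picture), the table is sorted by increasing n, and the names get_threshold_base and datasets are made up for this example. It also returns NA when a dataset has fewer rows than the smallest n in the table, which you may want to handle differently.

```r
# Placeholder threshold table; replace the values with the ones from your picture.
# Assumption: rows are sorted by increasing n, which findInterval() requires.
thresholds <- data.frame(n = c(50, 60, 70, 85),
                         threshold = c(.75, .7, .65, .6))

# Hypothetical base R variant of the lookup.
# findInterval() returns the index of the last n that is <= the sample size.
get_threshold_base <- function(data, .thresholds = thresholds) {
  idx <- findInterval(nrow(data), .thresholds$n)
  # idx is 0 when the sample is smaller than the smallest n in the table
  if (idx == 0) return(NA_real_)
  .thresholds$threshold[idx]
}

# Keep the datasets in a named list so that none of them overwrites another
datasets <- list(
  big   = matrix(rnorm(200 * 10), 200, 10),
  small = matrix(rnorm(100 * 10), 100, 10),
  mid   = matrix(rnorm(65 * 10), 65, 10),
  tiny  = matrix(rnorm(30 * 10), 30, 10)
)

# One threshold per dataset, returned as a named numeric vector
sapply(datasets, get_threshold_base)
```

Storing the matrices in a list avoids reusing the name data for both datasets, and sapply() then gives you the threshold for each one in a single call.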