Threshold based on sample size in each dataset in R

Question

I want to create a threshold in R that is based on sample size and differs for each dataset. Suppose I have the following two datasets:

data<-matrix( rnorm(200*10,mean=0,sd=1), 200, 10) 
colnames(data) <- c("X1", "X2", "X3", "X4", "X5", "X6","X7","X8","X9","X10")

data<-matrix( rnorm(100*10,mean=0,sd=1), 100, 10) 
colnames(data) <- c("X1", "X2", "X3", "X4", "X5", "X6","X7","X8","X9","X10")

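I want to create a threshold that differs each time and is calculated from the sample size. I want to encode the table in the image below as a threshold, where the left column is the threshold and the right column is the sample size (= n) of each column. How can I do this?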

Answer 1

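As I understand it, you want to get the number of rows of the matrix and then pick a threshold based on it.

One approach is to create a table of sample sizes and thresholds. Then, given a sample size, filter the table down to the rows whose n does not exceed it, so that the last remaining row holds the applicable threshold, and extract that value.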

# create table of thresholds
thresholds <- data.frame(n = c(50, 60, 70, 85), # etc
                         threshold = c(.75, .7, .65, .6) )

sample_size <- nrow(data)

tail(thresholds[thresholds$n <= sample_size, "threshold"], 1)

# Or if you're more comfortable with dplyr / tidyverse
library(dplyr)  # provides tibble(), filter(), pull(), last() and %>%

thresholds <- tibble(n = c(50, 60, 70, 85), # etc
                     threshold = c(.75, .7, .65, .6))
thresholds %>%
   filter(n <= nrow(data)) %>%
   pull(threshold) %>%
   last()
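One edge case worth noting: both versions above return an empty result when the sample size is smaller than the smallest n in the table. The following is a minimal base-R sketch of the same lookup using findInterval(), which is an alternative to the filtering approach and not part of the original answer. It assumes the table is sorted by increasing n and uses only the four rows given above; lookup_threshold is a hypothetical name.

```r
# Threshold table from the answer (only the four rows given there)
thresholds <- data.frame(n = c(50, 60, 70, 85),
                         threshold = c(.75, .7, .65, .6))

# Hypothetical helper: look up the threshold for a given sample size.
# Assumes .thresholds$n is sorted in increasing order.
lookup_threshold <- function(sample_size, .thresholds = thresholds) {
  # findInterval() returns the index of the last n that is <= sample_size,
  # or 0 when sample_size is below the smallest n in the table
  idx <- findInterval(sample_size, .thresholds$n)
  if (idx == 0) return(NA_real_)
  .thresholds$threshold[idx]
}

lookup_threshold(65)   # 0.7
lookup_threshold(200)  # 0.6
lookup_threshold(30)   # NA (below the smallest n)
```

Returning NA rather than an empty vector may make it easier to spot datasets that fall outside the table.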

If you plan to do this for several datasets, you can wrap it in a function:

get_threshold <- function(data, .thresholds = thresholds) {
   .thresholds %>% 
      filter(n <= nrow(data)) %>%
      pull(threshold) %>%
      last()
}
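As a usage sketch, the function can be applied to several datasets at once with sapply(). The list name datasets, its element names, and the 65-row matrix are assumptions made up for illustration, and the table contains only the four rows shown in the answer.

```r
library(dplyr)

# Threshold table and function as defined in the answer
thresholds <- tibble(n = c(50, 60, 70, 85),
                     threshold = c(.75, .7, .65, .6))

get_threshold <- function(data, .thresholds = thresholds) {
   .thresholds %>%
      filter(n <= nrow(data)) %>%
      pull(threshold) %>%
      last()
}

# Hypothetical example datasets with different sample sizes
set.seed(1)
datasets <- list(
   a = matrix(rnorm(200 * 10), 200, 10),
   b = matrix(rnorm(65 * 10), 65, 10)
)

# One threshold per dataset, named after the list elements
result <- sapply(datasets, get_threshold)
result
#   a   b
# 0.6 0.7
```

Each dataset gets the threshold of the largest tabulated n that does not exceed its number of rows.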

