How would I automate dropping a column in R based on summary data for that column?

Question

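I have a dataset that I am using to create an automated dashboard. Essentially, it looks at specific conditions in relation to the cost of care for a healthcare facility on a month-to-month basis. What I want to do, in pseudocode: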

dataset %>% select(-c("columns where the average value is lower than X"))

Googling hasn't gotten me close.

Answer 1

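We can use select_if. Note that the predicate below keeps the numeric columns whose mean is below val (that is, it picks out the columns to be dropped); negate the predicate to drop those columns instead.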

library(dplyr)
val <- 10
dataset %>%
    select_if(~ is.numeric(.) && mean(.) < val)
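Since the question asks to drop the low-mean columns rather than keep them, here is a minimal sketch of the negated form. It assumes dplyr 1.0.0 or later, where `where()` is available and `select_if` is superseded. The names `kept`, the `na.rm = TRUE` handling, and the `isTRUE()` guard are my additions for illustration, not part of the original answer.

```r
library(dplyr)

# Sample data from this answer
dataset <- data.frame(col1 = c("A", "B", "C"), col2 = c(10, 8, 15),
                      col3 = c(3, 4, 7), stringsAsFactors = FALSE)
val <- 10

# Drop every numeric column whose mean is below val.
# Non-numeric columns never match the predicate, so they are kept.
# isTRUE() guards against an NA result for all-missing columns (assumption).
kept <- dataset %>%
  select(!where(~ is.numeric(.x) && isTRUE(mean(.x, na.rm = TRUE) < val)))

names(kept)
# col1 and col2 remain; col3 (mean about 4.67) is dropped
```

The threshold comparison stays in one place, so changing `val` or swapping `mean` for another summary such as `median` only touches the predicate.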

Or using base R:

# means of the numeric columns only (is.numeric also covers integer columns)
num_means <- colMeans(dataset[sapply(dataset, is.numeric)])
dataset[, names(which(num_means < val)), drop = FALSE]
#   col3
#1    3
#2    4
#3    7
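The base R call above returns the columns whose mean is below val. A sketch of the complementary step, dropping those columns and keeping everything else, might look like the following. The names `num_means`, `low_cols`, and `kept_base` are hypothetical, and `na.rm = TRUE` is an assumption about how missing values should be treated.

```r
# Sample data from this answer
dataset <- data.frame(col1 = c("A", "B", "C"), col2 = c(10, 8, 15),
                      col3 = c(3, 4, 7), stringsAsFactors = FALSE)
val <- 10

# Means of the numeric columns only
is_num <- vapply(dataset, is.numeric, logical(1))
num_means <- colMeans(dataset[is_num], na.rm = TRUE)

# Names of the columns whose mean is below the threshold
# (which() silently skips NA comparisons)
low_cols <- names(which(num_means < val))

# Keep every column that is not in low_cols
kept_base <- dataset[, setdiff(names(dataset), low_cols), drop = FALSE]

names(kept_base)
# col1 and col2 remain; col3 is dropped
```

Selecting by name with `setdiff()` avoids negative indexing on an empty set, so the data frame comes back unchanged when no column falls below the threshold.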

The sample data used above:

dataset <- data.frame(col1 = c('A', 'B', 'C'), col2 = c(10, 8, 15),
     col3 = c(3, 4, 7), stringsAsFactors = FALSE)
