Subsetting columns where all values are the same character value in R

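I am trying to identify the columns of a data frame in which every value is the single character value "tree".

Here is a sample dataset.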

df <- data.frame(id = c(1,2,3,4,5),
                 var.1 = c(5,6,7,"tree",4),
                 var.2 = c("tree","tree","tree","tree","tree"),
                 var.3 = c(4,5,8,9,1))

> df
  id var.1 var.2 var.3
1  1     5  tree     4
2  2     6  tree     5
3  3     7  tree     8
4  4  tree  tree     9
5  5     4  tree     1

I would flag the var.2 variable because all of its values are "tree".

> flagged
[1] "var.2"

Any ideas? Thanks!

For each column, check whether all elements are equal to the first element.

df <- data.frame(id = c(1,2,3,4,5),
                 var.1 = c(5,6,7,"tree",4),
                 var.2 = c("tree","tree","tree","tree","tree"),
                 var.3 = c(4,5,8,9,1))


names(df)[sapply(df, function(x) all(x == x[1]))]
#> [1] "var.2"

Created on 2022-02-17 by the reprex package (v2.0.1)
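
Note that the one-liner above flags any constant column, whatever the repeated value is, and that a column containing NA makes the comparison return NA instead of TRUE or FALSE. If you only want columns made up entirely of "tree", a minimal sketch could look like the following. The helper name flag_constant_columns and its value argument are my own, not part of base R, and treating a column with any NA as "not a match" is an assumption you may want to change.

```r
# Hypothetical helper (not part of base R): return the names of the columns
# whose values are all equal to `value`.
# Assumption: a column containing any NA is never flagged.
flag_constant_columns <- function(data, value) {
  is_match <- vapply(
    data,
    # as.character() lets the same test work on numeric and factor columns
    function(x) !anyNA(x) && all(as.character(x) == value),
    logical(1)
  )
  names(data)[is_match]
}

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 var.1 = c(5, 6, 7, "tree", 4),
                 var.2 = c("tree", "tree", "tree", "tree", "tree"),
                 var.3 = c(4, 5, 8, 9, 1))

flagged <- flag_constant_columns(df, "tree")
flagged
#> [1] "var.2"
```

vapply() is used instead of sapply() here so the result is guaranteed to be a logical vector with one element per column.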

With dplyr, you can do:

library(dplyr)

flagged <- df %>%
  select(where(~n_distinct(.x) == 1 && unique(.x) == "tree")) %>%
  names()

This selects all columns that have exactly one distinct value, equal to "tree", and then extracts the column names.
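
For reference, here is a self-contained variant of the same idea that should give the same result on the sample data. It is only a sketch: it assumes dplyr 1.0.0 or later (for where()), replaces the two conditions with a single all() check, and wraps that check in isTRUE() so that a column containing NA is treated as not matching, which is my assumption and not something the question specifies.

```r
library(dplyr)

df <- data.frame(id = c(1, 2, 3, 4, 5),
                 var.1 = c(5, 6, 7, "tree", 4),
                 var.2 = c("tree", "tree", "tree", "tree", "tree"),
                 var.3 = c(4, 5, 8, 9, 1))

flagged <- df %>%
  # keep only the columns in which every value equals "tree";
  # isTRUE() turns an NA result (column with missing values) into FALSE
  select(where(~ isTRUE(all(as.character(.x) == "tree")))) %>%
  names()

flagged
#> [1] "var.2"
```

If every value equals "tree", the column necessarily has a single distinct value, so the separate n_distinct() test is not needed in this version.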