r dplyr ends_with multiple string matches

Question

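Can I use dplyr::select(ends_with) to select column names that match any of several conditions? Given my column names, I want to use ends_with rather than contains or matches, because the strings I want to select on are relevant at the end of a column name but may also appear in the middle of other names. For example,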

df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)

I want to select the columns that end with 1 or 2, so that only columns a11 and a12 are kept. Is select(ends_with) the best way to do this?

Thanks!

Answer 1

I don't know whether ends_with() is the best way to do this, but you can also do it in base R with logical indexing.

# Extract the last character of the column names, and test if it is "1" or "2"
lgl_index <- substr(x     = names(df), 
                    start = nchar(names(df)), 
                    stop  = nchar(names(df))) %in% c("1", "2")

With this index, you can subset the data frame as follows

df[, lgl_index]
  a11 a12
1   5   5
2   6   6
3   7   7
4   8   8

or with dplyr::select()

select(df, which(lgl_index))
  a11 a12
1   5   5
2   6   6
3   7   7
4   8   8

to keep only the columns that end with 1 or 2.
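
The same logical index can also be built for suffixes of any length with base R's endsWith(). The following is only a sketch: ends_with_any is a hypothetical helper name that does not come from the answer above, and it assumes a version of R recent enough to provide endsWith().

```r
df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)

# Hypothetical helper (not part of base R or dplyr):
# returns TRUE for each element of x that ends with any of the suffixes.
ends_with_any <- function(x, suffixes) {
  # One logical vector per suffix
  hits <- lapply(suffixes, function(s) endsWith(x, s))
  # Combine them element-wise with OR
  Reduce(`|`, hits)
}

lgl_index <- ends_with_any(names(df), c("1", "2"))

# drop = FALSE keeps a data frame even if only one column matches
res <- df[, lgl_index, drop = FALSE]
names(res)
```

Unlike the substr() version, this does not assume that every suffix is a single character.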

Answer 2

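You can also do this with a regular expression. I know you didn't want to use matches at first, but it actually works quite well if you use the "end of string" symbol $. Separate your various endings with |.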

df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)

df %>% select(matches('1$|2$'))
  a11 a12
1   5   5
2   6   6
3   7   7
4   8   8
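
To see why the $ anchor matters, here is a small base R sketch that applies grepl() to the same column names with and without the anchor. The variable names are only illustrative.

```r
cols <- c("a10", "a11", "a20", "a12")

# Anchored: the digit must be the last character of the name
anchored <- grepl("1$|2$", cols)

# Unanchored: the digit may appear anywhere in the name
unanchored <- grepl("1|2", cols)

cols[anchored]    # only the names ending in 1 or 2
cols[unanchored]  # every name, since each contains a 1 or a 2
```

Without $, the pattern behaves like contains() and picks up a10 and a20 as well, which is the situation the question wants to avoid.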

If you have a more complex example with a long list of endings, use paste0 with collapse = '|'.

dff <- data.frame(a11 = 1:3,
                  a12 = 2:4,
                  a13 = 3:5,
                  a16 = 5:7,
                  my_cat = LETTERS[1:3],
                  my_dog = LETTERS[5:7],
                  my_snake = LETTERS[9:11])

my_cols <- paste0(c(1,2,6,'dog','cat'), 
                  '$', 
                  collapse = '|')

dff %>% select(matches(my_cols))

  a11 a12 a16 my_cat my_dog
1   1   2   5      A      E
2   2   3   6      B      F
3   3   4   7      C      G

Answer 3

As of dplyr version 1.0.0, you can combine multiple selections using Boolean logic such as ! (negate), & (and), and | (or).

### Install development version on GitHub first until CRAN version is available
# install.packages("devtools")
# devtools::install_github("tidyverse/dplyr")
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)

df %>% 
  select(ends_with("1") | ends_with("2"))
#>   a11 a12
#> 1   5   5
#> 2   6   6
#> 3   7   7
#> 4   8   8

or use num_range() to select the required columns

df %>% 
  select(num_range(prefix = "a", range = 11:12))
#>   a11 a12
#> 1   5   5
#> 2   6   6
#> 3   7   7
#> 4   8   8
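
As a further sketch, assuming a reasonably recent dplyr/tidyselect installation: ends_with() is documented to accept a character vector and take the union of the matches, so the two suffixes can be passed in a single call. The second selection illustrates the & and ! operators on the same data. The variable names by_vector and by_logic are only illustrative.

```r
library(dplyr, warn.conflicts = FALSE)

df <- data.frame(a10 = 1:4,
                 a11 = 5:8,
                 a20 = 1:4,
                 a12 = 5:8)

# A character vector of suffixes: columns matching any of them are kept
by_vector <- df %>%
  select(ends_with(c("1", "2")))

# Boolean combination: starts with "a" AND does NOT end with "0"
by_logic <- df %>%
  select(starts_with("a") & !ends_with("0"))

names(by_vector)
names(by_logic)
```

Both selections are expected to keep only a11 and a12 for this data frame.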

Created on 2020-02-17 by the reprex package (v0.3.0)
