dplyr::select() with some variables that may not exist in the data frame?

Question

I have a helper function (say foo()) that will be run on various data frames that may or may not contain the specified variables. Suppose I have

library(dplyr)
# tibble() replaces the deprecated data_frame()
d1 <- tibble(taxon=1,model=2,z=3)
d2 <- tibble(taxon=2,pss=4,z=3)

The variables I want to select are

vars <- intersect(names(data),c("taxon","model","z"))

That is, I want foo(d1) to return the taxon, model, and z columns, while foo(d2) returns just taxon and z.

If foo contains select(data,c(taxon,model,z)), then foo(d2) fails (because d2 doesn't contain model). If I use select(data,-pss), then foo(d1) fails in the same way (because d1 doesn't contain pss).

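I know how to do this if I step outside the tidyverse (just return data[vars]), but I'm wondering whether there is a convenient way to do it either (1) with a select() helper of some sort (tidyselect::select_helpers) or (2) with tidyeval (which I still haven't found time to get my head around!).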

Answer 1

You can use one_of(), which gives a warning when a column is absent but otherwise selects the correct columns:

d1 %>%
    select(one_of(c("taxon", "model", "z")))
d2 %>%
    select(one_of(c("taxon", "model", "z")))
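To make the warning behaviour concrete, here is a minimal sketch that collects any warnings raised by one_of() while still returning the selected columns. The names msgs and out are illustrative only, and the exact wording of the warning may differ between tidyselect versions.

```r
library(dplyr)

d2 <- tibble(taxon = 2, pss = 4, z = 3)

# Collect warning messages instead of printing them, then continue.
msgs <- character(0)
out <- withCallingHandlers(
  select(d2, one_of(c("taxon", "model", "z"))),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

names(out)  # the columns that do exist in d2
msgs        # the warning text naming the missing column
```

If the warning is just noise for your use case, wrapping the call in suppressWarnings() is a simpler option, at the cost of hiding any other warnings too.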

Answer 2

Using the built-in anscombe data frame as an example, and noting that z is not a column in anscombe:

anscombe %>% select(intersect(names(.), c("x1", "y1", "z")))

giving:

   x1    y1
1  10  8.04
2   8  6.95
3  13  7.58
4   9  8.81
5  11  8.33
6  14  9.96
7   6  7.24
8   4  4.26
9  12 10.84
10  7  4.82
11  5  5.68

Answer 3

Another option is select_if:

d2 %>% select_if(names(.) %in% c('taxon', 'model', 'z'))

# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3

select_if has been superseded. Use any_of instead:

d2 %>% select(any_of(c('taxon', 'model', 'z')))
# # A tibble: 1 x 2
#   taxon     z
#   <dbl> <dbl>
# 1     2     3

Type ?dplyr::select in R and you will find:

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
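As a rough sketch of how the helper from the question could be written with any_of(), and how all_of() differs, consider the following. The argument name wanted and the variable strict are assumptions made for illustration, and any_of()/all_of() require dplyr 1.0.0 or later.

```r
library(dplyr)

d1 <- tibble(taxon = 1, model = 2, z = 3)
d2 <- tibble(taxon = 2, pss = 4, z = 3)

# foo() is the hypothetical helper from the question: keep whichever
# of the wanted columns exist and silently skip the rest.
foo <- function(data, wanted = c("taxon", "model", "z")) {
  select(data, any_of(wanted))
}

names(foo(d1))  # taxon, model, z
names(foo(d2))  # taxon, z

# all_of() is the strict variant: a missing name is an error.
strict <- tryCatch(
  select(d2, all_of(c("taxon", "model", "z"))),
  error = function(e) "error"
)
strict
```

Unlike one_of(), any_of() emits no warning for missing names, so it suits a helper that is expected to meet data frames with varying columns.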
