dplyr: mutate_at + coalesce: dynamic names of columns

Question

I've been trying to combine mutate_at with coalesce in a case where the column names are generated dynamically.

My example has only five columns, but the real data has many more (and not all of them should be included in the coalesce step).

Example DF:

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

Expected output:

  aa bb cc aa_extra bb_extra
1  1  1  6        2        1
2  2  2  7        2        2
3 NA  2  8       NA        3

Expected output as a structure() call:

structure(list(aa = c(1, 2, NA), bb = c(1, 2, 2), cc = c(6, 7, 
8), aa_extra = c(2, 2, NA), bb_extra = c(1, 2, 3)), class = "data.frame", row.names = c(NA, 
-3L))

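I tried something like the following, but it didn't work ("Only strings can be converted to symbols"). I'd like to avoid creating extra variables and keep everything inside the mutate_at expression, because this is part of a longer dplyr pipeline.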

data_example %>%
  dplyr::mutate_at(
    gsub("_extra", "", grep("_extra$",
                            colnames(.),
                            perl = T,
                            value = T)),
    dplyr::funs(
      dplyr::coalesce(., !!! dplyr::sym(paste0(., "_extra")))
    )
  )
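A plausible reading of that error, offered as a hypothesis rather than a confirmed diagnosis: !!! is evaluated once, when funs() captures the expression, not once per column. At that moment . is the whole data frame bound by %>%, so paste0(., "_extra") deparses each of the five columns into a string and returns five strings, while sym() only accepts a single string. The check below reproduces that step outside the pipe:

```r
library(rlang)

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

# What paste0(., "_extra") produces if . is the whole data frame:
# one deparsed string per column, e.g. "c(1, NA, NA)_extra"
captured <- paste0(data_example, "_extra")
length(captured)  # 5

# sym() needs a single string, so a length-5 vector raises an error
sym_error <- tryCatch(sym(captured), error = function(e) e)
inherits(sym_error, "error")  # TRUE
```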

I also tried this (no error, but the values in the bb column are wrong):

data_example %>%
  dplyr::mutate_at(
    gsub("_extra", "", grep("_extra$",
                            colnames(.),
                            perl = T,
                            value = T)),
    dplyr::funs(
      dplyr::coalesce(., !!as.name(paste0(names(.), "_extra")))
    )
  )
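The same capture-time evaluation, again a likely explanation rather than a confirmed one, would account for the wrong bb values: names(.) returns all five column names, paste0() turns them into five "_extra" names, and as.name() keeps only the first element, aa_extra. Both aa and bb are then coalesced with aa_extra; for aa that happens to give the right answer, for bb it does not:

```r
library(dplyr)

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

# Evaluated once, with . standing for the whole data frame;
# as.name() uses only the first of the five strings
target <- as.name(paste0(names(data_example), "_extra"))
as.character(target)  # "aa_extra"

# Every selected column is paired with that one symbol
wrong_bb <- coalesce(data_example$bb, data_example[[as.character(target)]])
wrong_bb  # 2 2 2, instead of the expected 1 2 2
```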

How can I get the name of the column being processed and pass it to coalesce?

Answer 1

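We can split the dataset into a list of data.frames, grouping columns by their names with the substring "_extra" removed, loop through the list with map, coalesce the columns in each element, and then bind the "_extra" columns from the original dataset back on: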

library(tidyverse)
data_example %>% 
   split.default(str_remove(names(.), "_extra")) %>%
   map_df(~ coalesce(!!! .x)) %>%
   #or use
   # map_df(reduce, coalesce) %>%
   bind_cols(., select(data_example, ends_with("extra")))
# A tibble: 3 x 5
#     aa    bb    cc aa_extra bb_extra
#  <dbl> <dbl> <dbl>    <dbl>    <dbl>
#1     1     1     6        2        1
#2     2     2     7        2        2
#3    NA     2     8       NA        3
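To see what map_df() iterates over, the sketch below builds the intermediate list on its own, using base sub() as a stand-in for the str_remove() step. Because split() turns the names into a factor with sorted levels, the groups (and therefore the coalesced columns) come out in alphabetical order of the stems, which may not match the original column order in real data:

```r
library(dplyr)

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

# Group columns by their name with the "_extra" suffix removed
groups <- split.default(data_example, sub("_extra$", "", names(data_example)))
names(groups)     # "aa" "bb" "cc"
names(groups$aa)  # "aa" "aa_extra"

# Coalescing one group is the same as coalesce(aa, aa_extra)
aa_filled <- do.call(coalesce, unname(as.list(groups$aa)))
aa_filled  # 1 2 NA
```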

Answer 2

Use data.table for melt and dcast, because I can never remember how spread and gather work:

library(data.table)
library(dplyr)
library(purrr)  # for reduce()

data_example %>% 
  mutate(row = row_number()) %>% 
  melt('row') %>% 
  # group each base column with its "_extra" partner, per original row
  group_by(g = sub('_.*$', '', variable), row) %>% 
  mutate(value = reduce(value, coalesce)) %>% 
  dcast(row ~ variable) %>% 
  select(-row)

#   aa bb cc aa_extra bb_extra
# 1  1  1  6        1        1
# 2  2  2  7        2        2
# 3 NA  2  8       NA        2
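In that output, aa_extra (row 1) and bb_extra (row 3) differ from the expected output: reduce(value, coalesce) returns one value per group, which is recycled to every row of the group, including the "_extra" row. The sketch below keeps the same reshape-and-coalesce idea but writes the coalesced value back only to the base column. It swaps in tidyr's pivot_longer()/pivot_wider() for data.table's melt()/dcast(), and the if_else() guard is an addition of this sketch, not part of the answer above:

```r
library(dplyr)
library(tidyr)
library(purrr)

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

result <- data_example %>%
  mutate(row = row_number()) %>%
  # long format: one row per (original row, variable) pair
  pivot_longer(-row, names_to = "variable", values_to = "value") %>%
  # a group holds a base column and its "_extra" partner for one original row
  group_by(row, g = sub("_extra$", "", variable)) %>%
  # only the base column receives the coalesced value; "_extra" rows keep theirs
  mutate(value = if_else(variable == g, reduce(value, coalesce), value)) %>%
  ungroup() %>%
  # back to wide; g is dropped because only row is kept as an id column
  pivot_wider(id_cols = row, names_from = variable, values_from = value) %>%
  select(-row)
```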

Answer 3

Presumably the expected result can now be achieved with mutate + across:

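library(dplyr)
library(stringr)

# For each base column that has an "_extra" partner, fill its NAs from the partner
data_example %>% 
  mutate(across(c(str_subset(names(.), "_extra") %>% str_remove("_extra")),
                ~ coalesce(., get(str_c(cur_column(), "_extra")))))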

  aa bb cc aa_extra bb_extra
1  1  1  6        2        1
2  2  2  7        2        2
3 NA  2  8       NA        3
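If the same step is needed in several pipelines, one option is to wrap it in a small function that can sit anywhere in a %>% chain without creating extra variables. The sketch below is hypothetical: the name coalesce_suffixed() and its suffix argument are made up for illustration, the suffix is treated as a regular expression, and it uses a plain loop over the "_extra" columns instead of across(), so it does not depend on cur_column() or get():

```r
library(dplyr)

# Hypothetical helper: for every column "<x><suffix>" whose base column "<x>"
# also exists, fill the NAs in <x> from <x><suffix>. Other columns are untouched.
coalesce_suffixed <- function(df, suffix = "_extra") {
  pattern <- paste0(suffix, "$")
  for (extra in grep(pattern, names(df), value = TRUE)) {
    base <- sub(pattern, "", extra)
    if (base %in% names(df)) {
      df[[base]] <- coalesce(df[[base]], df[[extra]])
    }
  }
  df
}

data_example <- data.frame(
  aa = c(1, NA, NA),
  bb = c(NA, NA, 2),
  cc = c(6, 7, 8),
  aa_extra = c(2, 2, NA),
  bb_extra = c(1, 2, 3)
)

result <- data_example %>% coalesce_suffixed()
```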


r

dplyr

rlang