Compare multiple columns to one column and return the names of the columns that match a condition

Question

I have a df like this:

df <- data.frame(
  Death = as.Date(c("2017-09-20")),
  First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
  Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")),
  new = c("Second_Date", "First_Date, Second_Date", NA),
  row_number = c(1,2,3))

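I want to create the column 'new': if any column whose name contains the word "Date" is later than the 'Death' date column, I want to return the names of those columns. For example, you can see:

In the first row, Second_Date is after Death, so new = Second_Date
In the second row, both First_Date and Second_Date are after Death, so new = First_Date, Second_Date
In the third row, none of the dates are after Death, so new = NA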

So far I have this code:

df2 <- df %>% mutate(new = Reduce(coalesce, across(contains("Date"), ~ ifelse(. > Death, cur_column(), NA_character_))))

But this only returns the first column, going from left to right, that satisfies the condition. Any help would be greatly appreciated.

Answer 1

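We loop across the columns whose names have '_Date' as a suffix. Where the value is greater than the Death column, we take the column name (cur_column()) and return it in a new column by modifying .names. We then use unite to combine these _new columns into one.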

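library(dplyr)
library(tidyr)
df %>% 
   # for each *_Date column, keep its name where the date is after Death (NA otherwise)
   mutate(across(ends_with("_Date"),
    ~ case_when(.x > Death ~ cur_column()), .names = "{.col}_new")) %>% 
   # paste the non-NA names from the helper *_new columns into 'new'
   unite(new, ends_with("_new"), na.rm = TRUE, sep = ", ") %>%
   # rows with no match come out as ""; turn those into NA
   # (na_if() on a whole data frame errors in dplyr >= 1.1.0, so apply it to the column)
   mutate(new = na_if(new, ""))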

-output

     Death First_Date Second_Date row_number                     new
1 2017-09-20 2016-09-09  2019-05-02          1             Second_Date
2 2017-09-20 2018-09-20  2019-09-20          2 First_Date, Second_Date
3 2017-09-20 2016-09-09  2016-09-09          3                    <NA>

Note: coalesce returns only the first non-NA value in each row, which is why the code in the question keeps just one column name.
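To make the coalesce behaviour concrete, here is a minimal sketch. The vectors `first` and `second` are made up for illustration; they stand in for what the ifelse() step in the question would produce for First_Date and Second_Date.

```r
library(dplyr)

# Hypothetical per-column results of the ifelse() step (illustrative names)
first  <- c(NA, "First_Date", NA)
second <- c("Second_Date", "Second_Date", NA)

# coalesce() picks, position by position, the first value that is not NA,
# so the second row keeps "First_Date" and drops "Second_Date"
picked <- coalesce(first, second)
picked
#> [1] "Second_Date" "First_Date"  NA
```

Because only one value per row can survive, pasting the names together (as unite does above) is needed when every match should be reported.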

Answer 2

Another possible solution, in base R:

df <- data.frame(
  Death = as.Date(c("2017-09-20")),
  First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
  Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")))
         
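# apply() coerces each row to character; ISO-formatted dates still compare correctly as strings
df$new <- apply(df, 1, \(x) if (any(x[1] < x[2:3])) 
             paste(names(df)[c(FALSE, x[1] < x[2:3])], collapse = ", ") else NA)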

df

#>        Death First_Date Second_Date                     new
#> 1 2017-09-20 2016-09-09  2019-05-02             Second_Date
#> 2 2017-09-20 2018-09-20  2019-09-20 First_Date, Second_Date
#> 3 2017-09-20 2016-09-09  2016-09-09                    <NA>
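A variation on the same base R idea, offered only as a sketch: compare the Date columns directly instead of relying on apply() turning the rows into strings. The helper names `date_cols` and `after` are introduced here for illustration, and the code assumes the data frame has more than one row (otherwise sapply() returns a plain vector rather than a matrix).

```r
df <- data.frame(
  Death = as.Date("2017-09-20"),
  First_Date = as.Date(c("2016-09-09", "2018-09-20", "2016-09-09")),
  Second_Date = as.Date(c("2019-05-02", "2019-09-20", "2016-09-09")))

# Columns whose names end in "_Date" (illustrative helper name)
date_cols <- grep("_Date$", names(df), value = TRUE)

# Logical matrix: TRUE where a date column is later than Death
after <- sapply(df[date_cols], function(col) col > df$Death)

# Collect the matching names per row; NA when no column qualifies
df$new <- apply(after, 1, function(hit)
  if (any(hit)) paste(date_cols[hit], collapse = ", ") else NA_character_)

df$new
#> [1] "Second_Date"             "First_Date, Second_Date" NA
```

Selecting the columns by name pattern rather than by position (x[2:3]) means the code keeps working if more _Date columns are added or the column order changes.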

r

dplyr