Aggregate with toString ignoring NA values / Concatenate rows including NAs

The goal is to concatenate rows into strings per unique identifier, where some of the values are NA:

id     year    cat_1       cat_2
001    2021    Too high    NA
001    2021    YOY error   YOY error
002    2021    Too high    Too low    
002    2021    NA          YOY error
003    2021    Too high    NA
003    2021    YOY error   NA
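For a reproducible example, the table above can be built as a data frame. This is a sketch that assumes `id` is stored as an integer (consistent with the `<int>` id column in the tibble output of the answer) rather than as the zero-padded text shown in the table:

```r
# Reproducible version of the example data.
# Assumption: id is an integer, so "001" in the table becomes 1L.
df <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L, 3L),
  year  = rep(2021L, 6),
  cat_1 = c("Too high", "YOY error", "Too high", NA, "Too high", "YOY error"),
  cat_2 = c(NA, "YOY error", "Too low", "YOY error", NA, NA),
  stringsAsFactors = FALSE  # keep the categories as character, not factor
)
str(df)
```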

I'm looking for a more efficient solution than the following:

df <- df %>% group_by(id, year) %>% summarise(across(everything(), toString, na.rm = TRUE))

This results in the NAs being concatenated into the strings:

id     year    cat_1                  cat_2
001    2021    Too high, YOY error    NA, YOY error
002    2021    Too high, NA           Too low, YOY error  
003    2021    Too high, YOY error    NA, NA

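I then remove the "NA" text from the strings (replacing it with empty strings) and convert the resulting empty strings back to NA: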

df[] <- lapply(df, gsub, pattern = "NA, ", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ", NA", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = "NA", replacement = "", fixed = TRUE)
df[df==""] <- NA

I assume I'm misusing na.rm inside summarise? Or is there a different approach?

df %>%
  group_by(id, year) %>%
  summarise(across(everything(), ~toString(na.omit(.x))))
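Why `na.rm = TRUE` had no effect in the question: `toString()` has no `na.rm` argument, so the extra argument is absorbed by `...` and ignored, and `paste()` (which the default `toString()` method calls) turns `NA` into the literal text `"NA"`. A minimal check on a single vector:

```r
x <- c("Too high", NA)

# na.rm is silently ignored: the NA is still pasted as the text "NA"
with_na_rm <- toString(x, na.rm = TRUE)
print(with_na_rm)   # "Too high, NA"

# Removing the NAs before collapsing, as in the answer's code, gives the intended string
without_na <- toString(na.omit(x))
print(without_na)   # "Too high"
```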

# A tibble: 3 x 4
# Groups:   id [3]
     id  year cat_1               cat_2               
  <int> <int> <chr>               <chr>               
1     1  2021 Too high, YOY error "YOY error"         
2     2  2021 Too high            "Too low, YOY error"
3     3  2021 Too high, YOY error ""                  
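Note that for a group where every value is missing (id 3, cat_2) this returns an empty string `""` rather than NA, whereas the question's cleanup ends with NA. A minimal base R sketch of a helper that returns NA in that case; `collapse_non_na` is a hypothetical name, and the data frame is rebuilt from the question's table with `id` assumed to be an integer:

```r
# Hypothetical helper: collapse the non-missing values into one string,
# but return NA (instead of "") when every value in the group is missing.
collapse_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else toString(x)
}

df <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L, 3L),
  year  = rep(2021L, 6),
  cat_1 = c("Too high", "YOY error", "Too high", NA, "Too high", "YOY error"),
  cat_2 = c(NA, "YOY error", "Too low", "YOY error", NA, NA),
  stringsAsFactors = FALSE
)

# na.action = identity keeps rows containing NA so the helper sees them
res <- aggregate(. ~ id + year, df, collapse_non_na, na.action = identity)
print(res)
```

With dplyr, the same helper can be passed directly: `summarise(across(everything(), collapse_non_na))`.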

Base R:

aggregate(.~id + year, df, \(x)toString(na.omit(x)), na.action = identity)

  id year               cat_1              cat_2
1  1 2021 Too high, YOY error          YOY error
2  2 2021            Too high Too low, YOY error
3  3 2021 Too high, YOY error