Aggregate with toString while ignoring NA values / concatenate rows that include NAs
The goal is to concatenate rows into one string per unique identifier, where some of the values are NA:
id   year  cat_1      cat_2
001  2021  Too high   NA
001  2021  YOY error  YOY error
002  2021  Too high   Too low
002  2021  NA         YOY error
003  2021  Too high   NA
003  2021  YOY error  NA
I'm looking for a more efficient solution than the following:
df <- df %>% group_by(id, year) %>% summarise(across(everything(), toString, na.rm = TRUE))
This results in the NAs being concatenated into the strings:
id   year  cat_1                cat_2
001  2021  Too high, YOY error  NA, YOY error
002  2021  Too high, NA         Too low, YOY error
003  2021  Too high, YOY error  NA, NA
I then replace the "NA" substrings with empty strings, and the empty strings with NA:
df[] <- lapply(df, gsub, pattern = "NA, ", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = ", NA", replacement = "", fixed = TRUE)
df[] <- lapply(df, gsub, pattern = "NA", replacement = "", fixed = TRUE)
df[df==""] <- NA
I assume I'm misusing na.rm inside summarise? Or is there a different approach?
df %>%
group_by(id, year) %>%
summarise(across(everything(), ~toString(na.omit(.x))))
# A tibble: 3 x 4
# Groups:   id [3]
     id  year cat_1               cat_2
  <int> <int> <chr>               <chr>
1     1  2021 Too high, YOY error "YOY error"
2     2  2021 Too high            "Too low, YOY error"
3     3  2021 Too high, YOY error ""
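As for why na.rm = TRUE had no effect in the question: as far as I can tell, the default toString() method accepts extra arguments through `...` but does not act on na.rm, so the NA ends up pasted as the text "NA". A minimal sketch in base R is below. The helper collapse_non_na() is a hypothetical name introduced only for this illustration; it returns NA instead of "" when a group has no non-missing values, which seems to be what the question wants.

```r
x <- c("Too high", NA)

# na.rm is accepted via `...` but not used, so NA is pasted as text
with_na_rm <- toString(x, na.rm = TRUE)

# Dropping the NAs before collapsing gives the intended string
without_na <- toString(na.omit(x))

# Hypothetical helper (not part of any package): collapse the non-missing
# values, and return NA rather than "" when nothing is left
collapse_non_na <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) NA_character_ else toString(x)
}

print(with_na_rm)
print(without_na)
print(collapse_non_na(c(NA_character_, NA_character_)))
```

Such a helper could replace the lambda in the dplyr call, for example across(everything(), collapse_non_na), which would make the separate df[df == ""] <- NA step unnecessary.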
Base R:
aggregate(.~id + year, df, \(x)toString(na.omit(x)), na.action = identity)
  id year               cat_1              cat_2
1  1 2021 Too high, YOY error          YOY error
2  2 2021            Too high Too low, YOY error
3  3 2021 Too high, YOY error
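For anyone who wants to reproduce this, here is a self-contained sketch of the base R approach. The data frame is rebuilt by hand from the table in the question, and I am assuming id and year are integers, as the printed output suggests. The last step, turning empty strings back into NA, is taken from the question's own clean-up code rather than from the answer.

```r
# Sample data rebuilt from the question (column types are an assumption)
df <- data.frame(
  id    = c(1L, 1L, 2L, 2L, 3L, 3L),
  year  = 2021L,
  cat_1 = c("Too high", "YOY error", "Too high", NA, "Too high", "YOY error"),
  cat_2 = c(NA, "YOY error", "Too low", "YOY error", NA, NA),
  stringsAsFactors = FALSE
)

# na.action = identity keeps rows containing NA, so each column can drop
# its own missing values inside FUN before being collapsed
res <- aggregate(. ~ id + year, data = df,
                 FUN = function(x) toString(na.omit(x)),
                 na.action = identity)

# Groups with no non-missing values come back as ""; turn those into NA
res[res == ""] <- NA

print(res)
```

Leaving na.action at its default would drop every row that has an NA in any column before aggregating, which is why the answer overrides it.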