r - merge rows in group while replacing NAs
I tried to find an answer to this question but could not. If one already exists, I apologize and will delete my question right away.
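I am trying to merge several rows into one row so that no NA values are left. The merge should be done separately for each group; here the variable id can be used for grouping.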
# initial dataframe
df_start <- data.frame(
id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10))
# desired output
df_end <- data.frame(id = c("as", "bs"),
b = c("A", 6),
c = c(2, 7),
d = c(4, 8),
e = c(3,"B"),
f = c(5, 10))
No need to delete the question; it may help other users. This summarises each group to the first non-NA value in each column.
library(dplyr)
df_start <- data.frame(
id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10))
df_start %>%
group_by(id) %>%
summarise_all(list(~first(na.omit(.))))
Output:
# A tibble: 2 x 6
id b c d e f
<fct> <fct> <dbl> <dbl> <fct> <dbl>
1 as A 2. 4. 3 5.
2 bs 6 7. 8. B 10.
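On dplyr 1.0.0 or later, summarise_all() is superseded by across(). The following is a minimal sketch of the same first-non-NA summary written with across(), assuming dplyr >= 1.0.0 is installed. It uses `.x[!is.na(.x)]` in place of na.omit(), and sets stringsAsFactors = FALSE so columns b and e stay character on any R version (the <fct> columns in the output above come from an older R default).

```r
library(dplyr)  # assumption: dplyr >= 1.0.0, which provides across()

df_start <- data.frame(
  id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
  b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
  c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
  d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
  e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
  f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10),
  stringsAsFactors = FALSE  # keep b and e as character, not factor
)

# For every non-grouping column, keep the first non-NA value in each id group
df_first <- df_start %>%
  group_by(id) %>%
  summarise(across(everything(), ~ first(.x[!is.na(.x)])))

df_first
```

across() skips the grouping column id automatically, so everything() only covers b to f.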
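Of course, if a column has more than one non-NA value within a group, you will lose some data, because only the first value is kept.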
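To check beforehand whether that would happen, one option is to count the non-NA values per group and column. A minimal sketch, again assuming dplyr >= 1.0.0; the names n_values and has_conflicts are only illustrative:

```r
library(dplyr)  # assumption: dplyr >= 1.0.0 for across()

df_start <- data.frame(
  id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
  b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
  c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
  d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
  e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
  f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10),
  stringsAsFactors = FALSE
)

# Number of non-NA values in each column, per id group
n_values <- df_start %>%
  group_by(id) %>%
  summarise(across(everything(), ~ sum(!is.na(.x))))

# TRUE if some column holds more than one non-NA value in some group,
# i.e. keeping only the first value would silently drop data
has_conflicts <- any(as.matrix(n_values[, -1]) > 1)

has_conflicts
```

For the example data every count is 1, so has_conflicts is FALSE and the first-non-NA summary loses nothing.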
Hope this helps. Using dplyr:
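library(dplyr)
# convert every column to character and replace NAs with a blank
df_start <- sapply(df_start, as.character)
df_start[is.na(df_start)] <- " "
df_start <- as.data.frame(df_start, stringsAsFactors = FALSE)
# within each id, paste the column values together and trim the blanks
# (funs() is deprecated in dplyr, so a list of lambdas is used instead)
df_start %>%
group_by(id) %>%
summarise_all(list(~ trimws(paste(., collapse = '')))) -> df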
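If dplyr is not available, the first-non-NA summary can also be sketched in base R with aggregate(). This is a swapped-in alternative, not the method of either answer above. It sets stringsAsFactors = FALSE so b and e stay character, and first_non_na is a hypothetical helper name introduced only for this example:

```r
df_start <- data.frame(
  id = c("as", "as", "as", "as", "as", "bs", "bs", "bs", "bs", "bs"),
  b = c(NA, NA, NA, NA, "A", NA, NA, 6, NA, NA),
  c = c(2, NA, NA, NA, NA, 7, NA, NA, NA, NA),
  d = c(NA, 4, NA, NA, NA, NA, 8, NA, NA, NA),
  e = c(NA, NA, NA, 3, NA, NA, NA, NA, "B", NA),
  f = c(NA, NA, 5, NA, NA, NA, NA, NA, NA, 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first non-NA element of x (NA if x is all NA)
first_non_na <- function(x) x[!is.na(x)][1]

# Apply first_non_na to every column except id, separately for each id group
df_base <- aggregate(df_start[-1], by = list(id = df_start$id), FUN = first_non_na)

df_base
```

Unlike the paste() approach, this keeps only the first value when a group has several, so the same data-loss caveat applies.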