Collapse rows into one for repeated observations and concatenate multiple non-repeating values into a single variable in an R data frame
Pretty sure nobody has asked this question before.
> have
x1 x2 x3
1 Apple Banana Potassium
2 Apple Banana Thiamine
> want
x1 x2 x3
1 Apple Banana Potassium / Thiamine
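x1 and x2 act like ID variables, and x3 is a categorical value. A similar question was asked here: Condensing multiple observations on the same individual into a single row, adding multiples as new columns. However, that approach produces extra columns with NA values for observations where no categorical value is available. I tried converting the NA values to "" and pasting them together, but the result was not ideal. When several NA values have been replaced by blanks, it looks like this: "Potassium / / / "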
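Here is my attempt. I modified your data by adding two rows with NA. I converted x3 to character and replaced NA with "". Using toString and summarise, I combined all the elements of x3. Finally, I changed , to / as described in your question.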
mydf <- structure(list(x1 = structure(c(1L, 1L, 1L, 1L), .Label = "Apple", class = "factor"),
x2 = structure(c(1L, 1L, 1L, 1L), .Label = "Banana", class = "factor"),
x3 = structure(c(1L, 2L, NA, NA), .Label = c("Potassium",
"Thiamine"), class = "factor")), .Names = c("x1", "x2", "x3"
), class = "data.frame", row.names = c("1", "2", "3", "4"))
# x1 x2 x3
#1 Apple Banana Potassium
#2 Apple Banana Thiamine
#3 Apple Banana <NA>
#4 Apple Banana <NA>
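library(dplyr)

# Within each x1/x2 group, convert x3 to character and replace NA with ""
mutate(group_by(mydf, x1, x2),
       x3 = replace(as.character(x3), !complete.cases(x3), "")) %>%
  # Paste all x3 values of a group into a single string
  summarise(x3 = paste(x3, collapse = " / "))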
# x1 x2 x3
#1 Apple Banana Potassium / Thiamine / /
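If the trailing separators are not wanted, one possible variation is to drop the NA values before pasting instead of replacing them with "". This is only a sketch and not part of the answer above: it assumes dplyr 1.0 or later (for the .groups argument), and the names collapsed and mydf2 are purely illustrative.

```r
library(dplyr)

# Same shape as the example data: two real values and two NA rows
mydf2 <- data.frame(
  x1 = factor(rep("Apple", 4)),
  x2 = factor(rep("Banana", 4)),
  x3 = factor(c("Potassium", "Thiamine", NA, NA))
)

collapsed <- mydf2 %>%
  group_by(x1, x2) %>%
  # Keep only the non-missing x3 values, then join them with " / "
  summarise(
    x3 = paste(as.character(x3)[!is.na(x3)], collapse = " / "),
    .groups = "drop"
  )

collapsed$x3
# [1] "Potassium / Thiamine"
```

A group whose x3 values are all NA would end up with an empty string here, so that case may need separate handling.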