Collapse rows into one for repeated observations and concatenate multiple non-repeating variables into a single variable in an R data frame

I'm fairly sure this hasn't been asked before.

> have
     x1     x2        x3
1 Apple Banana Potassium
2 Apple Banana  Thiamine

> want
     x1     x2                   x3
1 Apple Banana Potassium / Thiamine

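x1 and x2 act as ID variables, and x3 is a categorical value. A similar question was asked here: Condensing multiple observations on the same individual into a single row, adding multiples as new columns. However, that approach produces extra columns filled with NA for observations that have no further categorical value. I tried converting the NA values to "" and pasting the columns together, but the result is not ideal. When several NA values are replaced with blanks, it looks like this: "Potassium / / / "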
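The exact code behind that attempt is not shown, so the following is only a minimal sketch of how such a string can arise. The vector `vals` is a hypothetical stand-in for the wide-format values of one individual. It illustrates that blanking the NA values keeps every element in place, so `paste()` still writes a separator between each pair.

```r
# Hypothetical values for one individual after reshaping to wide format:
# one real category followed by three missing ones (assumed, not from the question).
vals <- c("Potassium", NA, NA, NA)

# Replacing NA with "" keeps all four elements, so paste() still
# inserts a separator between every neighbouring pair.
vals[is.na(vals)] <- ""
blanked <- paste(vals, collapse = " / ")
print(blanked)
```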

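Here is my attempt. I modified your data by adding two rows with NA. I converted x3 to character and replaced NA with "". Inside summarise(), I combined all elements of x3 within each x1/x2 group using paste(), with " / " as the separator described in your question.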

mydf <- structure(list(x1 = structure(c(1L, 1L, 1L, 1L), .Label = "Apple", class = "factor"), 
x2 = structure(c(1L, 1L, 1L, 1L), .Label = "Banana", class = "factor"), 
x3 = structure(c(1L, 2L, NA, NA), .Label = c("Potassium", 
"Thiamine"), class = "factor")), .Names = c("x1", "x2", "x3"
), class = "data.frame", row.names = c("1", "2", "3", "4"))

#     x1     x2        x3
#1 Apple Banana Potassium
#2 Apple Banana  Thiamine
#3 Apple Banana      <NA>
#4 Apple Banana      <NA>

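library(dplyr)

# Within each x1/x2 group, convert x3 to character and blank out the NAs,
# then paste the values of each group into a single string.
mydf %>%
  group_by(x1, x2) %>%
  mutate(x3 = replace(as.character(x3), is.na(x3), "")) %>%
  summarise(x3 = paste(x3, collapse = " / "))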

#     x1     x2                         x3
#1 Apple Banana Potassium / Thiamine /  /
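
The result above still ends in dangling separators, which is the issue raised in the question. One possible way around it, sketched here under the assumption that the NA values can simply be dropped instead of blanked, is to filter them out inside summarise() before pasting. The `.groups = "drop"` argument assumes dplyr 1.0.0 or later, and the data is rebuilt with `data.frame()` only to keep the example self-contained.

```r
library(dplyr)

# Same sample data as above: two real values and two NAs in x3.
mydf <- data.frame(
  x1 = factor(rep("Apple", 4)),
  x2 = factor(rep("Banana", 4)),
  x3 = factor(c("Potassium", "Thiamine", NA, NA))
)

# Drop the NAs inside each group before pasting, so no empty strings
# are left behind to produce dangling " / " separators.
result <- mydf %>%
  group_by(x1, x2) %>%
  summarise(x3 = paste(as.character(x3[!is.na(x3)]), collapse = " / "),
            .groups = "drop")

print(result$x3)
```

With this variant, a group whose x3 values are all NA ends up with an empty string, which may or may not be what you want for your data.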