Aggregating factors from data frames of different lengths

I have several data frames, for example:

Var1 "Bananas" "Apples" "Oranges" 
Freq    "2"      "2"       "1"              


Var2 "Bananas" "Carrots" "Strawberries" "Apples"
Freq    "3"       "2"        "3"          "4"              

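As output, I would like a data frame, table, or something similar that gives the number of occurrences for each input data frame in one neat overview, including the items that occur 0 times. Something like: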

Var     "Bananas" "Apples" "Oranges" "Carrots" "Strawberries"
Sample1   "2"        "2"      "1"       "0"         "0"
Sample2   "3"        "4"      "0"       "2"         "3"

I can't come up with a solution, especially because data.frames don't allow columns of different lengths.

You should take a look at ?merge:

set.seed(1234)
dat1 <- data.frame(var1 = LETTERS[1:5], freq = sample(1:100, 5))
dat2 <- data.frame(var2 = LETTERS[3:7], freq = sample(1:100, 5))

res <- merge(dat1, dat2, by.x = "var1", by.y = "var2", all = TRUE)
res[is.na(res)] <- 0
res
#   var1 freq.x freq.y
# 1    A     12      0
# 2    B     62      0
# 3    C     60     65
# 4    D     61      1
# 5    E     83     23
# 6    F      0    100
# 7    G      0     50

Note that NA and 0 mean very different things. Take a look at the help file ?dplyr::join

library(dplyr)
df1 <- data.frame(Var1 =c("Bananas", "Apples", "Oranges"), 
           Freq =c(2,2,1))
df2 <- data.frame(Var1 =c("Bananas", "Carrots",
                          "Strawberries", "Apples"), 
                  Freq =c(3,2,3,4))
full_join(df1,df2, by = "Var1")
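If you want the exact layout from the question (one row per sample, one column per item, 0 for anything missing), here is a minimal base R sketch. It assumes every input has the item in its first column and the count in its second, and that a count of 0 really is the right reading of "not present". The names `samples`, `long`, and `wide` are made up for illustration.

```r
# Inputs as in the question; note the item column has a different name in each
df1 <- data.frame(Var1 = c("Bananas", "Apples", "Oranges"),
                  Freq = c(2, 2, 1))
df2 <- data.frame(Var2 = c("Bananas", "Carrots", "Strawberries", "Apples"),
                  Freq = c(3, 2, 3, 4))

# Named list: the names become the row labels of the result
samples <- list(Sample1 = df1, Sample2 = df2)

# Stack everything into one long data frame, tagging each row with its sample.
# Columns are taken by position, so differing column names do not matter.
long <- do.call(rbind, Map(function(d, nm) {
  data.frame(Sample = nm, Var = d[[1]], Freq = d[[2]])
}, samples, names(samples)))

# Cross-tabulate; combinations that never occur are filled with 0
wide <- xtabs(Freq ~ Sample + Var, data = long)
wide

# Optional: convert to a plain data frame with one row per sample
as.data.frame.matrix(wide)
```

The same code works for any number of data frames, since you only need to add them to the list. The columns come out in alphabetical order by default; index with a character vector (e.g. `wide[, c("Bananas", "Apples")]`) if you need a specific order.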