R: append 2 data frames with different columns

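I want to append dfToAdd to df, where the former is missing some columns. The important detail is that df has two types of columns. The columns of the first type are tied to each other: for example, group="A" implies name="Group A" and color="Blue", so a combination such as A / Group A / Red cannot occur. The columns of the second type are likewise tied to each other: animal="Dog" implies action="Bark". I want to append a second data frame that lacks the columns of the first type. Those columns should be filled with every existing combination of the first-type columns, as in dfResult below (the order of the rows does not matter):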

df = data.frame(group = c("A", "A", "A", "B", "B", "B"),
                name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B"),
                color = c("Blue", "Blue", "Blue", "Red", "Red", "Red"),
                animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse"),
                action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak")
                )

dfToAdd = data.frame(animal = c("Lion", "Bird"), 
                     action = c("Roar", "Chirp"))

dfResult = data.frame(group = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
                      name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B", "Group A", "Group A", "Group B", "Group B"),
                      color = c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Blue", "Blue", "Red", "Red"),
                      animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse", "Lion", "Bird", "Lion", "Bird"),
                      action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak", "Roar", "Chirp", "Roar", "Chirp"))
> df
  group    name color animal action
1     A Group A  Blue    Dog   Bark
2     A Group A  Blue    Cat   Meow
3     A Group A  Blue  Mouse Squeak
4     B Group B   Red    Dog   Bark
5     B Group B   Red    Cat   Meow
6     B Group B   Red  Mouse Squeak
> dfToAdd
  animal action
1   Lion   Roar
2   Bird  Chirp
> dfResult
   group    name color animal action
1      A Group A  Blue    Dog   Bark
2      A Group A  Blue    Cat   Meow
3      A Group A  Blue  Mouse Squeak
4      B Group B   Red    Dog   Bark
5      B Group B   Red    Cat   Meow
6      B Group B   Red  Mouse Squeak
7      A Group A  Blue   Lion   Roar
8      A Group A  Blue   Bird  Chirp
9      B Group B   Red   Lion   Roar
10     B Group B   Red   Bird  Chirp

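However, the columns of the first type (group, name, color) are not known in advance. I am dealing with an arbitrary number of grouping variables. You can imagine that there may or may not be a column description="Group A is a good group" or date="2020.04.13". Only the columns of the second type are known for certain: animal and action.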

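While writing this question, I came up with the idea of using [nesting][1] on both sides of tidyr's [complete][2] function and detecting the missing columns manually (maybe there is a more elegant solution):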

library(dplyr)  # provides %>%
library(rlang)  # provides syms()
# First find all grouping columns
groupCols = colnames(df)[!(colnames(df) %in% colnames(dfToAdd))]
otherCols = colnames(df)[colnames(df) %in% colnames(dfToAdd)]
# Populate missing columns with first grouping appearing in the df
dfToAdd[groupCols] = df[1, groupCols]
# rbind it to append
dfResult = rbind(df, dfToAdd)
# Combinations are now obviously missing. tidyr::complete() accepts nesting() information, so only the two nested sets are crossed with each other, not the columns inside each set.
dfResult %>% tidyr::complete(tidyr::nesting(!!! syms(otherCols)), tidyr::nesting(!!! syms(groupCols)))

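Edit: I actually realized that I used column names at the end that are not known in advance. That would not really work. I need to supply groupCols (a character vector) to the second nesting call.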

Edit 2: Thanks to akrun's answer, I was able to correct this as well.
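
As a small illustration of why nesting() matters here, the sketch below compares complete() with and without it. The toy data frame and its values are hypothetical and only mirror the shape of the question's data; this is not part of the original attempt.

```r
library(tidyr)

# Hypothetical toy data: group and name are tied together,
# and the combination (B, Group B, Cat) is missing.
toy <- data.frame(
  group  = c("A", "B", "A"),
  name   = c("Group A", "Group B", "Group A"),
  animal = c("Dog", "Dog", "Cat"),
  stringsAsFactors = FALSE
)

# Crossing every column independently: 2 groups x 2 names x 2 animals,
# which includes impossible pairs such as group "A" with name "Group B".
crossed <- complete(toy, group, name, animal)

# nesting() keeps only the (group, name) pairs that occur in the data,
# so only 2 pairs x 2 animals are generated.
nested <- complete(toy, nesting(group, name), animal)

nrow(crossed)  # 8
nrow(nested)   # 4
```

Wrapping both column sets in nesting(), as in the attempt above, applies the same idea to the animal/action pair as well.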

We can do this in a single %>% chain: slice the first row of 'df', select the columns that are not in 'dfToAdd', bind the result column-wise with 'dfToAdd', then bind those rows to 'df' and use complete

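library(dplyr)
library(tidyr)
library(rlang)
library(purrr)
df %>%
       slice(1) %>%                      # take the first row of df
       select(-names(dfToAdd)) %>%       # keep only the grouping columns
       uncount(nrow(dfToAdd))  %>%       # repeat that row once per row of dfToAdd
       bind_cols(dfToAdd) %>%            # attach the new animal/action values
       bind_rows(df, .) %>%              # append the new rows to df
       # cross the animal/action pairs with the grouping combinations
       complete(nesting(!!! syms(names(dfToAdd))), 
             nesting(!!! syms(setdiff(names(.), names(dfToAdd)))))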
# A tibble: 10 x 5
#   animal action group name    color
# * <fct>  <fct>  <fct> <fct>   <fct>
# 1 Cat    Meow   A     Group A Blue 
# 2 Cat    Meow   B     Group B Red  
# 3 Dog    Bark   A     Group A Blue 
# 4 Dog    Bark   B     Group B Red  
# 5 Mouse  Squeak A     Group A Blue 
# 6 Mouse  Squeak B     Group B Red  
# 7 Bird   Chirp  A     Group A Blue 
# 8 Bird   Chirp  B     Group B Red  
# 9 Lion   Roar   A     Group A Blue 
#10 Lion   Roar   B     Group B Red
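
For comparison, here is a minimal base R sketch of the same idea without tidyr. It is an alternative technique, not the method from the answer above: it takes the distinct grouping combinations from df and builds their Cartesian product with dfToAdd using merge(..., by = NULL). The names groupCols, combos, newRows and result are chosen for illustration, and stringsAsFactors = FALSE is an assumption made so the sketch behaves the same across R versions.

```r
# Rebuild the inputs from the question (character columns instead of factors).
df <- data.frame(
  group  = rep(c("A", "B"), each = 3),
  name   = rep(c("Group A", "Group B"), each = 3),
  color  = rep(c("Blue", "Red"), each = 3),
  animal = rep(c("Dog", "Cat", "Mouse"), times = 2),
  action = rep(c("Bark", "Meow", "Squeak"), times = 2),
  stringsAsFactors = FALSE
)
dfToAdd <- data.frame(
  animal = c("Lion", "Bird"),
  action = c("Roar", "Chirp"),
  stringsAsFactors = FALSE
)

# Columns present in df but missing from dfToAdd are treated as grouping columns.
groupCols <- setdiff(names(df), names(dfToAdd))

# Distinct grouping combinations that actually occur in df.
combos <- unique(df[groupCols])

# With by = NULL, merge() returns the Cartesian product of its two inputs.
newRows <- merge(combos, dfToAdd, by = NULL)

# Put the columns in the same order as df, then append the new rows.
result <- rbind(df, newRows[names(df)])
rownames(result) <- NULL
result
```

This only adds rows for dfToAdd and leaves the existing rows of df untouched, whereas complete() also fills in any combinations that were already missing from df.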