R附加2个具有不同列的数据框
R append 2 data frames with different columns
我想将 dfToAdd 添加到 df,其中第一个缺少列。重要的细节是 df 有两种类型的列。第一组列相互关联。
例如group="A" 表示 name="Group A" 和 color="Blue"。不能有 A-Group A-Red 的组合。
第二种类型的列相互关联。
动物="Dog" 动作="Bark"
我想添加第二个数据框,其中缺少第一种类型的列。这些列应填充第一种类型的列的组合,如以下 dfResult(行的顺序无关紧要):
df = data.frame(group = c("A", "A", "A", "B", "B", "B"),
name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B"),
color = c("Blue", "Blue", "Blue", "Red", "Red", "Red"),
animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse"),
action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak")
)
dfToAdd = data.frame(animal = c("Lion", "Bird"),
action = c("Roar", "Chirp"))
dfResult = data.frame(group = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B", "Group A", "Group A", "Group B", "Group B"),
color = c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Blue", "Blue", "Red", "Red"),
animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse", "Lion", "Bird", "Lion", "Bird"),
action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak", "Roar", "Chirp", "Roar", "Chirp"))
> df
group name color animal action
1 A Group A Blue Dog Bark
2 A Group A Blue Cat Meow
3 A Group A Blue Mouse Squeak
4 B Group B Red Dog Bark
5 B Group B Red Cat Meow
6 B Group B Red Mouse Squeak
> dfToAdd
animal action
1 Lion Roar
2 Bird Chirp
> dfResult
group name color animal action
1 A Group A Blue Dog Bark
2 A Group A Blue Cat Meow
3 A Group A Blue Mouse Squeak
4 B Group B Red Dog Bark
5 B Group B Red Cat Meow
6 B Group B Red Mouse Squeak
7 A Group A Blue Lion Roar
8 A Group A Blue Bird Chirp
9 B Group B Red Lion Roar
10 B Group B Red Bird Chirp
但第一种类型的列(组、名称、颜色)尚不完全清楚。我正在处理任意数量的多个分组变量。你可以想象,可能有也可能没有描述 column="Group A is a good group" 或 date="2020.04.13"。我们只确定第二类的列:动物和动作。
在写这篇文章时,我想到了在 tidyr 的 [complete][2]
函数的两边使用 [nesting][1]
并手动检测缺失的列(也许有更优雅的解决方案):
# First find all grouping columns
groupCols = colnames(df)[!(colnames(df) %in% colnames(dfToAdd))]
otherCols = colnames(df)[colnames(df) %in% colnames(dfToAdd)]
# Populate missing columns with first grouping appearing in the df
dfToAdd[groupCols] = df[1, groupCols]
# rbind it to append
dfResult = rbind(df, dfToAdd)
# Now we have obvious missing combinations, tidyr::complete accepts nesting information to generate combinations only for those, which needs to be different.
dfResult %>% tidyr::complete(tidyr::nesting(!!! syms(otherCols)), tidyr::nesting(!!! syms(groupCols)))
编辑:实际上意识到我在最后使用了未知的列名。这真的行不通。我需要将 groupCols(字符向量)提供给第二次嵌套调用。
edit2:现在多亏了 akrun 的回答,我也可以更正这个了。
我们可以在单个 %>%
中完成此操作,方法是 slice
从 'df' 的第一行开始,select
不是 [=18] 中的列=],将其与 'dfToAdd' 绑定,然后将行与 'df' 绑定并使用 complete
library(dplyr)
library(tidyr)
library(rlang)
library(purrr)
df %>%
slice(1) %>%
select(-names(dfToAdd)) %>%
uncount(nrow(dfToAdd)) %>%
bind_cols(dfToAdd) %>%
bind_rows(df, .) %>%
complete(nesting(!!! syms(names(dfToAdd))),
nesting(!!! syms(setdiff(names(.), names(dfToAdd)))))
# A tibble: 10 x 5
# animal action group name color
# * <fct> <fct> <fct> <fct> <fct>
# 1 Cat Meow A Group A Blue
# 2 Cat Meow B Group B Red
# 3 Dog Bark A Group A Blue
# 4 Dog Bark B Group B Red
# 5 Mouse Squeak A Group A Blue
# 6 Mouse Squeak B Group B Red
# 7 Bird Chirp A Group A Blue
# 8 Bird Chirp B Group B Red
# 9 Lion Roar A Group A Blue
#10 Lion Roar B Group B Red
我想将 dfToAdd 添加到 df,其中第一个缺少列。重要的细节是 df 有两种类型的列。第一组列相互关联。 例如group="A" 表示 name="Group A" 和 color="Blue"。不能有 A-Group A-Red 的组合。 第二种类型的列相互关联。 动物="Dog" 动作="Bark" 我想添加第二个数据框,其中缺少第一种类型的列。这些列应填充第一种类型的列的组合,如以下 dfResult(行的顺序无关紧要):
df = data.frame(group = c("A", "A", "A", "B", "B", "B"),
name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B"),
color = c("Blue", "Blue", "Blue", "Red", "Red", "Red"),
animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse"),
action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak")
)
dfToAdd = data.frame(animal = c("Lion", "Bird"),
action = c("Roar", "Chirp"))
dfResult = data.frame(group = c("A", "A", "A", "B", "B", "B", "A", "A", "B", "B"),
name = c("Group A", "Group A", "Group A", "Group B", "Group B", "Group B", "Group A", "Group A", "Group B", "Group B"),
color = c("Blue", "Blue", "Blue", "Red", "Red", "Red", "Blue", "Blue", "Red", "Red"),
animal = c("Dog", "Cat", "Mouse", "Dog", "Cat", "Mouse", "Lion", "Bird", "Lion", "Bird"),
action = c("Bark", "Meow", "Squeak", "Bark", "Meow", "Squeak", "Roar", "Chirp", "Roar", "Chirp"))
> df
group name color animal action
1 A Group A Blue Dog Bark
2 A Group A Blue Cat Meow
3 A Group A Blue Mouse Squeak
4 B Group B Red Dog Bark
5 B Group B Red Cat Meow
6 B Group B Red Mouse Squeak
> dfToAdd
animal action
1 Lion Roar
2 Bird Chirp
> dfResult
group name color animal action
1 A Group A Blue Dog Bark
2 A Group A Blue Cat Meow
3 A Group A Blue Mouse Squeak
4 B Group B Red Dog Bark
5 B Group B Red Cat Meow
6 B Group B Red Mouse Squeak
7 A Group A Blue Lion Roar
8 A Group A Blue Bird Chirp
9 B Group B Red Lion Roar
10 B Group B Red Bird Chirp
但第一种类型的列(组、名称、颜色)尚不完全清楚。我正在处理任意数量的多个分组变量。你可以想象,可能有也可能没有描述 column="Group A is a good group" 或 date="2020.04.13"。我们只确定第二类的列:动物和动作。
在写这篇文章时,我想到了在 tidyr 的 [complete][2]
函数的两边使用 [nesting][1]
并手动检测缺失的列(也许有更优雅的解决方案):
# First find all grouping columns
groupCols = colnames(df)[!(colnames(df) %in% colnames(dfToAdd))]
otherCols = colnames(df)[colnames(df) %in% colnames(dfToAdd)]
# Populate missing columns with first grouping appearing in the df
dfToAdd[groupCols] = df[1, groupCols]
# rbind it to append
dfResult = rbind(df, dfToAdd)
# Now we have obvious missing combinations, tidyr::complete accepts nesting information to generate combinations only for those, which needs to be different.
dfResult %>% tidyr::complete(tidyr::nesting(!!! syms(otherCols)), tidyr::nesting(!!! syms(groupCols)))
编辑:实际上意识到我在最后使用了未知的列名。这真的行不通。我需要将 groupCols(字符向量)提供给第二次嵌套调用。
edit2:现在多亏了 akrun 的回答,我也可以更正这个了。
我们可以在单个 %>%
中完成此操作,方法是 slice
从 'df' 的第一行开始,select
不是 [=18] 中的列=],将其与 'dfToAdd' 绑定,然后将行与 'df' 绑定并使用 complete
library(dplyr)
library(tidyr)
library(rlang)
library(purrr)
df %>%
slice(1) %>%
select(-names(dfToAdd)) %>%
uncount(nrow(dfToAdd)) %>%
bind_cols(dfToAdd) %>%
bind_rows(df, .) %>%
complete(nesting(!!! syms(names(dfToAdd))),
nesting(!!! syms(setdiff(names(.), names(dfToAdd)))))
# A tibble: 10 x 5
# animal action group name color
# * <fct> <fct> <fct> <fct> <fct>
# 1 Cat Meow A Group A Blue
# 2 Cat Meow B Group B Red
# 3 Dog Bark A Group A Blue
# 4 Dog Bark B Group B Red
# 5 Mouse Squeak A Group A Blue
# 6 Mouse Squeak B Group B Red
# 7 Bird Chirp A Group A Blue
# 8 Bird Chirp B Group B Red
# 9 Lion Roar A Group A Blue
#10 Lion Roar B Group B Red