Merging dataframes that have different columns
I have two data frames:
df1 = data.frame(Bird_ID = c(1:6), Sex = c("Male","Female","Male","Male","Male","UNK"), Age.years =c("2","4","8","2","12","1"))
df2 = data.frame(Bird_ID = c(7), Sex = c("Female"), date.fledged= c("19/10/2021"))
df1
# Bird_ID Sex Age.years
# 1 Male 2
# 2 Female 4
# 3 Male 8
# 4 Male 2
# 5 Male 12
# 6 UNK 1
df2
# Bird_ID Sex date.fledged
# 7 Female 19/10/2021
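- My first data frame (df1) is my database, which holds all of my bird records and their useful information.
- My second data frame (df2) is the "updater". I want to merge its information into the main database (df1), so that the output looks like this: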
dfmerged = data.frame(Bird_ID = c(1:7), Sex = c("Male","Female","Male","Male","Male","UNK","Female"), Age.years =c("2","4","8","2","12","1",NA))
dfmerged
# Bird_ID Sex Age.years
# 1 Male 2
# 2 Female 4
# 3 Male 8
# 4 Male 2
# 5 Male 12
# 6 UNK 1
# 7 Female NA
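How can I update the bird database df1 with the information from df2 while keeping only (and all of) the columns of the main database df1? In the example, dfmerged keeps only the columns from df1 and drops the date.fledged column that comes from df2, and bird 7 has NA for Age.years because that value is missing (this is the desired output).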
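You can use
library(dplyr)
df1 %>%
  bind_rows(df2) %>%            # stack the rows; columns missing on either side are filled with NA
  select(all_of(names(df1)))    # keep only the columns of df1, in df1's order
This returns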
Bird_ID Sex Age.years
1 1 Male 2
2 2 Female 4
3 3 Male 8
4 4 Male 2
5 5 Male 12
6 6 UNK 1
7 7 Female <NA>
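If dplyr is not available, the same idea can be sketched in base R: restrict df2 to the columns it shares with df1, add any df1 columns it lacks as NA, then rbind. This is only an illustration under the assumption that shared columns have compatible types; new_rows and missing_cols are names made up for this sketch.

```r
df1 <- data.frame(Bird_ID = 1:6,
                  Sex = c("Male", "Female", "Male", "Male", "Male", "UNK"),
                  Age.years = c("2", "4", "8", "2", "12", "1"))
df2 <- data.frame(Bird_ID = 7, Sex = "Female", date.fledged = "19/10/2021")

# Keep only the columns of df2 that also exist in df1
new_rows <- df2[, intersect(names(df1), names(df2)), drop = FALSE]

# Add the df1 columns that df2 does not have, filled with NA
missing_cols <- setdiff(names(df1), names(new_rows))
new_rows[missing_cols] <- NA

# Reorder to match df1 and append
dfmerged <- rbind(df1, new_rows[names(df1)])
dfmerged
```

Because the columns are chosen by name, extra columns in df2 (here date.fledged) are dropped no matter how many there are.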
You can do a full join.
merge(df1, df2, by = c('Bird_ID', 'Sex'), all = TRUE)[-4]
# Bird_ID Sex Age.years
#1 1 Male 2
#2 2 Female 4
#3 3 Male 8
#4 4 Male 2
#5 5 Male 12
#6 6 UNK 1
#7 7 Female <NA>
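Note that [-4] drops date.fledged by position, so it relies on that column ending up fourth in the merged result. A minimal sketch of a variant that selects df1's columns by name instead, assuming the shared columns (here Bird_ID and Sex) are the intended join keys; merged is a name introduced for this sketch.

```r
df1 <- data.frame(Bird_ID = 1:6,
                  Sex = c("Male", "Female", "Male", "Male", "Male", "UNK"),
                  Age.years = c("2", "4", "8", "2", "12", "1"))
df2 <- data.frame(Bird_ID = 7, Sex = "Female", date.fledged = "19/10/2021")

# Full join on every column the two data frames have in common
merged <- merge(df1, df2, by = intersect(names(df1), names(df2)), all = TRUE)

# Keep only df1's columns, selected by name rather than by position
dfmerged <- merged[names(df1)]
dfmerged
```

This keeps working if df2 later gains further columns that df1 does not have.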
In dplyr:
library(dplyr)
full_join(df1, df2, by = c('Bird_ID', 'Sex')) %>%
  select(-date.fledged)