cbind dataframes containing varying numbers of columns
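I have several data frames that contain mostly the same variables, but some variables are missing from some of the data frames. I would like to bind (one specific column of*) the data frames, filling the missing fields with NA. For example: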
d1 <- data.frame(y1 = c("var1", "var2", "var3"),
                 y2 = c(3, 2, 4),
                 y3 = c("not_needed", "not_needed2", "not_needed3"))
d2 <- data.frame(y1 = c("var1", "var3"),
                 y2 = c(2, 1),
                 y3 = c("not_needed", "not_needed2"))
d3 <- data.frame(y1 = c("var1", "var2", "var4"),
                 y2 = c(3, 2, 5),
                 y3 = c("not_needed", "not_needed2", "not_needed3"))
expected_output <- data.frame(y1 = c("var1", "var2", "var3", "var4"),
                              y2.d1 = c(3, 2, 4, NA),
                              y2.d2 = c(2, NA, 1, NA),
                              y2.d3 = c(3, 2, NA, 5))
*Column y3 is not needed in the output data frame.
I have tried rbind.fill() from plyr and a few other ideas, but without success so far.
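@joran I don't think this is a duplicate of the linked question, because I am not trying to merge the entire data frames, only one column from each. I appreciate that the answer may be out there somewhere, but it is not specifically addressed.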
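A solution using the tidyverse. We can put all the data frames in a list, use map() from purrr to keep only the y1 and y2 columns of each, and then use reduce() with dplyr's full_join() to merge them on y1. Note that I used stringsAsFactors = FALSE when creating the example data frames to prevent factor columns.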
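library(tidyverse)
d_list <- list(d1, d2, d3)
d_final <- d_list %>%
  # keep only the key column and the column to bind
  map(select, y1, y2) %>%
  # full outer join on y1, so missing keys become NA
  reduce(full_join, by = "y1") %>%
  # one y2 column per input data frame
  setNames(c("y1", paste0("y2.d", 1:3)))
d_final
#     y1 y2.d1 y2.d2 y2.d3
# 1 var1     3     2     3
# 2 var2     2    NA     2
# 3 var3     4     1    NA
# 4 var4    NA    NA     5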
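If you would rather not load extra packages, the same idea can be sketched in base R with Reduce() and merge(..., all = TRUE). This is only an illustration, not part of the solution above: the list names d1, d2, d3 and the intermediate object names (d_trimmed, d_base) are my own choices, and I rename y2 before merging so the column suffix comes from the list names instead of a hard-coded 1:3.

```r
# Sample data, repeated here so the block runs on its own
d1 <- data.frame(y1 = c("var1", "var2", "var3"),
                 y2 = c(3, 2, 4),
                 y3 = c("not_needed", "not_needed2", "not_needed3"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(y1 = c("var1", "var3"),
                 y2 = c(2, 1),
                 y3 = c("not_needed", "not_needed2"),
                 stringsAsFactors = FALSE)
d3 <- data.frame(y1 = c("var1", "var2", "var4"),
                 y2 = c(3, 2, 5),
                 y3 = c("not_needed", "not_needed2", "not_needed3"),
                 stringsAsFactors = FALSE)

# Assumption: a named list, so the names can be reused as column suffixes
d_list <- list(d1 = d1, d2 = d2, d3 = d3)

# Keep y1 and y2 only, and rename y2 to y2.<list name> before merging
d_trimmed <- Map(function(df, nm) {
  out <- df[, c("y1", "y2")]
  names(out)[2] <- paste0("y2.", nm)
  out
}, d_list, names(d_list))

# Full outer join on y1, two data frames at a time; unmatched keys get NA
d_base <- Reduce(function(a, b) merge(a, b, by = "y1", all = TRUE), d_trimmed)
d_base
```

Because merge() sorts by the key column by default, the rows come back ordered by y1. Renaming before the join also avoids the .x/.y suffixes that merge() would otherwise add to clashing column names.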
Data
d1 <- data.frame(y1 = c("var1", "var2", "var3"),
                 y2 = c(3, 2, 4),
                 y3 = c("not_needed", "not_needed2", "not_needed3"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(y1 = c("var1", "var3"),
                 y2 = c(2, 1),
                 y3 = c("not_needed", "not_needed2"),
                 stringsAsFactors = FALSE)
d3 <- data.frame(y1 = c("var1", "var2", "var4"),
                 y2 = c(3, 2, 5),
                 y3 = c("not_needed", "not_needed2", "not_needed3"),
                 stringsAsFactors = FALSE)