Merge columns from several large data.frames in R
I have 7 different data frames to merge. When I use a basic merge as shown below, I get an error:
new <- list(A, B, C, D, E, F, G) %>% Reduce(function(df1, df2, df3, df4, dtf5, df6, df7) left_join(df1,df2,by="ID"), .)
Error: cannot allocate vector of size 9.9 Gb
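So I would like to work around this by selecting only a few columns from each data frame and merging those. The data sets look like this, but with many more columns and rows.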
A B C D E F G
ID C1 C2 ID C3 ID C4 ID C5 ID C6 ID C7 C8 ID C9
1L 5 7 1L 3 2L 4 1L 10 2L 4 1L 5 9 1L 4
2L 9 3 2L 4 3L 7 2L 4 2L 0 10 2L 9
3L 0
After merging:
new
ID C1 C2 C3 C4 C5 C6 C7 C8 C9
1L 5 7 3 10 5 9 4
2L 9 3 4 4 4 4 0 10 9
3L 7 0
What I tried is this:
ncombined <- merge(x = A, y = B[,c("C3")], by = "ID", all.x = TRUE)
Reduce(function(dtf1, dtf2) merge(dtf1, dtf2, by = "i", all.x = TRUE),
       list(A[, c("C1", "C2")], B[, c("C3")], C[, c("C4")], D[, c("C5")],
            E[, c("C6")], F[, c("C7", "C8")], G[, c("C9")]))
(Taken from the examples: "Simultaneously merge multiple data.frames in a list" and "merge only one or two columns from a different dataframe in R")
This may not be the most memory-efficient approach, but you could try:
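library(data.table)
# Assumes df1..df7 are the seven data frames, each with an ID column
data <- lapply(list(df1, df2, df3, df4, df5, df6, df7), setDT)  # keep the returned data.tables
df1 <- data[[1]]
# Full outer join (all = TRUE) of each remaining table onto df1 by ID
for (df in data[-1]) df1 <- merge(df1, df, by = "ID", all = TRUE)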
This should join all the data frames onto df1.
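If the goal is to merge only a few columns from each data frame, one thing to watch is that the key has to survive the selection. On a plain data.frame, `B[, c("C3")]` returns a bare vector with no `ID` column, so there is nothing left to merge on. Below is a minimal base R sketch of the idea, not a tested fix for the 9.9 Gb allocation. The three toy frames reuse values from the example in the question, the `extra` column and the `frames`, `keep` and `slim` names are made up for illustration, and the full outer join (`all = TRUE`) is an assumption based on the expected output containing ID 3L.

```r
# Toy stand-ins for three of the seven data frames; `extra` is a hypothetical
# unwanted column added only to show that it gets dropped.
A <- data.frame(ID = c("1L", "2L"), C1 = c(5, 9), C2 = c(7, 3), extra = c("x", "y"))
B <- data.frame(ID = c("1L", "2L"), C3 = c(3, 4), extra = c("p", "q"))
C <- data.frame(ID = c("2L", "3L"), C4 = c(4, 7))

frames <- list(A, B, C)
keep   <- list(c("C1", "C2"), "C3", "C4")  # wanted columns per frame (hypothetical)

# Always carry the ID key along; drop = FALSE keeps the result a data.frame
# even when only one extra column is selected.
slim <- Map(function(df, cols) df[, c("ID", cols), drop = FALSE], frames, keep)

# Pairwise full outer join on ID across the slimmed-down frames.
merged <- Reduce(function(x, y) merge(x, y, by = "ID", all = TRUE), slim)
print(merged)
```

Dropping the unused columns before the join keeps each intermediate result smaller. If the ID values are repeated within a frame, though, every join multiplies the matching rows, so it may be worth checking `anyDuplicated(df$ID)` on each input first.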