Function to rbind a list of data frames with different columns and rows
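I want to create a function that combines a list of data frames with different numbers of columns, whose rows have distinct names that I want to keep. Essentially, I want to stack the data frames so that each one's column names simply become another row to be appended.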
df <- list()
df[[1]] <- data.frame(d = c(4,5), e = c("c", "d"))
rownames(df[[1]]) <- c("df2_row_1", "df2_row_2")
df[[2]] <- data.frame(a = c(1,2,3), b = c("a", "b", "c"), c = c("one", "two", "three"))
rownames(df[[2]]) <- c("df1_row_1", "df1_row_2", "df1_row_3")
df[[3]] <- data.frame(f = c(6,7,8), g = c("e", "f", "g"), h = c("one", "two", "three"), w = c(100,101,102))
rownames(df[[3]]) <- c("df3_row_1", "df3_row_2", "df3_row_3")
Current output:
library(dplyr)
do.call(bind_rows, df)
   d    e  a    b     c  f    g     h   w
1  4    c NA <NA>  <NA> NA <NA>  <NA>  NA
2  5    d NA <NA>  <NA> NA <NA>  <NA>  NA
3 NA <NA>  1    a   one NA <NA>  <NA>  NA
4 NA <NA>  2    b   two NA <NA>  <NA>  NA
5 NA <NA>  3    c three NA <NA>  <NA>  NA
6 NA <NA> NA <NA>  <NA>  6    e   one 100
7 NA <NA> NA <NA>  <NA>  7    f   two 101
8 NA <NA> NA <NA>  <NA>  8    g three 102
Desired output:
          d e
df2_row_1 4 c
df2_row_2 5 d
          a b c
df1_row_1 1 a one
df1_row_2 2 b two
df1_row_3 3 c three
          f g h     w
df3_row_1 6 e one   100
df3_row_2 7 f two   101
df3_row_3 8 g three 102
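I tried (unsuccessfully) to write a function that finds the widest data frame, appends empty columns to every data frame with fewer columns than that, and then gives all the data frames the same column names.
I also realize this is about as far from tidy as it gets. Is it possible?
Thanks!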
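This can be done with a for loop (I think it could also be done with mapply; see ?mapply). The overall strategy is to pad each data frame in the list with NA columns (by cbinding them on) and then rbindlist the resulting list: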
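library(data.table)

# Number of columns in the widest data frame
cols <- max(sapply(df, ncol))

# Number of NA cells needed to pad each data frame out to `cols` columns
# (named pad_lengths so it does not mask base::lengths)
pad_lengths <- (cols - sapply(df, ncol)) * sapply(df, nrow)

newdf <- list()
for (i in seq_along(df)) {
  if (ncol(df[[i]]) != cols) {
    # cbind a block of NA columns onto the narrower data frame
    newdf[[i]] <- cbind(df[[i]],
                        as.data.frame(matrix(rep(NA, pad_lengths[i]),
                                             ncol = pad_lengths[i] / nrow(df[[i]]))))
  } else {
    newdf[[i]] <- df[[i]]
  }
}

# Bind by position; the column names of the first data frame are used
rbindlist(newdf, use.names = FALSE)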
This results in:
   d e    V1  V2
1: 4 c  <NA>  NA
2: 5 d  <NA>  NA
3: 1 a   one  NA
4: 2 b   two  NA
5: 3 c three  NA
6: 6 e   one 100
7: 7 f   two 101
8: 8 g three 102
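Note that the rbindlist result keeps neither the row names nor the per-data-frame headers asked for in the question. A minimal base R sketch that keeps both might look like the following. It assumes a character matrix is an acceptable result, since headers and values cannot share a typed column. The helper name stack_with_headers and the list name dfs are made up for illustration; they are not part of data.table or dplyr, and this is not the approach used in the answer above.

```r
# Sample data from the question, stored as `dfs` (a hypothetical name)
# so that it does not mask stats::df.
dfs <- list(
  data.frame(d = c(4, 5), e = c("c", "d"),
             row.names = c("df2_row_1", "df2_row_2")),
  data.frame(a = c(1, 2, 3), b = c("a", "b", "c"),
             c = c("one", "two", "three"),
             row.names = c("df1_row_1", "df1_row_2", "df1_row_3")),
  data.frame(f = c(6, 7, 8), g = c("e", "f", "g"),
             h = c("one", "two", "three"), w = c(100, 101, 102),
             row.names = c("df3_row_1", "df3_row_2", "df3_row_3"))
)

# Hypothetical helper: stack data frames so each header becomes a row.
stack_with_headers <- function(dfs) {
  # Width of the widest data frame; narrower ones are padded to this.
  width <- max(vapply(dfs, ncol, integer(1)))

  blocks <- lapply(dfs, function(d) {
    # Convert every column to character and bind them into a matrix.
    body <- do.call(cbind, unname(lapply(d, as.character)))
    # Put the column names on top as an ordinary row.
    block <- rbind(names(d), body)
    # Pad on the right with NA columns up to the common width.
    pad <- matrix(NA_character_, nrow = nrow(block), ncol = width - ncol(d))
    block <- cbind(block, pad)
    # Header rows get an empty row name; data rows keep their own.
    dimnames(block) <- list(c("", rownames(d)), NULL)
    block
  })

  do.call(rbind, blocks)
}

stacked <- stack_with_headers(dfs)
# Print without quotes and with blanks instead of NA.
print(stacked, quote = FALSE, na.print = "")
```

A matrix is used rather than a data frame because data frames require unique row names, while the header rows here all share an empty one. All values come back as character, so numeric columns would need converting again before any arithmetic.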