R: How do you subset all data frames within a list?
I have a list of data frames named WaFramesCosts. I simply want to subset it to show specific columns so that I can export them. I tried:
for (i in names(WaFramesCosts)) {
WaFramesCosts[[i]][,c("Cost_Center","Domestic_Anytime_Min_Used","Department",
"Domestic_Anytime_Min_Used")]
}
But it returns the error:
Error in `[.data.frame`(WaFramesCosts[[i]], , c("Cost_Center", "Department", :
undefined columns selected
I also tried:
for (i in seq_along(WaFramesCosts)){
WaFramesCosts[[i]][ , -which(names(WaFramesCosts[[i]]) %in% c("Cost_Center","Domestic_Anytime_Min_Used","Department",
"Domestic_Anytime_Min_Used"))]
}
But I get the same error. Can anyone see what I am doing wrong?
Side note: for reference, I used this:
for (i in seq_along(WaFramesCosts)) {
t <- WaFramesCosts[[i]][ , grepl( "Domestic" , names( WaFramesCosts[[i]] ) )]
q <- subset(WaFramesCosts[[i]], select = c("Cost_Center","Domestic_Anytime_Min_Used","Department","Domestic_Anytime_Min_Used"))
WaFramesCosts[[i]] <- merge(q,t)
}
It tries for the same goal with a different method, but it seems to get closer.
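Welcome back, Kootseeahknee. You are incorrectly assuming that the last command of a for loop is implicitly returned at the end. If you want that behavior, perhaps you want lapply: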
myoutput <- lapply(names(WaFramesCosts), function(i) {
  # the value of the last expression is returned and collected by lapply
  WaFramesCosts[[i]][, c("Cost_Center", "Domestic_Anytime_Min_Used", "Department")]
})
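To make the difference concrete, here is a minimal sketch on toy data. The list `frames` and its columns `x` and `y` are hypothetical stand-ins for WaFramesCosts, not the asker's data. It shows that a for loop discards the subset unless you assign it, while lapply collects each return value.

```r
# Hypothetical toy list standing in for WaFramesCosts
frames <- list(
  a = data.frame(x = 1:2, y = 3:4),
  b = data.frame(x = 5:6, y = 7:8)
)

# The subset is computed and then thrown away on every iteration
for (i in names(frames)) {
  frames[[i]][, "x", drop = FALSE]
}
n_after_loop <- ncol(frames$a)   # the original frames are unchanged

# Option 1: assign the subset back inside the loop
kept <- frames
for (i in names(kept)) {
  kept[[i]] <- kept[[i]][, "x", drop = FALSE]
}

# Option 2: let lapply collect each function's return value
kept2 <- lapply(frames, function(df) df[, "x", drop = FALSE])
```

Iterating over the list itself, as in Option 2, keeps the element names (`a`, `b`) on the result; iterating over `names(...)` returns an unnamed list unless you set the names afterwards.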
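The "undefined columns selected" error tells me that your assumption about the datasets is incorrect: at least one of them is missing at least one of the columns. From your previous question, I infer that you want to match the columns rather than assume they exist in every data frame. Given that, you could (and probably should) be using grep or some variant: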
myoutput <- lapply(names(WaFramesCosts), function(i) {
  # keep every column whose name contains one of these strings
  WaFramesCosts[[i]][, grep("(Cost_Center|Domestic_Anytime_Min_Used|Department)",
                            colnames(WaFramesCosts[[i]])), drop = FALSE]
})
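This matches column names that contain any of these strings. You can be more precise by using regular expressions to require that the whole string, or its start or end, matches. For example, changing from (Cost|Dom) (anything containing "Cost" or "Dom") to (^Cost|Dom) means anything that starts with "Cost" or contains "Dom"; similarly, (Cost|ment$) matches anything that contains "Cost" or ends with "ment". If, however, you always want exact matches and only need the ones that exist, then something like this will work: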
myoutput <- lapply(names(WaFramesCosts), function(i) {
  # keep only the wanted columns that actually exist in this data frame
  WaFramesCosts[[i]][, intersect(c("Cost_Center", "Domestic_Anytime_Min_Used", "Department"),
                                 colnames(WaFramesCosts[[i]])), drop = FALSE]
})
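As a rough illustration of how the anchored patterns and intersect differ, the sketch below runs them against a made-up vector of column names. `cols`, "Total_Cost", and "Not_A_Column" are hypothetical and chosen only to show the effect.

```r
# Hypothetical column names, loosely modelled on those in the question
cols <- c("Cost_Center", "Total_Cost", "Department", "Domestic_Anytime_Min_Used")

# Unanchored: "Cost" or "Dom" anywhere in the name
loose <- grep("(Cost|Dom)", cols, value = TRUE)
# "Cost_Center" "Total_Cost" "Domestic_Anytime_Min_Used"

# "^Cost" must start the name, so "Total_Cost" drops out
starts <- grep("(^Cost|Dom)", cols, value = TRUE)
# "Cost_Center" "Domestic_Anytime_Min_Used"

# "ment$" must end the name, which picks up "Department"
ends <- grep("(Cost|ment$)", cols, value = TRUE)
# "Cost_Center" "Total_Cost" "Department"

# Exact names only, silently skipping any that are absent
exact <- intersect(c("Cost_Center", "Department", "Not_A_Column"), cols)
# "Cost_Center" "Department"
```

The grep variants are handy when column names vary slightly between data frames; intersect is the safer choice when the names are known exactly and a partial match would be a mistake.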
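Note the use of drop=FALSE in that last example: mind the difference between mtcars[,2] (returns a vector) and mtcars[,2,drop=FALSE] (returns a data.frame with 1 column). As a matter of defensive programming, if you think it at all possible that your filtering will return a single column, make sure you do not lose the fact that you have a data.frame by appending ,drop=FALSE to your bracket-subsetting.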
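The difference is easy to check on the built-in mtcars data set; this short sketch only inspects the two results.

```r
v <- mtcars[, 2]                 # a bare numeric vector
d <- mtcars[, 2, drop = FALSE]   # still a data.frame, with one column

is.data.frame(v)   # FALSE
is.data.frame(d)   # TRUE
names(d)           # "cyl"
dim(d)             # 32 rows, 1 column
```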
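Based on your description, here is an example that uses the dplyr library to combine a list of data frames for a given set of columns. It does not require all of the data frames to have the same columns (it would help if you provided your data in a reproducible example).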
# test data
df1 = read.table(text = "
c1 c2 c3
a 1 101
b 2 102
", header = TRUE, stringsAsFactors = FALSE)
df2 = read.table(text = "
c1 c2 c3
w 11 201
x 12 202
", header = TRUE, stringsAsFactors = FALSE)
# dfs is a list of data frames
dfs <- list(df1, df2)
# use dplyr::bind_rows
library(dplyr)
cols <- c("c1", "c3")
result <- bind_rows(dfs)[cols]
result
# c1 c3
# 1 a 101
# 2 b 102
# 3 w 201
# 4 x 202
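The test data above happens to have identical columns in both data frames. As a sketch of the differing-columns case, the hypothetical frames `df_a` and `df_b` below are not from the question; `df_b` lacks the `c2` column, and bind_rows fills the gap with NA.

```r
library(dplyr)

# Hypothetical frames: df_b has no c2 column
df_a <- data.frame(c1 = c("a", "b"), c2 = 1:2, c3 = 101:102,
                   stringsAsFactors = FALSE)
df_b <- data.frame(c1 = c("w", "x"), c3 = 201:202,
                   stringsAsFactors = FALSE)

# Rows are stacked; columns missing from a frame are filled with NA
combined <- bind_rows(list(df_a, df_b))
combined$c2   # 1 2 NA NA

# Selecting afterwards works because every requested column now exists
result2 <- combined[c("c1", "c3")]
```

One caveat: if a requested column is absent from every data frame in the list, the final bracket selection would still fail with "undefined columns selected".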