Putting a skip in a function that groups files together into a nested list
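I have a folder of 325 spreadsheets containing election results for different electoral districts in Moscow. I am trying to group the files that belong to the same municipal district (a higher level of aggregation) so that I can sum the election results at that level. (See the dput output of the file names.)
I wrote a function that correctly matches the files by extracting the part of the string before the electoral district number: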
mf.vote.matcher <- function(file, filelist){
#matches everything in the file name before the word "vote" (i.e. the mf name)
match_string <- str_extract(file, pattern = ".*(?=vote)")
matched_files <- grep(filelist, pattern = match_string)
#listing
matched_list <- list(filelist[matched_files])
}
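However, when I apply it to the full file list with lapply, it iterates over every file and produces a list with many redundant elements. For example, the first municipal district has 3 electoral districts, so the output repeats those 3 file names 3 times.
Is there a way to force the function or lapply to "skip" ahead to the files of the next municipal district, based on the length of the returned list?
Here is a sample of the file names: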
c("./Vote/Академический vote 1.xls", "./Vote/Академический vote 2.xls",
"./Vote/Академический vote 3.xls", "./Vote/Алексеевский в городе Москве vote 1.xls",
"./Vote/Алексеевский в городе Москве vote 2.xls", "./Vote/Алтуфьевский vote 1.xls",
"./Vote/Алтуфьевский vote 2.xls", "./Vote/Алтуфьевский vote 3.xls",
"./Vote/Арбат vote 1.xls", "./Vote/Арбат vote 2.xls", "./Vote/Аэропорт vote 1.xls",
"./Vote/Аэропорт vote 2.xls", "./Vote/Аэропорт vote 3.xls", "./Vote/Бабушкинский vote 1.xls",
"./Vote/Бабушкинский vote 2.xls", "./Vote/Басманный vote 1.xls",
"./Vote/Басманный vote 2.xls", "./Vote/Басманный vote 3.xls",
"./Vote/Беговой vote 1.xls", "./Vote/Беговой vote 2.xls", "./Vote/Бескудниковский vote 1.xls",
"./Vote/Бескудниковский vote 2.xls", "./Vote/Бибирево vote 1.xls",
"./Vote/Бибирево vote 2.xls", "./Vote/Бибирево vote 3.xls")
Alternatively, you can iterate over the unique districts, so that each district is matched only once.
For example:
library(stringr)
dat <- c("./Vote/Академический vote 1.xls", "./Vote/Академический vote 2.xls",
"./Vote/Академический vote 3.xls", "./Vote/Алексеевский в городе Москве vote 1.xls",
"./Vote/Алексеевский в городе Москве vote 2.xls", "./Vote/Алтуфьевский vote 1.xls",
"./Vote/Алтуфьевский vote 2.xls", "./Vote/Алтуфьевский vote 3.xls",
"./Vote/Арбат vote 1.xls", "./Vote/Арбат vote 2.xls", "./Vote/Аэропорт vote 1.xls",
"./Vote/Аэропорт vote 2.xls", "./Vote/Аэропорт vote 3.xls", "./Vote/Бабушкинский vote 1.xls",
"./Vote/Бабушкинский vote 2.xls", "./Vote/Басманный vote 1.xls",
"./Vote/Басманный vote 2.xls", "./Vote/Басманный vote 3.xls",
"./Vote/Беговой vote 1.xls", "./Vote/Беговой vote 2.xls", "./Vote/Бескудниковский vote 1.xls",
"./Vote/Бескудниковский vote 2.xls", "./Vote/Бибирево vote 1.xls",
"./Vote/Бибирево vote 2.xls", "./Vote/Бибирево vote 3.xls")
out = lapply(unique(str_extract_all(dat, ".*(?=vote)", simplify = TRUE)[, 1]), function(x) {
dat[grepl(x, dat)]
}
)
> out
[[1]]
[1] "./Vote/Академический vote 1.xls" "./Vote/Академический vote 2.xls" "./Vote/Академический vote 3.xls"
[[2]]
[1] "./Vote/Алексеевский в городе Москве vote 1.xls" "./Vote/Алексеевский в городе Москве vote 2.xls"
...etc
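One caveat with the approach above: grepl() treats the extracted prefix as a regular expression, so characters such as "." or parentheses in a path would be interpreted as pattern syntax. The following is a minimal sketch of a literal-matching variant, not the answerer's code. It swaps in base R's sub() and startsWith() for the stringr call, and the names files, prefixes and groups are illustrative, with files shortened from the sample in the question.

```r
# Shortened sample of the file names from the question (illustrative subset)
files <- c("./Vote/Арбат vote 1.xls",
           "./Vote/Арбат vote 2.xls",
           "./Vote/Аэропорт vote 1.xls")

# Strip the trailing "vote <number>.xls" to get one prefix per file
prefixes <- sub("vote [0-9]+\\.xls$", "", files)

# One list element per unique prefix; startsWith() compares literally,
# so nothing in the prefix is interpreted as a regular expression
groups <- lapply(unique(prefixes), function(p) files[startsWith(files, p)])
```

Because the prefix keeps its trailing space, a district whose name happens to be the beginning of another district's name should not pick up the other district's files.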
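Another way to group the values:
gsub('.*/Vote/(.+) vote .*', '\\1', list, perl=TRUE) -> region
split(list, region) -> groups
("list" is a vector containing the file names; note that this name shadows base R's list() function, so a different name is safer in real code)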
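Since split() returns a list named by district, the grouped files can then be processed one district at a time. The sketch below is only an illustration under assumptions: files, region, groups and file_counts are hypothetical names, the sample is shortened from the question, and counting the files stands in for whatever per-district aggregation is actually needed.

```r
# Shortened sample of the file names from the question (illustrative subset)
files <- c("./Vote/Арбат vote 1.xls",
           "./Vote/Арбат vote 2.xls",
           "./Vote/Аэропорт vote 1.xls")

# District name: drop the directory, then the trailing " vote <number>.xls"
region <- sub(" vote [0-9]+\\.xls$", "", basename(files))

# Named list with one character vector of file paths per district
groups <- split(files, region)

# Placeholder for per-district work: here, just count the files in each group
file_counts <- lengths(groups)
```

In practice the counting step would be replaced by something like lapply(groups, f), where f reads and sums the spreadsheets for one district.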