How to filter a List of Character Vectors in R?

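I'm starting to go around in circles. I feel like I have searched the web thoroughly, but coming back to this problem after a few days, I suspect I can no longer see the wood for the trees.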

I want to scrape several sets of data from thousands of Excel files on a company SharePoint. I have been able to read them successfully using readxl.

library(readxl)
library(data.table)
library(XLConnect)

root_URL <- '//companyname.office.abc.com/sites/thesite/thefolder'
folder.list <- list.dirs(root_URL)
# Match Excel extensions; the dot is escaped so it is treated literally
file.list <- list.files(folder.list, pattern = "\\.(xlsx|XLSX|xls|XLS|xlsm|XLSM|xlsb|XLSB)$", full.names = TRUE, include.dirs = TRUE)

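This gives me a nice list of all the files I might need to scrape. Using the following code, I have successfully extracted the data I need from a specific tab ("Address") of the 3rd, 4th and 5th files in the list.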

ex.list <- file.list[3:5]
ex.list <- setNames(ex.list, ex.list)

df.list <- lapply(ex.list, read_excel, sheet = 'Address' )

df.list <- Map(function(df, name) {
  df$source_name <- name
  df
}, df.list, names(df.list))
df <- rbindlist(df.list, idcol = "id")
write.csv(df,"testdata1.csv")

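The problem I am running into is that the first, second (and other) files do not have a tab named "Address", so I need to exclude those files from my file.list. But because this is a list of character vectors, I am struggling to filter the list to exclude files that do not contain a tab named "Address".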

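Using lapply I got the following result, and I even tried sapply (also shared), but I am now struggling to write the conditional statement. It feels so close and yet so far.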

> aa <- lapply(ex.list, excel_sheets)
> aa
[[1]]
[1] "NODE SIDE A" "NODE SIDE B" "LMA"         "BASE"        "TUBE"        "Notes"      

[[2]]
[1] "NODE SIDE A" "LMA"         "BASE"        "TUBE"        "Notes"      

[[3]]
[1] "Equipment-Details" "Address"           "Drop Down Values"  "Validation Status" "EquipMaster"      

[[4]]
[1] "Equipment-Details" "Address"           "Drop Down Values"  "Validation Status" "EquipMaster"      

[[5]]
[1] "Equipment-Details" "Address"           "Drop Down Values"  "Validation Status" "EquipMaster"  

> bb <- sapply(ex.list, excel_sheets)
> bb
$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file1.xls`
[1] "NODE SIDE A" "NODE SIDE B" "LMA"         "BASE"        "TUBE"        "Notes"      

$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file2.xls`
[1] "NODE SIDE A" "LMA"         "BASE"        "TUBE"        "Notes"      

$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file3.xls`
[1] "Equipment-Details" "Address"           "Drop Down Values"  "Validation Status" "EquipMaster"      

$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file4.xls`
[1] "Equipment-Details" "Address"           "Drop Down Values"  "Validation Status" "EquipMaster"      

$`//companyname.office.abc.com/sites/thesite/thefolder/subfolder/file5.xls`
[1] "Equipment-Details" "Address"           "Drop Down Values"  "Validation Status" "EquipMaster"  

I think this should work:

library(readxl)
df.list <- lapply(ex.list, function(x) 
  if ("Address" %in% excel_sheets(x)) read_excel(x,sheet = 'Address')
  else NULL)

to read in all the files.

You can filter the list using:
aa <- list(c("A", "B", "C"),
           c("A", "B", "Address"),
           c("A", "B", "Address"),
           c("A", "B", "C"))

aa[grep(pattern = "Address", aa)]
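
As far as I can tell, grep coerces each list element to a single string, so it would also keep a file whose sheet name merely contains "Address". If an exact match on the tab name is wanted, a minimal sketch with vapply and %in% might look like the following. The `sheets` list is a toy stand-in for `lapply(file.list, excel_sheets)`, and the file names in it are hypothetical.

```r
# Toy stand-in for the sheet names of each file; file names are hypothetical.
sheets <- list(
  "file1.xls" = c("NODE SIDE A", "LMA", "Notes"),
  "file2.xls" = c("Equipment-Details", "Address", "EquipMaster"),
  "file3.xls" = c("Address Notes", "BASE")
)

# TRUE only where some sheet is named exactly "Address".
has_address <- vapply(sheets, function(s) "Address" %in% s, logical(1))

# Names of the files that are worth reading.
keep <- names(sheets)[has_address]
print(keep)
```

With real files, `keep` would hold the paths to pass on, for example `lapply(keep, read_excel, sheet = 'Address')`, assuming the list was named with the file paths as in the question's setNames step.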