Problems with nested sapply for batch extracting Excel sheets and combining them into one dataset
I need to extract a specific set of sheets (see name_sheet) from a set of Excel files (file.names), so that I can build one combined dataset per sheet.
I'm having trouble with the sapply function, because I need to repeat the code in part C for each sheet. I get the following error message:
Error in FUN(X[[i]], ...) :
Cannot find the sheet you requested in the file!
Here is the code I am using:
# A: Create data set of sheet names and indexes
name_sheet <- rbind.data.frame(c("EXTRAC.N°01", 1),
c("PREC.FRES 12", 13),
c("PREC.SALPR 13", 14),
c("PREC.SECO 14", 15),
c("CUADRO Nº 17 Y HABLADO 02", 18),
c("DESEMB.N°05 HABLADO-01-03-05", 20),
c("CUADRO Nº 15 (2)", 21))
colnames(name_sheet) <- c("name","index")
# B: Read all excel files into a list
file.names <- list.files(here("data"), pattern = "ANUAL", full.names = TRUE, recursive = TRUE)
# C: Read each of the selected sheets by file and append them by sheet
for (i in nrow(name_sheet)) {
df.list[i] <- lapply(file.names,
read.xlsx,
sheetIndex = name_sheet$index[i],
header = TRUE)
# Combine in one dataset
df_[i] <- smartbind(list = df.list[i], fill = " ")
# Write to disk
write.xlsx(df_[i], here("Data", "Consumo_Humano.xlsx"),
sheetName = name_sheet$name[i],
row.names = FALSE)
}
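It's hard to tell from the information provided, but I'm assuming you are using openxlsx. If possible, readxl::read_xlsx lets you specify the sheet by name, which may be useful in your use case.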
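The "Cannot find the sheet you requested in the file!" error means at least one workbook lacks the requested sheet, so it can help to check sheet names before reading by name. A minimal sketch, assuming the readxl and openxlsx packages are installed; the two demo files and their contents are hypothetical and are created only so the example runs on its own:

```r
library(openxlsx)  # used here only to create the demo workbooks
library(readxl)    # excel_sheets() and read_xlsx()

# Two hypothetical "ANUAL" files; only the first contains the sheet "PREC.FRES 12"
paths <- file.path(tempdir(), c("ANUAL_A.xlsx", "ANUAL_B.xlsx"))
write.xlsx(list("PREC.FRES 12" = data.frame(price = c(1.5, 2.0))), paths[1])
write.xlsx(list("PREC.SECO 14" = data.frame(price = 3.1)), paths[2])

wanted <- "PREC.FRES 12"
# TRUE for each file whose list of sheet names includes the wanted sheet
has_sheet <- vapply(paths, function(p) wanted %in% excel_sheets(p), logical(1))
print(basename(paths[!has_sheet]))  # files that would raise the error

# Read the sheet by name only from the files that actually have it
found <- lapply(paths[has_sheet], read_xlsx, sheet = wanted)
```

Skipping (or logging) the files that lack a sheet keeps one malformed workbook from stopping the whole batch.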
Using purrr::map_df may make this easier: you can read the data into a list much like above and bind the rows into a single data frame in the same call.
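A minimal sketch of the map_df idea, assuming purrr, readxl and openxlsx are installed (map_df row-binds through dplyr, so dplyr must be installed too). The demo file names and values are hypothetical:

```r
library(openxlsx)  # used here only to create the demo workbooks
library(readxl)    # read_xlsx()
library(purrr)     # map_df(), set_names()

# Two hypothetical yearly files, each with the same sheet
paths <- file.path(tempdir(), c("ANUAL_2019.xlsx", "ANUAL_2020.xlsx"))
write.xlsx(list("PREC.FRES 12" = data.frame(year = 2019, price = c(1.5, 2.0))), paths[1])
write.xlsx(list("PREC.FRES 12" = data.frame(year = 2020, price = c(1.7, 2.2))), paths[2])

# Read the same named sheet from every file and bind the rows in one call;
# .id adds a "file" column holding the name of the file each row came from
combined <- map_df(set_names(paths, basename(paths)), read_xlsx,
                   sheet = "PREC.FRES 12", .id = "file")
print(combined)
```

In the original loop, the same call with sheet = name_sheet$name[i] would replace both the lapply and the smartbind step, as long as the columns line up across files.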
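First, there were a couple of errors in the code:
- @X Æ A-12 was right: I wasn't using openxlsx, so I have now switched to its functions (loadWorkbook, addWorksheet, etc.).
- The loop did not have the correct sequence: for (i in nrow(name_sheet)) runs only once, with i set to the last row number; it needs to be for (i in 1:nrow(name_sheet)).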
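The loop fix can be checked in isolation. A small sketch with a hypothetical three-row table; it is built with data.frame() so that index stays numeric, since in the original c("some name", 1) coerces 1 to the string "1" and the index column ends up as character:

```r
# Hypothetical sheet table; data.frame() keeps 'index' numeric
name_sheet <- data.frame(name = c("PREC.FRES 12", "PREC.SALPR 13", "PREC.SECO 14"),
                         index = c(13, 14, 15))

wrong <- integer(0)
for (i in nrow(name_sheet)) wrong <- c(wrong, i)           # iterates over the single value 3
right <- integer(0)
for (i in seq_len(nrow(name_sheet))) right <- c(right, i)  # iterates over 1, 2, 3

print(wrong)  # [1] 3
print(right)  # [1] 1 2 3
```

seq_len(nrow(x)) behaves like 1:nrow(x) here, but it is the safer habit: for a zero-row table 1:0 yields c(1, 0), whereas seq_len(0) yields an empty sequence and the loop body never runs.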
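Second, with openxlsx you need a workbook object before you can add worksheets to it (the code below loads an existing Consumo_Humano.xlsx with loadWorkbook, so that file must already exist). The final code is therefore: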
library(here)      # here() builds project-relative paths
library(openxlsx)  # read.xlsx, loadWorkbook, addWorksheet, writeData, saveWorkbook
library(gtools)    # smartbind
# Generate list of sheets and positions
name_sheet <- rbind.data.frame(c("EXTRAC.N°01", 1),
                               c("PREC.FRES 12", 13),
                               c("PREC.SALPR 13", 14),
                               c("PREC.SECO 14", 15),
                               c("CUADRO Nº 17", 18),
                               c("DESEMB.N°05 (2)", 20),
                               c("CUADRO Nº 15 (2)", 21))
colnames(name_sheet) <- c("name","index")
# Read all excel files into a list
file.names <- list.files(here("data"),
                         pattern = "ANUAL",
                         full.names = TRUE,
                         recursive = TRUE)
# Read each of the selected sheets by file and join them
for (i in 1:nrow(name_sheet)) {
  # Read sheet i (by name) from every file
  df.list <- lapply(file.names,
                    read.xlsx,
                    sheet = name_sheet$name[i],
                    colNames = TRUE,
                    skipEmptyRows = TRUE,
                    check.names = TRUE,
                    fillMergedCells = FALSE)
  # Combine in one dataset; columns missing from some files are filled with ""
  df <- smartbind(list = df.list, fill = "")
  # Load the existing output workbook (the file must already exist)
  wb <- loadWorkbook(file = here("Data", "Consumo_Humano.xlsx"))
  sheet <- name_sheet$name[i]
  # Add a sheet for this name and write the combined data to it
  addWorksheet(wb,
               sheetName = sheet,
               header = c("ODD HEAD LEFT", "ODD HEAD CENTER", "ODD HEAD RIGHT"),
               footer = c("ODD FOOT LEFT", "ODD FOOT CENTER", "ODD FOOT RIGHT"))
  writeData(wb, sheet = sheet, x = df, colNames = TRUE, rowNames = FALSE)
  # Save after every sheet so the next iteration loads the updated file
  saveWorkbook(wb,
               file = here("Data", "Consumo_Humano.xlsx"),
               overwrite = TRUE)
}
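A variation worth considering, assuming you are happy to write a fresh output file rather than load an existing one: create the workbook once with createWorkbook() before the loop, add one worksheet per sheet name inside it, and save once at the end. This sketch uses base rbind instead of gtools::smartbind, so it assumes every file has identical columns for a given sheet; the demo inputs and the output path are hypothetical:

```r
library(openxlsx)  # createWorkbook, addWorksheet, writeData, saveWorkbook, read.xlsx

# Hypothetical demo inputs: two files, each containing the same two sheets
sheets <- c("PREC.FRES 12", "PREC.SECO 14")
paths <- file.path(tempdir(), c("ANUAL_2019.xlsx", "ANUAL_2020.xlsx"))
for (k in seq_along(paths)) {
  demo <- lapply(sheets, function(s) data.frame(sheet = s, file = k, value = k * 10))
  names(demo) <- sheets
  write.xlsx(demo, paths[k])
}

out_file <- file.path(tempdir(), "Consumo_Humano_demo.xlsx")  # hypothetical output path
wb <- createWorkbook()  # create the workbook once, before adding any sheets
for (s in sheets) {
  # Read sheet s from every file and stack the results (rbind needs matching columns)
  parts <- lapply(paths, read.xlsx, sheet = s)
  combined <- do.call(rbind, parts)
  addWorksheet(wb, sheetName = s)
  writeData(wb, sheet = s, x = combined, colNames = TRUE, rowNames = FALSE)
}
saveWorkbook(wb, out_file, overwrite = TRUE)  # write to disk once, after the loop
```

Saving once avoids reloading and rewriting the file on every iteration, and it also avoids addWorksheet failing on a rerun because the sheet already exists in a previously saved file.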