Problems with nested sapply for batch extracting excel sheets and combining in one dataset

I need to extract a specific set of sheets (see name_sheet) from a set of Excel files (file.names) in order to build one dataset per sheet.

I am having trouble with the sapply function, because I need to repeat the code in part C for each sheet. I get the following error message:

    Error in FUN(X[[i]], ...) : 
      Cannot find the sheet you requested in the file!

This is the code I am using:

    # A: Create data set of sheet names and indexes
    name_sheet <- rbind.data.frame(c("EXTRAC.N°01", 1), 
                                  c("PREC.FRES 12", 13), 
                                  c("PREC.SALPR 13", 14),
                                  c("PREC.SECO 14", 15), 
                                  c("CUADRO Nº 17 Y HABLADO 02", 18),
                                  c("DESEMB.N°05 HABLADO-01-03-05", 20),
                                  c("CUADRO Nº 15 (2)", 21))
    
    colnames(name_sheet) <- c("name","index")
    
    # B: Read all excel files into a list
    file.names <- list.files(here("data"), pattern = "ANUAL", full.names = TRUE, recursive = TRUE)
    
    # C: Read each of the selected sheets by file and append them by sheet
    
    for (i in nrow(name_sheet)) {
    df.list[i] <- lapply(file.names, 
                      read.xlsx, 
                      sheetIndex = name_sheet$index[i],
                      header = TRUE)
    
    # Combine in one dataset
      df_[i] <- smartbind(list = df.list[i], fill = " ")
    
    # Write to disk
      write.xlsx(df_[i], here("Data", "Consumo_Humano.xlsx"), 
               sheetName = name_sheet$name[i], 
               row.names = FALSE)
    }

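It is hard to tell from the information provided, but I assume you are using openxlsx. If it is an option, readxl::read_xlsx lets you specify the sheet by name, which may work for your use case.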

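Using purrr::map_df might make this easier: you can read the data over a list of files, much as you do above, and bind the rows into a single data frame in the same call.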
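A minimal sketch of that idea, assuming the purrr, readxl and dplyr packages are installed (map_df relies on dplyr to bind the rows). The helper name read_sheet_from_files and the two demo workbooks are hypothetical, made up so the example runs on its own; openxlsx is used here only to create the demo files.

```r
library(purrr)    # map_df(), set_names()
library(readxl)   # read_excel()
library(openxlsx) # write.xlsx(), used only to build the demo files

# Hypothetical helper: read one named sheet from every file and row-bind the results.
read_sheet_from_files <- function(paths, sheet_name) {
  map_df(
    set_names(paths),  # the names become the values of the .id column
    function(path) read_excel(path, sheet = sheet_name),
    .id = "source_file"
  )
}

# Demo data (made up): two small workbooks that share a sheet name.
demo_dir <- tempfile("demo_xlsx_")
dir.create(demo_dir)
f1 <- file.path(demo_dir, "ANUAL_2001.xlsx")
f2 <- file.path(demo_dir, "ANUAL_2002.xlsx")
write.xlsx(list("PREC.FRES 12" = data.frame(species = c("a", "b"), price = c(1, 2))), f1)
write.xlsx(list("PREC.FRES 12" = data.frame(species = c("c", "d"), price = c(3, 4))), f2)

combined <- read_sheet_from_files(c(f1, f2), "PREC.FRES 12")
print(combined)
```

In the question's setting the call would presumably look like read_sheet_from_files(file.names, "PREC.FRES 12"), repeated for each sheet name. Row-binding this way fails if the same column has different types in different files, so real data may need explicit column types.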

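First, there were a couple of errors in the code:

  1. @X Æ A-12 was right: I was not using openxlsx, so I have now switched to its functions (loadWorkbook, addWorksheet, etc.).
  2. The loop did not iterate over the right sequence: nrow(name_sheet) is a single number, so the loop only ran for the last row. It should be 1:nrow(name_sheet).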
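The loop problem can be reproduced in base R. In this small sketch the three-row data frame sheets is a made-up stand-in for name_sheet:

```r
# Made-up stand-in for name_sheet with three rows.
sheets <- data.frame(name = c("A", "B", "C"), index = c(1, 13, 14))

visited_wrong <- integer(0)
for (i in nrow(sheets)) {            # iterates over the single value 3
  visited_wrong <- c(visited_wrong, i)
}

visited_right <- integer(0)
for (i in seq_len(nrow(sheets))) {   # iterates over 1, 2, 3
  visited_right <- c(visited_right, i)
}

print(visited_wrong)  # [1] 3
print(visited_right)  # [1] 1 2 3
```

seq_len(nrow(x)) behaves like 1:nrow(x) for non-empty data, and it also does the right thing (zero iterations) when the data frame has no rows.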

Second, because I am now using openxlsx, a workbook has to exist before worksheets can be added to it. So the final code is:

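    library(here)      # here()
    library(openxlsx)  # read.xlsx(), loadWorkbook(), addWorksheet(), writeData(), saveWorkbook()
    library(gtools)    # smartbind()
    
    # Generate list of sheets and positions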
    name_sheet <- rbind.data.frame(c("EXTRAC.N°01", 1), 
                                  c("PREC.FRES 12", 13), 
                                  c("PREC.SALPR 13", 14),
                                  c("PREC.SECO 14", 15), 
                                  c("CUADRO Nº 17", 18),
                                  c("DESEMB.N°05 (2)", 20),
                                  c("CUADRO Nº 15 (2)", 21))
    
    colnames(name_sheet) <- c("name","index")
    
    # Read all excel files into a list
    file.names <- list.files(here("data"), 
                             pattern = "ANUAL", 
                             full.names = TRUE, 
                             recursive = TRUE)
    
    # Read each of the selected sheets from every file and join them
    for (i in seq_len(nrow(name_sheet))) {
      # One data frame per file, all read from the sheet named name_sheet$name[i]
      df.list <- lapply(file.names, 
                        read.xlsx, 
                        sheet = name_sheet$name[i],
                        colNames = TRUE,
                        skipEmptyRows = TRUE,
                        check.names = TRUE,
                        fillMergedCells = FALSE)
    
      # Combine in one dataset
      df <- smartbind(list = df.list, fill = "")
    
      # Load the existing output workbook (the file must already exist)
      wb <- loadWorkbook(file = here("Data", "Consumo_Humano.xlsx"))
    
      sheet <- name_sheet$name[i]
    
      # Add a sheet for this iteration
      addWorksheet(wb,
        sheetName = sheet,
        header = c("ODD HEAD LEFT", "ODD HEAD CENTER", "ODD HEAD RIGHT"),
        footer = c("ODD FOOT RIGHT", "ODD FOOT CENTER", "ODD FOOT RIGHT")
      )
    
      writeData(wb, sheet = sheet, x = df, colNames = TRUE, rowNames = FALSE)
    
      # Write to disk
      saveWorkbook(wb,
        file = here("Data", "Consumo_Humano.xlsx"),
        overwrite = TRUE
      )
    }
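One possible variation, sketched under assumptions rather than taken from the code above: loadWorkbook needs the output file to exist already, so an alternative is to create the workbook once with createWorkbook, add one sheet per iteration, and save a single time at the end. The helper name write_sheets and the demo data frames are hypothetical, and the output goes to a temporary file instead of the real path.

```r
library(openxlsx)

# Hypothetical helper: write a named list of data frames, one worksheet each.
write_sheets <- function(data_by_sheet, path) {
  wb <- createWorkbook()                  # start from an empty workbook
  for (sheet in names(data_by_sheet)) {
    addWorksheet(wb, sheetName = sheet)
    writeData(wb, sheet = sheet, x = data_by_sheet[[sheet]],
              colNames = TRUE, rowNames = FALSE)
  }
  saveWorkbook(wb, file = path, overwrite = TRUE)  # single write at the end
  invisible(path)
}

# Demo with made-up data; the list names become the sheet names.
demo <- list(
  "PREC.FRES 12" = data.frame(year = 2001:2002, price = c(1.5, 1.7)),
  "PREC.SECO 14" = data.frame(year = 2001:2002, price = c(2.5, 2.9))
)
out <- tempfile(fileext = ".xlsx")
write_sheets(demo, out)
print(getSheetNames(out))
```

Saving once avoids re-reading and re-writing the whole file on every iteration, at the cost of holding all combined data frames in memory until the end.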