Using openxlsx startRow parameter to pick first row based on the contents of the file

Question

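I know that the startRow argument of read.xlsx in openxlsx lets me start reading a file from a specified row. I have to load 300 xlsx files and, unfortunately, the number of rows I want to skip varies from file to file. I always want the first row read to be the one whose second column contains the word "CPT". Is there a way to set startRow based on a text match? In the picture below I set startRow to 6, but in other files it is 4 or 3.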

Answer 1

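It may be a good idea to read each file twice: the first time, read only a few rows of columns 1 and 2 to find startRow; the second time, read the whole sheet from that row.
Note: I assume the rows you want to skip have something in column 1 (at least).
With for, sapply, etc., you can do this for all the files.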

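library(openxlsx)
# First pass: read only the top 10 rows of columns 1-2, keeping empty rows.
# `file` is the path to one workbook.
tmp <- read.xlsx(file, colNames = FALSE, rows = 1:10, cols = 1:2,
                 skipEmptyRows = FALSE)
# Position of the first cell in column 2 that contains "CPT".
st <- min(grep("CPT", tmp[[2]]))
# Second pass: read the sheet starting at that row.
d <- read.xlsx(file, startRow = st)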
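
To apply this to all 300 files, the two reads could be wrapped in a function and mapped over the file list. This is only a sketch: the helper name read_from_cpt, its max_rows argument, and the "data" folder are made up for illustration, and it keeps the same assumption as above (the rows to skip are not empty in column 1).

```r
library(openxlsx)

# Hypothetical helper: find the "CPT" row in one workbook, then read from it.
read_from_cpt <- function(path, pattern = "CPT", max_rows = 10) {
  # First pass: only the top rows of columns 1-2.
  top <- read.xlsx(path, colNames = FALSE, rows = seq_len(max_rows),
                   cols = 1:2, skipEmptyRows = FALSE)
  # Guard against sheets with no data in the second column.
  if (is.null(top) || ncol(top) < 2) {
    warning("no second column found in ", path)
    return(NULL)
  }
  hit <- grep(pattern, top[[2]], fixed = TRUE)
  if (length(hit) == 0) {
    warning("no '", pattern, "' in column 2 of ", path)
    return(NULL)
  }
  # Second pass: full read, starting at the first matching row.
  read.xlsx(path, startRow = hit[1])
}

# "data" is a placeholder folder name; point it at your own files.
files <- list.files("data", pattern = "\\.xlsx$", full.names = TRUE)
sheets <- lapply(files, read_from_cpt)
names(sheets) <- basename(files)
```

Files where "CPT" is not found in the first max_rows rows come back as NULL with a warning instead of stopping the loop, so you can check those by hand afterwards.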

r

openxlsx