Skip multiple header rows while using read_excel or read.excel in R

Question

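I have an Excel file with a multi-row header like this (the test data can be downloaded from here):

When reading the file in R, how can I skip the Unit and Frequency rows and use the indicator_name row as the header?

With the code below, it seems that setting the skip parameter to an integer only lets me skip a single row.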

library(readxl)
myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE))
myDF <- read_excel("./test123.xlsx", skip = 2, col_names = myCols)


Answer 1

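You just need skip = 3 instead of 2, because the header row itself also has to be skipped when reading in the data. The column names are already defined in myCols, so there is no need to keep the column-name row on the second read.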

library(readxl)

myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE))
myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)

Output

  indicator_name         M2   GDP
  <dttm>              <dbl> <dbl>
1 2018-01-01 00:00:00  6.71  8.17
2 2018-01-02 00:00:00  6.79  8.19
3 2018-01-03 00:00:00  6.77  8.21
4 2018-01-04 00:00:00  6.73  8.20
5 2018-01-05 00:00:00  6.67  8.20
6 2018-01-06 00:00:00  6.62  8.21
7 2018-01-07 00:00:00  6.62  8.21
8 2018-01-08 00:00:00  6.64  8.22
9 2018-01-09 00:00:00  6.64  8.22
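
The two-step read above can be wrapped in a small helper. This is only a sketch: the function name read_multi_header and its arguments n_header and name_row are made up for illustration, openxlsx is used solely to build a throwaway demo workbook, and the "%" and "Daily" cells are placeholder values, not the contents of the real test file.

```r
library(readxl)
library(openxlsx)  # assumption: only used to build a small demo workbook

# Demo file laid out like the one in the question:
# row 1 = names, row 2 = Unit, row 3 = Frequency, row 4 onward = data.
path <- tempfile(fileext = ".xlsx")
wb <- createWorkbook()
addWorksheet(wb, "Sheet1")
header <- data.frame(
  a = c("indicator_name", "Unit", "Frequency"),
  b = c("M2", "%", "Daily"),    # "%" and "Daily" are placeholders
  c = c("GDP", "%", "Daily")
)
values <- data.frame(
  date = as.Date("2018-01-01") + 0:2,
  m2   = c(6.71, 6.79, 6.77),
  gdp  = c(8.17, 8.19, 8.21)
)
writeData(wb, "Sheet1", header, startRow = 1, colNames = FALSE)
writeData(wb, "Sheet1", values, startRow = 4, colNames = FALSE)
saveWorkbook(wb, path, overwrite = TRUE)

# Hypothetical helper: take the names from `name_row`, then read the data
# while skipping all `n_header` header rows.
read_multi_header <- function(path, n_header = 3, name_row = 1) {
  names_row <- read_excel(path, skip = name_row - 1, n_max = 1,
                          col_names = FALSE)
  read_excel(path, skip = n_header, col_names = as.character(names_row))
}

demo <- read_multi_header(path)
```

The point of the helper is that skip counts every row above the data, including the row the names came from, which is why the default n_header is 3 for this layout.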

If the name of the first column is empty, you can replace the NA in the column names before reading in the data.

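library(readxl)     # read_excel() is not attached by library(tidyverse)
library(tidyverse)

# Take the header row and replace any missing name with "indicator_name".
myCols <- read_excel("./test123.xlsx", n_max = 2, col_names = FALSE) %>% 
  slice(1) %>% 
  mutate(across(everything(), ~replace_na(., "indicator_name"))) %>% 
  as.character()
myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)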
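
The same replacement can also be done in base R once the header row is a plain vector. A minimal sketch, assuming the header row comes back with a missing first cell; the list below stands in for that row so the snippet runs without the Excel file.

```r
# Stand-in for the first row returned by read_excel(..., col_names = FALSE)
# when the top-left cell is empty (assumed shape, not read from the real file).
header_row <- list(NA, "M2", "GDP")

# unlist() coerces to a character vector and keeps the gap as a true NA.
myCols <- unlist(header_row, use.names = FALSE)

# Fill the missing name. If several cells could be empty, each would need
# its own distinct name instead of this single replacement.
myCols[is.na(myCols)] <- "indicator_name"
myCols
```

The resulting vector can be passed to col_names exactly as in the code above.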

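Benchmark

At the moment, it appears to be quicker to read in all the rows first and filter out the unwanted ones afterwards.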

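bm <- microbenchmark::microbenchmark(
  # Two reads: first the names, then the data without the header rows.
  filter_before = {
    myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE))
    myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)
  },
  # One read: keep the first row as names, then drop the Unit and Frequency rows.
  filter_after = {
    myDF2 <- read_excel("./test123.xlsx")
    myDF2 <- myDF2[-c(1:2), ]
  },
  times = 1000
)
autoplot(bm)  # requires ggplot2, attached above via library(tidyverse)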
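
One caveat with the filter_after variant: because the Unit and Frequency rows are read as data, the columns will most likely come back as text rather than as dates and numbers, so a conversion step is needed before the two results are comparable. The sketch below simulates that situation with a hand-built data frame. The "%" and "Daily" cells are placeholders, and the assumption that dates arrive as Excel serial numbers stored as strings should be checked against the real file.

```r
# Simulated result of read_excel("./test123.xlsx") without skip:
# every column is character because of the two extra header rows.
myDF2 <- data.frame(
  indicator_name = c("Unit", "Frequency", "43101", "43102"),
  M2  = c("%", "Daily", "6.71", "6.79"),
  GDP = c("%", "Daily", "8.17", "8.19"),
  stringsAsFactors = FALSE
)

# Drop the Unit and Frequency rows, as in filter_after.
myDF2 <- myDF2[-c(1:2), ]

# Re-guess the column types now that only data rows remain.
myDF2[] <- lapply(myDF2, type.convert, as.is = TRUE)

# Excel serial dates count days from 1899-12-30 (Windows date system).
myDF2$indicator_name <- as.Date(myDF2$indicator_name, origin = "1899-12-30")
```

If this conversion is required in practice, it should be included in the filter_after timing for a fair comparison.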
