Reading lines near comments in R with read.table

Question

I am reading many text files that contain rows of data, with a few header lines at the top that describe the data, like this:

Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8

I want to read the various pieces of information from this file separately. I can do that like this:

file <- read.table(txt, nrows = 1)
name <- read.table(txt, nrows = 1, skip = 2)
vals <- read.table(txt, nrows = 1, skip = 3)
data <- read.table(txt,            skip = 5)

Because of the two blank comment lines, I can also read the data in like this:

file <- read.table(txt, nrows = 1)
name <- read.table(txt, nrows = 1, skip = 1)  # Skip changed from 2
vals <- read.table(txt, nrows = 1, skip = 3)
data <- read.table(txt,            skip = 4)  # Skip changed from 5

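This works, but the text files don't always have the same number of blank comment lines; sometimes they are there and sometimes they aren't. If one (or both) of the comment lines were missing from my example file, neither of my solutions would work any more.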
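To make the failure concrete, here is a small sketch using `read.table(text = ...)` in place of a file path. The variant `txt2` is hypothetical: it is the example file with its first `#` line removed. With the hard-coded `skip` values, the header reads land on the wrong lines and one data row is silently dropped.

```r
# Hypothetical variant of the example file with the first "#" line missing
txt2 <- "Test file
File information
1 2 3 4
#
a 2
b 4
c 6
d 8"

# The hard-coded skip values from the first approach now point at the wrong lines
name <- read.table(text = txt2, nrows = 1, skip = 2)  # reads "1 2 3 4", not "File information"
data <- read.table(text = txt2, skip = 5)             # skips past the "a 2" row as well
nrow(data)
# [1] 3
```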

Is there a more robust way to read these text files, so that the skip value never depends on how many comment lines there are?

Answer 1

(Assumption: after the file metadata at the top, once the data starts, there are no more comments.)

(I use textConnection(...) to trick functions that expect a file connection into reading a character string. Replace that call with your file name.)

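One technique is to read the first n lines of the file (some number "guaranteed" to include all the comment/non-data lines), find the last comment line, and then handle everything before it and everything after it accordingly: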

txt <- "Test file
#
File information
1 2 3 4
#
a 2
b 4
c 6
d 8"
max_comment_lines <- 8
(dat <- readLines(textConnection(txt), n = max_comment_lines))
# [1] "Test file"        "#"                "File information" "1 2 3 4"         
# [5] "#"                "a 2"              "b 4"              "c 6"             
(skip <- max(grep("^\\s*#", dat)))  # "\\s" in the R string becomes \s in the regex
# [1] 5

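(Aside: you should probably add a check that there actually are comment lines. Otherwise grep() returns integer(0), max() of that is -Inf (with a warning), and -Inf is not a usable value for skip or n in the read* functions.)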
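A minimal sketch of such a check, wrapped in a hypothetical helper `last_comment_line()` (not part of any package). It assumes that a file with no comment lines does not match the expected layout, so it stops with an error instead of letting `max(integer(0))` produce `-Inf`.

```r
# Hypothetical helper: line number of the last comment line in `lines`.
# Fails loudly instead of letting max(integer(0)) return -Inf with a warning.
last_comment_line <- function(lines, pattern = "^\\s*#") {
  hits <- grep(pattern, lines)
  if (length(hits) == 0L) {
    stop("no comment lines found; increase max_comment_lines or check the file layout")
  }
  max(hits)
}

last_comment_line(c("Test file", "#", "File information", "1 2 3 4", "#", "a 2"))
# [1] 5
```

Whether to stop, warn, or fall back to `skip = 0` is a judgment call that depends on what the files look like in practice.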

Now that we "know" the last comment is on line 5, we can use the first 4 lines to get the header information ...

meta <- readLines(textConnection(txt), n = skip - 1)
meta <- meta[! grepl("^\\s*#", meta) ] # remove the comment rows themselves
meta
# [1] "Test file"        "File information" "1 2 3 4"

... and skip 5 lines to get the data.

dat <- read.table(textConnection(txt), skip = skip)
str(dat)
# 'data.frame':	4 obs. of  2 variables:
#  $ V1: Factor w/ 4 levels "a","b","c","d": 1 2 3 4
#  $ V2: int  2 4 6 8
# (Output from R < 4.0.0; since R 4.0.0 stringsAsFactors defaults to FALSE, so V1 is chr.)
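Put together for a file on disk, the steps above might look like the following sketch. The function name `read_with_meta()` and its return shape (a list with `meta` and `data`) are hypothetical choices; it keeps the same assumptions as above (all comment lines fall within the first `max_comment_lines` lines, and no comments appear once the data starts). It reuses the lines already read for the metadata instead of reading the header a second time.

```r
# Hypothetical wrapper combining the steps above for a file on disk.
# Assumes all comment lines occur within the first `max_comment_lines`
# lines and that no comments appear once the data starts.
read_with_meta <- function(path, max_comment_lines = 8, pattern = "^\\s*#") {
  head_lines <- readLines(path, n = max_comment_lines)
  hits <- grep(pattern, head_lines)
  if (length(hits) == 0L) stop("no comment lines found in ", path)
  skip <- max(hits)

  meta <- head_lines[seq_len(skip - 1)]  # everything before the last comment
  meta <- meta[!grepl(pattern, meta)]    # drop the comment lines themselves

  data <- read.table(path, skip = skip)  # data starts right after the last comment
  list(meta = meta, data = data)
}

# Usage with a temporary copy of the example file
path <- tempfile(fileext = ".txt")
writeLines(c("Test file", "#", "File information", "1 2 3 4", "#",
             "a 2", "b 4", "c 6", "d 8"), path)
res <- read_with_meta(path)
nrow(res$data)
# [1] 4
```

Because `skip` is computed from the last comment line found, the same call works whether the file has one, two, or more comment lines in its header.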
