R: How to read in a file with comment lines starting with "##" and some regular lines starting with "#"
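The documentation for read.delim and friends says that the "comment.char" argument accepts only a single character.
Is there a way to handle files where comment lines start with "##" and real lines start with "#"?
Some bioinformatics file formats do this: the header line starts with "#".
Unfortunately, there is no regular-expression option.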
### Write file with comment line indicated by "##"
### Read in with comment.char="#"
text1 = "##comment\nCol1\tCol2\n10\t20"
write(text1, file="text1.txt")
t1 = read.delim("text1.txt", comment.char="#")
print(t1)
#> Col1 Col2
#> 1 10 20
### Write file with comment line indicated by "##"
### and header column starting with "#"
### Read in with comment.char="#"
text2 = "##comment\n#Col1\tCol2\n10\t20"
write(text2, file="text2.txt")
t2 = read.delim("text2.txt", comment.char="#")
print(t2)
#> [1] X10 X20
#> <0 rows> (or 0-length row.names)
### Write file with comment line indicated by "##"
### and header column starting with "#"
### Read in with comment.char="##"
text3 = "##comment\n#Col1\tCol2\n10\t20"
write(text3, file="text3.txt")
t3 = read.delim("text3.txt", comment.char="##")
#> Error in read.table(file = file, header = header, sep = sep, quote = quote, : invalid 'comment.char' argument
print(t3)
#> Error in print(t3): object 't3' not found
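One way to solve this is to preprocess the file: strip a single leading "#" from every line, so that "##" comment lines become ordinary "#" comment lines and the "#" header line becomes a regular line. Then read from the resulting character vector.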
removeDoubleChar <- function(x, ...){
  txt <- readLines(x)
  # Strip one leading "#": "##comment" becomes "#comment", "#Col1" becomes "Col1"
  txt <- sub('^#([^#]*)', '\\1', txt)
  read.delim(text = txt, comment.char = "#", ...)
}
fls <- list.files(pattern = '^t.*\\.txt')
lapply(fls, removeDoubleChar)
#[[1]]
# Col1 Col2
#1 10 20
#
#[[2]]
# Col1 Col2
#1 10 20
#
#[[3]]
# Col1 Col2
#1 10 20
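A more explicit variant of the same preprocessing idea, as a minimal sketch: drop the "##" lines outright, strip the single "#" from the header, and turn comment handling off. The helper name read_hash_header is hypothetical and not part of the original answer. It assumes the first line left after filtering is the header.

```r
# Hypothetical helper (not from the original answer): read a tab-delimited
# file whose comment lines start with "##" and whose header starts with "#".
read_hash_header <- function(path, ...) {
  lines <- readLines(path)
  # Drop comment lines that begin with two hashes.
  lines <- lines[!grepl("^##", lines)]
  # Assumption: the first remaining line is the header; strip its leading "#".
  if (length(lines) > 0) {
    lines[1] <- sub("^#", "", lines[1])
  }
  # Comments are already removed, so disable comment handling entirely.
  read.delim(text = lines, comment.char = "", ...)
}

# Self-contained demo using a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("##comment", "#Col1\tCol2", "10\t20"), tmp)
df <- read_hash_header(tmp)
print(df)
```

Because comment.char is set to "", a "#" that appears inside a data field is kept as data, which may or may not be what you want for a given format.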