fread - multiple separators in a string

I am trying to read a table with fread. The contents of the txt file look like this:

"No","Comment","Type"
"0","he said:"wonderful|"","A"
"1","Pr/ "d/s". "a", n) ","B"

The R code I am using is: dataset0 <- fread("data/test.txt", stringsAsFactors = F), with the development version of the data.table R package.

I expect to get a dataset with three columns; instead I get:

Error in fread(input = "data/Whosebug.txt", stringsAsFactors = FALSE) : 
Line 3 starting <<"1","Pr/ ">> has more than the expected 3 fields.
Separator 3 occurs at position 26 which is character 6 of the last field: << n) ","B">>. 
Consider setting 'comment.char=' if there is a trailing comment to be ignored.

How can I solve this?

Read the file line by line with readLines, then replace the separator and parse the result with read.table:

# read with no sep
x <- readLines("test.txt")

# introduce new sep - "|"
x <- gsub("\",\"", "\"|\"", x)

# read with new sep
read.table(text = x, sep = "|", header = TRUE)

#   No                                                                  Comment Type
# 1  0                                                     he said:"wonderful."    A
# 2  1 The problem is: reading table, and also "a problem, yes." keep going on.    A
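
If you would rather not depend on how read.table treats the embedded quotes, here is a minimal sketch of a variant that splits each line on the literal "," delimiter with strsplit instead. It assumes every field is wrapped in double quotes and that the three-character sequence "," never appears inside a field. The function name parse_quoted_lines is made up for illustration, and the sample lines are inlined so the snippet runs on its own:

```r
# Hypothetical helper: parse lines where every field is double-quoted and
# embedded quotes are NOT escaped. Assumes '","' never occurs inside a field.
parse_quoted_lines <- function(lines) {
  # drop the opening quote of the first field and the closing quote of the last
  inner <- sub('"$', '', sub('^"', '', lines))
  # split on the literal delimiter between fields
  fields <- strsplit(inner, '","', fixed = TRUE)
  # fail early if a line does not have the same number of fields as the header
  stopifnot(all(lengths(fields) == length(fields[[1]])))
  df <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
  names(df) <- fields[[1]]
  # convert columns that look numeric; leave the rest as character
  df[] <- lapply(df, type.convert, as.is = TRUE)
  df
}

lines <- c(
  '"No","Comment","Type"',
  '"0","he said:"wonderful."","A"',
  '"1","The problem is: reading table, and also "a problem, yes." keep going on.","A"'
)
df <- parse_quoted_lines(lines)
```

For a real file, pass readLines("test.txt") instead of the inlined lines. The stopifnot check makes the function stop with an error rather than silently misalign columns when the delimiter assumption does not hold.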

The development version of data.table handles files like this, where the embedded quotes have not been escaped. See point 10 on the wiki page.

I just tested it on your input and it works.

$ more unescaped.txt
"No","Comment","Type"
"0","he said:"wonderful."","A"
"1","The problem is: reading table, and also "a problem, yes." keep going on.","A"

> DT = fread("unescaped.txt")
> DT
   No                                                                  Comment Type
1:  0                                                     he said:"wonderful."    A
2:  1 The problem is: reading table, and also "a problem, yes." keep going on.    A
> ncol(DT)
[1] 3