fread - multiple separators in a string
I am trying to read a table using fread.
The text in the txt file looks like this:
"No","Comment","Type"
"0","he said:"wonderful|"","A"
"1","Pr/ "d/s". "a", n) ","B"
The R code I am using is: dataset0 <- fread("data/test.txt", stringsAsFactors = F)
with the development version of the data.table R package.
I expect to see a dataset with three columns; however:
Error in fread(input = "data/Whosebug.txt", stringsAsFactors = FALSE) :
Line 3 starting <<"1","Pr/ ">> has more than the expected 3 fields.
Separator 3 occurs at position 26 which is character 6 of the last field: << n) ","B">>.
Consider setting 'comment.char=' if there is a trailing comment to be ignored.
How can I solve this?
Use readLines to read the file line by line, then replace the separator and parse the result with read.table:
# read with no sep
x <- readLines("test.txt")
# introduce new sep - "|"
x <- gsub("\",\"", "\"|\"", x)
# read with new sep
read.table(text = x, sep = "|", header = TRUE)
# No Comment Type
# 1 0 he said:"wonderful." A
# 2 1 The problem is: reading table, and also "a problem, yes." keep going on. A
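The sample in the question contains a literal | inside a field, so a replacement separator has to be chosen with care. A variation on the same idea, offered only as a minimal sketch, is to skip the separator swap and split each line on the three-character sequence "," directly. It assumes that every field is wrapped in double quotes and that no field contains "," itself; the lines are copied from the question and stand in for readLines("data/test.txt").

```r
# Lines copied from the question; in practice: lines <- readLines("data/test.txt")
lines <- c(
  '"No","Comment","Type"',
  '"0","he said:"wonderful|"","A"',
  '"1","Pr/ "d/s". "a", n) ","B"'
)

# Drop the quote that opens the first field and the one that closes the last.
trimmed <- sub('^"', "", sub('"$', "", lines))

# Split on the literal sequence "," that separates two quoted fields.
fields <- strsplit(trimmed, '","', fixed = TRUE)

# Fail early if any line does not yield exactly three fields.
stopifnot(all(lengths(fields) == 3L))

# The first line holds the column names; the rest are data rows.
dataset0 <- as.data.frame(do.call(rbind, fields[-1]), stringsAsFactors = FALSE)
names(dataset0) <- fields[[1]]
dataset0$No <- as.integer(dataset0$No)
```

Embedded quotes, commas and the | character stay inside the Comment column because only the exact "," sequence is treated as a field boundary. If a field can contain that sequence, this sketch will split it incorrectly.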
The development version of data.table handles files like this, where the embedded quotes have not been escaped. See point 10 on the wiki page.
I just tested it on your input and it works.
$ more unescaped.txt
"No","Comment","Type"
"0","he said:"wonderful."","A"
"1","The problem is: reading table, and also "a problem, yes." keep going on.","A"
> DT = fread("unescaped.txt")
> DT
No Comment Type
1: 0 he said:"wonderful." A
2: 1 The problem is: reading table, and also "a problem, yes." keep going on. A
> ncol(DT)
[1] 3