R fails to import flat file with unescaped quote inside field

I am trying to import a large .txt file that uses |,| to separate columns. The raw data looks like this:

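The original .txt file has 593,118 rows (entries). However, with my import line I can only import 191,838 rows, and many of them are imported incorrectly. The imported file looks like this (for example, rows 189880:189889 are imported correctly, the others are not):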

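With this code the number of columns is correct; it just fails to import all the rows properly. In addition, the following warning appears when I run my import code: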

Test<-read.csv("test2.txt", header = FALSE, sep = ",", quote = "|")
Warning message:
In scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
  EOF within quoted string

I can import 544,605 rows with the following command (swapping the two arguments, so that sep = "|" and quote = ","):

Test<-read.csv("test2.txt", header = FALSE, sep = "|", quote = ",")

But now the whole file looks messy: data ends up in the wrong columns, and the wrong number of columns is created (41 instead of 39):

Does anyone know how to import this .txt file correctly?

data.table resolves the problem automatically and prints an informative message while doing so:

library(data.table)

DT <- fread(
  "data/test3.txt",
  header = FALSE, 
  quote = "|"
)
Warning message:
In fread("data/test3.txt", header = FALSE, quote = "|") :
  Found and resolved improper quoting out-of-sample. First healed line 403: <<|B160001953|,|S|,|N00035516|,|16|,|Y|,| |,| |,|Cardlytics, Inc.Strike price: .11 | NeitherExpires: 01/25/2021|,|Cardlytics Inc|,||,||,||,||,||,||,||,||,||,|I |,,| |,| |,| |,| |,| |,| |,| |,|p |,| |,| |,| |,.0000,| |,| |,| |,,||,||,| |>>. If the fields are not quoted (e.g. field separator does not appear within any field), try quote="" to avoid this warning.
DT[403]
           V1 V2        V3 V4 V5 V6 V7
1: B160001953  S N00035516 16  Y      
                                                                 V8             V9
1: Cardlytics, Inc.Strike price: .11 | NeitherExpires: 01/25/2021 Cardlytics Inc
   V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29
1:                                      I   NA                              p     
   V30 V31 V32 V33 V34 V35 V36 V37 V38 V39
1:           0
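
To find out which lines contain a stray quote character, one rough check is to count the "|" characters on each line. Every properly quoted field contributes two of them and unquoted fields contribute none, so a line with an odd count has an unescaped "|" somewhere inside a field. The following is only a sketch in base R under those assumptions: sample_lines and count_pipes are made-up names, the sample rows are invented to mimic the layout of the healed line above, and a line holding two stray pipes would go undetected.

```r
# Hypothetical sample rows in the same |,| layout as the file in the question.
# The second row has an unescaped "|" inside a quoted field.
sample_lines <- c(
  "|B1|,|S|,|Acme Inc|,.0000,| |",
  "|B2|,|S|,|Cardlytics, Inc. price: .11 | Neither|,.0000,| |",
  "|B3|,|N|,||,,| |"
)

# Count the literal "|" characters on each line (fixed = TRUE disables regex).
count_pipes <- function(x) {
  lengths(regmatches(x, gregexpr("|", x, fixed = TRUE)))
}

n_pipes <- count_pipes(sample_lines)

# Well-formed lines have an even number of quote characters,
# so an odd count flags a line with a stray "|".
suspect <- which(n_pipes %% 2 == 1)
suspect  # 2 for this sample

# For the real file (name taken from the question), something like:
# suspect <- which(count_pipes(readLines("test2.txt")) %% 2 == 1)
```

Inspecting the flagged lines shows whether the stray quotes are rare enough to fix by hand or whether relying on fread's healing is the more practical route.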