read.table() error, even though all elements are present

Question

I get an error when using read.table():

data <- read.table(file, header=T, stringsAsFactors=F, sep="@")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 160 did not have 28 elements

I checked line 160, and it does have 28 elements (it contains 27 @ symbols).

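I checked all 30242 lines: there are 816534 @ symbols in total, 27 per line, so I am fairly sure every line has 28 elements. I also checked the file to confirm that @ does not appear anywhere except as a separator.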
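For reference, a check like the one described can be sketched in R as follows. The file and its three rows are made up for illustration; the real file would be read with the same readLines() call. Note that this counts raw characters and knows nothing about quoting, so it can disagree with what read.table() sees.

```r
# Hypothetical stand-in for the real file: three "@"-separated rows.
path <- tempfile(fileext = ".txt")
writeLines(c("a@b@c", "1@O'Brien@3", "4@5@6"), path)

lines <- readLines(path)

# Count "@" characters per line on the raw text, ignoring any quoting rules.
n_sep <- lengths(regmatches(lines, gregexpr("@", lines, fixed = TRUE)))

length(lines)     # number of lines in the file
sum(n_sep)        # total number of separators
all(n_sep == 2)   # TRUE when every line has the same separator count
```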

Does anyone know what is going on here?

Edit: line 160 of the file

158@Mental state: 1. Overall clinical symptoms@MD@S@2002@CMP-005@02@20.67@23.58@Clozapine versus typical neuroleptic medication for schizophrenia@IV@4.47@02@SENSITIVITY ANALYSIS - CHINESE TRIALS@CD000059@6.94@Fixed@16@5@2@45@Chinese trials@YES@Xia 2002 (CPZ)@STD-Xia-2002-_x0028_CPZ_x0029_@579@566@40

Edit 2: line 161 of the file

159@Length of surgery (minutes)@MD@Y@1995@CMP-001@01@59.0@47.0@Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults@IV@23.9@01@Summary: Femoral nail (all types) versus sliding hip screw (SHS)@CD000093@13.3@Random@12@1@1@53@Gamma nail@YES@O'Brien 1995@STD-O_x0027_Brien-1995@958@941@49

Answer 1

I think the problem is that the quote argument needs to recognise a newline. Let's take a look.

txt <- c(
    "158@Mental state: 1. Overall clinical symptoms@MD@S@2002@CMP-005@02@20.67@23.58@Clozapine versus typical neuroleptic medication for schizophrenia@IV@4.47@02@SENSITIVITY ANALYSIS - CHINESE TRIALS@CD000059@6.94@Fixed@16@5@2@45@Chinese trials@YES@Xia 2002 (CPZ)@STD-Xia-2002-_x0028_CPZ_x0029_@579@566@40", 
    "159@Length of surgery (minutes)@MD@Y@1995@CMP-001@01@59.0@47.0@Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults@IV@23.9@01@Summary: Femoral nail (all types) versus sliding hip screw (SHS)@CD000093@13.3@Random@12@1@1@53@Gamma nail@YES@O'Brien 1995@STD-O_x0027_Brien-1995@958@941@49"
)

We can use count.fields() to preview the number of fields on each line. With the usual sep = "@" and nothing else, we get an NA between the two lines and the counts are wrong:

count.fields(textConnection(txt), sep = "@")
# [1] 28 NA 24

But when we set quote to the newline character, it returns the correct counts:

count.fields(textConnection(txt), sep = "@", quote = "\n")
# [1] 28 28
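When count.fields() reports NA or an unexpected count, one quick check is whether any line contains a character from the default quote set (" and '). Line 161 in the question does: the apostrophe in O'Brien. A minimal sketch with shortened, made-up rows; in practice `lines` would come from readLines() on the real file:

```r
# Shortened, made-up rows shaped like the question's data.
lines <- c(
  "158@Xia 2002 (CPZ)@579",
  "159@O'Brien 1995@958",
  "160@Smith 1999@100"
)

# Indices of lines that contain a single or double quote character.
suspect <- grep("[\"']", lines)
suspect

# Show the suspect lines themselves.
lines[suspect]
```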

So I suggest adding quote = "\n" to your read.table call and seeing whether that fixes the problem. It worked for me:

read.table(text = txt, sep = "@")
# [1] V1  V2  V3  V4  V5  V6  V7  V8  V9  V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28
# <0 rows> (or 0-length row.names)

df <- read.table(text = txt, sep = "@", quote = "\n")
dim(df)
# [1]  2 28
anyNA(df)
# [1] FALSE
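As far as I can tell, quote = "\n" works here because it replaces the default quote set, so the apostrophe in O'Brien no longer opens a quoted string. A more common way to get the same effect is quote = "", which turns quoting off altogether. This is only safe when no field relies on quotes to protect an embedded separator. A sketch with shortened, made-up rows:

```r
# Shortened, made-up rows shaped like the question's data,
# with an apostrophe in the second row.
txt <- c(
  "158@Xia 2002 (CPZ)@579",
  "159@O'Brien 1995@958",
  "160@Smith 1999@100"
)

# quote = "" disables quoting, so every line is split on "@" only.
n <- count.fields(textConnection(txt), sep = "@", quote = "")
n

df <- read.table(text = txt, sep = "@", quote = "", stringsAsFactors = FALSE)
dim(df)
df$V2
```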

Answer 2

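I ran into the same problem. The answer above helped, but quote="\n" only worked up to a point. One element in my file contained " as a character, so I had to keep the default value of quote. I also had a # in one of the elements, so I had to use comment.char="". The help page for read.table() refers to scan() in several places, so I looked there and found the allowEscapes argument, which defaults to FALSE. I added it to my read.table() call and set it to TRUE. This is the full command that worked for me:

read.table(file="filename.csv", header=TRUE, sep=",", comment.char="", allowEscapes=TRUE)

I hope this helps someone.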
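To illustrate the comment.char part of that call, here is a small sketch on invented comma-separated data (the column names and values are hypothetical). One field contains a #, and another is double-quoted because it contains the separator, so the default quote set is kept:

```r
# Hypothetical data: a "#" inside one field, and a double-quoted
# field that contains the separator.
txt <- c(
  "id,label,value",
  "1,item #4,10",
  "2,\"quoted, with comma\",20"
)

df <- read.table(
  text = txt,
  header = TRUE,
  sep = ",",
  comment.char = "",     # keep "#" as data instead of starting a comment
  allowEscapes = TRUE,   # mirrors the call above; this toy data has no escapes
  stringsAsFactors = FALSE
)

dim(df)
df$label
```

With the default comment.char = "#", everything after the # on the second line would be dropped before the fields are counted.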

r

read.table

read.csv