read.csv() error: more columns than column names

I am trying to read in a CSV file, but I get the following error:

associatedata <- read.csv("AssociatedSpeciesID_1.csv", header=TRUE, fileEncoding = 'UTF-8-BOM') %>% mutate_all(na_if, "")

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  more columns than column names

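The CSV is below. I cannot find where the column counts fail to match. I have tried the general solutions from other questions, but nothing has worked.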

ObjectID,GlobalID,AssociatedSpeciesKnown,Associates,NewAssociate,UnknownSpecies_Description,AssociatedSpeciesAbundance,Coflowering,ParentGlobalID,CreationDate,Creator,EditDate,Editor
1,54e33e7c-1ff1-464f-8872-df027fcfe8ec,known,Amelanchier utahensis,,,Few,no,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
2,68420bc9-d6c6-4d7f-a149-7306399ce5c1,known,NewSpecies,Genus species,,Occasional,yes,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
3,88a6807b-b00c-4e58-84bb-4e8cb61409ae,unknown,,,ritiidiwjjviern bg,Common,no,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
4,9fc8ea4a-e197-42cc-bd75-614d5b106364,known,Artemisia nova,,,Common,no,ea9eb086-89c2-4aa5-a2f6-95519cd35a58,1/7/2022 3:56:26 PM,ejob_BLM,1/7/2022 3:56:26 PM,ejob,,

The header has 13 fields and every other record has 15. Inspecting the file, we find two trailing commas at the end of each data line.

count.fields("abc.csv", sep = ",")
## [1] 13 15 15 15 15
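To see which lines disagree with the header, the counts can be compared against the first element. Here is a minimal sketch on a small temporary file; the file contents and the names tmp, n, bad_lines and extra are made up for illustration and are not part of the question's data.

```r
# Hypothetical three-line file with the same defect:
# a 3-field header and two trailing commas on every data row
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3,,", "4,5,6,,"), tmp)

n <- count.fields(tmp, sep = ",")   # number of fields on each line; line 1 is the header
bad_lines <- which(n != n[1])       # line numbers whose count differs from the header
extra <- unique(n[-1] - n[1])       # surplus fields carried by the data rows
```

If extra is a single positive number, every data row carries the same number of surplus fields, which is the situation the fixes below assume.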

1) If we remove the two trailing commas, it works. (You may not need strip.white; it was added because the code in the Note at the end is indented 4 spaces to satisfy SO. It does no harm.)

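# Read the raw lines and strip the two trailing commas from each one
L <- "abc.csv" |>
  readLines() |>
  sub(pattern = ",,$", replacement = "")
# Parse the cleaned lines as CSV
DF <- read.csv(text = L, strip.white = TRUE)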

giving:

> str(DF)
'data.frame':   4 obs. of  13 variables:
 $ ObjectID                  : int  1 2 3 4
 $ GlobalID                  : chr  "54e33e7c-1ff1-464f-8872-df027fcfe8ec" "68420bc9-d6c6-4d7f-a149-7306399ce5c1" "88a6807b-b00c-4e58-84bb-4e8cb61409ae" "9fc8ea4a-e197-42cc-bd75-614d5b106364"
 $ AssociatedSpeciesKnown    : chr  "known" "known" "unknown" "known"
 $ Associates                : chr  "Amelanchier utahensis" "NewSpecies" "" "Artemisia nova"
 $ NewAssociate              : chr  "" "Genus species" "" ""
 $ UnknownSpecies_Description: chr  "" "" "ritiidiwjjviern bg" ""
 $ AssociatedSpeciesAbundance: chr  "Few" "Occasional" "Common" "Common"
 $ Coflowering               : chr  "no" "yes" "no" "no"
 $ ParentGlobalID            : chr  "9fc6b840-8584-4045-b69f-f0e9488a1f06" "9fc6b840-8584-4045-b69f-f0e9488a1f06" "9fc6b840-8584-4045-b69f-f0e9488a1f06" "ea9eb086-89c2-4aa5-a2f6-95519cd35a58"
 $ CreationDate              : chr  "1/7/2022 3:55:46 PM" "1/7/2022 3:55:46 PM" "1/7/2022 3:55:46 PM" "1/7/2022 3:56:26 PM"
 $ Creator                   : chr  "ejob_BLM" "ejob_BLM" "ejob_BLM" "ejob_BLM"
 $ EditDate                  : chr  "1/7/2022 3:55:46 PM" "1/7/2022 3:55:46 PM" "1/7/2022 3:55:46 PM" "1/7/2022 3:56:26 PM"
 $ Editor                    : chr  "ejob" "ejob" "ejob" "ejob"

2) Alternatively, if sed is on your path:

read.csv(pipe("sed -e s/,,$// abc.csv"), strip.white = TRUE)

3) This also works.

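# Read the data rows without a header and keep only the first 13 columns
DF <- read.csv("abc.csv", header = FALSE, skip = 1, strip.white = TRUE)[1:13]
# Take the column names from the header line
names(DF) <- read.table("abc.csv", sep = ",", strip.white = TRUE, nrows = 1)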
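Approach 3 hard-codes the 13 columns. A variation that takes the column count from the header line instead might look like the sketch below. It runs on a small made-up temporary file, and the names tmp, hdr and DF2 are illustrative only.

```r
# Hypothetical file with the same defect: 3 header fields, 5 fields per data row
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3,,", "4,5,6,,"), tmp)

# Read the header line as a character vector of column names
hdr <- scan(tmp, what = "", sep = ",", nlines = 1, quiet = TRUE)

# Read the data rows without a header, then keep as many columns as there are names
DF2 <- read.csv(tmp, header = FALSE, skip = 1, strip.white = TRUE)[seq_along(hdr)]
names(DF2) <- hdr
```

This assumes the surplus fields are always at the end of each row, as they are in the question's file.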

Note

Generate the file from the question.

Lines <- "ObjectID,GlobalID,AssociatedSpeciesKnown,Associates,NewAssociate,UnknownSpecies_Description,AssociatedSpeciesAbundance,Coflowering,ParentGlobalID,CreationDate,Creator,EditDate,Editor
1,54e33e7c-1ff1-464f-8872-df027fcfe8ec,known,Amelanchier utahensis,,,Few,no,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
2,68420bc9-d6c6-4d7f-a149-7306399ce5c1,known,NewSpecies,Genus species,,Occasional,yes,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
3,88a6807b-b00c-4e58-84bb-4e8cb61409ae,unknown,,,ritiidiwjjviern bg,Common,no,9fc6b840-8584-4045-b69f-f0e9488a1f06,1/7/2022 3:55:46 PM,ejob_BLM,1/7/2022 3:55:46 PM,ejob,,
4,9fc8ea4a-e197-42cc-bd75-614d5b106364,known,Artemisia nova,,,Common,no,ea9eb086-89c2-4aa5-a2f6-95519cd35a58,1/7/2022 3:56:26 PM,ejob_BLM,1/7/2022 3:56:26 PM,ejob,,
"
cat(Lines, file = "abc.csv")