How to read data in R when some rows contain a comma as thousands separator together with a " flag, and rows without such values don't have the flag

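I am loading a csv (comma-separated) in R in which every row is wrapped in quotes ("). It contains a column with decimal values, and the affected value is wrapped in doubled quotes (""); rows without this issue don't have the "" wrap.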

The csv file looks like this:

2019,SPAIN, 2000, 300

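This is clumsy... but the best I could come up with is to read the file as text and then use gsub to strip out the thousands marks and the double quotes.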

# Read the csv as text, so we can run it through gsub
file_connection <- file("path_to_csv.csv")
text <- readLines(file_connection)
# Release the connection once the lines are in memory
close(file_connection)

# 1. Remove the comma used as thousands mark
# There HAS to be a better way to do this regex but I couldn't remember it
# (note the doubled backslashes: R needs '\\1' to pass a backreference to gsub)
sanitized_mark <- gsub('(\"\"[0-9]+),([0-9]+\"\")', '\\1\\2', text)

# 2. Remove all double quotes
sanitized_quotes <- gsub('\"', '', sanitized_mark)

# Paste it all together adding a newline character after each element
sanitized <- paste0(sanitized_quotes, collapse="\n")
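On the regex in step 1: it only removes one comma per value, so a number such as 1,234,567 would keep a comma and be split into two fields later. A minimal sketch of one alternative is below. It assumes the affected values are always wrapped in doubled quotes as described in the question; strip_thousands and the sample rows are made up for illustration and are not part of the original data.

```r
# Hypothetical helper (not from the original answer): strips the commas
# from every ""-wrapped number, however many thousands groups it has.
strip_thousands <- function(lines) {
  # Match ""1,234"", ""1,234,567"" or ""1,234.5"" (the quotes are kept for now)
  pattern <- '""[0-9]{1,3}(,[0-9]{3})+(\\.[0-9]+)?""'
  m <- gregexpr(pattern, lines)
  # Rewrite only the matched pieces, leaving the field-separating commas alone
  regmatches(lines, m) <- lapply(regmatches(lines, m),
                                 function(x) gsub(",", "", x, fixed = TRUE))
  lines
}

# Made-up rows following the layout described in the question
sample_lines <- c('"2019,SPAIN, ""2,000"", 300"',
                  '"2019,SPAIN, ""1,234,567"", 300"',
                  '"2019,SPAIN, 500, 300"')
cleaned <- strip_thousands(sample_lines)
```

The result could stand in for sanitized_mark in step 1; the quote removal and the paste0 step stay the same.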

You can then use the text argument of read.csv to read the resulting string as if it were the contents of a .csv file:
df <- read.csv(text=sanitized)
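
To check the whole approach without touching a file, here is a small end-to-end sketch on inline data. The two rows and the column names are assumptions made up for illustration, built from the layout described in the question. Note that read.csv defaults to header = TRUE, so if the real file has no header row (as the sample line suggests) the first data row would be consumed as column names unless header = FALSE is passed.

```r
# Made-up rows: one with a ""-wrapped thousands value, one without
raw_lines <- c('"2019,SPAIN, ""2,000"", 300"',
               '"2020,SPAIN, 950, 300"')

# Same steps as above: drop the thousands comma, then all double quotes
no_marks  <- gsub('(""[0-9]+),([0-9]+"")', '\\1\\2', raw_lines)
no_quotes <- gsub('"', '', no_marks, fixed = TRUE)
sanitized <- paste0(no_quotes, collapse = "\n")

# header = FALSE because the sample has no header row;
# the column names here are hypothetical
df <- read.csv(text = sanitized, header = FALSE, strip.white = TRUE,
               col.names = c("year", "country", "value", "other"))
str(df)
```

strip.white = TRUE drops the space that follows each comma in the text fields; numeric fields are parsed correctly either way.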