read_tsv in readr not parsing table correctly
I'm trying to read in a tab-delimited table, and it keeps producing parsing errors. I think this is caused by unescaped quotes in the text. See the example below:
concept_id concept_name domain_id vocabulary_id concept_class_id standard_concept concept_code valid_start_date valid_end_date invalid_reason
2618087 Services delivered under an outpatient speech language pathology plan of care Observation HCPCS HCPCS Modifier S GN 19990101 20991231
2618083 "opt out" physician or practitioner emergency or urgent service Observation HCPCS HCPCS Modifier S GJ 19981001 20991231
2618082 Diagnostic mammogram converted from screening mammogram on same day Observation HCPCS HCPCS Modifier S GH 19981001 20991231
Note the "opt out" in the second column; the problem seems to stem from this.
The following code fails to parse:
library(readr)

df <- read_delim(
file = "~/_data/test.csv",
col_types = cols(
col_integer(), col_character(), col_character(),
col_character(), col_character(), col_character(),
col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
col_character()),
delim = "\t"
)
Warning: 4 parsing failures.
row col expected actual file
1 NA 10 columns 9 columns '~/_data/test.csv'
2 concept_name delimiter or quote '~/_data/test.csv'
2 concept_name closing quote at end of file '~/_data/test.csv'
2 NA 10 columns 2 columns '~/_data/test.csv'
I can't seem to work out which option fixes this.
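This solved the problem. I needed to change the quote argument to quote = "". By default read_delim() uses quote = "\"", so a field that starts with a double quote is treated as a quoted field.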
df <- read_delim(
file = "~/_data/test.csv",
col_types = cols(
col_integer(), col_character(), col_character(),
col_character(), col_character(), col_character(),
col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
col_character()),
quote = "",
delim = "\t"
)
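A minimal self-contained sketch of the same fix, assuming readr is installed. It writes two of the sample rows to a temporary file with real tab characters and reads them back with quote = "". The temporary file, the use of read_tsv() in place of read_delim(delim = "\t"), and the named cols() spec with .default are my own choices for illustration, not part of the original answer.

```r
library(readr)

# Illustrative sample: two rows from the question, written with real tabs.
# invalid_reason is empty, so each data row ends with a trailing tab.
path <- tempfile(fileext = ".tsv")
writeLines(c(
  paste("concept_id", "concept_name", "domain_id", "vocabulary_id",
        "concept_class_id", "standard_concept", "concept_code",
        "valid_start_date", "valid_end_date", "invalid_reason", sep = "\t"),
  paste("2618087",
        "Services delivered under an outpatient speech language pathology plan of care",
        "Observation", "HCPCS", "HCPCS Modifier", "S", "GN",
        "19990101", "20991231", "", sep = "\t"),
  paste("2618083",
        "\"opt out\" physician or practitioner emergency or urgent service",
        "Observation", "HCPCS", "HCPCS Modifier", "S", "GJ",
        "19981001", "20991231", "", sep = "\t")
), path)

# quote = "" disables quote handling, so '"' is kept as an ordinary character.
df <- read_tsv(
  path,
  col_types = cols(
    concept_id       = col_integer(),
    valid_start_date = col_date(format = "%Y%m%d"),
    valid_end_date   = col_date(format = "%Y%m%d"),
    .default         = col_character()
  ),
  quote = ""
)

print(df$concept_name[2])
```

Turning quoting off is only safe when no field contains an embedded tab or newline, since those can no longer be protected by quotes.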