read_tsv in readr not parsing table correctly

I'm trying to read in a tab-delimited table, and it keeps producing parsing failures. I think the cause is unescaped quotes in the text. See the example below:

concept_id  concept_name    domain_id   vocabulary_id   concept_class_id    standard_concept    concept_code    valid_start_date    valid_end_date  invalid_reason
2618087 Services delivered under an outpatient speech language pathology plan of care   Observation HCPCS   HCPCS Modifier  S   GN  19990101    20991231
2618083 "opt out" physician or practitioner emergency or urgent service Observation HCPCS   HCPCS Modifier  S   GJ  19981001    20991231
2618082 Diagnostic mammogram converted from screening mammogram on same day Observation HCPCS   HCPCS Modifier  S   GH  19981001    20991231

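Note the "opt out" in the second column (concept_name), which seems to be where the problem comes from. The following code fails to parse the file: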

library(readr)

df <- read_delim(
  file = "~/_data/test.csv",
  col_types = cols(
    col_integer(), col_character(), col_character(),
    col_character(), col_character(), col_character(),
    col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
    col_character()),
  delim = "\t"
  )

Warning: 4 parsing failures.
row          col                     expected    actual               file
  1 NA           10 columns                   9 columns '~/_data/test.csv'
  2 concept_name delimiter or quote                     '~/_data/test.csv'
  2 concept_name closing quote at end of file           '~/_data/test.csv'
  2 NA           10 columns                   2 columns '~/_data/test.csv'

I can't seem to work out a solution.

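This solved the problem: I needed to set the quote argument to quote = "". By default readr treats " as the quoting character, so the opening quote of "opt out" was read as the start of a quoted field that was never closed properly. Setting quote = "" turns quote handling off, and the quote characters are kept as literal text.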

df <- read_delim(
  file = "~/_data/test.csv",
  col_types = cols(
    col_integer(), col_character(), col_character(),
    col_character(), col_character(), col_character(),
    col_character(), col_date(format = "%Y%m%d"), col_date(format = "%Y%m%d"),
    col_character()),
  quote = "",
  delim = "\t"
  )
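
For anyone who wants to reproduce this without the original file, here is a minimal sketch. It assumes the three sample rows from the question, written to a temporary file (the path and the helper names `header`, `rows` and `path` are made up for illustration), and it pads the empty invalid_reason field with a trailing tab so that every row has 10 fields. It uses read_tsv(), which is read_delim() with the delimiter fixed to a tab, and names the non-character columns instead of listing all ten positionally.

```r
library(readr)

# Column names and the three sample rows from the question.
header <- c("concept_id", "concept_name", "domain_id", "vocabulary_id",
            "concept_class_id", "standard_concept", "concept_code",
            "valid_start_date", "valid_end_date", "invalid_reason")
rows <- list(
  c("2618087",
    "Services delivered under an outpatient speech language pathology plan of care",
    "Observation", "HCPCS", "HCPCS Modifier", "S", "GN", "19990101", "20991231", ""),
  c("2618083",
    "\"opt out\" physician or practitioner emergency or urgent service",
    "Observation", "HCPCS", "HCPCS Modifier", "S", "GJ", "19981001", "20991231", ""),
  c("2618082",
    "Diagnostic mammogram converted from screening mammogram on same day",
    "Observation", "HCPCS", "HCPCS Modifier", "S", "GH", "19981001", "20991231", "")
)

# Write a tab-delimited file to a temporary (hypothetical) location.
path <- tempfile(fileext = ".tsv")
writeLines(
  c(paste(header, collapse = "\t"),
    vapply(rows, paste, character(1), collapse = "\t")),
  path
)

# quote = "" disables quote handling, so the literal " characters
# in concept_name are read as ordinary text.
df <- read_tsv(
  path,
  quote = "",
  col_types = cols(
    concept_id       = col_integer(),
    valid_start_date = col_date(format = "%Y%m%d"),
    valid_end_date   = col_date(format = "%Y%m%d"),
    .default         = col_character()
  )
)

df$concept_name[2]
```

The trade-off is that with quote = "" a field can no longer contain a tab or a newline wrapped in quotes, which is usually acceptable for files like this one where quotes only ever appear as plain text.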