Escape quotes within triple-quoted strings

I have JSON strings in which quotes are not escaped but are instead nested inside triple quotes, e.g.

j0 = '[
  {
      "A" : "no quoted bits"
  },
  {
    "A" : """this contains: "quoted" bits""",
    "B" : "no quoted bits"
  },
  {
    "A" : "no quoted bits",
    "B" : """this contains: "quoted" and "more quoted" bits"""
  }
]'

Reading this into R fails, e.g.

jsonlite::fromJSON(j0)
#> Error: parse error: after key and value, inside map, I expect ',' or '}'
#>            bits"   },   {     "A" : """this contains: "quoted" bits"""
#>                      (right here) ------^

I cobbled together a hacky workaround:

# function for parsing strings where quotes are not escaped but nested inside triple-quotes
escape_triple_quoted = function(j){
  # split on the triple-quote delimiters; even-numbered pieces are the quoted contents
  j_split = strsplit(j, '"{3}')[[1]]
  f = seq_along(j_split) %% 2 == 0  # filter
  # fixed = TRUE keeps the backslash literal, so each inner quote becomes \"
  j_split[f] = gsub('"', '\\"', j_split[f], fixed = TRUE)
  paste(j_split, collapse = '"')
}

escape_triple_quoted(j0) |> jsonlite::fromJSON()
#>                              A                                              B
#> 1               no quoted bits                                           <NA>
#> 2 this contains: "quoted" bits                                 no quoted bits
#> 3               no quoted bits this contains: "quoted" and "more quoted" bits

But this is far from ideal. Is there a better way?

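Here is a one-liner version of escape_triple_quoted using gsubfn. The gsubfn function is like gsub except that the second argument may be a function that takes the capture groups of each match as input and returns the replacement for that match. The function can be written in formula notation, as we do here.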

library(gsubfn)
library(jsonlite)

escape_triple_quoted2 <- function(s) {
  gsubfn('"""(.*?)"""', ~ sprintf('"%s"', gsub('"', '\\"', x, fixed = TRUE)), s)
}

j0 |>
  escape_triple_quoted2() |>
  fromJSON()

giving:

                             A                                              B
1               no quoted bits                                           <NA>
2 this contains: "quoted" bits                                 no quoted bits
3               no quoted bits this contains: "quoted" and "more quoted" bits
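
If you would rather not add a dependency, the same idea can probably be written in base R with gregexpr and regmatches<-. This is only a sketch: the name escape_triple_quoted3 is made up for illustration, and it assumes (as in the example data) that a triple-quoted value never itself ends in a quote character, since the non-greedy match stops at the first closing triple quote.

```r
# Hypothetical base-R variant; the function name is an illustration only.
escape_triple_quoted3 <- function(s) {
  # locate every """...""" span (non-greedy, so each span ends at the first closing """)
  m <- gregexpr('"""(.*?)"""', s, perl = TRUE)
  regmatches(s, m) <- lapply(regmatches(s, m), function(hits) {
    # drop the three delimiter quotes on each side
    inner <- substring(hits, 4, nchar(hits) - 3)
    # fixed = TRUE keeps the backslash literal: each inner " becomes \"
    sprintf('"%s"', gsub('"', '\\"', inner, fixed = TRUE))
  })
  s
}

s <- '{"A": """say "hi" now"""}'
cat(escape_triple_quoted3(s), "\n")
```

The result can then be passed to fromJSON() in the same way as the output of escape_triple_quoted2().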