R - Removing text between quotes from a string

Question

I have many lines, each of which is processed individually (no loop is needed here). Here are some example lines:

"warning(\"Failed to parse headers:\n\", paste0(bad, \"\n\"), call. = FALSE)"
"}"
"names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))"
"new_response <- grepl(\"^HTTP\", lines)"
"header_lines <- lines[lines != \'\'][-1]"

As you can see, these lines are bits of code.

The problem: I need to remove everything between quotes, whether double quotes ("") or single quotes ('').
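Because these lines are R source, a string literal may itself contain an escaped quote such as \", which a plain "[^"]*" pattern would treat as the closing quote. A minimal sketch of a pattern that skips backslash escapes, assuming PCRE (perl = TRUE) and the usual backslash-escape convention of R string literals; the input string is made up for illustration:

```r
# Each alternative matches: an opening quote, then any number of either
#   - a character that is neither that quote nor a backslash, or
#   - a backslash followed by any character (an escape such as \" or \\),
# then the matching closing quote.
pattern <- "\"(?:[^\"\\\\]|\\\\.)*\"|'(?:[^'\\\\]|\\\\.)*'"

code <- 'f("say \\"hi\\"", y)'   # the R source text: f("say \"hi\"", y)
gsub(pattern, "", code, perl = TRUE)
# returns "f(, y)"
```

It does not attempt to handle raw strings (r"(...)", available since R 4.0.0) or quote characters inside comments.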

I did the following:

# First, get a list of all quoted strings in the line
text_quotes <- regmatches(line, gregexpr('"([^"]*)"', line))[[1]] # double quotes ""
text_quotes <- c(text_quotes, regmatches(line, gregexpr("'([^']*)'", line))[[1]]) # single quotes ''

# Remove the empty ones
text_quotes <- stringi::stri_remove_empty(text_quotes, na_empty = TRUE)

# Now, we can clean up the line
line_no_strings <- line
if (length(text_quotes) > 0)
  line_no_strings <- mgsub::mgsub(line, text_quotes, rep("", times = length(text_quotes)))

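My problem is that some of the quoted bits (I'm not sure what to call the "bits" inside quotes) are interpreted as regular expressions, and then mgsub can't find them.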

A problematic example:

"names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))"

One of the "quotes" is \"[[\". When I run the code, it fails miserably with:

 Error in gregexpr(pattern[i], string, ...) : 
  invalid regular expression '"[["', reason 'Missing ']''
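The error arises because mgsub passes each extracted piece to a regex function (gregexpr in the traceback), so "[[" is parsed as an unclosed bracket expression. A minimal sketch of the same extract-then-remove approach that avoids this, assuming base R only (no mgsub): each piece is deleted as literal text with fixed = TRUE. The helper name strip_quoted_fixed is hypothetical.

```r
# Hypothetical helper: extract quoted pieces, then delete them as literal text.
strip_quoted_fixed <- function(line) {
  # Extract "..." and '...' pieces, as in the question
  pieces <- c(
    regmatches(line, gregexpr('"[^"]*"', line))[[1]],
    regmatches(line, gregexpr("'[^']*'", line))[[1]]
  )
  for (p in pieces) {
    # fixed = TRUE matches p literally, so "[[" is never compiled as a regex
    line <- sub(p, "", line, fixed = TRUE)
  }
  line
}

line <- "names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))"
strip_quoted_fixed(line)
# returns "names <- vapply(pieces, , 2, FUN.VALUE = character(1))"
```

Because the double- and single-quote extractions still run independently on the original line, an apostrophe inside a double-quoted string can still produce a bogus single-quoted piece; a single pattern that pairs quote types avoids that.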

EDIT: The expected output is as follows (one result for each line above; the problematic case is in the middle):

"warning(, paste0(bad, ), call. = FALSE)"
"}"
"names <- vapply(pieces, , 2, FUN.VALUE = character(1))"
"new_response <- grepl(, lines)"
"header_lines <- lines[lines != ][-1]"

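I feel there should be a way to do this without extracting first, so that R's regex engine doesn't trip me up. But once again, regular expressions have defeated me.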

Any suggestions?
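Extraction is indeed not needed: a single alternation that closes each string with the same kind of quote that opened it can be passed straight to gsub, so the quoted contents are only ever data and never compiled as a pattern. A minimal sketch, assuming quoted text never contains an escaped quote of the same kind; code_lines is just an illustrative name for the example lines:

```r
code_lines <- c(
  "warning(\"Failed to parse headers:\n\", paste0(bad, \"\n\"), call. = FALSE)",
  "}",
  "names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))",
  "new_response <- grepl(\"^HTTP\", lines)",
  "header_lines <- lines[lines != ''][-1]"
)

# "[^"]*" : a double quote, any run of non-double-quote characters, a double quote
# '[^']*' : the same for single quotes
pattern <- "\"[^\"]*\"|'[^']*'"
no_strings <- gsub(pattern, "", code_lines)
print(no_strings)
```

On these five lines this reproduces the expected output from the EDIT above.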

Answer 1

gsub("[\"'].*?['\"]","", a)

[1] "warning(, paste0(bad, ), call. = FALSE)"               
[2] "}"                                                     
[3] "names <- vapply(pieces, , 2, FUN.VALUE = character(1))"
[4] "new_response <- grepl(, lines)"                        
[5] "header_lines <- lines[lines != ][-1]"

where

a <- c("warning(\"Failed to parse headers:\n\", paste0(bad, \"\n\"), call. = FALSE)",
       "}", "names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))",
       "new_response <- grepl(\"^HTTP\", lines)", "header_lines <- lines[lines != ''][-1]")
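One caveat with the pattern [\"'].*?['\"]: the lazy .*? lets a quote of either kind close a string opened by the other kind, so an apostrophe inside a double-quoted string ends the match early. A small comparison with a pattern that requires matching quote types; the input string x is made up for illustration:

```r
x <- "paste(\"it's\", x)"

# This answer's pattern: any quote opens, the nearest quote of either kind closes
gsub("[\"'].*?['\"]", "", x)
# returns "paste(s\", x)"

# Paired alternative: each quote type must be closed by the same type
gsub("\"[^\"]*\"|'[^']*'", "", x)
# returns "paste(, x)"
```

For the lines in the question the two patterns give the same result, because no string there contains the other quote character.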


Tags: regex, r, stringr, grepl