R - Removing text between quotes from a string
I have many lines, and each one is processed separately (no loop is needed here). Here are some example lines:
"warning(\"Failed to parse headers:\n\", paste0(bad, \"\n\"), call. = FALSE)"
"}"
"names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))"
"new_response <- grepl(\"^HTTP\", lines)"
"header_lines <- lines[lines != \'\'][-1]"
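As you can see, these lines are bits of code.
Question: I need to remove everything between quotes, whether double quotes ("") or single quotes (''), along with the quotes themselves.
Here is what I did: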
# First, get a list of all strings in the line
text_quotes <- regmatches(line, gregexpr('"([^"]*)"', line))[[1]] # double quotes ""
text_quotes <- c(text_quotes, regmatches(line, gregexpr("'([^']*)'", line))[[1]]) # single quotes ''
# Remove the empty ones
text_quotes <- stringi::stri_remove_empty(text_quotes, na_empty = TRUE)
# Now, we can clean up the line
line_no_strings <- line
if(length(text_quotes) > 0)
line_no_strings <- mgsub::mgsub(line, text_quotes, rep("", times = length(text_quotes)))
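My problem is that some of the quoted strings (I don't know what to call the "bits" inside the quotes) can themselves be read as regular expressions, and then mgsub fails to find them.
A problematic example: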
"names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))"
One of the quoted strings is \"[[\". When I run it, it fails miserably with:
Error in gregexpr(pattern[i], string, ...) :
invalid regular expression '"[["', reason 'Missing ']''
Edit: The expected output is (for each of the lines above; the problematic case is the middle one):
"warning(, paste0(bad, ), call. = FALSE)"
"}"
"names <- vapply(pieces, , 2, FUN.VALUE = character(1))"
"new_response <- grepl(, lines)"
"header_lines <- lines[lines != ][-1]"
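I feel there should be a way to do this without extracting the strings first, so that R's regex handling doesn't trip me up. Yet once again I have been defeated by regular expressions.
Any suggestions?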
gsub("[\"'].*?['\"]","", a)
[1] "warning(, paste0(bad, ), call. = FALSE)"
[2] "}"
[3] "names <- vapply(pieces, , 2, FUN.VALUE = character(1))"
[4] "new_response <- grepl(, lines)"
[5] "header_lines <- lines[lines != ][-1]"
where
a <- c("warning(\"Failed to parse headers:\n\", paste0(bad, \"\n\"), call. = FALSE)",
"}", "names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))",
"new_response <- grepl(\"^HTTP\", lines)", "header_lines <- lines[lines != ''][-1]")
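One caveat with the pattern above: the character class accepts either quote type as the closing delimiter, so a double-quoted string that contains an apostrophe would be cut short. A possible variant, sketched here as an assumption rather than as part of the original answer, captures the opening quote and requires the same character to close it. The helper name strip_quoted is hypothetical. It uses perl = TRUE with the (?s) flag so that . also matches the newlines inside the first string. It does not handle escaped quotes inside a string.

```r
# Hypothetical helper (not from the original answer): remove quoted text,
# requiring the closing quote to be the same character as the opening one.
strip_quoted <- function(x) {
  # (?s)     let . match newlines too (needed under perl = TRUE)
  # (["'])   capture the opening quote character
  # .*?      lazily match up to the nearest ...
  # \\1      ... occurrence of that same quote character
  gsub("(?s)([\"']).*?\\1", "", x, perl = TRUE)
}

a <- c("warning(\"Failed to parse headers:\n\", paste0(bad, \"\n\"), call. = FALSE)",
       "}",
       "names <- vapply(pieces, \"[[\", 2, FUN.VALUE = character(1))",
       "new_response <- grepl(\"^HTTP\", lines)",
       "header_lines <- lines[lines != ''][-1]")

res <- strip_quoted(a)
print(res)

# A mixed-quote case where the quote types are nested inside each other
mixed <- strip_quoted("paste(\"it's\", 'a \"b\"')")
print(mixed)
```

On the five sample lines this should give the same result as the simpler pattern; the difference only shows up when one quote type appears inside a string delimited by the other.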