Regex to replace text outside of {}

Question

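I want to use a regex to replace the commands or tags that surround a string. My use case is converting LaTeX commands to bookdown commands, which means replacing \citep{*} with [@*], \ref{*} with \@ref(*), and so on. But let's stick to the generalized question:

Given a string <begin>somestring<end>, where <begin> and <end> are known and somestring is an arbitrary sequence of characters, can we use a regex to substitute <newbegin> and <newend> and get the string <newbegin>somestring<newend>?

For example, consider the LaTeX command \citep{bonobo2017}, which I want to convert to [@bonobo2017]. For this example: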

<begin> = \citep{
somestring = bonobo2017
<end> = }
<newbegin> = [@
<newend> = ]

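This question is basically the inverse of this question.

I'm hoping for an R or Notepad++ solution.

Additional examples

Convert \citet{bonobo2017} to @bonobo2017
Convert \ref{myfigure} to \@ref(myfigure)
Convert \section{Some title} to # Some title
Convert \emph{something important} to *something important*

I'm looking for a template regex in which I can fill in <begin>, <end>, <newbegin> and <newend> on a case-by-case basis.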

Answer 1

You can try something like this with dplyr + stringr:

# Backslashes are doubled because "\" is the escape character in R strings
string = "\\citep{bonobo2017}"

begin = "\\citep{"
somestring = "bonobo2017"
end = "}"
newbegin = "[@"
newend = "]"

library(stringr)
library(dplyr)

# Extract the text between begin and end, then wrap it in the new delimiters
string %>%
  str_extract(paste0("(?<=\\Q", begin, "\\E)\\w+(?=\\Q", end, "\\E)")) %>%
  paste0(newbegin, ., newend)

Or:

# Delete begin and end from the string, then wrap what is left
string %>%
  str_replace_all(paste0("\\Q", begin, "\\E|\\Q", end, "\\E"), "") %>%
  paste0(newbegin, ., newend)

For convenience, you can also wrap this in a function:

convertLatex = function(string, BEGIN, END, NEWBEGIN, NEWEND){
  string %>%
    str_replace_all(paste0("\\Q", BEGIN, "\\E|\\Q", END, "\\E"), "") %>%
    paste0(NEWBEGIN, ., NEWEND)
}

convertLatex(string, begin, end, newbegin, newend)

# [1] "[@bonobo2017]"
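
The function above deletes every occurrence of BEGIN and END and then wraps the whole string, so it suits a string that holds exactly one command. For a command embedded in longer text, a capture-group replacement is one possible alternative. The sketch below uses base R's gsub() with perl = TRUE; replace_wrapper is a hypothetical helper name, not part of any package, and it assumes the wrapped text does not itself contain the <end> delimiter (no nested braces) and that the delimiters do not contain \E.

```r
# Hypothetical helper: swap the delimiters around each <begin>...<end> match
# and leave the rest of the text untouched.
replace_wrapper <- function(text, begin, end, newbegin, newend) {
  # \Q...\E makes PCRE treat the delimiters literally;
  # (.*?) lazily captures the text between them.
  pattern <- paste0("\\Q", begin, "\\E(.*?)\\Q", end, "\\E")
  # Double any backslashes so gsub() keeps them as literal characters
  # instead of reading them as escapes in the replacement string.
  esc <- function(s) gsub("\\", "\\\\", s, fixed = TRUE)
  replacement <- paste0(esc(newbegin), "\\1", esc(newend))
  gsub(pattern, replacement, text, perl = TRUE)
}

tex <- "See \\citep{bonobo2017} and \\ref{myfigure}."
out <- replace_wrapper(tex, "\\citep{", "}", "[@", "]")
out <- replace_wrapper(out, "\\ref{", "}", "\\@ref(", ")")
writeLines(out)
# See [@bonobo2017] and \@ref(myfigure).
```

Each call handles one command type, so a document would be passed through the helper once per <begin>/<end> pair.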

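Notes:

Note that I manually added an extra \ in "\\citep{bonobo2017}". R versions before 4.0.0 have no raw strings, so a single \ is treated as an escape character and a second \ is needed to escape the first. (R 4.0.0 and later accept raw strings such as r"(\citep{bonobo2017})".)
The regex in str_extract uses a positive lookbehind and a positive lookahead to extract the somestring between begin and end. Because the middle part is \\w+, it only matches word characters, so text containing spaces is not extracted.
str_replace_all takes another approach: it removes begin and end from string.

The "\\Q" and "\\E" pair in the regex quotes everything between them: "\\Q" means "backslash all nonalphanumeric characters" and "\\E" ends the quoted span. This is particularly useful in your case because LaTeX commands are likely to contain special characters, and the pair escapes them for you automatically.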
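
To illustrate what the quoting pair changes, here is a small sketch in base R with perl = TRUE (PCRE also supports \Q...\E). The delimiter "$" is chosen for illustration only; it is LaTeX's inline-math marker and also a regex metacharacter.

```r
delim <- "$"                               # regex meaning: end-of-string anchor
quoted <- paste0("\\Q", delim, "\\E")      # regex meaning: a literal dollar sign

# Unquoted, "$" matches the end of any string, even one with no dollar sign.
unquoted_hit <- grepl(delim, "no math here", perl = TRUE)    # TRUE

# Quoted, it only matches when a literal "$" is present.
quoted_miss <- grepl(quoted, "no math here", perl = TRUE)    # FALSE
quoted_hit  <- grepl(quoted, "area is $x^2$", perl = TRUE)   # TRUE
```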
