Why isn't \\b in gsubfn in R working for me?

I have a string like this:

vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")

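I want to replace "in" with "%in%", "AND" with "&", and "OR" with "|".

I know this can be done with gsub, like so:

gsub("\\bin\\b", "%in%", vect)

but that takes a separate line for each of the three replacements, so I chose to use gsubfn.

So I tried:

gsubfn("\\bin\\b|\\bAND\\b|\\bOR\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect)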

But it returns the string unchanged; for some reason \\b does not match here. \\b does work fine with gsub, though, and I can replace all three strings by chaining gsub calls together with pipes.
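For reference, the chained gsub approach mentioned above might look like the following minimal sketch. It uses plain sequential assignment rather than a pipe operator, and the intermediate name `out` is only illustrative. perl = TRUE is passed explicitly, following the advice in the gsub documentation for word-boundary patterns.

```r
vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")

# One gsub call per replacement; \\b in the R string is the regex \b.
out <- gsub("\\bin\\b",  "%in%", vect, perl = TRUE)
out <- gsub("\\bAND\\b", "&",    out,  perl = TRUE)
out <- gsub("\\bOR\\b",  "|",    out,  perl = TRUE)

print(out)
```

This works, but it needs one call per replacement, which is what motivates trying gsubfn with a lookup list.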

My question is: why doesn't \\b work in gsubfn? What am I missing in my regex?

Please help.

The output should be:

"Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"

This works:

gsubfn("\\w+", list("in"="%in%", "AND"= "&", "OR"="|"), vect)

Adding perl = TRUE should do it.

gsubfn("\\bin\\b|\\bAND\\b|\\bOR\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect, perl = TRUE)

Output:

[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"

From the gsub documentation:

The POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).

And from the gsubfn documentation:

... Other gsub arguments.

This does not explain why gsub works without the perl argument while gsubfn needs perl = TRUE.

gsubfn uses the Tcl regex engine by default; see the gsubfn docs:

If the R installation has tcltk capability then the tcl engine is used unless FUN is a proto object or perl=TRUE in which case the "R" engine is used (regardless of the setting of this argument).
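To see which case applies on a given machine, a quick check along these lines may help. It is only a sketch: `capabilities("tcltk")` is base R, while reading the `gsubfn.engine` option assumes the package consults that option for its default engine; the variable names are illustrative.

```r
# TRUE if this R build has tcltk support, in which case gsubfn
# would default to the Tcl regex engine per the quoted docs.
has_tcl <- unname(capabilities("tcltk"))

# Assumption: gsubfn takes its default engine from this option;
# NULL means the option has not been set.
engine_opt <- getOption("gsubfn.engine")

print(has_tcl)
print(engine_opt)
```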

In Tcl regular expressions, a word boundary is written \y (that is, "\\y" in an R string) rather than \b:

> gsubfn("\\y(in|AND|OR)\\y", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"

Alternatively, use \m for the boundary at the start of a word and \M for the boundary at the end of a word:

> gsubfn("\\m(in|AND|OR)\\M", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"

Or you can pass perl=TRUE, which switches gsubfn to the R engine, and use \b:

> gsubfn("\\b(in|AND|OR)\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect, perl=TRUE)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
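If depending on which engine gsubfn picks is undesirable, the same lookup-style replacement can be sketched in base R with gregexpr and regmatches. This is a different technique from the gsubfn calls above, not part of the gsubfn package, and the names `map`, `m`, and `res` are illustrative.

```r
vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")

# Lookup table: matched word -> replacement.
map <- c("in" = "%in%", "AND" = "&", "OR" = "|")

# Find whole-word matches using the PCRE engine, where \b is a word boundary.
m <- gregexpr("\\b(in|AND|OR)\\b", vect, perl = TRUE)

# Replace each match with its entry from the lookup table.
res <- vect
regmatches(res, m) <- lapply(regmatches(res, m), function(w) unname(map[w]))

print(res)
```

The trade-off is a few more lines than the gsubfn one-liner, in exchange for not needing tcltk or an extra package.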