Why isn't \\b in gsubfn in R working for me?
I have a string like this:
vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")
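I want to replace "in" with "%in%", "AND" with "&", and "OR" with "|".
I know this can be done with gsub, like this:
gsub("\\bin\\b", "%in%", vect)
but that takes a separate line for each replacement, so I chose to use gsubfn.
So I tried:
gsubfn("\\bin\\b|\\bAND\\b|\\bOR\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect)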
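But it returns the string unchanged; for some reason \\b does not work with this string. However, \\b does work fine with gsub, and I can replace all three strings by piping gsub calls together.
My question is: why doesn't \\b work in gsubfn? What am I missing in my regex?
Please help.
The output should be: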
"Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
This works:
gsubfn("\\w+", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
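The \\w+ version works because each whole word is matched and then looked up in the list, so "Thin" is never confused with "in". The same lookup idea can be sketched in base R without gsubfn. This is only an illustration: the names map, m, words, and out are introduced here and are not part of the original post.

```r
vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")

# Lookup table: names are the words to find, values are their replacements
map <- c("in" = "%in%", "AND" = "&", "OR" = "|")

# Locate every run of word characters, then pull the matches out
m <- gregexpr("\\w+", vect)
words <- regmatches(vect, m)

# Swap only the words that appear in the lookup table; leave the rest alone
out <- vect
regmatches(out, m) <- lapply(words, function(w) {
  hit <- w %in% names(map)
  w[hit] <- map[w[hit]]
  w
})

out
# [1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
```

Because whole words are compared against the table, no word-boundary escape is needed at all, which sidesteps the engine difference discussed in the answers.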
Adding perl = TRUE should do it.
gsubfn("\\bin\\b|\\bAND\\b|\\bOR\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect, perl = TRUE)
Output:
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
From the gsub documentation:
The POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).
and from the gsubfn documentation:
... Other gsub arguments.
Neither explains why gsub works fine without the perl argument while gsubfn needs perl = TRUE.
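For comparison, the base R route mentioned in the question can be sketched as a chain of gsub calls. This is a minimal sketch that relies on the behaviour the question reports, namely that gsub in its default (non-perl) mode handles \\b around whole words; out is a name introduced here for illustration.

```r
vect <- c("Thin lines are not great, I am in !!! AND You shouldn't be late OR you loose")

# One gsub call per replacement, each in gsub's default regex mode
out <- gsub("\\bin\\b", "%in%", vect)
out <- gsub("\\bAND\\b", "&", out)
out <- gsub("\\bOR\\b", "|", out)

out
# [1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
```

One thing to watch with a chain like this: each call sees the output of the previous one, so a later pattern could match text inserted by an earlier replacement (the "in" inside "%in%" sits between two non-word characters). A single-pass replacement avoids that.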
gsubfn uses the Tcl regular expression engine by default; see the gsubfn docs:
If the R installation has tcltk capability then the
tcl engine is used unless FUN is a proto object or perl=TRUE
in which case the "R" engine is used (regardless of the setting of this argument).
So a word boundary is written \y (\\y inside an R string):
> gsubfn("\\y(in|AND|OR)\\y", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
Another option is to use \m for the leading word boundary and \M for the trailing one:
> gsubfn("\\m(in|AND|OR)\\M", list("in"="%in%", "AND"= "&", "OR"="|"), vect)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"
Alternatively, you can pass perl=TRUE and use \b:
> gsubfn("\\b(in|AND|OR)\\b", list("in"="%in%", "AND"= "&", "OR"="|"), vect, perl=TRUE)
[1] "Thin lines are not great, I am %in% !!! & You shouldn't be late | you loose"