How to replace exact number of characters in string based on occurrence between delimiters in R

I have text strings such as this:

u <- "she goes ~Wha::?~ and he's like ~↑Yeah believe me!~ and she's etc."

What I'd like to do is replace all characters occurring between pairs of ~ delimiters (including the delimiters themselves) with X.

This gsub method replaces the substrings between pairs of ~ delimiters with a single X:

gsub("~[^~]+~", "X", u)
[1] "she goes X and he's like X and she's etc."

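However, what I'd really like to do is replace each and every character between the delimiters (and the delimiters themselves) with X. The desired output is this:

"she goes XXXXXXXX and he's like XXXXXXXXXXXXXXXXXXX and she's etc."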

I've been experimenting with nchar, backreferences, and paste, but the results are not right:

gsub("(~[^~]+~)", paste0("X{", nchar("\\1"),"}"), u)
[1] "she goes X{2} and he's like X{2} and she's etc."

Any help is appreciated.

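The paste0("X{", nchar("\\1"),"}") code results in X{2} because "\\1" is a string of length 2. R evaluates the replacement argument before gsub runs, so nchar never sees the matched text. The \1 backreference is not interpolated unless it appears in the replacement string pattern itself.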
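To make the evaluation order concrete, here is a minimal base R sketch. The variable name replacement is used only for illustration and is not part of the original code.

```r
# "\\1" in R source is two characters: a backslash and the digit 1.
nchar("\\1")

# The replacement string is fully built before gsub() looks at any match,
# so nchar() measures the literal "\\1", not the captured text.
replacement <- paste0("X{", nchar("\\1"), "}")
replacement

# gsub() then inserts that fixed string for every match.
u <- "she goes ~Wha::?~ and he's like ~Yeah believe me!~ and she's etc."
gsub("(~[^~]+~)", replacement, u)
```

Since the replacement is a fixed string by the time gsub receives it, the length of each match has to be computed some other way, for example with a function applied per match.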

You can use the following solution based on stringr:

> library(stringr)
> u <- "she goes ~Wha::?~ and he's like ~↑Yeah believe me!~ and she's etc."
> str_replace_all(u, '~[^~]+~', function(x) str_dup("X", nchar(x)))
[1] "she goes XXXXXXXX and he's like XXXXXXXXXXXXXXXXXXX and she's etc."

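Once a match of ~[^~]+~ is found, the value is passed to the anonymous function, and str_dup creates a string of X characters with the same length as the matched value.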
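If adding a stringr dependency is not desirable, the same idea can be sketched in base R with gregexpr and regmatches. This is an alternative and not part of the answer above. The function name mask_delimited and its parameters are hypothetical, chosen only for this example, and strrep is assumed to be available (R 3.3.0 or later).

```r
# Hypothetical helper: replace every character of each delimited match
# (delimiters included) with the mask character.
mask_delimited <- function(x, pattern = "~[^~]+~", mask = "X") {
  m <- gregexpr(pattern, x)                 # locate all matches per element
  regmatches(x, m) <- lapply(
    regmatches(x, m),                       # extract the matched substrings
    function(hit) strrep(mask, nchar(hit))  # one mask character per matched character
  )
  x
}

u <- "she goes ~Wha::?~ and he's like ~↑Yeah believe me!~ and she's etc."
masked <- mask_delimited(u)
masked
nchar(masked) == nchar(u)  # the overall length is unchanged
```

As with the stringr version, the length of each match is computed per match, so the masked string keeps the same number of characters as the input.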