How to replace a single/double character in a string

Question

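I want to replace every single-character word in my string with a blank. My idea is that a single character should have a space before and after it, so I put spaces before and after the character class, but that does not seem to work. I also want to replace words of more than one character, i.e. if I want to replace all words of length 2 or so, how would the code change?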

str="I have a cat of white color"
str=gsub("([[:space:]][[a-z]][[:space:]])", "", str)

Answer 1

You need to use a regex quantifier: for example, [a-z]{2} matches a letter from a to z exactly twice. The regex pattern you want is something like this:

\s[a-z]{2}\s

You can build this regex dynamically in R from the desired number of characters. Here is a code snippet that demonstrates this:

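# Build the pattern from the desired word length.
# Backslashes must be doubled inside R string literals.
str <- "I have a cat of white color"
nchars <- 2
exp <- paste0("\\s[a-z]{", nchars, "}\\s")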

> gsub(exp, "", str)
[1] "I have a catwhite color"
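As a rough sketch, the same idea can be wrapped in a function. The name drop_words_of_length is made up for illustration and is not part of the answer. It replaces each match with a single space rather than an empty string, so the neighbouring words are not glued together. It still only finds words that have whitespace on both sides.

```r
# Hypothetical helper: removes lowercase words of exactly n letters
# that have whitespace on both sides.
drop_words_of_length <- function(x, n) {
  pattern <- paste0("\\s[a-z]{", n, "}\\s")
  # Replace with one space so the neighbouring words stay separated.
  gsub(pattern, " ", x)
}

s <- "I have a cat of white color"
drop_words_of_length(s, 1)  # "I have cat of white color"
drop_words_of_length(s, 2)  # "I have a cat white color"
```

Note that the leading "I" is untouched because nothing precedes it, which is the limitation the next answer addresses.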

Answer 2

I want to replace all the single character in my string with a blank. My idea is that there should be a space before and after the single character.

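That idea is not correct: a word is not always surrounded by spaces. What if the word is at the start of the string? Or at the end? Or followed by punctuation?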

Use the \b word boundary:

There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
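To make the three positions concrete, here is a small sketch with made-up inputs where the one-letter word sits at the start, at the end, and before a comma. It compares a whitespace-delimited pattern with a word-boundary pattern. The "_" replacement is only there to make the matches visible.

```r
# Three inputs where the one-letter word is NOT surrounded by spaces.
x <- c("a cat", "cat a", "cat a, dog")

# Whitespace-delimited pattern: nothing matches, x comes back unchanged.
spaced <- gsub("\\s[a-z]\\s", "_", x)

# Word-boundary pattern (perl = TRUE selects the PCRE engine).
bounded <- gsub("\\b[a-z]\\b", "_", x, perl = TRUE)

spaced   # "a cat" "cat a" "cat a, dog"
bounded  # "_ cat" "cat _" "cat _, dog"
```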

Note that in R, when you use gsub with word boundaries, it is best to use the PCRE regex engine (by passing perl=TRUE):

POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).

So, to match all 1-letter words, you need to use

gsub("(?i)\\b[a-z]\\b", "REPLACEMENT", input, perl=TRUE) ## To replace 1 ASCII letter words

Note that (?i) is the case-insensitive modifier (it makes a match both a and A).
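A quick sketch of what the modifier changes, using the same sample sentence as the demo. The last call uses gsub's ignore.case argument, which is an alternative not mentioned in the answer. It should behave the same way for this input.

```r
input <- "I am a football fan"

# Without a case-insensitivity flag, [a-z] does not match the uppercase "I".
plain <- gsub("\\b[a-z]\\b", "_", input, perl = TRUE)

# Inline modifier (?i), as in the answer.
inline <- gsub("(?i)\\b[a-z]\\b", "_", input, perl = TRUE)

# gsub's ignore.case argument has the same effect here.
flagged <- gsub("\\b[a-z]\\b", "_", input, perl = TRUE, ignore.case = TRUE)

plain    # "I am _ football fan"
inline   # "_ am _ football fan"
flagged  # "_ am _ football fan"
```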

Now, to match 2-letter words:

gsub("(?i)\\b[a-z]{2}\\b", "REPLACEMENT", input, perl=TRUE) ## To replace 2 ASCII letter words

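Here we use a limiting quantifier, {n} or {min,max}, to specify how many times the quantified pattern may repeat: {2} means exactly two times, and {1,2} means one or two times.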
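As a minimal sketch of both quantifier forms (the variable names n and pattern are illustrative, not from the answer): a range quantifier catches words of one or two letters in a single pass, and sprintf can build the exact-length pattern from a variable.

```r
input <- "I am a football fan"

# {1,2}: words of one or two letters.
short_words <- gsub("(?i)\\b[a-z]{1,2}\\b", "_", input, perl = TRUE)

# Build the exact-length pattern from a variable.
n <- 3
pattern <- sprintf("(?i)\\b[a-z]{%d}\\b", n)
three_letters <- gsub(pattern, "_", input, perl = TRUE)

short_words    # "_ _ _ football fan"
three_letters  # "I am a football _"
```

Because of the \b on both sides, "football" is left alone even though it contains shorter runs of letters.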

See the IDEONE demo:

> input = "I am a football fan"
> gsub("(?i)\\b[a-z]\\b", "REPLACEMENT", input, perl=TRUE) ## To replace 1 ASCII letter words
[1] "REPLACEMENT am REPLACEMENT football fan"
> gsub("(?i)\\b[a-z]{2}\\b", "REPLACEMENT", input, perl=TRUE) ## To replace 2 ASCII letter words
[1] "I REPLACEMENT a football fan"

regex

string

r

gsub