How to replace a single/double character in a string
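I want to replace every single-character word in my string with a blank. My idea was that a single character should have a space before and after it, so I added spaces around the character class, but this does not seem to work. I would also like to do the same for longer words: how would the code change if I wanted to replace all words of length 2, for example?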
str="I have a cat of white color"
str=gsub("([[:space:]][[a-z]][[:space:]])", "", str)
You need a regex quantifier, for example [a-z]{2}, which matches a letter from a to z exactly twice. The regex pattern you want looks like this:
\s[a-z]{2}\s
You can build this regex dynamically in R from the desired number of characters. Here is a snippet that demonstrates this:
str <- "I have a cat of white color"
nchars <- 2
exp <- paste0("\\s[a-z]{", nchars, "}\\s")  # backslashes must be doubled inside an R string
> gsub(exp, "", str)
[1] "I have a catwhite color"
I want to replace all the single character in my string with a blank. My idea is that there should be a space before and after the single character.
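That idea is flawed: a word is not always surrounded by spaces. What if the word is at the start of the string? Or at the end? Or followed by punctuation? A word boundary (\b) covers all of these cases.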
There are three different positions that qualify as word boundaries:
- Before the first character in the string, if the first character is a word character.
- After the last character in the string, if the last character is a word character.
- Between two characters in the string, where one is a word character and the other is not a word character.
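To make the three cases concrete, here is a small sketch in R. The sample string x is made up for illustration, and perl = TRUE is passed to both calls so that [a-z] and \b behave the same way regardless of locale.

```r
# Hypothetical sample: single-letter words at the start, before punctuation, and at the end
x <- "a cat, b. c"

# Requiring whitespace on both sides finds none of them
spaces_result <- gsub("\\s[a-z]\\s", "_", x, perl = TRUE)

# Word boundaries find all three
boundary_result <- gsub("\\b[a-z]\\b", "_", x, perl = TRUE)

print(spaces_result)    # "a cat, b. c"
print(boundary_result)  # "_ cat, _. _"
```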
Note: in R, when you use gsub with word boundaries, it is best to use PCRE regular expressions (by passing perl=TRUE):
POSIX 1003.2 mode of gsub and gregexpr does not work correctly with repeated word-boundaries (e.g., pattern = "\b"). Use perl = TRUE for such matches (but that may not work as expected with non-ASCII inputs, as the meaning of ‘word’ is system-dependent).
So, to match all 1-letter words, you need to use
gsub("(?i)\\b[a-z]\\b", "REPLACEMENT", input, perl=TRUE) ## To replace 1 ASCII letter words
Note that (?i) is the case-insensitive modifier (it makes a match both a and A).
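As a quick illustration of what the modifier changes, the sketch below runs the same pattern with and without it. It also uses the ignore.case argument of gsub as an alternative way to get the same effect; the replacement "X" and the variable names are just placeholders for this example.

```r
input <- "I am a football fan"

# Without the modifier, the uppercase "I" is left alone
case_sensitive <- gsub("\\b[a-z]\\b", "X", input, perl = TRUE)

# Inline modifier inside the pattern
inline_flag <- gsub("(?i)\\b[a-z]\\b", "X", input, perl = TRUE)

# Same effect through the ignore.case argument of gsub
argument_flag <- gsub("\\b[a-z]\\b", "X", input, perl = TRUE, ignore.case = TRUE)

print(case_sensitive)  # "I am X football fan"
print(inline_flag)     # "X am X football fan"
print(argument_flag)   # "X am X football fan"
```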
Now, to match 2-letter words:
gsub("(?i)\\b[a-z]{2}\\b", "REPLACEMENT", input, perl=TRUE) ## To replace 2 ASCII letter words
Here we use a limiting quantifier, {min,max} (or {n} for an exact count), to specify how many times the quantified pattern may repeat.
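If the word length needs to vary, the quantifier can be filled in at run time. The helper below is a hypothetical sketch (replace_short_words and its arguments are names invented for this example, not part of base R); it assumes ASCII letters only and whole-number lengths.

```r
# Hypothetical helper: replace words whose length is between min_len and max_len
replace_short_words <- function(x, min_len, max_len = min_len, replacement = "") {
  # Build e.g. "(?i)\\b[a-z]{1,2}\\b" from the requested lengths
  pattern <- sprintf("(?i)\\b[a-z]{%d,%d}\\b", min_len, max_len)
  gsub(pattern, replacement, x, perl = TRUE)
}

input <- "I am a football fan"
one_letter <- replace_short_words(input, 1, replacement = "_")
two_letter <- replace_short_words(input, 2, replacement = "_")
one_or_two <- replace_short_words(input, 1, 2, replacement = "_")

print(one_letter)  # "_ am _ football fan"
print(two_letter)  # "I _ a football fan"
print(one_or_two)  # "_ _ _ football fan"
```

One caveat with this approach: an apostrophe is not a word character, so the t in "don't" would count as a 1-letter word.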
See the IDEONE demo:
> input = "I am a football fan"
> gsub("(?i)\\b[a-z]\\b", "REPLACEMENT", input, perl=TRUE) ## To replace 1 ASCII letter words
[1] "REPLACEMENT am REPLACEMENT football fan"
> gsub("(?i)\\b[a-z]{2}\\b", "REPLACEMENT", input, perl=TRUE) ## To replace 2 ASCII letter words
[1] "I REPLACEMENT a football fan"
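The original question asked for the words to be blanked out rather than replaced with a marker. Replacing with an empty string leaves the surrounding spaces behind, so one possible follow-up, sketched here with made-up variable names, is to collapse runs of whitespace and trim the ends.

```r
input <- "I am a football fan"

# Remove 1-letter words; the spaces around them stay in place
removed <- gsub("(?i)\\b[a-z]\\b", "", input, perl = TRUE)

# Collapse repeated whitespace and strip leading/trailing blanks
cleaned <- trimws(gsub("\\s+", " ", removed, perl = TRUE))

print(removed)  # " am  football fan"
print(cleaned)  # "am football fan"
```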