How to substitute a special character between words in R

I have a character vector of strings.

str = c(".wow", "if.", "not.confident", "wonder", "have.difficulty", "shower")

I am trying to replace the "." between words with a space, so that the result looks like this:

".wow", "if.", "not confident", "wonder", "have difficulty", "shower"

First, I tried

gsub("[\\w.\\w]", " ", str)
[1] "  o "            "if "             "not confident"   " onder"         
[5] "have difficulty" "sho er"

It gave me the spaces I wanted, but it also stripped out every "w". Then I tried

gsub("\\w\\.\\w", " ", str)
[1] ".wow"          "if."           "no onfident"   "wonder"       
[5] "hav ifficulty" "shower"

It kept the "w"s, but it removed the characters immediately before and after the ".".

I can't use this either

gsub("\\.", " ", str)
[1] " wow"            "if "             "not confident"   "wonder"         
[5] "have difficulty" "shower"

because it also replaces the "." characters that are not between words.

Try

gsub('(\\w)\\.(\\w)', '\\1 \\2', str)
#[1] ".wow"            "if."             "not confident"   "wonder"         
#[5] "have difficulty" "shower"       

Or

gsub('(?<=[^.])[.](?=[^.])', ' ', str, perl=TRUE)

Or, as @rawr suggested

gsub('\\b\\.\\b', ' ', str, perl = TRUE)
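One difference between these three patterns shows up when a single letter sits between two dots. The following is a minimal sketch, assuming a made-up input "a.b.c" that is not part of the question: the capture-group pattern consumes the letters around each dot, so matches cannot overlap, while the lookaround and word-boundary patterns match only the dot itself.

```r
# Hypothetical input (not from the question): one letter between two dots
x <- "a.b.c"

# Capture groups consume the surrounding letters, so the second dot is skipped
res_capture <- gsub('(\\w)\\.(\\w)', '\\1 \\2', x)

# Lookarounds and word boundaries are zero-width, so both dots are replaced
res_look  <- gsub('(?<=[^.])[.](?=[^.])', ' ', x, perl = TRUE)
res_bound <- gsub('\\b\\.\\b', ' ', x, perl = TRUE)

print(c(res_capture, res_look, res_bound))
# [1] "a b.c" "a b c" "a b c"
```

For the vector in the question all three give the same result, so this only matters if the data can contain such cases.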

Using capturing groups and back-references:

sub('(\\w)\\.(\\w)', '\\1 \\2', str)
# [1] ".wow"            "if."             "not confident"   "wonder"         
# [5] "have difficulty" "shower"

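A capturing group is created by placing the characters to be grouped inside a set of parentheses ( ... ). A back-reference recalls what the capturing group matched.

A back-reference is written as a backslash (\) followed by a digit giving the number of the group. Inside an R string literal the backslash itself must be escaped, so the first group is written '\\1'.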
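To make the group numbering concrete, here is a small sketch that swaps the two captured words around the dot. The pattern and the variable name `swapped` are illustrative only and not part of the original answer.

```r
# Group 1 is the word before the dot, group 2 the word after it.
# The replacement puts them back in reverse order.
swapped <- sub('(\\w+)\\.(\\w+)', '\\2.\\1', "have.difficulty")
print(swapped)
# [1] "difficulty.have"
```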

Using lookaround assertions:

Lookarounds are zero-width assertions. They don't "consume" any characters in the string.

sub('(?<=\\w)\\.(?=\\w)', ' ', str, perl = TRUE)
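As a rough illustration of "zero-width", the sketch below extracts what the lookaround pattern actually matches, and also shows that sub() replaces only the first match in each element while gsub() replaces all of them. The element "one.two.three" and the variable names are assumptions added for the example.

```r
# "one.two.three" is a hypothetical element with two dots between words
x <- c("not.confident", "one.two.three")
pat <- '(?<=\\w)\\.(?=\\w)'

# The match is only the dot; the letters checked by the lookarounds are not part of it
matched <- regmatches(x, gregexpr(pat, x, perl = TRUE))
print(unlist(matched))
# [1] "." "." "."

# sub() replaces the first match per element, gsub() replaces every match
first_only <- sub(pat, ' ', x, perl = TRUE)
all_dots   <- gsub(pat, ' ', x, perl = TRUE)
print(first_only)
# [1] "not confident" "one two.three"
print(all_dots)
# [1] "not confident" "one two three"
```

With at most one dot per element, as in the question, sub() and gsub() give the same result.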