How to substitute a special character between words in R
I have a character vector:
str = c(".wow", "if.", "not.confident", "wonder", "have.difficulty", "shower")
I am trying to replace each "." that sits between two words with a space, so that the result looks like this:
".wow", "if.", "not confident", "wonder", "have difficulty", "shower"
First, I tried
gsub("[\\w.\\w]", " ", str)
[1] " o " "if" "not confident" " onder"
[5] "have difficulty" "sho er "
That gave me the spaces I wanted, but it removed every w. Then I tried
gsub("\\w\\.\\w", " ", str)
[1] ".wow" "if" "no onfident" "wonder"
[5] "hav ifficulty" "shower."
That kept the w's, but it also removed the characters immediately before and after the ".".
I cannot use this either:
gsub("\\.", " ", str)
[1] " wow" "if " "not.confident" "wonder"
[5] "have.difficulty" "shower"
because it also removes the "." characters that are not between words.
Try
gsub('(\\w)\\.(\\w)', '\\1 \\2', str)
#[1] ".wow" "if." "not confident" "wonder"
#[5] "have difficulty" "shower"
Or
gsub('(?<=[^.])[.](?=[^.])', ' ', str, perl=TRUE)
Or, as suggested by @rawr,
gsub('\\b\\.\\b', ' ', str, perl = TRUE)
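The three patterns above give the same result on the vector from the question, but they are not interchangeable in general. The sketch below uses a made-up input `x` (not from the question) to show one case where the `[^.]` lookarounds differ from the two `\w`-based patterns: `[^.]` accepts any neighbour that is not a dot, including a space.

```r
# Hypothetical input, not from the question: the second element has a
# sentence-ending dot followed by a space
x <- c("not.confident", "end. Next")

# Capturing groups: the dot must have a word character on both sides
by_group <- gsub('(\\w)\\.(\\w)', '\\1 \\2', x)

# Word boundaries on both sides of the dot (requires perl = TRUE)
by_boundary <- gsub('\\b\\.\\b', ' ', x, perl = TRUE)

# Lookarounds on "anything but a dot": the space after "end." qualifies too
by_not_dot <- gsub('(?<=[^.])[.](?=[^.])', ' ', x, perl = TRUE)

print(by_group)     # "not confident" "end. Next"
print(by_boundary)  # "not confident" "end. Next"
print(by_not_dot)   # "not confident" "end  Next"
```

If a dot at the end of a sentence should stay untouched, the `\w`-based variants are probably the safer choice.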
Using capturing groups and back-references:
sub('(\\w)\\.(\\w)', '\\1 \\2', str)
# [1] ".wow" "if." "not confident" "wonder"
# [5] "have difficulty" "shower"
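A capturing group is created by placing the characters to be grouped inside a set of parentheses ( ... ). A back-reference recalls what the capturing group matched. It is written as a backslash (\) followed by a digit that gives the group's number; inside an R string literal the backslash itself must be doubled, as in '\\1'.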
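As a small illustration of how the group numbers map to the parentheses, here is a sketch on a single made-up input `x`. The `\\w+` groups capture whole words rather than single characters, which is an assumption made for the example and not part of the answer's pattern.

```r
# Hypothetical single input, chosen for illustration
x <- "not.confident"

# Group 1 captures the word before the dot, group 2 the word after it.
# In the replacement, '\\2 \\1' writes them back in reverse order.
swapped <- sub('(\\w+)\\.(\\w+)', '\\2 \\1', x)
print(swapped)  # "confident not"

# Keeping the original order gives the dot-to-space replacement
kept <- sub('(\\w+)\\.(\\w+)', '\\1 \\2', x)
print(kept)     # "not confident"
```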
Using lookaround assertions:
Lookarounds are zero-width assertions: they check the characters around the match but do not "consume" any characters of the string.
sub('(?<=\\w)\\.(?=\\w)', ' ', str, perl = TRUE)
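Because lookarounds consume nothing, a character that borders one match stays available for the next one. The sketch below uses a made-up input `x` with single-letter words (not from the question) to show where this matters, and also how `sub()` differs from `gsub()`.

```r
# Hypothetical input with single-letter words, so neighbouring matches
# would have to share the middle character
x <- "a.b.c"

# Capturing groups consume the "b", so the second dot has no unconsumed
# word character in front of it and is skipped
consumed <- gsub('(\\w)\\.(\\w)', '\\1 \\2', x)

# Lookarounds only inspect the neighbours, so both dots are matched
zero_width <- gsub('(?<=\\w)\\.(?=\\w)', ' ', x, perl = TRUE)

# sub() replaces only the first match in each string, unlike gsub()
first_only <- sub('(?<=\\w)\\.(?=\\w)', ' ', x, perl = TRUE)

print(consumed)    # "a b.c"
print(zero_width)  # "a b c"
print(first_only)  # "a b.c"
```

For the vector in the question each element has at most one dot between words, so `sub()` is enough there. `gsub()` would be the likely choice if an element could contain several.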