Regex in R to add a space after a period (if not present)
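I'm importing text files into R and using a function that detects the end of a sentence as a period followed by a space.
Some sentences have no space after the period, so I'm trying to write a regex that fixes this. The regex must only match a period followed by a word (not a number), so that it doesn't insert a space after a decimal point.
I tried the regex in the example below. It works, but it replaces the first letter after the period with the space. I want to add a space between the period and the next word without deleting anything.
Thanks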
x = " In water, the hydrogen atoms are close to two corners of a tetrahedron centered on the oxygen.At the other two corners are lone pairs of valence electrons that do not participate in the bonding.In a perfect tetrahedron, the atoms would form a 109.5° angle."
gsub("\\.([A-Za-z])", ". ", x)
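You can use a positive lookahead. It checks that a letter follows the period without consuming it, so the first letter after the period is kept. Lookaheads require perl = TRUE.
gsub("\\.(?=[A-Za-z])", ". ", x, perl = TRUE)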
#" In water, the hydrogen atoms are close to two corners of a tetrahedron
#centered on the oxygen. At the other two corners are lone pairs of valence
#electrons that do not participate in the bonding. In a perfect
#tetrahedron, the atoms would form a 109.5° angle."
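If you would rather not use perl = TRUE, a capture group with a backreference should give the same result. The sketch below is an alternative to the lookahead, not part of the original answer, and the input string is a made-up example: the pattern from the question is kept, but the replacement writes the captured letter back with "\\1" instead of dropping it.

```r
# Illustrative input (an assumption, not the text from the question):
# two sentences missing a space after the period, plus a decimal number.
x <- "Sentence one.Sentence two.The value is 3.14 exactly."

# Match a period followed by a letter and capture the letter as group 1.
# "\\1" in the replacement puts the captured letter back after the new space.
fixed <- gsub("\\.([A-Za-z])", ". \\1", x)

print(fixed)
```

The decimal in 3.14 is left alone because the character after the period is a digit, not a letter. Note that both this pattern and the lookahead version will also insert a space inside abbreviations such as "e.g.", so you may need extra handling if your text contains them.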