Regex in R to add a space after a period (if not present)

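I imported a text file into R, and I'm using a function that detects the end of a sentence as a period followed by a space. Some sentences have no space after the period, so I'm trying to write a regex that fixes this. The regex has to be specific to periods followed by a word (not a number), so that it doesn't insert a space after a decimal point.

I tried the regex in the example below. It works, but it replaces the first letter after the period with the space. I'm trying to add a space between the period and the next word without removing anything.

Thanks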

x = " In water, the hydrogen atoms are close to two corners of a tetrahedron centered on the oxygen.At the other two corners are lone pairs of valence electrons that do not participate in the bonding.In a perfect tetrahedron, the atoms would form a 109.5° angle."

gsub("\\.([A-Za-z])", ". ", x)

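You can use a positive lookahead, so the first letter after the period is only checked and not consumed by the match. Lookaheads need perl = TRUE, and the backslash has to be doubled inside an R string.

gsub("\\.(?=[A-Za-z])", ". ", x, perl = TRUE)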

#" In water, the hydrogen atoms are close to two corners of a tetrahedron 
#centered on the oxygen. At the other two corners are lone pairs of valence 
#electrons that do not participate in the bonding. In a perfect 
#tetrahedron, the atoms would form a 109.5° angle."
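
If you would rather not use perl = TRUE, an alternative is to keep the capture group from the original attempt and put the captured letter back with a backreference (\\1) in the replacement. This is only a sketch: the shortened test string and the variable names y, with_backref and with_lookahead are made up for illustration and are not from the question.

```r
# Shortened sample text (hypothetical): two missing spaces, one period that
# already has a space, and one decimal number that must stay untouched.
y <- "centered on the oxygen.At the other corners. They bond.In theory a 109.5 degree angle."

# The group ([A-Za-z]) captures the letter after the period;
# \\1 in the replacement writes that letter back after the new space.
with_backref <- gsub("\\.([A-Za-z])", ". \\1", y)

# Lookahead version from the answer above, for comparison.
with_lookahead <- gsub("\\.(?=[A-Za-z])", ". ", y, perl = TRUE)

print(with_backref)
print(identical(with_backref, with_lookahead))
```

Both patterns only match a period that is immediately followed by a letter, so "109.5" and periods already followed by a space are left alone.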