What is the regular expression?
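I am using NLTK's punkt tokenizer to split paragraphs into sentences, but in some cases (such as the examples below) the tokenizer fails to detect the sentence boundary because the period is immediately followed by a number. I want to use a regular expression to identify these cases and replace '.1,7,9' with '. 1,7,9', i.e. add a space between the period and the citation.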
Ex1. `This is a random sentence.1,7,9 This is a sentence followed by it.`
Ex2. `I love football.1,7,24 I also like cricket.`
Ex3. `ESD for undifferentiated cancers.[1][7] Cancers can be treatable.`
Expected output:
Ex1. This is a random sentence.
1,7,9 This is a sentence followed by it.
Ex2. I love football.
1,7,24 I also like cricket.
Ex3. ESD for undifferentiated cancers.
[1][7] Cancers can be treatable.
Thanks.
The regular expression below inserts a newline after every period that is immediately followed by a non-space character:
>>> import re
>>> s = "Ex1. This is a random sentence.1,7,9 This is a sentence followed by it."
>>> # \1 puts back the character captured by (\S); without it that character is dropped
>>> print(re.sub(r'\.(\S)', r'.\n\1', s))
Ex1. This is a random sentence.
1,7,9 This is a sentence followed by it.
If the appended list of integers is a citation, it may be more useful to put the line break after the list instead:
>>> import re
>>> s = "Ex1. This is a random sentence.1,7,9 This is a sentence followed by it."
>>> # keep the period and citation (\1) and turn only the following whitespace into a newline
>>> print(re.sub(r'(\.\S+)\s', r'\1\n', s))
Ex1. This is a random sentence.1,7,9
This is a sentence followed by it.
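The question also has bracketed citations such as [1][7] and asks for a space rather than a newline, so here is a minimal sketch that covers all three examples. It uses a lookahead, which matches without consuming the character after the period. The names `CITATION_AFTER_PERIOD` and `separate_citations` are made up for illustration. The pattern assumes a citation always starts with a digit or `[`, so it would also split decimals like `3.14`.

```python
import re

# Assumption: a period directly followed by a digit or "[" marks
# "end of sentence + citation". Decimals such as "3.14" match too.
CITATION_AFTER_PERIOD = re.compile(r'\.(?=[\d\[])')

def separate_citations(text, sep=' '):
    """Insert `sep` between a period and the citation that follows it."""
    # A function replacement avoids escape processing of `sep`.
    return CITATION_AFTER_PERIOD.sub(lambda m: '.' + sep, text)

examples = [
    "This is a random sentence.1,7,9 This is a sentence followed by it.",
    "I love football.1,7,24 I also like cricket.",
    "ESD for undifferentiated cancers.[1][7] Cancers can be treatable.",
]
for line in examples:
    print(separate_citations(line))
```

Passing `sep='\n'` gives the one-sentence-per-line layout instead. The cleaned text can then be handed to the tokenizer as usual.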