How do I split a string into sentences whilst including the punctuation marks?

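I want each split sentence to keep its punctuation mark (e.g. ?, !, .), and if the sentence ends with a double quote, I want that included as well.

I am using re.split() in Python 3 to split my string into sentences. Unfortunately, the resulting strings do not contain the punctuation, nor the double quote when a sentence ends with one.

This is my current code: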

import re
x = 'This is an example sentence. I want to include punctuation! What is wrong with my code? It makes me want to yell, "PLEASE HELP ME!"'
sentence = re.split(r'[.?!]\s*', x)

The output I get is:

['This is an example sentence', 'I want to include punctuation', 'What is wrong with my code', 'It makes me want to yell, "PLEASE HELP ME', '"']

Try splitting on a lookbehind:

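import re
# \s+, not \s*: Python 3.7+ also splits on empty matches, which would detach the closing quote
sentences = re.split(r'(?<=[.?!])\s+', x)
print(sentences)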

['This is an example sentence.', 'I want to include punctuation!',
 'What is wrong with my code?', 'It makes me want to yell, "PLEASE HELP ME!"']

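This regex trick works by splitting wherever one of the punctuation marks appears immediately behind the current position. At that point the pattern also matches and consumes the whitespace that follows, before moving on through the input string.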
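To make the difference concrete, here is a minimal sketch that runs a consuming pattern and a lookbehind pattern over the same input. The sample string and the variable names are made up for illustration only.

```python
import re

text = 'First one. Second one! Third one?'

# Consuming pattern: the punctuation is part of the delimiter, so it is dropped.
consumed = re.split(r'[.?!]\s+', text)

# Lookbehind pattern: the punctuation is only asserted, never matched,
# so it stays attached to the sentence that precedes the split point.
kept = re.split(r'(?<=[.?!])\s+', text)

print(consumed)  # → ['First one', 'Second one', 'Third one?']
print(kept)      # → ['First one.', 'Second one!', 'Third one?']
```

The last sentence keeps its question mark in both cases simply because no whitespace follows it, so neither pattern matches there.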

Here is my mediocre attempt at handling the double-quote problem:

import re
x = 'This is an example sentence. I want to include punctuation! "What is wrong with my code?"  It makes me want to yell, "PLEASE HELP ME!"'
# Split after ., ? or ! when no quote follows, or after the closing quote when one does.
sentences = re.split(r'(?:(?<=[.?!]")|(?<=[.?!])(?!"))\s+', x)
print(sentences)

['This is an example sentence.', 'I want to include punctuation!',
 '"What is wrong with my code?"', 'It makes me want to yell, "PLEASE HELP ME!"']

Note that it even splits correctly on sentences that end in a double quote.
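If this is needed in more than one place, it could be wrapped in a small helper. The sketch below is only one way to do it: the names split_sentences and _SENTENCE_END are hypothetical, and the pattern is a non-capturing variant of the double-quote pattern with \s+ in place of \s*.

```python
import re

# Hypothetical helper; splits on whitespace that follows ., ? or !,
# or that follows a closing double quote placed right after one of them.
_SENTENCE_END = re.compile(r'(?:(?<=[.?!]")|(?<=[.?!])(?!"))\s+')

def split_sentences(text):
    """Return the sentences in text with their end punctuation kept."""
    # strip() prevents an empty trailing item when the text ends in whitespace
    return [s for s in _SENTENCE_END.split(text.strip()) if s]

print(split_sentences('He said, "Stop!" Then he left. Why?  '))
# → ['He said, "Stop!"', 'Then he left.', 'Why?']
```

Bear in mind that a pattern like this treats every period followed by whitespace as a sentence end, so abbreviations such as "Dr. Smith" would also be split.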