How can I identify sentences within a text?

I have text like this:

"I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "

Here, "ASP.NET" and "Node.js" should each be treated as a single word. Also, there is no space before "But I...", yet it should still be treated as a separate sentence.

Expected output:

["I am an engineer"," I am skilled in ASP.NET","I also know Node.js","But I don't have much experience"]

Is there a way to do this?

For your current input, you can use re.split() with a specific regular expression pattern:

import re

s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
result = re.split(r'\.(?=\s?[A-Z][^.]*? )', s)

print(result)

Output:

['I am an engineer', ' I am skilled in ASP.NET', ' I also know Node.js', "But I don't have much experience. "]
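The list printed by re.split() still carries leading spaces, and the last item keeps its trailing ". ". If trimmed sentences without the final period are wanted, a minimal sketch of a cleanup step could look like this. The helper name split_sentences is hypothetical and not part of the re module; it only wraps the same pattern.

```python
import re

# Same pattern as in the re.split() call in this answer.
PATTERN = r'\.(?=\s?[A-Z][^.]*? )'


def split_sentences(text):
    """Hypothetical helper: split on the pattern, then tidy each piece."""
    cleaned = []
    for part in re.split(PATTERN, text):
        part = part.strip()          # drop leading/trailing whitespace
        if part.endswith('.'):       # drop one trailing period, if any
            part = part[:-1]
        if part:                     # skip pieces that end up empty
            cleaned.append(part)
    return cleaned


s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
print(split_sentences(s))
```

Only a single trailing period is removed, so an ellipsis at the end of a piece would keep its remaining dots.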

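(?=\s?[A-Z][^.]*? ) - a positive lookahead that makes sure the sentence delimiter . is followed by a word from the next sentence: an optional whitespace character, an uppercase letter, then as few non-period characters as possible up to a space.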
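To see how the lookahead decides which periods count as sentence boundaries, the sketch below tests it at the position right after each period in the sample text. The function name boundary_flags is made up for illustration.

```python
import re

# The lookahead from the answer, compiled on its own.
LOOKAHEAD = re.compile(r'(?=\s?[A-Z][^.]*? )')


def boundary_flags(text):
    """Hypothetical helper: for each period in text, report whether
    the lookahead succeeds immediately after it."""
    flags = []
    for m in re.finditer(r'\.', text):
        # Pattern.match(string, pos) tries to match starting at pos.
        flags.append(LOOKAHEAD.match(text, m.end()) is not None)
    return flags


s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
for m, flag in zip(re.finditer(r'\.', s), boundary_flags(s)):
    # Show a few characters after each period next to the verdict.
    print(repr(s[m.end():m.end() + 6]), flag)
```

The periods inside "ASP.NET" and "Node.js" are rejected because another period or a lowercase letter appears before any space is reached. The same rule means a sentence that starts with a lowercase letter or a digit would not be split off, and an abbreviation such as "Mr." followed by a capitalized word would be, so the pattern is best treated as tailored to this input.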