How can I identify sentences within a text?
I have text like this:
"I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
这里,"ASP.NET"和"Node.js"都被当做单词来处理。还有,"But I..."前面没有space,但应该单独成一个句子来对待。
预期输出为:
["I am an engineer"," I am skilled in ASP.NET","I also know Node.js","But I don't have much experience"]
Is there a way to do this?
For your current input, you can use the re.split() function with the following regular expression pattern:
import re
s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
# Split on "." only when the lookahead sees the start of the next sentence
result = re.split(r'\.(?=\s?[A-Z][^.]*? )', s)
print(result)
Output:
['I am an engineer', ' I am skilled in ASP.NET', ' I also know Node.js', "But I don't have much experience. "]
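The split keeps the leading space on some items and the trailing ". " on the last one, so it does not match the expected output exactly. A minimal sketch of one way to tidy the pieces (the strip step is an addition, not part of the answer above; it removes all surrounding whitespace, including the leading space the question shows on the second item):

```python
import re

s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "

# Same split as above: a "." counts as a delimiter only if the lookahead
# sees the start of the next sentence.
parts = re.split(r'\.(?=\s?[A-Z][^.]*? )', s)

# Trim surrounding whitespace, drop the final "." left on the last piece,
# and skip any empty pieces.
sentences = [p.strip().rstrip('.') for p in parts if p.strip()]
print(sentences)
# ['I am an engineer', 'I am skilled in ASP.NET', 'I also know Node.js', "But I don't have much experience"]
```

Note that rstrip('.') removes every trailing period, so a sentence ending in "..." would lose all three.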
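(?=\s?[A-Z][^.]*? )
- a positive lookahead assertion that ensures the sentence delimiter "." is followed by a word of the next sentence: an optional whitespace character, an uppercase letter, and then non-period characters up to a space. This is why the periods inside ASP.NET and Node.js are not split on: "NET" runs into another period before any space, and "js" starts with a lowercase letter.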
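To see the lookahead at work, here is a small sketch (not from the original answer) that lists every period in the sample string and whether the pattern would split on it. The second input, "I agree. Yes.", is a made-up example showing a limitation: the lookahead needs a space after the next word, so a one-word sentence is not separated.

```python
import re

s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
pattern = re.compile(r'\.(?=\s?[A-Z][^.]*? )')

# Positions of the periods the lookahead accepts as sentence boundaries.
split_points = {m.start() for m in pattern.finditer(s)}

# Print every period with a little surrounding context and the decision.
for i, ch in enumerate(s):
    if ch == '.':
        decision = 'split' if i in split_points else 'keep'
        print(i, repr(s[max(0, i - 5):i + 5]), decision)

# Limitation (hypothetical input): "Yes" is followed by "." rather than a
# space, so the lookahead fails and nothing is split.
print(pattern.split("I agree. Yes."))
# ['I agree. Yes.']
```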