Regex to match a sentence and word of a string

Question

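I want to write a regular expression that matches sentences and also matches the words of each matched sentence. If '!', '?' or '.' is matched, it is treated as the end of a sentence, and every word of the matched sentence should be matched as well.

My regex for matching sentences: [^?!.]+

My regex for matching each word separately: [^\s]+

However, I cannot combine these two regexes to do this.

...Test strings...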

I am Raktim Banerjee. I love to code.

should return

2 sentence 8 words

and

Whosebug is the best coding forum. I love Whosebug!

should return

2 sentence 9 words.

Thanks in advance for your help.

Answer 1

Are you looking for something like this?

import re

s1 = "I am Raktim Banerjee. I love to code. "
s2 = "Whosebug is the best coding forum. I love Whosebug! "

for s in (s1, s2):
    # A sentence is a run of characters other than ?, ! and . that is not just whitespace.
    sentences = [m for m in re.findall(r"[^?!.]+", s) if m.strip()]
    # A word is a run of non-whitespace characters.
    words = re.findall(r"\S+", s)
    print(len(sentences), "sentence", len(words), "words")

Running the above prints:

2 sentence 8 words
2 sentence 9 words
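
Since the question is tagged javascript, the same two-pattern idea can be sketched there as well. This is only a minimal sketch: the function name countSentencesAndWords and the shape of its return value are made up for illustration. It applies the sentence pattern first and then the word pattern inside each sentence, which is one way to "combine" the two regexes without merging them into a single expression.

```javascript
// Hypothetical helper (not from the original answers): nests the two patterns.
function countSentencesAndWords(text) {
  // Runs of characters other than ?, ! and . are sentence candidates;
  // whitespace-only runs (for example after the last terminator) are dropped.
  const sentences = (text.match(/[^?!.]+/g) || []).filter((s) => s.trim() !== '');
  // Within each sentence, a word is a run of non-whitespace characters.
  const wordsPerSentence = sentences.map((s) => s.match(/\S+/g) || []);
  return {
    sentences: sentences.length,
    words: wordsPerSentence.reduce((total, w) => total + w.length, 0),
    wordsPerSentence: wordsPerSentence,
  };
}

const result = countSentencesAndWords('I am Raktim Banerjee. I love to code.');
console.log(result.sentences + ' sentence ' + result.words + ' words');
```

The wordsPerSentence array keeps the words grouped by sentence, in case the individual matches are needed and not just the counts.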

Answer 2

I believe you said you wanted this in JavaScript:

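var s = 'I am Raktim Banerjee. I love to code.';

// One match per sentence: space-separated words followed by a terminator (!, ? or .).
var regex = /\b([^!?. ]+)(?:(?: +)([^!?. ]+))*\b([!?.])/g;
var m, numSentences = 0, numWords = 0;
while ((m = regex.exec(s)) !== null) {
    numSentences++;
    // Split on runs of spaces so repeated spaces do not produce empty "words".
    numWords += m[0].split(/ +/).length;
}
console.log(numSentences + ' sentences, ' + numWords + ' words');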

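This is a second iteration. I modified the regex to recognize a few titles, Mr., Mrs. and Dr. (you can add others), and added a primitive sub-regex to recognize email addresses. I also simplified the original regex somewhat. I hope this helps (no guarantees, because the email check is overly simple):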

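var s = 'Mr. Raktim Banerjee. My email address is x.y.z@nowhere.com.';

// Each word is a title (Mr., Mrs., Dr.), something containing @, or a run without terminators or spaces.
var regex = /\b((Mrs?\.|Dr\.|\S+@\S+|[^!?. ]+)\s*)+([!?.])/g;
var m, numSentences = 0, numWords = 0;
while ((m = regex.exec(s)) !== null) {
    numSentences++;
    // The pattern allows any whitespace between words, so split on whitespace runs.
    numWords += m[0].split(/\s+/).length;
}
console.log(numSentences + ' sentences, ' + numWords + ' words');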
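
As a possible alternative, here is a sketch that tokenizes on whitespace first and then groups the tokens into sentences, so the titles live in a plain list. The names ABBREVIATIONS and splitSentences are hypothetical. It runs in one pass over the tokens, which sidesteps the nested quantifiers in the pattern above (those may backtrack heavily on long input that never reaches a terminator). One behavioral difference to be aware of: trailing text without a terminator is still counted as a sentence here, whereas the regex above skips it.

```javascript
// Hypothetical list of tokens that end with "." but do not end a sentence.
const ABBREVIATIONS = new Set(['Mr.', 'Mrs.', 'Dr.']);

// Returns an array of sentences, each an array of whitespace-separated tokens.
function splitSentences(text) {
  const tokens = text.match(/\S+/g) || [];
  const sentences = [];
  let current = [];
  for (const token of tokens) {
    current.push(token);
    // Only the last character is checked, so dots inside an email address
    // such as x.y.z@nowhere.com do not end the sentence.
    if (/[!?.]$/.test(token) && !ABBREVIATIONS.has(token)) {
      sentences.push(current);
      current = [];
    }
  }
  // Keep any trailing words that were not closed by a terminator.
  if (current.length > 0) {
    sentences.push(current);
  }
  return sentences;
}

const grouped = splitSentences('Mr. Raktim Banerjee. My email address is x.y.z@nowhere.com.');
const wordCount = grouped.reduce((total, words) => total + words.length, 0);
console.log(grouped.length + ' sentences, ' + wordCount + ' words');
```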

javascript

regex

regex-group