Count the Number of Sentences in Ruby

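I have searched everywhere but could not find a solution for counting the number of sentences in a string using Ruby. Does anyone know how to do it?

Example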

string = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "

This string should return 4.

You can split the text into sentences and count them:

# scan collects every run of text that ends in ., ! or ?; strip removes the surrounding whitespace.
string.scan(/[^\.!?]+[\.!?]/).map(&:strip).count
# => 4

Explanation of the regex:

[^\.!?]

The caret inside the character class [^ ] is the negation operator. It means we are looking for characters that are not in the list: ., ! and ?.

+

is a greedy quantifier that matches between one and unlimited times. (Here it captures our sentence and ignores repetitions such as ...)

[\.!?]  

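matches one of the characters ., ! or ?.

In short, we capture all the characters that are not ., ! or ? until we reach a ., ! or ?. That can basically be treated as a sentence (in a broad sense).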
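To make the matching behaviour concrete, here is a small sketch on a made-up sample string (not the text from the question). It shows what scan actually returns, including how the leftover dots of an ellipsis are skipped because no match can start on a terminator.

```ruby
# Hypothetical sample with an ellipsis and mixed terminators.
sample = "Wait... what? Yes!"

# Each match is a run of non-terminators followed by exactly one terminator.
matches = sample.scan(/[^.!?]+[.!?]/).map(&:strip)

p matches        # => ["Wait.", "what?", "Yes!"]
p matches.count  # => 3
```

Note that only the first dot of "..." ends up in a match; the other two are passed over.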

I think it makes sense to treat a word character followed by ?, ! or . as the sentence delimiter:

string.strip.split(/\w[?!.]/).length
#=> 4

So I do not treat ... as a delimiter, because it hangs there on its own like this:

  • "I waited a while ... and then I went home"

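Then again, maybe I should...

It also occurred to me that a better delimiter might be a punctuation mark followed by some whitespace and a capital letter: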
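As a rough illustration of that trade-off, the sketch below runs the word-character split and the scan approach from the earlier answer on a made-up string with an ellipsis in the middle of a sentence. The sample text and variable names are mine, not from the question.

```ruby
# Hypothetical sample: a free-standing ellipsis mid-sentence, then a second sentence.
text = "I waited a while ... and then I went home. It was late."

# Word character + terminator: the free-standing "..." is not a delimiter.
by_word_char = text.strip.split(/\w[?!.]/).length

# scan approach: the first dot of "..." closes a match, so it counts an extra sentence.
by_scan = text.scan(/[^.!?]+[.!?]/).length

p by_word_char  # => 2
p by_scan       # => 3
```

Which result is "right" depends on whether you want a mid-sentence ellipsis to end a sentence.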

string.split(/[?!.]\s+[A-Z]/).length
#=> 4
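One side effect worth knowing: split consumes whatever the pattern matches, so the capital letter is eaten from the start of the next sentence. If you want the pieces intact as well as the count, a possible variation (my own, not part of the answer above) uses lookbehind and lookahead so that only the whitespace is treated as the delimiter. The sample text is made up.

```ruby
# Hypothetical sample with a question, a mid-sentence ellipsis and a final sentence.
text = "Is it late? I waited a while ... and then I went home. Then I slept."

# Pattern from the answer: punctuation, whitespace and the capital are all consumed.
plain = text.split(/[?!.]\s+[A-Z]/)

# Variation: assert the punctuation and the capital without consuming them.
kept = text.split(/(?<=[?!.])\s+(?=[A-Z])/)

p plain.length  # => 3
p kept          # => ["Is it late?", "I waited a while ... and then I went home.", "Then I slept."]
```

Both patterns would still split after an abbreviation such as "Dr." when it is followed by a capitalised name, so treat the count as an approximation.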

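Sentences end with periods, question marks and exclamation marks. They can also be separated by dashes and other punctuation, but we won't worry about those rare cases here. Splitting is simple. Instead of asking Ruby to split the text on one type of character, you simply ask it to split on any of the three types of characters, like this: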

txt = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "

sentence_count = txt.split(/\.|\?|!/).length
puts sentence_count
#=> 7
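The result is 7 rather than the expected 4 apparently because each dot of the trailing "..." acts as its own delimiter, which leaves empty and whitespace-only fragments in the array. A minimal sketch of one way to tidy that up, assuming you are happy to treat a run of terminators as a single delimiter and to discard blank fragments (shown on a shorter, made-up string):

```ruby
# Hypothetical short sample ending in an ellipsis and a trailing space.
txt = "One. Two? Three ... "

# Splitting on every single terminator leaves blank fragments behind.
naive = txt.split(/\.|\?|!/)
p naive          # => ["One", " Two", " Three ", "", "", " "]

# Treat a run of terminators as one delimiter, then drop blank fragments.
cleaned = txt.split(/[.?!]+/).map(&:strip).reject(&:empty?)
p cleaned        # => ["One", "Two", "Three"]
p cleaned.length # => 3
```

Applied to the txt from the question, the same cleanup should bring the count down to 4.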

Another option is to collapse runs of repeated terminators and count the terminators that remain:

string.squeeze('.!?').count('.!?')
#=> 4
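One caveat, sketched below on a made-up string: squeeze only collapses runs of the same character, so a mixed ending like "?!" is still counted twice. Counting runs of terminators with scan is a possible alternative if that matters for your text; this is a suggestion of mine, not part of the answer above.

```ruby
# Hypothetical sample with a mixed "?!" ending and an ellipsis.
sample = "Really?! Yes... No."

# squeeze collapses "..." to "." but leaves "?!" alone (different characters).
squeezed = sample.squeeze('.!?')
p squeezed               # => "Really?! Yes. No."
p squeezed.count('.!?')  # => 4

# Counting runs of terminators treats "?!" as one sentence ending.
p sample.scan(/[.!?]+/).count  # => 3
```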