Difference between 'sample' and 'samples' keyword in python nltk ConditionalFreqDist

Question

I am looking at the frequency distribution of certain words across the different genres of the Brown corpus.

My code:

import nltk
from nltk.corpus import brown

# Requires the Brown corpus data: run nltk.download('brown') once beforehand.
cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre))

genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
modals = ['can', 'could', 'may', 'might', 'must', 'will']

cfd.tabulate(conditions=genres, samples=modals)

Output of the code above:

                 can  could  may  might  must  will
           news   93     86   66     38    50   389
       religion   82     59   78     12    54    71
        hobbies  268     58  131     22    83   264
science_fiction   16     49    4     12     8    16
        romance   74    193   11     51    45    43
          humor   16     30    8      8     9    13

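But when I replace 'samples' with 'sample' in the last line of the code above, it gives me the FreqDist for every word in the corpus.

I don't understand the difference between 'sample' and 'samples'. What is it?

Thanks.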

Answer 1

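cfd.tabulate() simply ignores any keyword argument that its implementation does not look up. That is why sample=modals still produces a table of the full FreqDist: the misspelled keyword is never read, so samples falls back to its default of every sample. Leaving the argument out entirely should give the same result.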
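To illustrate the mechanism without needing the Brown corpus, here is a minimal sketch. It is NOT NLTK's actual code: the function tabulate_like and the toy data dict are hypothetical stand-ins, assuming a tabulate that reads only the keys 'conditions' and 'samples' from **kwargs and defaults samples to every word seen under the chosen conditions.

```python
from collections import Counter

# Hypothetical toy data standing in for a ConditionalFreqDist:
# condition -> Counter of word frequencies.
data = {
    "news": Counter({"can": 3, "will": 5, "the": 20}),
    "humor": Counter({"can": 1, "could": 2, "the": 9}),
}


def tabulate_like(data, **kwargs):
    """Simplified, hypothetical stand-in for a **kwargs-based tabulate.

    Only 'conditions' and 'samples' are looked up; any other keyword,
    such as a misspelled 'sample', is silently ignored.
    """
    conditions = kwargs.get("conditions", sorted(data))
    # Default columns: every sample seen under any selected condition.
    all_samples = sorted(set().union(*(data[c] for c in conditions)))
    samples = kwargs.get("samples", all_samples)
    # One row of counts per condition (Counter returns 0 for missing words).
    rows = {c: [data[c][s] for s in samples] for c in conditions}
    return rows, samples


_, cols = tabulate_like(data, samples=["can", "will"])
print(cols)  # ['can', 'will']

_, cols = tabulate_like(data, sample=["can", "will"])  # typo: 'sample'
print(cols)  # ['can', 'could', 'the', 'will']
```

With the typo, the requested column list never reaches the function, so the default (all words) is used, which matches what the question observed.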

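This behaviour is not specific to NLTK; it applies to any Python function or method that accepts arbitrary keyword arguments via **kwargs. I recommend the section of the Python Tutorial on this topic; I found it very clear.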
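A short sketch of the general Python behaviour, using two hypothetical functions: with explicitly named keyword parameters, a misspelled keyword raises TypeError, while a function declared with **kwargs just collects it into a dict and the typo goes unnoticed unless the function checks for it.

```python
def explicit(conditions=None, samples=None):
    """Explicit keyword parameters: Python rejects any other keyword."""
    return conditions, samples


def flexible(*args, **kwargs):
    """Arbitrary arguments: every keyword is collected into the kwargs dict."""
    return args, kwargs


# The misspelled keyword is stored in kwargs without any error.
print(flexible(sample=["can"]))  # ((), {'sample': ['can']})

# The same typo against explicit parameters raises TypeError.
try:
    explicit(sample=["can"])
except TypeError as exc:
    print("TypeError:", exc)
```

The trade-off is that **kwargs gives an API flexibility at the cost of silent typos, which is exactly the situation with tabulate(sample=...).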
