Finding total count for word form when many possible POS tags

Question

I feel like this is a silly question, but anyway: I'm trying to get from data that looks like this:

a word form     lemma    POS                count of occurrences
same word form  lemma    Not the same POS   another count
same word form  lemma    Yet another POS    another count

to a result that looks like this:

the word form    total count    all possible POS and their individual counts

For example, I could have:

ring     total count = 100        noun = 40, verb = 60

My data is in a CSV file. I want to do something like this:

for row in all_rows:
    if row[0] is the same as row[0] in the next row, add the values from row[3] together to get the total count

But I can't seem to figure out how to do it. Help?

Answer 1

If I understand you correctly, the simplest way to get what you need is:

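# Mocked CSV data: [word form, lemma, POS, count]
data = [
    ['a', 'lemma', 'pos', 1],
    ['a', 'lemma', 'pos1', 2],
    ['a', 'lemma', 'pos2', 3],
    ['b', 'lemma', 'pos', 5],
]

# Total count per word form, summed across all of its POS tags
result = {}

for row in data:
    key = row[0]    # word form
    count = row[3]  # count for this (word form, POS) row
    if key in result:
        result[key] += count
    else:
        result[key] = count

print(result)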

Result:

{
  'a': 6,
  'b': 5
}
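
The snippet above gives only the total per word form. If you also want the individual count for each POS, as in the `ring` example from the question, one possible sketch is to keep a `Counter` per word form. The sample data, the column order (word form, lemma, POS, count), the comma delimiter, the missing header row, and the name `count_by_pos` are all assumptions made for illustration, not details taken from your actual file.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample standing in for the CSV file. The column order
# (word form, lemma, POS, count) and the lack of a header row are assumptions.
SAMPLE_CSV = """ring,ring,noun,40
ring,ring,verb,60
walk,walk,verb,25
"""


def count_by_pos(rows):
    """Map each word form to a Counter of {POS: count}."""
    counts = defaultdict(Counter)
    for word_form, _lemma, pos, count in rows:
        # CSV fields arrive as strings, so convert the count before adding.
        counts[word_form][pos] += int(count)
    return counts


pos_counts = count_by_pos(csv.reader(io.StringIO(SAMPLE_CSV)))

for word_form, per_pos in pos_counts.items():
    total = sum(per_pos.values())
    breakdown = ", ".join(f"{pos} = {n}" for pos, n in per_pos.items())
    print(f"{word_form}\ttotal count = {total}\t{breakdown}")
```

To run this against a real file instead of the in-memory sample, you could open it with `open(path, newline="")` and pass the file object to `csv.reader`. Unlike the pseudocode in the question, this does not require rows for the same word form to be adjacent, since every row is added to the entry for its word form regardless of order.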

Tags: python, nlp, linguistics, python-3.x