Finding the total count for a word form with many possible POS tags

I feel like this is a silly question, but anyway. I'm trying to go from data that looks like this:

a word form     lemma    POS                count of occurrences
same word form  lemma    Not the same POS   another count
same word form  lemma    Yet another POS    another count

to a result that looks like this:

the word form    total count    all possible POS and their individual counts 

For example, I might have:

ring     total count = 100        noun = 40, verb = 60

My data is in a CSV file. I want to do something like this:

for row in all_rows:
    if row[0] is the same as row[0] in the next row, add the values from row[3] together to get the total count

But I can't seem to figure out how to do it. Any help?

If I understood you correctly, the simplest way to get the total count per word form is:

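# Mocked CSV data: word form, lemma, POS, count
data = [
    ['a', 'lemma', 'pos', 1],
    ['a', 'lemma', 'pos1', 2],
    ['a', 'lemma', 'pos2', 3],
    ['b', 'lemma', 'pos', 5],
]

# Maps each word form to its total count across all POS tags
result = {}

for row in data:
    key = row[0]
    # Counts read from a real CSV file are strings; int() converts them
    count = int(row[3])
    if key in result:
        result[key] += count
    else:
        result[key] = count

print(result)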

Result:

{'a': 6, 'b': 5}
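
The loop above only gives the total per word form. If you also want the individual POS counts, as in the `ring` example from the question, one possible sketch is to keep a `Counter` per word form and sum it for the total. It assumes the CSV has no header row and that the columns are in the order word form, lemma, POS, count. The sample data and the `summarize` helper are made up for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample standing in for the CSV file; the column order
# (word form, lemma, POS, count) is assumed from the question.
SAMPLE_CSV = """ring,ring,noun,40
ring,ring,verb,60
bell,bell,noun,5
"""


def summarize(rows):
    """Return {word_form: Counter({pos: count})} for an iterable of rows."""
    pos_counts = defaultdict(Counter)
    for form, _lemma, pos, count in rows:
        # csv.reader yields strings, so convert the count before adding
        pos_counts[form][pos] += int(count)
    return pos_counts


# With a real file, pass csv.reader(open("your_file.csv", newline=""))
# instead of the in-memory sample.
summary = summarize(csv.reader(io.StringIO(SAMPLE_CSV)))

for form, counts in summary.items():
    total = sum(counts.values())
    breakdown = ", ".join(f"{pos} = {n}" for pos, n in counts.items())
    print(f"{form}\ttotal count = {total}\t{breakdown}")
```

Because rows are grouped by key rather than compared with the next row, this works even if the rows for the same word form are not adjacent in the file.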