Finding the total count for a word form that has many possible POS tags
This is probably a silly question, but anyway...
I'm trying to go from data that looks like this:
a word form, lemma, POS, count of occurrence
same word form, lemma, not the same POS, another count
same word form, lemma, yet another POS, another count
to a result that looks like this:
the word form, total count, all possible POS and their individual counts
For example, I might have:
ring total count = 100 noun = 40, verb = 60
My data is in a CSV file. I want to do something like this:
for row in all_rows:
    if row[0] is the same as row[0] in the next row, add the values from row[3] together to get the total count
But I can't figure out how to do it. Any help?
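The "compare each row with the next one" idea in the pseudocode above maps onto itertools.groupby, which groups consecutive items that share a key. This is a minimal sketch that assumes all rows for a given word form are adjacent in the file (as in the sample); the function name total_by_adjacent_word and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def total_by_adjacent_word(rows):
    """Sum column 3 for each run of consecutive rows sharing the same word form (column 0).

    Assumes rows for the same word form are adjacent; sort by column 0 first otherwise.
    """
    totals = []
    for word, group in groupby(rows, key=itemgetter(0)):
        # csv.reader yields strings, so convert the count column before summing
        totals.append((word, sum(int(row[3]) for row in group)))
    return totals


# Hypothetical sample rows: word form, lemma, POS, count
rows = [
    ["ring", "ring", "noun", "40"],
    ["ring", "ring", "verb", "60"],
    ["run", "run", "verb", "7"],
]
print(total_by_adjacent_word(rows))  # [('ring', 100), ('run', 7)]
```

If the file is not sorted, pass sorted(rows, key=itemgetter(0)) instead; otherwise a word form that appears in two separate runs produces two entries.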
If I understand you correctly, the simplest way to achieve what you need is:
# Mocked CSV data: word form, lemma, POS, count
data = [
    ['a', 'lemma', 'pos', 1],
    ['a', 'lemma', 'pos1', 2],
    ['a', 'lemma', 'pos2', 3],
    ['b', 'lemma', 'pos', 5],
]

result = {}
for row in data:
    key = row[0]         # word form
    count = int(row[3])  # rows from csv.reader are strings, so convert the count
    if key in result:
        result[key] += count
    else:
        result[key] = count

print(result)
Result:
{'a': 6, 'b': 5}
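The question also asks for each POS's individual count, which a dict of totals does not keep. A minimal sketch, assuming a headerless CSV with exactly four columns (word form, lemma, POS, count); the helper names count_pos, describe and load_rows and the filename counts.csv are hypothetical. It nests a Counter per word form, so the total is just the sum of the per-POS counts:

```python
import csv
from collections import Counter, defaultdict


def count_pos(rows):
    """Map each word form to a Counter of POS tag -> summed count."""
    by_word = defaultdict(Counter)
    for word, _lemma, pos, count in rows:  # assumes exactly four columns per row
        by_word[word][pos] += int(count)   # csv.reader yields strings
    return by_word


def describe(word, pos_counts):
    """Format one entry like: ring total count = 100 noun = 40, verb = 60"""
    total = sum(pos_counts.values())
    parts = ", ".join(f"{pos} = {n}" for pos, n in pos_counts.items())
    return f"{word} total count = {total} {parts}"


def load_rows(path):
    """Read a headerless four-column CSV file (hypothetical layout)."""
    with open(path, newline="") as f:
        return list(csv.reader(f))


# Inline sample standing in for load_rows("counts.csv")
rows = [["ring", "ring", "noun", "40"], ["ring", "ring", "verb", "60"]]
for word, pos_counts in count_pos(rows).items():
    print(describe(word, pos_counts))  # ring total count = 100 noun = 40, verb = 60
```

Unlike comparing each row with the next one, this does not require rows for the same word form to be adjacent in the file.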