Finding the average value in a list of dictionaries based on another unique value

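I have a list of dictionaries containing 'index' and 'weight' values. I want to average the weights for each unique index. In the example below, how can I find the average weight for any given index (e.g. 0, 1, 250, etc.)? Each index has 8 elements in total.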

values = [
{'index': 0, 'weight': 0.5},
{'index': 1, 'weight': 0.5},
{'index': 0, 'weight': 0.5},
{'index': 1, 'weight': 0.5},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0}
]

I know I can get the average weight of the whole list with the code below, but I'm not sure how to do this for each unique index:

print(sum(v['weight'] for v in values ) / len(values))

I'd suggest using pandas for this task. Pass the list of dictionaries to the DataFrame() constructor to create a data frame, then chain groupby() and mean():

import pandas as pd

avgs = pd.DataFrame(values).groupby('index').mean()

Yields:

       weight
index
0         0.5
1         0.5

Using only Python:

def compute_avg(l, index):
    count = 0
    value = 0
    for data in l:
        if data['index'] == index:
            count += 1
            value += data['weight']

    return value/count
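
A short usage sketch for the function above. The `sample` list is made up for illustration, and `safe_avg` is a hypothetical helper that is not part of the original answer. It is there because `compute_avg` raises ZeroDivisionError when no entry has the requested index (for example 250 in a list that only contains 0 and 1).

```python
def compute_avg(l, index):
    count = 0
    value = 0
    for data in l:
        if data['index'] == index:
            count += 1
            value += data['weight']
    return value / count


def safe_avg(l, index):
    # Hypothetical wrapper: return None instead of raising
    # when no dictionary carries the requested index.
    try:
        return compute_avg(l, index)
    except ZeroDivisionError:
        return None


# Illustrative data, smaller than the list in the question.
sample = [
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 0.0},
]

avg0 = safe_avg(sample, 0)      # (0.5 + 0.0) / 2
avg250 = safe_avg(sample, 250)  # index not present
print(avg0, avg250)
```

Whether a missing index should return None or raise is a design choice; raising may be preferable if a missing index indicates a bug in the caller.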

You can get all the entries for a given index like this:

with_index = [v for v in values if v['index'] == given_index]

Then print the average weight for that index:

print(sum(v['weight'] for v in with_index) / len(with_index))

You need to group the weights by index. defaultdict from the built-in collections module is useful here.

from collections import defaultdict
total = defaultdict(int)
cnts = defaultdict(int)
for d in values:
    # add weights
    total[d['index']] += d['weight']
    # count indexes
    cnts[d['index']] += 1
# find the mean
[{'index': k, 'mean weight': total[k]/cnts[k]} for k in total]
# [{'index': 0, 'mean weight': 0.5}, {'index': 1, 'mean weight': 0.5}]
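
Since the question asks for the average of any given index, the same single-pass idea can be wrapped in a function that returns a lookup table. This is only a sketch: `mean_by_index` and `sample` are names invented here, and `defaultdict(float)` is used for the totals simply because the weights are floats.

```python
from collections import defaultdict


def mean_by_index(records):
    # Accumulate the sum of weights and the number of entries per index
    # in a single pass over the list.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for d in records:
        totals[d['index']] += d['weight']
        counts[d['index']] += 1
    # Build a plain dict mapping each index to its mean weight.
    return {k: totals[k] / counts[k] for k in totals}


# Illustrative data, smaller than the list in the question.
sample = [
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 0.0},
]

means = mean_by_index(sample)
print(means.get(0))    # mean weight for index 0
print(means.get(250))  # None, because index 250 does not occur
```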

Loop over the values and keep track as you go:

x = {}
for v in values:
    try:
        x[v['index']]['weight'] += v['weight']
    except KeyError:
        x[v['index']] = {'weight' : v['weight']}
    try:
        x[v['index']]['count'] += 1
    except KeyError:
        x[v['index']].update({'count':1})

    #or wait until after the loop to calculate
    #allows for continuation in a streaming situation. 

    avg = x[v['index']]['weight'] / x[v['index']]['count']
    x[v['index']].update({'avg': avg})
    
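print(x)

indexes = set(v['index'] for v in values)
for i in indexes:
    print(sum(v['weight'] for v in values if v['index'] == i) / sum(v['index'] == i for v in values))

This is a variation on your code. It relies on type coercion (each True comparison counts as 1 when summed) to count the number of dictionaries for each index.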

Using mean from the pure-Python statistics library

We can solve the problem with statistics.mean:

from statistics import mean

average_weight = {
    index: mean(v['weight'] for v in values if v['index'] == index) 
    for index in set(v['index'] for v in values)
}

With the test values

values = [
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 0.5},
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 0.5},
    {'index': 0, 'weight': 0.0},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 0.0},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 0.0},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 1.0},
    {'index': 1, 'weight': 0.0},
    {'index': 0, 'weight': 1.0},
    {'index': 1, 'weight': 0.0},
    {'index': 0, 'weight': 1.0},
    {'index': 1, 'weight': 0.0}
]

average_weight

{0: 0.5, 1: 0.5}

Using known information

If you know that each index has 8 elements in total, why not use that?

COUNT = 8

average_weight = {
    index: sum(v['weight'] for v in values if v['index'] == index) / COUNT
    for index in set(v['index'] for v in values)
}
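
Dividing by a fixed COUNT gives wrong averages without any error if some index has more or fewer entries than expected. A hedged sketch of a check that could run first, using collections.Counter. The `sample` data and the name `uneven` are made up for illustration, and COUNT is set to 2 here only to match the small sample (the question's data would use 8).

```python
from collections import Counter

COUNT = 2  # assumed entries per index for this sample; 8 in the question

# Illustrative data, smaller than the list in the question.
sample = [
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 0.0},
    {'index': 1, 'weight': 0.0},
]

# Count how many dictionaries carry each index.
counts = Counter(v['index'] for v in sample)

# Collect any index whose count differs from the assumed COUNT.
uneven = {i: n for i, n in counts.items() if n != COUNT}
if uneven:
    raise ValueError(f"indexes with unexpected counts: {uneven}")

average_weight = {
    index: sum(v['weight'] for v in sample if v['index'] == index) / COUNT
    for index in counts
}
print(average_weight)
```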