Finding average value in list of dictionaries based on another unique value
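I have a list of dictionaries containing 'index' and 'weight' values. I want to average the weights for each unique index. In the example below, how can I find the average weight for any given index (e.g. 0, 1, 250, etc.)? Each index has 8 elements in total.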
values = [
{'index': 0, 'weight': 0.5},
{'index': 1, 'weight': 0.5},
{'index': 0, 'weight': 0.5},
{'index': 1, 'weight': 0.5},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0}
]
I know I can get the average weight of the whole list with the following code, but I'm not sure how to do this for each unique index:
print(sum(v['weight'] for v in values) / len(values))
I suggest using pandas for this task. Just pass the list of dictionaries to the DataFrame() constructor to create a data frame, then apply groupby() and mean():
import pandas as pd
avgs = pd.DataFrame(values).groupby('index').mean()
This yields:
       weight
index
0         0.5
1         0.5
Using only Python:
def compute_avg(l, index):
    count = 0
    value = 0
    for data in l:
        # only accumulate the entries that belong to the requested index
        if data['index'] == index:
            count += 1
            value += data['weight']
    return value/count
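The question mentions looking up an index such as 250, which may not occur in the list at all. In that case count stays at 0 and the division raises ZeroDivisionError. Here is a minimal sketch of one way to guard against that, assuming that returning None for a missing index is acceptable. The name compute_avg_safe and the shortened sample data are illustrative and not part of the original answer.

```python
def compute_avg_safe(records, index):
    """Return the mean weight for `index`, or None if it never occurs."""
    count = 0
    total = 0.0
    for record in records:
        if record['index'] == index:
            count += 1
            total += record['weight']
    # Hypothetical choice: signal "no data" with None instead of raising.
    return total / count if count else None


sample = [
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 0.0},
]
print(compute_avg_safe(sample, 0))    # 0.25
print(compute_avg_safe(sample, 250))  # None
```

Whether None, an exception, or a default value is the right response to an unknown index depends on the caller.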
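You can get all the values for a given index like this:
given_index = 0  # the index you want to look up
with_index = [v for v in values if v['index'] == given_index]
Then print the average weight, dividing by the number of matching entries rather than by the length of the whole list:
print(sum(v['weight'] for v in with_index) / len(with_index))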
You need to group the weights by index. defaultdict from the built-in collections module is useful here.
from collections import defaultdict
total = defaultdict(float)
cnts = defaultdict(int)
for d in values:
    # add weights
    total[d['index']] += d['weight']
    # count indexes
    cnts[d['index']] += 1
# find the mean
[{'index': k, 'mean weight': total[k]/cnts[k]} for k in total]
# [{'index': 0, 'mean weight': 0.5}, {'index': 1, 'mean weight': 0.5}]
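The same grouping can be wrapped in a function that returns a plain dict keyed by index, so a single index can be looked up directly. This is only a sketch: mean_by_index is a hypothetical helper name, and the sample data is shortened for readability.

```python
from collections import defaultdict


def mean_by_index(records):
    """Single pass: accumulate sum and count per index, then divide."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        totals[record['index']] += record['weight']
        counts[record['index']] += 1
    return {index: totals[index] / counts[index] for index in totals}


sample = [
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 1.0},
    {'index': 1, 'weight': 0.0},
]
averages = mean_by_index(sample)
print(averages)           # {0: 0.75, 1: 0.5}
print(averages.get(250))  # None, since index 250 is absent
```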
Loop over the values and keep track as you go:
x = {}
for v in values:
    try:
        x[v['index']]['weight'] += v['weight']
    except KeyError:
        # first time this index is seen
        x[v['index']] = {'weight' : v['weight']}
    try:
        x[v['index']]['count'] += 1
    except KeyError:
        x[v['index']].update({'count':1})
    #or wait until after the loop to calculate
    #allows for continuation in a streaming situation.
    avg = x[v['index']]['weight'] / x[v['index']]['count']
    x[v['index']].update({'avg': avg})
print(x)
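For the streaming case mentioned in the comments, the running sum and count can be kept in a small object that is updated as each record arrives. The sketch below is one possible shape for that idea. The class RunningMeans and its methods are hypothetical names, not something from the answer above.

```python
class RunningMeans:
    """Keeps a running sum and count per index so records can arrive one at a time."""

    def __init__(self):
        self._stats = {}  # index -> [sum of weights, number of records]

    def add(self, record):
        entry = self._stats.setdefault(record['index'], [0.0, 0])
        entry[0] += record['weight']
        entry[1] += 1

    def mean(self, index):
        total, count = self._stats[index]  # raises KeyError for an unseen index
        return total / count


tracker = RunningMeans()
for record in [{'index': 0, 'weight': 0.5}, {'index': 0, 'weight': 1.0}]:
    tracker.add(record)
print(tracker.mean(0))  # 0.75

tracker.add({'index': 0, 'weight': 0.0})  # a later record updates the mean
print(tracker.mean(0))  # 0.5
```

Computing the mean on demand avoids storing a stale 'avg' value, at the cost of one division per lookup.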
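indexes = set(v['index'] for v in values)
for i in indexes:
    print(sum(v['weight'] for v in values if v['index'] == i) / sum(v['index'] == i for v in values))
This is a variation of your code. It relies on booleans being treated as integers (True counts as 1, False as 0) to count the dictionaries for each index.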
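To make the counting trick concrete, here is a small sketch with shortened sample data (the variable names are illustrative). It shows what the comparison produces and an equivalent count that does not depend on bool arithmetic.

```python
sample = [
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 0.0},
]

# Each comparison yields True or False; sum() adds them as 1 and 0.
matches = [v['index'] == 0 for v in sample]
print(matches)       # [True, False, True]
print(sum(matches))  # 2

# Equivalent count written without relying on bool arithmetic.
explicit_count = sum(1 for v in sample if v['index'] == 0)
print(explicit_count)  # 2
```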
Using mean from the standard-library statistics module
We can solve the problem with statistics.mean:
from statistics import mean
average_weight = {
    index: mean(v['weight'] for v in values if v['index'] == index)
    for index in set(v['index'] for v in values)
}
With the test values
values = [
{'index': 0, 'weight': 0.5},
{'index': 1, 'weight': 0.5},
{'index': 0, 'weight': 0.5},
{'index': 1, 'weight': 0.5},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 0.0},
{'index': 1, 'weight': 1.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0},
{'index': 0, 'weight': 1.0},
{'index': 1, 'weight': 0.0}
]
average_weight is
{0: 0.5, 1: 0.5}
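The comprehension above rescans the whole list once per unique index. If that matters for a larger list, one alternative (a different technique from the answer, shown only as a sketch) is to sort the records by index and let itertools.groupby hand each group to statistics.mean. The sample data here is shortened and made up for illustration.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

sample = [
    {'index': 1, 'weight': 1.0},
    {'index': 0, 'weight': 0.5},
    {'index': 1, 'weight': 0.0},
    {'index': 0, 'weight': 1.0},
]

by_index = itemgetter('index')
# groupby only merges adjacent items, so the records must be sorted first.
average_weight = {
    index: mean(record['weight'] for record in group)
    for index, group in groupby(sorted(sample, key=by_index), key=by_index)
}
print(average_weight)  # {0: 0.75, 1: 0.5}
```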
Using known information
If you know that each index has 8 elements in total, why not use that?
COUNT = 8
average_weight = {
    index: sum(v['weight'] for v in values if v['index'] == index) / COUNT
    for index in set(v['index'] for v in values)
}
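Dividing by a fixed COUNT gives a wrong average if some index has more or fewer than 8 entries. A quick sanity check with collections.Counter can confirm the assumption before relying on it. This is a sketch with generated sample data. The names sample and unexpected are illustrative.

```python
from collections import Counter

COUNT = 8  # assumed number of records per index, as stated in the question

# Illustrative data: 16 records alternating between index 0 and index 1.
sample = [{'index': i % 2, 'weight': 0.5} for i in range(16)]

counts = Counter(record['index'] for record in sample)
# Collect every index whose record count differs from the assumed COUNT.
unexpected = {index: n for index, n in counts.items() if n != COUNT}
print(counts)      # Counter({0: 8, 1: 8})
print(unexpected)  # {}
```

An empty unexpected dict means the fixed divisor is safe for this data.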