How can I map and reduce my list of dictionaries with Python?
I have this list of dictionaries:
[{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
I want to map and reduce (or group) it to get a result like this:
[
{
'topic_id': 1,
'count': 2,
'variance': 3.0,
'global_average': 6.5
},
{
'topic_id': 2,
'count': 1,
'variance': 5.0,
'global_average': 5.0
}
]
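That is, something that computes the variance (max average - min average) and sums the count of items.
What I have already done:
Before this, I tried summing the counts by changing the structure of the dictionary, making topic_id the key and the count the value. My result was: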
result = sorted(dict(functools.reduce(operator.add, map(collections.Counter, data))).items(), reverse=True)
This was just a first attempt.
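To make it easier to see why that first attempt does not group anything, here is a small sketch of what the expression evaluates to on the sample data. The variable names merged and first_attempt are only illustrative. Adding Counter objects sums the values key by key, so topic_id itself gets summed and every row collapses into one set of totals.

```python
import collections
import functools
import operator

data = [
    {'topic_id': 1, 'average': 5.0, 'count': 1},
    {'topic_id': 1, 'average': 8.0, 'count': 1},
    {'topic_id': 2, 'average': 5.0, 'count': 1},
]

# Counter addition sums values per key across all rows, ignoring topic_id boundaries.
merged = functools.reduce(operator.add, map(collections.Counter, data))

# Sorting the (key, value) pairs in reverse orders them by key name, descending.
first_attempt = sorted(dict(merged).items(), reverse=True)
print(first_attempt)
# [('topic_id', 4), ('count', 3), ('average', 18.0)]
```

Since the topic_id values are added together (1 + 1 + 2), the per-topic information is lost, which is why a grouping step is needed.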
Here is an attempt that uses itertools.groupby to group the data by topic_id:
import itertools

data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]

# groupby only merges consecutive items, so data must already be ordered by topic_id
grouper = itertools.groupby(data, key=lambda x: x['topic_id'])

# holder for output
output = []

# iterate over grouper to calculate things
for key, group in grouper:
    # variables for calculations
    count = 0
    maxi = float('-inf')
    mini = float('inf')
    total = 0

    # one pass over each dictionary
    for g in group:
        avg = g['average']
        maxi = avg if avg > maxi else maxi
        mini = avg if avg < mini else mini
        total += avg
        count += 1

    # write to output
    output.append({'topic_id': key,
                   'count': count,
                   'variance': maxi - mini,
                   'global_average': total / count})

This gives the following output:

[{'topic_id': 1, 'count': 2, 'variance': 3.0, 'global_average': 6.5},
 {'topic_id': 2, 'count': 1, 'variance': 0.0, 'global_average': 5.0}]
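Note that the 'variance' for the second group is 0.0 here rather than 5.0, because a group with a single entry has the same maximum and minimum. This differs from your expected output, but I would guess that it is what you want?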
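One caveat with this approach: itertools.groupby only groups consecutive items that share a key, so it relies on the rows for each topic_id being adjacent. The following is a minimal sketch of that behavior, using a deliberately shuffled copy of the sample data. The names unsorted_data, by_topic, naive_keys and summary are illustrative only.

```python
import itertools
import operator

# Same rows as in the question, shuffled so that topic_id 1 is split up.
unsorted_data = [
    {'topic_id': 1, 'average': 5.0, 'count': 1},
    {'topic_id': 2, 'average': 5.0, 'count': 1},
    {'topic_id': 1, 'average': 8.0, 'count': 1},
]

by_topic = operator.itemgetter('topic_id')

# Without sorting, groupby starts a new group every time the key changes.
naive_keys = [key for key, _ in itertools.groupby(unsorted_data, key=by_topic)]
print(naive_keys)  # [1, 2, 1]

# Sorting by the same key first makes equal keys adjacent.
summary = []
for key, group in itertools.groupby(sorted(unsorted_data, key=by_topic), key=by_topic):
    averages = [row['average'] for row in group]
    summary.append({
        'topic_id': key,
        'count': len(averages),
        'variance': max(averages) - min(averages),
        'global_average': sum(averages) / len(averages),
    })
print(summary)
```

If the input is not guaranteed to be ordered by topic_id, sorting first costs O(n log n) but keeps the grouping correct.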
You can achieve this with a few comprehensions, map, and the mean function from the built-in statistics module.
from statistics import mean

data = [
    {
        'topic_id': 1,
        'average': 5.0,
        'count': 1
    }, {
        'topic_id': 1,
        'average': 8.0,
        'count': 1
    }, {
        'topic_id': 2,
        'average': 5.0,
        'count': 1
    }
]

# a set of unique topic_id's
keys = set(i['topic_id'] for i in data)

# a list of list of averages for each topic_id
averages = [[i['average'] for i in data if i['topic_id'] == j] for j in keys]

# a map of tuples of (counts, variances, averages) for each topic_id
stats = map(lambda x: (len(x), max(x) - min(x), mean(x)), averages)

# finally reconstruct it back into a list
result = [
    {
        'topic_id': key,
        'count': count,
        'variance': variance,
        'global_average': average
    } for key, (count, variance, average) in zip(keys, stats)
]

print(result)
Returns
[{'topic_id': 1, 'count': 2, 'variance': 3.0, 'global_average': 6.5}, {'topic_id': 2, 'count': 1, 'variance': 0.0, 'global_average': 5.0}]
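The nested comprehension rescans data once per topic_id. As an alternative that is closer to the "reduce" in the question title, here is a minimal single-pass sketch using functools.reduce. The helper fold and the accumulator layout are hypothetical. It also assumes that 'count' should be the sum of the per-row count fields and that the global average should be weighted by those counts, which the question does not state explicitly.

```python
import functools

data = [
    {'topic_id': 1, 'average': 5.0, 'count': 1},
    {'topic_id': 1, 'average': 8.0, 'count': 1},
    {'topic_id': 2, 'average': 5.0, 'count': 1},
]

def fold(acc, row):
    """Merge one row into the per-topic accumulator (hypothetical helper)."""
    stats = acc.setdefault(row['topic_id'], {
        'count': 0,
        'weighted_sum': 0.0,
        'min': float('inf'),
        'max': float('-inf'),
    })
    stats['count'] += row['count']
    # Assumption: each row's average counts 'count' times toward the global average.
    stats['weighted_sum'] += row['average'] * row['count']
    stats['min'] = min(stats['min'], row['average'])
    stats['max'] = max(stats['max'], row['average'])
    return acc

# One pass over data; the accumulator maps topic_id -> running statistics.
acc = functools.reduce(fold, data, {})

reduced = [
    {
        'topic_id': topic_id,
        'count': s['count'],
        'variance': s['max'] - s['min'],
        'global_average': s['weighted_sum'] / s['count'],
    }
    for topic_id, s in sorted(acc.items())
]
print(reduced)
```

On the sample data, where every count is 1, this gives the same values as the result above. Note that a topic whose counts sum to zero would raise ZeroDivisionError in this sketch.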
If you are open to using pandas, this seems like an appropriate use case:
import pandas as pd
data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
# move to dataframe
df = pd.DataFrame(data)
# groupby and get all desired metrics
grouped = df.groupby('topic_id')['average'].describe()
grouped['variance'] = grouped['max'] - grouped['min']
# rename columns and remove unneeded ones
grouped = grouped.reset_index().loc[:, ['topic_id', 'count', 'mean', 'variance']].rename({'mean':'global_average'}, axis=1)
# back to list of dicts
output = grouped.to_dict('records')
output is:
[{'topic_id': 1, 'count': 2.0, 'global_average': 6.5, 'variance': 3.0},
{'topic_id': 2, 'count': 1.0, 'global_average': 5.0, 'variance': 0.0}]
You can also try the aggregation functionality of a pandas DataFrame, like this:
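import pandas as pd

# the list of dictionaries from the question
data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]

f = pd.DataFrame(data).set_index('topic_id')

def var(x):
    # range of the averages within one group (max - min)
    return x.max() - x.min()

out = f.groupby(level=0).agg(count=('count', 'sum'),
                             global_average=('average', 'mean'),
                             variance=('average', var))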