How to carry out this aggregation in Python

I have the following list of dictionaries, each holding a country and the servers recorded for it.

[
    {'country': 'KR', 'values': ['Server1']},
    {'country': 'IE', 'values': ['Server1', 'Server3', 'Server2']},
    {'country': 'IE', 'values': ['Server1', 'Server3']},
    {'country': 'DE', 'values': ['Server1']},
    {'country': 'DE', 'values': ['Server2']},
]

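Now I need to calculate the percentage of each server within a particular country. For example, for IE the two lists contain 5 entries in total, so the percentage for Server1 is (2/5)*100, because two of IE's five entries are Server1, and similarly for the rest. Then I want to add the percentage of Server1 to the dictionary under the key percent. So for the structure above, the output basically becomes: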

[
    {"country": "KR", "percent": "100.0000", "values": ["Server1-100.0000"]},
    {"country": "IE", "percent": "40.000", "values": ["Server1-40.0", "Server3-40.0", "Server2-20.0"]},
    {"country" : "DE", "percent" : "50.0", "values" : ["Server1-50.0", "Server2-50.0"]},
]

I tried the following code.

for i in range(len(response) - 1):
   for j in range((i+1), len(response) - 1):
     if response[i]['country'] == response[j]['country']:
       print response[i]['country'], response[j]['country']
       total = len(response[i]['values']) +  len(response[j]['values'])
       print total
       for item in response[i]['values']:
         for ktem in response[j]['values']:
           if item == ktem:
              if item == 'Server1':
                response[i]['percent'] =  200/total
              else:
                response[i][percent] = 0
              del response[j]

I'm stuck on getting the percentage part right. Any help?

Suppose you have

orig = [
    {'country': 'KR', 'values': ['Server1']},
    {'country': 'IE', 'values': ['Server1', 'Server3', 'Server2']},
    {'country': 'IE', 'values': ['Server1', 'Server3']},
    {'country': 'DE', 'values': ['Server1']},
    {'country': 'DE', 'values': ['Server2']},
]

You can create a new dictionary that records which servers appear in which country and how many times each one appears:

newDict = {}
for c in orig:
    if c['country'] not in newDict:
        newDict[c['country']] = dict()
    for s in c['values']:
        if s in newDict[c['country']]:
            newDict[c['country']][s] = newDict[c['country']][s] + 1
        else:
            newDict[c['country']][s] = 1

It will take the following form:

{'KR': {'Server1': 1}, 
 'DE': {'Server1': 1, 'Server2': 1}, 
 'IE': {'Server1': 2, 'Server2': 1, 'Server3': 2}}
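
As an aside, the same counting step could probably be written more compactly with collections.defaultdict and collections.Counter from the standard library. This is only a sketch of an alternative, not the code used in this answer; the name counts is made up for illustration.

```python
from collections import Counter, defaultdict

orig = [
    {'country': 'KR', 'values': ['Server1']},
    {'country': 'IE', 'values': ['Server1', 'Server3', 'Server2']},
    {'country': 'IE', 'values': ['Server1', 'Server3']},
    {'country': 'DE', 'values': ['Server1']},
    {'country': 'DE', 'values': ['Server2']},
]

# counts[country][server] -> how many times the server appears for that country
# ("counts" is a hypothetical name, standing in for newDict above)
counts = defaultdict(Counter)
for record in orig:
    # Counter.update adds one to the count of every element in the list
    counts[record['country']].update(record['values'])

# Convert to plain dicts for a tidier printout
print({country: dict(c) for country, c in counts.items()})
```

The nested if/else bookkeeping disappears because a missing country gets an empty Counter automatically, and a missing server starts from zero.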

You can then compute the percentage like this:

output = []
for country in newDict:
    # Total number of server entries recorded for this country
    total = sum(newDict[country].values())
    # .get avoids a KeyError when a country has no Server1 at all
    output.append({"country": country, "percent": (100.0 * newDict[country].get('Server1', 0)) / total})

This yields

[{'country': 'KR', 'percent': 100.0}, 
 {'country': 'DE', 'percent': 50.0}, 
 {'country': 'IE', 'percent': 40.0}]

I'll leave it as an exercise for the reader to optimize this and add the other fields you want.
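
One possible way to fill in that exercise is sketched below, assuming the per-country counts have the same shape as newDict shown earlier (they are written out literally here so the snippet runs on its own). The helper name format_values is hypothetical, and sorting the server names is just a choice made to keep the output stable.

```python
# Per-country server counts, in the same shape as newDict above
newDict = {
    'KR': {'Server1': 1},
    'DE': {'Server1': 1, 'Server2': 1},
    'IE': {'Server1': 2, 'Server2': 1, 'Server3': 2},
}

def format_values(server_counts):
    """Return 'name-percent' strings for one country's server counts."""
    total = sum(server_counts.values())
    # Sorted by server name so the order does not depend on dict ordering
    return ['{0}-{1:.1f}'.format(name, 100.0 * n / total)
            for name, n in sorted(server_counts.items())]

output = []
for country, server_counts in newDict.items():
    total = sum(server_counts.values())
    output.append({
        'country': country,
        # Share of Server1; 0 if the country has no Server1 at all
        'percent': 100.0 * server_counts.get('Server1', 0) / total,
        'values': format_values(server_counts),
    })

print(output)
```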

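I have a more concise approach.

I think it is more readable and easier to understand. You can use the following as a reference:

Here is your data, which I declare as response: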

response = [
    {'country': 'KR', 'values': ['Server1']},
    {'country': 'IE', 'values': ['Server1', 'Server3', 'Server2']},
    {'country': 'IE', 'values': ['Server1', 'Server3']},
    {'country': 'DE', 'values': ['Server1']},
    {'country': 'DE', 'values': ['Server2']},
]

Let's merge the values for each country.

new_res = {}
for e in response:
    if e['country'] not in new_res:
        # Copy the list so the extend() below does not modify response itself
        new_res[e['country']] = list(e['values'])
    else:
        new_res[e['country']].extend(e['values'])

To see its contents, print new_res. It looks like this:

{
    'KR': ['Server1'],
    'DE': ['Server1', 'Server2'],
    'IE': ['Server1', 'Server3', 'Server2', 'Server1', 'Server3']
}

Use Counter from the collections module to count the elements:

from collections import Counter
new_list = []
for country, values in new_res.items():
    # elements are stored as dictionary keys and their counts are stored as dictionary values
    merge_values = Counter(values)

    # calculate percentage
    new_values = []
    total = sum(merge_values.values())    
    for server_name, num in merge_values.items():
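        # e.g. Server1-40.0 (100.0 forces float division on Python 2 as well)
        new_values.append("{0}-{1:.1f}".format(server_name, num * 100.0 / total))

    # Share of Server1 in this country; Counter returns 0 for a missing key
    percent = merge_values["Server1"] * 100.0 / total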

    new_list.append({"country": country,
                     "percent": percent,
                     "values": new_values})

Once the calculation is done, you can print new_list:

[{'country': 'KR', 'percent': 100.0, 'values': ['Server1-100.0']},
 {'country': 'DE', 'percent': 50.0,  'values': ['Server1-50.0', 'Server2-50.0']},
 {'country': 'IE', 'percent': 40.0,  'values': ['Server1-40.0', 'Server2-20.0', 'Server3-40.0']}]

So you get the answer you wanted.
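
If this needs to be reused, the steps above could be wrapped in a single function. The sketch below is a variant under stated assumptions rather than the answer's own code: the function name server_percentages and the key_server parameter are invented for illustration, and it relies on dicts keeping insertion order (Python 3.7+) so that countries and servers come out in first-seen order.

```python
from collections import Counter

def server_percentages(records, key_server='Server1'):
    """Merge records by country and compute each server's share in percent.

    Hypothetical helper: key_server selects whose share goes under 'percent'.
    """
    merged = {}
    for record in records:
        # One Counter per country, created on first sight of the country
        merged.setdefault(record['country'], Counter()).update(record['values'])

    result = []
    for country, counts in merged.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no servers recorded, skip to avoid dividing by zero
        result.append({
            'country': country,
            # A Counter returns 0 for a server that never appeared
            'percent': 100.0 * counts[key_server] / total,
            'values': ['{0}-{1:.1f}'.format(name, 100.0 * n / total)
                       for name, n in counts.items()],
        })
    return result

response = [
    {'country': 'KR', 'values': ['Server1']},
    {'country': 'IE', 'values': ['Server1', 'Server3', 'Server2']},
    {'country': 'IE', 'values': ['Server1', 'Server3']},
    {'country': 'DE', 'values': ['Server1']},
    {'country': 'DE', 'values': ['Server2']},
]
result = server_percentages(response)
print(result)
```

Passing a different key_server, for example server_percentages(response, key_server='Server2'), would report that server's share under percent instead.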