python: sum values in a list if they share the first word

Question

I have the following list:

flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2', 'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

I want to sum the values of the entries that share the same word, so the output should be:

Desired output:

l = [('hello',19), ('yellow', 9), ('mellow',13)]

So far I have tried the following:

new_list = [v.split(',') for v in flat_list]

d = {}
for key, value in new_list:
    if key not in d.keys():
        d[key] = [key]
    d[key].append(value)

# getting rid of the first key in value lists
val = [val.pop(0) for k,val in d.items()]
# summing up the values
va = [sum([int(x) for x in va]) for ka,va in d.items()]

But for some reason the last sum-up does not work, and I do not get my desired output.

Answer 1

Here is a variant that achieves your goal using a defaultdict:

from collections import defaultdict

t = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2',
     'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

count = defaultdict(int)

for name_number in t:
    name, number = name_number.split(",")
    count[name] += int(number)
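Why this works without a membership check: `defaultdict(int)` calls `int()`, which returns 0, the first time a missing key is accessed and stores that value, so `+=` always has a number to add to. A small sketch of that behavior (the key `'purple'` is just an illustrative missing key):

```python
from collections import defaultdict

count = defaultdict(int)

# the first access to a missing key stores int() == 0, then adds to it
count['hello'] += 5
count['hello'] += 7
print(count['hello'])     # 12

# note: merely reading a missing key also inserts it with the default value
print(count['purple'])    # 0
print('purple' in count)  # True
```

Unlike a defaultdict, a Counter returns 0 for a missing key on a plain read without inserting it.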

You could also use a Counter:

from collections import Counter

count = Counter()

for name_number in t:
    name, number = name_number.split(",")
    count[name] += int(number)

In both cases you can convert the output to a list of tuples with:

list(count.items())
# -> [('hello', 19), ('mellow', 13), ('yellow', 9)]

I also ran your code, and I did get the correct result (though not in the format you want).
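If the totals should come out ordered from largest to smallest, the Counter variant already has a method for that: `Counter.most_common()` with no argument returns all (key, count) pairs sorted by count, highest first. A minimal sketch reusing the same list `t`:

```python
from collections import Counter

t = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2',
     'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

count = Counter()
for name_number in t:
    name, number = name_number.split(",")
    count[name] += int(number)

# most_common() with no argument returns every (key, total) pair,
# sorted from the highest total to the lowest
result = count.most_common()
print(result)  # [('hello', 19), ('mellow', 13), ('yellow', 9)]
```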

Answer 2

One possible approach, using pandas:

import pandas as pd
    
flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2', 'yellow,7', 'hello,7', 'mellow,7', 'hello,7']
new_list = [v.split(',') for v in flat_list]
    
df = pd.DataFrame(new_list)
df[1] = df[1].astype(int)
df2 = df.groupby(0).sum()
print(df2)

Output:

    0        1
    hello   19
    mellow  13
    yellow   9

Answer 3

You can do this quite simply, without importing any other modules, like this:

t = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2', 'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

d = {}
for s in t:  # for each string
    w, n = s.split(',')  # get the word and the number
    d[w] = d[w] + int(n) if w in d else int(n)  # add the number (running sum)

l = list(d.items()) #make the result a list of tuples
print(l)

Output:

[('hello', 19), ('mellow', 13), ('yellow', 9)]

Answer 4

for some reason the last sum up does not work

To fix your original solution, replace the last line (the one after the pop step) with:

d = {ka:sum([int(x) for x in va]) for ka,va in d.items()}
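For reference, a sketch of the question's code with that line applied as the final step, keeping the original variable names and ending with `list(d.items())` to get the list-of-tuples format:

```python
flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2',
             'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

new_list = [v.split(',') for v in flat_list]

d = {}
for key, value in new_list:
    if key not in d.keys():
        d[key] = [key]
    d[key].append(value)

# getting rid of the first key in value lists
val = [val.pop(0) for k, val in d.items()]

# the fix: replace each list of number strings with its integer sum
d = {ka: sum([int(x) for x in va]) for ka, va in d.items()}

result = list(d.items())
print(result)  # [('hello', 19), ('mellow', 13), ('yellow', 9)]
```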

Answer 5

the last sum up does not work and i do not get my desired output

Actually it works fine; you just forgot to combine the two lists. Add:

print(list(zip(val, va)))

and you will see:

[('hello', 19), ('mellow', 13), ('yellow', 9)]

This is equivalent to your desired output:

[('hello',19), ('yellow', 9), ('mellow',13)]

Only the order of the yellow and mellow entries differs, because mellow appears first in the input.
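Since each word appears only once in the result, one way to confirm that two such lists differ only in order is to compare them as dicts, or to sort both first. A small sketch:

```python
computed = [('hello', 19), ('mellow', 13), ('yellow', 9)]
desired = [('hello', 19), ('yellow', 9), ('mellow', 13)]

# list comparison is order-sensitive
print(computed == desired)                  # False
# dict comparison ignores order: each word maps to exactly one total
print(dict(computed) == dict(desired))      # True
# sorting both lists puts them in the same canonical order
print(sorted(computed) == sorted(desired))  # True
```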

Answer 6

Summarizing the answers above, I'd say the cleanest approach that needs no external imports seems to be:

flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2', 
             'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

d = {}
for ele in flat_list:
    key, value = ele.split(',')
    d[key]= d.get(key, 0) + int(value)
    
list(d.items())

The output is:

[('hello', 19), ('mellow', 13), ('yellow', 9)]

You can sort by increasing value like this (or use x[0] to sort alphabetically; set reverse to True for descending order):

sorted(d.items(), key=lambda x: x[1], reverse=False)
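For example, the three orderings described above applied to the totals (`d` is rebuilt here so the snippet runs on its own):

```python
flat_list = ['hello,5', 'mellow,4', 'mellow,2', 'yellow,2',
             'yellow,7', 'hello,7', 'mellow,7', 'hello,7']

d = {}
for ele in flat_list:
    key, value = ele.split(',')
    d[key] = d.get(key, 0) + int(value)

# x is a (word, total) tuple: x[1] sorts by total, x[0] by word
by_value_asc = sorted(d.items(), key=lambda x: x[1])
by_value_desc = sorted(d.items(), key=lambda x: x[1], reverse=True)
by_key = sorted(d.items(), key=lambda x: x[0])

print(by_value_asc)   # [('yellow', 9), ('mellow', 13), ('hello', 19)]
print(by_value_desc)  # [('hello', 19), ('mellow', 13), ('yellow', 9)]
print(by_key)         # [('hello', 19), ('mellow', 13), ('yellow', 9)]
```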


Tags: python, dictionary, sum, list