Numpy Sum from a multidimensional array?

If I have data like this:

data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]] etc...

What is the best way to compute the sum per year in Python?

If you need both the year and the sum, I would use a dictionary:

from collections import defaultdict

data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]
d = defaultdict(int)

for v, k in data:
    d[k] += v
print(d)

This prints:

defaultdict(<type 'int'>, {2013: 12, 2014: 7})
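Since the title asks about NumPy, the same per-year grouping can also be sketched on an array. This is only one possible approach, assuming the data fits in a 2-D integer array with the value in column 0 and the year in column 1; the names `unique_years`, `inverse`, and `totals` are illustrative.

```python
import numpy as np

data = np.array([[3, 2014], [4, 2014], [6, 2013], [6, 2013]])
values, years = data[:, 0], data[:, 1]

# unique_years is sorted; inverse gives, for each row, the index of
# its year within unique_years
unique_years, inverse = np.unique(years, return_inverse=True)

# bincount sums the weights that share the same index (returns floats)
sums = np.bincount(inverse, weights=values)

# Pair each year with its total, converting back to plain Python ints
totals = dict(zip(unique_years.tolist(), sums.astype(int).tolist()))
print(totals)
```

For small lists the dictionary loop is simpler; the array version mainly pays off when the data is already in NumPy and is large.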

There is a dedicated class for this in the Python standard library, Counter:

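from collections import Counter
from functools import reduce  # reduce is not a builtin in Python 3
from operator import add

# One single-entry Counter per row (year -> value), then add them all up
counters = [Counter({row[1]: row[0]}) for row in data]
result = reduce(add, counters)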

Your result is an object that behaves like a dictionary:

{2013: 12, 2014: 7}

Not sure I understand the question. This may be a simple answer that needs no extra modules.

dic = {}

for dat, year in data:
    if year not in dic:
        dic[year] = dat
    else:
        dic[year] += dat

Or, if you prefer:

dic = {}
for dat, year in data:
    dic[year] = dat if year not in dic else dic[year] + dat

You can use Counter() with +=:

import collections
data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]

c = collections.Counter()

for i, j in data:
    c += collections.Counter({j: i})

print(c)

A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values.

You can add counters together, for example:

a = collections.Counter(a=1, b=2)
b = collections.Counter(a=3, c=3)    
print(a+b)

This prints Counter({'a': 4, 'c': 3, 'b': 2}).
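One caveat worth knowing: adding Counters with + or += keeps only keys whose resulting count is positive. That is fine for the sample data, but if the values can be zero or negative, a year could silently disappear. A minimal sketch that avoids this by updating items directly (the names `totals` and `dropped` are just for illustration):

```python
from collections import Counter

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]

totals = Counter()
for value, year in data:
    # Plain item update: missing keys start at 0, nothing is filtered out
    totals[year] += value

print(totals)

# Counter addition discards results that are zero or negative
dropped = Counter({2014: -3}) + Counter({2014: 1})
print(dropped)
```

With only positive values, both forms give the same totals.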

As DSM pointed out, this is easy with pandas and groupby:

import pandas as pd
data = [[3, 2014], [4, 2014], [6, 2013], [6,2013]]
df = pd.DataFrame(data, columns=['value', 'year'])
df.groupby(['year']).sum()

Which returns:

      value
year       
2013     12
2014      7

This is nice because you can easily get more information, such as the mean, median, standard deviation, and so on:

df.groupby(['year']).mean()
df.groupby(['year']).median() 
df.groupby(['year']).std()
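If several of these statistics are wanted at once, they can be requested in a single call. A small sketch, assuming the same `value` and `year` column names as above; `stats` is just an illustrative name:

```python
import pandas as pd

data = [[3, 2014], [4, 2014], [6, 2013], [6, 2013]]
df = pd.DataFrame(data, columns=['value', 'year'])

# One row per year, one column per requested statistic
stats = df.groupby('year')['value'].agg(['sum', 'mean', 'median', 'std'])
print(stats)

# Totals as a plain dict, if that is all that is needed
totals = df.groupby('year')['value'].sum().to_dict()
```

Selecting the `value` column before aggregating keeps the result to a flat table rather than one with nested column labels.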