Using np.cumsum on a grouped csv file

Question

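I would like to use np.cumsum() on a csv file to compute a running total of one column of data, separately for each of 57 different IDs, which are stored in a separate column. My file looks like this: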

station_id     year           Value
210018         1910            1
210018         1911            6
210018         1912            3
210019         1910            2
210019         1911            4
210019         1912            7

I would like my output to look like this:

station_id     year           Value
210018         1910            1
210018         1911            7
210018         1912            10
210019         1910            2
210019         1911            6
210019         1912            13

I am currently using this code, where my initial file is loaded as df:

df.groupby(['station_id']).apply(lambda x: np.cumsum(['Value']))

Which returns:

TypeError: cannot perform accumulate with flexible type

Any help would be appreciated.

Answer 1

np.cumsum(['Value']), on its own, raises

TypeError: cannot perform accumulate with flexible type

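(np.cumsum expects a numeric array as its first argument, not a list of strings; the lambda passes the literal list ['Value'] rather than the group's Value column.) Instead, use: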

values = df.groupby(['station_id'])['Value'].cumsum()

Or, you could modify df['Value'] directly:

In [75]: df['Value'] = df.groupby(['station_id'])['Value'].cumsum()

In [76]: df
Out[76]: 
   station_id  year  Value
0      210018  1910      1
1      210018  1911      7
2      210018  1912     10
3      210019  1910      2
4      210019  1911      6
5      210019  1912     13
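
For anyone who wants to try this end to end, here is a minimal sketch under a few assumptions: the file is comma-separated with the header shown in the question, an in-memory string stands in for the real csv file, and the column name cumulative is hypothetical (used only so the original Value column stays visible). It also shows one way the original lambda idea could be made to work, by handing np.cumsum the group's numeric values instead of the list ['Value'].

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the real csv file (assumed comma-separated);
# the sample rows are the ones from the question.
csv_text = """station_id,year,Value
210018,1910,1
210018,1911,6
210018,1912,3
210019,1910,2
210019,1911,4
210019,1912,7
"""

df = pd.read_csv(io.StringIO(csv_text))

# Running total of Value within each station_id, aligned to df's index.
# "cumulative" is a hypothetical column name chosen for this sketch.
df["cumulative"] = df.groupby("station_id")["Value"].cumsum()

# Same idea with np.cumsum: pass each group's numeric values,
# not the literal list ['Value'].
via_numpy = df.groupby("station_id")["Value"].transform(
    lambda s: np.cumsum(s.to_numpy())
)

print(df)
```

The built-in groupby cumsum is usually the simpler choice; the transform version is only there to show what np.cumsum needs as input. With a real file, pd.read_csv would take the file path in place of the io.StringIO object.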

python

statistics

numpy

pandas