Pandas: find the cumulated sum of non-NaN records at each timestamp for each column

I have the following dataframe:

              timestamp   col_A     col_B    col_C
0   2016-02-15 00:00:00     2.0     NaN        NaN  
1   2016-02-15 00:01:00     1.0     NaN        NaN
2   2016-02-15 00:02:00     4.0     2.0        NaN  
3   2016-02-15 00:03:00     2.0     2.0        NaN  
4   2016-02-15 00:04:00     7.0     4.1        1.0
5   2016-02-15 00:05:00     2.0     5.0        2.0
6   2016-02-15 00:06:00     2.4     2.0        7.5
7   2016-02-15 00:07:00     2.0     6.3        1.2
8   2016-02-15 00:08:00     2.5     7.0        NaN

For each column, I want the running count (cumulative sum) of non-NaN records at each timestamp. The expected output dataframe is:

              timestamp   col_A     col_B    col_C
0   2016-02-15 00:00:00     1       NaN        NaN  
1   2016-02-15 00:01:00     2       NaN        NaN
2   2016-02-15 00:02:00     3       1          NaN  
3   2016-02-15 00:03:00     4       2          NaN  
4   2016-02-15 00:04:00     5       3          1
5   2016-02-15 00:05:00     6       4          2
6   2016-02-15 00:06:00     7       5          3
7   2016-02-15 00:07:00     8       6          4
8   2016-02-15 00:08:00     9       7          NaN

Right now I iterate over the dataframe and compute the cumsum record by record. Is there a more elegant way to do this? Thanks!
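The question doesn't show the loop it uses, so here is a hypothetical row-by-row version (the helper name `running_count_loop` and the hand-typed sample frame are assumptions). It makes the intended semantics explicit: count the non-NaN values seen so far in each column and leave NaN cells as NaN.

```python
import numpy as np
import pandas as pd

# Sample frame typed in from the question's table (assumed reconstruction).
df = pd.DataFrame({
    "timestamp": pd.date_range("2016-02-15 00:00:00", periods=9, freq="min"),
    "col_A": [2.0, 1.0, 4.0, 2.0, 7.0, 2.0, 2.4, 2.0, 2.5],
    "col_B": [np.nan, np.nan, 2.0, 2.0, 4.1, 5.0, 2.0, 6.3, 7.0],
    "col_C": [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 7.5, 1.2, np.nan],
})


def running_count_loop(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical loop baseline: per-column running count of non-NaN values."""
    out = frame.copy()
    for col in frame.columns.drop("timestamp"):
        seen = 0          # non-NaN values encountered so far in this column
        values = []
        for v in frame[col]:
            if pd.isna(v):
                values.append(np.nan)   # missing record stays missing
            else:
                seen += 1
                values.append(float(seen))
        out[col] = values
    return out


print(running_count_loop(df))
```

This is O(rows × columns) in pure Python, which is what the vectorized answers below avoid.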

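Use notnull + cumsum. Note that np.nan is a float, so any column that ends up containing NaN is upcast to float; that is why col_B and col_C show 1.0, 2.0, ... while col_A does not.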

mask = df.iloc[:, 1:].notnull()
df.iloc[:, 1:] = mask.cumsum()[mask]
df
Out[33]: 
             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN
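A self-contained sketch of the same notnull + cumsum idea, with the sample frame typed in by hand (an assumption). It writes the result with `assign` instead of `iloc` assignment, so each value column is replaced wholesale rather than written into the existing float columns.

```python
import numpy as np
import pandas as pd

# Sample frame typed in from the question's table (assumed reconstruction).
df = pd.DataFrame({
    "timestamp": pd.date_range("2016-02-15 00:00:00", periods=9, freq="min"),
    "col_A": [2.0, 1.0, 4.0, 2.0, 7.0, 2.0, 2.4, 2.0, 2.5],
    "col_B": [np.nan, np.nan, 2.0, 2.0, 4.1, 5.0, 2.0, 6.3, 7.0],
    "col_C": [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 7.5, 1.2, np.nan],
})

value_cols = df.columns[1:]          # every column except 'timestamp'
mask = df[value_cols].notnull()      # True where a record exists

# cumsum over booleans counts the Trues seen so far; indexing with the
# boolean frame (equivalent to .where(mask)) puts NaN back where the
# original cell was NaN.
counts = mask.cumsum()[mask]

result = df.assign(**counts)
print(result)
print(result.dtypes)  # col_B and col_C are float64 because they hold NaN
```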

Inline, with where (returns a new dataframe)

df.assign(**(lambda d: d.cumsum().where(d))(df.drop(columns='timestamp').notna()))

             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN
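The lambda only exists to name the boolean mask inline. A sketch of the same expression split into two steps (sample frame typed in by hand, an assumption) may be easier to read; `drop(columns=...)` is used because pandas 2.0 no longer accepts the axis as a positional argument to `drop`.

```python
import numpy as np
import pandas as pd

# Sample frame typed in from the question's table (assumed reconstruction).
df = pd.DataFrame({
    "timestamp": pd.date_range("2016-02-15 00:00:00", periods=9, freq="min"),
    "col_A": [2.0, 1.0, 4.0, 2.0, 7.0, 2.0, 2.4, 2.0, 2.5],
    "col_B": [np.nan, np.nan, 2.0, 2.0, 4.1, 5.0, 2.0, 6.3, 7.0],
    "col_C": [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 7.5, 1.2, np.nan],
})

d = df.drop(columns="timestamp").notna()   # boolean mask of present records
counts = d.cumsum().where(d)               # running count, NaN where d is False

# ** unpacks the frame column by column, so assign replaces col_A..col_C
# and leaves 'timestamp' untouched; df itself is not modified.
result = df.assign(**counts)
print(result)
```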

Replace in place with update

df.update((lambda d: d.cumsum().where(d))(df.drop(columns='timestamp').notna()))
df

             timestamp  col_A  col_B  col_C
0  2016-02-15 00:00:00      1    NaN    NaN
1  2016-02-15 00:01:00      2    NaN    NaN
2  2016-02-15 00:02:00      3    1.0    NaN
3  2016-02-15 00:03:00      4    2.0    NaN
4  2016-02-15 00:04:00      5    3.0    1.0
5  2016-02-15 00:05:00      6    4.0    2.0
6  2016-02-15 00:06:00      7    5.0    3.0
7  2016-02-15 00:07:00      8    6.0    4.0
8  2016-02-15 00:08:00      9    7.0    NaN
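This works because `DataFrame.update` only copies non-NA values from the other frame: cells where the mask is False are NaN in the counts, so the original cell (already NaN) is kept, and `timestamp` is skipped since it isn't a column of the counts frame. A sketch with the sample frame typed in by hand (an assumption); note that `update` modifies `df` in place and returns None, and the updated columns keep their existing float dtype.

```python
import numpy as np
import pandas as pd

# Sample frame typed in from the question's table (assumed reconstruction).
df = pd.DataFrame({
    "timestamp": pd.date_range("2016-02-15 00:00:00", periods=9, freq="min"),
    "col_A": [2.0, 1.0, 4.0, 2.0, 7.0, 2.0, 2.4, 2.0, 2.5],
    "col_B": [np.nan, np.nan, 2.0, 2.0, 4.1, 5.0, 2.0, 6.3, 7.0],
    "col_C": [np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 7.5, 1.2, np.nan],
})

d = df.drop(columns="timestamp").notna()
# NaN entries in the counts are ignored by update, so NaN cells stay NaN.
df.update(d.cumsum().where(d))
print(df)
```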

Details

d = df.drop(columns='timestamp').notna()
d.cumsum().where(d)

   col_A  col_B  col_C
0      1    NaN    NaN
1      2    NaN    NaN
2      3    1.0    NaN
3      4    2.0    NaN
4      5    3.0    1.0
5      6    4.0    2.0
6      7    5.0    3.0
7      8    6.0    4.0
8      9    7.0    NaN
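To see why both steps are needed, here is the same pipeline on col_C alone (values typed in from the question). Without `where`, rows before the first record would read 0 and the trailing NaN row would repeat the previous count.

```python
import numpy as np
import pandas as pd

col_C = pd.Series([np.nan, np.nan, np.nan, np.nan, 1.0, 2.0, 7.5, 1.2, np.nan],
                  name="col_C")

d = col_C.notna()          # False x4, True x4, False
running = d.cumsum()       # True counts as 1: 0, 0, 0, 0, 1, 2, 3, 4, 4
result = running.where(d)  # keep the count only where a record exists
print(running.tolist())
print(result.tolist())     # NaN, NaN, NaN, NaN, 1.0, 2.0, 3.0, 4.0, NaN
```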