Groupby .cumsum() blank if the summed column is equal to zero?
I have a DataFrame with a .groupby() .cumsum(), where the DataFrame looks like this:
Col_A Col_B Col_C
1 A 0
2 A 1 1
3 A 1 2
4 A 1 3
5 B 0 0
6 B 1 1
7 B 0
8 B 1 2
9 C 1 1
10 C 1 2
11 C 1 3
12 C 0
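Col_C is the cumulative sum of Col_B, computed with df.groupby(['Col_A'])['Col_B'].cumsum(). However, when Col_B == 0, the .cumsum() is blank. How do I record the .cumsum() when Col_B is blank?
The resulting DataFrame should look like: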
Col_A Col_B Col_C
1 A 0 0
2 A 1 1
3 A 1 2
4 A 1 3
5 B 0 0
6 B 1 1
7 B 0 1
8 B 1 2
9 C 1 1
10 C 1 2
11 C 1 3
12 C 0 3
I think you need to filter first, with boolean indexing or query:
df['Col_C'] = df[df['Col_B'] != 0].groupby(['Col_A'])['Col_B'].cumsum()
print (df)
Col_A Col_B Col_C
1 A 0 NaN
2 A 1 1.0
3 A 1 2.0
4 A 1 3.0
5 B 0 NaN
6 B 1 1.0
7 B 0 NaN
8 B 1 2.0
9 C 1 1.0
10 C 1 2.0
11 C 1 3.0
12 C 0 NaN
Or:
df['Col_C'] = df.query('Col_B != 0').groupby(['Col_A'])['Col_B'].cumsum()
print (df)
Col_A Col_B Col_C
1 A 0 NaN
2 A 1 1.0
3 A 1 2.0
4 A 1 3.0
5 B 0 NaN
6 B 1 1.0
7 B 0 NaN
8 B 1 2.0
9 C 1 1.0
10 C 1 2.0
11 C 1 3.0
12 C 0 NaN
Finally, replace the NaNs by forward filling with ffill (fillna with method='ffill'). Values that have nothing before them to fill from are still NaN, so replace those with fillna(0), and last convert the column to int:
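# Forward-fill within each Col_A group so a total does not leak into the next group
df['Col_C'] = df.groupby('Col_A')['Col_C'].ffill().fillna(0).astype(int)
print (df)
Col_A Col_B Col_C
1 A 0 0
2 A 1 1
3 A 1 2
4 A 1 3
5 B 0 0
6 B 1 1
7 B 0 1
8 B 1 2
9 C 1 1
10 C 1 2
11 C 1 3
12 C 0 3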
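For reference, the steps above can be put together as a self-contained sketch. It assumes Col_B holds integer 0/1 values and that the index runs from 1 to 12 as in the question. The forward fill is done per Col_A group here, so that a group starting with 0 gets 0 rather than the previous group's last total.

```python
import pandas as pd

# Rebuild the example frame from the question (index 1..12 is an assumption).
df = pd.DataFrame(
    {
        "Col_A": list("AAAABBBBCCCC"),
        "Col_B": [0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0],
    },
    index=range(1, 13),
)

# Step 1: cumulative sum over the non-zero rows only.
# Rows that were filtered out are missing from the result,
# so they become NaN when the result is assigned back by index.
nonzero = df[df["Col_B"] != 0]
df["Col_C"] = nonzero.groupby("Col_A")["Col_B"].cumsum()

# Step 2: forward-fill inside each group, fill leading gaps with 0,
# and convert the float column back to int.
df["Col_C"] = df.groupby("Col_A")["Col_C"].ffill().fillna(0).astype(int)

print(df)
```

With this input, row 5 (the first row of group B) ends up as 0, which matches the result asked for in the question.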
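A column being 0 is different from a column being entirely blank.
If you have NA in the column, the .cumsum() of that column should actually be NA (or 'blank', as you put it).
You can check whether the whole column is NA and set the value accordingly.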
DataFrame.cumsum(axis=None, skipna=True, *args, **kwargs)
Return cumulative sum over requested axis.
skipna : boolean, default True
Exclude NA/null values. If an entire row/column is NA, the result will be NA
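The difference between a zero and a missing value can be checked with a small sketch. The two series below are made up for illustration and are not taken from the question's data: one uses real zeros, the other uses NaN in the same positions.

```python
import numpy as np
import pandas as pd

# Hypothetical inputs: the same pattern, once with zeros and once with NaN.
zeros = pd.Series([0, 1, 0, 1])
blanks = pd.Series([np.nan, 1, np.nan, 1])

# With real zeros the running total is simply carried along.
zeros_cumsum = zeros.cumsum()

# With the default skipna=True, NaN positions stay NaN in the output,
# while the running total continues on the non-missing values.
blanks_cumsum = blanks.cumsum()

print(zeros_cumsum.tolist())
print(blanks_cumsum.tolist())
```

So if Col_C shows up blank, it is worth checking whether Col_B really contains 0 or contains missing values in those rows.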