How to retain blanks in group by in pandas?
I need to group my DataFrame in pandas, but when I do, the null values are converted to zero. I want to keep the nulls, and I am not sure how to do that in pandas.
Input:
Id Country Product sales qty price
1 Germany shoes 32 1 NaN
1 Germany shoes 32 1 2
2 England Shoes 22 1 NaN
2 England Shoes 22 1 NaN
3 Austria Shoes 0 3 NaN
3 Austria Shoes NaN NaN NaN
Expected output:
Id Country Product sales qty price
1 Germany shoes 64 2 2
2 England Shoes 44 2 NaN
3 Austria Shoes 0 3 NaN
Use the parameter min_count=1 in sum, so that a group whose values are all NaN sums to NaN instead of 0:
# min_count=1: a group's sum is NaN unless it has at least one non-NaN value
df = df.groupby(['Id','Country','Product'], as_index=False).sum(min_count=1)
print(df)
Id Country Product sales qty price
0 1 Germany shoes 64.0 2.0 2.0
1 2 England Shoes 44.0 2.0 NaN
2 3 Austria Shoes 0.0 3.0 NaN
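To make the difference visible, here is a minimal self-contained sketch that rebuilds the question's input and compares the default sum with sum(min_count=1). The variable names (keys, default_sum, kept_nan) are only illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the input table from the question.
df = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 3],
    "Country": ["Germany", "Germany", "England", "England", "Austria", "Austria"],
    "Product": ["shoes", "shoes", "Shoes", "Shoes", "Shoes", "Shoes"],
    "sales": [32, 32, 22, 22, 0, np.nan],
    "qty": [1, 1, 1, 1, 3, np.nan],
    "price": [np.nan, 2, np.nan, np.nan, np.nan, np.nan],
})

keys = ["Id", "Country", "Product"]  # illustrative name for the grouping columns

# Default behaviour: NaN is skipped, so an all-NaN group sums to 0.
default_sum = df.groupby(keys, as_index=False).sum()

# With min_count=1, a group needs at least one non-NaN value to get a number.
kept_nan = df.groupby(keys, as_index=False).sum(min_count=1)

print(default_sum)
print(kept_nan)
```

In this sketch only the price column differs between the two results: the groups for Id 2 and Id 3 contain no price at all, so they stay NaN with min_count=1 and become 0 without it.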
You can also mask the summed result, using isna + groupby + all to find the groups in which every value is missing:
out = (df.groupby(['Id','Country','Product']).sum()
         .mask(df[['sales','qty','price']].isna()
                 .groupby([df['Id'], df['Country'], df['Product']]).all())
         .reset_index())
The same idea, written differently:
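cols = ['Id','Country','Product']
g = df.groupby(cols)
# errors='ignore' keeps this working whether or not apply passes the key columns to the lambda
out = (g.sum()
        .mask(g.apply(lambda x: x.drop(columns=cols, errors='ignore').isna().all()))
        .reset_index())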
Output:
Id Country Product sales qty price
0 1 Germany shoes 64.0 2.0 2.0
1 2 England Shoes 44.0 2.0 NaN
2 3 Austria Shoes 0.0 3.0 NaN
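As a rough check that the masking approach and min_count=1 agree on this data, the sketch below runs both on the question's input. The names value_cols, all_missing, masked and via_min_count are assumptions made for illustration, not part of the original answers.

```python
import numpy as np
import pandas as pd

# Rebuild the input table from the question.
df = pd.DataFrame({
    "Id": [1, 1, 2, 2, 3, 3],
    "Country": ["Germany", "Germany", "England", "England", "Austria", "Austria"],
    "Product": ["shoes", "shoes", "Shoes", "Shoes", "Shoes", "Shoes"],
    "sales": [32, 32, 22, 22, 0, np.nan],
    "qty": [1, 1, 1, 1, 3, np.nan],
    "price": [np.nan, 2, np.nan, np.nan, np.nan, np.nan],
})

keys = ["Id", "Country", "Product"]
value_cols = ["sales", "qty", "price"]

# Plain sums: all-NaN groups come out as 0 here.
sums = df.groupby(keys)[value_cols].sum()

# True where every value of a column is missing within a group.
all_missing = df[value_cols].isna().groupby([df[k] for k in keys]).all()

# Put NaN back wherever the whole group was missing.
masked = sums.mask(all_missing).reset_index()

# The shorter route from the first answer, for comparison.
via_min_count = df.groupby(keys, as_index=False)[value_cols].sum(min_count=1)

print(masked)
print(via_min_count)
```

Both frames should show NaN for price in the Id 2 and Id 3 rows. The masking version is longer, but it makes the rule explicit: a cell is blanked only when every input value in that group and column was missing.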