Summarizing count and value totals from a DataFrame column based on conditions
I have a dataframe containing (say) invoice price adjustment values for different products.
For example:
import pandas as pd

df = pd.DataFrame({'col1': ['A','A','B','B','C','C','A','A','A','A'],
                   'sum': [10,-10,10,10,0,-10,-10,0,10,0]})
I need to create a summary table like the one below for col1 = 'A':
          Count  Value
Positive
Negative
NIL
So far I am doing it like this:
result = pd.DataFrame(columns=['Count','Value'],index=['Positive','Negative','NIL'])
# Note: comparisons need ==, a single = inside the mask is a syntax error
result.iloc[0,0] = df[(df['col1'] == 'A') & (df['sum'] > 0)]['sum'].count()
result.iloc[0,1] = df[(df['col1'] == 'A') & (df['sum'] > 0)]['sum'].sum()
result.iloc[1,0] = df[(df['col1'] == 'A') & (df['sum'] < 0)]['sum'].count()
result.iloc[1,1] = df[(df['col1'] == 'A') & (df['sum'] < 0)]['sum'].sum()
result.iloc[2,0] = df[(df['col1'] == 'A') & (df['sum'] == 0)]['sum'].count()
result.iloc[2,1] = df[(df['col1'] == 'A') & (df['sum'] == 0)]['sum'].sum()
Is there a better and faster way than writing one line of code for every value in the summary table? I can't think of anything here.
First use np.sign with Series.map to put a sign label in a new column, then filter only the A rows and aggregate the count and the sum in GroupBy.agg using named aggregation:
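import numpy as np

# np.sign returns -1, 0 or 1; map turns that into a label
df['new'] = np.sign(df['sum']).map({0:'NIL', 1:'POS', -1:'NEG'})
# keep only product A, then count rows and total 'sum' per label
df1 = df[df['col1'].eq('A')].groupby('new').agg(Count=('new','size'), Val=('sum','sum'))
print(df1)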
     Count  Val
new
NEG      2  -20
NIL      2    0
POS      2   20
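If the summary has to come out in exactly the layout from the question (rows Positive, Negative, NIL in that order, with zeros for a sign that never occurs), the same idea can be wrapped in a small helper. This is only a sketch: the function name summarize and the temporary column name sign are made up for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A','A','B','B','C','C','A','A','A','A'],
                   'sum': [10,-10,10,10,0,-10,-10,0,10,0]})

def summarize(frame, product):
    """Count and total of 'sum' per sign for one value of 'col1' (hypothetical helper)."""
    subset = frame[frame['col1'].eq(product)]
    # label every row by the sign of its adjustment value
    labels = np.sign(subset['sum']).map({1: 'Positive', -1: 'Negative', 0: 'NIL'})
    out = (subset.assign(sign=labels)
                 .groupby('sign')
                 .agg(Count=('sum', 'size'), Value=('sum', 'sum')))
    # fix the row order and fill in signs that do not occur for this product
    return out.reindex(['Positive', 'Negative', 'NIL'], fill_value=0)

result_a = summarize(df, 'A')
result_b = summarize(df, 'B')
print(result_a)
```

The reindex step is what keeps the table shape stable: product B has only positive values, so without it the Negative and NIL rows would simply be missing.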
If you need the summary for every value of col1, add that column to the groupby:
df2 = df.groupby(['col1', 'new']).agg(Count=('new','size'), Val=('sum','sum'))
print(df2)
          Count  Val
col1 new
A    NEG      2  -20
     NIL      2    0
     POS      2   20
B    POS      2   20
C    NEG      1  -10
     NIL      1    0
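The grouped result above only lists the combinations that actually occur, so B has no NEG or NIL row. If one row per product is easier to read, a possible follow-up (not part of the original answer) is to unstack the sign level and fill the gaps with 0. The variable name wide is just an illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['A','A','B','B','C','C','A','A','A','A'],
                   'sum': [10,-10,10,10,0,-10,-10,0,10,0]})

# same steps as in the answer: label by sign, then group by product and label
df['new'] = np.sign(df['sum']).map({0: 'NIL', 1: 'POS', -1: 'NEG'})
df2 = df.groupby(['col1', 'new']).agg(Count=('new', 'size'), Val=('sum', 'sum'))

# move the 'new' level into the columns; missing combinations become 0
wide = df2.unstack('new', fill_value=0)
print(wide)
```

The columns of wide are a two-level index, so a single cell is addressed with a tuple, for example wide.loc['B', ('Count', 'NEG')].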