What is the Python equivalent of SUMIFS for the following problem?

Here is the DataFrame:

import pandas as pd
df = pd.DataFrame({'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc', 'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
                   'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2, 200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
                   'Date': ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028', '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065']
})

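I want to sum the income for each identifier, but only for rows whose date falls before 1 January 2030. To be clear, I can already get the result I want in Excel using SUMIFS.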

I assume this can be done with the groupby function, but I am not sure how to add the date condition.

First filter the rows whose date is before 1 January 2030, then group and sum:

import pandas as pd
import datetime

df = pd.DataFrame({'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc', 'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
                   'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2, 200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
                   'Date': ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028', '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065']
})

# convert the day-first strings (dd/mm/yyyy) to datetime
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# cutoff for the filter
limit = datetime.datetime(year=2030, month=1, day=1)

# keep only the rows before the cutoff, then sum income per identifier
df.loc[df.Date < limit].groupby('IDENTIFIER')[['income']].sum()

Output:

                 income
IDENTIFIER             
A_xcxcxc   -43359462.10
BA_bcbcbc  -74072668.95
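
One detail worth checking before filtering is how the date strings are parsed. The sample dates look day-first (dd/mm/yyyy), so the sketch below assumes that layout and passes an explicit format; `raw` and `parsed` are illustrative names, not part of the question.

```python
import pandas as pd

# Three of the day-first strings from the question's sample data.
raw = pd.Series(['03/03/2031', '22/01/2060', '05/04/2024'])

# An explicit format avoids any month-first guessing by the parser.
parsed = pd.to_datetime(raw, format='%d/%m/%Y')

# '05/04/2024' should come out as 5 April, not 4 May.
print(parsed.dt.strftime('%Y-%m-%d').tolist())
```

With this data the year alone decides the filter, but an explicit format also keeps values such as '22/01/2060' from being rejected as an invalid month.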

If you also want the keys that have no matching rows:

>>> (df.groupby('IDENTIFIER')
     .apply(lambda x: x.loc[
                      pd.to_datetime(x.Date, dayfirst=True).lt('2030-01-01'),
                      'income'
                  ].sum(min_count=1))
     .fillna('-'))

IDENTIFIER
A_xcxcxc    -43359462.10
BA_bcbcbc   -74072668.95
C_rgrg                 -
D_wewerw               -

Without apply:

>>> ( df['income']
        .where(pd.to_datetime(df.Date, dayfirst=True).lt('2030-01-01'))
        .groupby(df['IDENTIFIER']).sum(min_count=1).fillna('-') )

IDENTIFIER
A_xcxcxc    -43359462.10
BA_bcbcbc   -74072668.95
C_rgrg                 -
D_wewerw               -
Name: income, dtype: object

Note: if you want np.nan instead of -, remove the trailing fillna('-').
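
Excel's SUMIFS returns 0 for an identifier with no matching rows. A minimal sketch of that variant, assuming the dates are day-first (dd/mm/yyyy): leaving out `min_count=1` lets `sum()` return 0.0 for groups whose values were all masked out. The names `before_2030` and `totals` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc',
                   'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
    'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2,
               200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
    'Date': ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028',
             '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065'],
})

# Boolean mask: True where the (day-first) date is before the cutoff.
before_2030 = pd.to_datetime(df['Date'], format='%d/%m/%Y') < pd.Timestamp('2030-01-01')

# Non-matching incomes become NaN; sum() skips NaN and, with the default
# min_count=0, yields 0.0 for groups that have no matching rows at all.
totals = df['income'].where(before_2030).groupby(df['IDENTIFIER']).sum()
print(totals)
```

Here `C_rgrg` and `D_wewerw` stay in the result with a total of 0.0, and the Series keeps its float dtype.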

Otherwise, if you only want the matching groups:

>>> df.groupby(df.loc[
         pd.to_datetime(df.Date, dayfirst=True).lt('2030-01-01'),
         'IDENTIFIER'
    ])['income'].sum()

IDENTIFIER
A_xcxcxc    -43359462.10
BA_bcbcbc   -74072668.95
Name: income, dtype: float64
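
SUMIFS usually takes several criteria at once. The same pattern extends to that case by combining boolean masks with `&`. The sketch below is only an illustration: the lower bound of 1 January 2025 is a hypothetical second criterion that does not appear in the question, and the dates are assumed to be day-first.

```python
import pandas as pd

df = pd.DataFrame({
    'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc',
                   'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
    'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2,
               200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
    'Date': ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028',
             '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065'],
})

dates = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Hypothetical window: on or after 2025-01-01 and before 2030-01-01.
start = pd.Timestamp('2025-01-01')
end = pd.Timestamp('2030-01-01')

# Each criterion is its own mask; parentheses are needed around each comparison.
mask = (dates >= start) & (dates < end)

windowed = df.loc[mask].groupby('IDENTIFIER')['income'].sum()
print(windowed)
```

With this window the 05/04/2024 row drops out, so `BA_bcbcbc` sums only its 2028 row while `A_xcxcxc` is unchanged.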