What is the Python equivalent of SUMIFS for the given problem?
Here is the dataframe:
import pandas as pd
df = pd.DataFrame({'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc', 'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2, 200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
'Date' : ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028', '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065']
})
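I want to sum the income for each identifier, but only for rows whose date falls before 1 January 2030. To clarify, I can already get this result in Excel with SUMIFS.
I assume it can be done with the groupby function, but I'm not sure how to add the date-related condition.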
First filter the rows whose date is before 1 January 2030, then group and sum:
import pandas as pd
import datetime

df = pd.DataFrame({'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc', 'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
                   'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2, 200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
                   'Date': ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028', '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065']
                   })

# convert string column to datetime; the dates are day-first (dd/mm/yyyy)
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# limit for the filter
limit = datetime.datetime(year=2030, month=1, day=1)

# do the operation - df.loc[df.Date < limit] is the filter;
# select 'income' so that the Date column is not summed as well
df.loc[df.Date < limit].groupby('IDENTIFIER')[['income']].sum()
Output:
                 income
IDENTIFIER
A_xcxcxc   -43359462.10
BA_bcbcbc  -74072668.95
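SUMIFS usually takes several criteria at once, and the same filter-then-group pattern extends to that by combining boolean masks with `&`. The following is a minimal sketch, not part of the original question: the lower bound of 1 January 2025 is a hypothetical second criterion chosen only for illustration, and the names `dates`, `mask` and `result` are my own.

```python
import pandas as pd

df = pd.DataFrame({
    'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc',
                   'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
    'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2,
               200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
    'Date': ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028',
             '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065'],
})

# Parse the day-first strings explicitly so that no format is guessed.
dates = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Each comparison gives a boolean Series; `&` keeps rows meeting both criteria.
# The 2025 lower bound is an assumed extra criterion, not from the question.
mask = (dates >= pd.Timestamp('2025-01-01')) & (dates < pd.Timestamp('2030-01-01'))

result = df.loc[mask].groupby('IDENTIFIER')['income'].sum()
print(result)
```

Each comparison needs its own parentheses because `&` binds more tightly than `<` and `>=` in Python.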
If you also want to keep the keys that have no matching rows:
>>> (df.groupby('IDENTIFIER')
...    .apply(lambda x: x.loc[
...        pd.to_datetime(x.Date, dayfirst=True).lt('2030-01-01'),
...        'income'
...    ].sum(min_count=1))
...    .fillna('-'))
IDENTIFIER
A_xcxcxc     -43359462.10
BA_bcbcbc    -74072668.95
C_rgrg                  -
D_wewerw                -
Without apply:
>>> (df['income']
...    .where(pd.to_datetime(df.Date, dayfirst=True).lt('2030-01-01'))
...    .groupby(df['IDENTIFIER']).sum(min_count=1).fillna('-'))
IDENTIFIER
A_xcxcxc     -43359462.10
BA_bcbcbc    -74072668.95
C_rgrg                  -
D_wewerw                -
Name: income, dtype: object
Note: if you want np.nan instead of '-', remove the trailing fillna('-').
Otherwise, if you only want the matching groups:
>>> df.groupby(df.loc[
...     pd.to_datetime(df.Date, dayfirst=True).lt('2030-01-01'),
...     'IDENTIFIER'
... ])['income'].sum()
IDENTIFIER
A_xcxcxc    -43359462.10
BA_bcbcbc   -74072668.95
Name: income, dtype: float64
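One more variant, in case the goal is to mirror Excel exactly: SUMIFS reports 0 for an identifier with no matching rows, rather than a blank or '-'. A minimal sketch of that behaviour, assuming the same data as above (the names `dates` and `totals` are my own): mask the non-matching incomes with where, then rely on the default `min_count=0` of the grouped sum, which returns 0 for a group whose values are all NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'IDENTIFIER': ['A_xcxcxc', 'BA_bcbcbc', 'A_xcxcxc', 'A_xcxcxc', 'BA_bcbcbc',
                   'C_rgrg', 'BA_bcbcbc', 'D_wewerw', 'A_xcxcxc', 'A_xcxcxc'],
    'income': [-30362100.0, 200000.0, -21248077.5, 150000.0, -33843389.2,
               200000.0, -40229279.75, 250000.0, -22111384.6, 200000.0],
    'Date': ['03/03/2031', '22/01/2060', '04/03/2025', '22/07/2032', '08/03/2028',
             '22/11/2065', '05/04/2024', '22/03/2032', '15/10/2025', '22/07/2065'],
})

# Parse the day-first strings explicitly.
dates = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Incomes dated on or after the limit become NaN; NaN is skipped by sum(),
# and with the default min_count=0 an all-NaN group sums to 0.
totals = (df['income']
          .where(dates < pd.Timestamp('2030-01-01'))
          .groupby(df['IDENTIFIER'])
          .sum())
print(totals)
```

This keeps the result numeric (float64), which is convenient if the totals feed into further calculations.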