How to create a cumulative counter for sales in a financial year?

Question

My df looks like this:

Policy_No   Date
1           10/1/2020
2           20/2/2020
3           20/2/2020
4           23/3/2020
5           18/4/2020
6           30/4/2020
7           30/4/2020

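I want to create a cumulative counter of the policies recorded on each date, where the count runs within a financial year (April to March):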

Date         Cumulative count of policies
10/1/2020    1
20/2/2020    3
23/3/2020    4
18/4/2020    1
30/4/2020    3

18/4/2020 falls in a new financial year, so the counter resets. Can anyone help with this?
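Under the April-to-March rule, a date belongs to the financial year that starts in its own calendar year if it falls in April or later, and to the one that started the previous year otherwise. A minimal sketch of that rule; financial_year_start is a hypothetical helper name, not part of pandas:

```python
from datetime import date


def financial_year_start(d: date, start_month: int = 4) -> int:
    """Return the calendar year in which the financial year containing d begins.

    Hypothetical helper: with start_month=4, April to March counts as one year.
    """
    return d.year if d.month >= start_month else d.year - 1


# 10/1/2020 belongs to the year that began in April 2019;
# 18/4/2020 opens the year that begins in April 2020, where the count resets.
print(financial_year_start(date(2020, 1, 10)))  # 2019
print(financial_year_start(date(2020, 4, 18)))  # 2020
```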

Answer 1

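There is a method called cumsum that does this. Parse the dates first, so that the groups are sorted chronologically rather than as strings: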

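import pandas as pd

df = pd.DataFrame({"Policy_No": [1, 2, 3, 4, 5, 6, 7],
                   "Date": ["10/1/2020", "20/2/2020", "20/2/2020", "23/3/2020",
                            "18/4/2020", "30/4/2020", "30/4/2020"]})

# Parse the day-first strings; left as strings, groupby would sort them
# alphabetically ("10/1/2020", "18/4/2020", "20/2/2020", ...) and the running total would be wrong
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Number of policies per date, then a running total over the dates
df.groupby("Date")["Policy_No"].count().cumsum()

#Date
#2020-01-10    1
#2020-02-20    3
#2020-03-23    4
#2020-04-18    5
#2020-04-30    7
#Name: Policy_No, dtype: int64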

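This total does not restart in April. If you want it per financial year, I think you need to create a dataframe for each financial year, apply the logic above to each one, and concatenate them at the end: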

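df = ...  # the dataframe above, with "Date" parsed by pd.to_datetime(..., format="%d/%m/%Y")

# Financial year April 2019 to March 2020. Use ISO strings: "01/04/2020" would be read as 4 January.
# Combine the two comparisons with &; a chained start <= s < end does not work on a Series.
fy_2019 = (df["Date"] >= pd.Timestamp("2019-04-01")) & (df["Date"] < pd.Timestamp("2020-04-01"))
df_2019 = df.loc[fy_2019].groupby("Date")["Policy_No"].count().cumsum()

# Financial year April 2020 to March 2021
fy_2020 = (df["Date"] >= pd.Timestamp("2020-04-01")) & (df["Date"] < pd.Timestamp("2021-04-01"))
df_2020 = df.loc[fy_2020].groupby("Date")["Policy_No"].count().cumsum()

# concatenate at the end
df_total = pd.concat((df_2019, df_2020))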
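Each pair of comparisons above describes a half-open interval [start, end). As a sketch, the same mask can be written with Series.between, whose inclusive="left" option (available since pandas 1.3) keeps the start date and excludes the end date:

```python
import pandas as pd

df = pd.DataFrame({"Policy_No": [1, 2, 3, 4, 5, 6, 7],
                   "Date": pd.to_datetime(["10/1/2020", "20/2/2020", "20/2/2020", "23/3/2020",
                                           "18/4/2020", "30/4/2020", "30/4/2020"],
                                          format="%d/%m/%Y")})

# Half-open interval [1 April 2020, 1 April 2021): start included, end excluded
in_fy_2020 = df["Date"].between(pd.Timestamp("2020-04-01"), pd.Timestamp("2021-04-01"),
                                inclusive="left")
counts_2020 = df.loc[in_fy_2020].groupby("Date")["Policy_No"].count().cumsum()
print(counts_2020)
```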

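Of course, if writing out each year by hand is impractical (because there are too many years), you can put the logic in a loop, for example: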

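def get_financial_dates(dates, start_month=4):
    """
    Yield (start, end) Timestamps for each financial year
    (April to March by default) that covers the given dates.
    """
    first = dates.min()
    year = first.year if first.month >= start_month else first.year - 1
    while pd.Timestamp(year, start_month, 1) <= dates.max():
        yield pd.Timestamp(year, start_month, 1), pd.Timestamp(year + 1, start_month, 1)
        year += 1


parts = []  # one cumulative-count Series per financial year

for date_start, date_end in get_financial_dates(df["Date"]):
    idx = (df["Date"] >= date_start) & (df["Date"] < date_end)
    parts.append(df.loc[idx].groupby("Date")["Policy_No"].count().cumsum())

# concatenate once at the end instead of growing a DataFrame inside the loop
df_total = pd.concat(parts)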
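A different technique from the per-year split used in this answer avoids building separate frames: label every row with its financial year, then let groupby restart the cumulative sum for each label. A minimal sketch, assuming the Date strings are day/month/year; the FY column name is a hypothetical choice:

```python
import pandas as pd

df = pd.DataFrame({"Policy_No": [1, 2, 3, 4, 5, 6, 7],
                   "Date": ["10/1/2020", "20/2/2020", "20/2/2020", "23/3/2020",
                            "18/4/2020", "30/4/2020", "30/4/2020"]})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# FY = calendar year in which the April-to-March financial year starts
year = df["Date"].dt.year
df["FY"] = year.where(df["Date"].dt.month >= 4, year - 1)

# Policies per (FY, Date), then a running total that restarts for each FY
daily = df.groupby(["FY", "Date"])["Policy_No"].count()
result = daily.groupby(level="FY").cumsum().droplevel("FY")
print(result)
```

Because groupby sorts by FY and then by Date, the running totals come out in chronological order and match the expected output in the question.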
