How to set each first unique multi-index to 0 and calculate values for the others

Using the sample data below, I built the following DataFrame:

import pandas as pd

day = [1, 2, 3, 2, 3, 1, 2]
item_id = [1, 1, 1, 2, 2, 3, 3]
item_name = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
increase = [4, 0, 4, 3, 3, 3, 3]
decrease = [2, 2, 2, 1, 1, 1, 1]
my_df = pd.DataFrame(list(zip(day, item_id, item_name, increase, decrease)),
                     columns=['day', 'item_id', 'item_name', 'increase', 'decrease'])
my_df = my_df.set_index(['item_id', 'item_name'])

I'd like to create two new columns, following these three rules:

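  1. starting_quantity[0]: for the first row of each index (or multi-index) value, the starting quantity is set to 0.
  2. ending_quantity: the starting quantity plus increase minus decrease.
  3. starting_quantity[1,2,3,...]: equal to the previous day's ending_quantity.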
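A literal, row-by-row reading of the three rules can be sketched in plain Python as a reference to check any vectorized solution against. This is a minimal sketch, assuming that the rows for each (item_id, item_name) key are contiguous and already in day order; the helper name `add_quantities_loop` is hypothetical, and the sample frame is rebuilt from a dict with the same data as the question.

```python
import pandas as pd

# Same sample data as in the question, built from a dict for brevity
my_df = pd.DataFrame({
    "day": [1, 2, 3, 2, 3, 1, 2],
    "item_id": [1, 1, 1, 2, 2, 3, 3],
    "item_name": ["A", "A", "A", "B", "B", "C", "C"],
    "increase": [4, 0, 4, 3, 3, 3, 3],
    "decrease": [2, 2, 2, 1, 1, 1, 1],
}).set_index(["item_id", "item_name"])


def add_quantities_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the three rules row by row.

    Assumes rows with the same index key are contiguous and sorted by day.
    """
    out = df.copy()
    starts, ends = [], []
    prev_key, prev_end = None, 0
    for key, row in zip(out.index, out.itertuples(index=False)):
        # Rule 1: the first row of a new key starts at 0.
        # Rule 3: later rows start at the previous row's ending quantity.
        start = 0 if key != prev_key else prev_end
        # Rule 2: ending = starting + increase - decrease
        end = start + row.increase - row.decrease
        starts.append(start)
        ends.append(end)
        prev_key, prev_end = key, end
    out["starting_quantity"] = starts
    out["ending_quantity"] = ends
    return out


print(add_quantities_loop(my_df))
```

The loop is easy to audit against the rules but scales poorly; it is meant as a correctness reference rather than the recommended approach.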

The output I'd like to get is as follows:

I'd really appreciate help with any or all of the 3 steps above!

Try:

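# my_df already has the (item_id, item_name) MultiIndex from the question,
# so it is not set again here (that would raise a KeyError).
my_df["tmp"] = my_df["increase"] - my_df["decrease"]  # net change per row

# group on the first index level (item_id), after "tmp" exists
g = my_df.groupby(level=0)

# starting quantity: sum of the net changes of all earlier rows in the group (0 for the first row)
my_df["starting_quantity"] = g["tmp"].shift().fillna(0)
my_df["starting_quantity"] = my_df.groupby(level=0)["starting_quantity"].cumsum().astype(int)

# ending quantity: running total of the net change within the group
my_df["ending_quantity"] = g["tmp"].cumsum()
my_df = my_df.drop(columns="tmp")

print(my_df)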

Prints:

                   day  increase  decrease  starting_quantity  ending_quantity
item_id item_name                                                             
1       A            1         4         2                  0                2
        A            2         0         2                  2                0
        A            3         4         2                  0                2
2       B            2         3         1                  0                2
        B            3         3         1                  2                4
3       C            1         3         1                  0                2
        C            2         3         1                  2                4
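Because ending_quantity is just the running total of increase - decrease within each item, and starting_quantity is that total before the current row's change, the same numbers can be sketched without the shift/fillna/cast steps. This is a hedged alternative, not the answer's code: it groups on both index levels (item_id and item_name) rather than only level 0, sorts each item's rows by day first so the running total follows the days, and uses the hypothetical helper name `add_quantities`.

```python
import pandas as pd

# Same sample data as in the question
my_df = pd.DataFrame({
    "day": [1, 2, 3, 2, 3, 1, 2],
    "item_id": [1, 1, 1, 2, 2, 3, 3],
    "item_name": ["A", "A", "A", "B", "B", "C", "C"],
    "increase": [4, 0, 4, 3, 3, 3, 3],
    "decrease": [2, 2, 2, 1, 1, 1, 1],
}).set_index(["item_id", "item_name"])


def add_quantities(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add starting/ending quantity per (item_id, item_name)."""
    # Put each item's rows in day order so the running total follows the days
    out = (
        df.reset_index()
          .sort_values(["item_id", "item_name", "day"])
          .set_index(["item_id", "item_name"])
    )
    net = out["increase"] - out["decrease"]
    # Running total of the net change within each (item_id, item_name) group
    ending = net.groupby(level=["item_id", "item_name"]).cumsum()
    # Quantity before this row's change; equals 0 for each group's first row
    out["starting_quantity"] = ending - net
    out["ending_quantity"] = ending
    return out


print(add_quantities(my_df))
```

Deriving starting_quantity as ending_quantity minus the row's own net change makes the first row of each group 0 automatically, and both columns stay integer-typed, so no astype(int) is needed.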