How to set each first unique multi-index to 0 and calculate values for the others
I built the following DataFrame from this sample data:
import pandas as pd

day = [1, 2, 3, 2, 3, 1, 2]
item_id = [1, 1, 1, 2, 2, 3, 3]
item_name = ['A', 'A', 'A', 'B', 'B', 'C', 'C']
increase = [4, 0, 4, 3, 3, 3, 3]
decrease = [2, 2, 2, 1, 1, 1, 1]
my_df = pd.DataFrame(list(zip(day, item_id, item_name, increase, decrease)),
                     columns=['day', 'item_id', 'item_name', 'increase', 'decrease'])
my_df = my_df.set_index(['item_id', 'item_name'])
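I want to create two new columns:
- starting_quantity[0] is set to 0 for the first row of each index (or multi-index) value
- ending_quantity is the starting quantity plus increase minus decrease
- starting_quantity[1, 2, 3, ...] equals the previous day's ending_quantity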
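Read together, the three rules describe a running balance per item. A minimal sketch in plain Python, assuming the rows of one item are already in day order and that ending_quantity = starting_quantity + increase - decrease (`running_balance` is a hypothetical helper name used only for illustration):

```python
def running_balance(increases, decreases):
    """Return (starting, ending) lists for one item, rows in day order."""
    starting, ending = [], []
    balance = 0  # rule 1: the first day of each item starts at 0
    for inc, dec in zip(increases, decreases):
        starting.append(balance)       # rule 3: carry over the previous ending
        balance = balance + inc - dec  # rule 2: add increase, subtract decrease
        ending.append(balance)
    return starting, ending


# Item A from the sample data: increase = [4, 0, 4], decrease = [2, 2, 2]
start_a, end_a = running_balance([4, 0, 4], [2, 2, 2])
print(start_a, end_a)
```

For item A this gives starting quantities 0, 2, 0 and ending quantities 2, 0, 2.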
The output I would like to create is as follows:
Any help with any or all of the 3 steps above would be greatly appreciated!
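Try:
# my_df is already indexed by (item_id, item_name) from the setup above,
# so set_index must not be called a second time (it would raise a KeyError).
g = my_df.groupby(level=0)  # one group per item_id
# Net change per day.
my_df["tmp"] = my_df["increase"] - my_df["decrease"]
# Shift the net change down one row within each item; each item's first row becomes 0.
my_df["starting_quantity"] = g["tmp"].shift().fillna(0)
# Running total of the shifted changes gives each day's starting quantity.
my_df["starting_quantity"] = g["starting_quantity"].cumsum().astype(int)
# Running total of the unshifted changes gives each day's ending quantity.
my_df["ending_quantity"] = g["tmp"].cumsum()
my_df = my_df.drop(columns="tmp")
print(my_df)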
Prints:
                   day  increase  decrease  starting_quantity  ending_quantity
item_id item_name
1       A            1         4         2                  0                2
        A            2         0         2                  2                0
        A            3         4         2                  0                2
2       B            2         3         1                  0                2
        B            3         3         1                  2                4
3       C            1         3         1                  0                2
        C            2         3         1                  2                4
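The approach above relies on the rows already being in day order within each item. Here is a minimal sketch of an equivalent formulation for the case where that order is not guaranteed. It sorts by item_id and day, takes the per-item running total of increase - decrease as ending_quantity, and subtracts the day's own net change to recover starting_quantity. The names `out` and `net` are illustrative and not part of the original answer.

```python
import pandas as pd

my_df = pd.DataFrame(
    {
        "day": [1, 2, 3, 2, 3, 1, 2],
        "item_id": [1, 1, 1, 2, 2, 3, 3],
        "item_name": ["A", "A", "A", "B", "B", "C", "C"],
        "increase": [4, 0, 4, 3, 3, 3, 3],
        "decrease": [2, 2, 2, 1, 1, 1, 1],
    }
).set_index(["item_id", "item_name"])

# Assumption: rows may arrive out of day order, so sort within each item first.
# sort_values accepts index level names as well as column labels.
out = my_df.sort_values(["item_id", "day"])

# Net change per day.
net = out["increase"] - out["decrease"]

# The ending quantity is the running total of net changes within each item.
out["ending_quantity"] = net.groupby(level="item_id").cumsum()

# The starting quantity is the ending quantity before the day's own change,
# which is 0 on each item's first day and the previous ending otherwise.
out["starting_quantity"] = out["ending_quantity"] - net

print(out)
```

On the sample data this produces the same starting and ending quantities as the output printed above, with the two new columns in the opposite order.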