Pandas Custom Cumulative Calculation Over Group By in DataFrame

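I'm trying to run a simple calculation over the values of each row within a group of a DataFrame, but I'm having trouble with the syntax. In particular, I'm confused about which data object I should return, i.e. a DataFrame vs. a Series.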

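For context, I have a set of stock values for each product I'm tracking, and I want to estimate the number of units sold with a custom function that basically does the following: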

# Because stock can go up and down, I'm looking to record the difference 
# when the stock is less than the previous stock number from the previous row.
# How do I access each row of the dataframe and then return the series I need?

def get_stock_sold(x):
    # Pseudocode
    stock_sold = previous_stock_no - current_stock_no if current_stock_no < previous_stock_no else 0
    return pd.Series(stock_sold)

I then have the following DataFrame:

import pandas as pd

# 'order' is a date in the real dataset.
data = {
    'id'            : ['1', '1', '1', '2', '2', '2'],
    'order'         : [1, 2, 3, 1, 2, 3],
    'current_stock' : [100, 150, 90, 50, 48, 30]
}

df = pd.DataFrame(data)
df = df.sort_values(by=['id', 'order'])
df['previous_stock'] = df.groupby('id')['current_stock'].shift(1)

I want to create a new column (stock_sold) and apply the logic above to each row in the grouped DataFrame object:

df['stock_sold'] = df.groupby('id').apply(get_stock_sold)

The desired output looks like this:

| id | order | current_stock | previous_stock | stock_sold |
|----|-------|---------------|----------------|------------|
| 1  | 1     | 100           | NaN            | 0          |
|    | 2     | 150           | 100.0          | 0          |
|    | 3     | 90            | 150.0          | 60         |
| 2  | 1     | 50            | NaN            | 0          |
|    | 2     | 48            | 50.0           | 2          |
|    | 3     | 30            | 48.0           | 18         |

Try np.where on the shifted column; there is no need for groupby.apply here:

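import numpy as np

# Previous row's stock within each id; NaN on the first row of each group.
df["previous_stock"] = df.groupby("id")["current_stock"].shift()

# A missing previous value is treated as 0, so a positive first row counts as a rise.
# If stock rose, nothing was sold; otherwise the drop is the quantity sold.
df["stock_sold"] = np.where(
    df["current_stock"] > df["previous_stock"].fillna(0),
    0,
    df["previous_stock"] - df["current_stock"],
)
print(df)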

Prints:

  id  order  current_stock  previous_stock  stock_sold
0  1      1            100             NaN         0.0
1  1      2            150           100.0         0.0
2  1      3             90           150.0        60.0
3  2      1             50             NaN         0.0
4  2      2             48            50.0         2.0
5  2      3             30            48.0        18.0
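
On the question of which object to return: as a rough sketch (not the only way to do it), the same result can be had without np.where by taking the per-group difference and clipping it, or by passing a function to groupby(...).transform, which hands the function one group's column as a Series and expects a Series of the same length back. The helper name stock_sold and the extra column stock_sold_transform below are made up for illustration. One difference from the np.where version: a first row with a current_stock of 0 comes out as 0 here instead of NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["1", "1", "1", "2", "2", "2"],
    "order": [1, 2, 3, 1, 2, 3],
    "current_stock": [100, 150, 90, 50, 48, 30],
}).sort_values(["id", "order"])

# Per-group change from the previous row; the first row of each group is NaN.
change = df.groupby("id")["current_stock"].diff()

# A drop in stock is a negative change, so negate it, clip increases to 0,
# and count the first row of each group (NaN) as 0 sold.
df["stock_sold"] = (-change).clip(lower=0).fillna(0)


# Hypothetical helper: receives one group's current_stock as a Series and
# returns a Series of the same length, which is what transform expects.
def stock_sold(stock: pd.Series) -> pd.Series:
    return (stock.shift() - stock).clip(lower=0).fillna(0)


df["stock_sold_transform"] = df.groupby("id")["current_stock"].transform(stock_sold)

print(df)
```

Because transform returns a Series aligned to the original index, it can be assigned straight to a new column, which is the step the groupby.apply attempt in the question was missing.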