Grouping By and Referencing Shifted Values

Question

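I'm trying to track inventory levels for individual items over time, comparing projected outbound against availability. Sometimes the projected outbound exceeds availability, and when that happens I want Post Available to be 0. I'm trying to create the Pre Available and Post Available columns below: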

 Item  Week  Inbound  Outbound  Pre Available  Post Available 
 A        1      500       200            500             300 
 A        2        0       400            300               0 
 A        3      100         0            100             100 
 B        1       50        50             50               0 
 B        2        0        80              0               0 
 B        3        0        20              0               0 
 B        4       20        20             20               0

I tried the code below:

def custsum(x):

      total = 0
      for i, v in x.iterrows():
         total += df['Inbound'] - df['Outbound']
         x.loc[i, 'Post Available'] = total
         if total < 0:
            total = 0
      return x

df.groupby('Item').apply(custsum)

But I get the following error message:

ValueError: Incompatible indexer with Series

I'm relatively new to Python, so any help would be greatly appreciated. Thanks!

Answer 1

No custom function is needed. You can use groupby + shift to create PreAvailable, and clip (with the lower bound set to 0) for PostAvailable.

df['PostAvailable']=(df.Inbound-df.Outbound).clip(lower=0)
df['PreAvailable']=df.groupby('item').apply(lambda x  : x['Inbound'].add(x['PostAvailable'].shift(),fill_value=0)).values
df
Out[213]: 
  item  Week  Inbound  Outbound  PreAvailable  PostAvailable
0    A     1      500       200         500.0            300
1    A     2        0       400         300.0              0
2    A     3      100         0         100.0            100
3    B     1       50        50          50.0              0
4    B     2        0        80           0.0              0
5    B     3        0        20           0.0              0
6    B     4       20        20          20.0              0
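
One thing worth checking before relying on the per-row clip: it reproduces the expected table for the sample data, but it looks only at each week's Inbound minus Outbound. The sketch below uses hypothetical data (not from the question) and assumes the intended rule is that leftover stock carries into the following week, as the Pre Available column in the question suggests.

```python
import pandas as pd

# Hypothetical data: week 2 ships only part of the stock left over from week 1.
demo = pd.DataFrame({
    'Item': ['A', 'A'],
    'Week': [1, 2],
    'Inbound': [500, 0],
    'Outbound': [200, 100],
})

net = demo['Inbound'] - demo['Outbound']

# Per-row clip: each week is evaluated in isolation.
per_row = net.clip(lower=0)

# Running balance floored at zero: leftover stock carries forward.
carried = []
balance = 0
for change in net.tolist():
    balance = max(balance + change, 0)
    carried.append(balance)

print(per_row.tolist())  # [300, 0]
print(carried)           # [300, 200]
```

If stock is meant to carry forward, the two results differ in week 2, so a running calculation may be the safer choice for data like this.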

Answer 2

You could use

import numpy as np
import pandas as pd
df = pd.DataFrame({'Inbound': [500, 0, 100, 50, 0, 0, 20],
                   'Item': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                   'Outbound': [200, 400, 0, 50, 80, 20, 20],
                   'Week': [1, 2, 3, 1, 2, 3, 4]})
df = df[['Item', 'Week', 'Inbound', 'Outbound']]


def custsum(x):
    total = 0
    for i, v in x.iterrows():
        total += x.loc[i, 'Inbound'] - x.loc[i, 'Outbound']
        if total < 0:
            total = 0
        x.loc[i, 'Post Available'] = total
    x['Pre Available'] = x['Post Available'].shift(1).fillna(0) + x['Inbound']
    return x

result = df.groupby('Item').apply(custsum)
result = result[['Item', 'Week', 'Inbound', 'Outbound', 'Pre Available', 'Post Available']]
print(result)

which yields

  Item  Week  Inbound  Outbound  Pre Available  Post Available
0    A     1      500       200          500.0           300.0
1    A     2        0       400          300.0             0.0
2    A     3      100         0          100.0           100.0
3    B     1       50        50           50.0             0.0
4    B     2        0        80            0.0             0.0
5    B     3        0        20            0.0             0.0
6    B     4       20        20           20.0             0.0

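The main difference between this code and the code you posted is:

total += x.loc[i, 'Inbound'] - x.loc[i, 'Outbound']

x.loc is used to select the numeric value in the Inbound or Outbound column of the row indexed by i. So the difference is a number, and total remains a number. In contrast,

total += df['Inbound'] - df['Outbound']

adds an entire Series to total. That leads to the ValueError later on. (See below for more about why this happens.)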
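
The distinction can be checked with a minimal sketch. The small frame and the variable names here are illustrative only and are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'Inbound': [500, 0, 100], 'Outbound': [200, 400, 0]})

# Row-level lookup with .loc: each side is a single number.
scalar_total = 0
scalar_total += df.loc[0, 'Inbound'] - df.loc[0, 'Outbound']
print(isinstance(scalar_total, pd.Series))  # False

# Column-level arithmetic: the right-hand side is a Series with one entry
# per row, so the running total silently becomes a Series as well.
series_total = 0
series_total += df['Inbound'] - df['Outbound']
print(isinstance(series_total, pd.Series))  # True
```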

The conditional

if total < 0:
    total = 0

was moved above x.loc[i, 'Post Available'] = total to ensure that Post Available is always non-negative.
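
To see why the order matters, here is a small pure-Python sketch using the net weekly changes for item A (300, -400, 100). The helper running_stock and its clamp_first flag are hypothetical names introduced only for this illustration.

```python
def running_stock(net_changes, clamp_first):
    """Record a running total, clamping to zero before or after recording."""
    total = 0
    recorded = []
    for change in net_changes:
        total += change
        if clamp_first:
            total = max(total, 0)   # clamp, then record
            recorded.append(total)
        else:
            recorded.append(total)  # record, then clamp (order in the question)
            total = max(total, 0)
    return recorded

net_a = [300, -400, 100]
print(running_stock(net_a, clamp_first=False))  # [300, -100, 100]
print(running_stock(net_a, clamp_first=True))   # [300, 0, 100]
```

With the original order, the negative value for week 2 is written out before it is reset, which is what the reordering avoids.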

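If you didn't need this condition, the entire for-loop could be replaced by

x['Post Available'] = (x['Inbound'] - x['Outbound']).cumsum()

Since column-wise arithmetic and cumsum are vectorized operations, the calculation could be performed much faster. Unfortunately, with the condition in place a plain cumsum no longer gives the right result, which is why the for-loop is kept here.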
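
For reference, one loop-free way to express a running total floored at zero relies on an identity that the answer above does not use: the floored total equals the plain cumulative sum minus the lowest value that sum has reached so far, capped at zero. The following is a sketch under that assumption, rebuilt on the sample data. The names net, running and lowest are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Week': [1, 2, 3, 1, 2, 3, 4],
    'Inbound': [500, 0, 100, 50, 0, 0, 20],
    'Outbound': [200, 400, 0, 50, 80, 20, 20],
})

# Net change per week, then the unfloored running total within each item.
net = df['Inbound'] - df['Outbound']
running = net.groupby(df['Item']).cumsum()

# Lowest point the running total has reached so far, capped at zero.
lowest = running.groupby(df['Item']).cummin().clip(upper=0)

# Subtracting that low point lifts the total back to zero whenever it dips below.
df['Post Available'] = running - lowest

# Pre Available is the previous week's Post Available (0 for the first week) plus Inbound.
previous = df['Post Available'].groupby(df['Item']).shift(fill_value=0)
df['Pre Available'] = previous + df['Inbound']

print(df[['Item', 'Week', 'Pre Available', 'Post Available']])
```

On the sample data this gives the same Pre Available and Post Available values as the loop. It is worth comparing both approaches on real data before switching.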

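In your original code, the error

ValueError: Incompatible indexer with Series

occurs on this line

x.loc[i, 'Post Available'] = total

because total is (sometimes) a Series rather than a simple numeric value. Pandas tries to align the Series on the right-hand side with the indexer (i, 'Post Available') on the left-hand side. The indexer (i, 'Post Available') gets converted to a tuple like (0, 4), since Post Available is the column at index 4. But (0, 4) is not an appropriate index for the 1-dimensional Series on the right-hand side.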

You can confirm that total is a Series by putting print(total) inside the for-loop, or by noting that the right-hand side of

total += df['Inbound'] - df['Outbound']

is a Series.

python

methods

cumulative-sum

pandas