How to apply a custom function (using the apply method) after grouping and rolling over a particular window

I have a DataFrame like this:

period      payor     variance_charges
6/1/2018    LIABILITY PLANS 4631.6667
7/1/2018    LIABILITY PLANS -1125.8333
8/1/2018    LIABILITY PLANS -12688.3333
9/1/2018    LIABILITY PLANS -1657.5
10/1/2018   LIABILITY PLANS -14806.6667
11/1/2018   LIABILITY PLANS 13910.8333
12/1/2018   LIABILITY PLANS 12154.1667
6/1/2018    MEDICAID CMO    -39174.5817
7/1/2018    MEDICAID CMO    59504.5767
8/1/2018    MEDICAID CMO    13967.4883
9/1/2018    MEDICAID CMO    -158103.49
10/1/2018   MEDICAID CMO    -71191.9667
11/1/2018   MEDICAID CMO    -405366.1217
12/1/2018   MEDICAID CMO    -21637.05

After grouping by the payor column, I want to count how many negative values fall in each rolling window (each window has 3 rows):

period      payor     variance_charges  count_neg
6/1/2018    LIABILITY PLANS 4631.6667   0
7/1/2018    LIABILITY PLANS -1125.8333  1
8/1/2018    LIABILITY PLANS -12688.3333 2
9/1/2018    LIABILITY PLANS -1657.5     3
10/1/2018   LIABILITY PLANS -14806.6667 3
11/1/2018   LIABILITY PLANS 13910.8333  2
12/1/2018   LIABILITY PLANS 12154.1667  1
6/1/2018    MEDICAID CMO    -39174.5817 1
7/1/2018    MEDICAID CMO    59504.5767  1
8/1/2018    MEDICAID CMO    13967.4883  1
9/1/2018    MEDICAID CMO    -158103.49  1
10/1/2018   MEDICAID CMO    -71191.9667 2
11/1/2018   MEDICAID CMO    -405366.12  3
12/1/2018   MEDICAID CMO    -21637.05   3

I tried the code below:

df.sort_values(by = 'period', ascending=True)
df['count_neg'] = df.groupby(['payor'])['variance_charges'].transform(lambda x: x.rolling(6, min_periods=1).apply(lambda n: sum(n < 0 for n in x), raw = False))

With the code above I get the number of negative values in the whole group, regardless of the window. The wrong result I get is shown below:

period      payor    variance_charges   count_neg
6/1/2018    LIABILITY PLANS 4631.6667   4
7/1/2018    LIABILITY PLANS -1125.8333  4
8/1/2018    LIABILITY PLANS -12688.3333 4
9/1/2018    LIABILITY PLANS -1657.5     4
10/1/2018   LIABILITY PLANS -14806.6667 4
11/1/2018   LIABILITY PLANS 13910.8333  4
12/1/2018   LIABILITY PLANS 12154.1667  4
6/1/2018    MEDICAID CMO    -39174.5817 5
7/1/2018    MEDICAID CMO    59504.5767  5
8/1/2018    MEDICAID CMO    13967.4883  5
9/1/2018    MEDICAID CMO    -158103.49  5
10/1/2018   MEDICAID CMO    -71191.9667 5
11/1/2018   MEDICAID CMO    -405366.12  5
12/1/2018   MEDICAID CMO    -21637.05   5

Please help me solve this.

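You can simplify your function by removing for n in x. That generator iterates over the whole group x instead of the current window n, which is why every row gets the group total. Also use a window of 3 instead of 6:
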
f = lambda x: x.rolling(3, min_periods=1).apply(lambda n: sum(n < 0), raw = False)
df['count_neg1'] = df.groupby(['payor'])['variance_charges'].transform(f).astype(int)

print (df)
       period            payor  variance_charges  count_neg  count_neg1
0    6/1/2018  LIABILITY PLANS         4631.6667          0           0
1    7/1/2018  LIABILITY PLANS        -1125.8333          1           1
2    8/1/2018  LIABILITY PLANS       -12688.3333          2           2
3    9/1/2018  LIABILITY PLANS        -1657.5000          3           3
4   10/1/2018  LIABILITY PLANS       -14806.6667          3           3
5   11/1/2018  LIABILITY PLANS        13910.8333          2           2
6   12/1/2018  LIABILITY PLANS        12154.1667          1           1
7    6/1/2018     MEDICAID CMO       -39174.5817          1           1
8    7/1/2018     MEDICAID CMO        59504.5767          1           1
9    8/1/2018     MEDICAID CMO        13967.4883          1           1
10   9/1/2018     MEDICAID CMO      -158103.4900          1           1
11  10/1/2018     MEDICAID CMO       -71191.9667          2           2
12  11/1/2018     MEDICAID CMO      -405366.1200          3           3
13  12/1/2018     MEDICAID CMO       -21637.0500          3           3
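
As a possible alternative to rolling(...).apply with a Python lambda, the same counts can probably be obtained by flagging the negative values first and then taking a rolling sum of the flags within each payor. The sketch below is self-contained and rebuilds the sample data from the question. The names is_neg, count_neg_vec and count_neg_apply are made up for illustration, and it assumes the rows are already in chronological order within each payor.

```python
import pandas as pd

# Sample data copied from the question.
periods = ["6/1/2018", "7/1/2018", "8/1/2018", "9/1/2018",
           "10/1/2018", "11/1/2018", "12/1/2018"]
df = pd.DataFrame({
    "period": periods * 2,
    "payor": ["LIABILITY PLANS"] * 7 + ["MEDICAID CMO"] * 7,
    "variance_charges": [
        4631.6667, -1125.8333, -12688.3333, -1657.5,
        -14806.6667, 13910.8333, 12154.1667,
        -39174.5817, 59504.5767, 13967.4883, -158103.49,
        -71191.9667, -405366.1217, -21637.05,
    ],
})

# 1 where the value is negative, 0 otherwise (hypothetical helper Series).
is_neg = df["variance_charges"].lt(0).astype(int)

# Rolling sum of the flags per payor: the count of negatives in each
# window of up to 3 rows. min_periods=1 avoids NaN at the group start.
df["count_neg_vec"] = (
    is_neg.groupby(df["payor"])
          .transform(lambda s: s.rolling(3, min_periods=1).sum())
          .astype(int)
)

# Same result with rolling.apply, for comparison. With raw=True each
# window is passed as a NumPy array, so (w < 0).sum() counts negatives.
df["count_neg_apply"] = (
    df.groupby("payor")["variance_charges"]
      .transform(lambda s: s.rolling(3, min_periods=1)
                            .apply(lambda w: (w < 0).sum(), raw=True))
      .astype(int)
)

print(df)
```

Both columns should match count_neg1 above. The rolling sum avoids calling a Python function once per window, which may matter on larger frames.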