Fill zero values in a Pandas column based on the last non-zero value if a criterion is fulfilled

Consider a Pandas DataFrame test = pd.DataFrame(data = [0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0], columns = ['holding'])

Output:

+----------+
| holding  |
+----------+
|        0 |
|        0 |
|        1 |
|        0 |
|        0 |
|        0 |
|       -1 |
|        0 |
|        0 |
|        0 |
|        1 |
|        0 |
|        0 |
+----------+

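I want to replace every zero with the last non-zero value if that last non-zero value equals 1. If the last non-zero value equals -1, the zeros should not be replaced (they stay 0).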

I tried test['position_holding'] = test['holding'].replace(to_replace=0, method='ffill') and the result is:

+------------------+
| position_holding |
+------------------+
|                0 |
|                0 |
|                1 |
|                1 |
|                1 |
|                1 |
|               -1 |
|               -1 |
|               -1 |
|               -1 |
|                1 |
|                1 |
|                1 |
+------------------+

The only thing I need to fix in the table above is that zeros get filled with -1, which violates the second condition. How can I achieve this?

Desired Output:
+------------------+
| position_holding |
+------------------+
|                0 |
|                0 |
|                1 |
|                1 |
|                1 |
|                1 |
|               -1 |
|                0 |
|                0 |
|                0 |
|                1 |
|                1 |
|                1 |
+------------------+
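Stated as code, the rule amounts to remembering the last non-zero value while scanning the column. A minimal plain-Python sketch (the helper name fill_after_one is hypothetical, not from any library) that should reproduce the desired output:

```python
import pandas as pd


def fill_after_one(values):
    """Hypothetical reference helper for the rule:
    a zero becomes 1 if the most recent non-zero value was 1;
    zeros after a -1 (or before any non-zero value) stay 0."""
    result = []
    last_nonzero = 0  # no non-zero value seen yet
    for v in values:
        if v != 0:
            last_nonzero = v      # remember the latest 1 / -1 signal
            result.append(v)
        elif last_nonzero == 1:
            result.append(1)      # zero after a 1: fill with 1
        else:
            result.append(0)      # zero after a -1 or at the start: keep 0
    return result


test = pd.DataFrame(data=[0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0], columns=['holding'])
test['position_holding'] = fill_after_one(test['holding'])
print(test['position_holding'].tolist())
# [0, 0, 1, 1, 1, 1, -1, 0, 0, 0, 1, 1, 1]
```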

My approach:

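# Keep only the non-zero (1 / -1) values, forward-fill them, and set a row to 1
# wherever the most recent non-zero value was 1.
after = test.holding.eq(1)
before = test.holding.eq(-1)
test['pos_holding'] = test.holding.mask(test.holding.where(after | before).ffill() == 1, 1)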

Equivalent code, slightly shorter:

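# Most recent non-zero value for every row (NaN before the first one)
mask = test.holding.where(test.holding != 0).ffill()
# Rows whose most recent non-zero value is 1 become 1; all others keep their value
test['pos_holding'] = test.holding.mask(mask == 1, 1)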
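To see why this works, it helps to look at the intermediate series. A self-contained sketch of the shorter version, split into named steps (the names signals and last_nonzero are only illustrative):

```python
import pandas as pd

test = pd.DataFrame(data=[0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0], columns=['holding'])

# Step 1: hide the zeros (they become NaN), keeping only the 1 / -1 signals.
signals = test.holding.where(test.holding != 0)

# Step 2: forward-fill, so every row carries the most recent non-zero value.
# Rows before the first non-zero value stay NaN.
last_nonzero = signals.ffill()

# Step 3: wherever the most recent non-zero value was 1, write 1;
# all other rows (including zeros after a -1) keep their original value.
test['pos_holding'] = test.holding.mask(last_nonzero == 1, 1)

print(last_nonzero.tolist())
# [nan, nan, 1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
print(test['pos_holding'].tolist())
# [0, 0, 1, 1, 1, 1, -1, 0, 0, 0, 1, 1, 1]
```

Because NaN == 1 evaluates to False, the leading rows that have no earlier non-zero value are left unchanged.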

Output:

    holding  pos_holding
0         0            0
1         0            0
2         1            1
3         0            1
4         0            1
5         0            1
6        -1           -1
7         0            0
8         0            0
9         0            0
10        1            1
11        0            1
12        0            1

It doesn't use vectorized pandas or numpy operations, but a simple for loop also works:

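# Propagate 1 forward: a zero directly after a 1 (original or already filled) becomes 1.
# Uses .loc instead of chained indexing (test['holding'][i] = ...), which may not write
# back to the DataFrame; assumes the default 0..n-1 index.
for i in range(1, len(test)):
    if test.loc[i, 'holding'] == 0 and test.loc[i - 1, 'holding'] == 1:
        test.loc[i, 'holding'] = 1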
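The loop above overwrites the holding column and writes to the frame one cell at a time. A variant that keeps holding intact, assuming a separate position_holding column is wanted, runs the same propagation on a plain Python list:

```python
import pandas as pd

test = pd.DataFrame(data=[0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0], columns=['holding'])

# Work on a plain Python list: per-cell .loc writes are slow on large frames,
# and writing to a new column leaves 'holding' untouched.
filled = test['holding'].tolist()
for i in range(1, len(filled)):
    # A zero right after a 1 (original or just filled in) becomes 1,
    # so a run of 1s propagates forward until the next non-zero value.
    if filled[i] == 0 and filled[i - 1] == 1:
        filled[i] = 1

test['position_holding'] = filled
```

The propagation works because the check looks at filled[i - 1], which may itself have just been set to 1 on the previous iteration.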

This should work:

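import pandas as pd

test = pd.DataFrame(data = [0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0], 
                    columns = ['holding'])
# Fill every zero with the last non-zero value (1 or -1).
# Note: the method= keyword of replace is deprecated in recent pandas versions.
test['position_holding'] = test['holding'].replace(to_replace=0, method='ffill')

# A zero filled with -1 gives holding - position_holding == 0 - (-1) == 1; reset those rows to 0
test["Diff"] = test["holding"]-test["position_holding"]
test.loc[test["Diff"]==1, 'position_holding']=0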

You can then drop the Diff column, which is no longer needed.
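Recent pandas releases deprecate the method= keyword of Series.replace. A self-contained sketch of the same Diff approach that avoids it (zeros are masked to NaN and forward-filled instead, and leading zeros with no earlier non-zero value are set back to 0), including the final drop of the helper column:

```python
import pandas as pd

test = pd.DataFrame(data=[0, 0, 1, 0, 0, 0, -1, 0, 0, 0, 1, 0, 0], columns=['holding'])

# Forward-fill the last non-zero value without replace(method='ffill'):
# zeros become NaN, get forward-filled, and leading NaNs go back to 0.
test['position_holding'] = (
    test['holding'].mask(test['holding'].eq(0)).ffill().fillna(0).astype(int)
)

# Where a zero was filled with -1, holding - position_holding == 1: reset those rows to 0.
test['Diff'] = test['holding'] - test['position_holding']
test.loc[test['Diff'] == 1, 'position_holding'] = 0

# The helper column is no longer needed.
test = test.drop(columns='Diff')
```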