Pandas Shift to Non-NaN Value and Check if Duplicate Entry

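I would appreciate help reducing the df.Decision column to a single 'Buy' or 'Sell' per run of consecutive decisions. For example, if there are 3 'Buy' decisions in a row, whether or not NaN values separate them, I only need to keep the first 'Buy'. The same logic applies to 'Sell'.

Current data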

Date ColA ColB ColC Decision
2018-03-21 41.6871345068477 39.1196017702354 39.8100609746974
2018-03-22 41.83569164767 39.1196017702354 39.8100609746974 Buy
2018-04-02 42.0277334284587 39.5353679158337 39.8100609746974 Buy
2018-04-30 41.0131864593112 42.1811215382421 40.3090368348783
2018-05-01 41.0131864593112 42.0844982888835 40.3090368348783
2018-05-02 41.0131864593112 41.9045373766682 40.3090368348783 Buy
2018-09-28 54.0546533518404 50.7025748467743 48.5804868844005
2018-10-01 54.1167056669686 50.7652351622538 48.5804868844005
2018-10-02 54.179969640969 50.7993057048438 48.5804868844005 Buy
2018-10-03 54.6021709547574 50.8035639654775 48.5804868844005 Buy
2018-10-04 54.6021709547574 51.1600610997758 48.7459608850365
2018-11-01 53.4815867079232 53.8788384068764 50.8680059009101
2018-11-02 53.4012843800357 53.8545041548076 50.8680059009101 Sell
2018-11-05 52.5179537180688 53.9007386980484 50.8680059009101 Sell
2018-11-06 52.5179537180688 54.1130540704967 50.8680059009101 Sell
2018-11-07 52.5179537180688 54.2608827598324 50.9081548909462
2018-11-08 52.381683825919 54.6830840736208 51.3303562047346 Sell
2018-11-09 51.9022943297893 54.6830840736208 51.3303562047346 Sell
2018-11-12 51.312945372196 54.869846946646 51.3303562047346 Sell
2018-11-13 51.0272439215888 54.873497352104 51.3303562047346 Sell
2019-02-28 40.0868369032957 37.9514787484214 42.9921818000566
2019-03-01 40.0917199269724 37.7384198717488 42.9921818000566
2019-03-04 40.5566646362643 37.6938570296322 42.9921818000566 Buy
2019-04-23 48.1070706672322 43.6878883048808 40.3077255381675 Buy
2019-04-24 48.1965810367431 43.817865832258 40.4377030655446
2019-04-25 48.1965810367431 43.9423243081189 40.5112749854225
2019-04-26 48.1965810367431 44.0116014371635 40.7923506041967 Buy
2019-04-29 48.1965810367431 45.2089733480352 41.8874654967458

Expected data

Date ColA ColB ColC Decision
2018-03-21 41.6871345068477 39.1196017702354 39.8100609746974
2018-03-22 41.83569164767 39.1196017702354 39.8100609746974 Buy
2018-04-02 42.0277334284587 39.5353679158337 39.8100609746974
2018-04-30 41.0131864593112 42.1811215382421 40.3090368348783
2018-05-01 41.0131864593112 42.0844982888835 40.3090368348783
2018-05-02 41.0131864593112 41.9045373766682 40.3090368348783
2018-09-28 54.0546533518404 50.7025748467743 48.5804868844005
2018-10-01 54.1167056669686 50.7652351622538 48.5804868844005
2018-10-02 54.179969640969 50.7993057048438 48.5804868844005
2018-10-03 54.6021709547574 50.8035639654775 48.5804868844005
2018-10-04 54.6021709547574 51.1600610997758 48.7459608850365
2018-11-01 53.4815867079232 53.8788384068764 50.8680059009101
2018-11-02 53.4012843800357 53.8545041548076 50.8680059009101 Sell
2018-11-05 52.5179537180688 53.9007386980484 50.8680059009101
2018-11-06 52.5179537180688 54.1130540704967 50.8680059009101
2018-11-07 52.5179537180688 54.2608827598324 50.9081548909462
2018-11-08 52.381683825919 54.6830840736208 51.3303562047346
2018-11-09 51.9022943297893 54.6830840736208 51.3303562047346
2018-11-12 51.312945372196 54.869846946646 51.3303562047346
2018-11-13 51.0272439215888 54.873497352104 51.3303562047346
2019-02-28 40.0868369032957 37.9514787484214 42.9921818000566
2019-03-01 40.0917199269724 37.7384198717488 42.9921818000566
2019-03-04 40.5566646362643 37.6938570296322 42.9921818000566 Buy
2019-04-23 48.1070706672322 43.6878883048808 40.3077255381675
2019-04-24 48.1965810367431 43.817865832258 40.4377030655446
2019-04-25 48.1965810367431 43.9423243081189 40.5112749854225
2019-04-26 48.1965810367431 44.0116014371635 40.7923506041967
2019-04-29 48.1965810367431 45.2089733480352 41.8874654967458

To solve this, I started with the following logic, but I can't get it to work properly.

df[df.Decision.notnull()].shift().eq('Buy').Decision
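
For what it's worth, the expression above only reports whether the previous non-null decision was 'Buy'; it never compares that with the current row, and it ignores 'Sell'. One possible way to finish the same idea is to compare each non-null decision with the previous non-null one. This is only a sketch: it assumes the blanks in Decision are real NaN values, and the small sample Series is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample standing in for df.Decision (blanks assumed to be NaN).
decision = pd.Series([np.nan, "Buy", "Buy", np.nan, "Buy", "Sell", np.nan, "Sell", "Buy"])

non_null = decision.dropna()                   # drop the NaN gaps; the original index is kept
repeated = non_null.eq(non_null.shift())       # True where a decision equals the previous non-null one
result = decision.copy()
result.loc[repeated[repeated].index] = np.nan  # blank out the repeats in the full-length Series
print(result)
```

Only the first label of each run survives, at index 1 ('Buy'), 5 ('Sell') and 8 ('Buy').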

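After forward-filling the Decision column, these are the rows where the decision is unchanged from the previous row: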

rows = df['Decision'].ffill() == df['Decision'].ffill().shift(1)

Set the decision labels of those rows to NaN:

import numpy as np  # provides np.nan
df.loc[rows, 'Decision'] = np.nan
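
To check the forward-fill approach end to end, here is a minimal, self-contained sketch on a shortened stand-in for the Decision column. It assumes the empty Decision cells are real NaN values rather than empty strings; the sample frame and the helper name `keep_first_of_run` are made up for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd


def keep_first_of_run(decision: pd.Series) -> pd.Series:
    """Keep only the first label of each run of equal decisions, ignoring NaN gaps."""
    filled = decision.ffill()              # carry the last seen decision across NaN gaps
    unchanged = filled == filled.shift(1)  # True where the filled decision repeats the previous row
    return decision.mask(unchanged)        # repeated labels become NaN


# Hypothetical sample: a shortened version of the question's Decision column.
df = pd.DataFrame({
    "Decision": [np.nan, "Buy", "Buy", np.nan, "Buy", np.nan,
                 "Sell", "Sell", np.nan, "Sell", "Buy"],
})
df["Decision"] = keep_first_of_run(df["Decision"])
print(df)
# Decision is now non-null only at rows 1 (Buy), 6 (Sell) and 10 (Buy).
```

If the blanks are empty strings instead of NaN, they would need converting first (for example with `df['Decision'].replace('', np.nan)`), otherwise the forward fill has nothing to fill.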