Pandas fillna with np.select

My DataFrame ('data') looks like this:

index competitor region sku date price
000 A M 01 2022-01-01 100
001 A M 01 2022-01-02 099
002 A M 01 2022-01-03 099
003 A B 02 2022-01-01 101
004 A B 02 2022-01-02 100
005 A B 02 2022-01-03 101

The columns 'competitor', 'region', 'sku', and 'date' contain no NaNs, but 'price' does.

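I want to fill the missing prices as follows: forward-fill when the previous row has the same competitor, region, and sku, back-fill when the previous row belongs to a different combination, and leave the value unchanged otherwise.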

For loops and apply are clearly too slow, so I decided to use np.select:

prev_comp = data['competitor'].shift(1)
prev_reg = data['region'].shift(1)
prev_art = data['sku'].shift(1)

conditions = [
    (data['price'].isna()) & (data['price'].shift(1).notna()) & (data['competitor'].values == prev_comp) & (data['region'].values == prev_reg) & (data['sku'].values == prev_art),
    (data['price'].isna()) & (data['price'].shift(-1).notna()) & (data['competitor'].values != prev_comp) & (data['region'].values != prev_reg) & (data['sku'].values != prev_art),
    (data['price'].shift(1).notna()) & (data['price'].shift(-1).notna())
]

choices = [
    data.fillna(method='ffill'), 
    data.fillna(method='bfill'),
    data
]

data = np.select(conditions, choices)

I get the following error:

ValueError: shape mismatch: objects cannot be broadcast to a single shape. Mismatch is between arg 0 with shape (3930229,) and arg 1 with shape (3930229, 10).

The error refers to the shape of the conditions (3930229,) and the shape of the choices (3930229, 10), but I don't know how to deal with it.
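The mismatch can be reproduced on a small frame. The sketch below uses a hypothetical toy DataFrame (only the column names come from the question) and assumes the cause is that each condition is a 1-D boolean Series while each choice is a whole 2-D DataFrame. Passing a single column as the choice makes the shapes agree. It only illustrates the shapes and ignores the grouping logic.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are made up for illustration.
data = pd.DataFrame({
    "sku": ["01", "01", "02", "02"],
    "price": [100.0, np.nan, np.nan, 101.0],
})

cond = data["price"].isna()            # 1-D boolean condition, shape (4,)
print(cond.shape, data.ffill().shape)  # the choice is 2-D, shape (4, 2)

# Whole frames as choices cannot be broadcast against a 1-D condition.
try:
    np.select([cond], [data.ffill()], default=data)
    raised = False
except ValueError:
    raised = True

# With a single column as the choice, condition and choice are both 1-D.
price = data["price"]
filled = np.select([cond], [price.ffill()], default=price)
print(filled)
```

Note that np.select returns a NumPy array, so the result would have to be assigned back to data['price'] rather than to data itself.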

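IIUC, use GroupBy.transform with a lambda function to forward-fill and then back-fill the missing values within each group. A group with no non-missing values is left as NaN: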

# Within each group, forward-fill first, then back-fill any leading gaps
f = lambda x: x.ffill().bfill()
data['price'] = data.groupby(['competitor','region','sku'])['price'].transform(f)
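As a quick check, here is a minimal self-contained sketch of the same approach on made-up data. The sample values, the extra competitor "B" group, and the sort step are assumptions added for illustration. The sort presumes that "previous" and "next" should mean earlier and later dates within a group.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; the values are illustrative, not the asker's data.
data = pd.DataFrame({
    "competitor": ["A"] * 6 + ["B"] * 2,
    "region": ["M"] * 3 + ["B"] * 3 + ["M"] * 2,
    "sku": ["01"] * 3 + ["02"] * 3 + ["01"] * 2,
    "date": pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03"] * 2
        + ["2022-01-01", "2022-01-02"]
    ),
    "price": [100.0, np.nan, 99.0, np.nan, 100.0, np.nan, np.nan, np.nan],
})

keys = ["competitor", "region", "sku"]

# Assumption: rows should be in date order inside each group before filling.
data = data.sort_values(keys + ["date"])

# Forward-fill, then back-fill, separately for every key combination.
data["price"] = data.groupby(keys)["price"].transform(
    lambda s: s.ffill().bfill()
)
print(data.sort_index())
```

Values never leak across groups: the leading NaN in the A/B/02 group is back-filled from its own group, and the all-NaN B/M/01 group stays NaN.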