Fillna 通过使用函数关联多个列

Fillna by relating multiple columns using a function

I have 3 columns in the dataframe. object, id and price. I want fill the blanks by reading the id column and discover which price should I use. For exemple: If the id ends in (A,B or C) the price should be 30 but if it's end (7A,7B or 7C) the price should be 50, If the id ends in (E,F or G) the price should be 20, If the id ends in (O,M or N), the price should be 10.

Here is the dataframe:

    object    id  price
0   laptop   24A   30
1   laptop   37C   NaN
2   laptop   21O   NaN
3   laptop   17C   50
4   laptop   55A   30
5   laptop   34N   NaN
6   laptop   05E   20
7   laptop   29B   NaN
8   laptop   22M   10
9   laptop   62F   NaN
10  laptop   23G   20
11  laptop   61O   NaN
12  laptop   27A   NaN

Expected output:

    object    id  price
0   laptop   24A   30
1   laptop   37C   50
2   laptop   21O   10
3   laptop   17C   50
4   laptop   55A   30
5   laptop   34N   10
6   laptop   05E   20
7   laptop   29B   30
8   laptop   22M   10
9   laptop   62F   20
10  laptop   23G   20
11  laptop   61O   10
12  laptop   27A   50

可以使用np.select with str.contains条件:

conditions = {
    30: df.id.str.contains('[^7][ABC]$'),
    50: df.id.str.contains('7[ABC]$'),
    20: df.id.str.contains('[EFG]$'),
    10: df.id.str.contains('[OMN]$'),
}
df.price = np.select(conditions.values(), conditions.keys())

#     object   id  price
# 0   laptop  24A     30
# 1   laptop  37C     50
# 2   laptop  21O     10
# 3   laptop  17C     50
# 4   laptop  55A     30
# 5   laptop  34N     10
# 6   laptop  05E     20
# 7   laptop  29B     30
# 8   laptop  22M     10
# 9   laptop  62F     20
# 10  laptop  23G     20
# 11  laptop  61O     10
# 12  laptop  27A     50

如果你想使用 fillna:

,你也可以使用 loc 掩码
for price, condition in conditions.items():
    df.loc[condition, 'price'] = df.loc[condition, 'price'].fillna(price)

更新 1

如果想通过df.object进一步限制,可以在df.object条件下加上&:

conditions = {
    30: df.object.eq('laptop') & df.id.str.contains('[^7][ABC]$'),
    50: df.object.eq('laptop') & df.id.str.contains('7[ABC]$'),
    20: df.object.eq('laptop') & df.id.str.contains('[EFG]$'),
    10: df.object.eq('laptop') & df.id.str.contains('[OMN]$'),
    1000: df.object.eq('phone') & df.id.str.contains('[OMN]$'),
}

更新 2

如果你真的想使用一个函数,你可以 apply 沿行 (axis=1),但是行应用要慢得多,当你有像 [=14 这样的向量化选项时不建议=]:

def price(row):
    result = np.nan
    if row.object == 'laptop':
        if row.id[-2:] in ['7A', '7B', '7C']:
            result = 50
        elif row.id[-1] in list('ABC'):
            result = 30
        elif row.id[-1] in list('EFG'):
            result = 20
        elif row.id[-1] in list('OMN'):
            result = 10
    elif row.object == 'phone':
        if row.id[-2:] in ['7A', '7B', '7C']:
            result = 5000
        ...
    return result
df.price = df.apply(price, axis=1)