Fillna by relating multiple columns using a function
I have 3 columns in the dataframe: object, id and price. I want to fill
the blanks by reading the id column to work out which price to use.
For example: if the id ends in A, B or C the price should be 30, but
if it ends in 7A, 7B or 7C the price should be 50; if the id ends in
E, F or G the price should be 20; and if the id ends in O, M or N the
price should be 10.
Here is the dataframe:
    object   id  price
0   laptop  24A     30
1   laptop  37C    NaN
2   laptop  21O    NaN
3   laptop  17C     50
4   laptop  55A     30
5   laptop  34N    NaN
6   laptop  05E     20
7   laptop  29B    NaN
8   laptop  22M     10
9   laptop  62F    NaN
10  laptop  23G     20
11  laptop  61O    NaN
12  laptop  27A    NaN
Expected output:
    object   id  price
0   laptop  24A     30
1   laptop  37C     50
2   laptop  21O     10
3   laptop  17C     50
4   laptop  55A     30
5   laptop  34N     10
6   laptop  05E     20
7   laptop  29B     30
8   laptop  22M     10
9   laptop  62F     20
10  laptop  23G     20
11  laptop  61O     10
12  laptop  27A     50
You can use np.select with str.contains conditions:
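import numpy as np

# Map each price to a boolean mask built from the id suffix
conditions = {
    30: df.id.str.contains('[^7][ABC]$'),
    50: df.id.str.contains('7[ABC]$'),
    20: df.id.str.contains('[EFG]$'),
    10: df.id.str.contains('[OMN]$'),
}
# Pick the price of the first matching mask; rows matching none get the default (0)
df.price = np.select(list(conditions.values()), list(conditions.keys()))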
#     object   id  price
# 0   laptop  24A     30
# 1   laptop  37C     50
# 2   laptop  21O     10
# 3   laptop  17C     50
# 4   laptop  55A     30
# 5   laptop  34N     10
# 6   laptop  05E     20
# 7   laptop  29B     30
# 8   laptop  22M     10
# 9   laptop  62F     20
# 10  laptop  23G     20
# 11  laptop  61O     10
# 12  laptop  27A     50
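Note that assigning the np.select result directly replaces every price, including the ones that were already present. If only the blanks should be filled, one option is to pass the np.select result to fillna. The following is a minimal sketch under that assumption; it rebuilds the sample dataframe so it can be run on its own, and the name rule_price is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "object": ["laptop"] * 13,
    "id": ["24A", "37C", "21O", "17C", "55A", "34N", "05E",
           "29B", "22M", "62F", "23G", "61O", "27A"],
    "price": [30, np.nan, np.nan, 50, 30, np.nan, 20,
              np.nan, 10, np.nan, 20, np.nan, np.nan],
})

conditions = {
    30: df["id"].str.contains(r"[^7][ABC]$"),
    50: df["id"].str.contains(r"7[ABC]$"),
    20: df["id"].str.contains(r"[EFG]$"),
    10: df["id"].str.contains(r"[OMN]$"),
}

# Price implied by the id suffix; NaN where no rule matches
rule_price = pd.Series(
    np.select(list(conditions.values()), list(conditions.keys()), default=np.nan),
    index=df.index,
)

# Keep the existing prices and fill only the missing ones
df["price"] = df["price"].fillna(rule_price)
filled = df["price"].astype(int).tolist()
print(df)
```

With default=np.nan, an id that matches none of the patterns stays blank instead of silently becoming 0.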
If you want to use fillna, you can also use loc masking:
for price, condition in conditions.items():
    df.loc[condition, 'price'] = df.loc[condition, 'price'].fillna(price)
Update 1
If you want to restrict the rules further by df.object, combine a df.object condition with each mask using &:
conditions = {
    30: df.object.eq('laptop') & df.id.str.contains('[^7][ABC]$'),
    50: df.object.eq('laptop') & df.id.str.contains('7[ABC]$'),
    20: df.object.eq('laptop') & df.id.str.contains('[EFG]$'),
    10: df.object.eq('laptop') & df.id.str.contains('[OMN]$'),
    1000: df.object.eq('phone') & df.id.str.contains('[OMN]$'),
}
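One thing to watch with a dict keyed by price is that two rules cannot share the same price, since the later key would overwrite the earlier one. A possible variation, sketched here as an assumption rather than as part of the original answer, is to keep the rules as a list of (condition, price) pairs. The small dataframe and the names rules, is_laptop and is_phone are made up for illustration; the prices come from the rules above.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed data: two laptops and two phones, all prices missing
df = pd.DataFrame({
    "object": ["laptop", "laptop", "phone", "phone"],
    "id": ["21O", "37C", "21O", "24A"],
    "price": [np.nan, np.nan, np.nan, np.nan],
})

is_laptop = df["object"].eq("laptop")
is_phone = df["object"].eq("phone")

# (condition, price) pairs, so several rules may share one price
rules = [
    (is_laptop & df["id"].str.contains(r"[^7][ABC]$"), 30),
    (is_laptop & df["id"].str.contains(r"7[ABC]$"), 50),
    (is_laptop & df["id"].str.contains(r"[EFG]$"), 20),
    (is_laptop & df["id"].str.contains(r"[OMN]$"), 10),
    (is_phone & df["id"].str.contains(r"[OMN]$"), 1000),
]
condlist = [cond for cond, _ in rules]
choicelist = [price for _, price in rules]

# Rows that match no rule (here the phone with id 24A) stay NaN
rule_price = pd.Series(np.select(condlist, choicelist, default=np.nan), index=df.index)
df["price"] = df["price"].fillna(rule_price)
print(df)
```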
Update 2
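If you really want to use a function, you can apply it along the rows (axis=1), but row-wise apply is much slower and is not recommended when a vectorized option such as np.select is available: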
def price(row):
    result = np.nan
    if row.object == 'laptop':
        if row.id[-2:] in ['7A', '7B', '7C']:
            result = 50
        elif row.id[-1] in list('ABC'):
            result = 30
        elif row.id[-1] in list('EFG'):
            result = 20
        elif row.id[-1] in list('OMN'):
            result = 10
    elif row.object == 'phone':
        if row.id[-2:] in ['7A', '7B', '7C']:
            result = 5000
        ...
    return result

df.price = df.apply(price, axis=1)
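As written, the assignment above recomputes every price from the function. If the existing prices should be kept, the apply result can be passed to fillna instead. Below is a minimal, self-contained sketch of that idea. The helper laptop_price is a hypothetical, laptop-only simplification of the function above, and the first row carries a made-up price of 99 purely to show that existing values are left alone.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "object": ["laptop", "laptop", "laptop", "laptop"],
    "id": ["24A", "37C", "21O", "62F"],
    "price": [99.0, np.nan, np.nan, np.nan],  # 99 is a made-up existing price
})

def laptop_price(row):
    """Hypothetical helper: price implied by the id suffix, or NaN."""
    if row["object"] != "laptop":
        return np.nan
    # Check the two-character suffix first so 7A/7B/7C win over A/B/C
    if row["id"][-2:] in ("7A", "7B", "7C"):
        return 50
    last = row["id"][-1]
    if last in ("A", "B", "C"):
        return 30
    if last in ("E", "F", "G"):
        return 20
    if last in ("O", "M", "N"):
        return 10
    return np.nan

# Row-wise apply computes a candidate price per row; fillna uses it only for blanks
df["price"] = df["price"].fillna(df.apply(laptop_price, axis=1))
print(df)
```

The same performance caveat applies: this calls the Python function once per row, so the np.select approach is likely to be preferable on large frames.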