Pandas Apply with condition
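I have duplicated customers with different statuses, because each customer has one row per subscription/product. I want to generate a new_status for each customer.
For new_status to be 'canceled', every one of that customer's subscription statuses must be 'canceled'.
I used: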
df['duplicated'] = df.groupby('Customer').cumcount()
to number each repeated row within a customer, so that duplicated values are marked:
Customer | Status   | new_status | duplicated
X        | canceled |            | 0
X        | canceled |            | 1
X        | active   |            | 2
Y        | canceled |            | 0
A        | canceled |            | 0
A        | canceled |            | 1
B        | active   |            | 0
B        | canceled |            | 1
So I would like to use .apply and/or .loc to generate:
Customer | Status   | new_status | duplicated
X        | canceled |            | 0
X        | canceled |            | 1
X        | active   |            | 2
Y        | canceled |            | 0
A        | canceled | canceled   | 0
A        | canceled | canceled   | 1
B        | active   |            | 0
B        | canceled |            | 1
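To make the setup reproducible, here is a minimal sketch of the sample frame. The values are copied from the tables above; starting new_status as empty strings is an assumption, since the question does not say how the column is initialised.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df = pd.DataFrame({
    "Customer": ["X", "X", "X", "Y", "A", "A", "B", "B"],
    "Status": ["canceled", "canceled", "active", "canceled",
               "canceled", "canceled", "active", "canceled"],
})

# Assumption: new_status starts out as an empty string on every row.
df["new_status"] = ""

# cumcount numbers the rows inside each Customer group, starting at 0.
df["duplicated"] = df.groupby("Customer").cumcount()

print(df)
```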
As I understand it, you can try:
# Mark every row of a customer whose statuses are all 'canceled'; keep the old value otherwise.
df['new_status'] = (df.groupby('Customer')['Status']
                      .transform(lambda x: x.eq('canceled').all())
                      .map({True: 'canceled'})
                      .fillna(df['new_status']))
print(df)
   Customer    Status  new_status  duplicated
0         X  canceled                       0
1         X  canceled                       1
2         X    active                       2
3         Y  canceled    canceled           0
4         A  canceled    canceled           0
5         A  canceled    canceled           1
6         B    active                       0
7         B  canceled                       1
Edit, since the expected output changed:
# Flag a row only if its customer has several rows and all of them are 'canceled'.
df['new_status'] = (df.groupby('Customer')['Status']
                      .transform(lambda x: x.duplicated(keep=False) & x.eq('canceled').all())
                      .map({True: 'canceled', False: ''}))
print(df)
   Customer    Status  new_status  duplicated
0         X  canceled                       0
1         X  canceled                       1
2         X    active                       2
3         Y  canceled                       0
4         A  canceled    canceled           0
5         A  canceled    canceled           1
6         B    active                       0
7         B  canceled                       1
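The difference between the two versions above is easiest to see side by side. The following is a sketch only: the sample frame is rebuilt from the question's table, the names all_canceled, multi_row, first_version and edited_version are made up for illustration, and the "more than one row" check uses transform('size') rather than the duplicated call from the answer.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Customer": ["X", "X", "X", "Y", "A", "A", "B", "B"],
    "Status": ["canceled", "canceled", "active", "canceled",
               "canceled", "canceled", "active", "canceled"],
})

grouped = df.groupby("Customer")["Status"]

# True on every row of a customer whose statuses are all 'canceled'.
all_canceled = grouped.transform(lambda s: s.eq("canceled").all()).astype(bool)

# True on every row of a customer that has more than one row
# (swapped in here for the duplicated(keep=False) check).
multi_row = grouped.transform("size").gt(1)

# First version: single-row customers such as Y are flagged too.
first_version = all_canceled.map({True: "canceled", False: ""})

# Edited version: only customers with several rows are flagged.
edited_version = (all_canceled & multi_row).map({True: "canceled", False: ""})

print(pd.DataFrame({"Customer": df["Customer"],
                    "first": first_version,
                    "edited": edited_version}))
```

Customer Y is the only row where the two results differ, because Y has a single subscription.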
Compare the Status column with Series.eq (the method form of ==) and use GroupBy.transform with GroupBy.all to check whether all values are True within each group. Then test Customer with Series.duplicated and keep=False, which flags every row of a duplicated customer. Finally, chain the two masks with bitwise AND (&) and set the values with numpy.where:
import numpy as np

m1 = df['Status'].eq('canceled').groupby(df['Customer']).transform('all')  # all statuses of the customer are 'canceled'
m2 = df['Customer'].duplicated(keep=False)  # the customer has more than one row
df['new_status'] = np.where(m1 & m2, 'canceled', '')
print(df)
   Customer    Status  new_status  duplicated
0         X  canceled                       0
1         X  canceled                       1
2         X    active                       2
3         Y  canceled                       0
4         A  canceled    canceled           0
5         A  canceled    canceled           1
6         B    active                       0
7         B  canceled                       1
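Since the question asked about .loc, the same two masks can also drive a .loc assignment instead of numpy.where. This is a minimal sketch, assuming new_status already exists and holds empty strings; the sample frame is rebuilt from the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Customer": ["X", "X", "X", "Y", "A", "A", "B", "B"],
    "Status": ["canceled", "canceled", "active", "canceled",
               "canceled", "canceled", "active", "canceled"],
})
# Assumption: new_status starts out as an empty string on every row.
df["new_status"] = ""

# m1: every status of the row's customer is 'canceled'.
m1 = df["Status"].eq("canceled").groupby(df["Customer"]).transform("all")
# m2: the row's customer appears more than once.
m2 = df["Customer"].duplicated(keep=False)

# Only rows where both masks hold are overwritten; other rows keep their value.
df.loc[m1 & m2, "new_status"] = "canceled"

print(df)
```

Unlike numpy.where, which rewrites the whole column, the .loc form leaves any existing new_status values untouched on rows that do not match.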