How can I identify users by multiple conditions with pandas.factorize?
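I read about using pd.factorize to identify users and create a unique value for each identity.
However, in my case I want to apply multiple conditions, i.e. OR conditions, to identify a user, and the conditions are ranked by importance.
For example: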
df:
cond_1(email)   cond_2(phone)  cond_3(other)
abc@yahoo.com   12345678       qwe
asd@yahoo.com   789456123      rty
abc@yahoo.com   905132312      zxc
dsds@yahoo.com  789456123      po
abc@yahoo.com   789456123      special
Expected:
cond_1(email)   cond_2(phone)  cond_3(other)  unique_id
abc@yahoo.com   12345678       qwe            1
asd@yahoo.com   789456123      rty            2
abc@yahoo.com   905132312      zxc            1
dsds@yahoo.com  789456123      po             2
abc@yahoo.com   789456123      special        1
IIUC, you can do:
# Factorize each column (codes start at 0, hence the +1), then take the row-wise minimum
df['unique_id'] = df.apply(lambda x: pd.factorize(x)[0] + 1).min(axis=1)
print(df)
   cond_1(email)   cond_2(phone)  cond_3(other)  unique_id
0  abc@yahoo.com   12345678       qwe            1
1  asd@yahoo.com   789456123      rty            2
2  abc@yahoo.com   905132312      zxc            1
3  dsds@yahoo.com  789456123      po             2
4  abc@yahoo.com   789456123      special        1
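To see why this gives the expected result on the sample, it may help to look at the per-column codes before the minimum is taken. The sketch below rebuilds the example frame (phone numbers are stored as strings here, which is an assumption) and keeps the intermediate codes in a separate frame; the name `codes` is only illustrative.

```python
import pandas as pd

# Sample data from the question; phone numbers kept as strings (an assumption).
df = pd.DataFrame({
    "cond_1(email)": ["abc@yahoo.com", "asd@yahoo.com", "abc@yahoo.com",
                      "dsds@yahoo.com", "abc@yahoo.com"],
    "cond_2(phone)": ["12345678", "789456123", "905132312",
                      "789456123", "789456123"],
    "cond_3(other)": ["qwe", "rty", "zxc", "po", "special"],
})

# Factorize each column on its own. Codes are assigned in order of first
# appearance and start at 0, so add 1 to get ids starting at 1.
codes = pd.DataFrame(
    {col: pd.factorize(df[col])[0] + 1 for col in df.columns},
    index=df.index,
)
print(codes)

# The row-wise minimum of the three codes becomes the id.
df["unique_id"] = codes.min(axis=1)
print(df)
```

On this sample the email codes are 1, 2, 1, 3, 1 and the phone codes are 1, 2, 3, 2, 2, so the minimum lands on 1, 2, 1, 2, 1. Note that each column is numbered independently, so a code in one column has no built-in relation to the same code in another; the agreement with the expected output here should be checked again on real data rather than assumed.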