I am stuck writing the Python code for the problem below

I have the following dataframe.

import pandas as pd

df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS']})

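I want to check whether each value in the vin column is unique. If it is, the Result column should contain 'YES'. If a vin is not unique, the 'module' column should be checked and 'YES' returned only for the row whose module has the highest priority; the other rows for that vin should get 'NO'.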

I want the output to look like the dataframe below.

df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS'],
               'Result':['YES','YES','NO','NO','NO','YES','YES','YES','NO','YES']})

I have tried the code below. It gives the correct result, but it involves too many steps.

df['count'] = df.groupby('vin').vin.transform('count')


def Check1(row):
    # Rank each row: a lower number means a higher priority
    if row["count"] == 1:
        return 1
    elif row["count"] != 1 and row["module"] == '1ST_PRIORITY':
        return 1
    elif row["count"] != 1 and row["module"] == '2ND_PRIORITY':
        return 2
    elif row["count"] != 1 and row["module"] == '3RD_PRIORITY':
        return 3
    else:
        return 4


df['Sort'] = df.apply(Check1, axis=1)

# Keep only the highest-priority row for each vin
df = df.sort_values(by=['vin', 'Sort'])
df.drop_duplicates(subset=['vin'], keep='first', inplace=True)

df

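IIUC, you can use duplicated after sort_values. This works here because the priority labels happen to sort alphabetically in priority order ('1ST_PRIORITY' < '2ND_PRIORITY' < '3RD_PRIORITY', and all of them sort before 'ABS', 'HELLO' and 'NORMAL'):
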
df['Result'] = ((~df.sort_values('module').duplicated('vin'))
                    .replace({True: 'YES', False: 'NO'}))
print(df)

# Output
   vin        module Result
0  aaa        NORMAL    YES
1  bbb  1ST_PRIORITY    YES
2  bbb  2ND_PRIORITY     NO
3  bbb         HELLO     NO
4  ccc  3RD_PRIORITY     NO
5  ccc  2ND_PRIORITY    YES
6  ddd  2ND_PRIORITY    YES
7  eee  3RD_PRIORITY    YES
8  eee         HELLO     NO
9  fff           ABS    YES
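
To make the one-liner above easier to follow, here is a minimal step-by-step sketch of the same idea. The intermediate names sorted_df and is_dup are only illustrative, and map is used in place of replace purely as a stylistic choice.

```python
import pandas as pd

df = pd.DataFrame({
    'vin': ['aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'fff'],
    'module': ['NORMAL', '1ST_PRIORITY', '2ND_PRIORITY', 'HELLO', '3RD_PRIORITY',
               '2ND_PRIORITY', '2ND_PRIORITY', '3RD_PRIORITY', 'HELLO', 'ABS'],
})

# Step 1: sort by module so the best-ranked module of each vin comes first.
# This relies on the labels sorting alphabetically in priority order.
sorted_df = df.sort_values('module')

# Step 2: flag every row except the first occurrence of each vin.
is_dup = sorted_df.duplicated('vin')

# Step 3: assign back. pandas aligns on the index, so the sorted order
# of is_dup does not disturb the original row order of df.
df['Result'] = (~is_dup).map({True: 'YES', False: 'NO'})
print(df)
```

The key point is step 3: the boolean Series keeps the original index labels, so the assignment lands on the right rows even though it was computed on a sorted copy.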

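Here is the trick: you need a custom order. In this example the module values 'Delhi', 'Agra' and 'Paris' appear in place of '1ST_PRIORITY', '2ND_PRIORITY' and '3RD_PRIORITY':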

from pandas.api.types import CategoricalDtype

#create your custom order
custom_order = CategoricalDtype(
    ['Delhi','Agra','Paris','ABS','HELLO','NORMAL'], 
    ordered=True)

#then attribute it to the desired column
df['module'] = df['module'].astype(custom_order)


df['Result'] = ((~df.sort_values('module', ascending=True).duplicated('vin'))
                    .replace({True: 'YES', False: 'NO'}))

Result:

   vin  module Result
0  aaa  NORMAL    YES
1  bbb   Delhi    YES
2  bbb    Agra     NO
3  bbb   HELLO     NO
4  ccc   Paris     NO
5  ccc    Agra    YES
6  ddd    Agra    YES
7  eee   Paris    YES
8  eee   HELLO     NO
9  fff     ABS    YES
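
For the labels used in the question, the same custom-order approach might look like the sketch below. The category order is an assumption: the three priority labels come first, and the remaining labels ('ABS', 'HELLO', 'NORMAL') are placed after them in an arbitrary order, since the question does not rank them. The helper column name rank is hypothetical.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({
    'vin': ['aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'fff'],
    'module': ['NORMAL', '1ST_PRIORITY', '2ND_PRIORITY', 'HELLO', '3RD_PRIORITY',
               '2ND_PRIORITY', '2ND_PRIORITY', '3RD_PRIORITY', 'HELLO', 'ABS'],
})

# Assumed order, highest priority first. Any label missing from this
# list would become NaN and sort last.
priority = CategoricalDtype(
    ['1ST_PRIORITY', '2ND_PRIORITY', '3RD_PRIORITY', 'ABS', 'HELLO', 'NORMAL'],
    ordered=True)

# Sort on a temporary categorical column so 'module' keeps its string dtype.
rank = df['module'].astype(priority)
is_first = ~df.assign(rank=rank).sort_values('rank').duplicated('vin')

df['Result'] = is_first.map({True: 'YES', False: 'NO'})
print(df)
```

Sorting on a temporary column rather than converting 'module' in place is a design choice: it leaves the original column untouched for any later string operations.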