I am stuck writing the Python code for the problem below
I have the following DataFrame.
import pandas as pd
df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS']})
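I want to check whether each value in the vin column is unique. If it is, the Result column should be 'YES'. If a vin is not unique, the 'module' column decides: only the row whose module has the highest priority gets 'YES', and the other rows for that vin get 'NO'.
I want the output to look like the DataFrame below.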
df = pd.DataFrame({'vin':['aaa','bbb','bbb','bbb','ccc','ccc','ddd','eee','eee','fff'],'module':['NORMAL','1ST_PRIORITY','2ND_PRIORITY','HELLO','3RD_PRIORITY','2ND_PRIORITY','2ND_PRIORITY','3RD_PRIORITY','HELLO','ABS'],
'Result':['YES','YES','NO','NO','NO','YES','YES','YES','NO','YES']})
I have tried the code below. It gives the correct result, but it involves too many steps.
df['count'] = df.groupby('vin').vin.transform('count')
def Check1(df):
    if (df["count"] == 1):
        return 1
    elif ((df["count"] != 1) & (df["module"] == '1ST_PRIORITY')):
        return 1
    elif ((df["count"] != 1) & (df["module"] == '2ND_PRIORITY')):
        return 2
    elif ((df["count"] != 1) & (df["module"] == '3RD_PRIORITY')):
        return 3
    else:
        return 4
df['Sort'] = df.apply(Check1, axis=1)
df = df.sort_values(by=['vin', 'Sort'])
df.drop_duplicates(subset=['vin'], keep='first',inplace = True)
df
IIUC, you can use duplicated after sort_values:
df['Result'] = ((~df.sort_values('module').duplicated('vin'))
.replace({True: 'YES', False: 'NO'}))
print(df)
# Output
   vin        module Result
0  aaa        NORMAL    YES
1  bbb  1ST_PRIORITY    YES
2  bbb  2ND_PRIORITY     NO
3  bbb         HELLO     NO
4  ccc  3RD_PRIORITY     NO
5  ccc  2ND_PRIORITY    YES
6  ddd  2ND_PRIORITY    YES
7  eee  3RD_PRIORITY    YES
8  eee         HELLO     NO
9  fff           ABS    YES
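Note that the one-liner above relies on plain string sorting: '1ST_PRIORITY', '2ND_PRIORITY' and '3RD_PRIORITY' happen to sort before the other module names because digits come before uppercase letters. If you would rather not depend on that, here is a minimal sketch with an explicit ranking. The `priority` dict and the rule that unlisted modules rank last are my assumptions, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'vin': ['aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'fff'],
    'module': ['NORMAL', '1ST_PRIORITY', '2ND_PRIORITY', 'HELLO', '3RD_PRIORITY',
               '2ND_PRIORITY', '2ND_PRIORITY', '3RD_PRIORITY', 'HELLO', 'ABS'],
})

# Assumed ranking: lower number = higher priority.
priority = {'1ST_PRIORITY': 1, '2ND_PRIORITY': 2, '3RD_PRIORITY': 3}

# Modules that are not in the dict all share the lowest rank.
rank = df['module'].map(priority).fillna(len(priority) + 1)

# Index label of the best-ranked row within each vin group
# (idxmin keeps the first row when ranks tie).
best = rank.groupby(df['vin']).idxmin()

df['Result'] = 'NO'
df.loc[best, 'Result'] = 'YES'
print(df)
```

A vin that appears only once is trivially its own best row, so it gets 'YES' without a separate count step.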
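Here is the trick: you need a custom order. List the module values from highest to lowest priority. In the example below, Delhi, Agra and Paris appear in the rows where the question's data has 1ST_PRIORITY, 2ND_PRIORITY and 3RD_PRIORITY, so put your own values in the list. Any value missing from the list becomes NaN when the column is cast.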
from pandas.api.types import CategoricalDtype
# create your custom order
custom_order = CategoricalDtype(
    ['Delhi','Agra','Paris','ABS','HELLO','NORMAL'],
    ordered=True)
# then assign it to the desired column
df['module'] = df['module'].astype(custom_order)
df['Result'] = ((~df.sort_values('module', ascending=True).duplicated('vin'))
                .replace({True: 'YES', False: 'NO'}))
Result:
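index | vin | module | Result |
---|---|---|---|
0 | aaa | NORMAL | YES |
1 | bbb | Delhi | YES |
2 | bbb | Agra | NO |
3 | bbb | HELLO | NO |
4 | ccc | Paris | NO |
5 | ccc | Agra | YES |
6 | ddd | Agra | YES |
7 | eee | Paris | YES |
8 | eee | HELLO | NO |
9 | fff | ABS | YES |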
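For the DataFrame exactly as posted in the question, the same custom-order idea could look roughly like this. It is only a sketch: the category list is my assumption about the intended priority (the three *_PRIORITY values first, then the rest alphabetically), and I use `map` rather than `replace` for the final conversion, which is a swap of my own.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({
    'vin': ['aaa', 'bbb', 'bbb', 'bbb', 'ccc', 'ccc', 'ddd', 'eee', 'eee', 'fff'],
    'module': ['NORMAL', '1ST_PRIORITY', '2ND_PRIORITY', 'HELLO', '3RD_PRIORITY',
               '2ND_PRIORITY', '2ND_PRIORITY', '3RD_PRIORITY', 'HELLO', 'ABS'],
})

# Assumed order, highest priority first.
custom_order = CategoricalDtype(
    ['1ST_PRIORITY', '2ND_PRIORITY', '3RD_PRIORITY', 'ABS', 'HELLO', 'NORMAL'],
    ordered=True)

# Sort a temporary copy by the ordered categorical, so df['module'] itself
# keeps its original dtype.
by_priority = (df.assign(module=df['module'].astype(custom_order))
                 .sort_values('module', kind='stable'))

# After sorting, the first row seen for each vin is its highest-priority row.
is_first = ~by_priority.duplicated('vin')

# Assignment aligns on the index, so the original row order is preserved.
df['Result'] = is_first.map({True: 'YES', False: 'NO'})
print(df)
```

Sorting a copy instead of overwriting df['module'] is a design choice: it avoids silently turning unlisted module values into NaN in the original column.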