Comparing two string columns of a dataframe (with values such as "PO", "GO", etc.) and creating a third column with the values "High", "Low" and "No Change"

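I have two columns in a dataframe. The first is named previous_code and the second is named New_code. These columns hold values such as "PO", "GO", "RO", etc. The codes have a priority; for example, "PO" has a higher priority than "GO". I want to compare the values in the two columns and output "High" or "Low", or "No Change" when both columns hold the same code. Below is a sample of the dataframe: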

CustID | previous_code | New_code
345    | PO            | GO
367    | RO            | PO
385    | PO            | RO
455    | GO            | GO

Expected output dataframe:

CustID | previous_code | New_code | Change
345    | PO            | GO       | Low
367    | RO            | PO       | High
385    | PO            | RO       | Low
455    | GO            | GO       | No Change

It would be very helpful if someone could write demo code for this in PySpark or pandas.

Thanks in advance.

If I have understood the priority order correctly, this should work:

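import numpy as np
import pandas as pd

data = {'CustID':[345,367,385,455],'previous_code':['PO','RO','PO','GO'],'New_code':['GO','PO','RO','GO']}
df = pd.DataFrame(data)

# Rank each code: a smaller number means a higher priority.
# PO ranking above GO and RO follows from the question; GO above RO is an assumption.
mapping = {'PO':1,'GO':2,'RO':3}
df['previous_aux'] = df['previous_code'].map(mapping)
df['new_aux'] = df['New_code'].map(mapping)

# Same rank -> 'No change'; new code outranks the previous one -> 'High'; otherwise 'Low'.
df['output'] = np.where(df['previous_aux'] == df['new_aux'],'No change',np.where(df['previous_aux'] > df['new_aux'],'High','Low'))

# Keep only the original columns plus the result.
df = df[['CustID','previous_code','New_code','output']]
print(df)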

Output:

   CustID previous_code New_code     output
0     345            PO       GO        Low
1     367            RO       PO       High
2     385            PO       RO        Low
3     455            GO       GO  No change
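
One caveat with the nested np.where approach: a code missing from the mapping becomes NaN, every comparison against NaN is False, and the row silently ends up as 'Low'. Below is a minimal sketch of a variant that uses np.select and flags such rows instead. The function name label_change and the "Unknown" label are hypothetical choices, not part of the question. The ranking is the same one as above (including the assumed GO above RO), and the result column is called "Change" with the label "No Change" to match the expected output in the question.

```python
import numpy as np
import pandas as pd

# Assumed ranking: a smaller number means a higher priority (PO > GO > RO).
PRIORITY = {"PO": 1, "GO": 2, "RO": 3}


def label_change(df, prev_col="previous_code", new_col="New_code", priority=PRIORITY):
    """Return a Series labelling each row as High, Low, No Change or Unknown."""
    prev_rank = df[prev_col].map(priority)
    new_rank = df[new_col].map(priority)

    # np.select picks the first condition that is True for each row.
    conditions = [
        prev_rank.isna() | new_rank.isna(),  # a code is missing from the ranking
        new_rank < prev_rank,                # new code has the higher priority
        new_rank > prev_rank,                # new code has the lower priority
    ]
    choices = ["Unknown", "High", "Low"]     # "Unknown" is a hypothetical label
    labels = np.select(conditions, choices, default="No Change")
    return pd.Series(labels, index=df.index, name="Change")


df = pd.DataFrame({
    "CustID": [345, 367, 385, 455],
    "previous_code": ["PO", "RO", "PO", "GO"],
    "New_code": ["GO", "PO", "RO", "GO"],
})
df["Change"] = label_change(df)
print(df)
```

No helper columns are added to the dataframe, so there is nothing to drop afterwards. If the real priority order differs, only the PRIORITY dictionary needs to change.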