Comparing two string columns of a dataframe (values such as "PO", "GO", etc.) and creating a third column with the values "High", "Low" and "No Change"
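I have two columns in a dataframe. The first is named previous_code and the second New_code. These columns hold values such as "PO", "GO", "RO", etc. The codes have a priority order; for example, "PO" has a higher priority than "GO". I want to compare the values of the two columns and output "High" or "Low", or "No Change" when both columns hold the same code. Here is a sample of the dataframe: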
CustID | previous_code | New_code
345    | PO            | GO
367    | RO            | PO
385    | PO            | RO
455    | GO            | GO
Expected output dataframe:
CustID | previous_code | New_code | Change
345    | PO            | GO       | Low
367    | RO            | PO       | High
385    | PO            | RO       | Low
455    | GO            | GO       | No Change
It would be helpful if someone could write demo code for this in PySpark or pandas.
Thanks in advance.
If I understand the ordering correctly (PO highest, then GO, then RO), this should work:
import numpy as np
import pandas as pd
data = {'CustID': [345, 367, 385, 455], 'previous_code': ['PO', 'RO', 'PO', 'GO'], 'New_code': ['GO', 'PO', 'RO', 'GO']}
df = pd.DataFrame(data)
# Rank each code; a smaller number means a higher priority (PO > GO > RO).
mapping = {'PO': 1, 'GO': 2, 'RO': 3}
df['previous_aux'] = df['previous_code'].map(mapping)
df['new_aux'] = df['New_code'].map(mapping)
# Equal ranks -> 'No change'; new code outranks the previous one -> 'High'; otherwise 'Low'.
df['output'] = np.where(df['previous_aux'] == df['new_aux'], 'No change',
                        np.where(df['previous_aux'] > df['new_aux'], 'High', 'Low'))
# Keep only the original columns plus the result.
df = df[['CustID', 'previous_code', 'New_code', 'output']]
print(df)
Output:
CustID previous_code New_code output
0 345 PO GO Low
1 367 RO PO High
2 385 PO RO Low
3 455 GO GO No change
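Since the question says the columns hold "PO", "GO", "RO", etc., codes outside the mapping may show up in real data. The variation below is only a sketch of the same rank-and-compare idea using `np.select`. The function name `label_change`, the `PRIORITY` dictionary, and the "Unknown" label for unmapped codes are assumptions made for illustration, not part of the original answer, and the PO > GO > RO ordering is still the one inferred from the expected output.

```python
import numpy as np
import pandas as pd

# Assumed priority ranks: 1 is the highest priority (PO > GO > RO).
PRIORITY = {"PO": 1, "GO": 2, "RO": 3}


def label_change(df, old_col="previous_code", new_col="New_code", ranks=PRIORITY):
    """Return a copy of df with a 'Change' column comparing two code columns."""
    old_rank = df[old_col].map(ranks)  # NaN for codes missing from ranks
    new_rank = df[new_col].map(ranks)
    # Conditions are checked in order; the first match wins.
    conditions = [
        old_rank.isna() | new_rank.isna(),  # at least one code is not in the mapping
        old_rank == new_rank,               # same priority
        new_rank < old_rank,                # new code has a higher priority
    ]
    choices = ["Unknown", "No Change", "High"]
    # Anything left over means the new code has a lower priority.
    return df.assign(Change=np.select(conditions, choices, default="Low"))


df = pd.DataFrame({
    "CustID": [345, 367, 385, 455],
    "previous_code": ["PO", "RO", "PO", "GO"],
    "New_code": ["GO", "PO", "RO", "GO"],
})
print(label_change(df))
```

This also avoids the temporary `previous_aux` and `new_aux` columns, and it names the result column `Change` as in the expected output of the question.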