Pivot specific rows to columns based on certain conditions in pandas
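This is the DataFrame I'm working with, where multiple customers can be associated with a given case ID in different months (case_ID, cust_val, and date are in this table).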
case_ID | cust_val | date | primary | action | change   |
1       | xx       | 3/2  | 1       |        | increase |
1       | xx       | 3/2  |         | 1      | decrease |
1       | xx       | 3/1  | 1       |        | decrease |
1       | xx       | 3/1  |         | 1      | decrease |
1       | yy       | 3/2  | 1       |        | decrease |
1       | yy       | 3/2  |         | 1      | increase |
2       | yy       | 3/2  |         | 1      | increase |
2       | yy       | 3/2  | 1       |        | increase |
I'd like the output table to look like this, where for each case_ID, cust_val, and date, the changes associated with primary and action are on a single row:
case_ID | cust_val | date | primary_change | action_change |
1       | xx       | 3/2  | increase       | decrease      |
1       | xx       | 3/1  | decrease       | decrease      |
1       | yy       | 3/2  | decrease       | increase      |
2       | yy       | 3/2  | increase       | increase      |
I tried the following, but it is clearly wrong and I'm not sure how to fix it:
df.pivot(index=['case_ID','cust_val','date'], columns=['primary', 'action'], values='change').reset_index()
Any help is appreciated. Thanks in advance.
You can filter the DataFrame into two subsets and merge them:
a = df[df.primary == "1"] # <-- change "1" to 1 if the values are integers
b = df[df.action == "1"]
x = (
    pd.merge(a, b, on=["case_ID", "cust_val", "date"])
    .rename(columns={"change_x": "primary_change", "change_y": "action_change"})
    .drop(columns=["primary_x", "action_x", "primary_y", "action_y"])
)
print(x)
Prints:
   case_ID cust_val date primary_change action_change
0        1       xx  3/2       increase      decrease
1        1       xx  3/1       decrease      decrease
2        1       yy  3/2       decrease      increase
3        2       yy  3/2       increase      increase
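For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same filter-and-merge idea, followed by one possible pivot-based variant. It assumes the flags in primary and action are the integer 1 and that the blank cells are NaN; the names keys, primary, action, labelled, and the helper column kind are illustrative and not part of the original post.

```python
import numpy as np
import pandas as pd

# Sample data from the question. Assumption: blank cells are NaN and
# the flags are integers (1), not the string "1".
df = pd.DataFrame(
    {
        "case_ID": [1, 1, 1, 1, 1, 1, 2, 2],
        "cust_val": ["xx", "xx", "xx", "xx", "yy", "yy", "yy", "yy"],
        "date": ["3/2", "3/2", "3/1", "3/1", "3/2", "3/2", "3/2", "3/2"],
        "primary": [1, np.nan, 1, np.nan, 1, np.nan, np.nan, 1],
        "action": [np.nan, 1, np.nan, 1, np.nan, 1, 1, np.nan],
        "change": ["increase", "decrease", "decrease", "decrease",
                   "decrease", "increase", "increase", "increase"],
    }
)

keys = ["case_ID", "cust_val", "date"]

# Select only the key columns plus "change" from each subset and rename
# up front, so the merge produces no suffixed columns to drop afterwards.
primary = df.loc[df["primary"] == 1, keys + ["change"]].rename(
    columns={"change": "primary_change"}
)
action = df.loc[df["action"] == 1, keys + ["change"]].rename(
    columns={"change": "action_change"}
)
result = primary.merge(action, on=keys)
print(result)

# Possible pivot variant: collapse the two flag columns into one label
# column first, then pivot on that single column. Note that pivot sorts
# by the index, so the row order can differ from the merge result.
labelled = df.assign(
    kind=np.where(df["primary"] == 1, "primary_change", "action_change")
)
pivoted = labelled.pivot(index=keys, columns="kind", values="change").reset_index()
print(pivoted)
```

The pivot variant only works if each (case_ID, cust_val, date) combination has at most one primary row and one action row; with duplicates, pivot raises a ValueError, whereas the merge would return every pairing.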