Create a DataFrame from an existing DataFrame with multiple conditions

I have a DataFrame like the one shown below.

import pandas as pd

data = {
    'Participant': ['A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
    'Total test Result': [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
    'result': ['negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative'],
    'time': ['2021-06-14', '2021-06-21', '2021-06-24', '2021-06-28', '2021-07-01', '2021-07-05', '2021-07-08', '2021-06-17', '2021-06-17', '2021-06-20'],
}
pres_df = pd.DataFrame(data)
pres_df

Note: the 'time' column is in datetime format, in case that helps.
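In the sample dictionary above the dates are written as plain strings, so anyone reproducing the example may want to convert them first. A minimal sketch, assuming ISO-style 'YYYY-MM-DD' strings; the frame name sample is made up for illustration and holds only a few rows from the question:

```python
import pandas as pd

# Small stand-in for pres_df (hypothetical name), using rows from the question
sample = pd.DataFrame({
    "Participant": ["B", "A", "B"],
    "time": ["2021-06-24", "2021-06-14", "2021-06-21"],
})

# Parse the strings into real datetimes; format= is given only to be explicit
sample["time"] = pd.to_datetime(sample["time"], format="%Y-%m-%d")

# Sort so that any later per-participant numbering follows chronological order
sample = sample.sort_values(["Participant", "time"]).reset_index(drop=True)
print(sample.dtypes)
```

Sorting by participant and date beforehand is optional, but it makes sure that "test 1" is the earliest test for each participant.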

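I want to create a new DataFrame in which the multiple rows for each 'Participant' are merged into a single row, with a separate column for each test's time and result. The desired final result is shown below.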

Any help is greatly appreciated. Thanks.

Try:

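# Collapse each participant to one row, collecting results and dates into lists
x = pres_df.groupby("Participant", as_index=False).agg(
    {"Total test Result": "first", "result": list, "time": list}
)

# Expand each list into numbered columns: test1_Result, test2_Result, ...
a = x.pop("result").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_Result" for v in range(1, len(vals) + 1)]
    )
)
# Same for the dates: test1_date, test2_date, ...
b = x.pop("time").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_date" for v in range(1, len(vals) + 1)]
    )
)

# Join the pieces side by side and order the columns alphabetically
out = pd.concat([x, a, b], axis=1).sort_index(axis=1)
print(out)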

Prints:

  Participant  Total test Result test1_Result  test1_date test2_Result  test2_date test3_Result  test3_date test4_Result  test4_date
0           A                  1     negative  2021-06-14          NaN         NaN          NaN         NaN          NaN         NaN
1           B                  4     negative  2021-06-21     negative  2021-06-24     negative  2021-06-28     negative  2021-07-01
2           C                  2     negative  2021-07-05     negative  2021-07-08          NaN         NaN          NaN         NaN
3           D                  3     negative  2021-06-17     negative  2021-06-17     negative  2021-06-20          NaN         NaN
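The groupby above keeps only the first 'Total test Result' per participant and relies on there being one row per test. If that is in doubt, a quick consistency check can be run on the long frame before reshaping. This is a rough sketch rather than part of the answer; the names declared, actual and mismatches are made up for illustration:

```python
import pandas as pd

data = {
    "Participant": ["A", "B", "B", "B", "B", "C", "C", "D", "D", "D"],
    "Total test Result": [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
    "result": ["negative"] * 10,
    "time": ["2021-06-14", "2021-06-21", "2021-06-24", "2021-06-28", "2021-07-01",
             "2021-07-05", "2021-07-08", "2021-06-17", "2021-06-17", "2021-06-20"],
}
pres_df = pd.DataFrame(data)

# Compare the declared number of tests with the number of rows actually present
per_participant = pres_df.groupby("Participant").agg(
    declared=("Total test Result", "first"),
    actual=("result", "count"),
)

# Participants whose declared total does not match their row count
mismatches = per_participant[per_participant["declared"] != per_participant["actual"]]
print(mismatches)
```

For the sample data the mismatches frame is empty, so every participant has exactly as many rows as 'Total test Result' claims.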

You can use pivot_table:

# Work on a renamed copy of pres_df instead of modifying it in place
df = pres_df.rename(columns={'time': 'date'})
# Number each participant's tests: Test1, Test2, ...
df = df.assign(test_res='Test' + df.groupby('Participant').cumcount().add(1).astype(str))
df1 = df.pivot_table(index=['Participant', 'Total test Result'],
                     columns=['test_res'],
                     values=['date', 'result'],
                     aggfunc='first')
# Flatten the (value, TestN) column MultiIndex into names like Test1_date
df1.columns = df1.columns.map(lambda x: f"{x[1]}_{x[0]}" if ('Test' in x[1]) else x[0])
df1 = df1[sorted(df1.columns)].reset_index()

df1: