Create a DataFrame from an existing DataFrame with multiple conditions
I have a DataFrame like the one shown below.
import pandas as pd

data = {'Participant': ['A', 'B', 'B', 'B', 'B', 'C', 'C', 'D', 'D', 'D'],
        'Total test Result': [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
        'result': ['negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative', 'negative'],
        'time': ['2021-06-14', '2021-06-21', '2021-06-24', '2021-06-28', '2021-07-01', '2021-07-05', '2021-07-08', '2021-06-17', '2021-06-17', '2021-06-20']}
pres_df = pd.DataFrame(data)
pres_df
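Note: the 'time' column is in DateTime format, if that helps.
I want to create a new DataFrame in which the multiple rows for each 'Participant' are merged into one row, with the times and results spread across multiple columns.
The desired final result is shown below.
Any help is greatly appreciated.
Thanks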
Try:
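# One row per participant; results and times are collected into lists
x = pres_df.groupby("Participant", as_index=False).agg(
    {"Total test Result": "first", "result": list, "time": list}
)
# Expand each list of results into columns test1_Result, test2_Result, ...
a = x.pop("result").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_Result" for v in range(1, len(vals) + 1)]
    )
)
# Expand each list of times into columns test1_date, test2_date, ...
b = x.pop("time").apply(
    lambda vals: pd.Series(
        vals, index=[f"test{v}_date" for v in range(1, len(vals) + 1)]
    )
)
# Join the pieces side by side and sort the columns by name
out = pd.concat([x, a, b], axis=1).sort_index(axis=1)
print(out)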
Prints:
   Participant  Total test Result  test1_Result  test1_date  test2_Result  test2_date  test3_Result  test3_date  test4_Result  test4_date
0            A                  1      negative  2021-06-14           NaN         NaN           NaN         NaN           NaN         NaN
1            B                  4      negative  2021-06-21      negative  2021-06-24      negative  2021-06-28      negative  2021-07-01
2            C                  2      negative  2021-07-05      negative  2021-07-08           NaN         NaN           NaN         NaN
3            D                  3      negative  2021-06-17      negative  2021-06-17      negative  2021-06-20           NaN         NaN
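To see why this works, it may help to look at the intermediate steps. The following is only a minimal sketch that reuses the sample data from the question; the names grouped, times_b and wide are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question
data = {
    "Participant": ["A", "B", "B", "B", "B", "C", "C", "D", "D", "D"],
    "Total test Result": [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
    "result": ["negative"] * 10,
    "time": ["2021-06-14", "2021-06-21", "2021-06-24", "2021-06-28", "2021-07-01",
             "2021-07-05", "2021-07-08", "2021-06-17", "2021-06-17", "2021-06-20"],
}
pres_df = pd.DataFrame(data)

# Step 1: one row per participant, each "result"/"time" cell holds a Python list
grouped = pres_df.groupby("Participant", as_index=False).agg(
    {"Total test Result": "first", "result": list, "time": list}
)
times_b = grouped.loc[grouped["Participant"] == "B", "time"].iloc[0]
print(times_b)  # the list of dates recorded for participant B

# Step 2: turning each list into a labelled Series makes apply() return a
# DataFrame; participants with fewer tests get NaN in the missing columns
wide = grouped["time"].apply(
    lambda values: pd.Series(
        values, index=[f"test{i}_date" for i in range(1, len(values) + 1)]
    )
)
print(wide)
```

Because every list is labelled test1_..., test2_... before it is expanded, pandas can align participants with different numbers of tests on the column names.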
You can use pd.pivot_table:
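# Start from the question's pres_df; rename 'time' so the new columns end in '_date'
df = pres_df.rename(columns={'time': 'date'})
# Number each participant's tests: Test1, Test2, ...
df = df.assign(test_res='Test' + df.groupby('Participant').cumcount().add(1).astype(str))
df1 = df.pivot_table(index=['Participant', 'Total test Result'],
                     columns=['test_res'],
                     values=['date', 'result'],
                     aggfunc='first'
                     )
# Flatten the (value, test) column MultiIndex into names like 'Test1_date'
df1.columns = df1.columns.map(lambda x: f"{x[1]}_{x[0]}" if ('Test' in x[1]) else x[0])
df1 = df1[sorted(df1.columns)].reset_index()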
df1:
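As a self-contained sketch of the pivot approach, assuming the question's sample data with the dates kept as plain strings, the following builds df1 and prints its column names and contents. The list comprehension that flattens the column names is a small illustrative variation, not the answer's exact code.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings, as given)
data = {
    "Participant": ["A", "B", "B", "B", "B", "C", "C", "D", "D", "D"],
    "Total test Result": [1, 4, 4, 4, 4, 2, 2, 3, 3, 3],
    "result": ["negative"] * 10,
    "time": ["2021-06-14", "2021-06-21", "2021-06-24", "2021-06-28", "2021-07-01",
             "2021-07-05", "2021-07-08", "2021-06-17", "2021-06-17", "2021-06-20"],
}
pres_df = pd.DataFrame(data)

df = pres_df.rename(columns={"time": "date"})
# cumcount() numbers the rows within each participant starting at 0
df["test_res"] = "Test" + df.groupby("Participant").cumcount().add(1).astype(str)

df1 = df.pivot_table(
    index=["Participant", "Total test Result"],
    columns="test_res",
    values=["date", "result"],
    aggfunc="first",  # each (participant, test) cell holds exactly one value
)
# Columns are (value, test) tuples; flatten them to names like "Test1_date"
df1.columns = [f"{test}_{field}" for field, test in df1.columns]
df1 = df1[sorted(df1.columns)].reset_index()

print(df1.columns.tolist())
print(df1)
```

The explicit test_res label is what lets the pivot place each participant's first, second, third and fourth test in the same columns, whatever the actual dates are.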