折叠 pandas 数据框中的多行,然后将数据从长数据转换为宽数据

Collapse multiple rows in pandas dataframe and convert data from long to wide afterwards

我有一个包含重复人名的数据框

输入:

{'name': {0: 'John Smith', 1: 'John Smith', 2: 'John Smith', 3: 'John Doo', 4: 'John Doo', 5: 'John Doo'}, 'journal': {0: 'Journal1', 1: 'Journal2', 2: 'Journal2', 3: 'Journal1', 4: 'Journal2', 5: 'Journal2'}, 'is_editor_in_chief_2019': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2020': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2021': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editor_in_chief_2022': {0: 1.0, 1: nan, 2: nan, 3: 1.0, 4: nan, 5: nan}, 'is_editorial_board_member_2019': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editorial_board_member_2020': {0: nan, 1: nan, 2: nan, 3: nan, 4: nan, 5: nan}, 'is_editorial_board_member_2021': {0: nan, 1: nan, 2: 1.0, 3: nan, 4: nan, 5: 1.0}, 'is_editorial_board_member_2022': {0: nan, 1: 1.0, 2: nan, 3: nan, 4: 1.0, 5: nan}}

我想折叠这些行并将数据从长数据转换为宽数据。

预期输出:

{'name': {0: 'John Smith', 1: 'John Doo'}, 'Journal1_is_editor_in_chief_2019': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2020': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2021': {0: nan, 1: nan}, 'Journal1_is_editor_in_chief_2022': {0: 1, 1: 1}, 'Journal1_is_editorial_board_member_2019': {0: nan, 1: nan}, 'Journal1_is_editorial_board_member_2020': {0: nan, 1: nan}, 'Journal1_is_editorial_board_member_2021': {0: 1, 1: 1}, 'Journal1_is_editorial_board_member_2022': {0: 1, 1: 1}, 'Journal2_is_editor_in_chief_2019': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2020': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2021': {0: nan, 1: nan}, 'Journal2_is_editor_in_chief_2022': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2019': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2020': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2021': {0: nan, 1: nan}, 'Journal2_is_editorial_board_member_2022': {0: nan, 1: nan}}

我找到的最接近的解决方案是 :

df_out = df.set_index(['name', df.groupby(['name']).cumcount()+1]).unstack().sort_index(level=1, axis=1)
df_out.columns = df_out.columns.map('{0[0]}_{0[1]}'.format)
df_out.reset_index()

但它没有给我理想的输出。

您可以使用 pivot_table:

df_out = df.pivot_table(index='name', columns='journal', dropna=False)
df_out.columns = df_out.columns.map('{0[1]}_{0[0]}'.format)
df_out.reset_index()

输出:

         name  Journal1_is_editor_in_chief_2019  Journal2_is_editor_in_chief_2019  Journal1_is_editor_in_chief_2020  Journal2_is_editor_in_chief_2020  Journal1_is_editor_in_chief_2021  Journal2_is_editor_in_chief_2021  Journal1_is_editor_in_chief_2022  Journal2_is_editor_in_chief_2022  Journal1_is_editorial_board_member_2019  Journal2_is_editorial_board_member_2019  Journal1_is_editorial_board_member_2020  Journal2_is_editorial_board_member_2020  Journal1_is_editorial_board_member_2021  Journal2_is_editorial_board_member_2021  Journal1_is_editorial_board_member_2022  Journal2_is_editorial_board_member_2022
0    John Doo                               NaN                               NaN                               NaN                               NaN                               NaN                               NaN                               1.0                               NaN                                      NaN                                      NaN                                      NaN                                      NaN                                      NaN                                      1.0                                      NaN                                      1.0
1  John Smith                               NaN                               NaN                               NaN                               NaN                               NaN                               NaN                               1.0                               NaN                                      NaN                                      NaN                                      NaN                                      NaN                                      NaN                                      1.0                                      NaN                                      1.0