
pandas convert columns with same name but different suffixes to indices and index to columns




| Year | col1_a | col2_a | col3_a | col1_b | col2_b | col3_b |
| 2017 |    1.4 |    555 |      5 |    123 |   55.5 |     80 |
| 2018 |    1.5 |    444 |      6 |    456 |   56.5 |     90 |
| 2019 |    0.6 |    333 |      8 |    789 |   57.5 |    100 |


|      |  a   |  a   |  a   |  b   |  b   |  b   |
|      | 2017 | 2018 | 2019 | 2017 | 2018 | 2019 |
| col1 |  1.4 |  1.5 |  0.6 |  123 |  456 |  789 |
| col2 |  555 |  444 |  333 | 55.5 | 56.5 | 57.5 |
| col3 |    5 |    6 |    8 |   80 |   90 |  100 |

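I can use `pivot_table`, but the problem is that I can't get the unique column names (`col1`, `col2`, `col3`) at the index level. Any idea what approach I can use to reshape this?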

We can `set_index` to set aside any columns that do not follow the `prefix_suffix` pattern (here, `Year`). We can then use `str.rsplit` with `expand=True` and `n=1` to create MultiIndex columns; `n=1` ensures we split only once, on the rightmost underscore. Then `stack` moves the new column level (the suffix) into the row index, and `transpose` gets the correct general shape:

```python
df = df.set_index('Year')
df.columns = df.columns.str.rsplit('_', expand=True, n=1)
df = df.stack(level=1).transpose()
```

```
Year   2017          2018          2019
          a      b      a      b      a      b
col1    1.4  123.0    1.5  456.0    0.6  789.0
col2  555.0   55.5  444.0   56.5  333.0   57.5
col3    5.0   80.0    6.0   90.0    8.0  100.0
```
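To illustrate why `n=1` and the earlier `set_index('Year')` matter, here is a small sketch on hypothetical column names (`col_1_a`, `col_1_b` are made up for the demo and do not appear in the question): `rsplit` with `n=1` keeps an underscore inside the base name intact, while a plain `split` breaks on every underscore, and a name with no underscore at all (like `Year`) ends up with a missing second level.

```python
import pandas as pd

# Hypothetical names: the base name 'col_1' itself contains an underscore
names = pd.Index(['col_1_a', 'col_1_b'])

# rsplit with n=1 splits once, on the rightmost underscore -> 2 levels
right = names.str.rsplit('_', n=1, expand=True)
print(right.tolist())  # [('col_1', 'a'), ('col_1', 'b')]

# A plain split breaks on every underscore -> 3 levels
full = names.str.split('_', expand=True)
print(full.nlevels)  # 3

# 'Year' has no underscore, so without set_index it gets a missing suffix level
with_year = pd.Index(['Year', 'col1_a']).str.rsplit('_', n=1, expand=True)
print(pd.isna(with_year[0][1]))  # True
```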

We can use `swaplevel` and `sort_index` to get the headers in the expected order, and `rename_axis` to clean up further by removing the unneeded column-level names:

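```python
# Start again from the original df
df = df.set_index('Year')
df.columns = df.columns.str.rsplit('_', expand=True, n=1)
df = (
    df.stack(level=1)
        .transpose()
        .sort_index(axis=1, level=1)        # group by suffix (a, b), then by Year
        .swaplevel(axis=1)                  # put the suffix level on top
        .rename_axis(columns=[None, None])  # drop the leftover 'Year' label
)
```

```
          a                    b
       2017   2018   2019   2017   2018   2019
col1    1.4    1.5    0.6  123.0  456.0  789.0
col2  555.0  444.0  333.0   55.5   56.5   57.5
col3    5.0    6.0    8.0   80.0   90.0  100.0
```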


Setup:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})
```
The solution needs to convert to long format first and then reshape back to wide format. One option is `pivot_longer` from pyjanitor, followed by `pivot`:

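```python
# pip install pyjanitor
import pandas as pd
import janitor  # registers pivot_longer as a DataFrame method

(
    df
    # '.value' keeps the suffixes (a, b) as column names; 'col' holds col1..col3
    .pivot_longer(index='Year', names_to=('col', '.value'), names_sep='_')
    .pivot(index='col', columns='Year')
    # drop the 'col' index name and the column-level names
    .rename_axis(index=None, columns=[None, None])
)
```

```
          a                    b
       2017   2018   2019   2017   2018   2019
col1    1.4    1.5    0.6  123.0  456.0  789.0
col2  555.0  444.0  333.0   55.5   56.5   57.5
col3    5.0    6.0    8.0   80.0   90.0  100.0
```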
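If pyjanitor isn't available, the same long-then-wide idea can be sketched with plain pandas: `melt` to long format, split the names with `str.rsplit`, then `pivot`. This is a substitute technique, not what `pivot_longer` does internally; the helper column names `name`, `col` and `suffix` are arbitrary choices, and passing a list to `pivot(columns=...)` assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100],
})

# Long format: one row per (Year, original column name, value)
long = df.melt(id_vars='Year', var_name='name', value_name='value')

# Split 'col1_a' -> 'col1' and 'a' on the rightmost underscore only
parts = long['name'].str.rsplit('_', n=1, expand=True)
long['col'] = parts[0]
long['suffix'] = parts[1]

# Wide format: rows are col1..col3, columns are (suffix, Year) pairs
out = (
    long.pivot(index='col', columns=['suffix', 'Year'], values='value')
        .sort_index(axis=1)                             # a before b, years ascending
        .rename_axis(index=None, columns=[None, None])  # drop helper names
)
print(out)
```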