pandas: convert columns with the same name but different suffixes to indices, and index to columns

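I have a problem where I am trying to reshape my data, but I want the column names at the row level without the suffix attached.

For example, my original df looks like this: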

+------+--------+--------+--------+--------+--------+--------+
| Year | col1_a | col2_a | col3_a | col1_b | col2_b | col3_b |
+------+--------+--------+--------+--------+--------+--------+
| 2017 |    1.4 |    555 |      5 |    123 |   55.5 |     80 |
| 2018 |    1.5 |    444 |      6 |    456 |   56.5 |     90 |
| 2019 |    0.6 |    333 |      8 |    789 |   57.5 |    100 |
+------+--------+--------+--------+--------+--------+--------+

I am trying to reshape it to:

+------+------+------+------+------+------+------+
|      |  a   |  a   |  a   |  b   |  b   |  b   |
+------+------+------+------+------+------+------+
|      | 2017 | 2018 | 2019 | 2017 | 2018 | 2019 |
| col1 |  1.4 |  1.5 |  0.6 |  123 |  456 |  789 |
| col2 |  555 |  444 |  333 | 55.5 | 56.5 | 57.5 |
| col3 |    5 |    6 |    8 |   80 |   90 |  100 |
+------+------+------+------+------+------+------+

I can use pivot_table, but the problem is that I cannot get unique column names at the index level. Any idea which method could be used to reshape it?

We can use set_index to set aside any columns that do not follow the prefix_suffix pattern. We can then use str.rsplit with expand=True and n=1 to create MultiIndex columns; rsplit with n=1 ensures we split only once, on the rightmost underscore. Then stack converts the new column index level to a row index level, and transpose gives the right general shape:

df = df.set_index('Year')
df.columns = df.columns.str.rsplit('_', expand=True, n=1)
df = df.stack(level=1).transpose()
Year   2017          2018          2019       
          a      b      a      b      a      b
col1    1.4  123.0    1.5  456.0    0.6  789.0
col2  555.0   55.5  444.0   56.5  333.0   57.5
col3    5.0   80.0    6.0   90.0    8.0  100.0
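
To see why rsplit with n=1 matters, here is a minimal sketch on a standalone Index. The column name my_col_b is a made-up example (it is not in the original data), chosen because it contains more than one underscore:

```python
import pandas as pd

# 'my_col_b' is a hypothetical name with two underscores, for illustration only.
cols = pd.Index(['col1_a', 'col2_a', 'my_col_b'])

# Split once, starting from the right: the suffix is always the last piece.
right = cols.str.rsplit('_', expand=True, n=1)

# Split once, starting from the left: the prefix is cut at the first underscore.
left = cols.str.split('_', expand=True, n=1)

print(right.tolist())  # [('col1', 'a'), ('col2', 'a'), ('my_col', 'b')]
print(left.tolist())   # [('col1', 'a'), ('col2', 'a'), ('my', 'col_b')]
```

With expand=True the result is a MultiIndex, which is what allows stack to move the suffix level into the rows afterwards.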

We can use swaplevel and sort_index to get the headers to match the expected order, and rename_axis to clean up further by removing the unwanted axis names:

df = df.set_index('Year')
df.columns = df.columns.str.rsplit('_', expand=True, n=1)
df = (
    df.stack(level=1)
        .transpose()
        .sort_index(axis=1, level=1)
        .swaplevel(axis=1)
        .rename_axis(columns=[None, None])
)
          a                    b              
       2017   2018   2019   2017   2018   2019
col1    1.4    1.5    0.6  123.0  456.0  789.0
col2  555.0  444.0  333.0   55.5   56.5   57.5
col3    5.0    6.0    8.0   80.0   90.0  100.0
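
The effect of each header step can be followed on a small frame. This is only a sketch: the one-row frame demo and the names step1 to step3 are illustrative and not part of the answer above.

```python
import pandas as pd

# A one-row frame whose columns look like the transposed result: (Year, suffix).
cols = pd.MultiIndex.from_product([[2017, 2018], ['a', 'b']], names=['Year', None])
demo = pd.DataFrame([[1.4, 123.0, 1.5, 456.0]], index=['col1'], columns=cols)

# Sort by the suffix level first (remaining levels are sorted after it).
step1 = demo.sort_index(axis=1, level=1)
print(step1.columns.tolist())
# [(2017, 'a'), (2018, 'a'), (2017, 'b'), (2018, 'b')]

# Swap the two column levels so the suffix ends up on top.
step2 = step1.swaplevel(axis=1)
print(step2.columns.tolist())
# [('a', 2017), ('a', 2018), ('b', 2017), ('b', 2018)]

# Drop the leftover 'Year' level name from the header.
step3 = step2.rename_axis(columns=[None, None])
print(list(step3.columns.names))  # [None, None]
```

Sorting before swapping keeps the values attached to their labels; only the order and nesting of the header change.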

Setup used:

import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})

The solution requires converting to long form first, then reshaping to wide form. One option is a combination of pivot_longer from pyjanitor and pivot:
# pip install pyjanitor
import pandas as pd
import janitor

(
    df
    .pivot_longer(index='Year', names_to=('col', '.value'), names_sep='_')
    .pivot(index='col', columns='Year')
    .rename_axis(columns=[None, None])
)

          a                    b
       2017   2018   2019   2017   2018   2019
col
col1    1.4    1.5    0.6  123.0  456.0  789.0
col2  555.0  444.0  333.0   55.5   56.5   57.5
col3    5.0    6.0    8.0   80.0   90.0  100.0
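
If installing pyjanitor is not an option, the same long-then-wide idea can be sketched with plain pandas using melt and pivot. This is an alternative under assumptions rather than the method from the answer above: the intermediate names long_df, name and suffix are made up for illustration, and passing a list to pivot's columns argument assumes a reasonably recent pandas (1.1 or later).

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})

# Long form: one row per (Year, original column name) pair.
long_df = df.melt(id_vars='Year', var_name='name')

# Split each name once on the rightmost underscore into prefix and suffix.
long_df[['col', 'suffix']] = long_df['name'].str.rsplit('_', n=1, expand=True)

# Wide form: prefixes become rows, (suffix, Year) pairs become columns.
wide = (
    long_df.pivot(index='col', columns=['suffix', 'Year'], values='value')
           .sort_index(axis=1)
           .rename_axis(columns=[None, None])
)
print(wide)
```

The values come out as floats because melt stacks integer and float columns into a single value column.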