pandas: convert columns with the same name but different suffixes to an index, and the index to columns

Question

I'm trying to reshape my data so that the column names, without their suffixes, are used at the row level.

For example:

My original df looks like this:

+------+--------+--------+--------+--------+--------+--------+
| Year | col1_a | col2_a | col3_a | col1_b | col2_b | col3_b |
+------+--------+--------+--------+--------+--------+--------+
| 2017 |    1.4 |    555 |      5 |    123 |   55.5 |     80 |
| 2018 |    1.5 |    444 |      6 |    456 |   56.5 |     90 |
| 2019 |    0.6 |    333 |      8 |    789 |   57.5 |    100 |
+------+--------+--------+--------+--------+--------+--------+

I'm trying to reshape it into:

+------+------+------+------+------+------+------+
|      |  a   |  a   |  a   |  b   |  b   |  b   |
|      | 2017 | 2018 | 2019 | 2017 | 2018 | 2019 |
+------+------+------+------+------+------+------+
| col1 |  1.4 |  1.5 |  0.6 |  123 |  456 |  789 |
| col2 |  555 |  444 |  333 | 55.5 | 56.5 | 57.5 |
| col3 |    5 |    6 |    8 |   80 |   90 |  100 |
+------+------+------+------+------+------+------+

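I can use pivot_table, but the problem is that I can't get unique column names (without the suffixes) at the index level. Any idea which method I could use to reshape it?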

Answer 1

We can use set_index to move aside any columns that do not follow the prefix_suffix pattern (here, Year). We can then use str.rsplit with expand=True and n=1 to create MultiIndex columns; n=1 ensures we split only once, on the rightmost underscore. Then stack converts the new column index level into a row index level, and transpose gives the right overall shape:

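df = df.set_index('Year')  # move Year out of the columns
df.columns = df.columns.str.rsplit('_', expand=True, n=1)  # 'col1_a' -> ('col1', 'a')
df = df.stack(level=1).transpose()  # move the suffix level to the rows, then flip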

Year   2017          2018          2019       
          a      b      a      b      a      b
col1    1.4  123.0    1.5  456.0    0.6  789.0
col2  555.0   55.5  444.0   56.5  333.0   57.5
col3    5.0   80.0    6.0   90.0    8.0  100.0
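To make the role of n=1 concrete, here is a small sketch of what str.rsplit produces on an Index. The column name 'col_4_a' is hypothetical (it is not in the question's data) and is included only to show that a prefix containing its own underscore stays intact when we split once from the right:

```python
import pandas as pd

# Hypothetical column names; 'col_4_a' has an extra underscore in its prefix
cols = pd.Index(['col1_a', 'col2_b', 'col_4_a'])

# On an Index, expand=True returns a MultiIndex; n=1 splits only on the rightmost '_'
mi = cols.str.rsplit('_', n=1, expand=True)
print(mi.tolist())
# [('col1', 'a'), ('col2', 'b'), ('col_4', 'a')]
```

Without n=1, 'col_4_a' would be split into three parts and the resulting MultiIndex would get a third level, which would break the stack step above.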

We can use sort_index and swaplevel to get the headers into the expected order. rename_axis then cleans up the result further by removing the unneeded column-level names:

df = df.set_index('Year')
df.columns = df.columns.str.rsplit('_', expand=True, n=1)
df = (
    df.stack(level=1)
        .transpose()
        .sort_index(axis=1, level=1)
        .swaplevel(axis=1)
        .rename_axis(columns=[None, None])
)

          a                    b              
       2017   2018   2019   2017   2018   2019
col1    1.4    1.5    0.6  123.0  456.0  789.0
col2  555.0  444.0  333.0   55.5   56.5   57.5
col3    5.0    6.0    8.0   80.0   90.0  100.0

Setup used:

import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})
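Because the setup comes after the code, here is the same Answer 1 pipeline assembled into one self-contained script, as a sketch for checking the result end to end. It only reorders the code already shown; on pandas 2.1+ the stack call may emit a FutureWarning about its new implementation, which does not change the result for this data.

```python
import pandas as pd

# Same sample data as the setup above
df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})

df = df.set_index('Year')
# Split 'col1_a' -> ('col1', 'a') to get two-level columns
df.columns = df.columns.str.rsplit('_', expand=True, n=1)

result = (
    df.stack(level=1)                 # rows: (Year, suffix); columns: col1..col3
      .transpose()                    # rows: col1..col3; columns: (Year, suffix)
      .sort_index(axis=1, level=1)    # group columns by suffix: a, a, a, b, b, b
      .swaplevel(axis=1)              # columns: (suffix, Year)
      .rename_axis(columns=[None, None])
)
print(result)
```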

Answer 2

The solution requires converting to long format first and then reshaping back to wide format. One option is to combine pivot_longer from pyjanitor with pivot:

# pip install pyjanitor
import pandas as pd
import janitor  # registers DataFrame.pivot_longer

(
    df
    .pivot_longer(index='Year', names_to=('col', '.value'), names_sep='_')
    .pivot(index='col', columns='Year')
    .rename_axis(columns=[None, None])
)

          a                    b
       2017   2018   2019   2017   2018   2019
col
col1    1.4    1.5    0.6  123.0  456.0  789.0
col2  555.0  444.0  333.0   55.5   56.5   57.5
col3    5.0    6.0    8.0   80.0   90.0  100.0
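If adding pyjanitor is not an option, the same long-then-wide idea can be sketched with plain pandas. This swaps pyjanitor's pivot_longer for pandas' melt plus str.rsplit; the intermediate names long, name, value, suffix and wide are illustrative choices, not part of either answer:

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})

# 1. Long format: one row per (Year, original column name)
long = df.melt(id_vars='Year', var_name='name', value_name='value')

# 2. Split e.g. 'col1_a' into 'col1' and 'a' on the rightmost underscore
parts = long['name'].str.rsplit('_', n=1, expand=True)
long['col'] = parts[0]
long['suffix'] = parts[1]

# 3. Wide format: rows are col1..col3, columns are (suffix, Year)
wide = (
    long.pivot(index='col', columns=['suffix', 'Year'], values='value')
        .sort_index(axis=1)               # a before b, years ascending
        .rename_axis(columns=[None, None])
)
print(wide)
```

Like the pyjanitor version, this keeps 'col' as the row index name.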

