pandas: convert columns with the same name but different suffixes to indices, and the index to columns
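I'm running into an issue where I'm trying to change the shape of my data, but I want the column names at the row level without the suffix attached.
For example,
my original df looks like this: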
+------+--------+--------+--------+--------+--------+--------+
| Year | col1_a | col2_a | col3_a | col1_b | col2_b | col3_b |
+------+--------+--------+--------+--------+--------+--------+
| 2017 | 1.4 | 555 | 5 | 123 | 55.5 | 80 |
| 2018 | 1.5 | 444 | 6 | 456 | 56.5 | 90 |
| 2019 | 0.6 | 333 | 8 | 789 | 57.5 | 100 |
+------+--------+--------+--------+--------+--------+--------+
I'm trying to reshape it into:
+------+------+------+------+------+------+------+
| | a | a | a | b | b | b |
+------+------+------+------+------+------+------+
| | 2017 | 2018 | 2019 | 2017 | 2018 | 2019 |
| col1 | 1.4 | 1.5 | 0.6 | 123 | 456 | 789 |
| col2 | 555 | 444 | 333 | 55.5 | 56.5 | 57.5 |
| col3 | 5 | 6 | 8 | 80 | 90 | 100 |
+------+------+------+------+------+------+------+
I can use pivot_table, but the problem is that I can't get unique column names at the index level. Any idea which method I could use to reshape this?
We can use set_index to set aside any columns that do not follow the prefix_suffix pattern. We can then use str.rsplit with expand=True and n=1 to create MultiIndex columns; rsplit with n=1 ensures we split only once, on the rightmost underscore. Then stack converts the new column index level to a row index level, and transpose gives the right general shape:
df = df.set_index('Year')
df.columns = df.columns.str.rsplit('_', expand=True, n=1)
df = df.stack(level=1).transpose()
Year   2017          2018          2019
          a      b      a      b      a      b
col1    1.4  123.0    1.5  456.0    0.6  789.0
col2  555.0   55.5  444.0   56.5  333.0   57.5
col3    5.0   80.0    6.0   90.0    8.0  100.0
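To illustrate why rsplit with n=1 matters, here is a minimal sketch comparing it with split on column names that contain more than one underscore. The column names below are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical column names whose prefix itself contains an underscore.
cols = pd.Index(["unit_price_a", "unit_price_b", "qty_a"])

# rsplit with n=1 splits once, starting from the right,
# so the suffix is always isolated in the last level.
right = cols.str.rsplit("_", n=1, expand=True)

# split with n=1 splits once from the left, which breaks the prefix instead.
left = cols.str.split("_", n=1, expand=True)

print(list(right))  # [('unit_price', 'a'), ('unit_price', 'b'), ('qty', 'a')]
print(list(left))   # [('unit', 'price_a'), ('unit', 'price_b'), ('qty', 'a')]
```

With expand=True on an Index, both calls return a MultiIndex, which is why the result can be assigned straight to df.columns.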
We can use swaplevel and sort_index to get the headers to match the expected order, and rename_axis to clean up further by removing the unwanted axis names:
df = df.set_index('Year')
df.columns = df.columns.str.rsplit('_', expand=True, n=1)
df = (
    df.stack(level=1)
      .transpose()
      .sort_index(axis=1, level=1)
      .swaplevel(axis=1)
      .rename_axis(columns=[None, None])
)
          a                    b
       2017   2018   2019   2017   2018   2019
col1    1.4    1.5    0.6  123.0  456.0  789.0
col2  555.0  444.0  333.0   55.5   56.5   57.5
col3    5.0    6.0    8.0   80.0   90.0  100.0
Setup used:
import pandas as pd
df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})
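The same steps could also be wrapped in a small helper. The sketch below is one possible variant, not the answer's exact method: it uses unstack twice instead of stack plus transpose, and the function name suffixes_to_top_level and its parameters are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})


def suffixes_to_top_level(frame, id_col='Year', sep='_'):
    """Hypothetical helper: prefixes become rows, (suffix, id) become columns."""
    out = frame.set_index(id_col)
    # Split each column name once on the rightmost separator -> (prefix, suffix).
    out.columns = out.columns.str.rsplit(sep, n=1, expand=True)
    # With a single-level index, unstack returns a Series
    # indexed by (prefix, suffix, id_col).
    series = out.unstack()
    # Move the suffix and id levels into the columns, leaving the prefix as rows.
    wide = series.unstack(level=[1, 2])
    return wide.sort_index(axis=1).rename_axis(columns=[None, None])


result = suffixes_to_top_level(df)
print(result)
```

This assumes every column other than id_col follows the prefix_suffix pattern; anything else would need to be added to the index first, as described above.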
The solution involves converting to long form first, then reshaping to wide form. One option is a combination of pivot_longer from pyjanitor and pivot:
# pip install pyjanitor
import pandas as pd
import janitor
(
    df
    .pivot_longer(index='Year', names_to=('col', '.value'), names_sep='_')
    .pivot(index='col', columns='Year')
    .rename_axis(columns=[None, None])
)
          a                    b
       2017   2018   2019   2017   2018   2019
col
col1    1.4    1.5    0.6  123.0  456.0  789.0
col2  555.0  444.0  333.0   55.5   56.5   57.5
col3    5.0    6.0    8.0   80.0   90.0  100.0
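If installing pyjanitor is not an option, the same long-then-wide idea can be sketched with plain pandas using melt and pivot. This is an assumption-laden illustration rather than part of the answer above: the intermediate column names name, value, col and suffix are made up, and passing a list to pivot's columns argument assumes pandas 1.1 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2017, 2018, 2019], 'col1_a': [1.4, 1.5, 0.6],
    'col2_a': [555, 444, 333], 'col3_a': [5, 6, 8], 'col1_b': [123, 456, 789],
    'col2_b': [55.5, 56.5, 57.5], 'col3_b': [80, 90, 100]
})

# Wide -> long: one row per (Year, original column name).
long = df.melt(id_vars='Year', var_name='name', value_name='value')

# Split the original column name once on the rightmost underscore.
parts = long['name'].str.rsplit('_', n=1, expand=True)
long['col'] = parts[0]
long['suffix'] = parts[1]

# Long -> wide: prefixes as rows, (suffix, Year) as column levels.
wide = (
    long.pivot(index='col', columns=['suffix', 'Year'], values='value')
        .sort_index(axis=1)
        .rename_axis(index=None, columns=[None, None])
)
print(wide)
```

Because melt stacks integer and float columns into a single value column, every value comes back as a float, which matches the outputs shown above.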