Reshaping a pandas DataFrame into stacked/record/database/long format
What is the best way to convert a pandas DataFrame from wide format into stacked/record/database/long format?
Here is a small example:
Wide format:
date hour1 hour2 hour3 hour4
2012-12-31 9.18 -0.10 -7.00 -64.92
2012-12-30 13.91 0.09 -0.96 0.08
2012-12-29 12.97 11.82 11.65 10.20
2012-12-28 22.01 16.04 15.68 11.67
2012-12-27 11.44 0.07 -19.97 -67.98
...
Stacked/record/database/long format (needed):
date hour price
2012-12-31 00:00:00 hour1 9.18
2012-12-31 00:00:00 hour2 -0.1
2012-12-31 00:00:00 hour3 -7
2012-12-31 00:00:00 hour4 -64.92
...
2012-12-30 00:00:00 hour1 7.18
2012-12-30 00:00:00 hour2 -1.1
2012-12-30 00:00:00 hour3 -9
2012-12-30 00:00:00 hour4 -74.91
...
You can use melt to convert the DataFrame from wide format to long format:
import pandas as pd

df = pd.DataFrame({'date': ['2012-12-31', '2012-12-30', '2012-12-29', '2012-12-28', '2012-12-27'],
                   'hour1': [9.18, 13.91, 12.97, 22.01, 11.44],
                   'hour2': [-0.1, 0.09, 11.82, 16.04, 0.07]})
# Keep 'date' as the identifier and unpivot the hour columns into 'hour'/'price' pairs.
print(pd.melt(df, id_vars=['date'], value_vars=['hour1', 'hour2'], var_name='hour', value_name='price'))
Output:
date hour price
0 2012-12-31 hour1 9.18
1 2012-12-30 hour1 13.91
2 2012-12-29 hour1 12.97
3 2012-12-28 hour1 22.01
4 2012-12-27 hour1 11.44
5 2012-12-31 hour2 -0.10
6 2012-12-30 hour2 0.09
7 2012-12-29 hour2 11.82
8 2012-12-28 hour2 16.04
9 2012-12-27 hour2 0.07
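The melt output above is grouped by hour column, while the layout asked for in the question is grouped by date. As a rough sketch of one way to get there, the example below melts all four hour columns (leaving out value_vars melts every column not listed in id_vars), converts date to a datetime, and sorts. The names wide and long_df are made up for this illustration, and only the first two dates from the question are used.

```python
import pandas as pd

# Wide-format sample taken from the question (first two dates only).
wide = pd.DataFrame({
    'date': ['2012-12-31', '2012-12-30'],
    'hour1': [9.18, 13.91],
    'hour2': [-0.10, 0.09],
    'hour3': [-7.00, -0.96],
    'hour4': [-64.92, 0.08],
})
wide['date'] = pd.to_datetime(wide['date'])

# Omitting value_vars melts every column that is not in id_vars.
long_df = pd.melt(wide, id_vars=['date'], var_name='hour', value_name='price')

# Put all hours of one date next to each other, newest date first.
long_df = (long_df
           .sort_values(['date', 'hour'], ascending=[False, True])
           .reset_index(drop=True))
print(long_df)
```

Note that hour is sorted as a string here, which is fine for hour1 to hour4 but would place hour10 before hour2; with more columns you would probably want to extract the number first.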
You can use stack to pivot the DataFrame. First set date as the index:
>>> df.set_index('date').stack()
date
2012-12-31 hour1 9.18
hour2 -0.10
hour3 -7.00
hour4 -64.92
2012-12-30 hour1 13.91
hour2 0.09
hour3 -0.96
hour4 0.08
...
This actually returns a Series with a MultiIndex. To create a DataFrame like the one you specified, you can reset the MultiIndex after stacking and rename the columns:
>>> stacked = df.set_index('date').stack()
>>> df2 = stacked.reset_index()
>>> df2.columns = ['date', 'hour', 'price']
>>> df2
date hour price
0 2012-12-31 hour1 9.18
1 2012-12-31 hour2 -0.10
2 2012-12-31 hour3 -7.00
3 2012-12-31 hour4 -64.92
4 2012-12-30 hour1 13.91
5 2012-12-30 hour2 0.09
6 2012-12-30 hour3 -0.96
7 2012-12-30 hour4 0.08
...
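If you would rather not assign to columns afterwards, the same steps can probably be written as one chain, naming the index levels with rename_axis and the value column with reset_index(name=...). This is only a sketch under the assumption that the frame looks like the wide table in the question; wide and stacked_df are hypothetical names, and only two dates are included.

```python
import pandas as pd

# Wide-format sample taken from the question (first two dates only).
wide = pd.DataFrame({
    'date': ['2012-12-31', '2012-12-30'],
    'hour1': [9.18, 13.91],
    'hour2': [-0.10, 0.09],
    'hour3': [-7.00, -0.96],
    'hour4': [-64.92, 0.08],
})

stacked_df = (wide.set_index('date')
              .stack()                        # Series indexed by (date, column name)
              .rename_axis(['date', 'hour'])  # name both index levels
              .reset_index(name='price'))     # levels become columns, values become 'price'
print(stacked_df)
```

Unlike the melt result, the rows come out already grouped by date in the original row order, so no extra sort is needed.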