Restructure dataframe in Pandas
I would like to restructure my pandas dataframe, where h1, h2, etc. are the values for each hour of the day. It currently looks like this:
h1 h2 h3 h4 h5 h6 h7 h8 h9 ... h15 \
date ...
2004-01-01 46 46 45 41 39 35 33.0 33.0 36.0 ... 55.0
2004-01-02 43 44 46 46 47 47 47.0 47.0 47.0 ... 54.0
2004-01-03 45 46 46 44 43 46 46.0 47.0 51.0 ... 69.0
I would like to restructure it into:
date value
2004-01-01 1:00 46
2004-01-01 2:00 46
2004-01-01 3:00 45
2004-01-01 4:00 41
2004-01-01 5:00 39
2004-01-01 6:00 35
2004-01-01 7:00 33
...
2004-01-02 1:00 43
2004-01-02 2:00 44
2004-01-02 3:00 46
...
I'm not sure how to go about this. Any ideas?
You can use stack, then assign a new index built from the date and the hour number:
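s = df.stack()  # long Series indexed by (date, 'hN')
# Join each date with its zero-padded hour, then parse (assumes the date index holds strings)
dates = s.index.get_level_values(0)
hours = s.index.get_level_values(1).str[1:].str.pad(2, fillchar='0')
s.index = pd.to_datetime(dates + ' ' + hours, format='%Y-%m-%d %H')
s  # or s.to_frame('Value').reset_index() for a two-column frame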
Out[1012]:
2004-01-01 01:00:00 46.0
2004-01-01 02:00:00 46.0
2004-01-01 03:00:00 45.0
2004-01-01 04:00:00 41.0
2004-01-01 05:00:00 39.0
2004-01-01 06:00:00 35.0
2004-01-01 07:00:00 33.0
2004-01-01 08:00:00 33.0
2004-01-01 09:00:00 36.0
2004-01-02 01:00:00 43.0
2004-01-02 02:00:00 44.0
2004-01-02 03:00:00 46.0
2004-01-02 04:00:00 46.0
2004-01-02 05:00:00 47.0
2004-01-02 06:00:00 47.0
2004-01-02 07:00:00 47.0
2004-01-02 08:00:00 47.0
2004-01-02 09:00:00 47.0
2004-01-03 01:00:00 45.0
2004-01-03 02:00:00 46.0
2004-01-03 03:00:00 46.0
2004-01-03 04:00:00 44.0
2004-01-03 05:00:00 43.0
2004-01-03 06:00:00 46.0
2004-01-03 07:00:00 46.0
2004-01-03 08:00:00 47.0
2004-01-03 09:00:00 51.0
dtype: float64
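For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same stack idea. The sample frame is hypothetical (only h1 to h3 from the question), and instead of string formatting it adds the hour as a timedelta, which is a swapped-in variation rather than the code above. It should also work when the date index is already datetime-like, since pd.to_datetime is applied to it either way.

```python
import pandas as pd

# Hypothetical sample: the first three hour columns from the question
df = pd.DataFrame(
    {"h1": [46, 43, 45], "h2": [46, 44, 46], "h3": [45, 46, 46]},
    index=pd.Index(["2004-01-01", "2004-01-02", "2004-01-03"], name="date"),
)

# Wide to long: one row per (date, 'hN') pair
s = df.stack()

# Parse the dates, turn 'hN' into the integer N, and add N hours to each date
dates = pd.to_datetime(s.index.get_level_values(0))
hours = s.index.get_level_values(1).str[1:].astype(int)
s.index = dates + pd.to_timedelta(hours, unit="h")

# Two-column frame matching the layout asked for in the question
result = s.rename_axis("date").to_frame("value").reset_index()
print(result)
```

One design note: the question elides the later columns, but if they run up to h24, the timedelta version places that value at midnight of the following day, whereas parsing with %H only accepts hours 0 to 23.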