Pandas table re-shape | creating datetime column with hours

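I am trying to reshape a table with pandas. It has a Date column with 365 rows, one for each day of the year. For each day there are 24 hour columns and 24 value columns holding the corresponding value for each hour. I am trying to create one column containing day + hour (24 rows per day) and one column with the corresponding value. Here is the current head():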

Date         |  hour1  |  value1  |  hour2  |  value2  | ... |  hour24  |  value24

2016-01-01   |  1      |  4100    |  2      |  3500    | ... |  24      |  5200

Here is the desired format:

Date                   |       value 

2016-01-01 01    |   4100

2016-01-01 02    |   3500

....

2016-01-01 24    |   5200

I have tried using melt and pivoting, but I could not sort out the day + hour column.

You can use lreshape with a dict, then add the hours, converted with to_timedelta, to the Date column. Finally, remove column A with drop and, if necessary, sort by the Date column with sort_values:

import pandas as pd

print(df)
         Date  hour1  value1  hour2  value2  hour24  value24
0  2016-01-01      1    4100      2    3500      24     5200
1  2016-01-02      1    3000      2    3700      24     7200

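# Collect the hour columns and the value columns in matching order
a = [col for col in df.columns if col.startswith('hour')]
b = [col for col in df.columns if col.startswith('value')]

# Stack each (hour, value) pair into the long columns A (hour) and B (value)
df = pd.lreshape(df, {'A' : a, 'B' : b})
# Add the hour offset to the date; hour 24 rolls over to 00:00 of the next day
df['Date'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['A'], unit='h')
df = df.drop('A', axis=1).sort_values('Date')
print(df)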
                 Date     B
0 2016-01-01 01:00:00  4100
2 2016-01-01 02:00:00  3500
4 2016-01-02 00:00:00  5200
1 2016-01-02 01:00:00  3000
3 2016-01-02 02:00:00  3700
5 2016-01-03 00:00:00  7200
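
Note that the question asks for labels such as "2016-01-01 24", while adding 24 hours to a date gives 00:00 of the following day, as in the output above. The sketch below is one possible way to keep both: it rebuilds the sample frame from the printed input, reshapes it with lreshape, and adds a text column in the question's layout next to the real timestamps. The column names label and timestamp are assumptions made for this illustration only.

```python
import pandas as pd

# Sample frame copied from the printed input above (hours 1, 2 and 24 only).
df = pd.DataFrame({
    'Date': ['2016-01-01', '2016-01-02'],
    'hour1': [1, 1], 'value1': [4100, 3000],
    'hour2': [2, 2], 'value2': [3500, 3700],
    'hour24': [24, 24], 'value24': [5200, 7200],
})

hour_cols = [c for c in df.columns if c.startswith('hour')]
value_cols = [c for c in df.columns if c.startswith('value')]

# Pair each hour column with its value column and stack the pairs vertically.
long_df = pd.lreshape(df, {'hour': hour_cols, 'value': value_cols})

# Hypothetical text label in the "YYYY-MM-DD HH" layout from the question;
# hour 24 stays on its original day because no date arithmetic is involved.
long_df['label'] = long_df['Date'] + ' ' + long_df['hour'].astype(str).str.zfill(2)

# Real timestamps: adding 24 hours lands on 00:00 of the following day.
long_df['timestamp'] = (pd.to_datetime(long_df['Date'])
                        + pd.to_timedelta(long_df['hour'], unit='h'))

long_df = long_df.sort_values('timestamp').reset_index(drop=True)
print(long_df[['label', 'timestamp', 'value']])
```

Sorting on the timestamp column rather than on the text label avoids any dependence on string ordering.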

Another solution is to split the column names with str.extract, build a MultiIndex from the parts with MultiIndex.from_arrays, and reshape with DataFrame.stack:

df = df.set_index('Date')
# Split each column name into its text part and its number, e.g. 'hour1' -> ('hour', '1')
mux = df.columns.to_series().str.extract(r'([A-Za-z]+)(\d+)', expand=True)
df.columns = pd.MultiIndex.from_arrays([mux[0], mux[1]], names=('a','b'))
# Move the number level into the rows, leaving the columns 'hour' and 'value'
df = df.stack(1).reset_index()
df['Date'] = pd.to_datetime(df['Date']) + pd.to_timedelta(df['hour'], unit='h')
df = df.drop(['b', 'hour'], axis=1).rename_axis(None, axis=1)
print(df)
                 Date  value
0 2016-01-01 01:00:00   4100
1 2016-01-01 02:00:00   3500
2 2016-01-02 00:00:00   5200
3 2016-01-02 01:00:00   3000
4 2016-01-02 02:00:00   3700
5 2016-01-03 00:00:00   7200
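
For comparison, pandas also provides pd.wide_to_long, which handles column names made of a stub plus a numeric suffix. This is a different technique from the two shown above, sketched here on the same sample data. The name slot for the numeric suffix is an assumption chosen for this example. The result is sorted explicitly by Date rather than relying on the order the reshape returns.

```python
import pandas as pd

# Sample frame copied from the printed input above (hours 1, 2 and 24 only).
df = pd.DataFrame({
    'Date': ['2016-01-01', '2016-01-02'],
    'hour1': [1, 1], 'value1': [4100, 3000],
    'hour2': [2, 2], 'value2': [3500, 3700],
    'hour24': [24, 24], 'value24': [5200, 7200],
})

# 'slot' is a hypothetical name for the numeric suffix of each column pair.
long_df = pd.wide_to_long(df, stubnames=['hour', 'value'],
                          i='Date', j='slot').reset_index()

# Combine the date with its hour offset, then keep only Date and value.
long_df['Date'] = (pd.to_datetime(long_df['Date'])
                   + pd.to_timedelta(long_df['hour'], unit='h'))
long_df = (long_df.drop(['slot', 'hour'], axis=1)
                  .sort_values('Date')
                  .reset_index(drop=True))
print(long_df[['Date', 'value']])
```

As in the answers above, hour 24 ends up at 00:00 of the following day.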