Pandas - stack time columns with time and date

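I have data with separate date and time information, and I want to reduce this dataframe to two columns: one with the timestamp (date + time) and one with the value.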

Current df:

Date                    8 am   10 am   1 pm
-----------------------------------------------
23/02/2022              5        10    11
24/02/2022              2        17    15
25/02/2022              7        90    175

Desired df:

Timestamp                       value
----------------------------------------------
2022-02-23 08:00:00               5
2022-02-23 10:00:00               10
2022-02-23 13:00:00               11
2022-02-24 08:00:00               2
2022-02-24 10:00:00               17
2022-02-24 13:00:00               15
2022-02-25 08:00:00               7
2022-02-25 10:00:00               90
2022-02-25 13:00:00               175

This is the raw list I created the dataframe from:

[['Date', '08:00', '10:00', '12:00', '14:00', '19:00', '22:00', '03:00'],
 ['23/02/2022', '140', '244', '191', '88', '263', '252', '159'],
 ['24/02/2022', '184', '235', '189', '108', '283', '300', '202'],
 ['25/02/2022', '131', '217', '135', '179', '207', '284', '177'],
 ['26/02/2022', '112', '188', '96', '139', '148', '188', '125'],
 ['27/02/2022', '130', '189', '104', '163', '210', '221', '139'],
 ['28/02/2022', '118', '89', '84', '113', '259', '234', '105'],
 ['01/03/2022', '98', '89', '77', '82', '138', '174', '71'],
 ['02/03/2022', '87', '187', '69', '118', '199', '178', '59'],
 ['03/03/2022', '90', '200', '110', '102', '180', '216', '72']]
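For reference, here is a minimal sketch of turning this list into a DataFrame, assuming the first inner list is the header row. The variable name `raw` is hypothetical. The readings are stored as strings in the list, so this sketch converts them to integers; that conversion is an assumption and not required by the answers below.

```python
import pandas as pd

# Raw list from the question (`raw` is a hypothetical name)
raw = [['Date', '08:00', '10:00', '12:00', '14:00', '19:00', '22:00', '03:00'],
       ['23/02/2022', '140', '244', '191', '88', '263', '252', '159'],
       ['24/02/2022', '184', '235', '189', '108', '283', '300', '202'],
       ['25/02/2022', '131', '217', '135', '179', '207', '284', '177'],
       ['26/02/2022', '112', '188', '96', '139', '148', '188', '125'],
       ['27/02/2022', '130', '189', '104', '163', '210', '221', '139'],
       ['28/02/2022', '118', '89', '84', '113', '259', '234', '105'],
       ['01/03/2022', '98', '89', '77', '82', '138', '174', '71'],
       ['02/03/2022', '87', '187', '69', '118', '199', '178', '59'],
       ['03/03/2022', '90', '200', '110', '102', '180', '216', '72']]

# First inner list is the header, the remaining lists are data rows
df = pd.DataFrame(raw[1:], columns=raw[0])

# Every cell is a string; convert the reading columns (all but Date) to int
value_cols = df.columns.drop('Date')
df[value_cols] = df[value_cols].astype(int)
print(df)
```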

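Use melt to flatten your dataframe, naming the variable column Time. Combine the Date and Time columns to create the timestamp, then use sort_values to reorder your dataframe. Finally, keep only the Timestamp and value columns: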

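import pandas as pd

# Assumes df was built from the raw list above, e.g. pd.DataFrame(raw[1:], columns=raw[0])
# Join the Date and Time strings and parse them with an explicit day-first format
combine_datetime = lambda x: pd.to_datetime(x['Date'] + ' ' + x['Time'],
                                            format='%d/%m/%Y %H:%M')

out = (
  df.melt('Date', var_name='Time').assign(Timestamp=combine_datetime)
    .sort_values('Timestamp', ignore_index=True)[['Timestamp', 'value']]
)
print(out)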

# Output
             Timestamp value
0  2022-02-23 03:00:00   159
1  2022-02-23 08:00:00   140
2  2022-02-23 10:00:00   244
3  2022-02-23 12:00:00   191
4  2022-02-23 14:00:00    88
..                 ...   ...
58 2022-03-03 10:00:00   200
59 2022-03-03 12:00:00   110
60 2022-03-03 14:00:00   102
61 2022-03-03 19:00:00   180
62 2022-03-03 22:00:00   216

[63 rows x 2 columns]
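The same three steps can be traced on the small three-row example from the question. This sketch assumes 24-hour column labels ('08:00', '10:00', '13:00') in place of the question's '8 am' style labels, so the same explicit format applies. It also prints the intermediate long frame produced by melt.

```python
import pandas as pd

# Small example from the question, with 24-hour column labels (an assumption)
df = pd.DataFrame({'Date': ['23/02/2022', '24/02/2022', '25/02/2022'],
                   '08:00': [5, 2, 7],
                   '10:00': [10, 17, 90],
                   '13:00': [11, 15, 175]})

# Step 1: melt to long format, one row per (Date, Time) cell
long_df = df.melt('Date', var_name='Time')
print(long_df.head(3))

# Step 2: combine Date and Time into a single datetime, day first
long_df['Timestamp'] = pd.to_datetime(long_df['Date'] + ' ' + long_df['Time'],
                                      format='%d/%m/%Y %H:%M')

# Step 3: sort chronologically and keep only the two requested columns
out = long_df.sort_values('Timestamp', ignore_index=True)[['Timestamp', 'value']]
print(out)
```

melt emits the values column by column (all dates for '08:00' first), which is why the final sort_values is needed to get chronological order.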

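Note: I passed an explicit format to pd.to_datetime so that Pandas does not have to infer it; otherwise an ambiguous date such as 01/03/2022 could be read with the month first.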
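A quick sketch of that ambiguity, using the date 01/03/2022 from the raw list (1 March 2022 in the question's day/month/year layout). Without a format, parsing defaults to month first; either an explicit format or dayfirst=True gives the intended date.

```python
import pandas as pd

s = '01/03/2022 08:00'  # meant as 1 March 2022, 08:00

# No format: the ambiguous date is read month-first (3 January)
default = pd.to_datetime(s)
# Explicit format: unambiguous, day first
explicit = pd.to_datetime(s, format='%d/%m/%Y %H:%M')
# dayfirst=True is a looser alternative that still prefers day first
day_first = pd.to_datetime(s, dayfirst=True)
print(default, explicit, day_first)
```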

IIUC, use melt and to_datetime:

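import pandas as pd

(df
   .melt(id_vars='Date', var_name='time')
   # dayfirst=True so that dates such as 01/03/2022 are read as 1 March
   .assign(Timestamp=lambda d: pd.to_datetime(d['Date'] + ' ' + d['time'], dayfirst=True))
   [['Timestamp', 'value']]
   # below optional
   .sort_values(by='Timestamp').reset_index(drop=True)
 )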

Output:

            Timestamp  value
0 2022-02-23 08:00:00      5
1 2022-02-23 10:00:00     10
2 2022-02-23 13:00:00     11
3 2022-02-24 08:00:00      2
4 2022-02-24 10:00:00     17
5 2022-02-24 13:00:00     15
6 2022-02-25 08:00:00      7
7 2022-02-25 10:00:00     90
8 2022-02-25 13:00:00    175
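The small example in the question actually uses column labels such as '8 am' and '1 pm'. If you prefer an explicit format there too, a sketch using %I (12-hour clock hour) and %p (am/pm marker) could look like this; it assumes every label follows the 'hour am/pm' pattern.

```python
import pandas as pd

# Small example from the question, with its '8 am'-style column labels
df = pd.DataFrame({'Date': ['23/02/2022', '24/02/2022', '25/02/2022'],
                   '8 am': [5, 2, 7],
                   '10 am': [10, 17, 90],
                   '1 pm': [11, 15, 175]})

out = (df
       .melt(id_vars='Date', var_name='time')
       # day-first date, then a 12-hour clock hour (%I) with an am/pm marker (%p)
       .assign(Timestamp=lambda d: pd.to_datetime(d['Date'] + ' ' + d['time'],
                                                  format='%d/%m/%Y %I %p'))
       [['Timestamp', 'value']]
       .sort_values(by='Timestamp').reset_index(drop=True))
print(out)
```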

Set Date as the index, stack the time columns, then concatenate the date and time components and cast the result to datetime:

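import pandas as pd
s = df.set_index('Date').stack().to_frame('value').reset_index()  # stacked level is named 'level_1'
# dayfirst=True so that dates such as 01/03/2022 are read as 1 March
s = s.assign(Timestamp=pd.to_datetime(s['Date'].str.cat(s['level_1'], sep=' '), dayfirst=True))[['Timestamp', 'value']]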

Output:

             Timestamp   value
0 2022-02-23 08:00:00      5
1 2022-02-23 10:00:00     10
2 2022-02-23 13:00:00     11
3 2022-02-24 08:00:00      2
4 2022-02-24 10:00:00     17
5 2022-02-24 13:00:00     15
6 2022-02-25 08:00:00      7
7 2022-02-25 10:00:00     90
8 2022-02-25 13:00:00    175
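A variant of the stack approach that avoids relying on the auto-generated 'level_1' name: naming the column axis with rename_axis before stacking makes reset_index produce a 'Time' column directly. This is a sketch on the small example, again assuming 24-hour column labels. Because stack walks row by row, the result already comes out in chronological order here.

```python
import pandas as pd

# Small example from the question, with 24-hour column labels (an assumption)
df = pd.DataFrame({'Date': ['23/02/2022', '24/02/2022', '25/02/2022'],
                   '08:00': [5, 2, 7],
                   '10:00': [10, 17, 90],
                   '13:00': [11, 15, 175]})

s = (df.set_index('Date')
       # name the column axis so the stacked level is 'Time', not 'level_1'
       .rename_axis(columns='Time')
       .stack()                      # Series indexed by (Date, Time)
       .reset_index(name='value'))   # back to columns Date, Time, value

# Join date and time, then parse with an explicit day-first format
s['Timestamp'] = pd.to_datetime(s['Date'] + ' ' + s['Time'], format='%d/%m/%Y %H:%M')
s = s[['Timestamp', 'value']]
print(s)
```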