Pandas - stack time columns with time and date
I have date and time data, and I want to reduce this dataframe to two columns: one for the timestamp (date + time) and one for the value.
Current df:
Date         8 am   10 am   1 pm
-----------------------------------------------
23/02/2022   5      10      11
24/02/2022   2      17      15
25/02/2022   7      90      175
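To make the example reproducible, the table above can be built directly. This is a sketch: the column labels are copied from the table, and the values are assumed to be integers.

```python
import pandas as pd

# The small example frame from the question; 'Date' is kept as a
# day-first string (dd/mm/yyyy), exactly as displayed in the table.
df = pd.DataFrame(
    {
        "Date": ["23/02/2022", "24/02/2022", "25/02/2022"],
        "8 am": [5, 2, 7],
        "10 am": [10, 17, 90],
        "1 pm": [11, 15, 175],
    }
)
print(df)
```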
Desired df:
Timestamp              value
----------------------------------------------
2022-02-23 08:00:00    5
2022-02-23 10:00:00    10
2022-02-23 13:00:00    11
2022-02-24 08:00:00    2
2022-02-24 10:00:00    17
2022-02-24 13:00:00    15
2022-02-25 08:00:00    7
2022-02-25 10:00:00    90
2022-02-25 13:00:00    175
Here is the original list I create the dataframe from:
[['Date', '08:00', '10:00', '12:00', '14:00', '19:00', '22:00', '03:00'],
['23/02/2022', '140', '244', '191', '88', '263', '252', '159'],
['24/02/2022', '184', '235', '189', '108', '283', '300', '202'],
['25/02/2022', '131', '217', '135', '179', '207', '284', '177'],
['26/02/2022', '112', '188', '96', '139', '148', '188', '125'],
['27/02/2022', '130', '189', '104', '163', '210', '221', '139'],
['28/02/2022', '118', '89', '84', '113', '259', '234', '105'],
['01/03/2022', '98', '89', '77', '82', '138', '174', '71'],
['02/03/2022', '87', '187', '69', '118', '199', '178', '59'],
['03/03/2022', '90', '200', '110', '102', '180', '216', '72']]
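The list above uses its first row as the header and stores every value as a string. A minimal sketch for turning it into a DataFrame, assuming the list is bound to a variable named data (a hypothetical name) and that numeric values are wanted:

```python
import pandas as pd

# Hypothetical variable name for the list shown above (header row + data rows).
data = [
    ['Date', '08:00', '10:00', '12:00', '14:00', '19:00', '22:00', '03:00'],
    ['23/02/2022', '140', '244', '191', '88', '263', '252', '159'],
    ['24/02/2022', '184', '235', '189', '108', '283', '300', '202'],
    ['25/02/2022', '131', '217', '135', '179', '207', '284', '177'],
    ['26/02/2022', '112', '188', '96', '139', '148', '188', '125'],
    ['27/02/2022', '130', '189', '104', '163', '210', '221', '139'],
    ['28/02/2022', '118', '89', '84', '113', '259', '234', '105'],
    ['01/03/2022', '98', '89', '77', '82', '138', '174', '71'],
    ['02/03/2022', '87', '187', '69', '118', '199', '178', '59'],
    ['03/03/2022', '90', '200', '110', '102', '180', '216', '72'],
]

# First row becomes the column labels, the remaining rows become the data.
df = pd.DataFrame(data[1:], columns=data[0])

# The values arrive as strings; convert every time column to numbers.
time_cols = df.columns.drop('Date')
df[time_cols] = df[time_cols].apply(pd.to_numeric)
print(df.dtypes)
```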
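Use melt to flatten the dataframe, naming the variable column Time. Concatenate the Date and Time columns to build the timestamp, then use sort_values to reorder the rows. Finally, keep only the Timestamp and value columns: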
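import pandas as pd

# Build one timestamp per row from 'Date' (dd/mm/yyyy) and 'Time' (HH:MM)
combine_datetime = lambda x: pd.to_datetime(x['Date'] + ' ' + x['Time'],
                                            format='%d/%m/%Y %H:%M')
out = (
    df.melt('Date', var_name='Time')  # wide -> long: one row per (Date, Time)
      .assign(Timestamp=combine_datetime)
      .sort_values('Timestamp', ignore_index=True)[['Timestamp', 'value']]
)
print(out)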
# Output
Timestamp value
0 2022-02-23 03:00:00 159
1 2022-02-23 08:00:00 140
2 2022-02-23 10:00:00 244
3 2022-02-23 12:00:00 191
4 2022-02-23 14:00:00 88
.. ... ...
58 2022-03-03 10:00:00 200
59 2022-03-03 12:00:00 110
60 2022-03-03 14:00:00 102
61 2022-03-03 19:00:00 180
62 2022-03-03 22:00:00 216
[63 rows x 2 columns]
Note: for pd.to_datetime, I passed an explicit format so that Pandas does not have to infer it; a date such as 01/03/2022 is ambiguous when the day comes first.
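A small sketch of that ambiguity, using the date 01/03/2022 from the question's list (1 March 2022 in dd/mm/yyyy). Without a format, pandas reads this particular string month-first:

```python
import pandas as pd

s = '01/03/2022 08:00'  # 1 March 2022 in the question's dd/mm/yyyy layout

# Explicit format: the day is read first, as intended.
explicit = pd.to_datetime(s, format='%d/%m/%Y %H:%M')

# No format: pandas parses this string month-first (3 January).
inferred = pd.to_datetime(s)

print(explicit)  # 2022-03-01 08:00:00
print(inferred)  # 2022-01-03 08:00:00
```

Dates whose day is above 12 (such as 23/02/2022) cannot be misread this way, which is why the problem can go unnoticed on the first few rows.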
IIUC, use melt and to_datetime:
import pandas as pd

(df
 .melt(id_vars='Date', var_name='time')
 # dayfirst=True because the dates are dd/mm/yyyy
 .assign(Timestamp=lambda d: pd.to_datetime(d['Date'] + ' ' + d['time'], dayfirst=True))
 [['Timestamp', 'value']]
 # below optional
 .sort_values(by='Timestamp').reset_index(drop=True)
)
Output:
Timestamp value
0 2022-02-23 08:00:00 5
1 2022-02-23 10:00:00 10
2 2022-02-23 13:00:00 11
3 2022-02-24 08:00:00 2
4 2022-02-24 10:00:00 17
5 2022-02-24 13:00:00 15
6 2022-02-25 08:00:00 7
7 2022-02-25 10:00:00 90
8 2022-02-25 13:00:00 175
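The answer above lets pandas work out how to read strings such as "23/02/2022 8 am". A sketch that spells the format out instead, assuming every column label looks like "8 am" or "1 pm" (hour, space, am/pm); %I is the 12-hour clock and %p the am/pm marker:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["23/02/2022", "24/02/2022", "25/02/2022"],
        "8 am": [5, 2, 7],
        "10 am": [10, 17, 90],
        "1 pm": [11, 15, 175],
    }
)

out = (
    df.melt(id_vars="Date", var_name="time")
    # Explicit format: day-first date, then 12-hour clock with am/pm.
    .assign(Timestamp=lambda d: pd.to_datetime(d["Date"] + " " + d["time"],
                                               format="%d/%m/%Y %I %p"))
    [["Timestamp", "value"]]
    .sort_values(by="Timestamp")
    .reset_index(drop=True)
)
print(out)
```

With an explicit format, a label that does not match (for example a typo such as "1pmm") raises an error instead of being silently guessed.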
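Set Date as the index, stack the time columns, then concatenate each date with its time label and convert the result to datetime:
import pandas as pd

# 'Date' becomes the index; stack() turns each time column into a row
s = df.set_index('Date').stack().to_frame('value').reset_index()
# the former column labels land in the auto-named 'level_1' column
s = s.assign(Timestamp=pd.to_datetime(s['Date'].str.cat(s['level_1'], sep=' '), dayfirst=True))[['Timestamp', 'value']]
print(s)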
Timestamp value
0 2022-02-23 08:00:00 5
1 2022-02-23 10:00:00 10
2 2022-02-23 13:00:00 11
3 2022-02-24 08:00:00 2
4 2022-02-24 10:00:00 17
5 2022-02-24 13:00:00 15
6 2022-02-25 08:00:00 7
7 2022-02-25 10:00:00 90
8 2022-02-25 13:00:00 175
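One variation on the stack approach, offered as a sketch: naming the column axis before stacking means the time labels come out in a column with a chosen name (Time here, an illustrative choice) instead of the auto-generated level_1. It assumes the same "8 am" / "1 pm" labels as above:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Date": ["23/02/2022", "24/02/2022", "25/02/2022"],
        "8 am": [5, 2, 7],
        "10 am": [10, 17, 90],
        "1 pm": [11, 15, 175],
    }
)

s = (
    df.set_index("Date")
    .rename_axis(columns="Time")  # name the column axis so stack() keeps the name
    .stack()                      # Series indexed by (Date, Time)
    .to_frame("value")
    .reset_index()                # columns: Date, Time, value
)
s["Timestamp"] = pd.to_datetime(s["Date"] + " " + s["Time"],
                                format="%d/%m/%Y %I %p")
s = s[["Timestamp", "value"]]
print(s)
```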