Pandas: convert 48 trading periods to time of day
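I would like to use Python pandas to convert the 48 trading periods in each day into the times at which they occur. Trading period 1 = midnight, 2 = 12:30am, 3 = 1am, and so on.
My MWE is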
import numpy as np
import pandas as pd
import datetime
from datetime import date, datetime, time, timedelta
import pyarrow as pa
import pyarrow.parquet as pq
# my dataset - 2 days
df = pd.DataFrame()
df['date'] = pd.to_datetime(['2020-10-21']*48+['2020-10-22']*48, format='%Y-%m-%d')
trp = np.arange(1,49,1) # 48 trading periods in each day
df['tp'] = pd.DataFrame(np.concatenate((trp,trp)))
df = df.set_index('date')
midnight = df.index.time
T = df.tp.values
tstep = pd.Timedelta(minutes=(30*(T-1)))
df['time'] = pd.to_datetime(midnight + tstep)
#for jj in range(len(demand)):
# T = df.tp.values[jj]
# tstep = pd.Timedelta(minutes=(30*(T-1)))
# time0 = midnight + pd.to_datetime(tstep)
# #df['time'] = df['time'].append(tstep)
df.head()
I keep getting the error message
TypeError Traceback (most recent call last)
<ipython-input-122-0b6c5efa5538> in <module>()
8
9 T = df.tp.values
---> 10 tstep = pd.Timedelta(minutes=(30*(T-1)))
11 df['time'] = pd.to_datetime(midnight + tstep)
12
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()
pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas._to_py_int_float()
TypeError: Invalid type <class 'numpy.ndarray'>. Must be int or float.
I'm not sure how to fix this, even after trying a for loop.
The error is clear: pd.Timedelta accepts a scalar value (float or int) as minutes, not an array. You can use astype to convert your Series directly:
>>> df.tp.astype('timedelta64[m]')
date
2020-10-21 0 days 00:01:00
2020-10-21 0 days 00:02:00
2020-10-21 0 days 00:03:00
2020-10-21 0 days 00:04:00
2020-10-21 0 days 00:05:00
...
2020-10-22 0 days 00:44:00
2020-10-22 0 days 00:45:00
2020-10-22 0 days 00:46:00
2020-10-22 0 days 00:47:00
2020-10-22 0 days 00:48:00
Name: tp, Length: 96, dtype: timedelta64[ns]
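Here timedelta64[m] specifies that the numbers are timedeltas measured in minutes. You should also use df.index directly rather than df.index.time, so that you keep working with pandas datetime objects. From there it is straightforward: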
>>> df['time'] = df.index + (30 * (df.tp - 1)).astype('timedelta64[m]')
>>> df
            tp                time
date
2020-10-21   1 2020-10-21 00:00:00
2020-10-21   2 2020-10-21 00:30:00
2020-10-21   3 2020-10-21 01:00:00
2020-10-21   4 2020-10-21 01:30:00
2020-10-21   5 2020-10-21 02:00:00
...         ..                 ...
2020-10-22  44 2020-10-22 21:30:00
2020-10-22  45 2020-10-22 22:00:00
2020-10-22  46 2020-10-22 22:30:00
2020-10-22  47 2020-10-22 23:00:00
2020-10-22  48 2020-10-22 23:30:00

[96 rows x 2 columns]
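As an alternative, here is a minimal sketch that builds the offsets with pd.to_timedelta instead of astype. I believe newer pandas releases (around 2.0 and later) may refuse to cast integers to timedelta64[m] because only s, ms, us and ns resolutions are supported, so this version may be more portable; treat that as an assumption and check against your installed version. The time_of_day column name is my own addition for illustration and is not part of the question.

```python
import numpy as np
import pandas as pd

# Same two-day dataset as in the question: 48 trading periods per day
dates = pd.to_datetime(['2020-10-21'] * 48 + ['2020-10-22'] * 48)
df = pd.DataFrame(
    {'tp': np.tile(np.arange(1, 49), 2)},
    index=pd.DatetimeIndex(dates, name='date'),
)

# Period 1 starts at midnight, so the offset is 30 * (tp - 1) minutes.
# pd.to_timedelta is vectorized, unlike the scalar-only pd.Timedelta.
offset = pd.to_timedelta(30 * (df['tp'].to_numpy() - 1), unit='min')

# The index holds midnight of each day; adding the offset gives the start time
df['time'] = df.index + offset

# Hypothetical extra column: only the clock time, as datetime.time objects
df['time_of_day'] = df['time'].dt.time

print(df.head())
```

If only the clock time matters, time_of_day can be used on its own; keeping the full timestamp in time is usually more convenient for resampling or plotting.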