Pandas convert 48 trading periods to time of day

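I want to use Python pandas to convert the 48 trading periods in each day into the time of day at which they occur. Trading period 1 = midnight, 2 = 12:30 am, 3 = 1:00 am, and so on.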

My MWE is:

import numpy as np
import pandas as pd
import datetime
from datetime import date, datetime, time, timedelta
import pyarrow as pa
import pyarrow.parquet as pq

# my dataset - 2 days
df = pd.DataFrame()
df['date'] = pd.to_datetime(['2020-10-21']*48+['2020-10-22']*48, format='%Y-%m-%d')
trp = np.arange(1,49,1) # 48 trading periods in each day
df['tp'] = pd.DataFrame(np.concatenate((trp,trp)))
df = df.set_index('date')
midnight = df.index.time

T = df.tp.values
tstep = pd.Timedelta(minutes=(30*(T-1)))
df['time'] = pd.to_datetime(midnight + tstep)

#for jj in range(len(demand)):
#    T = df.tp.values[jj]
#    tstep = pd.Timedelta(minutes=(30*(T-1)))
#    time0 = midnight + pd.to_datetime(tstep)
#    #df['time'] = df['time'].append(tstep)

df.head()

I keep getting this error message:

TypeError                                 Traceback (most recent call last)
<ipython-input-122-0b6c5efa5538> in <module>()
      8 
      9 T = df.tp.values
---> 10 tstep = pd.Timedelta(minutes=(30*(T-1)))
     11 df['time'] = pd.to_datetime(midnight + tstep)
     12 

pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas.Timedelta.__new__()

pandas/_libs/tslibs/timedeltas.pyx in pandas._libs.tslibs.timedeltas._to_py_int_float()

TypeError: Invalid type <class 'numpy.ndarray'>. Must be int or float.

I'm not sure how to fix this, even after trying a for loop.

The error is clear: pd.Timedelta accepts a scalar value (float or int) for minutes, not an array.
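To make the scalar-versus-array distinction concrete, here is a minimal sketch. The variable names (`periods`, `one_step`, `offsets`) are illustrative and not from the question; `pd.to_timedelta` is shown as the array-aware counterpart of `pd.Timedelta`, which is a swap-in for the `astype` approach rather than part of it.

```python
import numpy as np
import pandas as pd

periods = np.arange(1, 49)  # trading periods 1..48 (illustrative name)

# A scalar is fine: pd.Timedelta builds a single duration.
one_step = pd.Timedelta(minutes=30 * (2 - 1))

# A NumPy array is rejected with a TypeError, as in the traceback above.
try:
    pd.Timedelta(minutes=30 * (periods - 1))
except TypeError as exc:
    print("TypeError:", exc)

# pd.to_timedelta accepts array-like input and returns a TimedeltaIndex.
offsets = pd.to_timedelta(30 * (periods - 1), unit="min")
print(offsets[:3])
```

The same idea applies to a Series: build all offsets in one vectorized call instead of looping over rows.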

You can convert your Series directly using astype:
>>> df.tp.astype('timedelta64[m]')
date
2020-10-21   0 days 00:01:00
2020-10-21   0 days 00:02:00
2020-10-21   0 days 00:03:00
2020-10-21   0 days 00:04:00
2020-10-21   0 days 00:05:00
                   ...      
2020-10-22   0 days 00:44:00
2020-10-22   0 days 00:45:00
2020-10-22   0 days 00:46:00
2020-10-22   0 days 00:47:00
2020-10-22   0 days 00:48:00
Name: tp, Length: 96, dtype: timedelta64[ns]

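Here timedelta64[m] specifies that the numbers are timedeltas measured in minutes. You should also use df.index directly instead of df.index.time, so that you keep working with pandas datetime objects. From there it is straightforward: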

>>> df['time'] = df.index + (30 * (df.tp - 1)).astype('timedelta64[m]')
>>> df
            tp                time
date                              
2020-10-21   1 2020-10-21 00:00:00
2020-10-21   2 2020-10-21 00:30:00
2020-10-21   3 2020-10-21 01:00:00
2020-10-21   4 2020-10-21 01:30:00
2020-10-21   5 2020-10-21 02:00:00
...         ..                 ...
2020-10-22  44 2020-10-22 21:30:00
2020-10-22  45 2020-10-22 22:00:00
2020-10-22  46 2020-10-22 22:30:00
2020-10-22  47 2020-10-22 23:00:00
2020-10-22  48 2020-10-22 23:30:00

[96 rows x 2 columns]
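As a hedged alternative, the same result can be sketched with `pd.to_timedelta` instead of `astype('timedelta64[m]')`. My understanding is that newer pandas releases (2.0 onward) only support timedelta resolutions of seconds or finer, so the minute-resolution `astype` may raise there; treat that as an assumption and check it against your installed version. The `time_of_day` column is an illustrative addition, not part of the answer above, for when only the clock time is wanted.

```python
import numpy as np
import pandas as pd

# Same two-day dataset as in the question.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-10-21"] * 48 + ["2020-10-22"] * 48),
    "tp": np.tile(np.arange(1, 49), 2),  # 48 trading periods per day
}).set_index("date")

# Period 1 -> 0 min, period 2 -> 30 min, ..., period 48 -> 1410 min.
offset = pd.to_timedelta(30 * (df["tp"].to_numpy() - 1), unit="min")

# Midnight of each date plus the offset gives the full timestamp.
df["time"] = df.index + offset

# Optional (illustrative): keep only the clock time as datetime.time objects.
df["time_of_day"] = df["time"].dt.time

print(df.head())
```

Passing the underlying NumPy array to `pd.to_timedelta` keeps the addition purely positional, which sidesteps any index alignment on the repeated date labels.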