How to use pd.DateTime.replace(second=0)?
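Given this dataset:
...I want to create open, high and low columns, resampled from the start of each row's minute. Note that we can't simply use .resample() in this case. What I want to end up with is a dataset that looks like this:
I don't want to use a for loop for this, but rather a column-wise calculation for the open, high and low columns (unless there is a faster way to do it, or .resample() can somehow be made to work in this case).
The time column was created with pd.to_datetime().
I tried something like this for the high column: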
tick_df['tick_high'] = tick_df[(tick_df['time'] >= tick_df['time'].replace(second=0)) & (tick_df['time'] <= tick_df['time'])].max()
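...the logic here being: select the rows whose datetime lies between the top of the current minute (i.e. 0 seconds) and the current row's datetime. So for the first row, the window would be 2022-02-11 19:57:00 to 2022-02-11 19:57:20.
However, when I try this, I get the error: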
TypeError: replace() got an unexpected keyword argument 'second'
...because technically I'm calling pandas' replace function, not the datetime.replace function. So I also tried adding .dt before .replace and got this:
AttributeError: 'DatetimeProperties' object has no attribute 'replace'
Any suggestions on how to achieve the desired output? For reference, here is my reproducible code:
from datetime import datetime
import pandas as pd
# create a mock tick df
tick_time = ["2022-02-11 19:57:20",
             "2022-02-11 19:57:40",
             "2022-02-11 19:58:01",
             "2022-02-11 19:58:09",
             "2022-02-11 19:58:31",
             "2022-02-11 19:58:45",
             "2022-02-11 19:58:58",
             "2022-02-11 19:59:00",
             "2022-02-11 19:59:20",
             "2022-02-11 19:59:40",
             "2022-02-11 19:59:55"]
tick_time = pd.to_datetime(tick_time)
tick_df = pd.DataFrame(
    {
        "time": tick_time,
        "tick_close": [440.39, 440.38, 440.39, 440.40, 440.41, 440.42, 440.45, 440.50, 440.52, 440.51, 440.59],
    },
)
print(tick_df)
# Attempt to resample ticks ohlc from the beginning of each minute
tick_df['tick_high'] = tick_df[(tick_df['time'] >= tick_df['time'].dt.replace(second=0)) & (tick_df['time'] <= tick_df['time'])].max()
I'll check back tomorrow to review the answers. Thanks!
Based on the GitHub issue, we can do this with map:
tick_df['time'].map(lambda x : x.replace(second=0))
To get your output:
cond1 = tick_df['time'].map(lambda x : x.replace(second=0))
tick_df['tick_high'] = [tick_df.loc[(tick_df['time']>=x) & (tick_df['time']<=y) ,'tick_close'].max() for x, y in zip(cond1,tick_df['time'])]
tick_df
Out[552]:
                  time  tick_close  tick_high
0  2022-02-11 19:57:20      440.39     440.39
1  2022-02-11 19:57:40      440.38     440.39
2  2022-02-11 19:58:01      440.39     440.39
3  2022-02-11 19:58:09      440.40     440.40
4  2022-02-11 19:58:31      440.41     440.41
5  2022-02-11 19:58:45      440.42     440.42
6  2022-02-11 19:58:58      440.45     440.45
7  2022-02-11 19:59:00      440.50     440.50
8  2022-02-11 19:59:20      440.52     440.52
9  2022-02-11 19:59:40      440.51     440.52
10 2022-02-11 19:59:55      440.59     440.59
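As a side note on the map step above: if the goal is only to truncate each timestamp to the start of its minute, Series.dt.floor is a vectorized option that may be worth considering. The sketch below uses a shortened, made-up sample rather than the full tick_df, and the names times, mapped and floored are illustrative. One difference to keep in mind: Timestamp.replace(second=0) leaves any sub-second part untouched, while floor drops it, so the two only agree when the data has whole-second resolution, as it does here.

```python
import pandas as pd

# Small illustrative sample (not the full tick_df from the question).
times = pd.Series(pd.to_datetime([
    "2022-02-11 19:57:20",
    "2022-02-11 19:58:01",
    "2022-02-11 19:59:00",
]))

# Element-wise: call Timestamp.replace on every value, as in the answer above.
mapped = times.map(lambda ts: ts.replace(second=0))

# Vectorized: round every timestamp down to the start of its minute.
floored = times.dt.floor("min")

# Compare the two results element by element.
print((floored == mapped).all())
```

On larger frames the floored version avoids a Python-level call per row, which is usually where the map approach spends its time.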
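IIUC, is this what you want?
i = pd.Index(['first', 'cummax', 'cummin'])
# freq='min' is the current spelling of the minute alias; the older 'T' is deprecated in recent pandas
tick_df.join(
    pd.concat([tick_df.groupby(pd.Grouper(key='time', freq='min'))['tick_close']
                      .transform(c)
                      .rename(f'tick_{c}')
               for c in i], axis=1)
)
Output: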
                  time  tick_close  tick_first  tick_cummax  tick_cummin
0  2022-02-11 19:57:20      440.39      440.39       440.39       440.39
1  2022-02-11 19:57:40      440.38      440.39       440.39       440.38
2  2022-02-11 19:58:01      440.39      440.39       440.39       440.39
3  2022-02-11 19:58:09      440.40      440.39       440.40       440.39
4  2022-02-11 19:58:31      440.41      440.39       440.41       440.39
5  2022-02-11 19:58:45      440.42      440.39       440.42       440.39
6  2022-02-11 19:58:58      440.45      440.39       440.45       440.39
7  2022-02-11 19:59:00      440.50      440.50       440.50       440.50
8  2022-02-11 19:59:20      440.52      440.50       440.52       440.50
9  2022-02-11 19:59:40      440.51      440.50       440.52       440.50
10 2022-02-11 19:59:55      440.59      440.50       440.59       440.50
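The same running open/high/low idea can probably be written a little more directly by grouping on the floored minute and assigning each column by name. This is only a sketch: the column names tick_open, tick_high and tick_low are chosen here for illustration and are not from the question, and it assumes the rows are already sorted by time, since cummax and cummin follow row order.

```python
import pandas as pd

tick_df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-02-11 19:57:20", "2022-02-11 19:57:40", "2022-02-11 19:58:01",
        "2022-02-11 19:58:09", "2022-02-11 19:58:31", "2022-02-11 19:58:45",
        "2022-02-11 19:58:58", "2022-02-11 19:59:00", "2022-02-11 19:59:20",
        "2022-02-11 19:59:40", "2022-02-11 19:59:55",
    ]),
    "tick_close": [440.39, 440.38, 440.39, 440.40, 440.41, 440.42,
                   440.45, 440.50, 440.52, 440.51, 440.59],
})

# Group key: the start of the minute each tick falls in.
minute = tick_df["time"].dt.floor("min")
grouped = tick_df.groupby(minute)["tick_close"]

# Hypothetical column names; each value only looks at rows from the
# start of the minute up to and including the current row.
tick_df["tick_open"] = grouped.transform("first")  # first tick of the minute
tick_df["tick_high"] = grouped.cummax()            # running max within the minute
tick_df["tick_low"] = grouped.cummin()             # running min within the minute

print(tick_df)
```

On this sample the three columns hold the same values as tick_first, tick_cummax and tick_cummin in the output above.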