如何从时间序列数据框中获取统计信息并根据特定规则将其归入第三个数据框中?
how to Get stats from timeseries dataframes and impute it in 3rd dataframe based on certain rules?
我有两个数据帧
df1=pd.DataFrame({"a":pd.date_range("2021-01-01","2021-01-10"),
"b" :[12,13,16,15,12,14,17,19,20,21]})
df2=pd.DataFrame({"a":pd.date_range("2021-01-01","2021-01-10"),
"b":[np.nan,np.nan,13.7,14.3,14,15,14.3,16.7,18.8,20]})
whenevr df1[b] > df2[b] 我想买,whenevr df1[b] < df2[b] 退出。
注:
- 如果之前的买入没有退出,我不会买入
- 如果买入发生但退出条件没有发生,那么我们将最后日期视为退出
需要输出
#dataframe
stat=pd.DataFrame({"stock":["b","b"],
"Entry_date":["2021-01-03","2021-01-07"],
"Entry_price":[16,17],
"Exit_date":["2021-01-05","2021-01-10"],
"Exit_price":[12,21]})
更新(多只股票)
给定多个股票 b
和 c
:
df1 = pd.DataFrame({'a':{0:pd.Timestamp('2021-01-01 00:00:00'),1:pd.Timestamp('2021-01-02 00:00:00'),2:pd.Timestamp('2021-01-03 00:00:00'),3:pd.Timestamp('2021-01-04 00:00:00'),4:pd.Timestamp('2021-01-05 00:00:00'),5:pd.Timestamp('2021-01-06 00:00:00'),6:pd.Timestamp('2021-01-07 00:00:00'),7:pd.Timestamp('2021-01-08 00:00:00'),8:pd.Timestamp('2021-01-09 00:00:00'),9:pd.Timestamp('2021-01-10 00:00:00')},'b':{0:12,1:13,2:16,3:15,4:12,5:14,6:17,7:19,8:20,9:21},'c':{0:13,1:16,2:15,3:12,4:14,5:17,6:19,7:20,8:21,9:12}})
df2 = pd.DataFrame({'a':{0:pd.Timestamp('2021-01-01 00:00:00'),1:pd.Timestamp('2021-01-02 00:00:00'),2:pd.Timestamp('2021-01-03 00:00:00'),3:pd.Timestamp('2021-01-04 00:00:00'),4:pd.Timestamp('2021-01-05 00:00:00'),5:pd.Timestamp('2021-01-06 00:00:00'),6:pd.Timestamp('2021-01-07 00:00:00'),7:pd.Timestamp('2021-01-08 00:00:00'),8:pd.Timestamp('2021-01-09 00:00:00'),9:pd.Timestamp('2021-01-10 00:00:00')},'b':{0:np.nan,1:np.nan,2:13.7,3:14.3,4:14.0,5:15.0,6:14.3,7:16.7,8:18.8,9:20.0},'c':{0:np.nan,1:13.7,2:14.3,3:14.0,4:15.0,5:14.3,6:16.7,7:30.0,8:30.0,9:30.0}})
# -------- df1 -------- ---------- df2 ----------
# a b c a b c
# 0 2021-01-01 12 13 0 2021-01-01 NaN NaN
# 1 2021-01-02 13 16 1 2021-01-02 NaN 13.7
# 2 2021-01-03 16 15 2 2021-01-03 13.7 14.3
# 3 2021-01-04 15 12 3 2021-01-04 14.3 14.0
# 4 2021-01-05 12 14 4 2021-01-05 14.0 15.0
# 5 2021-01-06 14 17 5 2021-01-06 15.0 14.3
# 6 2021-01-07 17 19 6 2021-01-07 14.3 16.7
# 7 2021-01-08 19 20 7 2021-01-08 16.7 30.0
# 8 2021-01-09 20 21 8 2021-01-09 18.8 30.0
# 9 2021-01-10 21 12 9 2021-01-10 20.0 30.0
首先将日期设置为索引并创建初始df1 > df2
掩码:
df1 = df1.set_index('a')
df2 = df2.set_index('a')
mask = df1 > df2
要处理最后缺少的退出条件(如库存 b
),请在 df1
和 mask
:
后附加一行
# duplicate the mask's last row but with False values
# - if the last row originally was True, this closes the exit condition
# - if the last row originally was False, this has no effect
mask = mask.append(mask.tail(1))
mask.iloc[-1] = False
# duplicate df1's last row to match the mask
df1 = df1.append(df1.tail(1))
索引 entry
点与 mask & ~mask.shift()
和 exit
点与 ~mask & mask.shift()
:
entry = (df1[mask & ~mask.shift().bfill()]
.stack().reset_index(name='entry_price')
.rename(columns={'a': 'entry_date', 'level_1': 'stock'}))
exit = (df1[~mask & mask.shift().bfill()]
.stack().reset_index(name='exit_price')
.rename(columns={'a': 'exit_date', 'level_1': 'stock'}))
然后concat()
帧:
pd.concat([
entry[['stock', 'entry_date', 'entry_price']],
exit[['exit_date', 'exit_price']],
], axis=1).sort_values(by='stock')
# stock entry_date entry_price exit_date exit_price
# 1 b 2021-01-03 16.0 2021-01-05 12.0
# 3 b 2021-01-07 17.0 2021-01-10 21.0
# 0 c 2021-01-02 16.0 2021-01-04 12.0
# 2 c 2021-01-06 17.0 2021-01-08 20.0
原答案(一只)
创建一个布尔值 mask
并用 mask & ~mask.shift()
:
索引 entry
点
mask = df1.b > df2.b
entry = df1[mask & ~mask.shift().bfill()].rename(columns={'a': 'entry_date', 'b': 'entry_price'})
# entry_date entry_price
# 0 2021-01-03 16
# 1 2021-01-07 17
并用 ~mask & mask.shift()
索引 exit
点:
exit = df1[~mask & mask.shift().bfill()]
exit = pd.concat([exit, df1.tail(1)]).rename(columns={'a': 'exit_date', 'b': 'exit_price'})
# exit_date exit_price
# 0 2021-01-05 12
# 1 2021-01-10 21
然后concat()
帧:
stat = pd.concat([
entry.reset_index(drop=True),
exit.reset_index(drop=True),
], axis=1).assign(stock='b').dropna()
# entry_date entry_price exit_date exit_price stock
# 0 2021-01-03 16 2021-01-05 12 b
# 1 2021-01-07 17 2021-01-10 21 b
我有两个数据帧
df1=pd.DataFrame({"a":pd.date_range("2021-01-01","2021-01-10"),
"b" :[12,13,16,15,12,14,17,19,20,21]})
df2=pd.DataFrame({"a":pd.date_range("2021-01-01","2021-01-10"),
"b":[np.nan,np.nan,13.7,14.3,14,15,14.3,16.7,18.8,20]})
whenevr df1[b] > df2[b] 我想买,whenevr df1[b] < df2[b] 退出。
注:
- 如果之前的买入没有退出,我不会买入
- 如果买入发生但退出条件没有发生,那么我们将最后日期视为退出
需要输出
#dataframe
stat=pd.DataFrame({"stock":["b","b"],
"Entry_date":["2021-01-03","2021-01-07"],
"Entry_price":[16,17],
"Exit_date":["2021-01-05","2021-01-10"],
"Exit_price":[12,21]})
更新(多只股票)
给定多个股票 b
和 c
:
df1 = pd.DataFrame({'a':{0:pd.Timestamp('2021-01-01 00:00:00'),1:pd.Timestamp('2021-01-02 00:00:00'),2:pd.Timestamp('2021-01-03 00:00:00'),3:pd.Timestamp('2021-01-04 00:00:00'),4:pd.Timestamp('2021-01-05 00:00:00'),5:pd.Timestamp('2021-01-06 00:00:00'),6:pd.Timestamp('2021-01-07 00:00:00'),7:pd.Timestamp('2021-01-08 00:00:00'),8:pd.Timestamp('2021-01-09 00:00:00'),9:pd.Timestamp('2021-01-10 00:00:00')},'b':{0:12,1:13,2:16,3:15,4:12,5:14,6:17,7:19,8:20,9:21},'c':{0:13,1:16,2:15,3:12,4:14,5:17,6:19,7:20,8:21,9:12}})
df2 = pd.DataFrame({'a':{0:pd.Timestamp('2021-01-01 00:00:00'),1:pd.Timestamp('2021-01-02 00:00:00'),2:pd.Timestamp('2021-01-03 00:00:00'),3:pd.Timestamp('2021-01-04 00:00:00'),4:pd.Timestamp('2021-01-05 00:00:00'),5:pd.Timestamp('2021-01-06 00:00:00'),6:pd.Timestamp('2021-01-07 00:00:00'),7:pd.Timestamp('2021-01-08 00:00:00'),8:pd.Timestamp('2021-01-09 00:00:00'),9:pd.Timestamp('2021-01-10 00:00:00')},'b':{0:np.nan,1:np.nan,2:13.7,3:14.3,4:14.0,5:15.0,6:14.3,7:16.7,8:18.8,9:20.0},'c':{0:np.nan,1:13.7,2:14.3,3:14.0,4:15.0,5:14.3,6:16.7,7:30.0,8:30.0,9:30.0}})
# -------- df1 -------- ---------- df2 ----------
# a b c a b c
# 0 2021-01-01 12 13 0 2021-01-01 NaN NaN
# 1 2021-01-02 13 16 1 2021-01-02 NaN 13.7
# 2 2021-01-03 16 15 2 2021-01-03 13.7 14.3
# 3 2021-01-04 15 12 3 2021-01-04 14.3 14.0
# 4 2021-01-05 12 14 4 2021-01-05 14.0 15.0
# 5 2021-01-06 14 17 5 2021-01-06 15.0 14.3
# 6 2021-01-07 17 19 6 2021-01-07 14.3 16.7
# 7 2021-01-08 19 20 7 2021-01-08 16.7 30.0
# 8 2021-01-09 20 21 8 2021-01-09 18.8 30.0
# 9 2021-01-10 21 12 9 2021-01-10 20.0 30.0
首先将日期设置为索引并创建初始df1 > df2
掩码:
df1 = df1.set_index('a')
df2 = df2.set_index('a')
mask = df1 > df2
要处理最后缺少的退出条件(如库存 b
),请在 df1
和 mask
:
# duplicate the mask's last row but with False values
# - if the last row originally was True, this closes the exit condition
# - if the last row originally was False, this has no effect
mask = mask.append(mask.tail(1))
mask.iloc[-1] = False
# duplicate df1's last row to match the mask
df1 = df1.append(df1.tail(1))
索引 entry
点与 mask & ~mask.shift()
和 exit
点与 ~mask & mask.shift()
:
entry = (df1[mask & ~mask.shift().bfill()]
.stack().reset_index(name='entry_price')
.rename(columns={'a': 'entry_date', 'level_1': 'stock'}))
exit = (df1[~mask & mask.shift().bfill()]
.stack().reset_index(name='exit_price')
.rename(columns={'a': 'exit_date', 'level_1': 'stock'}))
然后concat()
帧:
pd.concat([
entry[['stock', 'entry_date', 'entry_price']],
exit[['exit_date', 'exit_price']],
], axis=1).sort_values(by='stock')
# stock entry_date entry_price exit_date exit_price
# 1 b 2021-01-03 16.0 2021-01-05 12.0
# 3 b 2021-01-07 17.0 2021-01-10 21.0
# 0 c 2021-01-02 16.0 2021-01-04 12.0
# 2 c 2021-01-06 17.0 2021-01-08 20.0
原答案(一只)
创建一个布尔值 mask
并用 mask & ~mask.shift()
:
entry
点
mask = df1.b > df2.b
entry = df1[mask & ~mask.shift().bfill()].rename(columns={'a': 'entry_date', 'b': 'entry_price'})
# entry_date entry_price
# 0 2021-01-03 16
# 1 2021-01-07 17
并用 ~mask & mask.shift()
索引 exit
点:
exit = df1[~mask & mask.shift().bfill()]
exit = pd.concat([exit, df1.tail(1)]).rename(columns={'a': 'exit_date', 'b': 'exit_price'})
# exit_date exit_price
# 0 2021-01-05 12
# 1 2021-01-10 21
然后concat()
帧:
stat = pd.concat([
entry.reset_index(drop=True),
exit.reset_index(drop=True),
], axis=1).assign(stock='b').dropna()
# entry_date entry_price exit_date exit_price stock
# 0 2021-01-03 16 2021-01-05 12 b
# 1 2021-01-07 17 2021-01-10 21 b