Evaluate Time Dependent Rate of Return to create Pandas DataFrame

Question

Suppose I have a Pandas DataFrame as follows:

+------------+--------+
|    Date    | Price  |
+------------+--------+
| 2021-07-30 | 438.51 |
| 2021-08-02 | 437.59 |
| 2021-08-03 | 441.15 |
| 2021-08-04 | 438.98 |
+------------+--------+

The DataFrame above can be generated with the following code:

import numpy as np
import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]
        }

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
normalisation_days = 365.25
compounding_days = 365.25

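For a given time series, I want to compute the time dependent rate_of_return. The problem is to determine the time periods in which rate_of_return reaches its best or worst values.

One could simply compute rate_of_return for all possible combinations, build a DataFrame containing period_start, period_end and rate_of_return, sort it in descending (best) or ascending (worst) order, and then exclude any overlapping periods.

rate_of_return = ((period_end_price / period_start_price) ^ (compounding_days / days_in_between) - 1) * (normalisation_days / compounding_days)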
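As a quick check of the formula, here is a minimal sketch that evaluates it for a single period taken from the sample data. The function name `annualised_return` and its keyword defaults are assumptions made for illustration; they are not part of the original post.

```python
from datetime import date

def annualised_return(start_price, end_price, start_date, end_date,
                      compounding_days=365.25, normalisation_days=365.25):
    # Number of calendar days between the two observations
    days_in_between = (end_date - start_date).days
    # Compound the period's price ratio up to `compounding_days`
    growth = (end_price / start_price) ** (compounding_days / days_in_between)
    return (growth - 1) * (normalisation_days / compounding_days)

# Period 2021-08-02 -> 2021-08-03 from the sample data
r = annualised_return(437.59, 441.15, date(2021, 8, 2), date(2021, 8, 3))
print(round(r, 4))  # 18.2875
```

With both day counts set to 365.25 the normalisation factor is 1, so the result is just the one-day price ratio compounded over a year, minus 1.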

For the DataFrame above, I computed rate_of_return with the code below:

df['rate_of_return_l1'] = ((((df.Price /
                                   df.Price[0]) **
                                  (compounding_days /
                                   (df.Date - df.Date[0]).dt.days) - 1) *
                                 (normalisation_days /
                                  compounding_days)))
df.loc[df.index[0], 'rate_of_return_l1'] = np.nan  # avoid chained assignment

df['rate_of_return_l2'] = ((((df.Price /
                                   df.Price[1]) **
                                  (compounding_days /
                                   (df.Date - df.Date[1]).dt.days) - 1) *
                                 (normalisation_days /
                                  compounding_days)))
df.loc[df.index[:2], 'rate_of_return_l2'] = np.nan  # avoid chained assignment

df['rate_of_return_l3'] = ((((df.Price /
                                   df.Price[2]) **
                                  (compounding_days /
                                   (df.Date - df.Date[2]).dt.days) - 1) *
                                 (normalisation_days /
                                  compounding_days)))
df.loc[df.index[:3], 'rate_of_return_l3'] = np.nan  # avoid chained assignment
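The three blocks above repeat the same computation once per start row. As a minimal sketch, the same calculation could be done for every start/end pair at once with NumPy index pairs. The names `i`, `j`, `ror` and `pairs` are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04']),
    'Price': [438.51, 437.59, 441.15, 438.98],
})
compounding_days = 365.25
normalisation_days = 365.25

# Index pairs (i, j) with i < j: every start row paired with every later end row
i, j = np.triu_indices(len(df), k=1)

prices = df['Price'].to_numpy()
dates = df['Date'].to_numpy()
# Length of each period in days, as floats
days = (dates[j] - dates[i]) / np.timedelta64(1, 'D')

ror = (((prices[j] / prices[i]) ** (compounding_days / days) - 1)
       * (normalisation_days / compounding_days))

pairs = pd.DataFrame({
    'Period Start': dates[i],
    'Period End': dates[j],
    'Rate of Return': ror,
}).sort_values('Rate of Return', ascending=False, ignore_index=True)
print(pairs)
```

Because only pairs with i < j are generated, no division by zero or negative day count occurs, and no rows need to be overwritten with NaN afterwards.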

Based on the results, the best/worst case periods are as follows:

+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 |    18.28751739 |
| 2021-08-02   | 2021-08-04 |    0.784586925 |
| 2021-07-30   | 2021-08-03 |    0.729942907 |
| 2021-07-30   | 2021-08-04 |    0.081397181 |
| 2021-07-30   | 2021-08-02 |   -0.225626914 |
| 2021-08-03   | 2021-08-04 |   -0.834880227 |
+--------------+------------+----------------+

Expected Output

If I want to see the best rate_of_return, the resulting DataFrame would be:

+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 |    18.28751739 |
+--------------+------------+----------------+

If I want to see the worst rate_of_return, the resulting DataFrame would be:

+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-03   | 2021-08-04 |   -0.834880227 |
| 2021-07-30   | 2021-08-02 |   -0.225626914 |
+--------------+------------+----------------+

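What is the best approach for testing all scenarios to compute rate_of_return?
How can I achieve the expected output so that the periods do not overlap? (as seen in the expected output)
The Best/Worst DataFrames do not depend on sign: the best DataFrame may contain negative rate_of_returns, provided no time periods overlap.
What would the approach be if the formula changed to (period_end_price/period_start_price) - 1 (not time dependent)?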

Answer 1

Define a function to which you can pass the DataFrame and the start and end dates directly:

import numpy as np
import pandas as pd

data = {'Date': ['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04'],
        'Price': [438.51, 437.59, 441.15, 438.98]
        }

df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
normalisation_days = 365.25
compounding_days = 365.25

def rate_ret(df, start_date, end_date):

    start = df[df.Date==start_date].iloc[0]
    end = df[df.Date==end_date].iloc[0]
    period_start_price = start.Price
    period_end_price = end.Price
    days_in_between = (end.Date - start.Date).days
    return ((period_end_price/period_start_price)**(compounding_days/days_in_between)-1) * (normalisation_days/compounding_days)

# Iterate over all possible date intervals, creating an array (or matrix).
# In the second `for` loop, we only include dates later than the starting date:

array = []
for start_date in df.Date:
    for end_date in df.Date[df.Date>start_date]:
        array.append([rate_ret(df, start_date, end_date), start_date, end_date])
print(array)

# To extract the best and the worst periods without overlaps, keep the top row,
# then walk down the sorted rows and keep each one that does not collide with
# any interval stored so far:

def extract_non_overlaping(df):
    saved_rows = [df.iloc[0]]
    for _, row in df.iloc[1:].iterrows():
        # the row must be disjoint from every saved interval, not just one of them
        if all((row['Period End'] < saved['Period Start']) or
               (row['Period Start'] > saved['Period End'])
               for saved in saved_rows):
            saved_rows.append(row)
    return pd.DataFrame(saved_rows, columns=['Rate of Return','Period Start','Period End'])

df_higher  = pd.DataFrame(array, columns=['Rate of Return','Period Start','Period End']).reset_index(drop=True).sort_values(['Rate of Return'],ascending=False)
df_lower  = pd.DataFrame(array, columns=['Rate of Return','Period Start','Period End']).reset_index(drop=True).sort_values(['Rate of Return'])

extract_non_overlaping(df_higher)
extract_non_overlaping(df_lower)

The result for df_higher (best):

+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-02   | 2021-08-03 |    18.28751739 |
+--------------+------------+----------------+

And for df_lower (worst):

+--------------+------------+----------------+
| Period Start | Period End | Rate of Return |
+--------------+------------+----------------+
| 2021-08-03   | 2021-08-04 |   -0.834880227 |
| 2021-07-30   | 2021-08-02 |   -0.225626914 |
+--------------+------------+----------------+

If the formula is not time dependent, just change the formula in the rate_ret definition.
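For the non time-dependent case mentioned in the question, a hedged sketch of what the modified function might look like is shown here. The name `simple_ret` is an assumption, chosen so that it does not overwrite `rate_ret`.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2021-07-30', '2021-08-02', '2021-08-03', '2021-08-04']),
    'Price': [438.51, 437.59, 441.15, 438.98],
})

def simple_ret(df, start_date, end_date):
    # Same row lookup as rate_ret, but the day count is no longer needed
    start = df[df.Date == start_date].iloc[0]
    end = df[df.Date == end_date].iloc[0]
    return end.Price / start.Price - 1

r = simple_ret(df, pd.Timestamp('2021-08-02'), pd.Timestamp('2021-08-03'))
print(r)
```

The rest of the pipeline (building the array, sorting, and filtering out overlaps) should work unchanged, since it only relies on the returned number.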

PS: There are some optimizations you could make, but overall the code works.

Answer 2

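If I understand correctly, your question has two parts:

Part 1: Generating combinations

To generate the combinations, you can use itertools, compute the return for each combination, and sort the results.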

from itertools import combinations
rors = []
for combination in combinations(zip(df['Date'], df['Price']), 2):
    (start_date, start_price), (end_date, end_price) = combination
    ror = (end_price / start_price) ** (compounding_days / (end_date - start_date).days) - 1
    rors.append((start_date, end_date, ror))

sorted_rors = sorted(rors, key=lambda x: x[2], reverse=True)
print(sorted_rors[0])
#(Timestamp('2021-08-02 00:00:00'),
# Timestamp('2021-08-03 00:00:00'),
# 18.28751738702541)

print(sorted_rors[-1])
#(Timestamp('2021-08-03 00:00:00'),
# Timestamp('2021-08-04 00:00:00'),
# -0.8348802270491325)

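Part 2: Non-overlapping time periods

I am not entirely clear on this part, but I guess you want to find the top n returns whose time periods do not overlap. If you are looking at a large number of time periods, consider using a generator function: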

def next_non_overlapping(iterable):
    accepted = []  # (start, end) of every interval yielded so far
    for start, end, ror in iterable:
        # yield only if the candidate overlaps none of the accepted intervals
        # (periods that merely share an endpoint are allowed)
        if all(start >= a_end or end <= a_start for a_start, a_end in accepted):
            accepted.append((start, end))
            yield (start, end, ror)
    print("No more items")

nno = next_non_overlapping(sorted_rors)
print(next(nno))
#(Timestamp('2021-08-02 00:00:00'),
# Timestamp('2021-08-03 00:00:00'),
# 18.28751738702541)
print(next(nno))
#(Timestamp('2021-07-30 00:00:00'),
# Timestamp('2021-08-02 00:00:00'),
# -0.22562691374181088)
print(next(nno))
#(Timestamp('2021-08-03 00:00:00'),
# Timestamp('2021-08-04 00:00:00'),
# -0.8348802270491325)
print(next(nno, None))  # the default avoids an uncaught StopIteration
# No more items
# None

To get the lowest n returns, you can simply pass the reversed list to the function, i.e.

nnor = next_non_overlapping(reversed(sorted_rors))
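Because the function is a generator, the first n non-overlapping periods can be taken lazily with itertools.islice. The sketch below is self-contained, so it uses a small stand-in generator (`non_overlapping`, an assumed name) and hard-coded tuples copied from the table in the question, with dates written as ISO strings so that plain string comparison orders them correctly.

```python
from itertools import islice

# (start, end, rate of return), sorted by return in descending order
sorted_rors = [
    ('2021-08-02', '2021-08-03', 18.28751739),
    ('2021-08-02', '2021-08-04', 0.784586925),
    ('2021-07-30', '2021-08-03', 0.729942907),
    ('2021-07-30', '2021-08-04', 0.081397181),
    ('2021-07-30', '2021-08-02', -0.225626914),
    ('2021-08-03', '2021-08-04', -0.834880227),
]

def non_overlapping(iterable):
    accepted = []  # intervals yielded so far
    for start, end, ror in iterable:
        # periods that only share an endpoint count as non-overlapping here
        if all(start >= a_end or end <= a_start for a_start, a_end in accepted):
            accepted.append((start, end))
            yield start, end, ror

top_2 = list(islice(non_overlapping(sorted_rors), 2))
bottom_2 = list(islice(non_overlapping(reversed(sorted_rors)), 2))
print(top_2)
print(bottom_2)
```

Note that the question's expected output treats periods sharing an endpoint as overlapping. Replacing `>=` and `<=` with `>` and `<` would reproduce that stricter rule, in which case the best list contains only the 2021-08-02 to 2021-08-03 period.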

Answer 3

First, the problem is easier if the time series is daily. So I would do this:

df = df.set_index('Date').resample('D').mean().reset_index()

This gives us:

	Date	Price
0	2021-07-30 00:00:00	438.51
1	2021-07-31 00:00:00	nan
2	2021-08-01 00:00:00	nan
3	2021-08-02 00:00:00	437.59
4	2021-08-03 00:00:00	441.15
5	2021-08-04 00:00:00	438.98

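From here you can compute the annualized growth factor (1 + rate of return, so subtract 1 to get the rate itself) over a holding period of x days: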

for holding_duration in range(1, 5):
    df[holding_duration] = df['Price'].pct_change(holding_duration).add(1).pow(365.25/holding_duration)

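This gives (the output below relies on pct_change forward-filling the missing prices, which was the default in older pandas versions):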

	Date	Price	1	2	3	4
0	2021-07-30 00:00:00	438.51	nan	nan	nan	nan
1	2021-07-31 00:00:00	nan	nan	nan	nan	nan
2	2021-08-01 00:00:00	nan	nan	nan	nan	nan
3	2021-08-02 00:00:00	437.59	0.464356	nan	nan	nan
4	2021-08-03 00:00:00	441.15	19.2875	2.9927	nan	nan
5	2021-08-04 00:00:00	438.98	0.16512	1.78459	1.13931	nan

This can get quite large...

From there you can do a row-wise argmax and derive the holding period from it.
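A minimal sketch of that row-wise argmax step, assuming the growth factors are held in a DataFrame indexed by end date with one column per holding duration. The values are copied from the table above; the names `factors`, `valid`, `best_holding` and `period_start` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Rows = end date, columns = holding duration in days
factors = pd.DataFrame(
    {1: [np.nan, np.nan, np.nan, 0.464356, 19.2875, 0.16512],
     2: [np.nan, np.nan, np.nan, np.nan, 2.9927, 1.78459],
     3: [np.nan, np.nan, np.nan, np.nan, np.nan, 1.13931],
     4: [np.nan] * 6},
    index=pd.date_range('2021-07-30', '2021-08-04', freq='D'),
)

# Drop rows with no value at all, since there is nothing to take a maximum of
valid = factors.dropna(how='all')

best_holding = valid.idxmax(axis=1).astype(int)  # column label = holding days
best_factor = valid.max(axis=1)

# Step back from each end date by its best holding duration
period_start = valid.index - pd.to_timedelta(best_holding.to_numpy(), unit='D')

result = pd.DataFrame({
    'Period Start': period_start,
    'Period End': valid.index,
    'Holding Days': best_holding.to_numpy(),
    'Growth Factor': best_factor.to_numpy(),
})
print(result)
```

Using `idxmin` and `min` instead would give the worst holding period per end date. A derived start date can fall on a day without an observed price, so the result may need filtering against the original dates.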

Not a complete solution, but maybe it helps.
