Dataframe percentile using a rolling value from another column

Question

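I have the following dataframe structure as an example.

I want to add a column that, for each row, computes a percentile of the "price" column over a rolling n-period lookback, using that row's value in the "percentile" column as the percentile level.

Is this possible? I tried using some kind of lambda function with the .apply syntax but couldn't get it to work.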

        date     percentile  price   desired_row
    2019-11-08  0.355556    0.6863    36th percentile of price of last n period
    2019-11-11  0.316667    0.6851    32nd percentile of price of last n period
    2019-11-12  0.305556    0.6841    ...
    2019-11-13  0.302778    0.6838    ...
    2019-11-14  0.244444    0.6798    ...

Thanks!!

Answer 1

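You can use the rolling method in pandas. For example:

import pandas as pd

df = pd.DataFrame({'B': [0, 1, 2, 2, 4]})
df['rolling_mean'] = df['B'].rolling(2).mean()

This creates a new column holding the two-period rolling mean of column 'B'. If you need a different summary statistic, you can apply a different method, for example: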

df['rolling_sum'] = df['B'].rolling(2).sum()

For more information on this feature, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html
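If the percentile level were the same for every row, the built-in rolling quantile method would already be enough. The following is a minimal sketch reusing the toy column 'B' from above; the column name rolling_median is only illustrative. It also shows why this alone does not answer the question: the quantile level is a single fixed number, not a per-row value taken from another column.

```python
import pandas as pd

df = pd.DataFrame({'B': [0, 1, 2, 2, 4]})

# Rolling.quantile takes one fixed quantile level for the whole column,
# so it cannot read a different level from another column on each row.
df['rolling_median'] = df['B'].rolling(2).quantile(0.5)
print(df)
```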

Answer 2

You can use rolling on the column price, with the column percentile set as the index, and then call quantile inside apply with the parameter raw=False:

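window = 3
# Put percentile in the index so each window carries it along with the prices;
# raw=False passes a Series, so x.index[-1] is the current row's percentile.
df['desired_row'] = df.set_index('percentile')['price'].rolling(window)\
                      .apply(lambda x: x.quantile(q=x.index[-1]), raw=False).values
print(df)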
         date  percentile   price  desired_row
0  2019-11-08    0.355556  0.6863          NaN
1  2019-11-11    0.316667  0.6851          NaN
2  2019-11-12    0.305556  0.6841     0.684711
3  2019-11-13    0.302778  0.6838     0.683982
4  2019-11-14    0.244444  0.6798     0.681756

You can change the interpolation parameter of quantile to suit your needs.
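To illustrate the effect of interpolation, here is a self-contained sketch that rebuilds the sample data from the question and compares the default 'linear' setting with 'lower'. The helper name rolling_percentile and the output column names linear and lower are assumptions made for this example, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'date': ['2019-11-08', '2019-11-11', '2019-11-12', '2019-11-13', '2019-11-14'],
    'percentile': [0.355556, 0.316667, 0.305556, 0.302778, 0.244444],
    'price': [0.6863, 0.6851, 0.6841, 0.6838, 0.6798],
})

window = 3
prices = df.set_index('percentile')['price']

def rolling_percentile(interpolation):
    """Hypothetical helper: rolling quantile whose level comes from the index."""
    # raw=False passes each window as a Series, so x.index[-1] is the
    # percentile value of the window's last (current) row.
    return prices.rolling(window).apply(
        lambda x: x.quantile(q=x.index[-1], interpolation=interpolation),
        raw=False,
    ).values

df['linear'] = rolling_percentile('linear')  # pandas default, interpolates between prices
df['lower'] = rolling_percentile('lower')    # always returns an observed price
print(df)
```

With 'lower', each result is one of the prices actually present in the window, which may be preferable when interpolated values between observed prices are not meaningful.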
