Rolling percentile function outputting 0's in column?

Question

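When I create a function and use rolling() and apply() to compute a rolling 3-day percentile distribution, the column shows 0 for every row after the first 3 days.

My assumption is that the first 2 days, which have NaN values, are not used in the percentile calculation, which might make the rest of the column default to zero and incorrectly give a value of 33 on the third day. But I am not sure about this.

I have been trying to solve this but have not found a solution. Does anyone know why this happens and how to correct the code below? Any help would be greatly appreciated.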

import pandas as pd
import numpy as np
from scipy import stats

data = { 'a': [1, 15, 27, 399, 17, 568, 200, 9], 
         'b': [2, 30, 15, 60, 15, 80, 53, 41],
         'c': [100,200, 3, 78, 25, 88, 300, 91],
         'd': [4, 300, 400, 500, 23, 43, 9, 71]
         }

dfgrass = pd.DataFrame(data)

def percnum(x):
    for t in dfgrass.index:
        aaa = (x<=dfgrass.loc[t,'b']).value_counts()
        ccc = (x<=dfgrass.loc[t, 'b']).values.sum()
        vvv = len(x)
        nnn = ccc/ vvv
        return nnn * 100

dfgrass['e'] = dfgrass['b'].rolling(window=3).apply(percnum)
print(dfgrass)

Answer 1

You could try changing for t in dfgrass.index to for t in x.index in the implementation of percnum(x), as shown below:

def percnum(x):
    for t in x.index:
        aaa = (x<=dfgrass.loc[t,'b']).value_counts()
        ccc = (x<=dfgrass.loc[t, 'b']).values.sum()
        vvv = len(x)
        nnn = ccc/ vvv
        return nnn * 100
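
A note on why the original function returns 0 after the third row: the return statement sits inside the loop, so only the first iteration ever runs. With for t in dfgrass.index, every window is therefore compared against dfgrass.loc[0, 'b'], which is 2, and only the first full window [2, 30, 15] contains a value that small (1 of 3, hence 33). With for t in x.index, each window is compared against its own first value instead. The sketch below assumes the goal is the percentile rank of the most recent value in each window, which the question does not state explicitly; percnum_last is a hypothetical name.

```python
import pandas as pd

dfgrass = pd.DataFrame({'b': [2, 30, 15, 60, 15, 80, 53, 41]})

def percnum_last(x):
    # x is the current window; with raw=False it arrives as a pandas Series.
    # Share of window values that are <= the most recent value, in percent.
    return (x <= x.iloc[-1]).mean() * 100

# The first two rows stay NaN because their windows are incomplete.
dfgrass['e'] = dfgrass['b'].rolling(window=3).apply(percnum_last, raw=False)
print(dfgrass)
```

No loop is needed here, because rolling() already hands the function one window at a time.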

Answer 2

If you are trying to compute the percentile rank, you could try something like this:

def percnum(x):
    n = len(x)
    temp = x.argsort()
    ranks = np.empty(n)
    ranks[temp] = (np.arange(n) + 1) / n
    return ranks[-1]

dfgrass.rolling(3).apply(percnum)

which gives the following output:

          a         b         c         d
0       NaN       NaN       NaN       NaN
1       NaN       NaN       NaN       NaN
2  1.000000  0.666667  0.333333  1.000000
3  1.000000  1.000000  0.666667  1.000000
4  0.333333  0.666667  0.666667  0.333333
5  1.000000  1.000000  1.000000  0.666667
6  0.666667  0.666667  1.000000  0.333333
7  0.333333  0.333333  0.666667  1.000000
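
As a possible shortcut, and assuming pandas 1.4 or newer is installed, the rolling object has its own rank method, so a custom function may not be needed at all. The sketch below uses method='max' so that tied values count as "less than or equal"; that choice is an assumption made here to match the table above, not something the answer specifies.

```python
import pandas as pd

data = {'a': [1, 15, 27, 399, 17, 568, 200, 9],
        'b': [2, 30, 15, 60, 15, 80, 53, 41],
        'c': [100, 200, 3, 78, 25, 88, 300, 91],
        'd': [4, 300, 400, 500, 23, 43, 9, 71]}
dfgrass = pd.DataFrame(data)

# Percentile rank (0 to 1) of each row's value within its 3-row window.
# Requires pandas >= 1.4, where Rolling.rank was introduced.
ranks = dfgrass.rolling(3).rank(pct=True, method='max')
print(ranks)
```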

Answer 3

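Another option is to apply pandas' rank method with pct=True directly inside your function. This computes the percentile rank directly on the subset defined by the rolling window. It can be done like this: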

def rolling_percentile(x):
    d = pd.DataFrame(x)
    d['rolling'] = d.rank(pct=True)
    return d.iloc[-1, 1]

You can then plug it into your apply call:

df['rolling_apply'] = df[column].rolling(window).apply(rolling_percentile)

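A note on this function: there are other ways to do this, but here I build a small DataFrame from the subset x and add a rolling column to it. For each call, x holds the n most recent values selected by the window. For example, if the window is 3, an array of values is passed that looks something like [1, 15, 27] (a NumPy array or a Series, depending on the raw argument of apply). The rolling percentile we are interested in is therefore the rank of the last value of x relative to the other values in the window, so we take the value at position [-1, 1], which is the rolling column entry for the last row.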
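
The same idea can be sketched without the intermediate DataFrame, assuming the window arrives as a Series (raw=False). One detail worth knowing is that rank averages tied values by default, so for a window such as [15, 60, 15] the last value gets 0.5 rather than 2/3; passing method='max' gives the "less than or equal" reading. The name rolling_rank and the Series b are illustrative only.

```python
import pandas as pd

def rolling_rank(x, method="average"):
    # Percentile rank (0 to 1) of the last value within the window x.
    # Assumes x is a pandas Series, i.e. apply(..., raw=False).
    return x.rank(pct=True, method=method).iloc[-1]

b = pd.Series([2, 30, 15, 60, 15, 80, 53, 41], name="b")

# Default tie handling: tied values share the average of their ranks.
avg_rank = b.rolling(3).apply(rolling_rank, raw=False)
# method='max': tied values all receive the highest rank in the tie.
max_rank = b.rolling(3).apply(rolling_rank, raw=False, kwargs={"method": "max"})

print(pd.DataFrame({"b": b, "average": avg_rank, "max": max_rank}))
```

The two columns differ only at index 4, the one window in this data that contains a tie.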

Answer 4

You can use pandas' rolling function combined with quantile, as below. Pass any quantile between 0 and 1 (that is, your percentile / 100). If you do not want NaNs at the start, set min_periods to 1.

import pandas as pd

data = {'a': [1, 15, 27, 399, 17, 568, 200, 9],
        'b': [2, 30, 15, 60, 15, 80, 53, 41],
        'c': [100, 200, 3, 78, 25, 88, 300, 91],
        'd': [4, 300, 400, 500, 23, 43, 9, 71]}
dfgrass = pd.DataFrame(data)

# 40th percentile of each 3-row window; min_periods=1 avoids leading NaNs.
rolling_percentile = dfgrass.rolling(window=3, min_periods=1).quantile(0.4)
print(rolling_percentile)

which gives the following output:

       a     b      c      d
0    1.0   2.0  100.0    4.0
1    6.6  13.2  140.0  122.4
2   12.2  12.4   80.6  240.8
3   24.6  27.0   63.0  380.0
4   25.0  15.0   20.6  324.6
5  322.6  51.0   67.4   39.0
6  163.4  45.4   75.4   20.2
7  161.8  50.6   90.4   36.2
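
Note that quantile answers a slightly different question from the rank-based answers: it returns the value sitting at a given percentile of each window, not the percentile rank of the latest observation. The sketch below puts the two side by side for column b; the lambda used for the rank is an illustrative assumption about what "percentile rank" should mean here (share of window values less than or equal to the latest one).

```python
import pandas as pd

b = pd.Series([2, 30, 15, 60, 15, 80, 53, 41], name="b")
roll = b.rolling(window=3, min_periods=1)

# Value at the 40th percentile of each window (linear interpolation by default).
q40 = roll.quantile(0.4)
# Share of window values <= the latest value, for comparison.
rank_last = roll.apply(lambda x: (x <= x.iloc[-1]).mean(), raw=False)

print(pd.DataFrame({"b": b, "q40": q40, "rank_last": rank_last}))
```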
