How to calculate slope of Pandas dataframe column based on previous N rows

Question

I have the following example dataframe:

import pandas as pd

d = {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]}

df = pd.DataFrame(data=d)
print(df)

Output:

    col1
0      2
1      5
2      6
3      5
4      4
5      6
6      7
7      8
8      9
9      7
10     5

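I need to compute the slope of col1 over the previous N rows and store it in a separate column (call it slope). The desired output might look like this (the slope values below are just random numbers for the sake of the example):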

       col1  slope
0      2
1      5
2      6
3      5
4      4     3
5      6     4
6      7     5
7      8     2
8      9     4
9      7     6
10     5     5

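So in the row with index 4, the slope (shown here as 3) is the slope of [2, 5, 6, 5, 4], that is, the current row and the four rows before it.

Is there an elegant way to do this without a for loop?

Addendum:

If the accepted answer below raises the following error: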

TypeError: ufunc 'true_divide' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

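the likely cause is that your dataframe's index is not numeric: the accepted answer uses the index values as x, so they must be numbers. The following modification, which uses the positions 0 to 4 as x instead, makes it work: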

df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(range(5), s.values)[0])
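For illustration, here is a minimal self-contained sketch of the same workaround on a dataframe with a string index. The helper name rolling_slope and the letter index are assumptions made up for this example, not part of the original question. It passes raw=True so each window arrives as a NumPy array, and it gives linregress x and y as separate arguments, which also avoids the single-argument form that newer SciPy releases deprecate.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def rolling_slope(series, window):
    """Least-squares slope of each trailing window, using positions 0..window-1 as x.

    Hypothetical helper for illustration: the index is ignored entirely,
    so it may be strings, dates, or anything else.
    """
    x = np.arange(window)
    # raw=True hands the lambda a plain ndarray instead of a Series.
    return series.rolling(window).apply(lambda y: linregress(x, y).slope, raw=True)


# Same data as in the question, but with a non-numeric index.
df = pd.DataFrame(
    {'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]},
    index=list('abcdefghijk'),
)
df['slope'] = rolling_slope(df['col1'], 5)
print(df)
```

The first four rows are NaN because a full window of five values is not yet available there.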

Answer 1

You can use rolling + apply with scipy.stats.linregress:

from scipy.stats import linregress

df['slope'] = df['col1'].rolling(5).apply(lambda s: linregress(s.reset_index())[0])

print(df)

Output:

    col1  slope
0      2    NaN
1      5    NaN
2      6    NaN
3      5    NaN
4      4    0.4
5      6    0.0
6      7    0.3
7      8    0.9
8      9    1.2
9      7    0.4
10     5   -0.5

Answer 2

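Let's use numpy:

import numpy as np

def slope_numpy(x, y):
    # np.polyfit returns coefficients from highest degree to lowest,
    # so for a degree-1 fit the first value is the slope, the second the intercept.
    slope, intercept = np.polyfit(x, y, 1)
    return slope

df.col1.rolling(5).apply(lambda y: slope_numpy(range(5), y))

Output (rounded to one decimal place):

0     NaN
1     NaN
2     NaN
3     NaN
4     0.4
5     0.0
6     0.3
7     0.9
8     1.2
9     0.4
10   -0.5
Name: col1, dtype: float64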
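Both answers still call a Python function once per window. If that becomes a bottleneck, one possible alternative is the closed-form least-squares slope: with x fixed at 0..N-1, the slope of a window is a weighted sum of its y values, with weights (x - mean(x)) / sum((x - mean(x))^2). The sketch below computes that weighted sum for every window in one np.correlate call. The function name rolling_slope_vectorized is hypothetical, and the sketch assumes evenly spaced rows and no missing values in the column.

```python
import numpy as np
import pandas as pd


def rolling_slope_vectorized(series, window):
    """Trailing-window least-squares slope without a per-window Python call.

    Assumes x = 0..window-1 for every window and no NaNs in `series`.
    """
    x = np.arange(window)
    xc = x - x.mean()
    # Because the centered weights sum to zero, sum(weights * y) equals the slope.
    weights = xc / (xc ** 2).sum()

    y = series.to_numpy(dtype=float)
    slopes = np.full(len(y), np.nan)
    if len(y) >= window:
        # correlate(..., mode='valid')[k] == sum_j y[k + j] * weights[j]
        slopes[window - 1:] = np.correlate(y, weights, mode='valid')
    return pd.Series(slopes, index=series.index, name='slope')


df = pd.DataFrame({'col1': [2, 5, 6, 5, 4, 6, 7, 8, 9, 7, 5]})
df['slope'] = rolling_slope_vectorized(df['col1'], 5)
print(df.round(6))
```

On the example data this should agree with the rolling + apply results up to floating-point rounding.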
