Calculate stdev for a row and previous row in pandas without series error

Question

Here is my dataset:

    Date,p1Close,p2Close,spread,movingAverage
    2022-02-28,5,10,2,NaN
    2022-03-01,2,6,3,2.5
    2022-03-02,4,8,2,2.5
    2022-03-03,2,8,4,3

I'm trying to create a new column in the pandas DataFrame whose value is the standard deviation of spread over the previous row and the current row.
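With only two values in each window, the sample standard deviation (the n - 1 convention used by default in both statistics.stdev and pandas) reduces to a closed form, which is handy for checking the expected values. Here a is the current spread and b the previous one:

```latex
% a = current spread, b = previous spread, n = 2
\bar{x} = \frac{a + b}{2}, \qquad a - \bar{x} = \frac{a - b}{2}, \qquad b - \bar{x} = -\frac{a - b}{2}

s = \sqrt{\frac{(a - \bar{x})^2 + (b - \bar{x})^2}{n - 1}}
  = \sqrt{\frac{(a - b)^2}{2}}
  = \frac{\lvert a - b \rvert}{\sqrt{2}}
```

For example, the last row (spread 4, previous spread 2) gives 2 / \sqrt{2} \approx 1.414214, and the rows where spread changes by 1 give 1 / \sqrt{2} \approx 0.707107, matching the outputs in both answers.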

    df['standardDeviation'] = statistics.stdev(df['spread'], df['spread'].shift(1))

I keep getting this error:

      File "/usr/lib/python3.9/statistics.py", line 797, in stdev
        var = variance(data, xbar)
      File "/usr/lib/python3.9/statistics.py", line 740, in variance
        T, ss = _ss(data, xbar)
      File "/usr/lib/python3.9/statistics.py", line 684, in _ss
        T, total, count = _sum((x-c)**2 for x in data)
      File "/usr/lib/python3.9/statistics.py", line 166, in _sum
        for n, d in map(_exact_ratio, values):
      File "/usr/lib/python3.9/statistics.py", line 248, in _exact_ratio
        raise TypeError(msg.format(type(x).__name__))
    TypeError: can't convert type 'Series' to numerator/denominator

I believe this is because I'm using shift(1), and on the first calculation there is no shift(1) value, so it errors out. I'm not sure how to get around this.

Answer 1

You can actually just use <column>.rolling(2).std():

    df['standardDeviation'] = df['spread'].rolling(2).std()

Output:

    >>> df
             Date  p1Close  p2Close  spread  movingAverage  standardDeviation
    0  2022-02-28        5       10       2            NaN                NaN
    1  2022-03-01        2        6       3            2.5           0.707107
    2  2022-03-02        4        8       2            2.0           0.707107
    3  2022-03-03        2        8       4            3.0           1.414214
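A self-contained sketch of this answer, assuming the question's data is loaded from an in-memory CSV string (csv_text is only illustrative; in practice df comes from your own source). rolling(...).std() defaults to ddof=1, the sample standard deviation, which is the same convention statistics.stdev uses; passing ddof=0 gives the population version instead (populationStd is a hypothetical column name added just for comparison).

```python
import io

import pandas as pd

# The question's dataset as an in-memory CSV (illustrative only).
csv_text = """Date,p1Close,p2Close,spread,movingAverage
2022-02-28,5,10,2,NaN
2022-03-01,2,6,3,2.5
2022-03-02,4,8,2,2.5
2022-03-03,2,8,4,3
"""
df = pd.read_csv(io.StringIO(csv_text))

# A window of 2 covers the current row and the previous row. By default
# min_periods equals the window size, so the first row (which has no
# previous row) gets NaN instead of raising an error.
df['standardDeviation'] = df['spread'].rolling(2).std()  # ddof=1 (sample)

# Population standard deviation over the same window, if that is preferred.
df['populationStd'] = df['spread'].rolling(2).std(ddof=0)

print(df)
```

Because the window is just two rows, no manual shift(1) is needed: the rolling window already pairs each row with the one before it.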

Answer 2

@richardec's answer is the best solution, but for your specific problem: statistics.stdev expects a single iterable of values, so you need to pass the values in pairs:

    df['stdev'] = [statistics.stdev(pair) for pair in zip(df['spread'], df['spread'].shift())]

Output:

             Date  p1Close  p2Close  spread  movingAverage     stdev
    0  2022-02-28        5       10       2            NaN       NaN
    1  2022-03-01        2        6       3            2.5  0.707107
    2  2022-03-02        4        8       2            2.0  0.707107
    3  2022-03-03        2        8       4            3.0  1.414214
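The traceback in the question is not caused by the NaN that shift(1) produces. The signature is statistics.stdev(data, xbar=None), so in the original call the shifted Series is taken as xbar, a precomputed mean. Each deviation x - xbar then becomes a whole Series, and statistics raises the TypeError shown (note the (x-c)**2 frame in _ss). The sketch below, using the spread values from the question and illustrative variable names, demonstrates that and checks a pairwise stdev against rolling(2).std(). Unlike the one-liner above, it fills the first row with NaN explicitly rather than passing a NaN into stdev.

```python
import math
import statistics

import pandas as pd

spread = pd.Series([2, 3, 2, 4])  # the question's spread column

# statistics.stdev(data, xbar=None): the second positional argument is a
# precomputed mean, not a second data set. Supplying the true mean of [2, 3]
# gives the same result as leaving it out.
print(statistics.stdev([2, 3], 2.5), statistics.stdev([2, 3]))

# The original call passes a whole Series as xbar, so every deviation
# x - xbar is itself a Series, which statistics cannot convert to a ratio.
error = None
try:
    statistics.stdev(spread, spread.shift(1))
except TypeError as exc:
    error = exc
print(type(error).__name__, error)

# Pairwise version: stdev of (current, previous) for every row that has a
# previous row; the first row is filled with NaN explicitly.
pairwise = [math.nan] + [
    statistics.stdev(pair) for pair in zip(spread.iloc[1:], spread.iloc[:-1])
]
rolling = spread.rolling(2).std()
print(pairwise)
print(rolling.tolist())
```

Both approaches use the sample (n - 1) convention, which is why the pairwise list and the rolling result agree.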

Tags: python, dataframe, python-3.x, pandas, rolling-computation