Calculate stdev for a row and previous row in pandas without Series error
Here is my dataset:
Date,p1Close,p2Close,spread,movingAverage
2022-02-28,5,10,2,NaN
2022-03-01,2,6,3,2.5
2022-03-02,4,8,2,2.5
2022-03-03,2,8,4,3
I am trying to create a new column in a pandas DataFrame that equals the standard deviation of the spread in the previous row and the current row.
df['standardDeviation'] = statistics.stdev(df['spread'], df['spread'].shift(1))
I keep getting this error:
File "/usr/lib/python3.9/statistics.py", line 797, in stdev
var = variance(data, xbar)
File "/usr/lib/python3.9/statistics.py", line 740, in variance
T, ss = _ss(data, xbar)
File "/usr/lib/python3.9/statistics.py", line 684, in _ss
T, total, count = _sum((x-c)**2 for x in data)
File "/usr/lib/python3.9/statistics.py", line 166, in _sum
for n, d in map(_exact_ratio, values):
File "/usr/lib/python3.9/statistics.py", line 248, in _exact_ratio
raise TypeError(msg.format(type(x).__name__))
TypeError: can't convert type 'Series' to numerator/denominator
I believe this is because I am using shift(1) and, on the first calculation, there is no shift(1) value, so it errors out. I'm not sure how to solve this.
You can actually just use <column>.rolling(2).std():
df['standardDeviation'] = df['spread'].rolling(2).std()
Output:
>>> df
Date p1Close p2Close spread movingAverage standardDeviation
0 2022-02-28 5 10 2 NaN NaN
1 2022-03-01 2 6 3 2.5 0.707107
2 2022-03-02 4 8 2 2.0 0.707107
3 2022-03-03 2 8 4 3.0 1.414214
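For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the DataFrame from the CSV in the question. Loading through io.StringIO and the csv_text name are assumptions for illustration only; the rolling call itself is unchanged from the line above.

```python
import io

import pandas as pd

# The question's data, loaded from an in-memory string (illustrative only).
csv_text = """Date,p1Close,p2Close,spread,movingAverage
2022-02-28,5,10,2,NaN
2022-03-01,2,6,3,2.5
2022-03-02,4,8,2,2.5
2022-03-03,2,8,4,3
"""
df = pd.read_csv(io.StringIO(csv_text))

# Sample standard deviation (ddof=1, the pandas default) over each window of
# two consecutive rows. The first row has no previous row, so it stays NaN.
df['standardDeviation'] = df['spread'].rolling(2).std()
print(df)
```

With a window of 2, every row is paired with the one before it, which is why no explicit shift(1) is needed.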
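@richardec's answer is the best solution, but to address your specific problem: statistics.stdev expects a single iterable of values (its optional second argument, xbar, is the mean of the data, not a second data series), so you need to pass the values as pairs: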
df['stdev'] = [statistics.stdev(pair) for pair in zip(df['spread'], df['spread'].shift())]
Output:
Date p1Close p2Close spread movingAverage stdev
0 2022-02-28 5 10 2 NaN NaN
1 2022-03-01 2 6 3 2.5 0.707107
2 2022-03-02 4 8 2 2.0 0.707107
3 2022-03-03 2 8 4 3.0 1.414214
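As a rough check that the two approaches agree, the sketch below computes the pairwise version and the rolling version side by side on the spread values from the question. The names spread, pairwise, and rolling are made up for the example, and the explicit NaN guard is an addition, so the first row does not depend on how statistics.stdev treats NaN input.

```python
import statistics

import pandas as pd

# Spread values from the question.
spread = pd.Series([2, 3, 2, 4], name='spread')

# statistics.stdev(data, xbar=None) treats its second positional argument as
# the mean of `data`, so each (current, previous) pair is passed as ONE
# iterable instead. Rows without a previous value are set to NaN explicitly.
pairwise = [
    statistics.stdev(pair) if pd.notna(pair[1]) else float('nan')
    for pair in zip(spread, spread.shift())
]

# Vectorised equivalent: sample standard deviation over a window of two rows.
rolling = spread.rolling(2).std()

print(pd.DataFrame({'spread': spread, 'pairwise': pairwise, 'rolling': rolling}))
```

Both columns should match up to floating-point rounding; the rolling version is generally the better fit for larger frames since it avoids a Python-level loop.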