Calculating the 2D mean of a 3D jagged NumPy array

Question

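I am trying to calculate the Hurst exponent of a time series in Python, a value that characterizes the mean-reversion behaviour of a time series in quantitative finance. I take a time series of arbitrary length and split it into chunks, which is one step in one of the several methods for calculating the Hurst exponent. I am writing this as a function. Say my time series (security prices) is "y" and the number of chunks I want is "n":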

def hurst(y,n):

     y = array_split(y,n)

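The problem is that the array is now split into chunks, and one of the chunks is not the same size as the others. I want to find, for each chunk, the mean, the standard deviation, the mean-centered series, the cumulative sum of the mean-centered series, and the range of that cumulative sum. But since the chunks are not uniform in size, I have not found a way to do this. Basically, when I try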

mean(y,axis=0)

with 0, 1 or 2 for the axis, I get an error. With n=20, the shape of the array is given as

(20,)

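I thought maybe "vectorize" could help me? But I have not quite figured out how to use it. I am trying to avoid looping over the data.

Sample data after splitting: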

[array([[ 1.04676],
   [ 1.0366 ],
   [ 1.0418 ],
   [ 1.0536 ],
   [ 1.0639 ],
   [ 1.06556],
   [ 1.0668 ]]), array([[ 1.056  ],
   [ 1.053  ],
   [ 1.0521 ],
   [ 1.0517 ],
   [ 1.0551 ],
   [ 1.0485 ],
   [ 1.05705]]), array([[ 1.0531],
   [ 1.0545],
   [ 1.0682],
   [ 1.08  ],
   [ 1.0728],
   [ 1.061 ],
   [ 1.0554]]), array([[ 1.0642],
   [ 1.0607],
   [ 1.0546],
   [ 1.0521],
   [ 1.0548],
   [ 1.0647],
   [ 1.0604]])

The data type is list.
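
The list type and the one-dimensional shape both follow from how array_split behaves: it returns a plain Python list, and when the length is not divisible by n the first chunks get one extra element, so the chunks cannot be stacked into a regular 2D array. Depending on the NumPy version, passing such a list to mean either goes through an object array of shape (n,) or raises an error. A minimal sketch, using a made-up 30-element series (an assumption, not the asker's prices):

```python
import numpy as np

# Hypothetical stand-in for the price series: 30 values, split into 4 chunks.
y = np.arange(30, dtype=float)
chunks = np.array_split(y, 4)

# array_split returns a list; 30 is not divisible by 4, so the
# first two chunks receive one extra element each.
chunk_lengths = [len(c) for c in chunks]
print(type(chunks).__name__, chunk_lengths)
```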

Answer 1

To get a list of the means, you can simply use a list comprehension:

    [mean(x[axis]) for axis in range(len(x))]

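Here "axis" is just the index of each chunk in the list "x", so this iterates over the chunks and computes the mean of each one.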
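
The same pattern can be extended to the other per-chunk quantities the question asks for. The following is only a sketch with made-up chunk values; the variable names are illustrative, and ddof=1 (sample standard deviation, which is also the pandas default) is an assumption about which definition is wanted:

```python
import numpy as np

# Hypothetical jagged input: chunks of unequal length, as array_split can return.
chunks = [np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0])]

means = [c.mean() for c in chunks]             # mean of each chunk
stds = [c.std(ddof=1) for c in chunks]         # sample standard deviation of each chunk
centered = [c - c.mean() for c in chunks]      # mean-centered series
cumsums = [d.cumsum() for d in centered]       # cumulative sum of the centered series
ranges = [z.max() - z.min() for z in cumsums]  # range of each cumulative sum
```

This still loops over the chunks in Python, but only n times; the work inside each chunk is done by NumPy.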

Answer 2

For anyone who stumbles upon this question: I have solved the problem and decided to switch to a Pandas DataFrame instead...

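import numpy as np
import pandas as pd
from numpy import array_split, linspace, log10, shape
from scipy import stats

def hurst(y, n):
    # The original read a global via prices.as_matrix(); as_matrix() was
    # removed in pandas 1.0, so the argument y is flattened to a 1D array here.
    y = np.asarray(y, dtype=float).ravel()
    y = array_split(y, n)
    # One column per chunk; dropna() discards the rows where a shorter
    # chunk has no value, so all chunks end up with the same length.
    y = pd.DataFrame.from_records(y).transpose()
    y = y.dropna()

    # Mean Centered Series
    m = y.mean(axis='columns')
    Y = y.sub(m, axis='rows')

    # Standard Deviation of Series
    S = y.std(axis='columns')

    # Cumulative Sum Series
    Z = Y.cumsum()

    # Range Series
    R = Z.max(axis='columns') - Z.min(axis='columns')

    # Rescale Range
    RS = R / S
    RS = RS.sort_values()

    # Time Period
    s = shape(y)
    t = linspace(1, s[0], s[0])

    # Log Scales
    logt = log10(t)
    logRS = log10(RS)

    print(len(t), len(logRS))

    # Regression Fit
    slope, intercept, r_value, p_value, std_err = stats.mstats.linregress(logt, logRS)

    # Hurst Exponent
    H = slope / 2

    return H, logt, logRS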
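
As a NumPy-only variation on the same idea (not the answerer's code), the unequal chunks can be padded with NaN instead of having their incomplete rows dropped, and the NaN-aware reductions then give one statistic per chunk. This is a sketch under assumptions: pad_chunks is a hypothetical helper, the input is a made-up 10-element series, and ddof=1 is chosen to match the pandas default.

```python
import numpy as np

def pad_chunks(chunks):
    """Stack 1D chunks of unequal length into a 2D array, padding with NaN."""
    width = max(len(c) for c in chunks)
    padded = np.full((len(chunks), width), np.nan)
    for i, c in enumerate(chunks):
        padded[i, :len(c)] = c  # fill the row; the tail stays NaN
    return padded

# Hypothetical series of 10 values split into 3 chunks (lengths 4, 3, 3).
y = np.arange(10, dtype=float)
chunks = np.array_split(y, 3)
padded = pad_chunks(chunks)

# NaN-aware reductions ignore the padding, giving one value per chunk.
chunk_means = np.nanmean(padded, axis=1)
chunk_stds = np.nanstd(padded, axis=1, ddof=1)
```

Unlike dropna(), this keeps every observation, at the cost of having to use the nan* variants of each reduction.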

python

mean

exponent

multidimensional-array

quantitative-finance