Using SciPy to fit a Levy-stable distribution with positive vs. negative skew

I am confused by the behavior of scipy.stats.levy_stable for distributions with positive versus negative beta parameters. Intuitively, changing the sign of beta when generating a random sample should not affect the estimate of alpha when fitting the data. I am not sure what effect the sign of beta should have on the third parameter (the location) returned by _fitstart(), but I hoped its sign would simply be reversed once the parameters returned by _fitstart() are converted with the pconv function shown below.
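To make that expectation concrete, here is a minimal sketch of the location shift that the pconv lambda in the code below applies, mu - sigma * beta * tan(pi * alpha / 2). The helper names shift_loc and reflect are made up for illustration and are not part of SciPy. The point is only that the shift commutes with reflecting the parameters (beta and mu change sign, alpha and sigma do not), so an estimator that treated reflected data symmetrically would yield a converted location of the opposite sign.

```python
import math


def shift_loc(alpha, beta, mu, sigma):
    """Apply the same location shift as pconv (hypothetical helper name)."""
    if alpha == 1.0:
        # The tan() term is not usable at alpha = 1; pconv does not guard this.
        raise ValueError("the shift is undefined for alpha = 1")
    shifted = mu - sigma * beta * math.tan(math.pi * alpha / 2.0)
    return (alpha, beta, shifted, sigma)


def reflect(alpha, beta, mu, sigma):
    """Parameters of -X given those of X: beta and mu flip sign."""
    return (alpha, -beta, -mu, sigma)


params = (1.3, 1.0, 0.25, 2.0)  # arbitrary illustrative values

# Reflecting then shifting matches shifting then reflecting.
reflect_then_shift = shift_loc(*reflect(*params))
shift_then_reflect = reflect(*shift_loc(*params))
print(reflect_then_shift)
print(shift_then_reflect)
```

If the two printed tuples agree, any asymmetry seen in the output further down has to come from the estimates themselves and not from the conversion.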

from scipy.stats import levy_stable
from scipy.stats import rv_continuous as rvc
import numpy as np

points = 1000000
jennys_constant = 8675309

pconv = lambda alpha, beta, mu, sigma: (alpha, beta, mu - sigma * beta * np.tan(np.pi * alpha / 2.0), sigma)

rvc.random_state = jennys_constant

def test_fitstart(alpha, beta):
    draw = levy_stable.rvs(alpha, beta, size=points)
    
    # use scipy's quantile estimator to estimate the parameters and convert to S parameterization
    return pconv(*levy_stable._fitstart(draw))

print("A few calls with beta=1")
for i in range(3):
    print(test_fitstart(alpha=1.3, beta=1))

print("A few calls with beta=-1")
for i in range(3):
    print(test_fitstart(alpha=1.3, beta=-1))

>>> A few calls with beta=1
>>> (1.3059810788754223, 1.0, 1.9212069030505312, 1.0017497273563876)
>>> (1.3048494867305243, 1.0, 1.92631956349381, 1.000064636906844)
>>> (1.3010492983811222, 1.0, 1.9544520781484407, 0.9999042085058586)
>>> A few calls with beta=-1
>>> (1.3652389860952416, -1.0, 0.3424825654388899, 1.0317366391952136)
>>> (1.370069101697994, -1.0, 0.3560781956631771, 1.0397745333221347)
>>> (1.3682310757082936, -1.0, 0.34621980810217745, 1.037169706715312)

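Looking at the _fitstart() code, I think the lookup for alpha should probably use the absolute value of nu_beta. It does not, so for negatively skewed data the lookup may be extrapolating outside of nu_beta_range.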
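The following sketch shows why the absolute value matters. It computes the two quantile statistics in the form usually attributed to McCulloch's estimator; that _fitstart() computes nu_alpha and nu_beta exactly this way is an assumption here, and quantile_stats is a made-up helper name. A lognormal sample stands in for right-skewed data so that the example needs only NumPy.

```python
import numpy as np


def quantile_stats(data):
    """Return (nu_alpha, nu_beta) from the 5/25/50/75/95 percentiles.

    Hypothetical helper; assumed to mirror the statistics used by _fitstart().
    """
    p05, p25, p50, p75, p95 = np.percentile(data, [5, 25, 50, 75, 95])
    nu_alpha = (p95 - p05) / (p75 - p25)            # tail weight, sign-free
    nu_beta = (p95 + p05 - 2.0 * p50) / (p95 - p05)  # asymmetry, signed
    return nu_alpha, nu_beta


rng = np.random.default_rng(0)
x = rng.lognormal(size=10_000)  # right-skewed stand-in sample

nu_alpha_pos, nu_beta_pos = quantile_stats(x)
nu_alpha_neg, nu_beta_neg = quantile_stats(-x)

print("x :", nu_alpha_pos, nu_beta_pos)
print("-x:", nu_alpha_neg, nu_beta_neg)
```

Reflection leaves nu_alpha unchanged and only flips the sign of nu_beta, so a lookup table defined on non-negative nu_beta values would need abs(nu_beta) to give the same alpha for both samples.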

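Similarly, I wonder whether the calculation of delta should use the absolute value of some quantity before the clipping is applied, with an adjustment for the sign of beta after clipping. Actually, looking at it again, I think the clipping should be applied to c (the scale parameter, which must be positive). Clipping should not be applied to delta (the location parameter, i.e. the mean, which can range from -inf to inf). Is that right?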

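levy_stable._fitstart() does not handle negatively skewed data correctly, but we can work around this by reflecting the sample about the origin. _fitstart() will then return reasonable estimates for the stability and scale parameters, which are unaffected by the reflection. The estimates for the skewness and loc parameters have the opposite sign in the reflected sample.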

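A simple wrapper function can check whether the data is skewed right or left before calling _fitstart(), then flip the sign of the affected parameter estimates as needed. This does not fix levy_stable.fit() itself, but at least we can get sensible quantile estimates from _fitstart().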

import numpy as np
from scipy import __version__ as scipy_version
from scipy.stats import levy_stable

points = 1000000

const = 314

def lsfitstart(data):
    """Wrapper for levy_stable._fitstart() to fix data with negative skew"""
    skewleft = np.mean(data) <= np.median(data)
    
    # reverse sign of the data points if distribution has negative skew
    alpha, beta, loc, scale = levy_stable._fitstart(-data if skewleft else data)
    
    # reverse sign of skewness and loc estimates if distribution has negative skew
    beta_fixed, loc_fixed = [-x if skewleft else x for x in (beta, loc)]
    
    # clip scale parameter to ensure it is positive
    scale_fixed = np.clip(scale, np.finfo(float).eps, np.inf)

    return (alpha, beta_fixed, loc_fixed, scale_fixed)


print(scipy_version)

sample = levy_stable.rvs(alpha=1.3, beta=1, size=points, random_state=const)

print("levy_stable fit        : alpha (stability), beta (skewness), loc, scale")
print("_fitstart(positive    ): ", levy_stable._fitstart(sample))
print("_fitstart(negative=bad): ", levy_stable._fitstart(-sample))
print()

print("lsfitstart(positive   ): ", lsfitstart(sample))
print("lsfitstart(negative=OK): ", lsfitstart(-sample))

>>> 1.5.2
>>> levy_stable fit        : alpha (stability), beta (skewness), loc, scale
>>> _fitstart(positive    ):  (1.3055214922752139, 1.0, 2.220446049250313e-16, 1.0002159057207403)
>>> _fitstart(negative=bad):  (1.3673555827622812, -1.0, 1.9389262857717497, 1.0337386531320203)
>>> 
>>> lsfitstart(positive   ):  (1.3055214922752139, 1.0, 2.220446049250313e-16, 1.0002159057207403)
>>> lsfitstart(negative=OK):  (1.3055214922752139, -1.0, -2.220446049250313e-16, 1.0002159057207403)
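
One caveat about the wrapper: it decides the skew direction by comparing the sample mean with the median, and the mean of a stable distribution does not exist for alpha <= 1, so that test may become erratic for very heavy tails. A possible variant, sketched below under that assumption, takes the direction from the sign of a quantile-based asymmetry measure instead. The names skew_is_left and reflect_fit are made up for illustration, and the estimator is passed in as a callable so the sketch does not depend on a particular SciPy version.

```python
import numpy as np


def skew_is_left(data):
    """Quantile-based skew test: True if the 5%-95% range leans left."""
    p05, p50, p95 = np.percentile(data, [5, 50, 95])
    return (p95 + p05 - 2.0 * p50) < 0


def reflect_fit(data, estimator):
    """Run `estimator` on right-skewed data only, reflecting if needed.

    `estimator` is any callable returning (alpha, beta, loc, scale),
    for example levy_stable._fitstart from the code above.
    """
    data = np.asarray(data, dtype=float)
    left = skew_is_left(data)
    alpha, beta, loc, scale = estimator(-data if left else data)
    if left:
        # Undo the reflection: only skewness and location change sign.
        beta, loc = -beta, -loc
    return alpha, beta, loc, scale
```

It would be called as reflect_fit(sample, levy_stable._fitstart). Whether the quantile test is actually more reliable than the mean-versus-median test for a given sample has not been checked here.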