scipy rv_continuous very slow

Question

I am using a custom function f(x) to define a custom distribution with scipy's rv_continuous class. My code is

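from scipy.stats import rv_continuous

class my_pdf_gen(rv_continuous):
    def _pdf(self, x, integral):
        # integral is passed as a shape parameter and normalizes f(x)
        return f(x)/integral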

where integral ensures the normalization. I can create an instance of it with

my_pdf = my_pdf_gen(my_int,a = a, b = b, name = 'my pdf')

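with a and b the lower and upper limits of the value range, and my_int = scipy.integrate.quad(f, a, b)[0]. I can also create a random sample of data using my_pdf.rvs(my_int, size = 5), but this is very slow (up to 6 seconds when size=9).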

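I read that one should also override some other methods in the class (such as _ppf), but from the examples I found it is not clear to me how to do that in my case.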

Thanks a lot!

Answer 1

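It is expected to be slow: the generic implementation draws each sample by root-solving on the cdf, and the cdf itself is computed by numerically integrating the pdf.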
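To make the cost concrete, here is a rough sketch of what that fallback amounts to. It is not scipy's actual code; f, a, b, cdf and slow_sample are hypothetical names chosen for illustration, and f is just an example density.

```python
import numpy as np
from scipy import integrate, optimize


def f(x):
    # Hypothetical unnormalized density, used only for illustration.
    return np.exp(-x**2)


a, b = -2.0, 2.0
norm = integrate.quad(f, a, b)[0]


def cdf(x):
    # Every cdf evaluation is a full numerical integration.
    return integrate.quad(f, a, x)[0] / norm


def slow_sample(u):
    # Every sample is a root solve, and each solver step calls cdf (i.e. quad).
    return optimize.brentq(lambda x: cdf(x) - u, a, b)


rng = np.random.default_rng(0)
samples = [slow_sample(u) for u in rng.uniform(size=5)]
```

Each sample therefore costs many quad calls, which is why even a handful of draws can take seconds.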

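So your best bet is to provide a _ppf or _rvs implementation. How to do this depends strongly on the details of f(x). If you cannot solve cdf(x) = r analytically, consider tabulation with inverse interpolation, or rejection sampling.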
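A minimal sketch of the tabulation / inverse interpolation route, assuming f is vectorized and strictly positive on [a, b] so that the tabulated cdf is increasing. The helper make_tabulated_dist, the grid size, and the example f are assumptions for illustration, not part of scipy.

```python
import numpy as np
from scipy import stats


def f(x):
    # Hypothetical unnormalized density, used only for illustration.
    return np.exp(-x**2) * (1.0 + np.sin(3.0 * x) ** 2)


def make_tabulated_dist(func, a, b, n_grid=2001):
    # Tabulate the cdf once on a grid using the trapezoid rule.
    grid = np.linspace(a, b, n_grid)
    vals = func(grid)
    steps = 0.5 * (vals[1:] + vals[:-1]) * np.diff(grid)
    cum = np.concatenate(([0.0], np.cumsum(steps)))
    norm = cum[-1]
    cdf_grid = cum / norm

    class TabulatedDist(stats.rv_continuous):
        def _pdf(self, x):
            return func(x) / norm

        def _cdf(self, x):
            # Linear interpolation of the tabulated cdf.
            return np.interp(x, grid, cdf_grid)

        def _ppf(self, q):
            # Inverse interpolation: swap the roles of grid and cdf values.
            return np.interp(q, cdf_grid, grid)

    return TabulatedDist(a=a, b=b, name="tabulated")


dist = make_tabulated_dist(f, -2.0, 2.0)
samples = dist.rvs(size=10_000)
```

With _ppf overridden, rvs only has to transform uniform random numbers through one interpolation, so no per-sample integration or root solve is needed. The accuracy is limited by the grid resolution.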

Answer 2

I solved this problem by changing my approach and using a Monte Carlo rejection sampler:

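import numpy as np

def rejection_sampler(p,xbounds,pmax):
    # Draw one sample from the density p on [xbounds[0], xbounds[1]].
    while True:
        # Propose x uniformly within the bounds and y uniformly in [0, pmax).
        x = np.random.rand(1)*(xbounds[1]-xbounds[0])+xbounds[0]
        y = np.random.rand(1)*pmax
        # Accept x if the point (x, y) lies under the curve p.
        if y<=p(x):
            return x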

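where p is the probability density function, xbounds is a tuple containing the lower and upper bounds of the pdf, and pmax is the maximum value of the pdf on that domain.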
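If many samples are needed, the same idea can be applied to whole batches of proposals at once. This is a minimal sketch, assuming p accepts NumPy arrays. The name rejection_sample_many and the size and rng parameters are hypothetical additions, not part of the answer above.

```python
import numpy as np


def rejection_sample_many(p, xbounds, pmax, size, rng=None):
    # Draw `size` samples from p on [xbounds[0], xbounds[1]] by rejection.
    rng = np.random.default_rng() if rng is None else rng
    lo, hi = xbounds
    out = np.empty(size)
    filled = 0
    while filled < size:
        # Propose more points than are still missing, since some get rejected.
        n = 2 * max(size - filled, 16)
        x = rng.uniform(lo, hi, n)
        y = rng.uniform(0.0, pmax, n)
        accepted = x[y <= p(x)]
        take = min(accepted.size, size - filled)
        out[filled:filled + take] = accepted[:take]
        filled += take
    return out
```

As with the single-sample version, p does not have to be normalized as long as pmax is an upper bound on p over the interval. A loose bound still gives correct samples but rejects more proposals.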

The Monte Carlo rejection sampler was recommended here:
