来自 scipy.stats.rv_continuous 个不需要的上限的自定义 PDF

Question

我正在尝试生成具有特定光度的 QSO 的随机概率密度函数，其形式为：

1/( (L/L_B^* )^alpha + (L/L_B^* )^beta )

其中 L_B^*、alpha 和 beta 都是常量。为此，使用以下代码：

import scipy.stats as st

logLbreak = 43.88
alpha = 3.4
beta = 1.6


class my_pdf(st.rv_continuous):

    def _pdf(self,l_L): 
        #"l_L" in this is always log L        
        L = 10**(l_L/logLbreak)
        D = 1/(L**alpha + L**beta)
        return D

dist_Log_L = my_pdf(momtype = 0, a = 0,name='l_L_dist')


distro = dist_Log_L.rvs(size = 10000)

（L/L^* 被提升为 10 的幂，因为一切都是在对数尺度上完成的）

该分布应该生成一个近似于 this, trailing off to infinity, but in reality the graph it produces looks like this（10,000 个样本）的图形。无论使用多少样本，上限都是相同的。是否有原因限制它的方式？

Answer 1

您的 PDF 未正确规范化。 PDF 在域上的积分必须为 1。您的 PDF 积分大约为 3.4712:

In [72]: from scipy.integrate import quad

In [73]: quad(dist_Log_L._pdf, 0, 100)
Out[73]: (3.4712183965415373, 2.0134487716044682e-11)

In [74]: quad(dist_Log_L._pdf, 0, 800)
Out[74]: (3.4712184965748905, 2.013626296581202e-11)

In [75]: quad(dist_Log_L._pdf, 0, 1000)
Out[75]: (3.47121849657489, 8.412130378805368e-10)

这将破坏 class 对 inverse transform sampling 的实现。它只会从域中生成样本，直到 PDF 从 0 到 x 的积分首先达到 1.0，在您的情况下约为 2.325

In [81]: quad(dist_Log_L._pdf, 0, 2.325)
Out[81]: (1.0000875374350238, 1.1103202107010366e-14)

事实上，这就是您在直方图中看到的内容。

作为验证问题的快速修复，我将 _pdf() 方法的 return 语句修改为：

        return D/3.47121849657489

和运行再次编写您的脚本。（在真正的修复中，该值将是其他参数的函数。）然后命令

In [85]: import matplotlib.pyplot as plt

In [86]: plt.hist(distro, bins=31)

生成此图：

来自 scipy.stats.rv_continuous 个不需要的上限的自定义 PDF

Custom PDF from scipy.stats.rv_continuous unwanted upper-bound

python

statistics

astronomy

scipy

probability-density