使用 pymc 拟合二项分布会引发某些 FillValues 的 ZeroProbability 错误

Fitting a Binomial distribution with pymc raises ZeroProbability error for certain FillValues

我不确定我是否在 pymc 中发现了错误。似乎用缺失数据拟合二项式会产生 ZeroProbability 错误,具体取决于所选择的 fill_value 来掩盖缺失数据。但也许我用错了。 我使用 github 中的当前主分支尝试了以下示例。我知道 ,但这似乎是一个不同的问题。

我用 pymc 安装了一个二项式分布,一切都如我所料:

import scipy as sp
import pymc

def make_model(observed_values):
    p = pymc.Uniform('p', lower = 0.0, upper = 1.0, value = 0.1)
    values = pymc.Binomial('values', n = 10* sp.ones_like(observed_values), p = p * sp.ones_like(observed_values),\
                             value = observed_values, observed = True, plot = False)
    values = pymc.Binomial('values', n = 10, p = p,\
                             value = observed_values, observed = True, plot = False)
    return locals()

sp.random.seed(0)
observed_values = sp.random.binomial(n = 10.0, p = 0.1, size = 100)

M1 = pymc.MCMC(make_model(observed_values))
M1.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M1)
M1.summary()

输出:

  [-----------------100%-----------------] 10000 of 10000 complete in 0.7 sec
Plotting p

  p:

          Mean             SD               MC Error        95% HPD interval
          ------------------------------------------------------------------
          0.093            0.007            0.0              [ 0.081  0.107]


          Posterior quantiles:

          2.5             25              50              75             97.5
          |---------------|===============|===============|---------------|
          0.08             0.088           0.093          0.097         0.106

现在,我尝试了一种非常相似的情况,不同之处在于将缺少一个观测值:

mask = sp.zeros_like(observed_values)
mask[0] = True
masked_values = sp.ma.masked_array(observed_values, mask = mask, fill_value = 999999)

M2 = pymc.MCMC(make_model(masked_values))
M2.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M2)
M2.summary()

没想到,我得到了一个ZeroProbability错误:

---------------------------------------------------------------------------
ZeroProbability                           Traceback (most recent call last)
<ipython-input-16-4f945f269628> in <module>()
----> 1 M2 = pymc.MCMC(make_model(masked_values))
      2 M2.sample(iter=10000, burn=1000, thin=10)
      3 pymc.Matplot.plot(M2)
      4 M2.summary()

<ipython-input-12-cb8707bb911f> in make_model(observed_values)
      4 def make_model(observed_values):
      5     p = pymc.Uniform('p', lower = 0.0, upper = 1.0, value = 0.1)
----> 6     values = pymc.Binomial('values', n = 10* sp.ones_like(observed_values), p = p * sp.ones_like(observed_values),                             value = observed_values, observed = True, plot = False)
      7     values = pymc.Binomial('values', n = 10, p = p,                             value = observed_values, observed = True, plot = False)
      8     return locals()

/home/fabian/anaconda/lib/python2.7/site-packages/pymc/distributions.pyc in __init__(self, *args, **kwds)
    318                     logp_partial_gradients=logp_partial_gradients,
    319                     dtype=dtype,
--> 320                     **arg_dict_out)
    321 
    322     new_class.__name__ = name

/home/fabian/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in __init__(self, logp, doc, name, parents, random, trace, value, dtype, rseed, observed, cache_depth, plot, verbose, isdata, check_logp, logp_partial_gradients)
    773         if check_logp:
    774             # Check initial value
--> 775             if not isinstance(self.logp, float):
    776                 raise ValueError(
    777                     "Stochastic " +

/home/fabian/anaconda/lib/python2.7/site-packages/pymc/PyMCObjects.pyc in get_logp(self)
    930                     (self._value, self._parents.value))
    931             else:
--> 932                 raise ZeroProbability(self.errmsg)
    933 
    934         return logp

ZeroProbability: Stochastic values's value is outside its support,
or it forbids its parents' current values.

但是,如果我将掩码数组中的填充值更改为 1,拟合将再次起作用:

masked_values2 = sp.ma.masked_array(observed_values, mask = mask, fill_value = 1)

M3 = pymc.MCMC(make_model(masked_values2))
M3.sample(iter=10000, burn=1000, thin=10)
pymc.Matplot.plot(M3)
M3.summary()

输出:

[-----------------100%-----------------] 10000 of 10000 complete in 2.1 sec
Plotting p

p:

        Mean             SD               MC Error        95% HPD interval
        ------------------------------------------------------------------
        0.092            0.007            0.0              [ 0.079  0.105]


        Posterior quantiles:

        2.5             25              50              75             97.5
        |---------------|===============|===============|---------------|
        0.079            0.088           0.092          0.097         0.105


values:

        Mean             SD               MC Error        95% HPD interval
        ------------------------------------------------------------------
        1.15             0.886            0.029                  [ 0.  3.]


        Posterior quantiles:

        2.5             25              50              75             97.5
        |---------------|===============|===============|---------------|
        0.0              1.0             1.0            2.0           3.0

这是错误还是我的模型有问题? 感谢您的帮助!

我在 GitHub 上问了 pymc 开发人员这个问题,我从 fonnesbeck (https://github.com/pymc-devs/pymc/issues/47#issuecomment-129301002) 那里得到了答案:

This is because you filled the missing values with 999999, which is outside the support of the variable. You need to give it a valid value, but not one that occurs in the non-missing data."

为确保缺失值不在非缺失数据中,建议执行以下操作:

You can give it a non-integer value in order to avoid the problem that you cite. For example, if you fill with, say, 1.5 that should work.

(...) PyMC calculates the log-probability at the first iteration, and therefore the values inserted for the missing values at the first iteration have to be valid. If you give discrete values a floating point value, it should end up getting truncated when converted to integers, so that will work."