使用 bambi 的泊松回归结果不正确?

Incorrect results with Poisson regression using bambi?

我正在 bambi(版本 0.1.0)尝试一个简单的泊松回归模型。然而,与直接的 pymc3 或 statsmodels 实现相比,结果有所不同,而且我似乎无法弄清楚如何解释 bambi 给我的系数。测试代码如下。我是不是指定了错误的模型,或者我不应该依赖 bambi 的自动先验?

import numpy as np
import scipy.stats
import pandas
import patsy
import statsmodels
import pymc3
import bambi

%matplotlib inline

# generate data
num_subjects = 4
mu = [5, 8, 10, 11]
num_samples = [43, 60, 56, 38]

counts = [scipy.stats.poisson.rvs(m,size=n,random_state=m) for m,n in zip(mu,num_samples)]
counts = np.concatenate(counts)
subject = np.repeat(np.arange(num_subjects), num_samples)

df = pandas.DataFrame( np.vstack([subject,counts]).T, columns=['subject','counts'])

# sample means
print( df.groupby('subject').mean() )

# subject 0 = 5.0
# subject 1 = 7.4
# subject 2 = 9.5
# subject 3 = 10.0


# fit with bambi
model_bambi = bambi.Model(df)
result_bambi = model_bambi.fit('counts ~ C(subject)', categorical=['subject'], family='poisson', chains=2)

print(result_bambi.summary(hpd=None, diagnostics=None))

# resulting posterior means:
# Intercept        9.3310 -> ?
# C(subject)[T.1]  3.8171 -> ?
# C(subject)[T.2]  4.4419 -> ?
# C(subject)[T.3]  3.8652 -> ?


# fit directly with pymc3
with pymc3.Model() as model_pymc3:
    pymc3.glm.GLM.from_formula("counts ~ C(subject)", df, family=pymc3.glm.families.Poisson())
    trace = pymc3.sample(2000, njobs=2, tune=500)

pymc3.plot_posterior(trace, varnames=[x for x in trace.varnames if x[:2]!='mu']);

# resulting posterior means:
# Intercept        1.6065 -> mu = 5.0 = exp(1.6065) 
# C(subject)[T.1]  0.3990 -> mu = 7.4 = exp(1.6065+0.3990)
# C(subject)[T.2]  0.6477 -> mu = 9.5 = exp(1.6065+0.6477)
# C(subject)[T.3]  0.6977 -> mu = 10.0 = exp(1.6065+0.6977)


# fit with statsmodels
my, mx = patsy.dmatrices( "counts ~ C(subject)", df, NA_action='raise')
model_sm = statsmodels.api.GLM(my, mx, family=statsmodels.api.families.Poisson())
result_sm = model_sm.fit()

print(result_sm.summary())

# resulting posterior means:
# Intercept        1.6094 -> mu = 5.0 = exp(1.6094) 
# C(subject)[T.1]  0.3965 -> mu = 7.4 = exp(1.6094+0.3965)
# C(subject)[T.2]  0.6456 -> mu = 9.5 = exp(1.6094+0.6456)
# C(subject)[T.3]  0.6958 -> mu = 10.0 = exp(1.6094+0.6958)

对于(非常)缓慢的回复,我深表歉意;我没有订阅 [bambi] 标签(但现在订阅了),只是看到了这个。这确实是一个错误(详细信息在 GitHub 存储库中 here). I just opened a PR for it, so if you clone from the repo, the problem should be solved (and I'll issue a new PyPI release sortly). I realize this is probably not much use to you at this point, but thanks for flagging it anyway. If you run into any similar issues in future, please open an issue,因为这肯定属于错误领域。