使用 PyMC3 的贝叶斯套索

Bayesian Lasso using PyMC3

我正在尝试重现此 tutorial (see LASSO regression) on PyMC3. As commented on this reddit thread 的结果,前两个系数的混合效果不佳,因为变量是相关的。

我尝试在 PyMC3 中实现它,但在使用哈密顿采样器时它没有按预期工作。我只能让它与 Metropolis 采样器一起使用,它实现了与 PyMC2 相同的结果。

我不知道这是否与拉普拉斯算子达到峰值(0 处的不连续导数)有关,但它与高斯先验配合得很好。我尝试了有或没有 MAP 初始化,结果总是一样的。

这是我的代码:

from pymc import *
from scipy.stats import norm
import pylab as plt

# Same model as the tutorial 
n = 1000

x1 = norm.rvs(0, 1, size=n)
x2 = -x1 + norm.rvs(0, 10**-3, size=n)
x3 = norm.rvs(0, 1, size=n)

y = 10 * x1 + 10 * x2 + 0.1 * x3

with Model() as model:
    # Laplacian prior only works with Metropolis sampler
    coef1 = Laplace('x1', 0, b=1/sqrt(2))
    coef2 = Laplace('x2', 0, b=1/sqrt(2))
    coef3 = Laplace('x3', 0, b=1/sqrt(2))

    # Gaussian prior works with NUTS sampler
    #coef1 = Normal('x1', mu = 0, sd = 1)
    #coef2 = Normal('x2', mu = 0, sd = 1)
    #coef3 = Normal('x3', mu = 0, sd = 1)

    likelihood = Normal('y', mu= coef1 * x1 + coef2 * x2 + coef3 * x3, tau = 1, observed=y)

    #step = Metropolis() # Works just like PyMC2
    start = find_MAP() # Doesn't help
    step = NUTS(state = start) # Doesn't work
    trace = sample(10000, step, start = start, progressbar=True) 

plt.figure(figsize=(7, 7))
traceplot(trace)
plt.tight_layout()

autocorrplot(trace)
summary(trace)

这是我得到的错误:

PositiveDefiniteError: Simple check failed. Diagonal contains negatives

是我做错了什么,还是 NUTS 采样器不应该处理这种情况?

来自 reddit 线程的 Whyking 提出了使用 MAP 而不是状态作为缩放比例的建议,它实际上创造了奇迹。

这是一个notebook with the results and the updated code.