First use of PyMC fails
I am new to PyMC and would like to know why this code doesn't work. I have already spent hours on this, but I'm missing something. Can anyone help me?
The problem I want to solve:
I have a set of Npts measurements showing 3 bumps, so I want to model it as a sum of 3 Gaussians (assuming the measurements are noisy and the Gaussian approximation is valid) ==> I want to estimate 8 parameters: the relative weights of the bumps (i.e. 2 parameters), their 3 means, and their 3 variances.
I want the approach to be general enough to work on other sets that may not have the same bumps, so I take loose flat priors.
The problem:
My code below gives me lousy estimates. What's wrong? Thanks
"""
hypothesis: multimodal distrib sum of 3 gaussian distributions
model description:
* p1, p2, p3 are the probabilities for a point to belong to gaussian 1, 2 or 3
==> p1, p2, p3 are the relative weights of the 3 gaussians
* once a point is associated with a gaussian,
it is distributed normally according to the parameters mu_i, sigma_i of the gaussian
but instead of considering sigma, pymc prefers considering tau=1/sigma**2
* thus, PyMc must guess 8 parameters: p1, p2, mu1, mu2, mu3, tau1, tau2, tau3
* priors on p1, p2 are flat between 0.1 and 0.9 ==> 'pm.Uniform' variables
with the constraint p2<=1-p1. p3 is deterministic ==1-p1-p2
* the 'assignment' variable assigns each point to a gaussian, according to probabilities p1, p2, p3
* priors on mu1, mu2, mu3 are flat between 40 and 120 ==> 'pm.Uniform' variables
* priors on sigma1, sigma2, sigma3 are flat between 4 and 12 ==> 'pm.Uniform' variables
"""
import numpy as np
import pymc as pm
data = np.loadtxt('distrib.txt')
Npts = len(data)
mumin = 40
mumax = 120
sigmamin=4
sigmamax=12
p1 = pm.Uniform("p1",0.1,0.9)
p2 = pm.Uniform("p2",0.1,1-p1)
p3 = 1-p1-p2
assignment = pm.Categorical('assignment',[p1,p2,p3],size=Npts)
mu = pm.Uniform('mu',[mumin,mumin,mumin],[mumax,mumax,mumax])
sigma = pm.Uniform('sigma',[sigmamin,sigmamin,sigmamin],
[sigmamax,sigmamax,sigmamax])
tau = 1/sigma**2
@pm.deterministic
def assign_mu(assi=assignment, mu=mu):
    return mu[assi]

@pm.deterministic
def assign_tau(assi=assignment, sig=tau):
    return sig[assi]
hypothesis = pm.Normal("obs", assign_mu, assign_tau, value=data, observed=True)
model = pm.Model([hypothesis, p1, p2, tau, mu])
test = pm.MCMC(model)
test.sample(50000,burn=20000) # conservative values, let's take a coffee...
print('\nguess\n* p1, p2 = ',
np.mean(test.trace('p1')[:]),' ; ',
np.mean(test.trace('p2')[:]),' ==> p3 = ',
1-np.mean(test.trace('p1')[:])-np.mean(test.trace('p2')[:]),
'\n* mu = ',
np.mean(test.trace('mu')[:,0]),' ; ',
np.mean(test.trace('mu')[:,1]),' ; ',
np.mean(test.trace('mu')[:,2]))
print('why does this guess suck ???!!!')
I can send the data file 'distrib.txt'. It is about 500 kB, and the data are plotted below. For example, the last run gave me:
p1, p2 = 0.366913192214 ; 0.583816452532 ==> p3 = 0.04927035525400003
mu = 77.541619286 ; 75.3371615466 ; 77.2427165073
whereas there are obvious bumps around ~55, ~75 and ~90, with probabilities around ~0.2, ~0.5 and ~0.3.
You are running into the problem described here: Negative Binomial Mixture in PyMC
The problem is that the categorical variable converges too slowly for the three component distributions to get anywhere close.
First, we generate your test data:
data1 = np.random.normal(55,5,2000)
data2 = np.random.normal(75,5,5000)
data3 = np.random.normal(90,5,3000)
data=np.concatenate([data1, data2, data3])
np.savetxt("distrib.txt", data)
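As a quick sanity check (my addition, not in the original answer), the true parameters implied by this generator can be computed directly, so you know what the sampler should recover: weights near 0.2/0.5/0.3 and means near 55/75/90. A minimal sketch, using a fixed seed (an assumption, the original draws are unseeded):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (assumption, not in the original)
data1 = rng.normal(55, 5, 2000)
data2 = rng.normal(75, 5, 5000)
data3 = rng.normal(90, 5, 3000)
data = np.concatenate([data1, data2, data3])

# true mixture weights and per-component sample means the sampler should recover
weights = np.array([d.size for d in (data1, data2, data3)]) / data.size
means = [d.mean() for d in (data1, data2, data3)]
print(weights)  # [0.2 0.5 0.3]
print(means)    # roughly [55, 75, 90]
```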
Then we plot the histogram, colored by the posterior group assignment:
import matplotlib.pyplot as plt

tablebyassignment = [data[np.nonzero(np.round(test.trace("assignment")[:].mean(axis=0)) == i)]
                     for i in range(3)]
plt.hist(tablebyassignment, bins=30, stacked=True)
This does converge eventually, but not fast enough to be useful to you.
You can fix the problem by guessing the assignment values before starting the MCMC:
from sklearn.cluster import KMeans
kme = KMeans(3)
kme.fit(np.atleast_2d(data).T)
assignment = pm.Categorical('assignment',[p1,p2,p3],size=Npts, value=kme.labels_)
This gives you:
Initializing the categorical with k-means may not work every time, but it is better than not converging.
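One caveat worth noting (my addition, not from the original answer): k-means numbers its clusters arbitrarily, so label 0 need not correspond to the leftmost bump. If you want the initial assignments ordered to match ascending component means, you can relabel by cluster center first. A sketch, assuming scikit-learn is available and using seeded synthetic data in place of the original file:

```python
import numpy as np
from sklearn.cluster import KMeans

# seeded stand-in for the data from 'distrib.txt' (assumption)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(55, 5, 2000),
                       rng.normal(75, 5, 5000),
                       rng.normal(90, 5, 3000)])

kme = KMeans(n_clusters=3, n_init=10, random_state=0)
kme.fit(data.reshape(-1, 1))

# map the arbitrary k-means labels to 0/1/2 ordered by cluster center
order = np.argsort(kme.cluster_centers_.ravel())
relabel = np.empty(3, dtype=int)
relabel[order] = np.arange(3)
init_labels = relabel[kme.labels_]  # usable as the Categorical's initial value

# centers in ascending order -- these should sit near the three bumps
centers = np.sort(kme.cluster_centers_.ravel())
print(centers)  # roughly [55, 75, 90]
```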