Calculating WAIC for models with multiple likelihood functions with pymc3
I am trying to predict the outcome of football matches from the number of goals scored, using the following model:
    import pymc3 as pm
    import theano.tensor as tt

    # mu, tau, n, h_t, a_t and the observed arrays are defined elsewhere (not shown)
    with pm.Model() as model:
        # global model parameters
        h = pm.Normal('h', mu=mu, tau=tau)
        sd_a = pm.Gamma('sd_a', .1, .1)
        sd_d = pm.Gamma('sd_d', .1, .1)
        alpha = pm.Normal('alpha', mu=mu, tau=tau)
        # team-specific model parameters
        a_s = pm.Normal("a_s", mu=0, sd=sd_a, shape=n)
        d_s = pm.Normal("d_s", mu=0, sd=sd_d, shape=n)
        # center attack and defence strengths around zero
        atts = pm.Deterministic('atts', a_s - tt.mean(a_s))
        defs = pm.Deterministic('defs', d_s - tt.mean(d_s))
        # expected goals for the home and away team (log link)
        h_theta = tt.exp(alpha + h + atts[h_t] + defs[a_t])
        a_theta = tt.exp(alpha + atts[a_t] + defs[h_t])
        # likelihood of observed data
        h_goals = pm.Poisson('h_goals', mu=h_theta, observed=observed_h_goals)
        a_goals = pm.Poisson('a_goals', mu=a_theta, observed=observed_a_goals)
When I sample from the model, the trace plots look fine.
Afterwards, when I try to calculate the WAIC:
    waic = pm.waic(trace, model)
I get the following error:
    ----> 1 waic = pm.waic(trace, model)
    ~\Anaconda3\envs\env\lib\site-packages\pymc3\stats\__init__.py in wrapped(*args, **kwargs)
         22 )
         23 kwargs[new] = kwargs.pop(old)
    ---> 24 return func(*args, **kwargs)
         25
         26 return wrapped
    ~\Anaconda3\envs\env\lib\site-packages\arviz\stats\stats.py in waic(data, pointwise, scale)
       1176 """
       1177 inference_data = convert_to_inference_data(data)
    -> 1178 log_likelihood = _get_log_likelihood(inference_data)
       1179 scale = rcParams["stats.ic_scale"] if scale is None else scale.lower()
       1180
    ~\Anaconda3\envs\env\lib\site-packages\arviz\stats\stats_utils.py in get_log_likelihood(idata, var_name)
        403 var_names.remove("lp")
        404 if len(var_names) > 1:
    --> 405 raise TypeError(
        406 "Found several log likelihood arrays {}, var_name cannot be None".format(var_names)
        407 )
    TypeError: Found several log likelihood arrays ['h_goals', 'a_goals'], var_name cannot be None
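Is there any way to calculate WAIC and compare models in pymc3 when I have two likelihood functions? (1: goals scored by the home team, 2: goals scored by the away team)
It is possible, but you first need to define what you are interested in predicting: it can be the result of the match, or it can be the number of goals scored by either team (not the aggregate; each match would then provide 2 results to predict).
A complete and detailed answer is available on the PyMC discourse.
Here I transcribe, as a summary, the case where the quantity of interest is the match result. ArviZ will automatically retrieve the 2 pointwise log likelihood arrays, which we have to combine somehow (e.g. add, concatenate, groupby...) to get a single array. The tricky part is knowing which operation corresponds to each quantity, and this has to be assessed on a per-model basis. In this particular example, the predictive accuracy for the match result can be calculated as follows: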
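    import arviz as az

    # Variable names follow the example in the linked answer; for the model in
    # the question the observed variables are 'h_goals' and 'a_goals'.
    dims = {
        "home_points": ["match"],
        "away_points": ["match"],
    }
    idata = az.from_pymc3(trace, dims=dims, model=model)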
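Setting the match dim is important to tell xarray how to align the pointwise log likelihood arrays; otherwise they would not be broadcast and aligned in the desired way.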
    idata.sample_stats["log_likelihood"] = (
        idata.log_likelihood.home_points + idata.log_likelihood.away_points
    )
    az.waic(idata)
    # Output
    # Computed from 3000 by 60 log-likelihood matrix
    #
    #           Estimate       SE
    # elpd_waic  -551.28    37.96
    # p_waic       46.16        -
    #
    # There has been a warning during the calculation. Please check the results.
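To illustrate why the shared match dim matters, here is a minimal sketch using plain xarray arrays with random values instead of a real trace. The dimension names ending in `_dim_0` are only an assumption standing in for names generated automatically when no `dims` are given; the point is that xarray aligns by dimension name, so differently named dimensions are broadcast against each other instead of being added element by element.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_chains, n_draws, n_matches = 2, 5, 4
values = rng.normal(size=(n_chains, n_draws, n_matches))

# Hypothetical auto-generated dimension names: each variable has its own.
home_auto = xr.DataArray(values, dims=("chain", "draw", "home_points_dim_0"))
away_auto = xr.DataArray(values, dims=("chain", "draw", "away_points_dim_0"))
# The two match axes differ by name, so the sum is an outer sum over both.
summed_auto = home_auto + away_auto

# Shared "match" dimension: the sum is taken match by match.
home = xr.DataArray(values, dims=("chain", "draw", "match"))
away = xr.DataArray(values, dims=("chain", "draw", "match"))
summed = home + away

print(summed_auto.dims, summed_auto.shape)
print(summed.dims, summed.shape)
```

With differently named dimensions the result gains an extra axis (one per pair of matches), which is not a pointwise log likelihood per match; with the shared dimension the result keeps one entry per match.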
Note that ArviZ>=0.7.0 is required.
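As a rough illustration of how the choice of operation changes what counts as one observation, the sketch below computes elpd_waic by hand with NumPy on random stand-in log likelihood values (not a real trace). The helper `elpd_waic` is hypothetical and written for this example; it uses the usual definition (log pointwise predictive density minus the per-observation variance of the log likelihood), and the `ddof=1` choice is an assumption of this sketch that a library may handle differently.

```python
import numpy as np


def elpd_waic(log_lik):
    """elpd_waic for a (n_samples, n_observations) log-likelihood matrix."""
    n_samples = log_lik.shape[0]
    # lppd: log of the mean likelihood over samples, via a stable log-sum-exp
    max_ll = log_lik.max(axis=0)
    lppd = max_ll + np.log(np.exp(log_lik - max_ll).sum(axis=0)) - np.log(n_samples)
    # p_waic: variance of the log likelihood over samples, per observation
    p_waic = log_lik.var(axis=0, ddof=1)
    return float(np.sum(lppd - p_waic))


rng = np.random.default_rng(0)
n_samples, n_matches = 1000, 60
# Stand-in pointwise log likelihoods for home and away goals
home_ll = rng.normal(-1.5, 0.1, size=(n_samples, n_matches))
away_ll = rng.normal(-1.2, 0.1, size=(n_samples, n_matches))

# Match result: one observation per match, so the two arrays are added.
per_match = home_ll + away_ll
# Goals by either team: two observations per match, so they are concatenated.
per_team_goals = np.concatenate([home_ll, away_ll], axis=1)

print(per_match.shape, elpd_waic(per_match))
print(per_team_goals.shape, elpd_waic(per_team_goals))
```

Adding keeps 60 observations (one per match), while concatenating yields 120 (one per team per match); the two estimates answer different predictive questions and should not be compared with each other.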