Different access methods to Pyro Paramstore give different results
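I'm working through the introductory Pyro tutorial on forecasting. After training the model I tried to access the learned parameters, and for some of them I get different results depending on the access method I use (while for others the results are identical).
Here is a condensed, reproducible version of the tutorial code: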
import torch
import pyro
import pyro.distributions as dist
from pyro.contrib.examples.bart import load_bart_od
from pyro.contrib.forecast import ForecastingModel, Forecaster
pyro.enable_validation(True)
pyro.clear_param_store()
pyro.__version__
# '1.3.1'
torch.__version__
# '1.5.0+cu101'
# import & prepare the data
dataset = load_bart_od()
T, O, D = dataset["counts"].shape
data = dataset["counts"][:T // (24 * 7) * 24 * 7].reshape(T // (24 * 7), -1).sum(-1).log()
data = data.unsqueeze(-1)
T0 = 0  # beginning
T2 = data.size(-2) # end
T1 = T2 - 52 # train/test split
# define the model class
class Model1(ForecastingModel):
    def model(self, zero_data, covariates):
        data_dim = zero_data.size(-1)
        feature_dim = covariates.size(-1)

        bias = pyro.sample("bias", dist.Normal(0, 10).expand([data_dim]).to_event(1))
        weight = pyro.sample("weight", dist.Normal(0, 0.1).expand([feature_dim]).to_event(1))
        prediction = bias + (weight * covariates).sum(-1, keepdim=True)
        assert prediction.shape[-2:] == zero_data.shape

        noise_scale = pyro.sample("noise_scale", dist.LogNormal(-5, 5).expand([1]).to_event(1))
        noise_dist = dist.Normal(0, noise_scale)

        self.predict(noise_dist, prediction)
# fit the model
pyro.set_rng_seed(1)
pyro.clear_param_store()
time = torch.arange(float(T2)) / 365
covariates = torch.stack([time], dim=-1)
forecaster = Forecaster(Model1(), data[:T1], covariates[:T1], learning_rate=0.1)
So far so good. Now I want to inspect the learned latent parameters stored in the ParamStore. There seems to be more than one way to do this; using the get_all_param_names() method:
for name in pyro.get_param_store().get_all_param_names():
    print(name, pyro.param(name).data.numpy())
I get:
AutoNormal.locs.bias [14.585433]
AutoNormal.scales.bias [0.00631594]
AutoNormal.locs.weight [0.11947815]
AutoNormal.scales.weight [0.00922901]
AutoNormal.locs.noise_scale [-2.0719821]
AutoNormal.scales.noise_scale [0.03469057]
But using the named_parameters() method:
pyro.get_param_store().named_parameters()
gives the same values for the location (locs) parameters, but different values for all of the scales parameters:
dict_items([
('AutoNormal.locs.bias', Parameter containing: tensor([14.5854], requires_grad=True)),
('AutoNormal.scales.bias', Parameter containing: tensor([-5.0647], requires_grad=True)),
('AutoNormal.locs.weight', Parameter containing: tensor([0.1195], requires_grad=True)),
('AutoNormal.scales.weight', Parameter containing: tensor([-4.6854], requires_grad=True)),
('AutoNormal.locs.noise_scale', Parameter containing: tensor([-2.0720], requires_grad=True)),
('AutoNormal.scales.noise_scale', Parameter containing: tensor([-3.3613], requires_grad=True))
])
How is this possible? According to the documentation, the ParamStore is a simple key-value store, and it contains only these six keys:
pyro.get_param_store().get_all_param_names() # .keys() method gives identical result
# result
dict_keys([
'AutoNormal.locs.bias',
'AutoNormal.scales.bias',
'AutoNormal.locs.weight',
'AutoNormal.scales.weight',
'AutoNormal.locs.noise_scale',
'AutoNormal.scales.noise_scale'])
So there is no way that one method accesses one set of items and the other method a different set.
Am I missing something here?
pyro.param() returns transformed parameters, in this case mapped to the positive reals for the scales.
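As a worked illustration, assume (as the numbers in the question suggest for Pyro 1.3.1) that the positive constraint on the scales is realized with an exponential transform, which is what torch.distributions.transform_to(constraints.positive) returns. Then the stored unconstrained value u and the user-facing scale σ are related as follows; the figures are the bias scale from the question's two printouts:

$$
\sigma = \exp(u), \qquad u = \log\sigma, \qquad u \in \mathbb{R},\; \sigma > 0
$$

$$
u = \log(0.00631594) \approx -5.0647
$$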
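Here is what is going on, as explained in the GitHub thread I opened in parallel with this question: the ParamStore is no longer just a simple key-value store; it also performs constraint transforms. Quoting a Pyro developer from that thread: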
here's some historical background. The ParamStore was originally just a key-value store. Then we added support for constrained parameters; this introduced a new layer of separation between user-facing constrained values and internal unconstrained values. We created a new dict-like user-facing interface that exposed only constrained values, but to keep backwards compatibility with old code we kept the old interface around. The two interfaces are distinguished in the source files [...] but as you observe it looks like we forgot to mark the old interface as DEPRECATED.
I guess in clarifying docs we should:
- clarify that the ParamStore is no longer a simple key-value store but also performs constraint transforms;
- mark all "old" style interface methods as DEPRECATED;
- remove "old" style interface usage from examples and tutorials.
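To make the developer's description concrete, here is a deliberately simplified, hypothetical sketch in plain Python. ToyParamStore, TRANSFORMS and set() are illustrative names, not Pyro's API, and plain floats stand in for tensors. Values are kept internally in unconstrained space, the dict-style lookup applies the constraint transform, and the old-style named_parameters() hands back the raw internal values. It assumes an exp/log transform for positive parameters, matching the log relationship seen in this thread.

```python
import math

# Hypothetical toy illustration, NOT Pyro's actual implementation.
# Each constraint name maps to (forward transform, inverse transform).
TRANSFORMS = {
    "real": (lambda u: u, lambda c: c),  # identity: used for locs
    "positive": (math.exp, math.log),    # exp/log: used for scales
}


class ToyParamStore:
    def __init__(self):
        self._unconstrained = {}  # name -> internal unconstrained value
        self._constraints = {}    # name -> constraint name

    def set(self, name, constrained_value, constraint="real"):
        # Store the value in unconstrained space via the inverse transform.
        _, inverse = TRANSFORMS[constraint]
        self._unconstrained[name] = inverse(constrained_value)
        self._constraints[name] = constraint

    def __getitem__(self, name):
        # "New" user-facing interface: returns the constrained value.
        forward, _ = TRANSFORMS[self._constraints[name]]
        return forward(self._unconstrained[name])

    def named_parameters(self):
        # "Old" interface: exposes the raw unconstrained values.
        return self._unconstrained.items()


store = ToyParamStore()
store.set("AutoNormal.locs.bias", 14.585433)
store.set("AutoNormal.scales.bias", 0.00631594, constraint="positive")

for name, raw in store.named_parameters():
    print(name, store[name], raw)
```

Running it, the loc prints the same number in both views, while the scale prints roughly 0.0063 in the constrained view and roughly -5.06 in the raw view, mirroring the discrepancy in the question.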
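So it turns out that while pyro.param() returns results in the constrained (user-facing) space, the older method named_parameters() returns the unconstrained (i.e. internal-use-only) values, hence the apparent discrepancy.
It is not hard to verify that the scales values returned by the two methods are indeed related by a logarithm: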
import numpy as np

# named_parameters() yields (name, unconstrained_value) pairs
unconstrained = dict(pyro.get_param_store().named_parameters())  # unconstrained space
for name in pyro.get_param_store().keys():
    if 'scales' in name:
        temp = np.log(pyro.param(name).item())  # constrained space, mapped back via log
        raw = unconstrained[name].item()
        print(temp, raw, np.allclose(temp, raw))
# result:
-5.027793402915326 -5.0277934074401855 True
-4.600319371162187 -4.6003193855285645 True
-3.3920585732532835 -3.3920586109161377 True
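The same log relationship can be checked without rerunning the model, using only the numbers already printed in the question (these differ from the values just above, presumably because they come from a separate training run). This plain-Python sketch compares the log of each constrained scale with the corresponding unconstrained value, allowing for the 4-decimal rounding of the tensor printout:

```python
import math

# Constrained values printed via pyro.param() in the question
constrained = {
    "AutoNormal.scales.bias": 0.00631594,
    "AutoNormal.scales.weight": 0.00922901,
    "AutoNormal.scales.noise_scale": 0.03469057,
}
# Unconstrained values printed via named_parameters() (4 decimals shown)
unconstrained = {
    "AutoNormal.scales.bias": -5.0647,
    "AutoNormal.scales.weight": -4.6854,
    "AutoNormal.scales.noise_scale": -3.3613,
}

for name, value in constrained.items():
    log_value = math.log(value)
    # Tolerance covers the rounding in the tensor printout
    matches = abs(log_value - unconstrained[name]) < 1e-3
    print(f"{name}: log={log_value:.4f} raw={unconstrained[name]} match={matches}")
```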
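Why does this discrepancy affect only the scales parameters? Because scales (i.e. standard deviations) are by definition constrained to be positive; this does not apply to the locs (i.e. means), which are unconstrained, so the two representations coincide for them.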
As a result of the question above, a new bullet has now been added to the ParamStore documentation, giving a relevant hint:
in general parameters are associated with both constrained and unconstrained values. for example, under the hood a parameter that is constrained to be positive is represented as an unconstrained tensor in log space.
and to the documentation of the old-interface method named_parameters():
Note that, in the event the parameter is constrained, unconstrained_value is in the unconstrained space implicitly used by the constraint.