当回归中响应变量的分布非正态分布时使用 stan 估计参数

Question

我使用 R+stan 作为模型参数的 Bayesian estimates，当回归中响应变量的分布不是正态分布而是一些自定义分布时，如下所示。

假设我有以下数据生成过程

x <- rnorm(100, 0, .5)
noise <- rnorm(100,0,1)
y = exp(10 * x + noise) / (1 + exp(10 * x + noise))

data <- list( x= x, y = y, N = length(x))

在 stan 中，我在下面创建 stan 对象

Model = "
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
transformed parameters {
    vector[N] mu;
    for (f in 1:N) {
        mu[f] = alpha + beta * x[f];
    }
}

model {
  sigma ~ chi_square(5);
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  y ~ ???;
}
"

然而，如您所见，对于 y，model block 中正确的 stan 连续分布是什么？

任何指点将不胜感激。

感谢您的宝贵时间。

Answer 1

问题不是误差分布不正常（这是常规线性回归中的假设），而是 x 和 y 之间显然不是线性关系。你确实与正态分布的噪声有线性关系（z = x * 10 + noise，我在这里使用 z 以避免与你的 y 混淆），但是你应用了 softmax 函数：y = softmax(z) .如果您想使用线性回归对其进行建模，则需要反转 softmax（即从 y 返回 z），您可以使用逆 softmax（这是自softmax是反logit函数，inverse inverse logit是logit）。然后你可以做一个标准的线性回归。

model = "
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}

transformed data{
  // invert the softmax function, so there's a linear relationship between x and z
  vector[N] z = logit(y);
}

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}

transformed parameters {
    vector[N] mu;
    // no need to loop here, this can be vectorized
    mu = alpha + beta * x;
}

model {
  sigma ~ chi_square(5);
  alpha ~ normal(0, 1);
  beta ~ normal(0, 1);
  z ~ normal(mu, sigma);
}

generated quantities {
 // now if you want to check the prediction, 
 //you predict linearly and then apply the softmax again
 vector[N] y_pred = inv_logit(alpha + beta * x);
}
"

如果你不会在模型外再次使用mu，你可以跳过转换参数块并在需要时直接计算它：

z ~ normal(alpha + beta * x, sigma);

旁注：您可能需要重新考虑您的先验知识。在这种情况下，alpha 和 beta 的真实值分别为 0 和 10。可能性足够精确，可以在很大程度上覆盖先验，但您可能会看到 beta 向零收缩（即您可能得到 9 而不是 10）。试试 normal(0, 10) 之类的东西。而且我从未见过有人使用 chi-squared 分布作为标准偏差的先验。

当回归中响应变量的分布非正态分布时使用 stan 估计参数

Estimating parameters using stan when the distribution for response variable in a regression is non-normal

r

bayesian

mcmc

stan

rstan