R STAN 中的 Bernoulli Prior

Question

我正在 STAN（rstan 库）中拟合逻辑模型。我的响应变量没有任何缺失值，但是我的一个协变量 "HB" 是二进制的并且有缺失的条目。

因此，目标是在每次迭代时使用伯努利先验（参数为 0.5）来估算二进制向量中缺失的条目。

但是，我运行遇到以下问题：

需要在参数或转换参数块中将缺失数据声明为实数或向量；
模型块中伯努利分布的实现需要是整数；
据我所知，STAN 中没有函数可以将真实的或向量到整数。

我使用了 section 3.3 of the STAN user guide 中提供的指南。对于下面的模型，解析器在伯努利赋值行（模型块中的倒数第二行）给我一个错误，说它需要整数。注意：我还尝试在参数块中将 HB_miss 定义为实数并得到相同的错误。

m2 <- '
data {                          
int<lower=0> N;                // total number of observations
int<lower=0,upper=1> y[N];     // setting the dependent variable y as binary
vector[N] X;                   // independent variable 1

int<lower=0> N_obs; 
int<lower=0> N_miss; 
int<lower=1, upper=N> ii_obs[N_obs]; 
int<lower=1, upper=N> ii_miss[N_miss]; 

vector[N_obs] HB_obs;         // independent variable 2 (observed) 

}
parameters {
real b_0;                      // intercept
real b_X;                      // beta 1,2, ...
real b_HB;
vector[N_miss] HB_miss;
}
transformed parameters {
vector[N] HB;
HB[ii_obs] = HB_obs;
HB[ii_miss] = HB_miss;
}
model {
b_0 ~ normal(0,100);           
b_X ~ normal(0,100);           
b_HB ~ normal(0,100); 
HB_miss ~ bernoulli(0.5); // This is where the parser gives me an error
y ~ bernoulli_logit(b_0 + b_X * X + b_HB * HB); // model
}

关于如何在 STAN 中有效地在 HB_miss 之前分配伯努利有什么想法吗？

Answer 1

由于您提到的原因，在 Stan 程序中不可能将缺失的离散值视为未知数。 Stan 中的所有算法都使用梯度，并且没有为离散的未知数定义导数。

相反，您需要边缘化未知值，当一切都是二进制时，这并不太乏味。本质上，您可以使用 log_mix function 其参数是：

缺失值为1的概率，你说的是0.5
如果缺失值为 1，则相关观察的对数似然贡献
如果缺失值为 0，对相关观察的对数似然贡献

所以，它会是这样的

for (n in 1:N)
  target += log_mix(0.5, bernoulli_logit_lpmf(y[n] | b_0 + b_X * X[i] + b_HB),
                         bernoulli_logit_lpmf(y[n] | b_0 + b_X * X[i]));

更多详细信息，您可以阅读这篇博客post。

Answer 2

感谢 Ben 在上面的回答，这里是上面模型的完整解决方案/工作版本（添加了对混合概率的随机影响，而不是原来的 0.5 信念）：

data {                          
  int<lower=0> N;                  // total number of observations
  int<lower=0,upper=1> y[N];       // setting the dependent variable y as binary
  vector[N] X;                     // independent variable 1 (no intercept in the data section)
  int HB[N];                       // dummy coded HB with: '1-2'=0, '3-14'=1, 'Missing'=-1
}
parameters {
  real b_0;                      // intercept
  real b_X;                      // beta 1,2, ...
  real b_HB;
  real<lower=0,upper=1> lambda;  // mixture probability: lambda for HB_miss=1, and (1-lambda) for HB_miss=0 
}
model {
  b_0 ~ normal(0,100);           // priors
  b_X ~ normal(0,100);           
  b_HB ~ normal(0,100); 
  lambda ~ uniform(0,1);

  for (i in 1:N) {
    if (HB[i] == -1) {
      target += log_mix(lambda, bernoulli_logit_lpmf(y[i]| b_0 + b_X * X[i] + b_HB), bernoulli_logit_lpmf(y[i]| b_0 + b_X * X[i]));
    } else {
      HB[i] ~ bernoulli(lambda);
      y[i] ~ bernoulli_logit(b_0 + b_X * X[i] + b_HB * HB[i]); 
    }
  }   
}
'

R STAN 中的 Bernoulli Prior

Bernoulli Prior in R STAN

r

missing-data

stan

bernoulli-probability