How to sample from a sum of two distributions: binomial and Poisson

Question

Is there a way to predict a value from the sum of two distributions? I get a syntax error in rstan when I try to estimate y here: y ~ binomial(,) + poisson()


library(rstan)

BH_model_block <- "
data{
  int y; 
  int a; 
}

parameters{
  real <lower = 0, upper = 1> c;
  real <lower = 0, upper = 1> b;
}

model{
  y ~ binomial(a,b)+ poisson(c);
}
"
BH_model <- stan_model(model_code = BH_model_block)
BH_fit <- sampling(BH_model,
                   data = list(y = 5,
                               a = 2), 
                   iter= 1000)

This produces the following error:

SYNTAX ERROR, MESSAGE(S) FROM PARSER:

  error in 'model2c6022623d56_457bd7ab767c318c1db686d1edf0b8f6' at line 13, column 20
  -------------------------------------------------
    11: 
    12: model{
    13:   y ~ binomial(a,b)+ poisson(c);
                           ^
    14: }
  -------------------------------------------------

PARSER EXPECTED: ";"
Error in stanc(file = file, model_code = model_code, model_name = model_name,  : 
  failed to parse Stan model '457bd7ab767c318c1db686d1edf0b8f6' due to the above error.

Answer 1

Stan does not support integer parameters, so technically you cannot do this. For two real-valued variables, it would look like this:

parameters {
  real x;
  real y;
}
transformed parameters {
  real z = x + y;
}
model {
  x ~ normal(0, 1);
  y ~ gamma(0.1, 2);
}

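You then get the distribution of the sum on z. If the variables are discrete, the model will not compile.

If you do not need z in the model, you can do this in the generated quantities block: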

generated quantities {
  int x = binomial_rng(a, b);
  int y = poisson_rng(c);
  int z = x + y;
}

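The downside of this approach is that none of the variables is available in the model block. If you need discrete parameters, you have to marginalize them out, as described in the User's Guide chapter on latent discrete parameters (and also in the chapters on mixtures and HMMs). This is not easy for a Poisson because its support is unbounded. If the expected values of both discrete distributions are small, you can do it approximately with a loop over a reasonable range of values.

From the example in the original post, it looks like z is data. That is a slightly different marginalization over x and y, but you only sum over the x and y such that x + y = z, so the combinatorics are greatly reduced.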
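As a minimal sketch of that marginalization for the model in the question: because the observed total y is data, its likelihood is the finite sum over x = 0, ..., min(a, y) of binomial(x | a, b) * poisson(y - x | c). The names x_max and lp are introduced here for illustration, the priors on b and c are left implicitly uniform on [0, 1] as in the question, and the code assumes a Stan version that supports the `_lpmf` function names.

```stan
data {
  int<lower=0> y;  // observed total (binomial part + Poisson part)
  int<lower=0> a;  // number of binomial trials
}
parameters {
  real<lower=0, upper=1> b;  // binomial success probability
  real<lower=0, upper=1> c;  // Poisson rate, bounded as in the question
}
model {
  // The binomial part x can be at most a and at most y.
  int x_max = min(a, y);
  vector[x_max + 1] lp;
  for (x in 0:x_max) {
    // Log probability of the split: x from the binomial, y - x from the Poisson.
    lp[x + 1] = binomial_lpmf(x | a, b) + poisson_lpmf(y - x | c);
  }
  // Sum over all splits on the log scale.
  target += log_sum_exp(lp);
}
```

This should work with the same `stan_model()` and `sampling()` calls as in the question. With y = 5 and a = 2, the loop has only three terms.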

Answer 2

Another option is to replace the binomial with a Poisson and use the additivity of Poisson distributions:

BH_model_block <- "
data{
  int y; 
  int a; 
}

parameters{
  real <lower = 0, upper = 1> c;
  real <lower = 0, upper = 1> b;
}

model{
  y ~ poisson(a * b + c);
}
"

The difference is that if b is not small, the binomial has lower variance than the Poisson. But perhaps the data are overdispersed anyway?
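To make the variance remark concrete, here is a short comparison, assuming the binomial and Poisson components are independent (the symbols X and Y are introduced here for illustration):

```latex
\begin{align*}
\text{Exact sum: } & X \sim \mathrm{Binomial}(a, b),\quad Y \sim \mathrm{Poisson}(c), \\
& \mathbb{E}[X + Y] = ab + c, \qquad \mathrm{Var}(X + Y) = ab(1 - b) + c. \\
\text{Poisson approximation: } & y \sim \mathrm{Poisson}(ab + c), \\
& \mathbb{E}[y] = ab + c, \qquad \mathrm{Var}(y) = ab + c.
\end{align*}
```

The means agree, and the approximation overstates the variance by ab^2, which is negligible only when b is small.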

bayesian

rstan