大型数据集采样期间的 Rstan 错误

Question

我构建了一个简单的分层 stan 模型来检索二项分布的概率 theta（代码如下）。

我的输入数据是一个table，包含所有（N）次和成功（n）次试验的信息，以及它属于哪个组的元数据（x:0-1，t:1-225, g:1-4 和 c:1-4)

N	n	x	t	g	c
123	10	1	1	1	2
531	500	1	0	1	1
12	6	0	1	2	1

运行它与 rstan，我得到以下错误 during/after 采样：

SAMPLING FOR MODEL 'Modle_roadmap_20210428' NOW (CHAIN 4).                    
Error in FUN(X[[i]], ...) :                                                   
  trying to get slot "mode" from an object (class "try-error") that is not an 
S4 object                                                                     
Calls: stan ... sampling -> sampling -> .local -> sapply -> lapply -> FUN     
In addition: Warning message:                                                 
In parallel::mclapply(1:chains, FUN = callFun, mc.preschedule = FALSE,  :     
  4 function calls resulted in an error                                       
Execution halted

有时它会满运行，当我只使用 1 个链并且我在终端中增加堆栈大小限制时（我没有达到内存或 CPU 限制）。当我减少数据集时它总是有效（全套是 250,000 observations/rows）。这个数据集是不是太大了，或者有没有办法优化代码以使其工作（理想情况下使用更大的数据集）。

data {
  int<lower=1> Nr; // 
  int<lower=1> Nt;//
  int<lower=1> Nxtgc; // 
  int<lower=1> xtgc[Nr]; // 
  int<lower=1> N[Nr]; // 
  int<lower=0> n[Nr]; // 
  int<lower=1> tLookup[Nxtgc]; // lookuptable for entry which t it belongs to
}

parameters {
  real a_t[Nt];
  real a_xtgc[Nxtgc];
  real <lower=0>sigma_xtgc;
  real <lower=0>simga_t;
}

transformed parameters {
  vector[Nr] theta; // binomial probabilities
  
  for (rowIndex in 1:Nr) { // linear model
    theta[rowIndex] = inv_logit(a_xtgc[xtgc[rowIndex]]);
  }
}

model {
  vector[Nxtgc] a_t_ii;
  simga_t ~ exponential(0.01);
  a_t ~ normal(-4,simga_t);
  sigma_xtgc ~ exponential(0.01);
  for (Idx in 1:Nxtgc) { // brute force vectorization  approch according to stan user guide (21.8)
    a_t_ii[Idx] = a_t[tLookup[Idx]];
  }
  a_xtgc ~  normal(a_t_ii,sigma_xtgc);  

  n ~ binomial(N, theta);
}

Answer 1

我找到了解决方案：

显然有很多（不必要的）中间变量，然后导致程序崩溃。您可以通过将“binomial()”替换为“binomial_logit()”来去除 theta。 for循环也可以用更优雅的替代方法代替。

data {
  int<lower=1> Nr; // 
  int<lower=1> Nt;//
  int<lower=1> Nxtgc; // 
  int<lower=1> xtgc[Nr]; // 
  int<lower=1> N[Nr]; // 
  int<lower=0> n[Nr]; // 
  int<lower=1> tLookup[Nxtgc]; // lookuptable for combination which t it belongs to
}

parameters {
  real a_t[Nt];
  real a_xtgc[Nxtgc];
  real <lower=0>sigma_xtgc;
  real <lower=0>simga_t;
}

model {
  simga_t ~ exponential(0.01);
  a_t ~ normal(-4,simga_t);
  sigma_xtgc ~ exponential(0.01);
  a_xtgc ~  normal(a_t[tLookup],sigma_xtgc);  

  n ~ binomial_logit(N, a_xtgc[xtgc]);
}

大型数据集采样期间的 Rstan 错误

Rstan error during sampling for large dataset

optimization

limit

stan

rstan