How can I use brms (R package) to generate the Stan code I need to reproduce a model's estimation in pystan?
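I have developed a pipeline that estimates models with the R package brms, and now I need to port it to Python. As far as I know, the closest thing to brms in Python is pystan, where I have to write my models in Stan syntax. I would like to know whether there is a brms function that generates Stan code I can pass as the model_code argument of pystan.StanModel. I tried the code generated by the make_stancode function, but it did not work.
This is the code generated by make_stancode: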
life_span_code = """
// generated with brms 2.10.0
functions {
  /* compute monotonic effects
   * Args:
   *   scale: a simplex parameter
   *   i: index to sum over the simplex
   * Returns:
   *   a scalar between 0 and 1
   */
  real mo(vector scale, int i) {
    if (i == 0) {
      return 0;
    } else {
      return rows(scale) * sum(scale[1:i]);
    }
  }
}
data {
  int<lower=1> N; // number of observations
  vector[N] Y; // response variable
  int<lower=1> Ksp; // number of special effects terms
  int<lower=1> Imo; // number of monotonic variables
  int<lower=2> Jmo[Imo]; // length of simplexes
  // monotonic variables
  int Xmo_1[N];
  // prior concentration of monotonic simplexes
  vector[Jmo[1]] con_simo_1;
  int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
  // temporary intercept for centered predictors
  real Intercept;
  // special effects coefficients
  vector[Ksp] bsp;
  // simplexes of monotonic effects
  simplex[Jmo[1]] simo_1;
  real<lower=0> sigma; // residual SD
}
transformed parameters {
}
model {
  // initialize linear predictor term
  vector[N] mu = Intercept + rep_vector(0, N);
  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += (bsp[1]) * mo(simo_1, Xmo_1[n]);
  }
  // priors including all constants
  target += student_t_lpdf(Intercept | 3, 65, 12);
  target += dirichlet_lpdf(simo_1 | con_simo_1);
  target += student_t_lpdf(sigma | 3, 0, 12)
    - 1 * student_t_lccdf(0 | 3, 0, 12);
  // likelihood including all constants
  if (!prior_only) {
    target += normal_lpdf(Y | mu, sigma);
  }
}
generated quantities {
  // actual population-level intercept
  real b_Intercept = Intercept;
}
"""
This is the code I am using in Python:
## Libraries
import pandas as pd
import pystan
import numpy as np
import random as rd
## Build data for life span example with ordered factors
income_options = ["below_20", "20_to_40", "40_to_100", "greater_100"]
income_mean = [30, 60, 70, 75]
income_factor = [0, 1, 2, 3]
dict_data = {'income_options' : income_options,
'income_mean' : income_mean,
'income_factor' : income_factor}
map_df = pd.DataFrame(dict_data)
income_rep = rd.sample(income_factor*25, 100)
rand_inc = np.random.normal(loc = 0, scale = 1, size = 100).tolist()
data_df = pd.DataFrame({'income_factor': income_rep,
'rand_inc' : rand_inc})
data_df = pd.merge(data_df, map_df, on = 'income_factor')
data_df['ls'] = data_df['income_mean'] + data_df['rand_inc']
N = data_df.shape[0]
Y = data_df['ls'].tolist()
K = 1
X = [1]*N
Ksp = 1
Imo = 1
Xmo_1 = data_df['income_factor'].tolist()
Jmo = len(data_df['income_factor'].unique().tolist())-1
con_simo_1 = [1]*Jmo
prior_only = 0
life_span_data = {'N' : N,
'Y' : Y,
'K' : K,
'X' : X,
'Ksp' : Ksp,
'Imo' : Imo,
'Xmo_1' : Xmo_1,
'Jmo' : Jmo,
'con_simo_1' : con_simo_1,
'prior_only' : prior_only}
life_span_sm = pystan.StanModel(model_code = life_span_code)
life_span_fit = life_span_sm.sampling(data= life_span_data, iter=1000, chains=2)
This is the error I get:
"RuntimeError: Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=Jmo; dims declared=(1); dims found=() (in 'unknown file name' at line 24)"
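One way to read this message is to compare the number of dimensions the Stan declaration expects with what the Python value supplies. As a minimal sketch (the variable names below are illustrative and not part of the original script), NumPy's `np.shape` reports an empty tuple for a plain int and a one-element tuple for a one-element list, which lines up with `dims found=()` versus `dims declared=(1)`:

```python
import numpy as np

# Four income levels, as in the script above, give three steps between levels.
n_levels = 4
jmo_scalar = n_levels - 1    # a plain Python int, as passed in the script
jmo_array = [n_levels - 1]   # the same value wrapped in a one-element list

shape_scalar = np.shape(jmo_scalar)  # no dimensions at all
shape_array = np.shape(jmo_array)    # one dimension of length 1

print("int  ->", shape_scalar)
print("list ->", shape_array)
```

The declaration `int<lower=2> Jmo[Imo];` with `Imo = 1` asks for the second form.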
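Thanks for your help, everyone.
It turns out the problem was not in the model code generated by brms but in the way I defined the parameters.
Specifically, Jmo must be a list rather than an int.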
N = data_df.shape[0]
Y = data_df['ls'].tolist()
K = 1
X = [1]*N
Ksp = 1
Imo = 1
Xmo_1 = data_df['income_factor'].tolist()
## The following two lines have changed
Jmo = [len(data_df['income_factor'].unique().tolist())-1]
con_simo_1 = [1, 1, 1]
## End of changes
prior_only = 0
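The rest of the code is the same.
I would still appreciate some clarification on why some parameters can be declared as integers while others must be declared as lists.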
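On that last point, the distinction appears to follow the declarations in the Stan data block. `int<lower=1> N;` declares a scalar, while `int<lower=2> Jmo[Imo];` declares a one-dimensional array that happens to have length 1 here, so it still has to arrive as a sequence. The sketch below is only an illustration: `declared_shapes` and `find_shape_mismatches` are hypothetical helpers, not part of pystan or brms, and the shapes are transcribed by hand from the data block above. On the R side, brms also provides `make_standata`, which builds the data list that matches the generated code and can serve as a reference for the expected shapes.

```python
import numpy as np

def declared_shapes(N, Imo, Jmo):
    """Shapes implied by the data block above (hand-transcribed assumption).

    A scalar declaration maps to (); arrays and vectors map to one entry
    per declared dimension.
    """
    return {
        "N": (),
        "Y": (N,),
        "Ksp": (),
        "Imo": (),
        "Jmo": (Imo,),
        "Xmo_1": (N,),
        "con_simo_1": (Jmo[0],),
        "prior_only": (),
    }

def find_shape_mismatches(data, declared):
    """Return {name: (declared, found)} for entries whose shape differs."""
    mismatches = {}
    for name, shape in declared.items():
        found = np.shape(data[name])
        if found != shape:
            mismatches[name] = (shape, found)
    return mismatches

# Tiny made-up data set, only to exercise the check.
N = 4
Jmo = [3]
good = {
    "N": N,
    "Y": [30.1, 59.8, 70.2, 75.0],
    "Ksp": 1,
    "Imo": 1,
    "Jmo": Jmo,
    "Xmo_1": [0, 1, 2, 3],
    "con_simo_1": [1] * Jmo[0],  # derived from Jmo instead of hard-coded
    "prior_only": 0,
}
bad = dict(good, Jmo=3)  # the original mistake: a bare int

expected = declared_shapes(N, 1, Jmo)
good_result = find_shape_mismatches(good, expected)
bad_result = find_shape_mismatches(bad, expected)
print(good_result)
print(bad_result)
```

Writing `con_simo_1 = [1] * Jmo[0]` keeps the prior concentration vector in step with the number of factor levels once `Jmo` is a list.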
Thanks again.