如何使用 sensobol 检查 Sobol 敏感度指数的收敛性?
How can I check for convergence in Sobol' sensitivity indices, using sensobol?
我想使用 sensobol 库检查 Sobol 敏感度指数的收敛性,方法是使用从原始样本中提取的大小递减的子样本重新计算敏感度指数。
在这里,我展示了一个使用 Ishigami 函数作为模型的示例代码。由于使用我实际使用的模型计算模型输出需要很长时间,因此我想避免为不同的样本大小重新计算模型输出,但想使用我的整体样本的子样本进行此检查。
我已经编写了运行的代码,但是,一旦样本量不等于初始样本量,结果似乎是'not correct'。
初始设置
library(sensobol)
# Define settings
matrices <- c("A", "B", "AB", "BA")
N <- 1000
params <- paste("X", 1:3, sep = "")
first <- total <- "azzini"
order <- "first"
R <- 10
type <- "percent"
conf <- 0.95
# Create sample matrix using Sobol' (1967) quasi-random numbers
mat <- sobol_matrices(matrices = matrices, N = N, params = params, order = order, type = "QRN")
# Compute model output using Ishigami function as model
Y <- ishigami_Fun(mat)
将 Sobol 指数更正为基准结果
# Compute and bootstrap Sobol' indices for entire sample N
ind <- sobol_indices(matrices = c("A", "B", "AB", "BA"),
Y = Y,
N = N,
params = params,
boot = TRUE,
first = "azzini",
total = "azzini",
order = "first",
R = R,
type = type,
conf = conf)
cols <- colnames(ind)[1:length(params)]
ind[ , (cols):= round(.SD, 3), .SDcols = (cols)]
检查收敛
现在,为了分析是否达到收敛,我想使用从原始样本中提取的大小递减的子样本重新计算敏感度指数
# function to compute sensitivity indices, depending on the sample size and the model output vector
fct_conv <- function(N, Y) {
# compute how many model runs are performed in the case of the Azzini estimator
nr_model_runs <- 2*N*(length(params)+1) # length(params) = k
# extract sub-sample of model output
y_sub <- Y[1:nr_model_runs]
# compute and bootstrap Sobol' indices
ind_sub <- sobol_indices(matrices = c("A", "B", "AB", "BA"),
Y = y_sub,
N = N,
params = params,
boot = TRUE,
first = "azzini",
total = "azzini",
order = "first",
R = R,
type = type,
conf = conf)
cols <- colnames(ind_sub)[1:length(params)]
ind_sub[ , (cols):= round(.SD, 3), .SDcols = (cols)]
return(ind_sub)
}
让我们将基准测试结果 (ind
) 与其他两个输出进行比较:运行 fct_conv
与完整样本 (ind_full_sample
) 和 运行 fct_conv
样本略有减少 (ind_red_sample
)。
ind_full_sample <- fct_conv(1000, Y)
ind_red_sample <- fct_conv(999, Y)
ind
ind_full_sample
ind_red_sample
好像一缩小样本量,结果就没有意义了。这是为什么?我很乐意提供任何提示或想法!
结果没有意义,因为您在采样时没有考虑样本矩阵的顺序。尝试以下
# Load the required packages:
library(sensobol)
library(data.table)
library(ggplot2)
# Create function to swiftly check convergence (you do not need bootstrap)
sobol_convergence <- function(Y, N, sample.size, seed = 666) {
dt <- data.table(matrix(Y, nrow = N))
set.seed(seed) # To permit replication
subsample <- unlist(dt[sample(.N, sample.size)], use.names = FALSE)
ind <- sobol_indices(matrices = matrices,
Y = subsample,
N = sample.size,
params = params,
first = first,
total = total,
order = order)
return(ind)
}
# Define sequence of sub-samples at which you want to check convergence
sample.size <- seq(100, 1000, 50) # every 50
# Run function
ind.list <- lapply(sample.size, function(n)
sobol_convergence(Y = Y, N = N, sample.size = n))
# Extract total number of model runs C and results in each run
Cost <- indices <- list()
for(i in 1:length(ind.list)) {
Cost[[i]] <- ind.list[[i]]$C
indices[[i]] <- ind.list[[i]]$results
}
names(indices) <- Cost
# Final dataset
final.dt <- rbindlist(indices, idcol = "Cost")[, Cost:= as.numeric(Cost)]
# Plot results
ggplot(final.dt, aes(Cost, original, color = sensitivity)) +
geom_line() +
labs(x = "Total number of model runs", y = "Sobol' indices") +
facet_wrap(~parameters) +
theme_bw()
我想使用 sensobol 库检查 Sobol 敏感度指数的收敛性,方法是使用从原始样本中提取的大小递减的子样本重新计算敏感度指数。
在这里,我展示了一个使用 Ishigami 函数作为模型的示例代码。由于使用我实际使用的模型计算模型输出需要很长时间,因此我想避免为不同的样本大小重新计算模型输出,但想使用我的整体样本的子样本进行此检查。
我已经编写了运行的代码,但是,一旦样本量不等于初始样本量,结果似乎是'not correct'。
初始设置
library(sensobol)
# Define settings
matrices <- c("A", "B", "AB", "BA")
N <- 1000
params <- paste("X", 1:3, sep = "")
first <- total <- "azzini"
order <- "first"
R <- 10
type <- "percent"
conf <- 0.95
# Create sample matrix using Sobol' (1967) quasi-random numbers
mat <- sobol_matrices(matrices = matrices, N = N, params = params, order = order, type = "QRN")
# Compute model output using Ishigami function as model
Y <- ishigami_Fun(mat)
将 Sobol 指数更正为基准结果
# Compute and bootstrap Sobol' indices for entire sample N
ind <- sobol_indices(matrices = c("A", "B", "AB", "BA"),
Y = Y,
N = N,
params = params,
boot = TRUE,
first = "azzini",
total = "azzini",
order = "first",
R = R,
type = type,
conf = conf)
cols <- colnames(ind)[1:length(params)]
ind[ , (cols):= round(.SD, 3), .SDcols = (cols)]
检查收敛
现在,为了分析是否达到收敛,我想使用从原始样本中提取的大小递减的子样本重新计算敏感度指数
# function to compute sensitivity indices, depending on the sample size and the model output vector
fct_conv <- function(N, Y) {
# compute how many model runs are performed in the case of the Azzini estimator
nr_model_runs <- 2*N*(length(params)+1) # length(params) = k
# extract sub-sample of model output
y_sub <- Y[1:nr_model_runs]
# compute and bootstrap Sobol' indices
ind_sub <- sobol_indices(matrices = c("A", "B", "AB", "BA"),
Y = y_sub,
N = N,
params = params,
boot = TRUE,
first = "azzini",
total = "azzini",
order = "first",
R = R,
type = type,
conf = conf)
cols <- colnames(ind_sub)[1:length(params)]
ind_sub[ , (cols):= round(.SD, 3), .SDcols = (cols)]
return(ind_sub)
}
让我们将基准测试结果 (ind
) 与其他两个输出进行比较:运行 fct_conv
与完整样本 (ind_full_sample
) 和 运行 fct_conv
样本略有减少 (ind_red_sample
)。
ind_full_sample <- fct_conv(1000, Y)
ind_red_sample <- fct_conv(999, Y)
ind
ind_full_sample
ind_red_sample
好像一缩小样本量,结果就没有意义了。这是为什么?我很乐意提供任何提示或想法!
结果没有意义,因为您在采样时没有考虑样本矩阵的顺序。尝试以下
# Load the required packages:
library(sensobol)
library(data.table)
library(ggplot2)
# Create function to swiftly check convergence (you do not need bootstrap)
sobol_convergence <- function(Y, N, sample.size, seed = 666) {
dt <- data.table(matrix(Y, nrow = N))
set.seed(seed) # To permit replication
subsample <- unlist(dt[sample(.N, sample.size)], use.names = FALSE)
ind <- sobol_indices(matrices = matrices,
Y = subsample,
N = sample.size,
params = params,
first = first,
total = total,
order = order)
return(ind)
}
# Define sequence of sub-samples at which you want to check convergence
sample.size <- seq(100, 1000, 50) # every 50
# Run function
ind.list <- lapply(sample.size, function(n)
sobol_convergence(Y = Y, N = N, sample.size = n))
# Extract total number of model runs C and results in each run
Cost <- indices <- list()
for(i in 1:length(ind.list)) {
Cost[[i]] <- ind.list[[i]]$C
indices[[i]] <- ind.list[[i]]$results
}
names(indices) <- Cost
# Final dataset
final.dt <- rbindlist(indices, idcol = "Cost")[, Cost:= as.numeric(Cost)]
# Plot results
ggplot(final.dt, aes(Cost, original, color = sensitivity)) +
geom_line() +
labs(x = "Total number of model runs", y = "Sobol' indices") +
facet_wrap(~parameters) +
theme_bw()