Lognormal Monte Carlo simulation: summary statistics deviate from the true underlying values
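I am sampling from a lognormal distribution in R. When I look at the mean and standard deviation of the resulting samples, I notice that the sample standard deviation is consistently lower than the true population standard deviation. The means do not seem to behave this way.
Am I forgetting about a bias in simulated sample statistics? Even if so, the bias seems larger than I would expect.
What I am using in R: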
library(dplyr) ## Cleaning data
library(tidyr) ## tidying data
library(stringi) ## string manipulation
## Define simulation controls
n_sample <- 10
sample_size <- 1000
mu <- 10
sigma <- 3
## Lognormal mean and standard deviation
true_mean <- exp(mu + sigma ^ 2 / 2)
true_sd <- sqrt((exp(sigma ^ 2) - 1) *
                exp(2 * mu + sigma ^ 2))
## For reproducibility
set.seed(42)
sample_id <- stri_rand_strings(n_sample, length = 5)
counts <- rep(sample_size, n_sample)
observations <- lapply(counts, rlnorm, meanlog = mu, sdlog = sigma)
names(observations) <- sample_id
## Summarize results of the n_sample-many simulations
obs_table <- observations %>%
  bind_rows() %>%
  gather(key = "sample",
         value = "obs") %>%
  group_by(sample) %>%
  summarize(mean = mean(obs),
            sd = sd(obs)) %>%
  ## Mean departure and SD departure from true
  ## underlying distribution.
  mutate(mean_dep = mean / true_mean - 1,
         sd_dep = sd / true_sd - 1)
obs_table
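For reference, the closed-form moments that the `true_mean` and `true_sd` lines compute can be written out as below. The symbols match `mu` and `sigma` in the code; the numeric values are rounded and were worked out by hand for `mu = 10` and `sigma = 3`, so treat them as approximate.

```latex
\begin{align}
\mathrm{E}[X]  &= \exp\!\left(\mu + \tfrac{\sigma^{2}}{2}\right)
               = e^{14.5} \approx 1.98 \times 10^{6} \\
\mathrm{SD}[X] &= \sqrt{\left(e^{\sigma^{2}} - 1\right) e^{2\mu + \sigma^{2}}}
               = \mathrm{E}[X]\,\sqrt{e^{\sigma^{2}} - 1}
               \approx 1.78 \times 10^{8} \\
\frac{\mathrm{SD}[X]}{\mathrm{E}[X]} &= \sqrt{e^{9} - 1} \approx 90
\end{align}
```

So the values that `mean_dep` and `sd_dep` are measured against come from a distribution whose standard deviation is roughly 90 times its mean.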