Simulating correlated variables while limiting the deviation between observed and specified correlation coefficients
dev_allowance <- 0.15 #Deviation in r allowed
within_limit <- FALSE #Initiate
count <- 0 #Loop count
nvar <- 10 #number of variables to simulate
nobs = 50 #number of observations to simulate
#define correlation matrix
M = matrix(c(1., .0, .0, .0, .0, .0, .0, .0, .0, .0,
.0, 1., .0, .0, .0, .0, .0, .0, .0, .0,
.0, .0, 1., .8, .0, .0, .0, .0, .0, .0,
.0, .0, .8, 1., .0, .0, .0, .0, .0, .0,
.0, .0, .0, .0, 1., .2, .0, .0, .0, .0,
.0, .0, .0, .0, .2, 1., .0, .0, .0, .0,
.0, .0, .0, .0, .0, .0, 1., .8, .0, .0,
.0, .0, .0, .0, .0, .0, .8, 1., .0, .0,
.0, .0, .0, .0, .0, .0, .0, .0, 1., .2,
.0, .0, .0, .0, .0, .0, .0, .0, .2, 1.), nrow=nvar, ncol=nvar)
L = chol(M) # Cholesky decomposition
#Loop while not within limit
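while (!within_limit) {
  # Generate random variables (nvar rows, nobs columns), then transpose
  # so that rows are observations and columns are variables
  r = t(L) %*% matrix(rnorm(nvar*nobs), nrow=nvar, ncol=nobs)
  r = t(r)
  # Check if every sample correlation is within the allowed deviation
  within_limit <- all(abs(cor(r) - M) < dev_allowance)
  # Count loop
  count <- count + 1
}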
cat(paste0("run count: ", count))
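I am trying to simulate about 10 random normal variables with specified correlations. At the same time, I want the correlations of the simulated variables to fall within a certain range centered on the specified correlations.
However, the running time is unacceptably long, if not infinite.
For now, I want to run this with nobs=50 and nobs=200. Although I plan to set dev_allowance=0.05, at the moment it can take more than a minute when dev_allowance is smaller than about 0.16 for nobs=50, or about 0.08 for nobs=200. I dare not try an even smaller dev_allowance...
If I am to stick with my current parameter scheme, is there a workaround?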
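To get a feel for why the loop above rarely terminates, the share of draws that would pass the check can be estimated directly. This is only a rough sketch: `max_cor_deviation` and `n_draws` are names made up for this illustration, and `M` is rebuilt here with `diag()` so that the block runs on its own.

```r
# Same correlation matrix as above, rebuilt compactly
M <- diag(10)
M[3, 4] <- M[4, 3] <- 0.8
M[5, 6] <- M[6, 5] <- 0.2
M[7, 8] <- M[8, 7] <- 0.8
M[9, 10] <- M[10, 9] <- 0.2

# Hypothetical helper: for n_draws independent simulated data sets, return
# the largest absolute deviation between sample and target correlations
max_cor_deviation <- function(M, nobs, n_draws = 500) {
  L <- chol(M)  # upper triangular factor, t(L) %*% L equals M
  replicate(n_draws, {
    z <- matrix(rnorm(nobs * ncol(M)), nrow = nobs)  # independent normals
    x <- z %*% L                                     # columns correlated as in M
    max(abs(cor(x) - M))
  })
}

set.seed(1)
dev <- max_cor_deviation(M, nobs = 50)
mean(dev < 0.15)  # estimated share of draws the while loop would accept
mean(dev < 0.30)  # same draws, looser allowance
```

The expected number of loop iterations is roughly one over the estimated share, so a share near zero suggests the whole-matrix rejection loop is impractical for that combination of nobs and dev_allowance.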
Well... halfway through typing this question, I came up with this:
sim_nvar <- matrix(rnorm(nobs), ncol=nobs)
for (i in 2:nvar) {
  within_limit <- FALSE
  while (!within_limit) {
    # Generate random variables
    sim_var <- t(L)[i, 1:i] %*% rbind(sim_nvar, matrix(rnorm(nobs), ncol=nobs))
    sim_var <- t(rbind(sim_nvar, sim_var))
    # Check if within limit
    within_limit <- all(abs(cor(sim_var) - M[1:i, 1:i]) < dev_allowance)
  }
  sim_nvar <- t(sim_var)
}
sim_nvar <- t(sim_nvar)
all(abs(cor(sim_nvar) - M) < dev_allowance)
[1] TRUE
I think it looks OK. But are there any flaws in simulating the variables one at a time like this? Or is this the best approach?
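One point that may be worth checking in the sequential version: `sim_nvar` holds the variables that are already correlated, whereas row i of `t(L)` is meant to weight independent standard normal draws. For the block-diagonal `M` used here the two coincide wherever the weights are non-zero, so the result should not be affected, but for a general `M` that would presumably no longer hold. Below is a minimal sketch of the same idea that stores the independent draws separately. The names `simulate_sequential`, `z`, `x` and `max_tries` are hypothetical, and the 3-variable matrix `M3` is only an example, not part of the original problem.

```r
# Sketch: build variables one at a time, redrawing only the newest one
# until its sample correlations are within dev_allowance of the target
simulate_sequential <- function(M, nobs, dev_allowance, max_tries = 10000) {
  nvar <- ncol(M)
  Lt <- t(chol(M))                    # lower triangular, Lt %*% t(Lt) equals M
  z <- matrix(rnorm(nobs), nrow = 1)  # independent standard normal draws
  x <- z                              # first variable is the first draw itself
  for (i in 2:nvar) {
    within_limit <- FALSE
    tries <- 0
    while (!within_limit) {
      tries <- tries + 1
      if (tries > max_tries) stop("no acceptable draw for variable ", i)
      z_new <- rnorm(nobs)
      # Row i of Lt weights the independent draws, not the correlated x
      x_new <- Lt[i, 1:i] %*% rbind(z, z_new)
      cand <- t(rbind(x, x_new))      # nobs x i matrix of variables so far
      within_limit <- all(abs(cor(cand) - M[1:i, 1:i]) < dev_allowance)
    }
    z <- rbind(z, z_new)              # keep the accepted independent draw
    x <- rbind(x, x_new)              # keep the accepted correlated variable
  }
  t(x)                                # nobs x nvar
}

# Example with three mutually correlated variables (assumed values)
set.seed(42)
M3 <- matrix(0.5, nrow = 3, ncol = 3)
diag(M3) <- 1
out <- simulate_sequential(M3, nobs = 200, dev_allowance = 0.15)
all(abs(cor(out) - M3) < 0.15)
```

Note that accepting variables one at a time conditions each new variable on the ones already kept, so the joint distribution of the result may differ from that of the all-at-once rejection loop. Whether that matters depends on what the simulated data are used for.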