Simulating two conditional variables with specified means and a correlation using R
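This question follows from an earlier one, in which the problem of generating two correlated series was solved, but with some limitations. We are trying to generate two correlated series that follow exponential distributions with specific parameters. For example, one variable tr has mean 1 and another variable t has mean 2, with a correlation coefficient of -0.5, subject to the condition t > tr. The following code was tried in R.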
n <- 100  # sample size; not defined in the original snippet, assumed here (the answers below use 100)
rho <- -0.5
mu <- rep(0, 2)
Sigma <- matrix(rho, nrow = 2, ncol = 2) + diag(2) * (1 - rho)
library(MASS)
compute.tr.t <- function(req.n, paccept) {
  # draw extra samples to compensate for the expected rejection rate
  req.n <- round(req.n / paccept)
  rawvars <- mvrnorm(req.n, mu = mu, Sigma = Sigma)
  pvars <- pnorm(rawvars)
  tr <- qexp(pvars[, 1], 1/1)
  t <- qexp(pvars[, 2], 1/2)
  # keep only the pairs that satisfy t > tr
  keep <- which(t > tr)
  return(data.frame(t = t[keep], tr = tr[keep]))
}
req.n <- n
paccept <- 1
res <- data.frame()
while (req.n > 0) {
  new.res <- compute.tr.t(req.n, paccept)
  res <- rbind(res, new.res)
  req.n <- n - nrow(res)
  paccept <- nrow(new.res) / n  # updated paccept according to last step
}
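Problems that arise when the data that do not satisfy t > tr are discarded:
- The means are not preserved.
- The correlation is not preserved.
See the output below. Clearly, imposing this condition shifts the locations of both variables.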
mean(res$tr)
[1] 0.4660927
mean(res$t)
[1] 2.859441
print(cor(res$tr,res$t))
[1] -0.237159
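The shift reported above can be checked on a larger sample by comparing the moments before and after the filter. The following is a minimal sketch under stated assumptions: the seed and the sample size `n.draw` are arbitrary choices for illustration and are not part of the original post.

```r
library(MASS)  # provides mvrnorm()

set.seed(1)     # arbitrary seed, for reproducibility only
n.draw <- 1e5   # illustrative sample size (assumption)
rho <- -0.5
Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)

# Gaussian copula: correlated normals -> uniforms -> exponentials.
# The Pearson correlation of the exponentials will generally differ
# somewhat from rho, because the transformation is nonlinear.
z  <- mvrnorm(n.draw, mu = c(0, 0), Sigma = Sigma)
u  <- pnorm(z)
tr <- qexp(u[, 1], rate = 1)      # unconditional mean 1
t  <- qexp(u[, 2], rate = 1 / 2)  # unconditional mean 2

keep <- t > tr  # the condition imposed in the question

# Compare moments before and after imposing t > tr
summary.tab <- rbind(
  unconditional = c(mean.tr = mean(tr), mean.t = mean(t),
                    cor = cor(tr, t)),
  conditional   = c(mean.tr = mean(tr[keep]), mean.t = mean(t[keep]),
                    cor = cor(tr[keep], t[keep]))
)
print(round(summary.tab, 3))
```

Discarding pairs with t <= tr removes large tr values and small t values, which is why the conditional mean of tr falls and that of t rises.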
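My question: is there a way to generate two correlated, conditional variables (e.g. t > tr) while keeping the series means close to the specified values? We can accept a reduced correlation, but is it possible to at least preserve the means?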
Updated answer, in which every element of t is strictly greater than tr:
n <- 100
rho <- 0.5
mu <- rep(0, 2)
Sigma <- matrix(rho, nrow = 2, ncol = 2) + diag(2) * (1 - rho)
library(MASS)
compute.tr.t <- function(req.n, paccept) {
  req.n <- round(req.n / paccept)
  rawvars <- mvrnorm(req.n, mu = mu, Sigma = Sigma)
  pvars <- pnorm(rawvars)
  tr <- qexp(pvars[, 1], 1/1)
  t <- qexp(pvars[, 2], 1/2)
  # keep only values within 0.5 of their sample mean
  tr <- tr[(tr - mean(tr))^2 < .25]  # can play with this value
  t <- t[(t - mean(t))^2 < .25]
  # truncate both vectors to a common length
  m <- min(length(t), length(tr))
  t <- t[seq_len(m)]  # seq_len() avoids the 1:0 pitfall when m is 0
  tr <- tr[seq_len(m)]
  return(data.frame(t = t, tr = tr))
}
req.n <- n
paccept <- 1
res <- data.frame()
while (req.n > 0) {
  new.res <- compute.tr.t(req.n, paccept)
  res <- rbind(res, new.res)
  req.n <- n - nrow(res)
  paccept <- nrow(new.res) / n
}
mean(res$t)
[1] 1.972218
mean(res$tr)
[1] 0.590776
table(res$t > res$tr) # should be all true, rarely you'll get 1 trivial false that you can kick out
TRUE
132
cor(res$t,res$tr) # suffered a little but not too bad, can probably improve
[1] 0.2527064
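The updated code filters t and tr separately and then truncates both vectors to a common length, so the two values on one row of the result need not come from the same simulated draw. The sketch below is a variation, not the answerer's method: it applies the same window to whole rows so that each retained pair stays together, and it enforces t > tr explicitly. The function name `draw_windowed` and the argument `half.width` are hypothetical names introduced for illustration.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical helper: one batch of Gaussian-copula exponentials, keeping
# only rows where BOTH values lie within half.width of their batch mean.
draw_windowed <- function(n.draw, rho = 0.5, half.width = 0.5) {
  Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  u  <- pnorm(mvrnorm(n.draw, mu = c(0, 0), Sigma = Sigma))
  tr <- qexp(u[, 1], rate = 1)      # target mean 1
  t  <- qexp(u[, 2], rate = 1 / 2)  # target mean 2
  keep <- abs(tr - mean(tr)) < half.width &
          abs(t - mean(t)) < half.width &
          t > tr                    # enforce the constraint explicitly
  data.frame(t = t[keep], tr = tr[keep])
}

set.seed(42)  # arbitrary seed, for reproducibility only
res.joint <- draw_windowed(1000)

all(res.joint$t > res.joint$tr)
c(mean.t = mean(res.joint$t), mean.tr = mean(res.joint$tr))
cor(res.joint$t, res.joint$tr)
```

As with the code above, windowing truncates both distributions, so the retained values are no longer exponentially distributed; the width of the window trades that distortion against how closely the means are held.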
Original answer, in which mean(t) > mean(tr) holds but not t > tr for every element:
n <- 100
rho <- 0.5
mu <- rep(0, 2)
Sigma <- matrix(rho, nrow = 2, ncol = 2) + diag(2) * (1 - rho)
library(MASS)
compute.tr.t <- function(req.n, paccept) {
  req.n <- round(req.n / paccept)
  rawvars <- mvrnorm(req.n, mu = mu, Sigma = Sigma)
  pvars <- pnorm(rawvars)
  tr <- qexp(pvars[, 1], 1/1)
  t <- qexp(pvars[, 2], 1/2)
  keep <- which(t > tr)  # computed but not used: all rows are returned
  return(data.frame(t = t, tr = tr))
}
req.n <- n
paccept <- 1
res <- data.frame()
while (req.n > 0) {
  new.res <- compute.tr.t(req.n, paccept)
  res <- rbind(res, new.res)
  req.n <- n - nrow(res)
  paccept <- nrow(new.res) / n  # updated paccept according to last step
}
mean(res$tr)
[1] 0.9399213
mean(res$t)
[1] 1.795431
print(cor(res$tr,res$t))
[1] 0.5075668
Because there is some randomness involved, I ran it a second time and got the following results:
mean(res$tr)
[1] 1.001255
mean(res$t)
[1] 1.922343
print(cor(res$tr,res$t))
[1] 0.6648311
After you run it, if you don't quite like the results, a simple hack to reach any desired level of precision is:
# assumes res already exists from a previous run of the code above
while (
  (cor(res$tr, res$t) > .55 | cor(res$tr, res$t) < .45)
) {
  n <- 100
  rho <- 0.5
  mu <- rep(0, 2)
  Sigma <- matrix(rho, nrow = 2, ncol = 2) + diag(2) * (1 - rho)
  library(MASS)
  compute.tr.t <- function(req.n, paccept) {
    req.n <- round(req.n / paccept)
    rawvars <- mvrnorm(req.n, mu = mu, Sigma = Sigma)
    pvars <- pnorm(rawvars)
    tr <- qexp(pvars[, 1], 1/1)
    t <- qexp(pvars[, 2], 1/2)
    keep <- which(t > tr)  # computed but not used: all rows are returned
    return(data.frame(t = t, tr = tr))
  }
  req.n <- n
  paccept <- 1
  res <- data.frame()
  while (req.n > 0) {
    new.res <- compute.tr.t(req.n, paccept)
    res <- rbind(res, new.res)
    req.n <- n - nrow(res)
    paccept <- nrow(new.res) / n  # updated paccept according to last step
  }
}
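The retry loop above redefines everything on each pass and has no upper bound on the number of passes. A more compact sketch of the same idea follows, assuming the simulation is wrapped in a function and the number of attempts is capped. The helper `simulate_pair`, the cap `max.tries`, and the seed are hypothetical choices made for illustration and do not come from the answer.

```r
library(MASS)  # provides mvrnorm()

# Hypothetical helper: n pairs of Gaussian-copula exponentials
# (means 1 and 2), with no t > tr filtering, as in the original answer.
simulate_pair <- function(n, rho = 0.5) {
  Sigma <- matrix(c(1, rho, rho, 1), nrow = 2)
  u <- pnorm(mvrnorm(n, mu = c(0, 0), Sigma = Sigma))
  data.frame(tr = qexp(u[, 1], rate = 1),
             t  = qexp(u[, 2], rate = 1 / 2))
}

target <- 0.5
tol <- 0.05        # same acceptance band as the loop above: 0.45 to 0.55
max.tries <- 1000  # assumed cap so the loop cannot run forever

set.seed(123)      # arbitrary seed, for reproducibility only
for (i in seq_len(max.tries)) {
  res2 <- simulate_pair(100)
  # stop as soon as the sample correlation falls inside the band
  if (abs(cor(res2$tr, res2$t) - target) <= tol) break
}

c(tries = i, cor = cor(res2$tr, res2$t))
```

If the cap is reached without a sample landing in the band, `res2` simply holds the last draw, so the final correlation is worth checking before use.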