Generate uncorrelated variables each well correlated with existing response variable

Question

I want to generate two (nearly) uncorrelated random variables (x1, x2) that show specified Pearson correlations with an existing variable y, for example:

cor(x1,y)=0.4;
cor(x2,y)=0.3;
cor(x1,x2)=0.03.

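So I have continuous, normally distributed values for y (obtained with a spatial interpolation technique), and I now want to generate simulated continuous values (e.g. normally distributed) for two explanatory variables x1 and x2 that have the correlation coefficients given above. I tried mvrnorm (MASS) and the copula R package, but I could not find a way to do what I want.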

I would be very grateful if someone could help me get there. Kind regards.

Answer 1

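The mvrnorm function in the MASS package should be able to do this (the copula package probably can as well; I am just less familiar with it).

What did you try, and how did the results differ from what you expected?

Here is a quick mvrnorm example: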

> ?MASS::mvrnorm
> library(MASS)
> 
> r <- cbind( c(1, 0.4, 0.3),
+             c(0.4, 1, 0.03),
+             c(0.3, 0.03, 1))
> 
> xy <- mvrnorm(n=100, mu=c(0,0,0), Sigma=r, empirical=TRUE )
> colnames(xy) <- c('y','x1','x2')
> 
> cor(xy)
     y   x1   x2
y  1.0 0.40 0.30
x1 0.4 1.00 0.03
x2 0.3 0.03 1.00
>
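The empirical=TRUE argument is what makes the printed correlations match r exactly rather than approximately. As a rough sketch of the difference (the seed and the names approx and exact are illustrative assumptions, not part of the example above):

```r
library(MASS)

set.seed(1)  # arbitrary seed, only so the sketch is reproducible

r <- cbind(c(1, 0.4, 0.3),
           c(0.4, 1, 0.03),
           c(0.3, 0.03, 1))

# empirical = FALSE (the default): r is treated as the population
# correlation, so the sample correlation only approximates it
approx <- mvrnorm(n = 100, mu = c(0, 0, 0), Sigma = r)

# empirical = TRUE: the sample itself is forced to have correlation r
exact <- mvrnorm(n = 100, mu = c(0, 0, 0), Sigma = r, empirical = TRUE)

max(abs(cor(approx) - r))  # nonzero sampling error
max(abs(cor(exact) - r))   # zero up to floating-point rounding
```

Note that in both cases mvrnorm draws a new y as well, so this alone does not handle a y that already exists.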

Edit

Here is one way to do it using an existing y variable:

y <- rnorm(100)  # existing y

# generate x1 and x2, make sure y is first column
xy <- cbind( y, x1=rnorm(100), x2=rnorm(100))

# center and scale
mns <- apply(xy, 2, mean)
sds <- apply(xy, 2, sd)

xy2 <- sweep(xy, 2, mns, FUN="-")
xy2 <- sweep(xy2, 2, sds, FUN="/")

# find existing correlations
v.obs <- cor(xy2)

# remove correlation
xy3 <- xy2 %*% solve(chol(v.obs))

# check
zapsmall(cor(xy3))

# new correlation

r <- cbind( c(1, 0.4, 0.3),
            c(0.4, 1, 0.03),
            c(0.3, 0.03, 1))

xy4 <- xy3 %*% chol(r)

# undo center and scale

xy4 <- sweep(xy4, 2, sds, FUN="*")
xy4 <- sweep(xy4, 2, mns, FUN="+")

# check: correlations match r, and the first column is still the original y
cor(xy4)
all.equal(y, xy4[,1])

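The mvrnorm function uses svd and eigenvalues rather than chol. You can also follow the code above with your own y in place of the random values in that part of the matrix. Because chol returns an upper-triangular factor whose [1,1] element is 1 for a correlation matrix, the first column passes through both transformations unchanged, which is why y has to be the first column.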
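The steps above can be wrapped in a small function that takes your own y directly. This is only a sketch under stated assumptions: correlate_with is a hypothetical helper name (not a function from MASS or base R), r must be a positive-definite correlation matrix whose first row and column refer to y, and y needs more observations than r has columns.

```r
# Hypothetical helper: generate variables with prescribed sample
# correlations to an existing y, using the same chol-based steps as above.
correlate_with <- function(y, r) {
  n <- length(y)
  k <- ncol(r) - 1                           # number of new variables
  xy <- cbind(y, matrix(rnorm(n * k), n, k)) # y must be the first column
  xy <- scale(xy)                            # center and scale each column
  # remove the observed correlation, then impose the target correlation r
  out <- xy %*% solve(chol(cor(xy))) %*% chol(r)
  x <- out[, -1, drop = FALSE]               # drop the (standardized) y column
  colnames(x) <- paste0("x", seq_len(k))
  x
}

# usage with a stand-in for the existing y
set.seed(42)        # arbitrary seed for reproducibility
y <- rnorm(100)     # replace with your own y
r <- cbind(c(1, 0.4, 0.3),
           c(0.4, 1, 0.03),
           c(0.3, 0.03, 1))
x <- correlate_with(y, r)
zapsmall(cor(cbind(y, x)))
```

The returned columns are standardized (mean 0, standard deviation 1). Since Pearson correlation is unaffected by shifting and positive rescaling, they can be moved to any desired mean and spread afterwards without changing the correlations.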

