Compute the correlation by looping through factor levels in a dataframe

Here is the data structure in R:

library(MASS)
ns <- 10; nt <- 20
dat <- data.frame(
          Subj  = rep(c(paste0('S',1:ns), paste0('S',1:ns)), nt),
          F     = rep(c(rep('f1', ns), rep('f2',ns)), nt),
          T     = rep(paste0('t', 1:nt), each=2*ns),
          y     = c(mvrnorm(n=ns, mu=c(0, 0), Sigma=matrix(c(1,0.7,0.7,1), nrow=2,ncol=2)))
                  +rnorm(2*ns*nt, 0, 1) )

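I want to compute the correlation of variable y between the two levels (f1 and f2) of factor F, separately for each level of factor Subj. In this example, that should give 10 correlations. One more condition: the two vectors passed to the correlation formula must each be arranged in the same order, following the levels of factor T.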

How can I achieve this? Thanks!

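You can use by() in base R. Sorting the rows by T first means that, within each subject, the f1 and f2 values of y line up in the same T order: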

subdat <- dat[order(dat$T), c("y", "F", "Subj")]
by(subdat, subdat$Subj, function(x) with(x, cor(y[F == "f1"], y[F == "f2"])))

Output

subdat$Subj: S1
[1] -0.03755675
--------------------------------------------------------------------------------- 
subdat$Subj: S10
[1] -0.05481364
--------------------------------------------------------------------------------- 
subdat$Subj: S2
[1] 0.2822211
--------------------------------------------------------------------------------- 
subdat$Subj: S3
[1] 0.2671967
--------------------------------------------------------------------------------- 
subdat$Subj: S4
[1] 0.1268404
--------------------------------------------------------------------------------- 
subdat$Subj: S5
[1] 0.0374699
--------------------------------------------------------------------------------- 
subdat$Subj: S6
[1] 0.5655247
--------------------------------------------------------------------------------- 
subdat$Subj: S7
[1] 0.2141196
--------------------------------------------------------------------------------- 
subdat$Subj: S8
[1] 0.250178
--------------------------------------------------------------------------------- 
subdat$Subj: S9
[1] 0.1370734
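
If a plain named vector is easier to work with than the printed by object, the same idea can be sketched with split() and vapply(). This is only an illustration, not part of the answer above: the helper cor_f1_f2 and the small toy data frame are hypothetical names introduced here, and the sketch assumes there is exactly one row per combination of Subj, F and T. Instead of relying on a prior sort, it pairs the f1 and f2 values explicitly by their T level with match().

```r
# Hypothetical helper (not from the original answer): correlation of y between
# f1 and f2 for one subject's rows, pairing observations by their T level.
# Assumes one row per Subj x F x T combination.
cor_f1_f2 <- function(d) {
  a <- d[d$F == "f1", ]
  b <- d[d$F == "f2", ]
  # match() finds, for each f1 row, the f2 row with the same T level
  cor(a$y, b$y[match(a$T, b$T)])
}

# Toy data with the same columns as dat: 2 subjects, 2 F levels, 4 T levels
set.seed(1)
toy <- expand.grid(Subj = c("S1", "S2"), F = c("f1", "f2"),
                   T = paste0("t", 1:4), stringsAsFactors = FALSE)
toy$y <- rnorm(nrow(toy))

# One correlation per subject, returned as a named numeric vector
res <- vapply(split(toy, toy$Subj), cor_f1_f2, numeric(1))
res
```

On the data from the question, vapply(split(dat, dat$Subj), cor_f1_f2, numeric(1)) should give the same 10 values as the by() call when both are run on the same dat, since both approaches pair the f1 and f2 observations that share a T level. Because the pairing is done by match(), the result does not depend on the row order of the input.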