Extract p values and r values for all pairwise variables

Question

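I have multiple variables for multiple countries over multiple years. I would like to generate a data frame containing an R^2 value and a P value for each pair of variables. I'm fairly close: I have a minimal working example and an idea of what the final product should look like, but I'm having some difficulty actually implementing it. Any help would be much appreciated.

Note that I would prefer to do this manually rather than use a package such as Hmisc, since that creates a number of other issues. I have also looked around for similar solutions, without much luck.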

# Code to generate minimum working example (country year pairs).  

library(tidyverse)
library(dplyr)
library(reshape2)
 
# Function to generate minimum working example data 

simulateCountryData = function(N=200, NEACH = 20, SEED=100){

        # Use the SEED argument so the simulated data is reproducible
        set.seed(SEED)

        variableOne<-rnorm(N,sample(1:100, NEACH),0.5)
        variableOne[variableOne<0]<-0

        variableTwo<-rnorm(N,sample(1:100, NEACH),0.5)
        variableTwo[variableTwo<0]<-0
        
        variableThree<-rnorm(N,sample(1:100, NEACH),0.5)
        variableThree[variableThree<0]<-0
        
        geocodeNum<-factor(rep(seq(1,N/NEACH),each=NEACH))
        
        year<-rep(seq(2000,2000+NEACH-1,1),N/NEACH)
        
        # Putting it all together
        AllData<-data.frame(geocodeNum,
                            year,
                            variableOne,
                            variableTwo,
                            variableThree)
        
        return(AllData)
}

 
# This runs the function and generates the data 
mySimData = simulateCountryData()

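I have a reasonable idea of how to get the correlation (p value and r value) between two manually chosen variables, but I'm having trouble applying it across the whole dataset and at the country level (rather than to all countries at once).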

# Example p value
corrP = cor.test(mySimData$variableOne,mySimData$variableTwo)$p.value
# Example r value
corrEst = cor(mySimData$variableOne,mySimData$variableTwo)

Finally, the end result should look something like this:

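# Variable columns are everything after geocodeNum and year
myVariables = colnames(mySimData[3:ncol(mySimData)])
myMatrix = expand.grid(myVariables,myVariables)

# I'm having trouble actually trying to get the r values and p values in the dataframe
# (the random numbers below are placeholders that only show the desired layout)
myMatrix = as.data.frame(myMatrix)
myMatrix$Pval = runif(nrow(myMatrix),0.01,1) 
myMatrix$Rval = runif(nrow(myMatrix),0.2,1) 
myMatrix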

Thanks again :)

Answer 1

This will compute r and p for all unique pairs.

# matrix of unique pairs coded as numeric
mx_combos <- combn(1:length(myVariables), 2)
# list of unique pairs coded as numeric
ls_combos <- split(mx_combos, rep(1:ncol(mx_combos), each = nrow(mx_combos)))
# for each pair in the list, create a 1 x 4 dataframe
ls_rows <- lapply(ls_combos, function(p) {
  # lookup names of variables
  v1 <- myVariables[p[1]]
  v2 <- myVariables[p[2]]
  # perform the cor.test()
  htest <- cor.test(mySimData[[v1]], mySimData[[v2]])
  # record pertinent info in a dataframe
  data.frame(Var1 = v1, 
             Var2 = v2, 
             Pval = htest$p.value, 
             Rval = unname(htest$estimate))
  })
# row bind the list of dataframes
dplyr::bind_rows(ls_rows)
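
The code above pools every country together. Since the question also asks for results at the country level, here is a minimal sketch of one way to do that in base R: split the data by country and run the same pairwise `cor.test()` within each piece. The toy data frame `toyData` and the helper name `pairwise_cor` are made up for illustration (they are not from the question), and the `Rsq` column is simply the square of r.

```r
# Hypothetical toy data: 3 countries x 20 years, three numeric variables
set.seed(100)
toyData <- data.frame(
  geocodeNum    = factor(rep(1:3, each = 20)),
  year          = rep(2000:2019, times = 3),
  variableOne   = rnorm(60),
  variableTwo   = rnorm(60),
  variableThree = rnorm(60)
)

# Columns to correlate: everything except the identifier columns
vars <- setdiff(names(toyData), c("geocodeNum", "year"))

# Each column of this character matrix is one unique pair of variable names
pairs <- combn(vars, 2)

# Hypothetical helper: r and p for every unique pair within one data frame
pairwise_cor <- function(df) {
  rows <- lapply(seq_len(ncol(pairs)), function(i) {
    htest <- cor.test(df[[pairs[1, i]]], df[[pairs[2, i]]])
    data.frame(Var1 = pairs[1, i],
               Var2 = pairs[2, i],
               Pval = htest$p.value,
               Rval = unname(htest$estimate))
  })
  do.call(rbind, rows)
}

# Split by country, compute the pairs per country, and label each block
result <- do.call(rbind, lapply(split(toyData, toyData$geocodeNum), function(df) {
  data.frame(geocodeNum = as.character(df$geocodeNum[1]), pairwise_cor(df))
}))
rownames(result) <- NULL

# R^2 is the squared correlation coefficient
result$Rsq <- result$Rval^2
result
```

To try this on the question's data, replace `toyData` with `mySimData`. Keep in mind that each country then contributes only its own rows (20 years each in the simulated data) to every test, so the p values will generally differ from the pooled ones.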
