How to bootstrap the correlation via boot::boot() for multiple pairs of variables at the same time?
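I have to compute a lot of bootstrapped correlations (Pearson r). My knowledge of R (let alone of writing my own functions) is limited. So far I have only managed to compute each bootstrapped correlation individually via boot::boot(), which is very time-consuming given how many correlations there are.
How can I compute several bootstrapped correlations at once?
Here is the code I have been using successfully, i.e. computing each correlation individually. This means I would have to repeat this code about 300 times, changing a small part of it each time.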
bootPearsonSZ <- function(data,i){
  cor(BdSZ$ndh[i], BdSZ$nkr_erst[i], use = "complete.obs", method = "pearson") # BdSZ = name of the data tibble I'm working with
}
set.seed(1)
boot_PearsonSZ <- boot(BdSZ, bootPearsonSZ, 10000)
boot_PearsonSZ
mean(boot_PearsonSZ$t) #Shows me the bootstrapped value for Pearson r
boot.ci(boot.out = boot_PearsonSZ, type = "all", conf = 0.99) #Shows me the 99% conf. interval
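Here is the code I used to try to compute at least some of the correlations at once, without success. It does not work as intended: the output of boot() only shows me the correlation from the last line in the function, i.e. cor(BdSZ$ndh[i], BdSZ$azr[i], use = "complete.obs", method = "pearson").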
bootPearsonSZ <- function(data,i){
  cor(BdSZ$ndh[i], BdSZ$nkr_erst[i], use = "complete.obs", method = "pearson") # BdSZ = name of the data tibble I'm working with
  cor(BdSZ$ndh[i], BdSZ$nkr_ge[i], use = "complete.obs", method = "pearson")
  cor(BdSZ$ndh[i], BdSZ$nkr_an[i], use = "complete.obs", method = "pearson")
  cor(BdSZ$ndh[i], BdSZ$azr[i], use = "complete.obs", method = "pearson") #Apparently only the last line of code will be used by boot()
}
set.seed(1)
boot_PearsonSZ <- boot(BdSZ, bootPearsonSZ, 10000)
boot_PearsonSZ
mean(boot_PearsonSZ$t)
boot.ci(boot.out = boot_PearsonSZ, type = "all", conf = 0.99)
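Additional information that may be relevant to my question:
I have cross-sectional and longitudinal data. I want to compute the correlations for 4x7 = 28 pairs of variables.
For the cross-sectional part of my study, I have to compute them for 3 city districts plus all districts together, which gives 28x4 = 112 correlations.
For the longitudinal data, I have one district but 7 years (plus all years together), which gives 28x(7+1) = 224 correlations.
Before computing the correlations, I currently create a subset of my tibble each time that contains only the district or year for which I want the bootstrapped correlation. Perhaps this could be handled (and thereby simplified) by subsetting inside the function I write?
Any kind of help is highly appreciated!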
EDIT: Added a reproducible example, as requested by @stephan-kolassa:
library(boot)
library(tidyr)
library(faux)
IndependentVariables <- rnorm_multi(n = 30,
mu = c(100, 100, 100, 100, 100, 100, 100),
sd = c(10, 10, 10, 10, 10, 10, 10),
r = 0.25,
varnames = c("IV1", "IV2", "IV3", "IV4", "IV5", "IV6", "IV7"),
empirical = FALSE)
DependentVariable <- rnorm_multi(n = 30,
mu = c(100, 100, 100, 100),
sd = c(10, 10, 10, 10),
r = 0.6,
varnames = c("DV1", "DV2", "DV3", "DV4"),
empirical = FALSE)
ID <- c(1:30)
mydata <- cbind(ID, IndependentVariables, DependentVariable)
bootPearson <- function(data,i){
  cor(mydata$DV1[i], mydata$IV1[i], use = "complete.obs", method = "pearson")
  cor(mydata$DV1[i], mydata$IV2[i], use = "complete.obs", method = "pearson")
  cor(mydata$DV1[i], mydata$IV3[i], use = "complete.obs", method = "pearson")
  cor(mydata$DV1[i], mydata$IV4[i], use = "complete.obs", method = "pearson")
  cor(mydata$DV1[i], mydata$IV5[i], use = "complete.obs", method = "pearson")
  cor(mydata$DV1[i], mydata$IV6[i], use = "complete.obs", method = "pearson")
  cor(mydata$DV1[i], mydata$IV7[i], use = "complete.obs", method = "pearson")
}
set.seed(1)
boot_Pearson <- boot(mydata, bootPearson, 2000)
boot_Pearson
mean(boot_Pearson$t) #Shows me the bootstrapped value for Pearson r
boot.ci(boot.out = boot_Pearson, type = "all", conf = 0.99) #Shows me the 99% conf. interval
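Your bootPearson() function does not do what you probably want it to do. Right now, it computes seven different correlations but returns only the last one; everything else is computed and then discarded. In R, a function returns only the value of the last expression evaluated in its body (unless return() is called explicitly). You may want to read up on how R functions work.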
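To make the return rule concrete, here is a minimal sketch with two toy functions. The names last_only() and all_three() are made up for illustration and have nothing to do with boot.

```r
# Hypothetical toy functions, only to illustrate R's return rule.

# Three expressions are evaluated, but only the last value is returned.
last_only <- function(x) {
  x + 1   # evaluated, then discarded
  x + 2   # evaluated, then discarded
  x + 3   # value of the final expression becomes the return value
}

# Collecting the values in one vector returns all of them.
all_three <- function(x) {
  c(x + 1, x + 2, x + 3)
}

last_only(10)   # 13
all_three(10)   # 11 12 13
```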
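The solution is simple: change bootPearson() so that it creates and returns a single object, namely a vector of length 7 containing the seven correlations you compute. Use c() to concatenate them into one vector: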
bootPearson <- function(data,i){
  c(cor(mydata$DV1[i], mydata$IV1[i], use = "complete.obs", method = "pearson"),
    cor(mydata$DV1[i], mydata$IV2[i], use = "complete.obs", method = "pearson"),
    cor(mydata$DV1[i], mydata$IV3[i], use = "complete.obs", method = "pearson"),
    cor(mydata$DV1[i], mydata$IV4[i], use = "complete.obs", method = "pearson"),
    cor(mydata$DV1[i], mydata$IV5[i], use = "complete.obs", method = "pearson"),
    cor(mydata$DV1[i], mydata$IV6[i], use = "complete.obs", method = "pearson"),
    cor(mydata$DV1[i], mydata$IV7[i], use = "complete.obs", method = "pearson"))
}
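Once the statistic returns a vector, the boot object stores one column per correlation, so mean(boot_Pearson$t) would average over all of them. Below is a rough sketch of how the individual results could be read out. The data frame toy and the function boot_three() are stand-ins invented for this example (three IVs instead of seven, plain rnorm() data instead of faux), and the percentile interval is only one possible choice. As far as I know, boot.ci() works on one statistic at a time and selects it via its index argument.

```r
library(boot)

set.seed(1)
# Hypothetical stand-in data: one DV and three IVs, 30 rows
toy <- data.frame(DV1 = rnorm(30, 100, 10),
                  IV1 = rnorm(30, 100, 10),
                  IV2 = rnorm(30, 100, 10),
                  IV3 = rnorm(30, 100, 10))

# Statistic returning a vector of three correlations.
# It uses the `data` argument that boot() passes in, resampled by rows i.
boot_three <- function(data, i) {
  d <- data[i, ]
  c(cor(d$DV1, d$IV1), cor(d$DV1, d$IV2), cor(d$DV1, d$IV3))
}

toy_boot <- boot(toy, boot_three, R = 500)

dim(toy_boot$t)        # 500 rows (replicates) by 3 columns (correlations)
toy_boot$t0            # the three correlations in the original sample
colMeans(toy_boot$t)   # mean of the bootstrap replicates, per correlation

# 99% percentile interval for the second correlation (DV1 with IV2)
ci_second <- boot.ci(toy_boot, conf = 0.99, type = "perc", index = 2)
ci_second
```

To get all intervals, the boot.ci() call can be wrapped in lapply() over seq_along(toy_boot$t0).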
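Of course, you can now also loop over the DVs and IVs inside this function and fill a result vector (using a counter to point to the correct entry), so there is no need to copy 28 almost identical lines: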
bootPearson <- function(data,i){
  # columns 2-8 of mydata hold IV1..IV7, columns 9-12 hold DV1..DV4
  result <- rep(NA,28)
  pointer <- 1
  for ( iv in 1:7 ) {
    for ( dv in 1:4 ) {
      result[pointer] <- cor(mydata[i,iv+1],mydata[i,dv+8], use = "complete.obs", method = "pearson")
      pointer <- pointer+1
    }
  }
  result
}
Note how the final result line makes the function return the entire vector.
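As a possible variation on the loop, cor() also accepts two sets of columns and returns the whole matrix of pairwise correlations in one call. The sketch below is written under some assumptions: toy, boot_all_pairs(), iv_names, dv_names and pair_labels are names made up for this example, the data are plain rnorm() draws rather than the faux data above, and the column names are passed through the extra arguments that boot() forwards to the statistic. I use "pairwise.complete.obs" here because, with several columns at once, "complete.obs" would drop every row that has a missing value in any column, which is not the same as the pair-by-pair calls above.

```r
library(boot)

set.seed(1)
# Hypothetical stand-in for mydata: 7 IVs and 4 DVs, 30 rows
toy <- as.data.frame(matrix(rnorm(30 * 11, 100, 10), nrow = 30))
names(toy) <- c(paste0("IV", 1:7), paste0("DV", 1:4))

iv_names <- paste0("IV", 1:7)
dv_names <- paste0("DV", 1:4)

# Statistic: all DV-by-IV correlations for the resampled rows i.
# cor() returns a 4 x 7 matrix; as.vector() flattens it column by column,
# i.e. DV1..DV4 with IV1, then DV1..DV4 with IV2, and so on.
boot_all_pairs <- function(data, i, ivs, dvs) {
  d <- data[i, ]
  r <- cor(d[, dvs], d[, ivs],
           use = "pairwise.complete.obs", method = "pearson")
  as.vector(r)
}

# Labels in the same order as the flattened matrix
pair_labels <- as.vector(outer(dv_names, iv_names, paste, sep = "_"))

toy_boot <- boot(toy, boot_all_pairs, R = 200,
                 ivs = iv_names, dvs = dv_names)

# Bootstrap means per pair, with readable names
estimates <- setNames(colMeans(toy_boot$t), pair_labels)
head(estimates)
```

The ordering of the 28 entries is the same as in the loop version (IV in the outer position, DV in the inner one), so pair_labels can be used to label either result.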