R中如何使用replicate函数重复函数

How to use the replicate function in R to repeat the function

我在使用 replicate 重复功能时遇到问题。

我尝试使用 bootstrap 来适应 使用浓度作为预测变量和 Total_lignin 作为响应的二次模型,并将报告最大值的估计值和相应的标准误差。

我的想法是创建一个名为 bootFun 的函数,它基本上在 for 循环的一次迭代中完成所有操作。 bootFun 仅接受数据集预测变量和要使用的响应(引号中的两个变量名称)。

但是,SD为0,不正确。不知道哪里是错误的地方。你能帮我解决一下吗?

# Load the libraries
library(dplyr)
library(tidyverse)

# Read the .csv and only use M.giganteus and S.ravennae.
dat <- read_csv('concentration.csv') %>% 
  filter(variety == 'M.giganteus' | variety == 'S.ravennae') %>% 
  arrange(variety)
# Check the data
head(dat)

# sample size
n <- nrow(dat)

# A function to do one iteration
bootFun <- function(dat, pred, resp){
  # Draw the sample size from the dataset
  sample <- sample_n(dat, n, replace = TRUE)

  # A quadratic model fit
  formula <- paste0('resp', '~', 'pred', '+', 'I(pred^2)')
  fit <- lm(formula, data = sample)

  # Derive the max of the value of concentration
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])

  return(max)
}

max <- bootFun(dat = dat, pred = 'concentration', resp = 'Total_lignin' )

# Iterated times
N <- 5000

# Use 'replicate' function to do a loop
maxs <- replicate(N, max)

# An estimate of the max of predictor and corresponding SE
mean(maxs)
sd(maxs)

基础包boot、函数boot,可以减轻重复调用bootstrap函数的工作。第一个参数必须是数据集,第二个参数是一个索引参数,用户不设置,其他参数也可以传递给它。在这种情况下,其他参数是预测变量和响应名称。

library(boot)

bootFun <- function(dat, indices, pred, resp){
  # Draw the sample size from the dataset
  dat.sample <- dat[indices, ]

  # A quadratic model fit
  formula <- paste0(resp, '~', pred, '+', 'I(', pred, '^2)')
  formula <- as.formula(formula)
  fit <- lm(formula, data = dat.sample)

  # Derive the max of the value of concentration
  max <- -fit$coefficients[2]/(2*fit$coefficients[3])

  return(max)
}

N <- 5000
set.seed(1234)  # Make the bootstrap results reproducible

results <- boot(dat, bootFun, R = N, pred = 'concentration', resp = 'Total_lignin')

results
#
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = dat, statistic = bootFun, R = N, pred = "concentration", 
#    resp = "Total_lignin")
#
#
#Bootstrap Statistics :
#      original        bias    std. error
#t1* -0.4629808 -0.0004433889  0.03014259
#


results$t0  # this is the statistic, not bootstrapped
#concentration 
#   -0.4629808

mean(results$t)  # bootstrap value
#[1] -0.4633233

请注意,要拟合多项式,函数 poly 比显式地逐一写下多项式要简单得多。

formula <- paste0(resp, '~ poly(', pred, ',2, raw = TRUE)')

检查 bootstrapped 统计数据的分布。

op <- par(mfrow = c(1, 2))
hist(results$t)
qqnorm(results$t)
qqline(results$t)
par(op)

测试数据

set.seed(2020)  # Make the results reproducible
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100)
dat <- data.frame(concentration = x, Total_lignin = y)