R中如何使用replicate函数重复函数
How to use the replicate function in R to repeat the function
我在使用 replicate
重复功能时遇到问题。
我尝试使用 bootstrap 来适应
使用浓度作为预测变量和 Total_lignin 作为响应的二次模型,并将报告最大值的估计值和相应的标准误差。
我的想法是创建一个名为 bootFun 的函数,它基本上在 for 循环的一次迭代中完成所有操作。 bootFun 仅接受数据集预测变量和要使用的响应(引号中的两个变量名称)。
但是,SD为0,不正确。不知道哪里是错误的地方。你能帮我解决一下吗?
# Load the libraries
library(dplyr)
library(tidyverse)
# Read the .csv and only use M.giganteus and S.ravennae.
dat <- read_csv('concentration.csv') %>%
filter(variety == 'M.giganteus' | variety == 'S.ravennae') %>%
arrange(variety)
# Check the data
head(dat)
# sample size
n <- nrow(dat)
# A function to do one iteration
bootFun <- function(dat, pred, resp){
# Draw the sample size from the dataset
sample <- sample_n(dat, n, replace = TRUE)
# A quadratic model fit
formula <- paste0('resp', '~', 'pred', '+', 'I(pred^2)')
fit <- lm(formula, data = sample)
# Derive the max of the value of concentration
max <- -fit$coefficients[2]/(2*fit$coefficients[3])
return(max)
}
max <- bootFun(dat = dat, pred = 'concentration', resp = 'Total_lignin' )
# Iterated times
N <- 5000
# Use 'replicate' function to do a loop
maxs <- replicate(N, max)
# An estimate of the max of predictor and corresponding SE
mean(maxs)
sd(maxs)
基础包boot
、函数boot
,可以减轻重复调用bootstrap函数的工作。第一个参数必须是数据集,第二个参数是一个索引参数,用户不设置,其他参数也可以传递给它。在这种情况下,其他参数是预测变量和响应名称。
library(boot)
bootFun <- function(dat, indices, pred, resp){
# Draw the sample size from the dataset
dat.sample <- dat[indices, ]
# A quadratic model fit
formula <- paste0(resp, '~', pred, '+', 'I(', pred, '^2)')
formula <- as.formula(formula)
fit <- lm(formula, data = dat.sample)
# Derive the max of the value of concentration
max <- -fit$coefficients[2]/(2*fit$coefficients[3])
return(max)
}
N <- 5000
set.seed(1234) # Make the bootstrap results reproducible
results <- boot(dat, bootFun, R = N, pred = 'concentration', resp = 'Total_lignin')
results
#
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = dat, statistic = bootFun, R = N, pred = "concentration",
# resp = "Total_lignin")
#
#
#Bootstrap Statistics :
# original bias std. error
#t1* -0.4629808 -0.0004433889 0.03014259
#
results$t0 # this is the statistic, not bootstrapped
#concentration
# -0.4629808
mean(results$t) # bootstrap value
#[1] -0.4633233
请注意,要拟合多项式,函数 poly
比显式地逐一写下多项式要简单得多。
formula <- paste0(resp, '~ poly(', pred, ',2, raw = TRUE)')
检查 bootstrapped 统计数据的分布。
op <- par(mfrow = c(1, 2))
hist(results$t)
qqnorm(results$t)
qqline(results$t)
par(op)
测试数据
set.seed(2020) # Make the results reproducible
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100)
dat <- data.frame(concentration = x, Total_lignin = y)
我在使用 replicate
重复功能时遇到问题。
我尝试使用 bootstrap 来适应 使用浓度作为预测变量和 Total_lignin 作为响应的二次模型,并将报告最大值的估计值和相应的标准误差。
我的想法是创建一个名为 bootFun 的函数,它基本上在 for 循环的一次迭代中完成所有操作。 bootFun 仅接受数据集预测变量和要使用的响应(引号中的两个变量名称)。
但是,SD为0,不正确。不知道哪里是错误的地方。你能帮我解决一下吗?
# Load the libraries
library(dplyr)
library(tidyverse)
# Read the .csv and only use M.giganteus and S.ravennae.
dat <- read_csv('concentration.csv') %>%
filter(variety == 'M.giganteus' | variety == 'S.ravennae') %>%
arrange(variety)
# Check the data
head(dat)
# sample size
n <- nrow(dat)
# A function to do one iteration
bootFun <- function(dat, pred, resp){
# Draw the sample size from the dataset
sample <- sample_n(dat, n, replace = TRUE)
# A quadratic model fit
formula <- paste0('resp', '~', 'pred', '+', 'I(pred^2)')
fit <- lm(formula, data = sample)
# Derive the max of the value of concentration
max <- -fit$coefficients[2]/(2*fit$coefficients[3])
return(max)
}
max <- bootFun(dat = dat, pred = 'concentration', resp = 'Total_lignin' )
# Iterated times
N <- 5000
# Use 'replicate' function to do a loop
maxs <- replicate(N, max)
# An estimate of the max of predictor and corresponding SE
mean(maxs)
sd(maxs)
基础包boot
、函数boot
,可以减轻重复调用bootstrap函数的工作。第一个参数必须是数据集,第二个参数是一个索引参数,用户不设置,其他参数也可以传递给它。在这种情况下,其他参数是预测变量和响应名称。
library(boot)
bootFun <- function(dat, indices, pred, resp){
# Draw the sample size from the dataset
dat.sample <- dat[indices, ]
# A quadratic model fit
formula <- paste0(resp, '~', pred, '+', 'I(', pred, '^2)')
formula <- as.formula(formula)
fit <- lm(formula, data = dat.sample)
# Derive the max of the value of concentration
max <- -fit$coefficients[2]/(2*fit$coefficients[3])
return(max)
}
N <- 5000
set.seed(1234) # Make the bootstrap results reproducible
results <- boot(dat, bootFun, R = N, pred = 'concentration', resp = 'Total_lignin')
results
#
#ORDINARY NONPARAMETRIC BOOTSTRAP
#
#
#Call:
#boot(data = dat, statistic = bootFun, R = N, pred = "concentration",
# resp = "Total_lignin")
#
#
#Bootstrap Statistics :
# original bias std. error
#t1* -0.4629808 -0.0004433889 0.03014259
#
results$t0 # this is the statistic, not bootstrapped
#concentration
# -0.4629808
mean(results$t) # bootstrap value
#[1] -0.4633233
请注意,要拟合多项式,函数 poly
比显式地逐一写下多项式要简单得多。
formula <- paste0(resp, '~ poly(', pred, ',2, raw = TRUE)')
检查 bootstrapped 统计数据的分布。
op <- par(mfrow = c(1, 2))
hist(results$t)
qqnorm(results$t)
qqline(results$t)
par(op)
测试数据
set.seed(2020) # Make the results reproducible
x <- cumsum(rnorm(100))
y <- x + x^2 + rnorm(100)
dat <- data.frame(concentration = x, Total_lignin = y)