Custom function does not work on a column named "x" unless specified by .$x in dplyr summarise() (R)

Question

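I want to create a custom function that computes the confidence interval of a column and returns it as two columns named lower.bound and upper.bound. I also want this function to work inside dplyr::summarize().

The function works as expected in every case I tested, except when the column is named "x". In that case it emits a warning and returns NaN values. It only works when the column is referred to explicitly as .$x. A code example follows. I don't understand the nuance here... could you point me in the right direction?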

library(dplyr)

set.seed(12)

# creates random data frame
z <- data.frame(
        x = runif(100),
        y = runif(100),
        z = runif(100)
)

# creates function to calculate confidence intervals
conf.int <- function(x, alpha = 0.05) {
        
        sample.mean <- mean(x)
        sample.n <- length(x)
        sample.sd <- sd(x)
        sample.se <- sample.sd / sqrt(sample.n)
        t.score <- qt(p = alpha / 2, 
                   df = sample.n - 1, 
                   lower.tail = FALSE)
        margin.error <- t.score * sample.se
        lower.bound <- sample.mean - margin.error
        upper.bound <- sample.mean + margin.error
        
        data.frame(lower.bound, upper.bound)
        
}

# This works as expected
z %>% 
        summarise(x = mean(y), conf.int(y))

# This does not
z %>% 
        summarise(x = mean(x), conf.int(x))

# This does 
z %>% 
        summarise(x = mean(x), conf.int(.$x))

Thanks!

Answer 1

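This is a "feature" of dplyr: within a single summarise() call, each expression can see the columns created by the expressions before it. Once x = mean(x) has been evaluated, x refers to the updated value (the single mean), and that is what gets passed to conf.int. With only one value, sd(x) is NA and the degrees of freedom are 0, hence the warning and the NaN results. .$x works because . is the pipe placeholder for the original data frame z, so it bypasses the updated x.

Possible options:

Use a different name for the column that stores the mean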
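To make the evaluation order concrete, here is a minimal sketch on a toy data frame (df, n_seen and n_orig are illustrative names, not part of the question). It assumes the magrittr pipe that dplyr re-exports, where . stands for the data frame on the left-hand side.

```r
library(dplyr)

# Toy data, not from the question: four known values in a column named x
df <- data.frame(x = c(1, 2, 3, 4))

# length(x) is evaluated after x has been overwritten by its mean,
# so it sees one value; .$x still points at the original column
res <- df %>%
  summarise(x = mean(x), n_seen = length(x), n_orig = length(.$x))

print(res)
#     x n_seen n_orig
# 1 2.5      1      4
```

The n_seen column shows that anything placed after x = mean(x) receives a length-one vector, which is exactly what conf.int(x) gets in the failing call.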

library(dplyr)

z %>% summarise(x1 = mean(x), conf.int(x))

#         x1 lower.bound upper.bound
#1 0.4797154   0.4248486   0.5345822

Change the order, so that conf.int(x) is evaluated before x is overwritten

z %>% summarise(conf.int(x), x = mean(x))

#  lower.bound upper.bound         x
#1   0.4248486   0.5345822 0.4797154
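As a sanity check, both workarounds can be compared against a direct call outside summarise(). This is only a sketch: conf.int here is a condensed version of the function from the question, and direct, renamed and reordered are names made up for illustration.

```r
library(dplyr)

set.seed(12)
z <- data.frame(x = runif(100), y = runif(100))

# Same computation as the question's conf.int(), condensed
conf.int <- function(x, alpha = 0.05) {
  margin.error <- qt(alpha / 2, df = length(x) - 1, lower.tail = FALSE) *
    sd(x) / sqrt(length(x))
  data.frame(lower.bound = mean(x) - margin.error,
             upper.bound = mean(x) + margin.error)
}

# Reference: called on the raw column, outside summarise()
direct <- conf.int(z$x)

# Workaround 1: store the mean under a different name
renamed <- z %>% summarise(x1 = mean(x), conf.int(x))

# Workaround 2: compute the interval before x is overwritten
reordered <- z %>% summarise(conf.int(x), x = mean(x))

# Both comparisons should return TRUE
all.equal(renamed$lower.bound, direct$lower.bound)
all.equal(reordered$upper.bound, direct$upper.bound)
```

Either option keeps conf.int away from the overwritten x, so the choice between them is mostly a matter of which column name and column order you prefer in the result.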
